Community Tech/Wikisource support notes
Notes from Berlin conference, April 23
Talked with Andrea (User:Aubrey) about Wikisource process and tech needs
At the Jerusalem hackathon and following, people have been working on wish #23: Visual editor adapted for Wikisource (T48580).
Structure of Wikisource pages
Namespaces: Zero, Page, Index
VE is working for namespace 0, but that's not very helpful -- on Wikisource, there's no text on namespace 0, it's just the header and metadata. The important namespace is Page:, which is where the transcription and proofreading are done.
(Note: namespace 0 is referred to in these notes as "zero" or "zero page", because I don't know what else to call it.)
Zero is the page that you read. Example: Kopal-Kundala. Zero has a header with metadata, then a <pages> tag. The pages tag indicates which pages from the Page namespace are transcluded on the zero page you're looking at.
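To make the transclusion idea concrete, here is a minimal Python sketch of how a <pages> tag maps to titles in the Page namespace. It assumes the tag carries index, from and to attributes and that Page titles take the form "Page:<file>/<number>". The function name pages_tag_to_titles and the page range 5 to 7 are hypothetical, chosen only for illustration; this is not how the Proofread Page extension itself is implemented.

```python
import re


def pages_tag_to_titles(wikitext):
    """Return the Page: titles a <pages> tag refers to (simplified sketch).

    Assumes the tag has index, from and to attributes; returns an empty
    list if the tag or any of those attributes is missing.
    """
    tag = re.search(r"<pages\b([^>]*)>", wikitext)
    if tag is None:
        return []

    # Collect attributes, accepting both quoted and unquoted values.
    attrs = {}
    for m in re.finditer(r'(\w+)\s*=\s*(?:"([^"]*)"|([^\s/>]+))', tag.group(1)):
        attrs[m.group(1)] = m.group(2) if m.group(2) is not None else m.group(3)

    if not all(key in attrs for key in ("index", "from", "to")):
        return []

    first, last = int(attrs["from"]), int(attrs["to"])
    return [f"Page:{attrs['index']}/{n}" for n in range(first, last + 1)]


# Hypothetical zero-page content: the header template is elided,
# and the page range is made up for the example.
sample = '<pages index="Kopal-Kundala.djvu" from=5 to=7 />'
for title in pages_tag_to_titles(sample):
    print(title)
```

The point of the sketch is only that the zero page stores a pointer to a range of Pages rather than any text of its own, which is why an editor that works on namespace 0 alone has nothing to edit.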
On zero: there are little numbers at the top left -- each one links to the corresponding Page.
On a Page -- you can see the scan on the right side, transcription on the left side. The transcription goes through several "Page status" changes, from Empty --> Not proofread --> Proofread --> Validated (or Not proofread --> Problematic --> Proofread --> Validated). Buttons to change the Page status are at the bottom of the edit window.
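The status flow can be sketched as a small transition table. This Python sketch encodes only the two paths mentioned in these notes; the real extension may allow other transitions (for example, moving a Page back to an earlier status), and the names ALLOWED and is_valid_path are hypothetical.

```python
# Transitions taken from the two paths described in these notes only.
ALLOWED = {
    "Empty": {"Not proofread"},
    "Not proofread": {"Proofread", "Problematic"},
    "Problematic": {"Proofread"},
    "Proofread": {"Validated"},
    "Validated": set(),
}


def is_valid_path(statuses):
    """Check that each consecutive pair of statuses is an allowed transition."""
    return all(
        nxt in ALLOWED.get(cur, set())
        for cur, nxt in zip(statuses, statuses[1:])
    )


# The two paths from the notes.
print(is_valid_path(["Empty", "Not proofread", "Proofread", "Validated"]))
print(is_valid_path(["Not proofread", "Problematic", "Proofread", "Validated"]))
# Skipping proofreading is not one of the described paths.
print(is_valid_path(["Not proofread", "Validated"]))
```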
On each zero page, there's a tab called "Source", which goes to the Index: namespace. (So why doesn't it say Index??) Example: Index:Kopal-Kundala.djvu. Index has a guide to the Page status of every Page in the book.
(These details may differ between language editions of Wikisource.)
Talking with Alex about how to make Wikisource support happen.
Wikisource's structure is very clever and productive, but it's tough to learn and it's built on hacks and templates. It would be great to build a real UI, based on design research. That may be in the cards in the future, but right now, we should concentrate on achievable asks.
The top three Wikisource requests from the Wishlist Survey:
- #23. Visual Editor adapted for Wikisource (T48580) -- currently being worked on by Coren et al., although I believe it only works on namespace 0, rather than the Page namespace, which is where it's needed.
- #25. Tool to use Google OCRs in Indic language Wikisource (T120788) -- currently being worked on by volunteer devs with Niharika. Sheree is talking to Google about getting actual access.
- #35. Better support for djvu files (T120784) -- This is a vague proposal that asks for a number of different things. It's basically saying, look at the djvu system and do something interesting.
Any Wikisource project would probably include an investigation of the Proofread Page extension.
Ideally, we would work on Proofread Page itself. The overall goal is to get more people to upload and proofread by making the structure easier to understand and more efficient to use. We'll need to talk to Wikisource users in a bunch of languages to get ideas and input.