Wikisource across projects

This document attempts to provide a unified workflow for the user experience (UX) of interacting with Wikisource across projects. It is aimed at the three GSoC 2013 grantees and their mentors, and is also meant to gather feedback from the community. The GSoC projects related to this document are:

UploadWizard: Book upload customization
Refactoring of ProofreadPage extension
Improve support for book structures
And also, to a lesser extent, Prototyping inline comments

Scan file upload process

The UploadWizard is being modified by Nazmul Chowdhury (proposal, mentor's notes), with the following goals:

dynamic generation of metadata forms from templates according to user selection: Template:Information (as we already do), Template:Book, Template:Artwork, Template:Photograph, and more if time permits.
import metadata from external trusted open data (CC0) providers (Europeana, DPLA, Internet Archive)
import file+metadata from such providers
[optional] allow direct file+metadata transfers initiated from trusted webpages

Access points

There are several ways the user can access the upload interface:

upload link on Wikisource
upload link on Commons
[future] direct transfer from trusted external websites

The upload link on Wikisource lets the user choose between uploading the file locally on Wikisource or through Commons UploadWizard. The local upload of files on Wikisource is normally discouraged. Thus, this link should be modified to summon the UploadWizard with the (upcoming) "book campaign" parameter (see upload campaigns) and the appropriate language on the Wikisources that want to use this method.

This "book campaign" parameter will activate extra metadata fields specific to books. If the UploadWizard is not accessed through Wikisource, it will still be possible to select a book template in it. Enabling transfers of external files initiated from trusted sources (e.g. Europeana) directly to Commons is still under discussion, but the API can be prepared for it. The other option is to integrate tools for importing files from the Internet Archive into Commons.
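
The modified upload link described above could be built roughly as follows. This is only a sketch: the campaign name "book" is a placeholder (the actual campaign does not exist yet), and the helper name is hypothetical. It relies on the `campaign` and `uselang` URL parameters, which should be checked against the current UploadWizard documentation.

```python
from urllib.parse import urlencode

COMMONS_UPLOAD_WIZARD = "https://commons.wikimedia.org/wiki/Special:UploadWizard"


def book_upload_url(language, campaign="book"):
    """Return an UploadWizard URL for a given Wikisource language.

    "book" is a placeholder campaign name, not an existing campaign.
    """
    # "campaign" selects the upload campaign, "uselang" the interface language.
    query = urlencode({"campaign": campaign, "uselang": language})
    return f"{COMMONS_UPLOAD_WIZARD}?{query}"


# Link that the French Wikisource could place in its toolbar.
print(book_upload_url("fr"))
```

Each Wikisource that opts in would only need to change the language code in its own link.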

Metadata

For books there are different types of metadata taken into consideration:

Work metadata (properties)
Edition metadata (properties)
File metadata (file source, description, creator, license, categories); currently the closest match is Template:Information

Ideally, Template:Book in Commons should be modified to reflect these metadata types (three colors, or separators). Alternatively, there could be three different templates on the same Commons page: each metadata group will (possibly) become an item in Wikidata, and the items might be easier to link individually to each template.

[FUTURE] It is not yet clear how Wikidata will support Commons (proposal: commons:Commons:Wikidata for media info). In any case, the template will have to allow linking to different data items instead of storing the information as plain text. This might be work for a future GSoC.

Upload successful

When the book file has been uploaded successfully, it is time to offer the user some options:

Start "Index" page in Wikisource (with a language selector)
View File page in Commons

The UploadWizard should be able to initiate a Wikisource "Index:" page on request with the information provided by the user. On this "Index" page there should be the option to initiate the logical structure of the book(s) contained in the uploaded file.

[FUTURE] The metadata on the "Index page" will be linked through Wikidata. Changes done in Wikisource will show up in Commons, and vice versa.

Proofreading the uploaded file scan

Text proofreading is done with Extension:ProofreadPage. This extension is currently being updated by Aarti Kumari Dwivedi (proposal). Additionally, a VisualEditor plugin will be needed to support text formatting and section labels; this might be done by the VE development team, as stated in their FAQ.

Mapping file pages to book pages

The page numbering of the scan file is usually mapped with <pagelist /> to the numbering (decimal or roman) printed on the pages of the book. A user-friendly way of mapping them could be a button on the Index page that opens a two-column (file page / book page) window in which typing a number auto-completes the following rows with incremental values. A method will also be needed to find a file page number and its section given a book page number. The Lua module by Alex Brollo might give some idea of how to do this (Wikisource mailing list thread).
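
As an illustration of the auto-completion and of the reverse lookup, here is a minimal sketch. It is not the Lua module mentioned above; the function names and the "anchor" representation (file page, first printed number, numbering style) are assumptions made for this example, and sections are not handled.

```python
def to_roman(number):
    """Convert a positive integer to a lowercase roman numeral."""
    numerals = [(1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
                (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
                (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i")]
    result = []
    for value, symbol in numerals:
        while number >= value:
            result.append(symbol)
            number -= value
    return "".join(result)


def build_page_map(file_page_count, anchors):
    """Map every file page to a book page label.

    Each anchor is (file_page, first_number, style): the value the user
    typed in one row. The rows that follow are filled incrementally until
    the next anchor. Pages before the first anchor get "-" (a choice made
    for this sketch only).
    """
    ordered = sorted(anchors)
    page_map = {}
    for file_page in range(1, file_page_count + 1):
        label = "-"
        for start, first_number, style in ordered:
            if start <= file_page:  # the last anchor at or before this page wins
                number = first_number + (file_page - start)
                label = to_roman(number) if style == "roman" else str(number)
        page_map[file_page] = label
    return page_map


def find_file_pages(page_map, book_page):
    """Return all file pages carrying the given book page label.

    A list is returned because a file holding several books can repeat labels.
    """
    return [fp for fp, label in page_map.items() if label == str(book_page)]


# Preface numbered i, ii from file page 3; main text starts at 1 on file page 5.
page_map = build_page_map(10, [(3, 1, "roman"), (5, 1, "decimal")])
print(page_map[4], find_file_pages(page_map, 2))
```

Storing only the anchors typed by the user keeps the data close to what <pagelist /> already records, while the full map can be recomputed whenever it is needed.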

Transclusion and section label information

It will be necessary to display which part of a page is being transcluded and where that part of the text is being used.

Ideally, section labeling (currently done with the tags: ### section_name ###) should be part of the VisualEditor, with some feedback to the user when hovering over a labeled text section (similar to the highlight shown on a transcluded page when the mouse hovers over the page number).

Formatting options

Wikisource has higher formatting requirements than Wikipedia. Some common formatting should be supported through a VisualEditor plugin:

Text size, color, font
Indent (gap)
Text type: poem, small caps, initial letter

[TODO] Check with the VE team whether they will deploy these features for WS. Draw up a wish list of Wikisource's formatting needs, sorted by priority.

Invoking the book manager

At some point, whether the file has been completely proofread or not, the user will want to start the logical structure(s) that belong to the file scan. Some files contain several books and some books require several scans, so a universal solution might be difficult. It is proposed, however, that in the simple case where the scan does not yet have a "book structure", a button on the "Index page" allows the user to start one.

Managing the book

An extension to manage books is being created by Molly White (proposal, notes, request for comments). This extension will take care of creating the book structure and of assigning scan files (normally one per book structure, but there can be more) and pages to each section (which is currently done manually using <pages index="filename.djvu" from=X to=X />).

Metadata

The metadata needs are shared across projects, so the data fields should be the same (Work, Edition, File). If possible, the code used to generate the forms dynamically from a template should also be shared between Commons and Wikisource.

The main page of the book should also expose the metadata as schema.org/Book-styled HTML. User:Aubrey has already started mapping the Wikidata properties to schema.org.
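
One possible reading of "schema.org styled HTML" is microdata markup. The sketch below renders a few metadata fields that way; the function name is hypothetical, the choice of microdata (rather than RDFa or JSON-LD) is an assumption, and it does not use the property mapping mentioned above.

```python
from html import escape


def book_microdata(metadata):
    """Render a dict of schema.org property -> text value as microdata HTML."""
    rows = [
        f'  <span itemprop="{escape(prop)}">{escape(value)}</span>'
        for prop, value in metadata.items()
    ]
    return "\n".join(
        ['<div itemscope itemtype="https://schema.org/Book">', *rows, "</div>"]
    )


# "name" and "author" are schema.org properties; the values are made up.
print(book_microdata({"name": "Example Title", "author": "A. Author"}))
```

In a real extension the author would more likely be a nested Person item than plain text; plain text is used here to keep the sketch short.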

Creating the book structure

The user should be able to assign a page range or individual pages to a section, and to specify a label in case section transclusions are present on that group of pages. Ideally, the page range should be queried to find out which labels users have applied during proofreading, so that the one(s) that apply can be selected.
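
To make the target of such an assignment concrete, the sketch below turns one section of a book structure into the <pages /> tag that is currently written by hand. The `Section` class and its field names are hypothetical; the `fromsection` and `tosection` attributes are used as I understand ProofreadPage to define them and should be verified against the extension's documentation.

```python
from dataclasses import dataclass
from html import escape
from typing import Optional


@dataclass
class Section:
    title: str                        # section title shown in the book structure
    index: str                        # scan file name, e.g. "filename.djvu"
    first_page: int                   # first file page of the section
    last_page: int                    # last file page of the section
    from_label: Optional[str] = None  # label where the section starts on first_page
    to_label: Optional[str] = None    # label where the section ends on last_page


def pages_tag(section):
    """Build the <pages /> transclusion tag for one section."""
    parts = [
        f'index="{escape(section.index)}"',
        f"from={section.first_page}",
        f"to={section.last_page}",
    ]
    if section.from_label:
        parts.append(f'fromsection="{escape(section.from_label)}"')
    if section.to_label:
        parts.append(f'tosection="{escape(section.to_label)}"')
    return "<pages " + " ".join(parts) + " />"


print(pages_tag(Section("Chapter 1", "filename.djvu", 12, 30)))
```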

Some books have a massive number of sections (example). For this kind of book it would make more sense to generate the sections automatically from the labels that users have added to each page (future). There is a script by User:Phe that does this.
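
A rough idea of how sections could be derived from labels is sketched below. It is not User:Phe's script; it assumes the "### section_name ###" label syntax quoted earlier in this document and a hypothetical input that maps file page numbers to proofread text.

```python
import re

# Matches "### section_name ###" and captures the name.
LABEL_PATTERN = re.compile(r"###\s*(.+?)\s*###")


def sections_from_labels(pages):
    """Return (label, first_page, last_page) for every label found.

    `pages` maps file page number -> proofread text. Sections are listed
    in order of first appearance.
    """
    ranges = {}
    for page_number in sorted(pages):
        for label in LABEL_PATTERN.findall(pages[page_number]):
            first, _ = ranges.get(label, (page_number, page_number))
            ranges[label] = (first, page_number)
    return [(label, first, last) for label, (first, last) in ranges.items()]


pages = {
    7: "### Chapter 1 ###\nIt was",
    8: "### Chapter 1 ###\nthe end.\n### Chapter 2 ###\nNext",
    9: "### Chapter 2 ###\nmore text",
}
print(sections_from_labels(pages))
```

The resulting ranges could then be offered to the user for confirmation rather than applied blindly, since labels may be misspelled or reused.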

View options

Proposed view options:

Standard view: as now, transcluded pages with navigable templates
[future] Print view: all pages in one, using the dumped OCR text if the page has not been started
[future] Scan+Text view: see book2scroll for an example of this
[future] Scan view: it would require adapting the HTML/JavaScript Internet Archive BookReader for use in MediaWiki

Export options

Exporting to various file formats is not a part of the main Summer of Code project, but should be supported for the entire book later on. Proposed export options:

epub, odt (already supported)
scan file (supported, but hidden in Commons)
[future] (cached) scan file as pdf
[future] (cached) scan file with merged proofread text

Book actions and progress report

Actions and progress reports are not a part of the main Summer of Code project, but should be supported later on. Proposed management options:

Watchlist/Unwatchlist all
Rename/remove
Automatic global status book/chapter, with an "exportable" status (yes, no)
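
The automatic global status could be derived from the proofreading level of each page. The sketch below assumes the usual ProofreadPage levels (0 to 4) and two rules that are not part of this proposal: the book or chapter takes the lowest level among its pages with text, and it is "exportable" once every such page is at least proofread.

```python
# Page quality levels as used by ProofreadPage.
QUALITY_NAMES = {
    0: "without text",
    1: "not proofread",
    2: "problematic",
    3: "proofread",
    4: "validated",
}


def book_status(page_levels):
    """Summarise a book or chapter from the quality levels of its pages."""
    # Pages without text (blank pages, plates) do not count against the book.
    with_text = [level for level in page_levels if level != 0]
    if not with_text:
        return {"status": None, "exportable": False}
    lowest = min(with_text)
    # Assumed rule: exportable when every page is proofread or validated.
    return {"status": QUALITY_NAMES[lowest], "exportable": lowest >= 3}


print(book_status([0, 3, 4, 4]))
```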

Books without supporting scan

There are books and documents that do not have a supporting scan:

Wikibooks: this is the case for all documents.
Wikisource: either because the scans are missing, or because it is a born-digital document (OpenAccess books and articles, grey literature, laws, etc.).

Starting a book without scan

A method will be needed for creating a book main page where the metadata can be entered and the sections managed. The proposed method is to add another link to the actions toolbar, resulting in:

Upload file
Start digital book

Quality levels

Consider having the option to change the pagequality for an entire section when there are no supporting scans.

[future] EPUB import into Wikisource

At some point it should be possible to import an EPUB into Wikisource and automatically obtain both the text and the structure of the book. This can be very interesting for certain kinds of texts (e.g. OpenAccess literature, books that have been proofread, formatted and structured as EPUB, etc.). This can be the subject of a future GSoC or a volunteer project.

Annotations

Richa Jain is working on the conversion of the OKFN Annotator into a MediaWiki extension (project). This will allow users to highlight part of the text and add comments.

Activating/deactivating comments

It will be necessary to add an interface switch to activate/deactivate the comments and show how many there are. It can be something like the ArticleFeedback tool, which shows an icon with the number of comments. However, a future project could add support for semantic annotations on Wikisource (thepund.it), so the interface should allow for this addition later on.

Anchoring/display of comments

Comments should be anchored at page level, but they should also display when the page has been transcluded into a section containing all pages of the chapter. This might need some testing on Wikisource when the extension is more mature.

[future] Integration with Wikidata

This is the proposed workflow with Wikidata integration:

The upload is done through the UploadWizard. If Wikidata eventually supports Commons, three different Wikidata items will have to be created automatically (file item, work item, edition item), plus the links between them. The file being uploaded may already have a "work item" and an "edition item", so the interface will need a way of linking to such existing items, maybe with a radio button "link to item/create item". If the author item or the publisher item does not exist, it will also have to be created at this point; this depends on the solution proposed to resolve bugzilla:49068.
- See commons:Commons:Wikidata for media info for a proposal for managing media metadata using Wikidata technology.
After the upload has finished, the UploadWizard will offer the option to start an Index page. Initially the metadata will be hard-copied as text, but at some point it has to be linked through Wikidata. Maybe only the "edition item" is needed, with the information from the other items extracted through it.
When the Index page has been created, the user will be transferred to this page in Wikisource to proofread the file.
At some point the "book main page" (one or several) has to be created and the link inserted into Wikidata's "edition item" (<Wikisource page> [link to the book main page] with possible qualifier <exportable> or <exportable as>). Maybe the Index page will need a "start book" button to summon the book manager form and to create the link in Wikidata.
If the book has no supporting scans, then the "work item" and "edition item" will have to be created through the interface in Wikisource or, if the item already exists in Wikidata, it has to be modified to add the link to the corresponding edition item, so that the book becomes indexed (similar to when uploading a book).

[future] Integration with Wikipedia

When all this happens, the integration with Wikipedia should be straightforward. There are some ideas, mockups and examples about how to do this.
