Wikimedia Foundation GLAM team/Office Hours/November 2020

November meeting: Wikisource

The third meeting was held on Monday 23 November, 3.30-4.30pm UTC, and was repeated on Tuesday 24 November, 11.00am-12.00pm UTC. We presented our work coordinating across movement stakeholders to improve Wikisource infrastructure. On Monday, we were joined by PanLex, who shared their grant-funded project to develop a Balinese palm-leaf transcription platform on Wikisource. On Tuesday, we were joined by Wikimedia UK, who presented their project with National Library of Scotland staff.



Satdeep opened his remarks by saying that he believes Wikisource is a major part of the essential infrastructure for free knowledge. It is imperative to have a really good transcription platform, especially so that underrepresented languages can have their own digital library. Wikisource hasn’t had a lot of investment in its infrastructure and has been mainly volunteer-built.

Satdeep shared an overview of the Wikisource workflow, with an overlay of projects that are being worked on this year. These small projects have been supported in different ways: some via the Community Wishlist Survey 2020, others by Wikimedia Foundation grants, and one through a Google Summer of Code mentorship.

Satdeep shared the recently launched Pagelist Widget, which improves the visualization of files and pages, as well as user experience and editing. It has already been enabled on 25 Wikisources. He also previewed Wikisource Export and encouraged people to give their feedback on the proposed designs.

David’s presentation was about a grant-funded project to develop a Balinese palm-leaf transcription platform on Wikisource.

The starting point for this project was the Balinese Digital Library, which was created in 2011 by the Internet Archive in partnership with major Balinese collections. It made available digital photographs of 3,000 works, containing 130,000 leaves and covering all aspects of Balinese culture over centuries. However, it turned out that images alone were not enough. They were hard to use, read, and share, and the Internet Archive wanted to create something more useful and engage the community.

PanLex applied for a Wikimedia Foundation project grant to have a new Balinese Wikisource as the long-term home for the transcription platform and its works. The project encompassed:

  • Uploading scans to Commons
  • Importing existing work from Palm Leaf Wiki
  • Adding Balinese fonts to the Universal Language Selector
  • ProofreadPage improvements for content language
  • Balinese Language Converter for transliteration
  • User script to activate transcription interface and transliteration

Interestingly, the Balinese Wikisource editing interface uses IIIF to retrieve a high-resolution tile for only the part of the image that is being transcribed, reducing data usage in low-resource contexts.
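
The notes do not describe how the Balinese Wikisource interface builds its requests, so the following is only a minimal sketch of the general idea, assuming the IIIF Image API 3.0 URL pattern ({identifier}/{region}/{size}/{rotation}/{quality}.{format}). The server address, the image identifier, and the function name iiif_region_url are hypothetical and used for illustration only.

```python
from urllib.parse import quote


def iiif_region_url(base_url, identifier, x, y, width, height, out_width=None):
    """Build an IIIF Image API URL for one rectangular region of an image.

    Illustrative helper, not the code used on Balinese Wikisource.
    x, y, width, height are pixel coordinates in the full-size image.
    """
    if width <= 0 or height <= 0:
        raise ValueError("region width and height must be positive")

    # Region parameter: "x,y,w,h" in pixels of the full image.
    region = f"{x},{y},{width},{height}"

    # Size parameter: "w," scales the region to that width and keeps the
    # aspect ratio; "max" asks for the largest size the server offers.
    size = f"{out_width}," if out_width is not None else "max"

    # The identifier must be percent-encoded, including any slashes.
    ident = quote(identifier, safe="")

    # Rotation 0, default quality, JPEG output.
    return f"{base_url.rstrip('/')}/{ident}/{region}/{size}/0/default.jpg"


# Hypothetical server and identifier: request only the strip of the leaf
# that holds the line being transcribed, scaled down to 1000 px wide.
url = iiif_region_url(
    "https://iiif.example.org/image", "leaf 12.jpg",
    x=0, y=1200, width=4000, height=300, out_width=1000,
)
print(url)
```

Because the region and size are part of the URL, the client downloads only the strip it needs rather than the whole scan, which is where the data saving comes from.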

Sara’s presentation, "Responding to Covid: The National Library of Scotland & Wikisource", introduced the Wikisource project page Wikisource:WikiProject_NLS.

In 2020, the National Library of Scotland’s building closed due to the Covid-19 pandemic, and the library wanted a productive and valuable work-from-home activity for staff. They had a longstanding collaboration with Wikimedia UK and had already considered using Wikisource as an alternative to their in-house OCR, which is auto-generated with no facility to correct it. They decided to correct the OCR for a collection of over 3,000 Scottish chapbooks, which had been recently digitized and made available on the Library's Digital Gallery. The chapbooks covered a wide range of topics and, at just 10-20 pages per book, each could be transcribed in a day.

It was one of the largest staff cohorts that Wikimedia UK has ever worked with, with 70 staff members, most of whom hadn’t engaged with Wikimedia projects before. The library wanted to complete all 3,000 books, so they worked with two members of the Wikisource community to agree on a more limited use of Wikisource templates, striking a balance between completeness and speed.

Library staff reported enjoying the work and the project brought this important collection to a broader audience. Sara concluded that Wikisource probably isn’t a replacement for a better in-house OCR, and ultimately the main benefit of the project was staff learning how to use Wikimedia platforms.

More than 30 participants joined the November meetings, including British Library staff who expressed an interest in learning more about the Balinese palm leaf project by PanLex. Staff from the Foundation were able to address some of the specific issues encountered on the National Library of Scotland project, noting that Google OCR limits can be removed, and committing to fixing the txt export issue. The new PageList widget presented by Satdeep addressed another of the challenges.

You can also review the collective notes in Etherpad.


Phabricator tickets