Wikimedia Blog/Drafts/Bengali Wikisource making rapid strides

Title ideas

  • Bengali Wikisource making rapid strides
  • ...

Summary

Slowly and silently, the Bengali Wikisource has made its way into the top three Wikisources in terms of content. A plan is underway to increase the content five-fold, which would create opportunities for more volunteers.

  • ...

Body

 
A copyright-free book being scanned at the Wikipedia stall at the Greater Behala Book Fair. Image by Biswarup Ganguly, freely licensed under CC-BY-3.0.

Bengali Wikisource is now the third most voluminous Wikisource in terms of content. Beaten only by the French and the English Wikisources, it boasts a total of 452,191 pages in various states of OCR, proofreading and verification. Slowly and silently it has been making rapid strides, recently surpassing the Tamil Wikisource, the erstwhile leading Wikisource among the Indic languages.

Though the Bengali Wikisource began its journey in 2007, it didn’t make much progress until a couple of years ago, when Jayanta Nath and Bodhisattwa Mandal, longtime contributors to Bengali Wikipedia and the only two Bengali Wikipedia administrators from India, realized that the only way to substantially improve the quality and quantity of Bengali Wikipedia content was to get its base ready. Thus began the current phase of Bengali Wikisource activity, which has seen nearly 2,000 Bengali books uploaded. Jayanta and Bodhisattwa were joined by a trio consisting of army doctor Hrishikes Sen, mountaineer Sumita Roy Dutta and trekker Sujay Chandra, all of whom share a passion for Wikipedia. Tanvir Rahman and Nasir Khan from Bangladesh have joined hands as well. One of the many outcomes of the initiative, something that Wikimedia stats won’t tell you, is that the complete works of India’s national poet and Nobel laureate Rabindranath Tagore are now on Wikisource.

The journey, however, was not smooth. The absence of proper tooling support for Indic languages turned out to be the biggest bottleneck in transforming the volunteers’ zeal into precious binaries on Wikisource. The need for tooling was recognized as a high priority at the first international Wikisource conference in Vienna in November 2015, followed by overwhelming support in the community wishlist survey. Subsequently, T. Shrinivasan from Tamil Wikipedia developed OCR4wikisource to solve the tooling problem.

So what’s next for the Bengali Wikisource team? Jayanta is planning to upload a whopping 8,000 Bengali books using bots, while Sujay is working on training Tesseract. Others are busy proofreading. With a projected five-fold increase in the work volume, the team clearly needs more volunteers. The Bengali Wikimedia community is planning a brainstorming session at its next meetup to address this vital issue.

Kalyan Sarkar, Bengali Wikipedian

Notes