The main objective of this project is to add significant content to Gujarati Wikisource using OCR. The project will be executed in collaboration with the Gujarati Wikimedia community, the WMIN Chapter, Forbes Gujarati Sabha (which will provide access to copyright-free Gujarati content for Gujarati Wikisource) and The Maharaja Sayajirao University, Baroda (which will provide a copy of its Gujarati OCR software for converting scanned images into searchable text). The project will run for 6 months.

Work done so far

  • M.S. University, Baroda has developed a Gujarati OCR that it considers very effective. So far the OCR has been trained on 25 different books of about 250 pages each. The results indicate that the OCR conversion is robust. Gujarati Wikimedians Sushant Savla and Dhaval Vyas advised CIS-A2K that the conversion quality is decent and that using the OCR could help enrich Gujarati Wikisource.
  • CIS-A2K has held discussions with Prof. Rama Mohan of M.S. University, who heads the Gujarati OCR development project funded by the Government of India, about obtaining a licensed copy of the OCR to be used for growing Gujarati Wikisource. CIS-A2K has received an in-principle agreement.
  • Gujarati Wikimedian Sushant Savla has good contacts with Forbes Gujarati Sabha and has expressed willingness to collaborate with CIS-A2K to help bring some of the key copyright-free Gujarati encyclopedic content of Forbes Gujarati Sabha to Wikisource.

Implementation Plan

There are two key activities under this project.

Gujarati Content Digitization and Conversion


Bring 100 copyright-free Gujarati books to Gujarati Wikisource by converting them into searchable text.

  1. CIS-A2K and the WMIN Chapter will work towards a formal agreement with M.S. University to secure the license for the Gujarati OCR.
  2. CIS-A2K and the WMIN Chapter will work towards a formal agreement with Forbes Gujarati Sabha to have 100 copyright-free books from its collection scanned.
  3. Help organize the necessary training programs on OCR and scanning.
  4. Convert these 100 books into searchable text using the Gujarati OCR and put them up on Wikisource.
  5. Support the Gujarati Wikimedia community with scanning equipment and minor personnel support to execute the task in a time-bound manner.
  6. Organize proofreading sprints in collaboration with the Gujarati Wikimedia community to proofread all 100 books.
  7. Review and evaluate the project.

Expected outcomes:

  1. Gujarati Wikisource to be enriched with 100 Gujarati books (i.e. about 10,000 folios).
  2. Could result in the creation or improvement of 500 quality articles on Gujarati Wikipedia.
  3. Growth in the number of contributors on Gujarati Wikisource.

Offline distribution


Distribute offline copies of the digitized books to 500 schools in Gujarat.

  1. Create an offline version of Gujarati Wikisource.
  2. Organize a formal function for the release of the offline Gujarati Wikisource.
  3. The Department of Information and Public Relations and the Education and Culture Department of the Government of Gujarat will be actively involved, in order to build future partnerships and support for Wikimedia growth in Gujarati.
  4. Publicize the event in print, in electronic media and on social media.
  5. Distribute the offline version in schools and introduce students to Gujarati Wikisource.

Expected outcomes:

  1. Creation and distribution of about 500 DVDs of Gujarati Wikisource.
  2. Distribution in 500 government schools across Gujarat.
  3. Formal release of the offline version to bring more visibility to the Wikimedia projects in the state of Gujarat and among the Gujarati-speaking population.
  4. About 50,000 children exposed to Gujarati Wikisource, other Gujarati Wikimedia projects and the concept of Open Knowledge.

Budget

Expenditure Item | FDC Support (INR) | FDC Support (US$) | Other Sources and In-kind Support (INR) | Other Sources and In-kind Support (US$)
1 PD x 5% * | 120,000/- | 1,941.69730 | - | -
1 PM x 10% ** | 132,000/- | 2,135.86703 | - | -
1 PA x 50% *** | - | - | 240,000/- | 3,883.39460
Travel and Stay **** | 50,000/- | 809.04054 | 100,000/- | 1,618.08109
Volunteer Support ***** | - | - | 150,000/- | 2,427.12163
Events/Meetups/Workshops | - | - | 60,000/- | 970.84865
Consumables/Printing/Stationery/Swag | - | - | 20,000/- | 323.61622
Equipment/License costs | - | - | 200,000/- | 3,236.16217
Total | 302,000/- | 4,886.60488 | 770,000/- | 12,459.22436

* 5% of T. Vishnu Vardhan’s time as Program Director, CIS-A2K.
** 10% of the time of a Program Manager at CIS-A2K.
*** During consultations with the Gujarati Wikimedians, it was decided to have a Program Assistant (PA) level person work on this project full time for a 6-month period. The PA will be hired in consultation with the Gujarati Wikimedia community.
**** Expenses incurred by CIS-A2K towards executing this plan.
***** Expenses incurred by the PA, Gujarati Wikimedians and the WMIN Chapter representative towards executing the plan.
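
The column totals and the INR-to-US$ conversion in the budget can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original plan: the row values are copied from the table above, dashes are treated as zero, and the names `ROWS`, `column_totals` and `implied_rate` are hypothetical helpers introduced only for illustration. The exchange rate is not stated in the plan; the sketch merely derives the rate implied by the first row.

```python
from decimal import Decimal

# Rows copied from the budget table:
# (item, FDC INR, FDC US$, other-sources INR, other-sources US$).
# A dash in the table is represented here as 0 (an assumption of this sketch).
ROWS = [
    ("1 PD x 5%", 120_000, "1941.69730", 0, "0"),
    ("1 PM x 10%", 132_000, "2135.86703", 0, "0"),
    ("1 PA x 50%", 0, "0", 240_000, "3883.39460"),
    ("Travel and Stay", 50_000, "809.04054", 100_000, "1618.08109"),
    ("Volunteer Support", 0, "0", 150_000, "2427.12163"),
    ("Events/Meetups/Workshops", 0, "0", 60_000, "970.84865"),
    ("Consumables/Printing/Stationery/Swag", 0, "0", 20_000, "323.61622"),
    ("Equipment/License costs", 0, "0", 200_000, "3236.16217"),
]


def column_totals(rows):
    """Sum each of the four money columns; US$ values use Decimal to avoid float drift."""
    fdc_inr = sum(r[1] for r in rows)
    fdc_usd = sum((Decimal(r[2]) for r in rows), Decimal("0"))
    other_inr = sum(r[3] for r in rows)
    other_usd = sum((Decimal(r[4]) for r in rows), Decimal("0"))
    return fdc_inr, fdc_usd, other_inr, other_usd


def implied_rate(inr, usd):
    """INR per US$ implied by one table row (derived, not stated in the plan)."""
    return Decimal(inr) / Decimal(usd)


if __name__ == "__main__":
    fdc_inr, fdc_usd, other_inr, other_usd = column_totals(ROWS)
    print("FDC support:   INR", fdc_inr, "US$", fdc_usd)
    print("Other sources: INR", other_inr, "US$", other_usd)
    print("Implied INR per US$:", implied_rate(ROWS[0][1], ROWS[0][2]).quantize(Decimal("0.01")))
```

Summing the per-row US$ figures can differ from the printed total in the last decimal place, because each row appears to have been converted and rounded separately.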

List of contributors