WikiConference India 2016/Submissions/Introduction to OCR4WikiSource
- Title of the submission
Introduction to OCR4WikiSource
- Your Username (For the submission author)
Tshrinivasan
- Type of presentation
Talk
- Abstract (in about 300 words)
Recently, the Tamil Virtual Academy in Tamil Nadu released around 2000 nationalized ebooks in PDF format under a Creative Commons license. To add them all to Tamil Wikisource, we need them in plain-text format. Shrinivasan wrote a script that uses Google's OCR and pastes the resulting text into the relevant Wikisource page. This script is being used by many Indian Wikisource communities. It is a great collaborative development project, as many Indian Wikisource communities participated in development, testing, issue reporting, and enhancements.
So far, around 4 lakh (400,000) pages have been uploaded to Tamil Wikisource using this script. It is also being used by the Bengali, Telugu, Sanskrit, and Odia Wikisources.
He will explain and demonstrate this tool.
Links: Source URL: https://github.com/tshrinivasan/OCR4wikisource
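The workflow described in the abstract (OCR each scanned page, then paste the text into the matching Wikisource page) can be sketched roughly as below. This is only an illustrative outline, not the actual OCR4wikisource code: the function names `page_title` and `ocr_book_to_wiki` are hypothetical, the OCR step and the wiki-save step are passed in as plain callables rather than real Google or MediaWiki calls, and the `Page:<file>/<number>` title pattern is assumed from the usual Wikisource page-namespace convention.

```python
from typing import Callable, Iterable, List


def page_title(pdf_name: str, page_number: int) -> str:
    """Build the assumed Wikisource page title for one scanned page."""
    return f"Page:{pdf_name}/{page_number}"


def ocr_book_to_wiki(
    pdf_name: str,
    page_numbers: Iterable[int],
    ocr_page: Callable[[str, int], str],    # hypothetical: returns OCR text for one page
    save_page: Callable[[str, str], None],  # hypothetical: writes text to a wiki page
) -> List[str]:
    """OCR each page and save non-empty text under its page title.

    Returns the list of page titles that were actually saved.
    """
    saved = []
    for number in page_numbers:
        text = ocr_page(pdf_name, number).strip()
        if not text:
            # Blank scans produce no text; skip instead of saving an empty page.
            continue
        title = page_title(pdf_name, number)
        save_page(title, text)
        saved.append(title)
    return saved


if __name__ == "__main__":
    # Stand-ins for the real OCR service and wiki client (assumptions).
    fake_ocr = {1: "First page text", 2: "   ", 3: "Third page text"}
    wiki = {}
    titles = ocr_book_to_wiki(
        "book.pdf", [1, 2, 3],
        ocr_page=lambda name, n: fake_ocr[n],
        save_page=wiki.__setitem__,
    )
    print(titles)
```

Keeping the OCR and save steps as injected callables means the same loop can be tried out with stand-ins, as in the demo, before it is pointed at a real OCR service and a real wiki.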
- Result
Accepted
Interested attendees and comments
- I have used this software on Bengali Wikisource and OCRed more than 300 books. I want to know more.