Indic-TechCom/Requests/IWCC2018

The Wikisource community in India held an Indic Wikisource Community Consultation (IWCC) programme in Kolkata on 24 and 25 November 2018. It was organized by CIS-A2K. Indic-TechCom was requested to take part in this event and help the Wikisource community by fixing broken gadgets, creating new JavaScript scripts to help with Wikisource editing, creating new tools, etc. During this event Indic-TechCom launched an OCR tool created by User:Jayprakash12345. The following requests were made by the Wikisource community during the event.

Wscontest tool

To prepare a tool that shows the number of pages proofread, validated, etc. by a user in a particular time period. This tool will help our community run a Proofread-a-thon and see who has contributed the most to it. It will also let community members look at their own contributions, which should boost their energy and encourage them to contribute more to Wikisource.

Temporary Quarry Tool

I need a temporary Quarry tool like https://quarry.wmflabs.org/query/32994 for a Wikisource contest. I need stats of user actions (Proofread/Validate/Problematic/Without text) on a specific book within a specific time frame. Also, I don't know how the pawikisource team organizes a contest with this Quarry query, since it counts every user action, including NS0 page edits, category page edits and Index page edits, rather than only edits to a specific book or only proofreading. Regards. Jayantanth (talk) 08:09, 9 February 2019 (UTC)

Jayantanth Ji, such a query cannot be created at the moment, because actions like proofread/Validate/Problematic are not stored in any database table. See phab:T172408. If they were stored, a Wikisource contest tool would have been built long ago. I have done a little work on the Wikisource contest tool, but there is no chance of it starting soon. However, a query can be made that reports the number of edits a user made in a given namespace within a given time interval. As for query 32994, I was only asked for a list of users with 50+ edits, so I did not add a namespace clause.--Jayprakash >>> Talk 11:10, 9 February 2019 (UTC)
Thanks for explaining the situation, @Jayprakash12345:. Jayantanth (talk) 15:41, 9 February 2019 (UTC)
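
The namespace-limited edit count mentioned in the reply above could look roughly like the sketch below. It is only an illustration, not the query behind Quarry 32994: it runs against a toy in-memory SQLite database whose tables are a simplified assumption modelled on MediaWiki's revision, page and actor tables, and the namespace number 250, the user names and the timestamps are placeholders.

```python
import sqlite3

def edit_counts(conn, namespace, start, end):
    """Count edits per user in one namespace with start <= timestamp < end."""
    return conn.execute(
        """
        SELECT actor_name, COUNT(*) AS edits
        FROM revision
        JOIN page  ON page_id  = rev_page
        JOIN actor ON actor_id = rev_actor
        WHERE page_namespace = ?
          AND rev_timestamp >= ? AND rev_timestamp < ?
        GROUP BY actor_name
        ORDER BY edits DESC, actor_name
        """,
        (namespace, start, end),
    ).fetchall()

# Toy data (assumption): simplified tables, 14-digit timestamps stored as text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_namespace INTEGER, page_title TEXT);
CREATE TABLE actor (actor_id INTEGER PRIMARY KEY, actor_name TEXT);
CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_page INTEGER,
                       rev_actor INTEGER, rev_timestamp TEXT);
INSERT INTO page VALUES (1, 250, 'Book.pdf/1'), (2, 250, 'Book.pdf/2'), (3, 0, 'Book');
INSERT INTO actor VALUES (1, 'UserA'), (2, 'UserB');
INSERT INTO revision VALUES
  (1, 1, 1, '20190201100000'),
  (2, 2, 1, '20190202100000'),
  (3, 2, 2, '20190203100000'),
  (4, 3, 2, '20190204100000'),  -- main namespace, filtered out
  (5, 1, 2, '20190301100000');  -- outside the time window
""")

result = edit_counts(conn, 250, "20190201000000", "20190301000000")
print(result)
```

As the thread notes, this only counts edits; it cannot tell a proofread from a validation, because those status changes are not stored in a database table.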

Wikisource stats

The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section.
Tracked in Phabricator:
Task T212517 resolved

To create a Wikisource stats page for Indic Wikisource communities. This page should contain stats for the Page namespace and the Main namespace and their subcategories, such as not proofread, proofread, validated, with scan, without scan, etc. The list should be sortable, and the tool should cover only Indic languages. Currently we have a tool that works from the number of pages in categories such as proofread, validated, etc., but it is run manually and covers all languages.

@Jayantanth, Titodutta, Bodhisattwa, and Ananth subray:, https://tools.wmflabs.org/indic-wsstats/ --Jayprakash >>> Talk 05:37, 26 December 2018 (UTC)
Tito Dutta Ji, changed.--Jayprakash >>> Talk 07:52, 26 December 2018 (UTC)
Thank you. -- Tito Dutta (talk) 08:36, 26 December 2018 (UTC)
Awesome. Jayantanth (talk) 17:16, 26 December 2018 (UTC)

The above discussion is preserved as an archive. Please do not modify it. Subsequent comments should be made in a new section.

hOCR

The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section.

hOCR is a format for representing OCR output that includes layout, text and style information. It embeds this information invisibly in standard HTML, so the recognized text and the OCR-related information co-exist in the same file. When you upload a book to Commons and create the Index page on Wikisource, the OCRed text will already be available, which reduces the effort community members spend running OCR. To get this tool working, Tesseract should be updated from 3.0 to 4.0.
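
To make the format concrete, here is a minimal sketch of reading word text and bounding boxes out of an hOCR fragment with Python's standard library. The sample markup is invented for illustration and is not output from the tool discussed here; the class name `HocrWords` is hypothetical, and the sketch assumes each word is a span with class `ocrx_word` whose `title` attribute carries a `bbox x0 y0 x1 y1` property.

```python
from html.parser import HTMLParser

class HocrWords(HTMLParser):
    """Collect (word, bbox) pairs from elements with class 'ocrx_word'."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._bbox = None  # bbox of the word element we are currently inside

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "ocrx_word" in (attrs.get("class") or "").split():
            # The title holds ';'-separated properties, e.g. "bbox 50 40 180 70; x_wconf 96".
            for prop in (attrs.get("title") or "").split(";"):
                parts = prop.split()
                if parts and parts[0] == "bbox":
                    self._bbox = tuple(int(v) for v in parts[1:5])

    def handle_data(self, data):
        if self._bbox is not None and data.strip():
            self.words.append((data.strip(), self._bbox))
            self._bbox = None

# Invented sample fragment (assumption), not real OCR output.
SAMPLE = """<div class='ocr_page' title='bbox 0 0 800 1200'>
<span class='ocr_line' title='bbox 50 40 400 70'>
<span class='ocrx_word' title='bbox 50 40 180 70; x_wconf 96'>Indic</span>
<span class='ocrx_word' title='bbox 190 40 400 70; x_wconf 91'>Wikisource</span>
</span></div>"""

parser = HocrWords()
parser.feed(SAMPLE)
words = parser.words
print(words)
```

The layout data sits in attributes while the text stays ordinary HTML content, which is what lets the same file serve as both readable text and OCR metadata.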

@Jayantanth, Titodutta, Bodhisattwa, and Ananth subray:, Not done. I have talked with Tpt; see phab:T208711#4780257. No change is needed in the tool's repository. The problem is that Toolforge provides us with Ubuntu 14.04, but tesseract-ocr 4.0 needs Ubuntu 18.04. We can't do anything from our side. The WMF Cloud team will provide us with Debian Jessie after some time (around 4-6 months). So I am closing this as not done, because it only needs one command to be run, which Tpt will take care of. If you need it at any cost right now, we would have to host this tool on third-party hosting, which may cost about Rs 800 per month. I think the first option, waiting 4-6 months, is better. Thanks--Jayprakash >>> Talk 11:34, 21 December 2018 (UTC)
Thanks @Jayprakash12345:, for your effort. I understand what the main issue is. As of now we really don't need it urgently, because we have two options for OCR (Google Drive OCR and Fusion OCR). If Google stops this service, it will be required urgently. Let's wait for the WMF Cloud service to update to Ubuntu 18 on their server. Thanks again. Jayantanth (talk) 16:40, 21 December 2018 (UTC)

The above discussion is preserved as an archive. Please do not modify it. Subsequent comments should be made in a new section.

Android app for reading Wikisource books

To create an Android app that lets readers download books from the Wikisource website or read them online. This will increase the number of readers in every language.

Image Crop

The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section.

This is a tool that helps people add images to the actual text. When the tool is enabled, it lets you select the part of the page you want. Currently it has some issues and cannot be used properly.

@Jayantanth, Titodutta, Bodhisattwa, and Ananth subray:, wikisource:kn:ಸದಸ್ಯ:Jayprakash12345/Cropimage.js. I have made some other changes as well, such as removing percent-encoding and moving the tool from the advanced toolbar to the main toolbar. Thanks :)--Jayprakash >>> Talk 15:30, 21 December 2018 (UTC)

The above discussion is preserved as an archive. Please do not modify it. Subsequent comments should be made in a new section.

QR code generator

This is a request to create a tool that generates a high-definition QR code for each book on Wikisource, which can then be given to different libraries.

  Work in progress--Jayprakash >>> Talk 11:18, 21 December 2018 (UTC)
@Jayantanth, Titodutta, Bodhisattwa, and Ananth subray:, Can you check out Indic-TechCom/Tools/qrCodeGenerator?--Jayprakash >>> Talk 07:47, 1 February 2019 (UTC)
Hi @Jayprakash12345:, we need the QR code in SVG format and an option to upload the QR code to Commons. -- Bodhisattwa (talk) 08:12, 1 February 2019 (UTC)
Bodhisattwa Ji, generating SVG on the client side is no big deal. I have generated SVG with qrcode-svg. But there are two problems.
  1. It does not support UTF-8 characters, which means it will work only for English text.
  2. Even after generating the SVG, it is very hard to create a download button for it, because every time some text is missing from the SVG file. That breaks the file and makes it report as not valid.

So problem 1 is a big blocker; I can handle problem 2. But even after doing that, we can't add the option to upload directly to Commons, because that needs server-side scripting in PHP, Python, etc., and on the wiki we can only customize the client side (JavaScript). So here we need to create a tool on Toolforge.--Jayprakash >>> Talk 13:17, 1 February 2019 (UTC)

@Jayprakash12345:, for problem 1, we have the short URL extension installed for every page on Bengali Wikisource. The short URLs don't contain UTF-8 characters. Please check https://bn.wikisource.org/s/94i as an example. -- Bodhisattwa (talk) 13:32, 1 February 2019 (UTC)
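
The short-URL suggestion can be illustrated with a small sketch. It only shows that the short URL is already plain ASCII, and that a full page URL with a non-ASCII title can be made ASCII by percent-encoding at the cost of extra length. The long URL below uses a made-up page title, the helper name `to_ascii_url` is hypothetical, and whether qrcode-svg accepts either form is not tested here.

```python
from urllib.parse import quote

def to_ascii_url(url: str) -> str:
    """Percent-encode non-ASCII characters, leaving URL punctuation untouched."""
    return quote(url, safe=":/?#[]@!$&'()*+,;=%")

short_url = "https://bn.wikisource.org/s/94i"     # example given in the thread
long_url = "https://bn.wikisource.org/wiki/পাতা"  # hypothetical page title

encoded = to_ascii_url(long_url)

print(short_url.isascii())          # the short URL needs no encoding
print(long_url.isascii())           # the full title URL does
print(len(short_url), len(encoded)) # compare the text a QR code would have to hold
```

A shorter payload also tends to give a less dense QR code, which is a further reason to prefer the short URL for printed material.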

Voice typing button

Similar to OCR, voice typing works completely fine in Documents. Community members requested that this tool be integrated into Wikisource so that it saves volunteers a lot of time and energy.