Grants talk:IEG/Tools for Armenian Wikisource and beyond



Hi, when you say these tools will be standalone widgets, do you mean they'll be hosted off-wiki on a third-party website? If so, who will host them (Tool Labs?)? Or are they more like user scripts and preference gadgets? Cheers, Ocaasi (talk) 21:38, 2 April 2014 (UTC)

Hi, and thanks for the question. I meant to say "Gadgets", my bad; I will correct that. Half of the tools (AutoHinter, SectionMarker and SectionHarvester) don't need any additional server-side components and will be JavaScript only, so they will be implemented as User Scripts, which local sysops can turn into Preference Gadgets. SectionHarvester could even be a standalone web page.
ZoomProof will need a server-side script that runs just once for every DjVu file and puts metadata on subpage(s); after that, it again follows the User Script/Gadget model only.
Illustration Cropper will need server-side components. Since its source will be available and it will use standard languages and libraries, it can be run on Wikimedia Tool Labs or on other servers/workstations. If it proves useful and interesting to the community, re-implementing it as a MediaWiki extension may be the right thing to do in the future, outside the scope of this project.
LST Guard will be mainly or only a server-side script (quite possibly a PyWikipediaBot script, but that may change) and, again, can be run on WM Tool Labs or on the servers/hosting/workstations the community currently uses to run bots. It is possible to make it with JavaScript only, but it would either be slow or need redundant data kept on Wikisource pages, and unless the script were turned on for all users, with no option to disable it, it would do the job only partially. Thus a server-side monitoring tool is a much more efficient way to implement this. Best, --Xelgen (talk) 12:31, 3 April 2014 (UTC)
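As a rough illustration of the kind of check a server-side monitor like LST Guard could run on each edit, here is a minimal Python sketch. It assumes the old and new wikitext of an edited page are already available, and it only reports labels from Labeled Section Transclusion markers (`<section begin=... />`) that disappeared in the edit; fetching revisions and finding the pages that transclude a label are left out. The function names `section_labels` and `broken_labels` are hypothetical, not part of LST Guard or Pywikibot.

```python
from __future__ import annotations

import re

# Hypothetical helper: matches LST markers such as <section begin="ch1" />,
# <section begin='ch1'/> or <section end=ch1/>. Group 1 is begin/end; the
# name is in group 2 (double-quoted), 3 (single-quoted) or 4 (unquoted).
SECTION_TAG = re.compile(
    r"""<section\s+(begin|end)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s/>]+))\s*/?>""",
    re.IGNORECASE,
)


def section_labels(wikitext: str) -> set[str]:
    """Return the names of all sections that have a begin marker."""
    labels = set()
    for m in SECTION_TAG.finditer(wikitext):
        # Exactly one of the three name groups participates in the match.
        name = next(g for g in m.group(2, 3, 4) if g is not None)
        if m.group(1).lower() == "begin":
            labels.add(name)
    return labels


def broken_labels(old_text: str, new_text: str) -> set[str]:
    """Labels present before the edit but missing after it."""
    return section_labels(old_text) - section_labels(new_text)
```

For example, if an edit renames the label `ch1` to `chapter1`, `broken_labels(old, new)` returns `{"ch1"}`; a bot would then have to update, or at least flag, every page that still transcludes `ch1`.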
Thanks for sharing these details, Xelgen. In order for this project to be eligible for an IEG, we need confirmation that you're planning to develop these tools as gadgets or other stand-alone applications that don't have a hosting dependency or otherwise rely on WMF engineering to integrate or deploy. It looks like you're largely planning to develop community tools within this eligible scope, which is great! For any tool that has a hosting dependency beyond what any normal volunteer can expect from Wikimedia Tool Labs, you'll need to demonstrate that you've got your own sustainable hosting solution, so please feel free to update your proposal if any of the tools in your plan should be considered out of scope for this 6-month project. Best wishes, Siko (WMF) (talk) 01:06, 8 April 2014 (UTC)
Yes, Siko, to the best of my understanding of the "no hosting dependency" requirement, all the tools can and will meet it. They can be run on without requiring any special, non-standard involvement from WMF staff, and anyone interested can host them on other hosting platforms, or even on their own machines if they choose to, for whatever reason. For Illustration Cropper, good real-world examples are CropBot and CropTool, intended for Commons, which have a similar architecture and run on And for LST Guard, well, many bots run on the wmflabs server, or even on Raspberry Pis. --Xelgen (talk) 00:58, 9 April 2014 (UTC)

I'd like to point out that Xelgen's use of "server-side" above may be confusing, and to call out the distinction for any less technical readers: the items Xelgen identified as needing to be "server-side" are "server-side" in the sense that they depend on code that would not be running in the Wikisource user's browser (that would be "client-side").

However, the "server-side" code we do not allow IEGs to require is actually to be understood as "code running on Wikimedia's production servers, e.g. as core extensions deployed on wikis".

But the code needed for Xelgen's ideas, while not running in the user's browser, does not need to run on the Wikimedia core cluster either. It can be termed perhaps "bot-side" or "third party", in that it can, and should, run on a server that's neither the user's device nor the Wikimedia core cluster; e.g. it can (and probably should) run on the Tool Labs servers. Asaf Bartov (WMF Grants) talk 17:37, 15 April 2014 (UTC)

Eligibility confirmed, round 1 2014

This Individual Engagement Grant proposal is under review!

We've confirmed your proposal is eligible for round 1 2014 review. Please feel free to ask questions here on the talk page and make changes to your proposal as discussions continue during this community comments period.

The committee's formal review for round 1 2014 begins on 21 April 2014, and grants will be announced in May. See the schedule for more details.

Questions? Contact us.

Eligibility is based on the assumption that all tools to be considered in-scope for this project are gadgets or can otherwise be completed without a WMF engineering dependency. Cheers! Siko (WMF) (talk) 23:31, 8 April 2014 (UTC)

Further expansion of the tool

Hello, and thanks for the proposal. I have no major comments or questions about it; I just wanted to clarify one point: what are the language dependencies of these tools? I want to understand how much effort it would require if editors of other projects wanted to adapt these tools for their projects. rubin16 (talk) 17:13, 13 April 2014 (UTC)

Hi, and sorry for the late reply. The short answer is that it will depend on the tool and the specifics of the language, but our goal is to ensure they can all be used in the majority of Indo-European languages after a few hours of configuration and coding by a native speaker with some light coding experience.
I mention Indo-European languages because it's hard for me to comment on the structures of other language groups, but I really hope the tools can be used in all languages that have a Wikisource, after a reasonable amount of time spent reconfiguring them. For example, we'll do our best to consider the specifics of RTL languages during development as well, especially now that we've seen interest from around the globe.
To elaborate a bit more, let me group the tools by how language-dependent they are.
  • Light or no language dependency:
    • Illustration Cropper - almost no language dependency; just some localization of the UI and category names should be enough.
    • Section Harvester - light language dependency, needed only for validating results. The core functionality will work for any language right out of the box.
  • Average language dependency:
    • SectionMarker - while the essential functionality is not language-dependent, the code needs to be aware of language specifics to validate results, prevent duplicates, etc. For example, there are more than two ways to capitalize a word in Armenian, while Georgian doesn't have the concept of uppercase/lowercase letters at all. Some languages may require some coding, while others won't need any; a very rough estimate is that half won't need any and half might.
    • LST Guard - the same goes for this tool, but since its changes are made automatically, we have less room for mistakes.
  • Significant language dependency:
    • AutoHinter - is all about language-specific and OCR-specific mistakes. On the other hand, it uses lists of simple RegEx rules, so it should be very easy for native speakers, or others with their help, to implement it for any language.
    • ZoomProof - this is the trickiest one to comment on at the moment. We have to map the original DjVu text to the text on Wikisource pages, which has been proofread and reformatted and contains wiki syntax, templates, etc. The code needs to be aware of such modifications to pinpoint and highlight a word precisely. We have a few approaches to this problem in mind, and we'll need to test them on real examples to see which method gives the best results. We'll try to avoid hardcoding language-specific things, but at this point I'm not sure we'll be able to avoid that completely. Thus some coding may be required to achieve good results for other languages, but even then at least 90% of the code should be reusable.
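The AutoHinter description above boils down to matching a list of regular expressions against the page text and showing a hint for each match. A minimal sketch of that idea follows, written in Python for brevity even though the gadget itself would run as JavaScript. The three rules, the `Hint` type and the `hints()` function are hypothetical examples, not AutoHinter's actual rules; real rule lists would come from native speakers familiar with the OCR errors of their language.

```python
from __future__ import annotations

import re
from typing import NamedTuple


class Hint(NamedTuple):
    start: int    # character offset where the suspicious text begins
    end: int      # offset just past it
    text: str     # the matched text
    message: str  # hint shown to the proofreader


# Hypothetical example rules: (compiled pattern, hint message).
RULES = [
    # A word mixing Armenian letters (roughly U+0531-U+0587) with Latin ones,
    # e.g. a Latin "o" recognised instead of the Armenian letter "o".
    (re.compile(r"\b(?=\w*[\u0531-\u0587])(?=\w*[A-Za-z])\w+\b"),
     "word mixes Armenian and Latin letters"),
    # ASCII colon right after a word: in Armenian text the full stop
    # U+0589, which looks like a colon, may be what was meant.
    (re.compile(r"(?<=\w):"),
     "ASCII ':' where the Armenian full stop U+0589 may be intended"),
    # The same word twice in a row, a common OCR/proofreading slip.
    (re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE), "repeated word"),
]


def hints(text: str) -> list[Hint]:
    """Apply every rule to the text and return the hints ordered by position."""
    found = []
    for pattern, message in RULES:
        for m in pattern.finditer(text):
            found.append(Hint(m.start(), m.end(), m.group(0), message))
    return sorted(found)
```

With this design, adapting the tool to another language would mostly mean replacing `RULES`; for example, `hints("the the end:")` flags both the repeated "the the" and the ASCII colon.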
So again, language dependency can't be avoided completely, but we'll plan and code with internationalization in mind, comment the code sufficiently and provide technical documentation in English, to make things as easy as possible for anyone interested in implementing the tools on other language projects. --Xelgen (talk) 00:49, 22 April 2014 (UTC)

Make it work for everyone

Thanks, Xelgen, for this interesting proposal!

I'd like to encourage you to commit to making these tools, if this gets funded, [at least more likely to] work for everyone across our different communities. This means adhering to community standards and looking for integration opportunities. For example:

  • All user interface strings should be put up for translation via the excellent community service and process in
  • Documentation on how to use the features, including clear instructions for possibly-less-technical sysops of other Wikisource editions on how to expose the gadgets as preferences to their users, should be provided on in English, as a basis for translation.
  • Existing tools should be reviewed and integrated with (or into) if possible, and community expertise should be leveraged. Famous Wikisource hero Tpt would no doubt be a valuable ally for you, and it would be great to see an affirmation that you intend to benefit from his expertise.

Thanks for considering these suggestions. Asaf Bartov (WMF Grants) talk 17:45, 15 April 2014 (UTC)

Thanks for the suggestions and encouragement, Asaf.
  • I'll recheck with the i18n team, but as far as I know, we don't yet have a straightforward way to localise User Scripts/Gadgets in For server-side parts, we'll use
  • Absolutely, it will be done that way. Integrating a tool as a Gadget is easy and can be described in a few lines. Modifications may require more (linguistic and technical) skills for some of the tools; we'll try to minimize and ease that as much as possible.
  • I do affirm. I have also contacted Tpt since then. We are both interested in working toward integration with existing extensions in the best possible way. The links to other tools were also quite helpful. --Xelgen (talk) 03:19, 22 April 2014 (UTC)

Comments from the Wikisource community

On the wikisource-l mailing list this project was regarded very positively. Aubrey recommends taking a look at Alex Brollo's Crop Image tool; it has some problems when exporting as EPUB, but other than that it works well and can be improved. Tpt recommended building ZoomProof as part of Extension:ProofreadPage, perhaps as a beta feature. I would also recommend taking a look at Phe's tool "hocr.js", which had similar functionality, and at his other tools. I fully endorse this project, since it will simplify many complicated or time-consuming tasks in the Wikisource workflow.--Micru (talk) 14:07, 18 April 2014 (UTC)

Comments from the wub

This looks like a great project, and the enthusiastic response from Wikisource users is very encouraging. It's clear that Xelgen has a lot of experience in developing similar tools.

I do wonder if working on six different tools is a bit too fragmented. Perhaps it would make sense to drop one or two of them, or have them as "stretch goals" to only work on if the others get done faster than expected. This would allow more focus on the remaining tools, especially on making them available and useful to other language Wikisources, which is the real chance to increase the impact of this project. the wub "?!" 22:19, 19 April 2014 (UTC)

I see your point and agree that it is somewhat fragmented, indeed. But note that two of the six tools are already working (though for a specific case in one language) and simply need to be improved to become more universal and reusable on other language Wikisource projects. The three labeled-section tools (Section Marker, Section Harvester and LST Guard) are all necessary for hassle-free marking, getting an index and maintaining it; without any one of them, the tool set would still be incomplete. And ZoomProof and Illustration Cropper are the ones the community was most interested in. I also certainly agree that if these tools only work on Armenian Wikisource, the impact will be too low. That's why I expect to see them on Wikisources in other languages; Armenian is more of a starting point, something we can focus on and realistically expect to accomplish completely, but not something we will be satisfied with. Actually, in the beginning we had the idea to make it global, but it was suggested that we focus on one project to keep our focus (while still keeping the global vision). I also updated the project metrics a few hours ago and added a "stretch goal" of having every tool implemented in at least one other Wikisource project by the end of the project.--Xelgen (talk) 01:56, 22 April 2014 (UTC)


Thanks, I like the idea. How does this scale to other languages? Gryllida 22:37, 19 April 2014 (UTC)

Hi, and thanks for the comments. While some tools will only require translating about two dozen strings, others may need some additional coding to handle language specifics and give good results. The amount of work depends heavily on the specifics of the language: some languages have simple grammar rules, and you'll be done in five minutes; others have complicated rules with lots of exceptions, and may be challenging. Besides Armenian, I speak English and Russian, and I see no major obstacles for those two languages. In this project we focus on Armenian, but our "unofficial" goal is to see these tools used in other language versions of Wikisource, and the more, the better. For more details, please see my answer above. --Xelgen (talk) 01:11, 22 April 2014 (UTC)

Aggregated feedback from the committee for Tools for Armenian Wikisource and beyond

Scoring criteria (see the rubric for background) Score
1=weak alignment 10=strong alignment
(A) Impact potential
  • Does it fit with Wikimedia's strategic priorities?
  • Does it have potential for online impact?
  • Can it be sustained, scaled, or adapted elsewhere after the grant ends?
(B) Innovation and learning
  • Does it take an innovative approach to solving a key problem?
  • Is the potential impact greater than the risks?
  • Can we measure success?
(C) Ability to execute
  • Can the scope be accomplished in 6 months?
  • How realistic/efficient is the budget?
  • Do the participants have the necessary skills/experience?
(D) Community engagement
  • Does it have a specific target community and plan to engage it often?
  • Does it have community support?
  • Does it support diversity?
Comments from the committee:
  • The tools being developed in this project would be of great benefit to those involved in transcription, indexing and OCR of paper-based sources. Access to the information in these sources could improve many articles and make that information indexable for scholars and others around the world.
  • Scalable, has potential, will help to increase quality and quantity in Wikimedia projects
  • The plan ensures that the code will be reusable by other language editions, with adequate licensing and good documentation.
  • The measures of success are reasonable and specific. The approach is clear and aims to directly address identified problems.
  • The budget is reasonable given that several tools are being developed.
  • The participants appear to have appropriate experience to complete the project.
  • Solid community support and local involvement. Has been endorsed by a variety of editors who appear ready to use the tools right away. It appears that the software can be adapted to a number of different languages. We see high expectations and support from the community.

Thank you for submitting this proposal. The committee is now deliberating based on these scoring results, and WMF is proceeding with its due diligence. You are welcome to continue making updates to your proposal pages during this period. Funding decisions will be announced by the end of May. — ΛΧΣ21 23:56, 12 May 2014 (UTC)

Round 1 2014 Decision


Congratulations! Your proposal has been selected for an Individual Engagement Grant.

The committee has recommended this proposal and WMF has approved funding for the full amount of your request, $7,600.

Comments regarding this decision:
We look forward to seeing these tools used by Wikisourcers in Armenia and beyond, and we hope to see lasting impact for the global Wikisource community as a result.

Next steps:

  1. You will be contacted to sign a grant agreement and set up a monthly check-in schedule.
  2. Review the information for grantees.
  3. Use the new buttons on your original proposal to create your project pages.
  4. Start work on your project!

Questions? Contact us.
