Grants:Project/Rapid/-jem-/Table import

status: funded
Table import
Completing/improving a bot module and creating a web tool for managing the import and wiki formatting of tables from external web pages into content pages, with configurable settings for each external site or table.
target: es.wikipedia initially; others in the future
start date: August 18
end date: October 20
budget (local currency): 950 EUR
budget (USD): 1066.12 USD
grant type: Individual
grantee: -jem-
contact(s): joseemiliomori(_AT_)wikimedia.es



Project Goal

Briefly explain what you are trying to accomplish with this project, or what you expect will change as a result of this grant. Example goals include, "recruit new editors", "add high quality content", or "train existing editors on a specific skill".

Through my day-to-day use of the Spanish Wikipedia, and knowing the priority the Wikimedia movement gives to reducing bias and the lack of diversity in the content of the projects, I noticed that articles about minority sports leagues need work, and I thought of a technical aid that could make this work easier in many cases. I therefore programmed a test module for my bot that imports the classification/ranking tables published on the official websites of those leagues and converts them into wikitext. These tables are probably the most important content in such articles, and the module lets them be incorporated easily and then kept up to date. Editors often do not perform that task, which greatly reduces the usefulness of those articles.
I manually set up the configuration and parameters for six test articles in es.wikipedia: es:Liga Femenina de Baloncesto 2018-19, es:División de Honor de balonmano femenino 2018-19, es:Liga ASOBAL 2018-19, es:Anexo:Primera División de fútbol sala 2018-19, es:Primera División Femenina de España 2018-19 and es:División de Honor de Rugby 2018-2019. My bot Jembot has been updating those classifications during the last few months with basic functionality and hand-adjusted parameters. After positive feedback, I think this idea could become really useful if expanded with more efficient, configurable code and a web interface open to more users, so that articles of all kinds (not just sports articles) that include tables can be updated, and the style, order, links, etc. of the tables can be centrally controlled and customized.
So, the final goal is to improve many content pages (first and foremost Wikipedia articles in any language) by completing them with external tables and keeping those tables up to date, especially when editors won't do that work (because of frequent changes, a large number of rows or columns, complex data, etc.), and to let editors spend their time on more productive tasks.
The code will be written so that it can work on projects other than es.wikipedia with very little extra work.
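The conversion at the heart of the test module described above (an HTML standings table on a league website turned into a wikitext table) can be illustrated with a minimal sketch. It is not Jembot's actual code: it uses only Python's standard `html.parser`, assumes the source table contains no nested tables and closes its `<tr>`, `<th>` and `<td>` tags explicitly, and treats the first row as the header. The names `TableExtractor` and `to_wikitable` are hypothetical.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the rows of every <table> in an HTML document as lists of cell texts."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one entry per <table>; each table is a list of rows
        self._row = None   # cells of the row currently being parsed
        self._cell = None  # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace into single spaces.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def to_wikitable(rows, caption=None):
    """Render a non-empty list of rows as a MediaWiki table; the first row is the header.

    Cell text containing "|" would need escaping; that is omitted in this sketch.
    """
    lines = ['{| class="wikitable"']
    if caption:
        lines.append("|+ " + caption)
    header, *body = rows
    lines.append("! " + " !! ".join(header))
    for row in body:
        lines.append("|-")
        lines.append("| " + " || ".join(row))
    lines.append("|}")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = (
        "<table><tr><th>Pos</th><th>Team</th><th>Pts</th></tr>"
        "<tr><td>1</td><td>Team A</td><td>30</td></tr>"
        "<tr><td>2</td><td>Team B</td><td>27</td></tr></table>"
    )
    parser = TableExtractor()
    parser.feed(sample)
    print(to_wikitable(parser.tables[0], caption="Standings"))
```

A real source page may contain several tables, so in practice a site-specific HTML pattern would be needed to pick the right one before conversion.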

Project Plan

Activities

Tell us how you'll carry out your project. What will you and other organizers spend your time doing?

The project will be split into two complementary parts:
1. Web tool. It will be available on the Toolforge servers under my tool Jembot, and it will include:
  • List of work pages with the web source of their table(s), allowing users to add more pages and sources and to group several articles under a name.
  • Configuration of wiki-side options, with global, group-level and page-level settings: wikitext pattern to identify the location of the table(s), font size, row background colors, column order and format, table caption contents and format, possible additional columns generated by extracting, joining or calculating data (added after positive feedback), etc.
  • Configuration of source-side options, with global and site-level settings: HTML pattern to identify the location of the table(s); headings and/or columns to be excluded, corrected/translated/rewritten (when their contents are known and limited to a certain set) or parsed to include wikilinks (with integrated creation of redirects when the source doesn't use the Wikipedia/Wikimedia title for the team/entity/object), etc.
  • Scheduling of how often and at what times tables/pages will be updated, when needed.
  • Wikitext preview of the imported tables.
  • Communication with the bot module and the server job list to pass the configuration and scheduling data.
  • Statistics and useful information.
2. Bot module. It will be integrated with the functions and modules I have already programmed for the Wikimedia projects, in use under the Jembot account. The improvements will include:
  • Use of the configuration files produced by the web interface, applying all the new settings defined in them.
  • (Possibly) use of Wikidata descriptions or properties when translating/rewriting/link-parsing table columns from the source to the wikitext.
  • Support for POST parameters needed to access the table data in some cases.
  • Optimization.
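The global, group-level and page-level settings of the web tool imply a layering rule: more specific settings override more general ones. A minimal sketch of that rule, assuming the settings are stored as JSON-like dictionaries (as the configuration files passed to the bot module might be); the function name `merge_settings` and the setting names `font_size` and `row_colors` are hypothetical.

```python
import json
from copy import deepcopy


def merge_settings(*layers):
    """Merge setting dictionaries ordered from general to specific.

    Later layers override earlier ones; nested dictionaries are merged key by key
    instead of being replaced as a whole. The input dictionaries are not modified.
    """
    result = {}
    for layer in layers:
        for key, value in layer.items():
            if isinstance(value, dict) and isinstance(result.get(key), dict):
                result[key] = merge_settings(result[key], value)
            else:
                result[key] = deepcopy(value)
    return result


# Hypothetical configuration layers, e.g. as read from JSON files.
global_cfg = json.loads('{"font_size": "90%", "row_colors": {"1": "#ccffcc"}}')
group_cfg = json.loads('{"row_colors": {"14": "#ffcccc"}}')
page_cfg = json.loads('{"font_size": "85%"}')

if __name__ == "__main__":
    print(merge_settings(global_cfg, group_cfg, page_cfg))
    # → {'font_size': '85%', 'row_colors': {'1': '#ccffcc', '14': '#ffcccc'}}
```

Merging nested dictionaries key by key means a group can add one row color without having to repeat all the global ones.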
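The "wikitext pattern to identify the table(s) location" can be pictured as a pair of markers around the table in the article source, so that each run replaces only what lies between them and leaves the rest of the article untouched. A minimal sketch with Python's `re` module, assuming hypothetical HTML-comment markers `<!-- TI:START -->` and `<!-- TI:END -->` (the real pattern would come from each page's configuration); `replace_table` is a hypothetical name.

```python
import re


def replace_table(page_text, new_table, start="<!-- TI:START -->", end="<!-- TI:END -->"):
    """Replace the first block between the start and end markers with new_table.

    Returns (new_text, found); found is False when the markers are missing,
    in which case the text is returned unchanged.
    """
    # Non-greedy match so that only the first marked block is affected.
    pattern = re.compile(re.escape(start) + r".*?" + re.escape(end), re.DOTALL)
    # A function replacement keeps backslashes in the table from being
    # interpreted as regex group references.
    new_text, count = pattern.subn(lambda m: f"{start}\n{new_table}\n{end}", page_text, count=1)
    return new_text, count == 1


if __name__ == "__main__":
    article = "Intro\n<!-- TI:START -->\nold table\n<!-- TI:END -->\nRest of article"
    text, found = replace_table(article, '{| class="wikitable"\n! Pos !! Team\n|}')
    print(found)
    print(text)
```

Returning a flag lets the bot skip the edit when the markers are missing instead of inserting a table in the wrong place; a bot would also normally skip saving when the new text equals the old one.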
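The source-side options (excluding columns, rewriting or translating known values, turning names into wikilinks) can be applied as a single pass over the extracted rows. A minimal sketch, assuming the table is already a list of rows with the header first; `process_rows` and its parameter names are hypothetical. One deliberate simplification: where the proposal creates a redirect when the source does not use the Wikipedia title, this sketch writes a piped link `[[Title|Source name]]` instead.

```python
def process_rows(rows, exclude=(), rewrites=None, links=None, link_column=None):
    """Apply source-side options to rows (header first) and return new rows.

    exclude:     source header names of the columns to drop
    rewrites:    {source text: replacement}, applied to header and body cells
    links:       {source name: article title} for cells of link_column
    link_column: source header name of the column whose cells become wikilinks
    """
    rewrites = rewrites or {}
    links = links or {}
    header = rows[0]
    keep = [i for i, name in enumerate(header) if name not in exclude]
    link_index = header.index(link_column) if link_column in header else None
    out = [[rewrites.get(header[i], header[i]) for i in keep]]
    for row in rows[1:]:
        new_row = []
        for i in keep:
            cell = rewrites.get(row[i], row[i])
            if i == link_index:
                title = links.get(cell, cell)
                # Plain link when the names match, piped link otherwise.
                cell = f"[[{cell}]]" if title == cell else f"[[{title}|{cell}]]"
            new_row.append(cell)
        out.append(new_row)
    return out


if __name__ == "__main__":
    source = [
        ["Pos", "Team", "Pts", "Web"],
        ["1", "CB Example", "30", "example.org"],
        ["2", "Club Y", "27", "y.example"],
    ]
    for row in process_rows(
        source,
        exclude={"Web"},
        rewrites={"Team": "Equipo"},
        links={"CB Example": "Club Baloncesto Example"},
        link_column="Team",
    ):
        print(row)
```

Because `link_column` and `exclude` refer to the source headers, the options stay valid even when the headings themselves are translated by `rewrites`.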
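The scheduling of updates could follow many models. One simple possibility, assumed here only for illustration, is a minimum interval plus a set of allowed weekdays (for example, updating standings only on Sundays and Mondays). A sketch using Python's `datetime`; `next_update` and `is_due` are hypothetical names, and the weekday numbering is that of `datetime.weekday()` (Monday = 0, Sunday = 6).

```python
from datetime import datetime, timedelta


def next_update(last_update, interval_hours, allowed_weekdays=None):
    """Earliest time at or after last_update + interval that falls on an allowed weekday.

    allowed_weekdays: None for any day, otherwise a non-empty subset of 0..6.
    """
    candidate = last_update + timedelta(hours=interval_hours)
    if allowed_weekdays is None:
        return candidate
    days = set(allowed_weekdays)
    if not days or not days <= set(range(7)):
        raise ValueError("allowed_weekdays must be a non-empty subset of 0..6")
    while candidate.weekday() not in days:
        # Move to midnight of the following day and test again.
        candidate = (candidate + timedelta(days=1)).replace(
            hour=0, minute=0, second=0, microsecond=0
        )
    return candidate


def is_due(last_update, interval_hours, now, allowed_weekdays=None):
    """True when the table should be refreshed at time `now`."""
    return now >= next_update(last_update, interval_hours, allowed_weekdays)


if __name__ == "__main__":
    last = datetime(2019, 8, 23, 12, 0)  # a Friday
    # 24 h later is Saturday, which is not allowed, so the next slot is Sunday 00:00.
    print(next_update(last, 24, allowed_weekdays={6, 0}))
```

Validating the weekday set up front avoids an endless loop when the configuration contains no usable day.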
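For sources that only return the table after a form submission, the bot has to send POST parameters. A minimal sketch with Python's standard `urllib`: passing `data` to `urllib.request.Request` turns the request into a POST, while omitting it leaves a GET. The function names, the example parameters and the User-Agent string are placeholders, not Jembot's real values.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_table_request(url, post_params=None, user_agent="table-import-sketch/0.1"):
    """Build a request for a source page; form parameters, if given, are sent by POST."""
    data = urlencode(post_params).encode("utf-8") if post_params else None
    return Request(url, data=data, headers={"User-Agent": user_agent})


def fetch_html(url, post_params=None, timeout=30):
    """Download the source page, decoding it as UTF-8 when no charset is declared."""
    with urlopen(build_table_request(url, post_params), timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

For example, `fetch_html("https://example.org/standings", {"season": "2018-19"})` would POST `season=2018-19` to that hypothetical URL and return the page HTML, ready for table extraction.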

How will you let others in your community know about your project (please provide links to where relevant communities have been notified of your proposal, and to any other relevant community discussions)? Why are you targeting a specific audience?

The project, together with other ideas presented for discussion and evaluation in past and future grant proposals, has been announced in several Wikimedia projects:
All announcements point to this page in es.wikipedia for the discussion of all my ideas, or to this page in meta for the inclusion of other Wikimedia projects.
I have specifically targeted other Spanish-language projects and the Wikipedias in the other languages spoken in Spain, because they are nearby communities with which I can communicate easily and which probably already know me or my work. Nevertheless, the idea can be extended to any other project that has, or may have, tables in its content that can be updated/imported from external sources, and I don't exclude anyone; when it came to making the announcements, however, time and practical limitations made me choose only those.

What will you have done at the end of your project? How will you follow-up with people that are involved with your project?

After programming and testing, the web tool will be available and operational at https://tools.wmflabs.org/jembot/ti, and users will be able to manage the currently active table imports, group them, define new ones and configure all their options. For every table, the tool will provide a preview of the resulting wikitext with the current settings, and all operations will be logged. In the projects, the Jembot account will update the pages with the requested tables according to their configuration.
The project will remain open indefinitely to further feedback and improvements through my volunteer time, and anyone interested in getting involved will be welcome to do so.

Impact

How will you know if the project is successful and you've met your goals? Please include the following targets and feel free to add more specific to your project:

  1. Total number of participants: N/A (the tool will be used anonymously, but this could change in the future)
  2. Number of articles improved: at least 20 for the test/startup phase in es.wikipedia (potentially, hundreds of articles or annexes could benefit from future use, not counting other projects)
  3. Number of media uploaded to Wikimedia Commons: N/A
  4. Number of media used on Wikimedia projects: N/A

Resources

What resources do you have? Include information on who is organizing the project, what they will do, and if you will receive support from anywhere else (in-kind donations or additional funding).

The programming and testing will be carried out entirely by me, with feedback from any interested Wikimedians, of course. The code will be published, so other contributions can come in the form of direct code improvements in the future. There are no additional donations or funding.

What resources do you need? For your funding request, list bullet points for each expense:

  • Preliminary analysis and design work (2 h)
  • Addition of web tool code for managing pages, sources, options and scheduling (24 h)
  • Improvement and optimization of bot code for user-defined configuration and other details (5 h)
  • Selection and testing of articles and sites to ensure proper operation (5 h)
  • Documentation and integration in my framework for logging, source code publishing, etc. (2 h)
Total: 38 hours, at a rate of 25 euros/hour (as in my previous grant request): 950 euros
This will be supplemented by my volunteer time for complementary tasks and future maintenance.

Endorsements