Grants:Programs/Wikimedia Community Fund/Rapid Fund/QA tools to improve the quality, reliability, and consistency of Wiktionary (ID: 22282613)/Final Report

QA tools to improve the quality, reliability, and consistency of Wiktionary
Rapid Fund Final Report

Report Status: Accepted

Due date: 2024-04-30T00:00:00Z

Funding program: Rapid Fund

Report type: Final

Application

This is an automatically generated Meta-Wiki page. The page was copied from Fluxx, the grantmaking web service of Wikimedia Foundation where the user has submitted their midpoint report. Please do not make any changes to this page because all changes will be removed after the next update. Use the discussion page for your feedback. The page was created by CR-FluxxBot.

General information

  • Applicant username: Tbm
  • Organization name: N/A
  • Amount awarded: 5000
  • Amount spent: 5000 USD, 5000

Part 1: Project and impact

1. Describe the implemented activities and results achieved. Additionally, share which approaches were most effective in supporting you to achieve the results. (required)

1) Created a Python module to interact with and modify entries from the English Wiktionary. This is a set of building blocks that can be refined further for other work in the future. (I've already used this module to add hundreds of Wikipedia links, thumbnails and categories to Swahili entries on English Wiktionary; this is not counted in the metrics as it was outside the scope of this project and done in my spare time. This shows that these building blocks will make other work easier.)

2) Created a set of Python functions to extract hyphenation information from Wiktionary and to deal with different hyphenation rules for a number of languages (such as German and Hungarian). I then created a list of words where the hyphenation pattern does not match the word, and invited editors from those languages to fix the identified issues. (One complication is that English Wiktionary employs different ways to express hyphenation information. There is a main "hyphenation" template, but several languages have their own, such as "fi-p" for Finnish and "es-pr" for Spanish.)

3) Created Python code to represent basic information about Yiddish words (such as noun genders and plurals) from English Wiktionary and Swedish Wiktionary. This was the basis for scripts to compare information between the two in order to find discrepancies. (The templating systems employed by English Wiktionary and Swedish Wiktionary are completely different, requiring different code to make sense of the data.)

4) Implemented a number of consistency checks for Yiddish entries on Swedish Wiktionary, for example to make sure the headword listed in the entry matches the entry and to ensure gender information expressed in different ways matches. (Some information is duplicated in various places, which is a potential source of errors.)

5) Created a Python module to interact with data from a Swedish-Yiddish dictionary published by the Swedish Institute for Language and Folklore (ISOF). This dictionary is available in computer-readable format (JSON) and distributed under the CC0 license, which makes it suitable for Wiktionary. Additionally, I created scripts to compare this information with Yiddish entries from the English Wiktionary and Swedish Wiktionary. (The same needs to be done for other Wiktionary projects that have Yiddish entries; in my view, this makes a good argument that common data about words needs to be migrated to a shared database, for example based on Wikidata.)
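
The hyphenation check described in item 2) can be illustrated with a minimal Python sketch. This is not the code developed under the grant: the function names and the Hungarian rule table are assumptions chosen for illustration. The idea is that the syllables of a hyphenation pattern, joined together, should reproduce the entry name, except where a language's orthography changes the spelling at a break (Hungarian, for instance, writes a doubled digraph in full when it is split, so "asszony" is hyphenated "asz-szony").

```python
import unicodedata

# Illustrative rule table (an assumption, not the project's actual rules):
# a Hungarian doubled digraph is written in full when split by a hyphen.
HU_DOUBLED_DIGRAPHS = {
    "cs-cs": "ccs",
    "dz-dz": "ddz",
    "gy-gy": "ggy",
    "ly-ly": "lly",
    "ny-ny": "nny",
    "sz-sz": "ssz",
    "ty-ty": "tty",
    "zs-zs": "zzs",
}


def _normalize(text):
    """Normalize Unicode composition and case before comparing."""
    return unicodedata.normalize("NFC", text).casefold()


def candidate_spellings(syllables, lang):
    """Return the spellings that a list of syllables may stand for."""
    marked = "-".join(syllables)
    candidates = {marked.replace("-", "")}  # plain concatenation
    if lang == "hu":
        merged = marked
        for split_form, full_form in HU_DOUBLED_DIGRAPHS.items():
            merged = merged.replace(split_form, full_form)
        candidates.add(merged.replace("-", ""))
    return {_normalize(candidate) for candidate in candidates}


def hyphenation_matches(word, syllables, lang="en"):
    """True if the syllables reproduce the word under the rules for lang."""
    return _normalize(word) in candidate_spellings(syllables, lang)


def find_mismatches(entries, lang):
    """Return the words whose hyphenation pattern does not match them."""
    return [
        word
        for word, syllables in entries
        if not hyphenation_matches(word, syllables, lang)
    ]
```

A pattern is accepted if either the plain join or the rule-adjusted join matches, so a compound whose parts genuinely meet at "sz" + "sz" is not flagged by mistake. The list returned by `find_mismatches` corresponds to the kind of report that editors were invited to review.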

2. Documentation of your impact. Please use space below to share links that help tell your story, impact, and evaluation. (required)

Share links to:

  • Project page on Meta-Wiki or any other Wikimedia project
  • Dashboards and tools that you used to track contributions
  • Some photos or videos from your event. Remember to share access.

You can also share links to:

  • Important social media posts
  • Surveys and their results
  • Infographics and sound files
  • Examples of content edited on Wikimedia projects

1) The impact of the Python module for modifying Wiktionary entries is that it makes it easier to create other QA tools to improve Wiktionary. I've already used it for some personal projects (leading to several hundreds of edits) and intend to implement more cleanup tools.

2) The hyphenation QA tools have identified a number of problems in different languages, some of which have been fixed by editors of those languages already. Additionally, during my QA work on hyphenation I noticed a number of ambiguities and mistakes in the use of Wiktionary's hyphenation template. I started a conversation on Wiktionary's Beer Parlour (discussion board) on how to address these. Hopefully, the outcome will be clearer documentation and policies. Finally, these tools can be run periodically to identify new issues.

3) I have generated a list of discrepancies in Yiddish words from English Wiktionary and Swedish Wiktionary. This is a great start but the final impact remains to be seen because editors need to manually review the list to fix issues. Yiddish in particular lacks a large number of editors.

4) The consistency checks for Yiddish on Swedish Wiktionary have already led to a number of fixes from the main editor.

5) I have used the Yiddish information from the ISOF dictionary and English Wiktionary to add some missing information (in particular noun genders) on Swedish Wiktionary. The potential impact of this work is huge as the ISOF dictionary contains many Yiddish words currently missing from Wiktionary (English Wiktionary, Swedish Wiktionary and probably others). However, editors who know Yiddish are needed to incorporate the information. The Python code can support this work (e.g. we can create a list of words from ISOF missing on Wiktionary).
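
The comparison described in item 5) can be sketched as follows. This is a minimal illustration under stated assumptions, not the grant's actual code: the JSON field names `word` and `gender` are hypothetical (the real ISOF schema may differ), the Wiktionary side is assumed to have already been reduced to a mapping from headword to gender (or `None` when no gender is recorded), and the sample data in the usage section is made up.

```python
import json


def load_isof_entries(json_text):
    """Parse dictionary JSON into {headword: gender}.

    The field names "word" and "gender" are hypothetical placeholders.
    """
    return {item["word"]: item.get("gender") for item in json.loads(json_text)}


def compare_sources(isof, wiktionary):
    """Compare two {headword: gender} mappings.

    Returns three sorted lists:
      missing     - headwords in ISOF that have no Wiktionary entry
      gender_gaps - entries where only ISOF records a gender
      conflicts   - entries where both record a gender and they differ
    """
    missing, gender_gaps, conflicts = [], [], []
    for word, isof_gender in isof.items():
        if word not in wiktionary:
            missing.append(word)
            continue
        wikt_gender = wiktionary[word]
        if isof_gender and wikt_gender is None:
            gender_gaps.append(word)
        elif isof_gender and wikt_gender and isof_gender != wikt_gender:
            conflicts.append(word)
    return sorted(missing), sorted(gender_gaps), sorted(conflicts)


if __name__ == "__main__":
    # Made-up sample data; transliterations stand in for Yiddish-script headwords.
    sample_json = (
        '[{"word": "bukh", "gender": "n"},'
        ' {"word": "tish", "gender": "m"},'
        ' {"word": "feder", "gender": "f"}]'
    )
    isof = load_isof_entries(sample_json)
    wiktionary = {"bukh": "n", "tish": None}
    missing, gender_gaps, conflicts = compare_sources(isof, wiktionary)
    print("Missing on Wiktionary:", missing)
    print("Gender missing on Wiktionary:", gender_gaps)
    print("Conflicting genders:", conflicts)
```

Keeping the three lists separate reflects the different follow-up each needs: missing words and conflicts require review by an editor who knows Yiddish, while gender gaps are candidates for semi-automated additions.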

Links

Hyphenation discussion (Wiktionary's Beer Parlour) https://en.wiktionary.org/wiki/Wiktionary:Beer_parlour/2024/April#Ambiguity_with_usage_of_Template:Hyph

Yiddish on Swedish Wiktionary

  • https://sv.wiktionary.org/wiki/Anv%C3%A4ndardiskussion:Frodlekis#Adding_Yiddish_plurals_based_on_English_Wiktionary
  • https://sv.wiktionary.org/wiki/Anv%C3%A4ndardiskussion:Frodlekis#Yiddish_headword_not_matching_page_name
  • https://sv.wiktionary.org/wiki/Anv%C3%A4ndardiskussion:Frodlekis#Yiddish_entry_name_differences_to_enwikt
  • https://sv.wiktionary.org/wiki/Anv%C3%A4ndardiskussion:Frodlekis#Yiddish:_comparing_gender_info_and_subst_gender_info

Detailed metrics https://en.wiktionary.org/wiki/User:Tbm/Reports/QA_tools_to_improve_the_quality,_reliability,_and_consistency_of_Wiktionary#Metrics

Additionally, share the materials and resources that you used in the implementation of your project. (required)

For example:

  • Training materials and guides
  • Presentations and slides
  • Work processes and plans
  • Any other materials your team has created or adapted and can be shared with others

The Python source code developed as part of this grant is fully open source (available under the GPLv3+ license) and available online on GitHub.

3. To what extent do you agree with the following statements regarding the work carried out with this Rapid Fund? You can choose “not applicable” if your work does not relate to these goals. Required. Select one option per question. (required)

Our efforts during the Fund period have helped to...
A. Bring in participants from underrepresented groups Agree
B. Create a more inclusive and connected culture in our community Agree
C. Develop content about underrepresented topics/groups Agree
D. Develop content from underrepresented perspectives Agree
E. Encourage the retention of editors Agree
F. Encourage the retention of organizers Not applicable to your fund
G. Increased participants' feelings of belonging and connection to the movement Agree
H. Other (optional)

Part 2: Learning

4. In your application, you outlined some learning questions. What did you learn from these learning questions when you implemented your project? How do you hope to use these learnings in the future? You can recall these learning questions below. (required)

You can recall these learning questions below: While there are a number of QA tools for Wiktionary, a lot of work is needed in this area. I'm curious if the creation of these tools will prompt the community to build more tooling.

Furthermore, I'd like to see if these tools will lead to more cooperation among the different Wiktionary communities.

Finally, we will see if this will prompt a discussion about moving some Wiktionary data to Wikidata in order to remove duplication among the different Wiktionary communities.

1) I believe it's too early to tell whether the creation of these tools will prompt similar work. However, a number of community members recently started a related QA effort, so there's room for cooperation and shared infrastructure (see https://en.wiktionary.org/wiki/Wiktionary:Todo/Lists). Based on this grant, I certainly have many ideas for future work and I intend to apply for more grants. (I believe it's a serious gap that the "General Support Fund" does not allow software development projects whereas the Rapid Grants are limited in scope. There is substantial scope for QA tooling work.)

2) We have already seen some cooperation between Wiktionary communities in the Yiddish work with Swedish Wiktionary. It remains to be seen whether this can be expanded.

3) Personally, this work has further supported my belief that there's too much duplication of information on Wiktionary and that all the different ways of doing things (different templating systems, etc.) make things hard (QA work, using information from Wiktionary, etc.). I believe we need a project that explores the use of Wikidata for Wiktionary.

5. Did anything unexpected or surprising happen when implementing your activities? This can include both positive and negative situations. What did you learn from those experiences? (required)

For me personally, this project was both challenging and rewarding (the former first, then the latter). It's one thing to work on Wiktionary as a volunteer. It's completely different to work on a paid grant. While I always give my best as a volunteer, I can do whatever work I want when I want and there's no strict obligation or deadline. With a funded Rapid Fund grant, you make a commitment and this changes the perception of the work. It took me a while to get started on this project because of this, but, fortunately, once I started the work, I immensely enjoyed it and I believe I created something of great value.

I often argue we need more paid opportunities to work on Wiktionary, but it's important for people to think about all the pros and cons of volunteer vs. paid work (and of mixing the two).

(Ironically, I procrastinated on this paid Wiktionary project by doing other, unpaid Wiktionary work.)

6. What is your plan to share your project learnings and results with other community members? If you have already done it, describe how. (required)

I have written several reports about this work and started discussions (e.g. Wiktionary Beer Parlour, Swedish Wiktionary on Yiddish, some private conversations). I intend to keep the conversation going.

Part 3: Metrics

7. Wikimedia Metrics results. (required)

In your application, you set some Wikimedia targets in numbers (Wikimedia metrics). In this section, you will describe the achieved results and provide links to the tools used.

Metric                    Target    Result    Comments and tools used
Number of participants    20        10
Number of editors         20        10
Number of organizers      3         1

Wikimedia project                      Target    Result - Number of created pages    Result - Number of improved pages
Wikipedia
Wikimedia Commons
Wikidata
Wiktionary                             500       20                                  1300
Wikisource
Wikimedia Incubator
Translatewiki
MediaWiki
Wikiquote
Wikivoyage
Wikibooks
Wikiversity
Wikinews
Wikispecies
Wikifunctions or Abstract Wikipedia

8. Other Metrics results.

In your proposal, you could also set Other Metrics targets. Please describe the achieved results and provide links to the tools used if you set Other Metrics in your application.

Other Metrics name Metrics Description Target Result Tools and comments

9. Did you have any difficulties collecting data to measure your results? (required)

No

9.1. Please state what difficulties you had. How do you hope to overcome these challenges in the future? Do you have any recommendations for the Foundation to support you in addressing these challenges? (required)


Part 4: Financial reporting

10. Please state the total amount spent in your local currency. (required)

5000

11. Please state the total amount spent in US dollars. (required)

5000

12. Report the funds spent in the currency of your fund. (required)

Provide the link to the financial report https://docs.google.com/spreadsheets/d/1Huom7q64mNYYWLBWhN-wIGMWlzk9Bgi29VLGpfz1V_o/edit?usp=sharing


12.2. If you have not already done so in your financial spending report, please provide information on changes in the budget in relation to your original proposal. (optional)


13. Do you have any unspent funds from the Fund?

No

13.1. Please list the amount and currency you did not use and explain why.

N/A

13.2. What are you planning to do with the underspent funds?

N/A

13.3. Please provide details of how you hope to spend these funds.

N/A

14.1. Are you in compliance with the terms outlined in the fund agreement?

Yes

14.2. Are you in compliance with all applicable laws and regulations as outlined in the grant agreement?

Yes

14.3. Are you in compliance with provisions of the United States Internal Revenue Code (“Code”), and with relevant tax laws and regulations restricting the use of the Funds as outlined in the grant agreement? In summary, this is to confirm that the funds were used in alignment with the WMF mission and for charitable/nonprofit/educational purposes.

Yes

15. If you have additional recommendations or reflections that don’t fit into the above sections, please write them here. (optional)


Review notes

Review notes from Program Officer:

N/A

Applicant's response to the review feedback.

N/A