Research:Understanding How Editors Use Machine Translation in Wikipedia: A Case Study in African Languages

Duration: July 2023 – June 2025
Grant ID: G-RS-2302-12035

This page documents a research project in progress.
Information may be incomplete and change as the project progresses.
Please contact the project lead before formally citing or reusing results from this page.


Introduction

Wikipedia is the largest multilingual encyclopedia, with its English edition surpassing any printed encyclopedia in size. As of March 2023, Wikipedia covers 321 languages, yet 94% of these editions contain fewer than a million articles. Translating content between languages offers a potential way to bridge these content gaps, but it shifts the burden of content creation to multilingual editors. To ease this process, the Content Translation tool was introduced in January 2015 as an opt-in feature for article translation (Laxström et al., 2015). This tool automates many of the laborious steps involved in translating Wikipedia articles. More recently, with the integration of the NLLB-200 service (NLLB Team et al., 2022), hundreds of translations have been published in previously underrepresented African languages, including Igbo, Hausa, Yoruba, Swahili, and Zulu.
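
For concreteness, translations of this kind can be reproduced with the open NLLB-200 checkpoints released alongside the paper. The sketch below uses the Hugging Face transformers library; the model variant and generation settings are illustrative assumptions, and the Content Translation tool itself calls a hosted NLLB-200 service rather than running a model like this.

    # Minimal sketch: English-to-Yoruba translation with an open NLLB-200
    # checkpoint. Model variant and settings are illustrative; the Content
    # Translation tool uses a hosted NLLB-200 service instead.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    model_name = "facebook/nllb-200-distilled-600M"  # smallest public variant
    tokenizer = AutoTokenizer.from_pretrained(model_name, src_lang="eng_Latn")
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    text = "He married Bisi Towry-Coker but they are now separated."
    inputs = tokenizer(text, return_tensors="pt")

    # NLLB uses FLORES-200 language codes; "yor_Latn" is Yoruba in Latin
    # script. Forcing it as the first generated token selects the target
    # language.
    out = model.generate(
        **inputs,
        forced_bos_token_id=tokenizer.convert_tokens_to_ids("yor_Latn"),
        max_length=128,
    )
    print(tokenizer.batch_decode(out, skip_special_tokens=True)[0])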

Despite the widespread use of the tool, machine translation (MT) has faced criticism from several Wikimedia communities. The general Wikipedia consensus is that "unedited machine translation is worse than nothing." To address these concerns, many Wikipedia editions implement filters to check whether editors are modifying machine-translated content. However, these filters often fail to differentiate between cases where the MT quality is high enough to be used as is and cases where human editors allow translation errors to slip through. Consider the following example of an English sentence (EN), its machine translation (MT), and a post-edited (PE) version provided by a Wikipedia editor in Yoruba:

  • English (EN): He married Bisi Towry-Coker but they are now separated. He has three children including a son, Olaotan.
  • Machine Translation (MT): O ti fẹ Bisi Towry-Coker ṣugbọn wọn ti pin bayi. O ni awọn ọmọde mẹta pẹlu ọmọ kan, Olaotan
  • Post-Edited Translation (PE): Ó fẹ́ ìyàwó rẹ̀ Bísífẹ Towry-Coker ṣùgbọ́n wọ́n ti pín yà báyìí. Ó bí àwọn ọmọ obìnrin mẹ́ta pẹ̀lú ọkùnrin kan Ọláòtá.

The post-edited version, although an improvement over the machine translation in terms of linguistic fluency and readability, introduces content that does not exist in the original source. For instance:

"[ìyàwó rẹ̀]" translates to "his wife," while the original English sentence does not specify "his" or indicate that the wife belongs to him. "[obìnrin]" translates to "female," which is not present in the source sentence either. Additionally, the post-edited text contains orthographic errors, which further highlights the challenges in relying on machine translation for accurate and culturally appropriate content in African languages.

This proposal aims to understand how editors use machine translation in Wikipedia and to design tools and data that help them make more informed decisions when editing machine-translated texts in African languages.

Preliminary analysis

This section presents a preliminary analysis of the use of machine translation (MT) in African language Wikipedias, followed by a description of the proposed research to be funded by the Wikimedia Research Fund 2023.

As of March 2023, Wikipedia's Content Translation tool supports machine translation for five African languages: Igbo (IG), Hausa (HA), Zulu (ZU), Swahili (SW), and Yorùbá (YO). Our preliminary examination of translation activity from the Content Translation dumps (collected in December 2022 and March 2023) shows that most of the translated content in these languages involves editing machine-translated text rather than translating manually from scratch (Figure 2).
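
Tallies of this kind can be computed from the published Content Translation parallel corpora. The sketch below assumes the JSON layout of the dumps at https://dumps.wikimedia.org/other/contenttranslation/, where each section record carries source, mt, and target fields and mt is empty when the editor translated from scratch; the file name and field names are assumptions to verify against the current dump format.

    # Sketch: count how many published translation sections started from MT
    # output versus a blank editor. Assumes records shaped roughly like
    # {"sourceLanguage": ..., "targetLanguage": ...,
    #  "mt": {"engine": ..., "content": ...} or null, "target": {...}};
    # verify against the dump you download.
    import json
    from collections import Counter

    mt_seeded, from_scratch = Counter(), Counter()

    with open("cx-corpora.en2yo.text.json", encoding="utf-8") as f:  # assumed name
        for record in json.load(f):
            lang = record["targetLanguage"]
            if record.get("mt") and record["mt"].get("content"):
                mt_seeded[lang] += 1
            else:
                from_scratch[lang] += 1

    for lang in set(mt_seeded) | set(from_scratch):
        total = mt_seeded[lang] + from_scratch[lang]
        print(f"{lang}: {mt_seeded[lang] / total:.0%} of sections seeded with MT")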

A key observation emerged when analyzing the extent of edits made on top of machine-translated content. We measured an edit percentage, i.e., how much Wikipedia users modify the MT output before publishing. The results show that the average level of editing across these languages is generally low (<20%), except for Yorùbá, which stands out with a significantly higher editing rate of over 40% (Figure 3).
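
One concrete way to operationalize this edit percentage (the exact metric behind Figure 3 is not restated here) is a character-level Levenshtein distance between the MT output and the published target, normalized by the longer string:

    # Sketch: edit rate as normalized character-level Levenshtein distance.
    # 0.0 means the MT output was kept verbatim; 1.0 means it was completely
    # rewritten. One reasonable operationalization, not necessarily the exact
    # metric behind Figure 3.
    def edit_rate(mt: str, target: str) -> float:
        if not mt and not target:
            return 0.0
        # Standard dynamic-programming Levenshtein over characters.
        prev = list(range(len(target) + 1))
        for i, c_mt in enumerate(mt, start=1):
            curr = [i]
            for j, c_tg in enumerate(target, start=1):
                cost = 0 if c_mt == c_tg else 1
                curr.append(min(prev[j] + 1,          # deletion
                                curr[j - 1] + 1,      # insertion
                                prev[j - 1] + cost))  # substitution
            prev = curr
        return prev[-1] / max(len(mt), len(target))

    # Diacritic restoration alone drives the rate up for Yoruba:
    print(f"{edit_rate('O ti fẹ Bisi', 'Ó fẹ́ ìyàwó rẹ̀ Bísífẹ'):.0%}")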

Interestingly, machine translation quality is known to vary across these languages. NLLB (No Language Left Behind), the machine translation service that powers these translations, reports the following BLEU scores, a metric that assesses translation quality by comparing MT output against human reference translations (NLLB Team et al., 2022):

  • Igbo: 25.8
  • Hausa: 33.6
  • Zulu: 36.3
  • Swahili: 37.9
  • Yorùbá: 13.8

Yorùbá's low BLEU score is consistent with its high editing rate. Based on those scores, however, one would also expect Igbo translations to be more heavily edited than Hausa, Zulu, and Swahili translations, yet the differences reported in Figure 3 are small.
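
Scores of this kind can be reproduced with the sacrebleu library, with one caveat: NLLB reports spBLEU on the FLORES-200 benchmark, i.e., BLEU computed over SentencePiece tokens (exposed as tokenize="flores200" in recent sacrebleu releases), so the default tokenizer in this sketch will not match the published numbers exactly.

    # Sketch: scoring MT output against reference translations with
    # sacrebleu. NLLB's published numbers are spBLEU on FLORES-200; pass
    # tokenize="flores200" (recent sacrebleu versions) to approximate that.
    from sacrebleu.metrics import BLEU

    hypotheses = ["O ti fẹ Bisi Towry-Coker ṣugbọn wọn ti pin bayi."]
    references = [["Ó fẹ́ ìyàwó rẹ̀ Bísífẹ Towry-Coker ṣùgbọ́n wọ́n ti pín yà báyìí."]]

    bleu = BLEU()  # or BLEU(tokenize="flores200") for spBLEU
    print(bleu.corpus_score(hypotheses, references))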

Motivated by the above observations, we proposed a more systematic analysis of the use of machine translation for African languages, as described below.

Completed Research

We proposed a three-step approach that brings together Wikipedia users, native speakers, and NLP researchers to understand the use of machine translation in Wikipedia for African languages. As the discussion below shows, our findings led us to broaden the scope of the investigation, recognizing that error detection is only one of many aspects of supporting the use of the Content Translation tool.

Our first goal was to understand the translation workflow and needs of Wikipedia editors who contribute translations of Wikipedia pages in the studied languages. We connected with Wikipedia User Groups to understand their goals, the issues they face, and any tensions that arise when they use the Content Translation tool. We first conducted an online survey to gain a general overview of the problem space, and then conducted semi-structured interviews to gain a deeper understanding of people's experiences. We describe each below.

Surveys

Methods. We invited Wikipedia editors from the Hausa, Igbo, and Yoruba user groups to participate in an online survey to gain an initial understanding of how and why they use the Content Translation tool. More specifically, we asked about the types of machine translation errors they usually encounter; the edits they typically make; their trust in the tool; their interest in automated quality assessment as a workflow aid; and their use of external tools when editing, among other topics. The full survey is available at: https://umdsurvey.umd.edu/survey-builder/SV_837qGvGzSfOYuCW/edit

Findings. We received 30 complete responses, with participants contributing to Wikipedia in our languages of interest (18 Yoruba, 11 Hausa, 8 English, 1 Igbo). We summarize the most important findings below:

  1. Experienced Editors Use the Content Translation Tool: The Content Translation tool is used by experienced editors, all of whom spend several hours a week editing Wikipedia, to expand content in their language.
  2. Translation Issues: Editors acknowledged translation issues such as grammatical errors, meaning shifts, and inaccuracies in entities or cultural context. Despite these imperfections, translations are perceived as useful.
  3. Mixed Trust in the Tool: While the tool is generally deemed useful, confidence in the tool and the predictability of its output received mixed ratings.
  4. Supplemental Tools Preferred over Generative AI: Editors relied on dictionaries, search engines, and other machine translation systems, which they generally prefer over generative AI tools such as ChatGPT. This suggests editors retain control over the editing process and use external tools to verify rather than generate content.
  5. Low Interest in Automated Quality Assessment: Most notably, users showed low interest in automated quality assessment systems, the dominant framing of translation quality assessment in the MT literature.

Overall, this survey suggests that the Content Translation tool is useful and that participants use it critically, with an eye for different types of translation errors. It also suggests that there is potential to improve their translation workflow and to design strategies that improve the trustworthiness of the tool. The diversity of responses around perceptions of the tool confirms that a deeper understanding of editors' needs is necessary to design effective interventions.

Interviews

Procedures. We recruited participants from the Wikipedia User Groups, balanced between the English>Hausa and Hausa>English translation directions. Each interview lasted about 45 minutes to 1 hour, and participants received a $100 honorarium afterwards. Interviews were conducted and recorded on Zoom for research purposes, and participants shared their screens so that we could observe their workflow. We conducted a total of six interviews.

The interviews were divided into two parts. In the first part, participants translated an article of their choice using the Content Translation tool while thinking aloud, giving us insight into what they were editing and why; this lasted about 30 minutes. In the second part, we asked follow-up questions about their translation process.

Our goals throughout those interviews were to identify:

  • Additional quality assessment annotations, beyond the ones covered by current natural language processing tools, to better support Wikipedia editors when using the Content Translation tool.
  • Ways of assisting editors through means other than quality assessment tools.


Findings. We summarize the most important points that arose from the interviews below, and then briefly discuss how they factor into our next steps.

What are the users editing? One notable finding was that editors were not only correcting fluency and adequacy errors, as expected, but also frequently adapting machine translations to reflect cultural norms of the target language. This included adjusting date formats, adding new words to reflect cultural norms of speaking, and changing words to better convey their contextual meaning.

What difficulties do editors face? The interviews also surfaced two main areas where technical improvements could enhance editors' workflows:

  1. Interface: Some interviewees mentioned the lack of mobile-friendliness, particularly when creating templates and references. Editors often resort to copying translated content and manually creating new pages to ensure correct formatting.
  2. Lack of Tool Features: Translating technical terms poses a challenge for editors. This highlights the potential value of integrating dictionaries into the tool to highlight and gloss technical terms.

Implications. While current MQM (Multidimensional Quality Metrics) labels can capture instances where a translation segment does not conform to target language conventions through fluency annotations, they do not explicitly support decisions around cultural adaptation. Given that localization and cultural adaptation may require deviations from the source text, it is important that annotation tools can also flag cases where the MT is too literal or unlocalized as "errors" or "areas for potential improvement." Furthermore, desired features for improvement include integrated dictionaries for technical terms, highlighting of those terms, and a more mobile-friendly interface.
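
One way to make room for such decisions is to extend MQM-style span annotations with an explicit adaptation category. The record below is purely hypothetical; the field names and category list are illustrative, not a finalized schema.

    # Hypothetical annotation record extending MQM-style error spans with an
    # explicit "adaptation" category for output that is faithful but too
    # literal for the target culture. Names and categories are illustrative.
    from dataclasses import dataclass
    from enum import Enum

    class Category(Enum):
        ACCURACY = "accuracy"      # meaning shifts, mistranslated entities
        FLUENCY = "fluency"        # grammar, spelling, diacritics
        ADAPTATION = "adaptation"  # correct but unlocalized: date formats,
                                   # idioms, culturally expected phrasing

    @dataclass
    class SpanAnnotation:
        segment_id: str
        start: int            # character offsets into the MT output
        end: int
        category: Category
        severity: str         # e.g., "minor" / "major"
        suggestion: str = ""  # optional editor-proposed rewrite

    ann = SpanAnnotation(
        segment_id="ha-0042", start=10, end=20,
        category=Category.ADAPTATION, severity="minor",
        suggestion="(locally expected date format)",
    )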

Ongoing and Future Research

The goal of future research is to explore ways to increase editors' awareness of potential areas where machine translation (MT) output needs to be corrected or adapted. To achieve this, we plan to recruit annotators who will mark not only machine translation errors in the MT output but also areas where localization or adaptation is necessary. Given the time-consuming nature of collecting annotations, we expect this stage to be completed by June 2025.

Expected Outcome: The expected outcome is a dataset that can help analyze MT errors and areas where MT adaptation is needed, tailored to Wikipedia's needs and data.