Research:Collaborative Translation Research

Tracked in Phabricator:
Task T371414
This page documents a completed research project.
Final report for a Wikimedia Foundation Research team study of collaborative translation processes

Understanding group processes and needs around translation

As part of the Wiki Experiences 2 (WE2) “Encyclopedic content” objective in the annual plan, the WE2.1 key result aims to support organizers, contributors, and institutions in increasing the coverage of quality content in key topic areas. Organizations such as the Healthcare Translation Task Force, among others, have increased their output thanks to Content Translation. Content Translation is also used in campaigns, WikiProjects, university courses, and similar collaborative settings. Historically, the main focus has been on supporting the translation needs of individuals. The goal of this project is to investigate more thoroughly the specific needs of groups, which through their size and collaborative efforts can have a large impact on closing knowledge gaps.

Background & Goals

The Language and Product Localization (LPL) Team plans to expand the capabilities of Content Translation to empower communities to define and translate content that covers specific knowledge gaps. To that end, the team is currently expanding capabilities associated with suggestions and lists for translations. We know that individuals often join forces through group activities such as WikiProjects (e.g., Healthcare Translation Task Force), campaigns, competitions, and edit-a-thons, which share a common goal of reducing certain knowledge gaps. Unlike individual translators, these groups may coordinate and track efforts in pursuit of shared goals. There is therefore an opportunity to learn how groups select and track work around translation activities, and to identify product opportunities to better support these organizing activities.

Research questions

(Organizers) Motivation, activities and processes: How do groups of individuals select areas of work and focus, especially in regard to identifying, shaping (and potentially tracking) others’ translation activities?

  • What are their goals?
  • What does the process of organizing (i.e., identifying, organizing, tracking, etc.) translation efforts look like? How do they select areas of work and generate work lists? How do they identify gaps?
  • What, if any, activities do they engage in with regard to monitoring progress and/or quality of work?
  • What strategies and workflows have organizers developed?
  • How do organizer needs vary based on the nature of the event (a project, campaign, competition, or edit-a-thon)? In-person vs. online?
  • What gaps and pain points exist in these workflows, and where are there opportunities to better support the organizing work?

(Organizers) Tools and collaboration:

  • What tools (for organizing as well as translating) do they use, and to what extent do these tools meet and serve their purpose?
  • What are their views on the role of AI/machine translation in contexts like this? Where do they see the limits of AI in their own work? I.e., for what parts of their own work is AI still "not good enough"?
  • What is their awareness and usage of Content Translation? What purpose does it serve and what are the tool’s limitations?
  • Regarding quality of translations and editing outputs, do organizers prefer to oversee such aspects or leave them to usual community processes? Why?
  • What, if any, external supports are organizers using and why?
  • To what degree is bringing in newcomers prioritized? How so?
  • What tools and guidance are needed for these individuals to contribute content that is perceived as high quality and valuable by the Wikipedia community?

(Non-organizer group participants) Motivations and interests: In addition to the perspective of organizers, why and how do group participants engage in these processes?

  • What motivates these individuals to participate in a group/collaborative setting (compared to engaging in individual work alone)?
  • What else do they do online that "looks like" what they're doing in the collaboration? Where did that interest come from? What is their educational history and status?
  • What is their awareness of Wikipedia? What did they learn about it through participation in a collaboration?
  • Are there any barriers to contributing to Wikipedia or participating in these collaborative activities that stopped them from joining earlier?
  • What are their views on the role of AI/machine translation in contexts like this? Where do they see the limits of AI in their own work? I.e., for what parts of their own work is AI still "not good enough"?

(Non-organizer group participants) Activities and tools:

  • Relative to organizing and translation activities, what do participant workflows look like?
  • To what extent are they involved in the group activity? Why do they prefer to engage in this way/to this extent?
  • What gaps and pain points exist in the (non-organizer) participant experiences? (e.g., in terms of tools and workflows, as well as content, topics, and language)

General opportunities: What opportunities are there to better support this work and integrate translation support into collaborative workflows?

Approach

Methods

  • Basic desk research and informal interviews with select WMF staff
    • Related work is under way on the WMF Campaigns Team. Some preliminary, informal conversations or interviews will be needed with the Campaigns Team’s PM and designer.
    • Basic discussions with the LPL Team will help to better understand the current parameters of the team’s work and to gather input and alignment on the proposed research questions and approach.
  • Contextual inquiries
    • Semi-structured contextual inquiries, involving a combination of workflow observations, task analysis, and interviews, will be conducted with both organizers and non-organizing group participants (the latter of which should be segmented according to both general experience editing Wikipedia and experience translating on Wikipedia).
    • To determine participant segments, we will need to develop selection criteria for the ‘groups’ we will focus on for recruitment and involvement (some possibilities include WikiProjects, campaigns, competitions, and edit-a-thons).

Participants

  • Participants are in part determined by the type of groups selected above.
  • For each group, we’ll need to recruit ‘organizers’ as well as (non-organizing) ‘group participants’.

Phases & Timeline

Additional details may be available at the Phabricator task tracking this work.

Phase 0 - Scoping (complete)

  • Finalize research brief
  • Receive input from LPL on project planning

Phase 1 - Preparation (10-28 February)

  • Complete desk research and WMF interviews
  • Discussion guide development
  • Recruitment materials development
  • Translation of discussion guide and other materials
  • Secure supporting resources, such as language support
  • Recruitment plan

Phase 2 - Data collection (3-21 March)

  • Pilot and refine discussion guide
  • Ongoing recruitment of participants
  • Conduct contextual inquiries

Phase 3 - Analysis and reporting (24 March - 18 April)

  • Analysis of contextual inquiries
  • Preparation of report and/or slide deck
  • Presentation to stakeholders
  • Discussion and report revisions

Results

Below you can find a summary, key findings, and recommendations. For full reporting, please refer to the project's final report available on Commons.

Executive summary

Event organizers and participants share a strong interest in group translation activities that create a sense of community, fill content gaps, and allow them to contribute to an effort with a larger impact than their individual actions. These activities center on expanding their language version of Wikipedia and/or attracting new editors. To support this goal and maximize the impact of these events, organizers provide ample support to participants, both newcomers and experienced editors, so that they remain engaged and contribute good-quality articles. Key activities for organizers are identifying content to be translated, creating an article list, and tracking and reviewing submissions. Key activities for participants are selecting an article and translating it, for which they use the Content Translation (or Section Translation) tool. Although the tool is considered useful, organizers and participants face multiple challenges, such as frustration at having to modify a good translation to satisfy modification limits, newcomers being unaware of the need to format wikitext after translation, and articles abandoned in drafts. Lastly, while organizers and participants want to contribute to a larger goal, they take very few actions to understand or track the impact of their activities. This suggests that while they are interested in understanding their impact, they lack the resources or knowledge to do so.

Key findings

  • Organizer and participant motivations for group translation activities align on key aspects: filling content gaps, taking part in an activity that is larger than individual contributions, and bringing the community together to create a sense of belonging.
  • Goals for collaborative translation events were typically centered around increasing article count or attracting new editors.
  • Identifying content to be translated and creating a list are important activities. Organizers use multiple tools (PetScan, Page Views, Article List Generator, List-Building Tool, Wikidata Query) and the help of other organizers to aid them in this effort.
    • For example, PetScan is a very popular tool but doesn’t assist with prioritization, for which organizers use Page Views. Additionally, if multiple organizers are working on a list, they use note-taking documents to curate the list and avoid repetitions.
  • PetScan is also seen as a complex tool used by experienced editors. To provide flexibility and support to participating wikis, one organizing team built the Article List Generator.
  • Articles are generally translated from English Wikipedia, mainly because they tend to be well written and well referenced and won’t get deleted or flagged after translation.
  • Finding good-quality source articles can be a challenge, especially on niche topics for smaller language wikis (for example, Latvian architects). Since these articles don’t exist and a tool can’t help find them, organizers prefer to build a list manually.
  • As long as the translation is edited and fits the theme of the event, participants are given a fair bit of autonomy in terms of article selection.
  • All events included in this study used the Content Translation tool, but there were concerns regarding newcomer challenges and the blocking of good quality machine translation because it didn’t meet the translation thresholds.
  • Organizers were interested in sharing their lists of articles with a broader audience, but remained concerned about tracking participant entries in a contest.
  • In order to increase the impact and engagement of their events, organizers employed various mechanisms around event length, relaxed rules, and prizes. This also included providing support to newcomers.
  • Both participants and organizers rarely re-visited previously translated content to update, expand, or improve the article.
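
The list-curation workflow described in the findings above (several organizers contribute candidate articles, repetitions are removed, and page views guide prioritization) can be sketched roughly as follows. This is a minimal illustration only, not a tool used by any group in the study: the `Candidate` type, the `curate` function, and the page-view figures are hypothetical, and the numbers are assumed to be entered by hand rather than fetched from PetScan or Page Views.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    title: str        # source article title, as an organizer noted it
    page_views: int   # hypothetical page-view figure supplied by the organizer


def curate(lists: list[list[Candidate]], limit: int) -> list[Candidate]:
    """Merge several organizers' lists, drop repeated titles, rank by page views."""
    best: dict[str, Candidate] = {}
    for candidates in lists:
        for c in candidates:
            # Assumption: titles that differ only by underscores or surrounding
            # whitespace refer to the same article.
            key = c.title.replace("_", " ").strip()
            if key not in best or c.page_views > best[key].page_views:
                best[key] = c
    # Highest page views first; ties are broken alphabetically by title.
    ranked = sorted(best.values(), key=lambda c: (-c.page_views, c.title))
    return ranked[:limit]


organizer_a = [Candidate("Yellow fever", 900), Candidate("Asthma", 400)]
organizer_b = [Candidate("Yellow_fever", 950), Candidate("Rabies", 400)]
shortlist = curate([organizer_a, organizer_b], limit=3)
```

Ranking purely by page views is one possible prioritization; organizers in the study also relied on manual judgment, which a sketch like this does not capture.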

Recommendations

Looking at the organizer and participant experiences, certain activities show potential for interventions, some of which are outside the scope of the Content Translation tool.

Two common experiences were a lack of interaction with articles after translation and a lack of measuring or tracking of the impact of activities.

  • Both participants and organizers rarely interacted with article content after translation. In a few cases, there were article improvement campaigns, or they added articles to their watchlist if they were especially interested in the topic. This lack of interaction is an opportunity to remind them to check or update previously translated content so that articles stay up to date.
  • There was an interest in understanding and tracking the impact of contributions, but a lack of awareness of how to go about it. Organizers sometimes checked impact, for example by tracking the gender gap, and participants by checking their page views and edit count. Since this study skewed towards experienced editors, it's possible that their motivations are more intrinsic, but there is scope to explore this further.

Recommendations for the organizer experience

  • Since curation is a challenging task for organizers, they would benefit from some assistance, whether it’s working collaboratively on a list or helping to evaluate source articles.
  • Organizers have concerns about abandoned articles that stay in participants’ drafts folders. Could some sort of indication be given to participants to remind them to either complete the translation or relinquish the article?
  • Some offline events required sign-off from the organizer on articles to ensure that they were well translated. How could this be replicated online to reduce the burden on the jury and patrollers and avoid mistaken deletion of content?
  • While this study didn’t find any examples of participants working collaboratively on the translation of an article, Project Med is working on functionality to give credit to multiple translators. It may be worthwhile to keep an eye on any such need arising in language wikis.

Recommendations for the participant experience

  • One concern about making article lists publicly accessible in the CX tool was that editors might mistakenly participate in an event. Campaign and competition lists could be kept separate until the event has run its course.
  • Organizers requested assistance for participants with infobox templates in the language they’re translating into, either by automating this step or by giving more context about the issue raised. The same applies to any content that is not translated. Although this didn’t arise from this research, it could be something to watch out for.
  • There are requests to integrate models such as DeepL that could potentially improve translations.
  • Organizers recommend that newcomers editing on a mobile device start in desktop mode, to counteract their tendency to publish prematurely. This suggests that newcomers don’t pay sufficient attention to the tool’s warnings about modifying the translation, or that even if they modify the content, they are not aware that they haven’t finished translating the article. Newcomers may not realize that their article is incomplete, has been added to the tracking category for community review, and may not get published. These warnings need reinforcement so that participants are notified of the consequences of publishing prematurely.

Resources

A note on terminology: ‘Collaborations’ and ‘Collections’. Many terms may be used for these kinds of activities, and it’s not clear which terminology community groups use. That said, “collaborations” was a WMF renaming of “community lists” to refer to collaborative editing group activities (e.g., edit-a-thons). Currently, the LPL Team refers to sets of content that could be worked on in connection with an event or organized effort as “collections”. Collections are lists of content, which aligns with the notion of knowledge gaps; that is, they are content-focused, whereas collaborations describe human activities and efforts. Collections may be used by collaborations.