Wikimedia Blog/Drafts/Content Translation - 50000 articles and the year in review

Title ideas

  • Content Translation - 50000 articles and the year in review
  • ...

Summary

Content Translation has been available for a year now, and more than 50,000 new Wikipedia articles have been created with it. In this post we share how the tool advanced through the year.

  • ...

Body

 
Articles from the Medical Translation Project being translated using Content Translation on the Persian Wikipedia. Image by Runa Bhattacharjee, freely licensed under Creative Commons CC0 1.0 Universal Public Domain Dedication.

Around this time last year, we announced the arrival of a new tool that evolved out of an experiment aimed at making the editing process easier for our users. The tool in question, Content Translation, was initially enabled for 8 languages: Catalan, Danish, Esperanto, Indonesian, Malay, Norwegian Bokmål, Spanish and Portuguese. Today, 12 months later, this article-creation tool has been used by more than 11,000 editors across 289 Wikipedias to create more than 50,000 new articles.

Content Translation introduced a simple way to create Wikipedia articles through translation. Many editors have used translation for years to enrich content in Wikipedias where creating high-quality articles has been an uphill struggle for many reasons. However, translating a Wikipedia article involved several cumbersome steps, such as copying content across multiple browser tabs and manually adapting links and references. Content Translation abstracts away these steps and provides a neat, easy-to-use interface that makes creating a new article much faster.

Content Translation is a beta feature. As part of the beta program, it is available to all logged-in users on 289 Wikipedias, who can try it and send us their feedback.

Progress during the year

Over the last year, we have regularly documented the progress of the tool and how it was being adopted. Feedback from users of Content Translation, gathered through many interactions, helped us identify which features had been helpful and which were lacking and needed more attention. We also relied heavily on trends in the statistics that were captured every day. For instance, in the early days we found that many users were unaware that the tool existed. To make it easier to find, we surfaced several access points in places where the tool may be needed, including the contributions page, the list of interwiki languages on an article, and other easily accessible spots. Around the middle of 2015, we found that many users had stopped using the tool after one or two translations. In conversations, users cited several reasons, such as the lack of machine translation support for their language, technical difficulties with some features, and the effort required to find articles that needed translation. As a result, we focused on two key aspects:

  1. continued engagement with our returning users, and
  2. increased reliability and stability of the tool.

While working on Content Translation, we also made improvements to the statistics page. This page displays the weekly and total figures for articles translated and deleted, as well as information about the active languages. The statistics page (Special:ContentTranslationStats) is available on all wikis where the Content Translation extension is installed. The page surfaces several interesting facts. For instance:

  • 64% of all articles have been translated from the English Wikipedia. Spanish is the second most popular source (12%).
  • More than 1000 new articles have been created in each of 15 languages, of which the Catalan and Spanish Wikipedias account for 6000 articles each.
  • The Spanish Wikipedia has the highest number of individual translators using Content Translation (more than 2000).
  • The highest number of articles created during a single week is 1968. Over 1900 articles are created using Content Translation every week, up from about 1000 per week in August 2015, the first month when it was enabled in all languages.
  • Weekly deletion rates have been between 6% and 8% of the total articles created.

Besides this regular set of data, we have occasionally observed interesting trends related to specific events. For example, when a machine translation system was enabled on the Russian Wikipedia in early November, the weekly article translation numbers doubled and have continued to grow.

Engagement and stability

 
Comparison between articles created in Content Translation with and without the suggestions feature. Image by Runa Bhattacharjee, freely licensed under Creative Commons CC0 1.0 Universal Public Domain Dedication.

One of the major outcomes in recent months is the addition of the ‘Suggestions’ feature. Instead of searching for something to translate, users can view a list of articles that they can translate. This is an ongoing collaboration between the Language and Research teams at the Wikimedia Foundation. Users are shown a list of articles on topics chosen on the basis of factors such as their past translations and popular topics in the language. Additionally, topic-based targeted campaigns with predetermined article lists have been introduced. The first of these was proposed by the Medical Translation Project and was completed, translating a set of articles from English to Persian. A month after this feature was introduced, we found that suggestions had been used to start about 16% of the translations.

In terms of stability, increased usage of the tool has exposed some technical challenges that need further attention. These include better handling of errors when saving and publishing translations, reducing wikitext errors in published articles, and ensuring uninterrupted service uptime through better monitoring of services. For us as a development team, constant interaction with users of Content Translation has been a valuable source of information about the performance of the tool and its shortcomings.

Coming up next

The main focus at the moment continues to be improving the wikitext quality of the published content, reducing publishing and saving errors, and improving the overall stability of the article translation workflow.

Besides this, we will continue to improve a feature that is an important aspect of this project. Content Translation uses third-party machine translation systems for several languages. To benefit the wider machine translation development community, we recently completed the initial development of the parallel corpora API, which provides easy access to the human-modified translations. This is an open repository compiling examples of translated content and the corrections users had to make. We expect it to be a valuable resource for improving the quality and language coverage of new and existing machine translation systems.

We would like to sincerely thank everyone for the comments, feedback, encouragement and wholehearted participation that have given direction to this project. We look forward to many new things in the next 12 months.

You can share your comments and feedback about the Content Translation tool with the Wikimedia Language team at the project talk page. You can also follow us on Twitter (@whattotranslate) for updates and other news.

Runa Bhattacharjee, Language team (Editing), Wikimedia Foundation

Notes