Learning patterns/Developing Apertium MT for your language in Content Translation

A learning pattern for project management
Developing Apertium MT for your language in Content Translation
Problem: Creating a machine translator for your language
Solution: Making an Apertium language pair with the help of the Apertium community
Created on: 12:23, 16 March 2016 (UTC)

What problem does this solve?


Content Translation is a great tool, and it works even better for language pairs that have machine translation support. If the languages you translate between are not included and you are somewhat technically minded, you can help create a new Apertium machine translator for your language pair.

Screenshot of Content Translation Bokmål→Nynorsk

What is the solution?

  1. Read the Contributing guide, and decide how you'd like to contribute
  2. Join the IRC channel and ask about starting a new language pair (or getting a beta pair up to release quality)
  3. Follow the New Language Pair HOWTO under the guidance of a mentor
  4. Talk to other Wikimedia users from the relevant language communities to get more people involved

Things to consider

  • There are many ways to help get support for your language pair
    • Simply saying that you want it may motivate others to contribute
    • You can help find and document resources (language data such as word lists, corpora, dictionaries and grammars), or simply record the mistakes made by an existing beta translator
    • Finally, you can edit the Apertium language data itself and ask for SVN commit access to Apertium

The Apertium machine translation platform is especially geared towards getting high-quality translations out of closely-related language pairs, where there may not be enough parallel data available to create high-quality statistical systems. Apertium developers thus tend to get more excited about translation between closely-related (and typically non-English) languages, which is where Apertium really shines.

When to use


When Content Translation does not support your favoured translation directions, or the current translator there makes too many mistakes.

This pattern was created as part of the IEG project "Pan-Scandinavian Machine-assisted Content Translation".[1]



See also