L10n tools

This page aims to discuss some of the current shortcomings of the MediaWiki localization system ("l10n" - the process of translating the interface into other languages), and to propose and discuss new tools and changes to the software which would address these. Feel free to comment on any of the ideas in this page, and to edit anything which is not signed as being the opinion of one user.

Summary of the current system

As of MediaWiki 1.4, translations of the interface are defined in two places - "LanguageXX.php" files and the "MediaWiki namespace". The former are sections of the PHP source code which define the default interface strings for a particular language; they are primarily used to initialise the MediaWiki namespace on installation. LanguageXX.php files are sometimes created as pages here on "meta" and then copied into the program's source tree. The MediaWiki namespace uses the wiki database to allow easy editing of the messages, treating each interface message as a page in the wiki. In version 1.4, users can also opt for a different translation of the interface in their preferences; in this case, it will generally come from the appropriate LanguageXX.php file.
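
The relationship between the two stores can be sketched roughly as follows. This is a toy model in Python, not MediaWiki's actual PHP code: the names `LANGUAGE_FILES`, `install_wiki` and `get_message`, and the sample strings, are all hypothetical and exist only for illustration.

```python
# Hypothetical stand-ins for LanguageXX.php files: default strings per language.
LANGUAGE_FILES = {
    "en": {"search": "Search", "go": "Go"},
    "fr": {"search": "Rechercher", "go": "Consulter"},
}


def install_wiki(site_lang):
    """Initialise the MediaWiki namespace from the site language's defaults."""
    return dict(LANGUAGE_FILES[site_lang])


def get_message(key, namespace, site_lang, user_lang=None):
    """Return the interface string shown to a user.

    Assumption: a user who picks a different interface language bypasses
    the wiki's MediaWiki namespace and gets that language's defaults.
    """
    if user_lang is not None and user_lang != site_lang:
        return LANGUAGE_FILES[user_lang][key]
    return namespace[key]


namespace = install_wiki("en")
namespace["search"] = "Find"  # an edit made through the wiki
```

In this model, `get_message("search", namespace, "en")` returns the locally edited text, while a user who has chosen "fr" in their preferences sees the unedited French default.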

The problems

l10n versus customization

Because edits to the MediaWiki namespace have instant effect, it is tempting for translators to start a wiki in the target language, and translate the interface as they go. However, because the messages are specific to one project, they can be, and are, also used for customizing the interface specifically for that project - changing the links in the "navigation" sidebar, for instance, is now achievable by anyone who can edit these messages. Thus the MediaWiki namespace from a particular project may contain a mixture of improved translations and project-specific customisations. This makes exporting the translated messages for use elsewhere somewhat problematic.

Updating an installed wiki

When a new version of the software is installed, messages in the MediaWiki namespace are not changed; however, some of them will have changed functionally because of new features, and some may have been re-translated in the relevant LanguageXX.php file. Indeed, new translations are frequently completed, sometimes by editing here on "meta", and must then somehow be applied to existing projects. The only way of acquiring these changes is to over-write the entire MediaWiki namespace, including any project-specific customizations, with the new version. This leads to the practice of translating within a project's MediaWiki namespace, which in turn leads to the export problem mentioned above. Nor could it be as simple as replacing all uncustomized messages, since any customization may well need merging with the newer message, or adapting in some way in response to it.

Synchronisation between languages

Compounded with keeping a particular install up to date is the problem of keeping the translations up to date. When a new version of the software is released, some of the interface messages are likely to need altering - as well as new messages for entirely new elements of the interface, some messages may contain descriptions which are no longer accurate. But there is currently no system for alerting translators to such changes, leaving non-English interfaces inaccurate - or, in some cases, broken - after the upgrade.

An interesting point is made at Help talk:MediaWiki namespace#Licensing issues about the legality of incorporating changes made on a wiki (under the GFDL or whatever content license is used) into the software distribution (under the GPL).

Summary of requirements

Based on the above analysis of the problems with the current system, the following seem to be needed:

  • in general, a system which allows l10n to be carried out in parallel to customization, not in conflict with it
  • the ability for all non-customised messages to be based on - or at least updated from - the latest translation
  • an install/upgrade process that alerts administrators to changes in messages which have been customised, so that the customization and
  • a l10n system which tells translators which messages require re-translation for a new software version

Possible improvements

Common MediaWiki namespace

One suggestion for dealing with the problems of importing and exporting messages from the MediaWiki namespace is to create a common site from which all projects (at least all Wikimedia Foundation projects) draw their interface messages. Thus, translations could be made with instant effect across all projects. However, this runs into the problem of #l10n versus customization, since a system would then be needed for making project-specific versions "on top of" the centrally maintained ones. While it would be possible to use the central version only where no local version has been created, this would be equivalent to updating from LanguageXX.php only those messages which hadn't been customised - leaving any changes necessary to those that had been edited just as unnoticed as before.

A central point for managing the l10n of the software is, however, a sensible idea; with proper import (and export) facilities in individual installations, this could be used as the basis of the translation for all users (including those downloading the software from Sourceforge). To a certain extent, the pages here on meta on which LanguageXX.php files are (sometimes) created are an attempt at this.

l10n management software

In order to solve the problems with #Synchronisation between languages, it would be useful to run a piece of software - either written from scratch or adapted from something existing - to keep track of which messages had up-to-date translations in which languages. Such a piece of software would ideally:

  • allow the editing of each message, perhaps in a wiki style (history, RecentChanges, etc)
  • allow developers to flag a message as changed in a new version
  • show at a glance which messages in a particular language were either untranslated or out-of-date
  • export the messages automatically to LanguageXX.php files
    • possibly also allow trusted users to check these into Sourceforge and/or install them into running Wikimedia projects
  • either import from, or perhaps allow comparison with, the MediaWiki namespaces of existing projects
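
The core bookkeeping such a tool would need can be sketched as below. This is only one possible model, written in Python for illustration: the `Message` class, its `source_revision` counter, and the `needs_attention` helper are hypothetical names, and the idea of recording which source revision each translation was based on is an assumption about how "out-of-date" might be detected.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    """One interface message and its translations (hypothetical model)."""
    key: str
    source_revision: int = 1  # bumped whenever developers flag a change
    # language code -> (translated text, source revision it was based on)
    translations: dict = field(default_factory=dict)

    def flag_changed(self):
        """Developer marks the message as changed in a new version."""
        self.source_revision += 1

    def translate(self, lang, text):
        """Store a translation made against the current source revision."""
        self.translations[lang] = (text, self.source_revision)

    def status(self, lang):
        if lang not in self.translations:
            return "untranslated"
        _, based_on = self.translations[lang]
        return "current" if based_on == self.source_revision else "out-of-date"


def needs_attention(messages, lang):
    """List keys whose translation into lang is missing or stale."""
    return sorted(m.key for m in messages if m.status(lang) != "current")
```

A translator's "at a glance" view for a language would then be the output of `needs_attention` for that language; export to LanguageXX.php files and import from existing projects are left out of this sketch.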

A l10n installer

In order for any of this to be any use, however, the MediaWiki software itself needs to have a system for installing the new translations. This could take the form of a "l10n installer", which would run during upgrade to a new software version, and also "on demand" when a new LanguageXX.php file was made available. It could:

  • over-write all those pages in the MediaWiki namespace that had never been edited with messages from the new LanguageXX.php (we can't just not create them on install, because they've already been created long ago)
  • create a list of those messages which had been customised, but which had also been changed in the new LanguageXX.php file, and therefore needed review. This list would then somehow have to be brought to the attention of the project's community of users.
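
The two steps above could be sketched along these lines. This is a hedged illustration in Python, not a proposal for the real implementation: `plan_upgrade` and its arguments are hypothetical, and it assumes that "never edited" can be approximated by comparing a page's current text with the default shipped in the previously installed version (a real installer might consult page history instead).

```python
def plan_upgrade(old_defaults, new_defaults, namespace):
    """Decide what an upgrade should do with each message.

    old_defaults: messages shipped with the currently installed version
    new_defaults: messages from the new LanguageXX.php
    namespace:    current contents of the wiki's MediaWiki namespace
    Returns (overwrite, review): keys that can safely take the new
    default, and keys that need human attention.
    """
    overwrite, review = [], []
    for key, new_text in new_defaults.items():
        current = namespace.get(key)
        old_text = old_defaults.get(key)
        if current is None or current == old_text:
            # Never customised (or brand new): take the new default.
            if current != new_text:
                overwrite.append(key)
        elif new_text != old_text:
            # Customised locally AND changed in the new file.
            review.append(key)
        # Customised but unchanged in the new file: leave alone.
    return sorted(overwrite), sorted(review)
```

Messages that were customised locally but whose default did not change are deliberately left untouched and unlisted, so the review list stays short enough for a project's community to work through.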