Language tagging
This page explains the best practice of language tagging, i.e. marking a certain text as being in a certain language and/or script.
Language tagging
It is a best practice to tag a web page, or a piece of text within a web page, with the correct language. An HTML element should carry a lang attribute identifying the language its content is written in, and a dir attribute identifying the writing direction.
Site-level
The MediaWiki software adds lang and dir attributes to the whole page according to the user language.
Page-level
MediaWiki has a concept of "page content language". The content of each page is wrapped in a <div id="mw-content-text" lang="xyz" class="mw-content-ltr/rtl">. See mw:Page content language for more information.
This is correct in most cases. However, it is not (yet) possible to change the page content language other than through a MediaWiki hook; see bug 9360 and bug 28970.
Pieces of text in a different language than the page language
When content on a Wikimedia wiki (be it Wikipedia, Wikisource, ...) contains pieces of text in a language and/or script different from the page-level language, it is recommended to mark this correctly in the content. This can be done by wrapping the text in a span or div element that has a lang attribute (and, where needed, a dir attribute).
The following examples summarize language tagging for easy reference by Wikimedians:
- <span lang="el">Χαίρε, ω χαίρε Ελευθεριά!</span>
Greek is implied to be written in the Greek script. If the text is transliterated into another script, e.g. Latin, add the script code:
- <span lang="el-Latn">Haire, o haire, Eleftheria!</span>
If you want to further specify that this is from a specific country, say Greece:
- <span lang="el-Latn-GR">Haire, o haire, Eleftheria!</span>
When the writing direction is different from the page's, use:
- <span lang="he" dir="rtl">להיות עם חופשי בארצנו</span>
For a longer block of text, usually wrapped in a div, also add the mw-content-rtl (or mw-content-ltr) class, which properly aligns lists and similar elements:
- <div lang="he" dir="rtl" class="mw-content-rtl">להיות עם חופשי בארצנו</div>
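The wrapping patterns above can be generated programmatically, for example by a bot or script that prepares content. The following is a minimal sketch, not part of MediaWiki: the function name `wrap_text` and its parameters are hypothetical, and it only reproduces the span/div forms shown in the examples.

```python
from html import escape


def wrap_text(text, lang, direction=None, block=False):
    """Wrap text in a span (or div) carrying lang and, optionally, dir.

    Hypothetical helper for illustration only; it mirrors the markup
    patterns listed on this page and is not a MediaWiki API.
    """
    if direction not in (None, "ltr", "rtl"):
        raise ValueError("direction must be 'ltr', 'rtl', or None")

    element = "div" if block else "span"
    attrs = [f'lang="{escape(lang, quote=True)}"']

    if direction:
        attrs.append(f'dir="{direction}"')
        if block:
            # Longer blocks also get the MediaWiki alignment class.
            attrs.append(f'class="mw-content-{direction}"')

    # Escape the text so that characters such as < and & stay literal.
    body = escape(text, quote=False)
    return f"<{element} {' '.join(attrs)}>{body}</{element}>"


if __name__ == "__main__":
    print(wrap_text("Χαίρε, ω χαίρε Ελευθεριά!", "el"))
    print(wrap_text("להיות עם חופשי בארצנו", "he", direction="rtl", block=True))
```

The dir attribute is left out unless a direction is passed, matching the advice to add it only when the writing direction differs from the page's.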
The value of a lang attribute is made up of an ISO 639 language code, optionally followed by an ISO 15924 script code and/or an ISO 3166-1 country code, joined by hyphens.
Language codes are usually mentioned on the respective Wikipedia article about the language. SIL maintains code tables and a simple text file.
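The way a lang value is assembled from these codes can be sketched as follows. This is an illustrative helper under stated assumptions: the name `build_lang_tag` is hypothetical, and the checks only verify the shape of each code (letter count), not whether it is actually registered in ISO 639, ISO 15924 or ISO 3166-1. The casing follows the examples above (lowercase language, title-case script, uppercase country).

```python
import re


def build_lang_tag(language, script=None, region=None):
    """Join language, script and country codes into a lang attribute value.

    Hypothetical helper: validates only the shape of each code, not
    whether the code exists in the corresponding ISO standard.
    """
    if not re.fullmatch(r"[A-Za-z]{2,3}", language):
        raise ValueError("language code must be 2 or 3 letters")
    parts = [language.lower()]

    if script is not None:
        if not re.fullmatch(r"[A-Za-z]{4}", script):
            raise ValueError("script code must be 4 letters")
        parts.append(script.title())  # e.g. "latn" -> "Latn"

    if region is not None:
        if not re.fullmatch(r"[A-Za-z]{2}", region):
            raise ValueError("country code must be 2 letters")
        parts.append(region.upper())  # e.g. "gr" -> "GR"

    return "-".join(parts)


if __name__ == "__main__":
    print(build_lang_tag("el"))
    print(build_lang_tag("el", script="Latn"))
    print(build_lang_tag("el", script="Latn", region="GR"))
```

The script code always precedes the country code, as in el-Latn-GR.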
For now, text should also be tagged with xml:lang; this will become redundant once the wikis are shifted to HTML5.
Benefits
Besides being a best practice, language tagging also supports the WebFonts extension (enabled on Incubator, MediaWiki.org and many Indic wikis), which relies on lang attributes to recognize the language and provide a font for the script. See the documentation for more information.