User:DChan (WMF)/Forms of writing used in Chinese editions of Wikipedia
Forms of writing used in Chinese editions of Wikipedia
Engineers often ask me to explain the relationship between (A) the different forms of Chinese and (B) the editions of Wikipedia written in them. I'm not aware of a single place that has pointers to all this information, so this is my attempt to create one.
Summary table
There are eight editions (language versions) of Wikipedia in different forms of Chinese. This table summarises the forms of writing used in them. Note that Chinese text also appears in non-Chinese Wikipedias (e.g. when giving someone's personal name in Chinese), and likewise other languages appear in Chinese editions of Wikipedia. See below for background information.
Variety | BCP 47 tag | Wikipedia URL | Active users (2020-05) | Content |
---|---|---|---|---|
Standard Written Chinese | zh | zh.wikipedia.org | 9075 | Written in mixed Hant/Hans script (each page is a mix, because each edit is written in the editor's preferred variant). |
Cantonese | yue | zh-yue.wikipedia.org | 304 | Written in Hant script. |
Min Nan | nan | zh-min-nan.wikipedia.org | 72 | Articles mainly use Latin script (Pe̍h-ōe-jī romanization). |
Wu | wuu | wuu.wikipedia.org | 56 | Written in Hans script (using Wu grammar and vocabulary). |
Hakka | hak | hak.wikipedia.org | 32 | Articles mainly use Latin script (Pha̍k-fa-sṳ romanization). |
Gan | gan | gan.wikipedia.org | 21 | As on Chinese Wikipedia, articles can be written in mixed Hant/Hans script (each page can be mixed). |
Min Dong | cdo | cdo.wikipedia.org | 20 | Articles and discussion pages mainly use Latin script (Foochow romanization), though Hant/Hans script is allowed too. |
Classical Chinese | lzh | zh-classical.wikipedia.org | 75 | Articles and discussion pages use Hant script. |
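The subdomain in the table is not always the same as the BCP 47 tag, which matters for tooling that goes from a language tag to a site. As a minimal Python sketch, the table can be captured as a small lookup. The names `WIKIPEDIA_HOSTS` and `wikipedia_host` are hypothetical, and the data is copied from the table above rather than from any live configuration.

```python
# BCP 47 tag -> Wikipedia host, copied from the summary table above.
WIKIPEDIA_HOSTS = {
    "zh": "zh.wikipedia.org",
    "yue": "zh-yue.wikipedia.org",
    "nan": "zh-min-nan.wikipedia.org",
    "wuu": "wuu.wikipedia.org",
    "hak": "hak.wikipedia.org",
    "gan": "gan.wikipedia.org",
    "cdo": "cdo.wikipedia.org",
    "lzh": "zh-classical.wikipedia.org",
}


def wikipedia_host(bcp47_tag):
    """Return the host for a tag listed in the table; raise KeyError otherwise."""
    return WIKIPEDIA_HOSTS[bcp47_tag]


# Tags whose subdomain differs from the tag itself.
mismatched = [
    tag for tag, host in WIKIPEDIA_HOSTS.items()
    if host.split(".")[0] != tag
]

print(wikipedia_host("yue"))  # zh-yue.wikipedia.org
print(mismatched)             # ['yue', 'nan', 'lzh']
```

Keeping an explicit mapping, instead of building the host from the tag, avoids wrong guesses for the three editions whose subdomains start with `zh-`.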
The following background information explains terms used in the table.
Wikipedia in Standard Written Chinese
There are numerous varieties of spoken Chinese: Mandarin, Cantonese, Hakka, Wu etc. They vary in pronunciation (greatly), vocabulary (considerably) and grammar (somewhat), and are not mutually intelligible.
- Standard Written Chinese is a written form that mostly uses the vocabulary and grammar of spoken Mandarin. (Pronunciation is not strongly represented in Chinese characters.)
- Yet Mandarin speakers and non-Mandarin speakers alike use it as a written standard.
- Therefore it is often just called "Chinese" — for instance "Chinese Wikipedia" means the Standard Written Chinese edition of Wikipedia.
- Note that for non-Mandarin speakers, writing in this standard means using different vocabulary and grammar than they do in speech (diglossia).
There is, however, some variation in Standard Written Chinese, as explained below.
Script variation: Hant and Hans scripts
Chinese characters (also called "Han characters") can be written according to two standards, called "Hans" and "Hant".
- Hant ("Han traditional") is standard in Taiwan and Hong Kong, and was standard in Mainland China before the 1960s.
- Hans ("Han simplified") is standard in Mainland China today. Compared to Hant, it modifies certain characters to have fewer pen strokes.
- It is laborious trying to read Hans if you are only familiar with Hant, and vice versa.
- It is infeasible to write correct Hans if you are only familiar with Hant, and vice versa.
- Wikipedia editors are fairly evenly split between Hans and Hant.
- Script variation bears no relation to the differences in varieties of spoken Chinese. For instance you can write Mandarin in either Hans or Hant.
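The character-level relationship between the two scripts can be illustrated with a toy lookup table. This is a minimal Python sketch, not a usable converter: `HANT_TO_HANS` and `to_hans` are hypothetical names, and the table covers only a handful of characters taken from examples on this page. Real conversion tables are far larger and must also handle characters that do not map one-to-one.

```python
# Toy Hant -> Hans table; a few characters only, for illustration.
HANT_TO_HANS = {
    "電": "电",
    "勢": "势",
    "約": "约",
    "連": "连",
    "儂": "侬",
    "藍": "蓝",
}


def to_hans(text):
    """Replace each Hant character that has an entry; leave others unchanged."""
    return "".join(HANT_TO_HANS.get(ch, ch) for ch in text)


print(to_hans("電勢"))  # 电势
print(to_hans("電位"))  # 电位 (位 is the same in both scripts)
```

Only the written form of each character changes here; the words themselves stay the same.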
Vocabulary variation
There are vocabulary differences between Standard Written Chinese as used in Mainland China and as used in Taiwan, Hong Kong etc. These are similar to, but more extensive than, the English vocabulary differences between the USA and the UK, Canada etc. They affect terms in science, technology, engineering etc, and also, very extensively, non-Chinese names. E.g. John Lennon's surname is written as 列侬 in Mainland China, 連儂 in Hong Kong and 藍儂 in Taiwan.
LanguageConverter
Chinese Wikipedia uses MediaWiki functionality called LanguageConverter to display articles automatically in the reader's preferred variety of Standard Written Chinese. LanguageConverter handles script conversion using a fixed lookup table (like many other software libraries). However, and far more unusually, it can also handle vocabulary conversion, powered by lexical lists within Chinese Wikipedia itself, arranged by subject, e.g. Physics or Showbiz. Each item gives different versions of a term, e.g.:
{ type = 'item', original = 'Electric field', rule = 'zh-tw:電位;zh-cn:电势;zh-hk:電勢' }
...
{ type = 'item', original = 'Lennon, John', rule = 'zh-cn:约翰·列侬;zh-tw:約翰·藍儂;zh-hk:約翰·連儂' }
Some other Wikipedia editions use LanguageConverter to convert scripts; however, only Chinese Wikipedia uses it for vocabulary conversion. To the best of my knowledge, no other project attempts anything like this sort of vocabulary conversion for Chinese.
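To make the `rule` format above concrete, here is a minimal Python sketch that splits a rule string into per-variant terms and picks one for a reader. It is not how LanguageConverter itself is implemented: `parse_rule` and `pick_variant` are hypothetical helpers, and the fallback order in the last call (`zh-mo` falling back to `zh-hk`, then `zh-tw`) is an assumption made purely for illustration.

```python
def parse_rule(rule):
    """Split 'zh-tw:電位;zh-cn:电势' into {'zh-tw': '電位', 'zh-cn': '电势'}."""
    variants = {}
    for part in rule.split(";"):
        part = part.strip()
        if not part:
            continue  # tolerate a trailing ';'
        tag, _, term = part.partition(":")
        variants[tag.strip()] = term.strip()
    return variants


def pick_variant(variants, preferred, fallbacks=()):
    """Return the term for the preferred tag, else the first fallback found."""
    for tag in (preferred, *fallbacks):
        if tag in variants:
            return variants[tag]
    return None


lennon = parse_rule("zh-cn:约翰·列侬;zh-tw:約翰·藍儂;zh-hk:約翰·連儂")

print(pick_variant(lennon, "zh-cn"))  # 约翰·列侬
print(pick_variant(lennon, "zh-hk"))  # 約翰·連儂
# Assumed fallback chain, for illustration only.
print(pick_variant(lennon, "zh-mo", fallbacks=("zh-hk", "zh-tw")))  # 約翰·連儂
```

The Mainland form 列侬 cannot be derived from 連儂 or 藍儂 by swapping characters between scripts, which is why these per-term lists are needed in addition to the script lookup table.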
Wikipedia in other modern varieties of written Chinese
As stated earlier, Standard Written Chinese mostly uses the grammar and vocabulary of Mandarin, which means non-Mandarin speakers use different grammar and vocabulary in their speech than in standard writing. Alternatively, they can write down the exact words they would speak.
- This gives rise to Written Cantonese, Written Hakka, Written Wu etc.
- Each of these is very different from Standard Written Chinese (and not readily intelligible to a Mandarin speaker).
- These written varieties are mostly used in informal contexts.
- Wikipedia is highly unusual as a collection of formal writing because it has editions in six of these written varieties besides Standard Written Chinese:
- Cantonese Wikipedia
- Min Nan Wikipedia
- Wu Wikipedia
- Hakka Wikipedia
- Gan Wikipedia
- Min Dong Wikipedia
Script variation: Hant, Hans, Romanization
Aside from Chinese characters, Latin letters can also be used to transcribe varieties of Chinese. This is called romanization. Min Nan Wikipedia, Hakka Wikipedia and Min Dong Wikipedia all mainly use romanization rather than Chinese characters for article text.
Cantonese Wikipedia and Gan Wikipedia are primarily written in Hant script. Cantonese has client-side code to allow viewing or entering text in Hans script. Gan uses the same LanguageConverter functionality as Chinese Wikipedia (but only for script conversion, not vocabulary conversion).
Wu Wikipedia is primarily written in Hans script.
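The combinations of variety and script can be written as BCP 47 tags of the form language-script. The following Python sketch records the main script(s) per edition as given in the summary table earlier on this page. `MAIN_SCRIPTS`, `script_tags` and `editions_using` are hypothetical names, `Latn` is the ISO 15924 code for Latin script, and the composed tags are generic BCP 47 forms, not necessarily the variant codes that MediaWiki uses internally.

```python
# Main script(s) per edition, following the summary table on this page.
MAIN_SCRIPTS = {
    "zh": ("Hant", "Hans"),
    "yue": ("Hant",),
    "nan": ("Latn",),
    "wuu": ("Hans",),
    "hak": ("Latn",),
    "gan": ("Hant", "Hans"),
    "cdo": ("Latn",),  # Hant/Hans are also allowed there
    "lzh": ("Hant",),
}


def script_tags(lang):
    """Compose language-script tags such as 'nan-Latn' for one edition."""
    return [f"{lang}-{script}" for script in MAIN_SCRIPTS[lang]]


def editions_using(script):
    """List the language tags whose main scripts include the given script."""
    return [lang for lang, scripts in MAIN_SCRIPTS.items() if script in scripts]


print(script_tags("zh"))       # ['zh-Hant', 'zh-Hans']
print(script_tags("nan"))      # ['nan-Latn']
print(editions_using("Latn"))  # ['nan', 'hak', 'cdo']
```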
Classical Chinese
Classical Chinese was the standard written form of Chinese for many centuries until around 1920. In some ways Classical Chinese Wikipedia serves a similar role to Latin Wikipedia. By convention, Classical Chinese is written in Hant script, so there is no need for script conversion.