Talk:Interwiki sorting order
Proposal: Storing interwiki sorting at local system message
As some of you know, I am writing an alternative implementation of the interwiki bot in Java. Most interwiki bots use the pywikipediabot framework, and AWB also needs the interwiki sorting order. I don't know whether there are more interwiki frameworks, but they all need to know the sorting order to add interwikis to pages. At the moment each framework has its own config file storing the interwiki sorting order for each wiki. When a new wiki is created, all config files must be updated manually, and it gets more complicated if a wiki wants to change the order it uses. This takes some time even in the pywikipediabot framework. Sometimes it is even difficult for the human developers to identify the correct order when they do not know an alphabet or the correct transliteration.
I am talking about the interwiki sorting order in the page source code. I know that there are already some MediaWiki extensions that change the interwiki order on a rendered page, but this does not help tools working on the source code. The content of Interwiki sorting order cannot really be processed by a parser. That's why e.g. AWB uses en:Wikipedia:AutoWikiBrowser/IW. My suggestion is to store the interwiki sorting order in the MediaWiki namespace of each wiki, with the information that would otherwise be duplicated moved out to metawiki.
At the moment there are six general sorting orders:
- A By order of (latin) alphabet, based on language code
- B By order of (fy) alphabet, based on language code (with i=y)
- C By order of alphabet, based on local language
- D By order of alphabet, based on local language by first word
- E By order of latin alphabet, based on local language (by first word)
- F By order of roman alphabet, based on local language (by first word)
A and B can easily be calculated by the tools themselves and need not be stored anywhere, I think. E and F are each used only once, so this information could be stored completely on the wiki concerned. I would like to propose the following procedure:
On each wiki there is a system message MediaWiki:Interwiki config/sorting order containing the sorting order for this wiki with one interwiki code per line. On srwiki sr:MediaWiki:Interwiki config/sorting order would look like:
ace af ak als am ang ab ar an arc roa-rup frp arz …
To determine the sorting order, codes are read line by line, skipping duplicate codes (so the first position of a code is always used).
- If this system message does not exist, sorting order A is used (e.g. dewiki or most other wikis).
- If expected codes are missing, they are appended at the end of the list using sorting order A.
- This makes it possible for hewiki and huwiki to define e.g. he:MediaWiki:Interwiki config/sorting order containing only
en
- So wikis can move some languages to the top without having to take care of the rest.
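The rules above can be sketched in a few lines of Python. This is only an illustration, not part of the proposal: the function name `resolve_sorting_order` and the `known_codes` argument are made up here, and order A is assumed to be a plain sort of the language codes.

```python
def resolve_sorting_order(config_text, known_codes):
    """Return the full interwiki order for one wiki.

    config_text: content of the local system message, or None if the
                 message does not exist on that wiki.
    known_codes: every interwiki code the tool expects to handle
                 (hypothetical input, supplied by the tool).
    """
    default_order = sorted(known_codes)  # order A: sort by language code
    if config_text is None:
        return default_order

    order = []
    seen = set()
    # split() breaks on any whitespace, so "one code per line" works.
    for code in config_text.split():
        if code not in seen:  # a repeated code keeps its first position
            seen.add(code)
            order.append(code)

    # Codes the local config did not mention are appended in order A.
    order.extend(code for code in default_order if code not in seen)
    return order


known = {"de", "en", "fr", "he", "hu"}
print(resolve_sorting_order("en\n", known))  # ['en', 'de', 'fr', 'he', 'hu']
print(resolve_sorting_order(None, known))    # ['de', 'en', 'fr', 'he', 'hu']
```

With a config containing only `en`, as in the hewiki/huwiki case, English moves to the top and everything else follows in order A.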
Sorting orders C and D could be stored at metawiki, and a placeholder keyword starting with meta- is used in the local system message to reduce redundant information. E.g. if the keyword for C is meta-aphabet-local, the sorting order could be found at MediaWiki:Interwiki config/sorting order/aphabet-local (with meta-* = meta:MediaWiki:Interwiki config/sorting order/*)
ace af ak als am ang ab ar an arc roa-rup frp as ast gn av …
- Then for enwiki the following code would be added to en:MediaWiki:Interwiki config/sorting order
meta-aphabet-local
- For urwiki, which would like to have ar, fa and en on top, it would be
ar fa en meta-aphabet-local
- Because duplicate lines are skipped (see above), you can simply replace the meta- keyword with the list read from metawiki, giving ar fa en af ak als am ang ab ar an arc roa-rup frp as ast gn av …; the second ar is then ignored.
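A minimal sketch of this keyword expansion, assuming the meta lists have already been fetched into a dictionary. The function name `expand_config` and the shortened list in `meta_lists` are made up for illustration and are not the real content of the meta page.

```python
def expand_config(tokens, meta_lists):
    """Replace meta-* keywords by their lists, then drop later duplicates."""
    expanded = []
    for token in tokens:
        if token.startswith("meta-"):
            # "meta-xyz" refers to the list stored under the key "xyz".
            expanded.extend(meta_lists[token[len("meta-"):]])
        else:
            expanded.append(token)
    # dict.fromkeys keeps the first occurrence of each code, in order.
    return list(dict.fromkeys(expanded))


# Shortened, made-up excerpt standing in for the list stored at metawiki.
meta_lists = {"aphabet-local": ["af", "ar", "de", "en", "fa", "fr"]}

print(expand_config(["ar", "fa", "en", "meta-aphabet-local"], meta_lists))
# ['ar', 'fa', 'en', 'af', 'de', 'fr']
```

The codes placed in front keep their position, and their second occurrence inside the expanded list is dropped.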
For the calculated orders A and B the keyword starts with general- (e.g. general-alphabet-code for A and general-alphabet-code-iy for B) to distinguish them from lists stored at meta (meta-) and from ordinary language codes.
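The two calculated orders could look roughly like this. It is a sketch under an assumption: "i=y" is read here as "treat the letter y like i when comparing codes", with ties broken by the plain code. The exact rule a framework uses for B may differ in detail. The function names are made up; the two keywords are the ones suggested above.

```python
def order_a(codes):
    """Order A: plain alphabetical sort by language code."""
    return sorted(codes)


def order_b(codes):
    """Order B: like A, but y is compared as if it were i (assumed rule)."""
    return sorted(codes, key=lambda code: (code.replace("y", "i"), code))


# Keyword names as suggested in the proposal.
GENERAL_ORDERS = {
    "general-alphabet-code": order_a,
    "general-alphabet-code-iy": order_b,
}

codes = ["fy", "fi", "fr", "ga", "hy", "hi"]
print(GENERAL_ORDERS["general-alphabet-code"](codes))
# ['fi', 'fr', 'fy', 'ga', 'hi', 'hy']
print(GENERAL_ORDERS["general-alphabet-code-iy"](codes))
# ['fi', 'fy', 'fr', 'ga', 'hi', 'hy']
```

In order B, fy sorts next to fi instead of after fr, which is the visible difference between the two orders.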
The above description is very technical because it is important that developers can read it without being left with questions about special cases. So it sounds much more complicated than it really is. For the local community it becomes easier to define their own sorting order, and tool developers no longer all need to take care of every new wiki. In most cases only a meta admin has to add a new wiki to the system messages.
The initial work of setting up this system can be done by a global admin or somebody with the editprotected right. Because most wikis use the default ordering, this new system message only needs to be created on a few wikis.
Quick migration for all existing tools is quite easy, because they can write a script which fetches the information from the wikis and automatically creates the config file. The pywikipediabot project already uses such a script for auto-creating the namespace names in family files, and AWB simply has to copy the sorting order from meta to en:Wikipedia:AutoWikiBrowser/IW. Toolserver users could also read the configuration from the replicated db servers. Later they could modify their framework to read the information live from the wiki if they want, but I think having a computer-readable config and then using an automated config file creation script would already be an improvement for all tools. Merlissimo 17:09, 31 March 2011 (UTC)
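Such a migration script might look like the sketch below. Several things are assumptions of this example and not part of the proposal: reading the page via `index.php?action=raw`, treating HTTP 404 as "message does not exist", the `/w/index.php` path, and the layout of the generated file (`interwiki_order` is a made-up variable name). Nothing is fetched until `fetch_config` is called.

```python
import urllib.error
import urllib.parse
import urllib.request

# Page title as given in the proposal.
TITLE = "MediaWiki:Interwiki config/sorting order"


def raw_url(host, title=TITLE):
    """Build the URL for the raw wikitext of a page (assumed URL layout)."""
    query = urllib.parse.urlencode(
        {"title": title.replace(" ", "_"), "action": "raw"}
    )
    return f"https://{host}/w/index.php?{query}"


def fetch_config(host):
    """Return the page text, or None if the page does not exist (HTTP 404)."""
    try:
        with urllib.request.urlopen(raw_url(host)) as response:
            return response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise


def render_config_file(configs):
    """Render {wiki: page text or None} as Python source for a config file."""
    lines = ["# Generated file, do not edit by hand.", "interwiki_order = {"]
    for wiki in sorted(configs):
        text = configs[wiki]
        if text is not None:  # wikis without the message use the default
            lines.append(f"    {wiki!r}: {text.split()!r},")
    lines.append("}")
    return "\n".join(lines) + "\n"


# Example with hand-written input instead of live requests:
print(render_config_file({"he": "en\n", "de": None}))
```

A tool would call `fetch_config` once per wiki, pass the collected results to `render_config_file`, and ship the generated file with its next release.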
Discussion
- Support --Akkakk 17:41, 31 March 2011 (UTC)
- Support excellent idea. Seb az86556 17:59, 31 March 2011 (UTC)
- Support GameOn 05:55, 13 June 2011 (UTC)
- Support How soon can this be implemented? -- Lavallen 06:01, 13 June 2011 (UTC)
- I don't understand all the technical stuff here, but I would love to see something that
- has a single source - Interwiki sorting order and Wikipedia:AutoWikiBrowser/IW often define different sort orders, or include different language codes
- is simple to change when needed - Interwiki sorting order is a pain to edit (do the lines have to contain the same number of items?)
- Where is the definition of the displayed name of the language stored? At some stage the displayed name of language kbd seems to have been changed from Къэбэрдеибзэ (see Interwiki sorting order/table) to Адыгэбзэ (see top entry in the languages side bar of en:Abaza language, which links to the kbd wikipedia); but the sort orders had not been updated.
- It would be good if changes to the displayed name of the language also resulted in appropriate changes to the sort orders.
- —Coroboy (talk) 05:47, 15 August 2011 (UTC)
So this has now been announced for half a year without any opposition (and I informed all bot framework developers). I will request the global editinterface right and create the system messages as suggested above. I think most programmers will need some initial time for implementing this, so we'll keep this Python list for some time (2-3 months?). Merlissimo 11:25, 4 November 2011 (UTC)
- I have implemented the config, as provided in the old version, on all wikis and added lez and shi. I'll add some examples during the next days. I hope the description is complete. But some details, which I already described in the proposal above, can be pointed out by adding examples.
- But first I will implement this in my own bot. Then I can use the script as a validator to show that no language code is missing on any wiki. Merlissimo 23:42, 7 February 2012 (UTC)
- While implementing I realized that it is not possible to read system messages containing a "/" using the message module. So it is only possible to read these as normal page content. Should we move the config pages to "-" (local config would be on MediaWiki:Interwiki config-sorting order and meta config e.g. on MediaWiki:Interwiki config-sorting order-native-languagename), or should I change the description to say that the last revision is read instead? I would prefer the first option, so developers still have both options. I announced the allmessages method because I thought it would be easier, since you don't have to care about existing last revisions. Merlissimo 15:41, 8 February 2012 (UTC)
- I have moved all config pages from slash to hyphen. Merlissimo 17:18, 8 February 2012 (UTC)
- I implemented the new config in my bot and could verify that all local configs are ok. Later I'll write a script that periodically checks whether all local configs are ok.
- I would also suggest adding a rule that a local config can simply be deleted by stewards if it has not been maintained by local admins for some months. This won't be a problem soon, but perhaps in a few years, if a local admin who added the config has become inactive. Then this could be a problem for very small wikis that use their own full sorting order (wikis that only add some top interwikis and use an auto- or meta- config for the rest aren't a problem).
- Currently only srwiki and svwiktionary use their own full interwiki sorting rule. I'll write a script that notifies these local communities about missing codes after e.g. a langcom approval. Merlissimo 01:02, 9 February 2012 (UTC)
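The check for missing codes could be as small as the sketch below. The function names and the `expected_codes` input are made up for illustration. A config is treated as a "full" list here when it contains no auto- or meta- keyword, since such a keyword already covers the remaining codes.

```python
def uses_full_list(config_text):
    """True if the config names every code itself (no auto-/meta- keyword)."""
    return not any(
        token.startswith(("auto-", "meta-")) for token in config_text.split()
    )


def missing_codes(config_text, expected_codes):
    """Return the expected codes that a full local config does not list."""
    listed = set(config_text.split())
    return sorted(set(expected_codes) - listed)


expected = ["ace", "af", "ak", "lez", "shi"]

print(uses_full_list("ar fa en meta-native-languagename"))  # False
print(uses_full_list("ace af ak"))                          # True
print(missing_codes("ace af ak", expected))                 # ['lez', 'shi']
```

Only wikis for which `uses_full_list` returns True would need a notification when a new language code is approved.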
2015 usage
Current usage, generated by fetching all MediaWiki:Interwiki_config-sorting_order
and then using
for f in $(find interwikiconfig/ -type f -size +1c| sort); do echo $f; echo $(cat $f); echo; done | sed -e 's/interwikiconfig\///;s/^\([^w]\)/ \1/;s/^\(w[^/]*\/[^/]*\)\/.*/;\1:/;'
- test/test
auto-languagecode
- wikibooks/th
meta-native-languagename
- wikibooks/vi
meta-native-languagename-firstword
- wikipedia/be-x-old
meta-native-languagename
- wikipedia/en
meta-native-languagename
- wikipedia/et
meta-native-languagename-firstword
- wikipedia/fi
meta-native-languagename-firstword
- wikipedia/fiu-vro
meta-native-languagename-firstword
- wikipedia/fy
auto-languagecode-fy
- wikipedia/he
en auto-languagecode
- wikipedia/hu
en auto-languagecode
- wikipedia/ilo
meta-native-languagename
- wikipedia/mk
meta-native-languagename
- wikipedia/ms
meta-native-languagename-firstword
- wikipedia/nds
nds-nl auto-languagecode
- wikipedia/nds-nl
nds auto-languagecode
- wikipedia/nn
nb no sv da meta-native-languagename
- wikipedia/no
meta-native-languagename
- wikipedia/nv
en es meta-native-languagename
- wikipedia/pdc
de en auto-languagecode
- wikipedia/pl
meta-native-languagename
- wikipedia/simple
meta-native-languagename
- wikipedia/sr
ace kbd af ak als am ang ab ar an arc roa-rup frp arz as ast gn av ay az bjn id ms bg bm zh-min-nan nan map-bms jv su ba be be-x-old bh bcl bi bn bo bar bs bpy br bug bxr ca ceb ch cbk-zam sn tum ny cho chr co cy cv cs da dk pdc de nv dsb na dv dz mh et el eml en myv es eo ext eu ee fa hif fo fr fy ff fur ga gv sm gag gd gl gan ki glk got gu ha hak xal haw he hi ho hsb hr hy io ig ii ilo ia ie iu ik os xh zu is it ja ka kl kr pam krc csb kk kw rw ky mrj rn sw km kn ko kv kg ht ks ku kj lad lbe la ltg lv to lb lez lt lij li ln lo jbo lg lmo hu mk mg mt mi min cdo mwl ml mdf mo mn mr mus my mzn nah fj ne nl nds-nl cr new nap ce frr pih no nb nn nrm nov oc mhr or om ng hz uz pa pfl pag pap koi pi pcd pms nds pnb pl pt pnt ps aa kaa crh ty ksh ro rmy rm qu ru rue sa sah se sg sc sco sd stq st nso tn sq si scn simple ss sk sl cu szl so ckb srn sr sh fi sv ta shi tl kab roa-tara tt te tet th ti vi tg tokipona tp tpi chy ve tr tk tw tyv udm uk ur ug za vec vep vo fiu-vro wa vls war wo wuu ts xmf yi yo diq zea zh zh-tw zh-cn zh-classical zh-yue bat-smg
- wikipedia/sv
meta-native-languagename
- wikipedia/test
auto-languagecode
- wikipedia/te
en hi kn ta ml auto-languagecode
- wikipedia/th
meta-native-languagename
- wikipedia/ur
ar fa en meta-native-languagename
- wikipedia/vi
meta-native-languagename-firstword
- wikipedia/yi
en he de auto-languagecode
- wikiquote/th
meta-native-languagename
- wikisource/sv
meta-native-languagename
- wikisource/th
meta-native-languagename
- wikisource/vi
meta-native-languagename-firstword
- wikivoyage/he
en auto-languagecode
- wiktionary/en
meta-native-languagename
- wiktionary/pl
meta-native-languagename-firstword
- wiktionary/sv
aa af ak als an roa-rup ast gn ay az id ms bm zh-min-nan jv su mt bi bo bs br ca cs ch sn co za cy da de na mh et ang en es eo eu to fr fy fo ga gv sm gd gl hr io ia ie ik xh is zu it kl csb kw rw rn sw ky ku la lv lb lt li ln jbo hu mg mi mo my fj nah nl cr no nn hsb oc om ug uz nds pl pt ro rm qu sg sc st tn sq scn simple ss sk sl so sh fi sv tl tt vi tpi tr tw vo wa wo ts yo el av ab ba be bg mk mn ru sr tg uk kk hy yi he ur ar tk sd fa ha ps dv ks ne pi bh mr sa hi as bn pa pnb gu or ta te kn ml si th lo dz ka ti am chr iu km zh ja ko
- wiktionary/th
meta-native-languagename
- wiktionary/vi
meta-native-languagename-firstword
I'll try to make this prettier, as a table, and then post it on the page. John Vandenberg (talk) 02:06, 21 June 2015 (UTC)
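One possible way to turn the listing into a table is sketched below. The function name `to_wikitable` and the column headings are made up for this example. The two rows are copied from the listing above.

```python
def to_wikitable(usage):
    """Render (wiki, config) pairs as wikitable markup."""
    rows = ['{| class="wikitable"', "! Wiki !! Sorting order config"]
    for wiki, config in usage:
        rows.append("|-")                      # start a new table row
        rows.append(f"| {wiki} || {config}")   # two cells on one line
    rows.append("|}")
    return "\n".join(rows)


usage = [
    ("wikipedia/fy", "auto-languagecode-fy"),
    ("wikipedia/he", "en auto-languagecode"),
]
print(to_wikitable(usage))
```

The long full lists of srwiki and svwiktionary would make very wide cells, so they might be better linked than inlined.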