Community Wishlist Survey 2019/Wiktionary/Multiple collations per site

Multiple collations per site

  • Problem: It is extremely common, on Wiktionary projects, to display entries of multiple languages on the same page. But, only one collation can be used on a particular Wikimedia project. That means: if a website uses a language-compliant collation, e.g. uca-default which is a English- and Portuguese-friendly collation, all categories concerning e.g. Swedish words, will sort words starting with Å under A, because Å is considered in English to be the same letter than A with a diacritic, while it is a whole new letter in Swedish (where it is sorted at the near end of the alphabet). Categories' headers are therefore incorrect for many languages with the current solution used on Wiktionary projects.
    Currently a way to circumvent the problem is to use the default Mediawiki collation (namely uppercase), but this implies that sort keys are added in all English/French/etc. entries with a diacritic in the title, as Å, É, etc., as all diacritic letters are considered as first-entry headers in categories, and this implies a huge amount of sort keys in pages to bypass this behavior (and thus sort Å under A for e.g. English), and makes Wiktionary projects less readable and editable for newcomers.
  • Who would benefit: users of Wiktionary categories, and new editors to all Wiktionary projects
  • Proposed solution: allow multiple collations per site, and therefore collation to be specified per category: uca-sv should be used for Swedish-related categories, uca-es for Spanish cats, uca-default for English (and similar), etc.
  • More comments: Liangent and Bawolff have been working on this in the past, but feasability seems also to depend on sysadmins (for increased system load).
  • Phabricator tickets: phab:T30397
  • Proposer: Automatik (talk) 12:18, 11 November 2018 (UTC)Reply[reply]