Community Wishlist Survey 2020/Wiktionary

Wiktionary
20 proposals, 143 contributors, 458 support votes
The survey has closed. Thanks for your participation :)



Multiple collations per site

  • Problem: On Wiktionary projects it is extremely common to display entries for multiple languages on the same page, yet only one collation can be used per Wikimedia project. If a site uses a language-specific collation, e.g. uca-default, which is friendly to English and Portuguese, then all categories for, say, Swedish words will sort words starting with Å under A. English treats Å as the letter A with a diacritic, whereas in Swedish it is a separate letter sorted near the end of the alphabet. Category headers are therefore incorrect for many languages under the solution currently used on Wiktionary projects.
    One current workaround is to use the default MediaWiki collation (namely uppercase). With that collation, however, every letter with a diacritic (Å, É, etc.) gets its own header in categories, so sort keys have to be added to all English, French, etc. entries with a diacritic in the title in order to sort Å under A for, e.g., English. This means a huge number of sort keys in pages, which makes Wiktionary projects less readable and editable for newcomers.
  • Who would benefit: users of Wiktionary categories, and new editors to all Wiktionary projects
  • Proposed solution: allow multiple collations per site, so that the collation can be specified per category: uca-sv should be used for Swedish-related categories, uca-es for Spanish categories, uca-default for English (and similar), etc.
  • More comments: Liangent and Bawolff have worked on this in the past, but feasibility also seems to depend on sysadmins (because of increased system load).
  • Phabricator tickets: phab:T30397
  • Proposer: Automatik (talk) 21:58, 23 October 2019 (UTC)[reply]
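
The Å example in this proposal can be sketched in a few lines of Python. This is only an illustration: the two key functions (hypothetical names `english_like_key` and `swedish_like_key`) are simplified, hand-written stand-ins, not the actual uca-default or uca-sv collations, and the word list is made up.

```python
import unicodedata

def english_like_key(word):
    """Stand-in for an English-friendly collation: drop diacritics, so Å sorts as A."""
    decomposed = unicodedata.normalize("NFD", word.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# In Swedish, å, ä and ö are separate letters that come after z.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"
SWEDISH_RANK = {letter: rank for rank, letter in enumerate(SWEDISH_ALPHABET)}

def swedish_like_key(word):
    """Stand-in for a Swedish collation: rank each letter by its alphabet position."""
    normalized = unicodedata.normalize("NFC", word.casefold())
    # Characters outside the alphabet sort after all known letters.
    return tuple(SWEDISH_RANK.get(ch, len(SWEDISH_ALPHABET)) for ch in normalized)

words = ["ås", "apa", "zebra", "öl", "bil"]
by_english = sorted(words, key=english_like_key)   # "ås" lands among the A words
by_swedish = sorted(words, key=swedish_like_key)   # "ås" and "öl" land after "zebra"
```

The same list of page titles ends up in a different order, and under different headers, depending on which key is used, which is why a single site-wide collation cannot serve every language's categories.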

Discussion

  • This proposal is a rerun of the 2019 proposal, which is still relevant. Automatik (talk) 21:58, 23 October 2019 (UTC)[reply]
  • It's not up to me to decide (so this is not official in any way, shape or form), but in my opinion there are no scalability concerns with allowing collations to be set on a per-category basis, provided each category has only one collation. For example, a magic word could say that this category is French or German or whatever; you can specify only one, and you don't get, say, a drop-down for viewing a category under different collations on the fly (as is wanted on zh). Bawolff (talk) 04:24, 24 October 2019 (UTC)[reply]
  • This should be merged with Community Wishlist Survey 2020/Wiktionary/Context-dependent sort key. Urhixidur (talk) 14:13, 25 October 2019 (UTC)[reply]
  • This feature is sorely needed. Currently in the English Wiktionary we use a sort_key value for each language in our language data modules that describes how to generate sortkeys from page titles, and is used by the makeSortKey method of our Language objects. The sortkeys are generated inside many different templates, and are used in category links, and to sort lists of links to entries (for instance in Template:col3). The generated sortkeys are not always able to make categories sort correctly, as described in the proposal.

    An extension of this proposal would be allowing definition of custom collations. Some languages probably do not have a collation system (not sure of the correct terminology) available, such as Egyptian (which in Wiktionary mostly uses a transliteration system rather than hieroglyphs). The desired sort order for the Egyptian transliterations (ꜣ j y ꜥ w b p f m n r h ḥ ḫ ẖ z s š q k g t ṯ d ḏ) is so different from the order of code point values (b d f g h j k m n p q r s t w y z š ḏ ḥ ḫ ṯ ẖ ꜣ ꜥ), which is presumably used in Category:Egyptian lemmas, that a custom sortkey cannot work. We can sort lists of links by generating a sortkey for each link with a module (Module:egy-utilities), but the Egyptian module cannot be used in categories because the sortkeys would put nonsensical code points in the category headers. (The sortkey-generating function works by replacing the characters in the transliteration with arbitrary code points that have the correct sort order.) So getting Egyptian categories to sort correctly requires a custom collation system.

    Another idea would be to make collation available in a Lua library for Scribunto. At minimum what would be required is a function to compare two strings using a collation and yield values indicating "greater than", "less than", or "equal" (like strcmp in C), which could be adapted for use by table.sort (which requires a function returning a truthy value that indicates whether argument 1 is less than argument 2). Then we could sort lists of links using the same collations used in categories whenever possible, rather than using module-generated sortkeys. This might not depend on the implementation of the "multiple collations" proposal, but if custom collations were implemented, ideally they would be available in the Lua library.

    I haven't submitted the first idea as a separate proposal because it depends on the "multiple collations" proposal, but perhaps I should submit the second one. — Erutuon (talk) 20:53, 7 November 2019 (UTC)[reply]
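
The two ideas in this comment (a sortkey built by substituting placeholder code points, and a strcmp-style comparison adapted for a sort routine) can be sketched as follows. The sketch is in Python rather than Lua, and it is not the code of Module:egy-utilities; the function names and the sample transliterations are assumptions made for illustration.

```python
import unicodedata
from functools import cmp_to_key

# Desired order of the Egyptian transliteration letters, as listed in the comment.
EGYPTIAN_ORDER = unicodedata.normalize(
    "NFC", "ꜣ j y ꜥ w b p f m n r h ḥ ḫ ẖ z s š q k g t ṯ d ḏ"
).split()

# Replace each letter with an arbitrary placeholder; only the relative order matters.
PLACEHOLDER = {letter: chr(ord("A") + i) for i, letter in enumerate(EGYPTIAN_ORDER)}

def egyptian_sortkey(translit):
    """Build a key whose code point order matches the desired letter order."""
    text = unicodedata.normalize("NFC", translit)
    return "".join(PLACEHOLDER.get(ch, ch) for ch in text)

def compare_egyptian(a, b):
    """strcmp-like comparison: negative, zero or positive."""
    key_a, key_b = egyptian_sortkey(a), egyptian_sortkey(b)
    return (key_a > key_b) - (key_a < key_b)

words = ["nfr", "ḥr", "jb", "ꜣḫt", "sḏm", "hrw"]
sorted_by_key = sorted(words, key=egyptian_sortkey)
sorted_by_cmp = sorted(words, key=cmp_to_key(compare_egyptian))
sorted_by_codepoint = sorted(words)  # what plain code point order gives instead
```

Both sorts put ꜣḫt first and sḏm last, whereas plain code point order puts ꜣḫt last. The keys themselves are strings of unrelated capital letters, which shows why they are unusable as category sortkeys: the category headers would display the placeholders.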

Voting

What's in the newspaper today?

  • Problem: Wiktionarians cannot detect every newly used word in real time and add it as soon as it appears, even though examples of its use are accessible online.
  • Who would benefit: Contributors and readers
  • Proposed solution: Develop a tool that harvests online newspapers and records words that are missing from the Wiktionaries' databases.
  • More comments: This tool has to be adapted for each language and/or resource. Darkdadaah created a similar tool and ran it for French from 2010 to 2013.
  • Phabricator tickets:
  • Proposer: DaraDaraDara (talk) 14:54, 8 November 2019 (UTC)[reply]

Discussion

  • Harvesting newspapers is a great way to detect new words, and it helps to have selected sentences to add as examples. Some manual selection is needed, as some sentences are correct but too long or too dependent on context. Also, thematic labelling may help Wikinewsies and Wikipedians find more sources. Noé (talk) 11:27, 15 November 2019 (UTC)[reply]
What about licenses of those newspapers? -Theklan (talk) 10:30, 22 November 2019 (UTC)[reply]
@Theklan: it does not matter. The same rule applies as for books. The idea here is just to crawl the newspapers every day and extract only the sentence containing the new word. That is a short quotation, which is allowed. See this page with French words for example. Pamputt (talk) 13:40, 23 November 2019 (UTC)[reply]
  • We already have "Wiktionary:Frequency lists", which is based on TV subtitles and has thousands of missing words in all languages, including English. Newspapers will give you a lot of typos and game plays. This is not needed as long as we have big lists of missing words. We can also use aspell lists to locate more missing words. On Wikipedia there is a project named moss (under the typo team) that offers thousands of missing words that are used in Wikipedia. Uziel302 (talk) 21:18, 25 November 2019 (UTC)[reply]
    @Uziel302: indeed we will get typos, but they should be limited because here we parse newspapers (not blogs and forums). And yes, there are already a lot of missing words, but parsing newspapers will help identify neologisms and then create the missing entries. This is worthwhile because people are more likely to look up neologisms than rare words. Pamputt (talk) 06:45, 26 November 2019 (UTC)[reply]
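
The harvesting idea discussed here (find words that have no entry yet and keep the sentence in which they occur) could look roughly like the Python sketch below. It is not Darkdadaah's tool; the function name, the naive sentence splitter and the tiny word list are assumptions made for illustration, and a real tool would need per-language tokenisation and typo filtering.

```python
import re

def find_candidate_words(article_text, known_words):
    """Return {word: first sentence containing it} for words absent from known_words."""
    candidates = {}
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", article_text.strip()):
        # Letter sequences, optionally joined by hyphens or apostrophes.
        for token in re.findall(r"[^\W\d_]+(?:[-'][^\W\d_]+)*", sentence):
            word = token.lower()
            if word not in known_words and word not in candidates:
                candidates[word] = sentence
    return candidates

known = {"the", "minister", "praised", "new", "menu", "critics", "called", "it"}
article = "The minister praised the new flexitarian menu. Critics called it greenwashing!"
missing = find_candidate_words(article, known)
```

Each candidate comes with exactly one short sentence, which matches the point made above that only the sentence containing the new word is extracted as a quotation.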

Voting

Wikicode variables usable in and between templates

  • Problem: Wiktionaries organize their pages with one section per language. However, in every template within a language section we must repeat the language code, which leads to errors and a lot of repetition.
  • Who would benefit: All projects that need to define a value and use it several times. It would also make contributing much easier for beginners.
  • Proposed solution: Be able to define variables, to assign and reassign values and to use them in templates.
  • More comments:
    • As an example, here's what I have in mind for the French Wiktionary:
      1. Define and assign the language code to the variable lang at the beginning of the section.
      2. For each template which needs the language code, use the variable lang in the template's source code.
    • The templates of the French Wiktionary which would benefit from this proposal are:
      • the pronunciation templates pron, phon and phono
      • the stub templates such as ébauche, ébauche-étym, ébauche-exe, etc.
      • the domain and lexicon templates such as poissons, apiculture, mathématiques, etc.
  • Phabricator tickets:
  • Proposer: Lepticed7 (talk) 15:15, 10 November 2019 (UTC)[reply]

Discussion

  • Having this kind of variable, or being able to mark a section of a page as describing a specific language, could help a lot in dealing with multilingualism. The challenge here is not to wrap the whole section in a template, as that would make it more difficult to edit, but to tag a segment in which a language code is valid. It could clarify the wikicode a lot. Noé (talk) 15:39, 10 November 2019 (UTC)[reply]

Voting

Change the color of the interwiki link according to the content of the sections in the page

  • Problem: Cognate is now deployed on all Wiktionaries to manage interwiki links. Currently, all links are blue. The suggestion is to color an interwiki link differently depending on whether the linked page has a section for the current wiki's language.
Example:
I'm on the French Wiktionary, on the page "pain". There is an interwiki link to the English Wiktionary because a page "pain" also exists there.
That page, en:pain, has several sections with definitions in several languages, including a section about the French word "pain".
So, on the French page, the link will be blue.
Otherwise, if no French section existed on the English page, the link to the English page on the French page would be (for example) green.
  • Who would benefit: Users, who would be informed that although a page exists on another wiki, it has no section about their own language, so they might go there and add it. This would encourage contribution, including cross-wiki contribution.
  • Proposed solution: improve the Cognate extension or develop a new gadget.
  • More comments: a list of the formatting used by all the Wiktionaries has already been built and is available here.
  • Phabricator tickets: T150841
  • Proposer: Pamputt (talk) 18:07, 7 November 2019 (UTC)[reply]
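
The decision rule in the "pain" example can be sketched in Python. This is not how Cognate works; the function names are hypothetical, and the heading pattern assumes the English Wiktionary convention of level-2 headings such as `==French==`, whereas the proposal notes that section formatting differs between Wiktionaries.

```python
import re

def language_sections(wikitext):
    """Collect level-2 heading titles, assumed here to be language names."""
    return {
        title.strip()
        for title in re.findall(r"^==([^=].*?)==\s*$", wikitext, flags=re.MULTILINE)
    }

def interwiki_link_color(home_language, target_wikitext):
    """Blue if the linked page describes the home wiki's language, green otherwise."""
    return "blue" if home_language in language_sections(target_wikitext) else "green"

# Made-up, heavily shortened stand-in for the wikitext of en:pain.
en_pain = "==English==\n===Noun===\npain\n\n==French==\n===Noun===\npain\n"
color_from_french_wiktionary = interwiki_link_color("French", en_pain)
color_from_german_wiktionary = interwiki_link_color("German", en_pain)
```

Seen from the French Wiktionary the link would be blue, and seen from a wiki whose language has no section on the page it would be green (or red, as suggested in the discussion).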

Discussion

  • Hi! I don't understand when the link is supposed to be blue. Is it only when there are, on both sides, the same language sections? Lepticed7 (talk) 09:01, 11 November 2019 (UTC)[reply]
    Well, when there is at least a section for the language of the source wiki; i.e. on the French Wiktionary, the link should be blue when French is among the languages described on the linked page. If it isn't, red links may be used instead of a new color. Noé (talk) 11:54, 15 November 2019 (UTC)[reply]
This is an interesting feature. If possible, users could also set their preferred language, e.g. German, so that a user editing the English or French Wiktionary would know whether other Wiktionaries, e.g. Italian, have the German word or not. KevinUp (talk) 11:54, 19 November 2019 (UTC)[reply]

Voting

Find more active users

  • Problem: Small user communities (on Wiktionary as well as elsewhere) have trouble finding and welcoming new users.
  • Who would benefit:
  • Proposed solution: Solve bug T234798: extend MediaWiki's existing mechanism for finding "active" users (those with 1 contribution in the last 30 days) with a threshold of at least some number N of contributions.
  • More comments: When created in early October 2019, T234798 was categorized as a low priority. Hopefully, this wishlist survey could change that.
  • Phabricator tickets: phab:T234798
  • Proposer: LA2 (talk) 20:57, 23 October 2019 (UTC)[reply]

Discussion

  • This is definitely a bug (and has been logged as such), but I'm curious as to how we could use the function for the purposes of the project. Could it be used to encourage further editing, for instance? Yannis | 13:09, 27 October 2019 (UTC)[reply]
It's not a bug, but a feature request. Today, when I look at the list of "active" users on Swedish Wiktionary, I get a list of 79 people who made at least one edit in the last 30 days. But many of these have done only one or two edits as part of vandalism. I want to find the people who made at least 10 edits to find newcomers that should be welcomed. It is possible to scroll through the 79 names and pick those with at least 10 edits. But the software could provide that filtering. --LA2 (talk) 20:18, 27 October 2019 (UTC)[reply]
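
The filtering described in this comment (keep only users with at least N edits in the last 30 days) is sketched below in Python. It is not MediaWiki code: the function name, the in-memory list of (user, timestamp) pairs and the sample data are all assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta

def users_with_min_edits(edits, now, min_edits=10, window_days=30):
    """Return users with at least min_edits edits in the last window_days days."""
    cutoff = now - timedelta(days=window_days)
    counts = Counter(user for user, timestamp in edits if timestamp >= cutoff)
    return sorted(user for user, n in counts.items() if n >= min_edits)

now = datetime(2019, 10, 27)
edits = (
    [("Newcomer", now - timedelta(days=d)) for d in range(12)]  # 12 recent edits
    + [("Vandal", now - timedelta(days=1))] * 2                 # 2 recent edits
    + [("Veteran", now - timedelta(days=45))] * 20              # 20 edits, too old
)
welcome_candidates = users_with_min_edits(edits, now)
```

With `min_edits=1` the function reproduces the current behaviour, where a user with one or two vandalism edits is still listed as "active".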

Voting

Insert attestation using Wikisource as a corpus

  • Problem: Wiktionary definitions rely on attestations: sentences from corpora that illustrate the usages and meanings of words. Wikisource is an excellent corpus for Wiktionaries, especially for classic uses, but it is not easy to search its texts for a specific word. At present, the reference for a sentence has to be copied and pasted by hand, which is a long and tedious way to contribute; as a result, few quotations come from Wikisource (less than 3% on the French Wiktionary).
  • Who would benefit: Readers of Wiktionaries would find more usage examples and a way to access the whole source directly on Wikisource. Contributors to Wiktionaries would have a pleasant way to add attestations, similar to the Insert media tool that digs into Wikimedia Commons, and the community may grow with new people who like to add sentences from their reading. Editors of Wikisource would have a new way to shed light on their Sisyphean work. Both projects' visibility in search engines would increase with more links between them, and their overall audience may grow with more connectivity. Other projects may also benefit from this feature, such as Wikipedia, to add quotations to authors' pages.
  • Proposed solution: This feature is inspired by Insert media but targets Wikisource instead of Wikimedia Commons. So, instead of a snippet search offering pictures, Insert attestation would display a list of sentences from a chosen Wikisource (in the same language as the source project or another) that include the target sequence of characters. There is no matching on meaning or proximity; results are exact matches only, to keep it simple. From the displayed snippets, an editor would grab a sentence with a single click, and it would be added with the appropriate source information taken from the Wikidata item associated with the Wikisource page. The feature would copy the sentence (no transclusion) and its source (ideally including the page number in the original manuscript, e.g. "page 35."). This feature may need a specific parser to identify sentence boundaries and to bold the target sequence of characters.
  • More comments: This feature should be accessible through both the WikiText editor and VisualEditor. It may be interesting to keep track of reuses of Wikisource content in other projects with a specific What links here from Wiktionary to Wikisource, similar to the way Wikimedia Commons indicates reuses in other projects, but this could be part of another development. This idea was suggested in 2018 with 36 supports and in 2017 with 32 supports; a draft was suggested in 2016 with 19 supports, and the idea was first floated in a MediaWiki discussion.
  • Phabricator tickets: T139152, T157802
  • Proposer: Noé (talk) 07:33, 22 October 2019 (UTC)[reply]

Discussion

  • My idea is: on page Foo (1), clicking on something runs a search on Wikisource for sentences containing the word foo (2). Then the editor must check whether the word is used in the correct context/sense and select the part to copy into an input field (3). Sometimes corrections are needed: (…), shortening a long sentence, adding a missing subject from the previous sentence... Then click OK, and there is the example (4) with a reference.
    1. I have the word cs:wikt:pitel,
    2. A search on Wikisource gives me some examples
    3. I select one of them: the sentence Od dávna trvající věrný pitel vína dobrého. from Paměti
    4. I got #* {{Příklad|cs|Od dávna trvající věrný pitel vína dobrého.}}<ref>Mikuláš Dačický z Heslova: [[s:Paměti/1601–1605|Paměti]]</ref> for copying to Wiktionary.

JAn Dudík (talk) 20:52, 7 November 2019 (UTC)[reply]
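
Step 4 of this workflow, turning a selected sentence and its source into wikitext, can be sketched in Python. The function name and its parameters are hypothetical; the output format is copied from the Czech example above, and other Wiktionaries would need their own templates.

```python
def format_attestation(sentence, author, source_page, source_title,
                       lang="cs", template="Příklad"):
    """Build a quotation line plus a <ref> pointing at the Wikisource page."""
    quote = "#* {{" + template + "|" + lang + "|" + sentence + "}}"
    reference = "<ref>" + author + ": [[s:" + source_page + "|" + source_title + "]]</ref>"
    return quote + reference

wikitext = format_attestation(
    sentence="Od dávna trvající věrný pitel vína dobrého.",
    author="Mikuláš Dačický z Heslova",
    source_page="Paměti/1601–1605",
    source_title="Paměti",
)
```

In the proposed tool the author and title would come from the Wikidata item linked to the Wikisource page rather than being typed by hand.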

  • I agree with your description, but I also think it could be done in VisualEditor without even seeing any wikicode. It could be user-friendly and easily accessible for new users, like "Add translation" on some Wiktionaries, or like "Insert Media", which is very easy to use. - Noé (talk) 16:47, 8 November 2019 (UTC)[reply]
    But because Wiktionary pages are mostly built from various templates, VE is hardly usable on Wiktionary [2]. JAn Dudík (talk) 10:06, 10 November 2019 (UTC)[reply]
    The French Wiktionary uses it. It requires documenting every template with TemplateData, and it still adds several unnecessary line breaks, but it is possible to use it. Noé (talk) 11:00, 11 November 2019 (UTC)[reply]

Voting

Create memory games for words in watchlist

  • Problem: The watchlist has a fairly limited function.
  • Who would benefit: Those who seek to improve word recollection of "favorited" entries.
  • Proposed solution: Redirect the watchlist to another site that draws from the list and randomly selects words for memory games.
  • More comments:
  • Phabricator tickets:
  • Proposer: Clicero (talk) 21:28, 24 October 2019 (UTC)[reply]

Discussion

@Clicero: this looks like a 2019 proposal. Would that proposal fit your needs (IMHO, the 2019 proposal is broader)? If so, I will update your proposal to take into account the information given last year. Pamputt (talk) 17:11, 1 November 2019 (UTC)[reply]
    • Yes, but as you said, that proposal seems a bit broader, so I'd like to add that it would be nice if memory games that randomize words from the watchlist were included. Clicero (talk) 18:34, 1 November 2019 (UTC)[reply]

@Clicero: the English name for this is flashcards, perhaps people will understand you better if you use it. MaxSem (WMF) (talk) 19:25, 8 November 2019 (UTC)[reply]

Voting

Display definitions from Wikisource dictionaries

  • Problem: Wiktionaries offer some definitions, but there are many ways to describe a meaning, and the current Wiktionary interface doesn't make it easy to display definitions from other dictionaries. Some are mentioned as references but they are not accessible in Wiktionary.
  • Who would benefit: Wiktionary readers
  • Proposed solution: Many dictionaries are already on Wikisource and we can use them to offer more definitions. A dedicated transclusion may help to harvest entries automatically, using specific tagging in the dictionaries hosted on Wikisource. They could come from several Wikisources and be displayed in several Wiktionaries. It could be a new tab next to "Article" and "Talk", named "Dictionaries".
  • More comments: Some dictionaries are already properly tagged; for the others, it could be a good opportunity to do it, so that they can more easily be reused in open source projects.
  • Phabricator tickets: T240191
  • Proposer: DaraDaraDara (talk) 14:32, 8 November 2019 (UTC)[reply]

Discussion

Voting

More Lua memory for Wiktionary

  • Problem: Lack of Lua memory for basic words. See wikt:CAT:E for currently affected words.
  • Who would benefit: All users and readers of Wiktionary.
  • Proposed solution: More Lua memory for Wiktionary.
  • More comments:
    • Pages that lack memory are not being properly categorized and the sortkey is not working properly.
    • Standard information such as citations and semantically related terms is being removed as a temporary solution, and this is a loss of information for our readers.
  • Phabricator tickets: phab:T188492
  • Proposer: KevinUp (talk) 16:20, 11 November 2019 (UTC)[reply]

Discussion

Alternatively, consider implementing a tool so that the source of each language section can have its own separate page, like how each proposal has its own individual page. KevinUp (talk) 16:20, 11 November 2019 (UTC)[reply]

The most parsimonious solution is to raise the cap on Lua memory. This cap seems arbitrarily placed, and it is crippling for long pages. Metaknowledge (talk) 18:04, 11 November 2019 (UTC)[reply]
Indeed. If all were like in a company we would already have more memory, because the time spent to circumvent the cap is dearer than the trifling amount of more memory that in total would be used (since it concerns but some dozens of pages, for which much dust has been raised). Fay Freak (talk) 21:52, 15 November 2019 (UTC)[reply]
@Noé: Hi. Does the French Wiktionary have the same issue (lack of Lua memory) for entries with short words? I think if we were to migrate information from English Wiktionary to French Wiktionary, the same situation would occur. KevinUp (talk) 23:32, 16 November 2019 (UTC)[reply]
Local community discussion regarding lack of Lua memory can be found here, here and here. If only the Community Tech team would be gracious enough to tell us the actual memory needed by wikt:do, wikt:一, wikt:人, wikt:水, wikt:月, wikt:生, wikt:我, which are basic words. KevinUp (talk) 20:15, 18 November 2019 (UTC)[reply]
@Pamputt: Hi. Does the French Wiktionary currently have issues with lack of Lua memory? Do you think the same issue would occur if information from English Wiktionary for entries such as wikt:do, wikt:一, wikt:人, wikt:水 were copied to the French Wiktionary? KevinUp (talk) 11:56, 19 November 2019 (UTC)[reply]
@KevinUp: actually we do not use Lua modules that much on the French Wiktionary, so I am not aware of such limitations. Still, maybe JackPotte or Darkdadaah can say more. Pamputt (talk) 18:50, 19 November 2019 (UTC)[reply]
We do use Lua modules quite a bit, although not as much as the English Wiktionary. We don't have as much metadata (languages in particular) and automated content, so I believe we are not at the point where memory is an issue... yet. Our main issue at the moment I believe may be for pages with lots of translations, as in this demonstration page which purports to list the word for "water" in all languages: wikt:fr:Utilisateur:Pamputt/eau. In that case though the limit is execution time (>10s).
For the memory issue, it would be nice to have an idea of what takes so much memory, so that we can make an informed decision on how much memory will be needed. Darkdadaah (talk) 16:12, 21 November 2019 (UTC)[reply]
Thanks for the reply. For translations, some entries in English Wiktionary use wikt:Category:Translation subpages to redirect content and reduce Lua memory. Yes, it would be nice if we knew how much memory is actually needed by pages in wikt:Category:Pages with module errors. KevinUp (talk) 07:15, 22 November 2019 (UTC)[reply]
@Lo Ximiendo: Does the Chinese Wiktionary use a lot of Lua memory as well? I noticed that it uses similar modules from English Wiktionary. KevinUp (talk) 05:13, 30 November 2019 (UTC)[reply]
@KevinUp: All I know is that the Lua modules in the Chinese Wiktionary were imported from the English Wiktionary, with modifications made of course. --Lo Ximiendo (talk) 05:29, 30 November 2019 (UTC)[reply]
There's also a phabricator task for better memory profiling support, which would allow us to do targeted optimizations instead of blind guesswork. – Jberkel (talk) 08:28, 2 December 2019 (UTC)[reply]

Voting

Fix the AddAudio script to upload audio to Commons seamlessly

  • Problem: A great user script that easily allowed users to record audio recordings of words on Wiktionary and upload them to Commons is broken. I have posted to phab:T206942 and asked for assistance at Commons and Wiktionary, to no avail.
  • Who would benefit: Anyone who wants to hear the audio at Wiktionary, particularly blind users or those learning foreign languages.
  • Proposed solution: My understanding is that there is a problem with the user script interacting with Commons' API. The script has already been made, so it's not starting from scratch, but tweaking how it saves files at Commons.
  • More comments:
  • Phabricator tickets: phab:T206942
  • Proposer: —Justin (koavf)TCM 19:49, 2 November 2019 (UTC)[reply]

Discussion

Voting

Two options for displaying categories

  • Problem: In dictionaries, words are normally displayed in alphabetical order. In Wiktionaries, when a category is displayed, only some of the category's words are displayed in alphabetical order; those in subcategories are not displayed. It is often useful to see all the words of the category.
  • Who would benefit: Everyone using categories.
  • Proposed solution: When displaying a category, add a button to be used when the user wants all words, including words in subcategories (and subsubcategories, etc.)
  • More comments:
  • Phabricator tickets:
  • Proposer: Lmaltier (talk) 20:49, 8 November 2019 (UTC)[reply]
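
What the proposed button would compute is the set of pages in a category and in all of its subcategories, at any depth. A minimal Python sketch, assuming the category tree is available as two plain dictionaries (the function name, the data layout and the category contents are all made up for illustration):

```python
def all_pages(category, pages_in, subcats_of):
    """Collect pages from a category and, recursively, all its subcategories."""
    seen = set()      # categories already visited, to survive category loops
    found = set()
    stack = [category]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        found.update(pages_in.get(current, []))
        stack.extend(subcats_of.get(current, []))
    # Case-insensitive code point order; a real wiki would apply its collation here.
    return sorted(found, key=str.casefold)

subcats_of = {
    "French lemmas": ["French nouns", "French verbs"],
    "French nouns": ["French proper nouns"],
    "French proper nouns": ["French lemmas"],  # deliberate loop
}
pages_in = {
    "French lemmas": ["de"],
    "French nouns": ["pain", "eau"],
    "French verbs": ["manger"],
    "French proper nouns": ["Paris"],
}
every_word = all_pages("French lemmas", pages_in, subcats_of)
```

The `seen` set matters because category graphs on wikis are not guaranteed to be trees; without it, a loop between categories would make the traversal run forever.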

Discussion

Voting

Lexicographic knowledge as a service

  • Problem: There is very little reuse of Wiktionary content because the dumps don't fit reusers' needs.
  • Who would benefit: Reusers
  • Proposed solution: Having more export formats, including options to select a set of specific languages or a combination of languages, or a set of the 50k most viewed pages. Formats could include DICT, TEI Lex0, XML (based on GLAWI), RDF (based on Dbnary) and so on.
  • More comments: Dbnary could be used as the source data, and the other formats derived from it.
  • Phabricator tickets:
  • Proposer: Lyokoï (talk) 16:06, 10 November 2019 (UTC)[reply]

Discussion

  • This proposal may be connected with Wiktionary in e-readers and e-book apps and with the development of a Wiktionary mobile app, as they may need a specific format. Kiwix could also be associated with this effort. Noé (talk) 11:33, 15 November 2019 (UTC)[reply]
  • Lyokoï, thank you for submitting this proposal! It is unfortunately out of scope, so we’ll be unable to take it on. For this work to be possible, structured Wiktionary would need to first be developed. Hopefully, this is something that can be done in the future. Thanks again, and we apologize for any disappointment. --IFried (WMF) (talk) 00:23, 20 November 2019 (UTC)[reply]
    IFried (WMF) What do you mean by a "structured Wiktionary"? Wiktionary is already structured; it is one of the most structured projects of the movement! GLAWI and DBNARY are very powerful thanks to our structure, and I have seen someone (Arkanosis) generate a dictionary of synonyms from the French Wiktionary in 10 minutes with a few scripts. Can you be more explicit about what you need in order to do this kind of export? Lyokoï (talk) 00:54, 20 November 2019 (UTC)[reply]
    @Lyokoï: By "structured Wiktionary" I believe we mean one that is built on top of Wikibase. Currently the format of Wiktionary varies by project, since it is all stored as wikitext. I think we could make an export tool for your wiki, but making one that works for everyone may prove to be challenging if not impossible. Would you like to adjust this proposal to be just about French Wiktionary? If so I will run this by the team and we'll try to get this back into the survey before the voting starts tomorrow. MusikAnimal (WMF) (talk) 03:08, 20 November 2019 (UTC)[reply]
    Dbnary has done the mapping for 20 Wiktionaries, so it is not impossible. Moreover, the Dbnary code is open, its creator is easy to contact, and the model is Lemon Ontolex, the same as the Wikidata Lexeme project. There is no need for a Wikibase or any other formalism. I understand that your team may not have specific competence in RDF formatting, but I don't consider this out of scope. Doing it for one version is not acceptable, as the need is real for every project and has already been expressed by others in past years, as in Community Wishlist Survey 2017/Wiktionary/Parse dumps for DICT clients (17 supports). Finally, this proposal seems very much aligned with the 2030 strategy. Noé (talk) 06:46, 20 November 2019 (UTC)[reply]
    IFried (WMF) & MusikAnimal (WMF), I support Noé's response: Wiktionaries don't need a Wikibase to be structured. I admit their structures are not absolutely identical, but in most cases it is just a different order of sections. They have the same granularity of information. I think it is very important for mid-sized Wiktionaries to be easily reused; that is really encouraging! Lyokoï (talk) 12:26, 20 November 2019 (UTC)[reply]
    @Lyokoï and Noé: Would it make sense to build a tool that allows you to export data from Dbnary to the other formats you've listed? That seems doable. Otherwise, creating and maintaining something that parses unpredictable wikitext seems like it won't scale. I will admit most of us aren't too well-versed with Wiktionaries, but just comparing English vs. French, I see the latter is more template-heavy and in a somewhat different layout, and this is what gives us pause. I apologize that we're running out of time... Voting starts in one hour! So let me know if working off of Dbnary will work, and if so we will unarchive. Thanks, MusikAnimal (WMF) (talk) 17:00, 20 November 2019 (UTC)[reply]
    @MusikAnimal (WMF): building a tool on Dbnary sounds good for me! Noé (talk) 21:40, 20 November 2019 (UTC)[reply]
    @MusikAnimal (WMF): For me too! --Lyokoï (talk) 11:45, 21 November 2019 (UTC)[reply]
    Great, I've slightly modified the proposal to make this clear, and I am moving it back into the survey. Best, MusikAnimal (WMF) (talk) 16:53, 21 November 2019 (UTC)[reply]
  • I suggest changing the title of the proposal to match the new content. It's not clear whether the current proposal is to have some kind of service as a software substitute or what. The proposal is to have a web API to Dbnary data, correct? Nemo 09:28, 22 November 2019 (UTC)[reply]
    Proposals have to be problems, not specific solutions, so that the Community Tech team can work out the best way to solve the problem. Here, the problem is how to offer more formats for Wiktionary exports. One suggestion is to build something on top of the Dbnary transformation, as a big part of the work of aligning Wiktionaries has already been done by that project. So it is not an API that digs directly into Dbnary. Plus, that is something that already exists on the Dbnary website. Noé (talk) 06:51, 27 November 2019 (UTC)[reply]

Voting

Sections reorder tools

  • Problem: When I work on a very long page containing a lot of sections, such as a thesaurus page, it is not easy to change the section order to improve the page layout. It is necessary to edit the whole page. During the operation, anyone can edit a section and cause an edit conflict.
  • Who would benefit: contributors
  • Proposed solution: it would be useful to have, in graphic modification mode, 2 buttons on each section (h2, h3, h4...): up and down. The advantage is a simple way to improve the page layout, and it limits the entry in the history to a simple comment like: section reorder.
  • More comments:
  • Phabricator tickets:
  • Proposer: Jpgibert (talk) 20:30, 10 November 2019 (UTC)[reply]

Discussion

  • Jpgibert, when you say "graphic modification", do you mean VisualEditor? Noé (talk) 14:12, 15 November 2019 (UTC)[reply]
    Yes, Noé. I forgot the suitable term when I wrote. Thanks for your help. Jpgibert (talk) 14:40, 15 November 2019 (UTC)[reply]
    You're welcome. Note that the French Wiktionary is the only Wiktionary that offers VisualEditor; the other Wiktionaries don't use it. So your suggestion has to be tied to the proposal to adapt VisualEditor to Wiktionaries.
    Translated from the French version of the same reply: Well, in fact we are both French speakers, so: you're welcome. In fact, the French Wiktionary is the only Wiktionary edition that uses the visual editor; the other editions don't. So your suggestion should be attached or connected to the one proposing to build a real visual editor dedicated to Wiktionaries. Noé (talk) 15:24, 15 November 2019 (UTC)[reply]

Voting

Adopt Lingua Libre Bot service as a tool

  • Problem: Lingua Libre is a great service to record pronunciations of words and, now, Lexemes at Wikidata. When you record them, they are uploaded to Commons and, via Lingua Libre Bot, added to the corresponding word/lexeme. But this bot is maintained by a volunteer, and it seems that it can sometimes be stopped for weeks. This service should be adopted as a WMF tool to make it more stable and not dependent on one user's availability.
  • Who would benefit: People wanting to record pronunciations
  • Proposed solution: Add it to Toolserver and run it independently.
  • More comments:
  • Phabricator tickets:
  • Proposer: Theklan (talk) 11:01, 27 October 2019 (UTC)[reply]

Discussion

IMHO, the problem is more general. In the long term, LinguaLibreBot and the Lingua Libre website itself have to be maintained to add more features and to fix bugs. I think Wikimedia France is starting to think about this problem (Eavqwiki may comment more). Pamputt (talk) 17:20, 1 November 2019 (UTC)[reply]

There is already a grant, although the solution is different. It will use OAuth, so no bot will be needed. On the talk page, the developer writes: "the files will be transferred from LinguaLibre to Commons using an OAuth authorization". So, in the future a bot will not be used, and this issue will be fixed in a different way from the one the proposer had in mind here. (Grants have a lifetime of 12 months)--Snaevar (talk) 12:37, 11 November 2019 (UTC)[reply]

@User:Snaevar The project supported by the grant has been implemented: the recordings are automatically uploaded through OAuth on Commons as Theklan mentioned, when it used to be up to the developer to transfer them periodically from the beta version of lingualibre that didn't use OAuth. Even with OAuth though, the bots are needed to upload the recordings to specific places on other projects - not just Commons -, that's where Theklan suggests it'd be useful to have the tool internal to Wikimedia to internalise both the recording process and the "storing" process. Correct? I think that having a separate Lingua Libre website and recorder is useful because that's where contributors without a Wikimedia account can also record (using a wikimedian's account for now but hopefully not anymore in the future). It's easier to explain the use and value of audiovisual content for languages' vitality to a greater number of speakers on Lingua Libre, all the while uploading the ultimate content on Commons - as WikiTongues does for example. Wikimedians on lingualibre could then validate non Wikimedia speakers' content and upload it to Commons using OAuth and the bots applicable as they do now for their own content. But for wikimedians strictly, yes, they could do with an in-built record wizard on the projects to decrease the reliance on bots to upload 100% of recordings, and the developer maintaining them and the record wizard. Eavqwiki (talk) 11:05, 22 November 2019 (UTC)[reply]

Seems like some overlap/redundancy with my proposal here: Community Wishlist Survey 2020/Wiktionary/Fix the AddAudio script to upload audio to Commons seamlessly. I don't have strong feelings on which tool/script is the best one. —Justin (koavf)TCM 21:00, 25 November 2019 (UTC)[reply]

Voting


Allow searching using ^ and $ anchors at least for intitle: searches

  • Problem: For some reason, CirrusSearch doesn't support the ^ and $ anchors (it is not possible to search for strings beginning or ending with a given sequence). While I understand it's probably not needed for whole-document searches, it would have practical uses in the context of intitle: searches.
  • Who would benefit: On Wiktionary it would be possible to search for words (entries) that start with a specific prefix / end with a specific suffix. But clearly there would be many other uses outside Wiktionary.
  • Proposed solution:
  • More comments:
  • Phabricator tickets:
  • Proposer: Zabavuju flašku chlastu maskovanou jako zubní pastu (talk) 15:29, 25 October 2019 (UTC)[reply]
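To illustrate the behaviour being asked for, here is a minimal sketch using Python's standard `re` module on a small list of titles. It is not CirrusSearch syntax; the helper name `filter_titles` and the sample titles are assumptions made for the example.

```python
import re

def filter_titles(titles, pattern):
    """Return the titles that match a regular expression, in their original order."""
    compiled = re.compile(pattern)
    return [title for title in titles if compiled.search(title)]

titles = ["unhappy", "happiness", "sunup", "undo", "rerun"]

# "^" anchors the match at the start of the title: entries with the prefix "un".
prefix_un = filter_titles(titles, r"^un")

# "$" anchors the match at the end of the title: entries with the suffix "ness".
suffix_ness = filter_titles(titles, r"ness$")
```

Without the anchors, the pattern `un` would also match "sunup" and "rerun", which is exactly the noise a prefix or suffix search on entry titles needs to avoid.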

Discussion

Voting

Add breadcrumb/breadcrumb trail (graphical control element)

Discussion

Voting

Context-dependent sort key

  • Problem: In most Wiktionary projects, words of different languages share a page if their spellings are identical. Currently, the magic word DEFAULTSORT works for an entire page, which means we cannot define a default sort key for each language on the same page. That is an issue especially for Chinese, Japanese and Korean (hanja). They share characters but their sort keys are totally different (radicals or pinyin for Chinese, kana for Japanese, hangeul for Korean). If it were possible to define a default sort key for each section, it would be much easier to categorize pages correctly.
  • Who would benefit: Editors of Wiktionary, especially those who edit Chinese and Japanese entries.
  • Proposed solution: Introduction of a new magic word, say, SECTIONSORT, that works for all categories after it up to the next usage of the same magic word. SECTIONSORT should override DEFAULTSORT if both are defined. The use of SECTIONSORT without a sort key should clear the previous sort key (and should not define an empty sort key).
  • More comments: see Community Wishlist Survey 2017/Wiktionary/Context-dependent sort key for a discussion in 2017. It is still a problem.
  • Phabricator tickets: phab:T183747
  • Proposer: TAKASUGI Shinji (talk) 12:19, 11 November 2018 (UTC)[reply]
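The rules described in the proposed solution can be sketched as a small simulation. This is only an illustration in Python, not MediaWiki parser code: SECTIONSORT does not exist, the `{{SECTIONSORT:key}}` syntax is assumed by analogy with DEFAULTSORT, and the category names in the sample page are made up for the example.

```python
import re

DEFAULT_RE = re.compile(r"\{\{DEFAULTSORT:([^}]*)\}\}")
# Matches either a (hypothetical) SECTIONSORT call or a category link,
# in the order in which they appear on the page.
TOKEN_RE = re.compile(
    r"\{\{SECTIONSORT:?([^}]*)\}\}"
    r"|\[\[Category:([^\]|]+)(?:\|([^\]]*))?\]\]"
)

def resolve_sort_keys(title, wikitext):
    """Map each category on the page to the sort key it would receive."""
    # DEFAULTSORT applies to the whole page; fall back to the page title.
    default_match = DEFAULT_RE.search(wikitext)
    default_key = default_match.group(1) if default_match else title

    section_key = None  # no SECTIONSORT in effect yet
    result = {}
    for m in TOKEN_RE.finditer(wikitext):
        if m.group(2) is None:
            # SECTIONSORT: an empty key clears the previous one
            # instead of defining an empty sort key.
            section_key = m.group(1) or None
        else:
            category, explicit_key = m.group(2), m.group(3)
            if explicit_key is not None:
                result[category] = explicit_key   # key given in the link itself
            elif section_key is not None:
                result[category] = section_key    # SECTIONSORT overrides DEFAULTSORT
            else:
                result[category] = default_key
    return result

page = """{{DEFAULTSORT:日00木01}}
==Chinese==
[[Category:Mandarin proper nouns]]
==Japanese==
{{SECTIONSORT:にほん}}
[[Category:Japanese proper nouns]]
==Korean==
{{SECTIONSORT:일본}}
[[Category:Korean proper nouns]]
==Other==
{{SECTIONSORT}}
[[Category:Han script entries]]
"""
keys = resolve_sort_keys("日本", page)
```

In this sketch the Japanese and Korean categories pick up their own keys, while the categories before the first SECTIONSORT and after the cleared one fall back to the page-wide DEFAULTSORT.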

Discussion

How will it be visible in the category? Sections can't be added to a category. --Wargo (talk) 21:48, 16 November 2018 (UTC)[reply]

Currently, one adds a sort key to an entire page. The goal of this proposal is to allow more than one sort key per page: one per section; e.g. one sort key for the Chinese section of an entry, one sort key for the Japanese section of the same entry, etc. This is because the same word may not be sorted the same way in different languages, and Wiktionaries often have entries from multiple languages on the same page, as a page corresponds to a specific spelling (which may occur in multiple languages). — Automatik (talk) 14:06, 20 November 2018 (UTC)[reply]
Notifying Wargo. Automatik (talk) 14:07, 20 November 2018 (UTC)[reply]

See also my somewhat related proposal (I keep missing the deadline) Community Wishlist Survey 2017/Archive/Allow multiple entries within each category. Urhixidur (talk) 13:30, 17 November 2018 (UTC)[reply]

  • I've been thinking a bit about this. The problem here is that you have multiple types (languages) of content inside a single page, with a single title. The page https://en.wiktionary.org/wiki/日本#References for instance (quoted as an example in the ticket) is English. And therefore all categorisation of the page is based on the English title of the page (even though the title is not in the English language). This is a fundamental problem (a mismatch with the wiki page concept). It really means that the entire system should be changed to make use of MCR and specialised MW content handlers, so that more semantic info can be extracted out of the page. (Like how Wikidata deals with different types of information in a single page). And then on top of that, you could have a Category be in a certain language, and the category could use the correct sort key for a page, by referring to the information of the applicable 'language section' inside the Page. —TheDJ (talkcontribs) 11:25, 6 November 2019 (UTC)[reply]
    • To further clarify, the community has built meaning (a convention) into some of the content, which MW cannot represent for them. When you want software features that make use of those meanings, that meaning first has to be machine-extractable (at scale) before we can do things with it that go beyond 'a simple wiki page that complies with the assumptions of the original Wikipedia'. —TheDJ (talkcontribs) 11:28, 6 November 2019 (UTC)[reply]
      If I got your idea right, you are saying that "Page content language" in Page information should be able to deal with more than one language, through specific tagging in the page or by using a template used for language section titles. Then, the ordering for each language could be fixed in MediaWiki. I think this is another way to solve the same issue, and maybe a more MediaWiki-centered one. Noé (talk) 10:05, 9 November 2019 (UTC)[reply]
      This is not what I understand. For en.wikt, the "Page content language" is always English (for apple as well as for pomme or Apfel), for fr.wikt, it's always French, etc. Anyway, there is no such issue with the "multiple collations" proposal. Lmaltier (talk) 13:57, 10 November 2019 (UTC)[reply]
  • This proposal seems to become useless if the "Multiple collations per site" proposal is adopted (i.e. a magic word stating the language for each category). Or am I missing something? Lmaltier (talk) 20:27, 8 November 2019 (UTC)[reply]
    It is mainly for Japanese and optionally for Chinese and Korean (hanja). You cannot generate a correct sortkey for each language in a page of Chinese characters. In the example above, the correct sortkey for 日本 is “にほん” for Japanese and “일본” for Korean. You can have only one default sortkey now. — TAKASUGI Shinji (talk) 23:08, 10 November 2019 (UTC)[reply]
    That concerns far more languages than Japanese, Chinese and Korean. For example, Ásia shouldn't get the same sort key in Portuguese and in Northern Sami. Unsui (talk) 13:36, 21 November 2019 (UTC)[reply]
  • Using the "multiple collations" proposal together with a language-dependent sortkey seems to me a more correct solution than a context-dependent sortkey. (This would probably require a magic word that contains a language code and a sortkey, that specifies the sortkey to be used in categories of that language. I recall reading a discussion about such a magic word on Phabricator, but can't find where that was.) Then instead of specifying a section-dependent sortkey you would specify a language-dependent sortkey: the sortkey for Japanese categories (such as "Japanese nouns"), for Korean categories, for Chinese categories.

    This is more correct, I think, because the sortkey actually depends on the language of the category. It's only by convention or because of practical considerations (for instance, that headword-line templates include category links in them) that categories for a given language are added in that language's section.

    If categories are added by mistake to the wrong section (usually at the bottom of the page), a context-dependent sortkey would be applied to the wrong category, whereas with a language-dependent sortkey, the wrong sortkey would be applied if the category is classified under the wrong language or under no language.

    TAKASUGI Shinji, do you think that the language-dependent sortkey idea would work as a solution? I could be missing some details about how CJK(V) entries work. Erutuon (talk) 20:23, 1 December 2019 (UTC)[reply]

@Erutuon: unfortunately it doesn’t work for Japanese or Chinese. The sort key for 日本 in Japanese is にほん while that for Mandarin is Rìběn (or 日00木01 based on strokes, depending on each Wiktionary project policy). It is impossible to generate correct sort keys algorithmically for all the entries. You need to give a sort key manually in some cases, but you can’t have two default keys in a page now. We need both of the two proposals. — TAKASUGI Shinji (talk) 23:59, 1 December 2019 (UTC)[reply]
Hmm, it doesn't sound like a problem for the language-specific sortkey idea. In the case that you describe, there could be a magic word like {{LANGUAGESORT:cmn|Rìběn}} or {{LANGUAGESORT:cmn|日00木01}} in the 日本 entry to set the sortkey for all Mandarin categories and {{LANGUAGESORT:ja|にほん}} to set the sortkey for all Japanese categories. There would have to be another magic word on the pages for the Mandarin and Japanese categories (for instance Category:Mandarin lemmas, Category:Japanese lemmas) to indicate the language, like {{LANGCAT:cmn}} and {{LANGCAT:ja}}. Then, when there are multiple Mandarin categories in the Chinese section (and multiple Cantonese, Wu), as is true in the English Wiktionary, the sortkey doesn't have to be specified for each category link, and the categories for each Chinese language do not have to use the same sortkey even though they are in the same section. This actually is different from the multiple collation proposal, but it might be compatible with it. Erutuon (talk) 00:55, 10 December 2019 (UTC)[reply]
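The lookup described in the comment above can be sketched as follows. This is a minimal Python illustration only: LANGUAGESORT and LANGCAT are the hypothetical magic words from the discussion, not existing MediaWiki features, and the function names and the "Translingual lemmas" category are assumptions made for the example.

```python
import re

LANGSORT_RE = re.compile(r"\{\{LANGUAGESORT:([^|}]+)\|([^}]*)\}\}")
LANGCAT_RE = re.compile(r"\{\{LANGCAT:([^}]+)\}\}")
CATEGORY_RE = re.compile(r"\[\[Category:([^\]|]+)\]\]")

def category_language(category_wikitext):
    """Return the language code declared on a category page, or None."""
    m = LANGCAT_RE.search(category_wikitext)
    return m.group(1) if m else None

def resolve_sort_keys(title, entry_wikitext, category_pages):
    """Pick, for each category of the entry, the sort key of the category's language."""
    # Language code -> sort key, as declared anywhere in the entry.
    keys_by_language = dict(LANGSORT_RE.findall(entry_wikitext))
    result = {}
    for category in CATEGORY_RE.findall(entry_wikitext):
        language = category_language(category_pages.get(category, ""))
        # Categories without a declared language fall back to the page title.
        result[category] = keys_by_language.get(language, title)
    return result

category_pages = {
    "Mandarin lemmas": "{{LANGCAT:cmn}}",
    "Japanese lemmas": "{{LANGCAT:ja}}",
}
entry = (
    "{{LANGUAGESORT:cmn|Rìběn}}{{LANGUAGESORT:ja|にほん}}"
    "[[Category:Japanese lemmas]][[Category:Mandarin lemmas]]"
    "[[Category:Translingual lemmas]]"
)
keys = resolve_sort_keys("日本", entry, category_pages)
```

Note that the position of the category links within the page plays no role here: the key depends only on the language declared by the category, which is the property that distinguishes this idea from a section-dependent sort key.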

Voting

Statistics for Wiktionaries

  • Problem: MediaWiki statistics are not adapted to Wiktionary projects. We don't need the number of pages but the number of lemmas, examples, definitions with illustrations, thesauri, etc.
  • Who would benefit: People who want to communicate about Wiktionary, Contributors
  • Proposed solution: Having better metrics, such as the ones we have in French Wiktionary, for example counts, picture counts, the number of nouns, adjectives, etc., how many people have contributed to thesauri in the past months, and so on.
  • More comments:
  • Phabricator tickets:
  • Proposer: Lyokoï (talk) 16:04, 10 November 2019 (UTC)[reply]

Discussion

Voting

Provide full translatability of the Cognate dashboard

  • Problem: Currently, the Cognate dashboard interface can be translated manually. The community can provide a translation on wiki, that can be added by the developer on the dashboard.
Two problems exist:
  • the translation is not automated, an action has to be done by the developer to integrate or modify a translation
  • not all parts of the interface are translatable: the "dynamic" parts, such as column titles or action buttons, remain in English
  • Who would benefit: all users using the dashboard and who are not comfortable with English
  • Proposed solution: develop the Cognate dashboard to make it translatable
  • More comments:
  • Phabricator tickets: T202613
  • Proposer: Pamputt (talk) 18:11, 7 November 2019 (UTC)[reply]

Discussion

Voting

Search in a lexicon

  • Problem: The search engine is not designed for dictionary needs.
  • Who would benefit: Contributors and readers
  • Proposed solution: Having an internal anagram and advanced search such as the one we have in French Wiktionary, to find anagrams but also words with a specific sound, or with a given sequence of letters and a grammatical class, for example. It could be based on a parsed dump to have part-of-speech distinctions as well (only in nouns, only in verbs, etc.).
  • More comments:
  • Phabricator tickets:
  • Proposer: Lyokoï (talk) 16:07, 10 November 2019 (UTC)[reply]
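As an illustration of the anagram part of this idea, here is a minimal Python sketch of an index built from (word, part of speech) pairs, such as could be extracted from a parsed dump. It is not the French Wiktionary tool; the function names and the sample entries are assumptions made for the example.

```python
from collections import defaultdict

def anagram_key(word):
    """Words that are anagrams of each other share the same sorted-letter key."""
    return "".join(sorted(word.lower()))

def build_index(entries):
    """Group (word, part_of_speech) pairs by their anagram key."""
    index = defaultdict(list)
    for word, part_of_speech in entries:
        index[anagram_key(word)].append((word, part_of_speech))
    return index

def find_anagrams(index, word, part_of_speech=None):
    """Return the anagrams of `word`, optionally restricted to one part of speech."""
    matches = index.get(anagram_key(word), [])
    return sorted(
        {w for w, pos in matches
         if w != word and (part_of_speech is None or pos == part_of_speech)}
    )

entries = [
    ("listen", "verb"), ("silent", "adjective"), ("enlist", "verb"),
    ("tinsel", "noun"), ("stone", "noun"), ("notes", "noun"), ("onset", "noun"),
]
index = build_index(entries)
all_anagrams = find_anagrams(index, "listen")
verb_anagrams = find_anagrams(index, "listen", part_of_speech="verb")
```

Building the index once makes each lookup a single dictionary access, which is why a precomputed, dump-based approach suits this kind of search better than a full-text engine.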

Discussion

Voting