Community Wishlist Survey 2020/Wiktionary/Lexicographic knowledge as a service

Lexicographic knowledge as a service

  • Problem: There is very little reuse of Wiktionary content because the dumps don't fit reusers' needs.
  • Who would benefit: Reusers
  • Proposed solution: Provide more export formats, including options to select a set of specific languages, a combination of languages, or the 50k most viewed pages. Candidate formats include DICT, TEI Lex0, XML (based on GLAWI) and RDF (based on Dbnary), among others.
  • More comments: Dbnary could be used as the source data, and the other formats derived from it.
  • Phabricator tickets:
  • Proposer: Lyokoï (talk) 16:06, 10 November 2019 (UTC)

Discussion

  • This proposal may be connected with Wiktionary in e-readers and e-book apps and with the development of a Wiktionary mobile app, as they may need a specific format. Kiwix could also be associated with this effort. Noé (talk) 11:33, 15 November 2019 (UTC)
  • Lyokoï, thank you for submitting this proposal! It is unfortunately out of scope, so we’ll be unable to take it on. For this work to be possible, structured Wiktionary would need to first be developed. Hopefully, this is something that can be done in the future. Thanks again, and we apologize for any disappointment. --IFried (WMF) (talk) 00:23, 20 November 2019 (UTC)
    IFried (WMF) What is a "structured Wiktionary" for you? Wiktionary is already structured; it is one of the most structured projects of the movement! GLAWI and Dbnary are very powerful thanks to our structure, and I have seen someone (Arkanosis) generate a dictionary of synonyms from the French Wiktionary in 10 minutes with a few scripts. Can you be more explicit about what you would need to do for that kind of export? Lyokoï (talk) 00:54, 20 November 2019 (UTC)
    @Lyokoï: By "structured Wiktionary" I believe we mean one that is built on top of Wikibase. Currently the format of Wiktionary varies by project, since it is all stored as wikitext. I think we could make an export tool for your wiki, but making one that works for everyone may prove to be challenging if not impossible. Would you like to adjust this proposal to be just about French Wiktionary? If so I will run this by the team and we'll try to get this back into the survey before the voting starts tomorrow. MusikAnimal (WMF) (talk) 03:08, 20 November 2019 (UTC)
    Dbnary has made the mapping for 20 Wiktionaries, so it is not impossible. In addition, the Dbnary code is open, the creator is easy to contact, and the model is OntoLex-Lemon, the same as the Wikidata Lexeme project. There is no need for a Wikibase or any other formalism. I understand that you may not have specific expertise in RDF formatting in your team, but I don't consider it out of scope. Doing it for one language version is not acceptable, as the need is real for each project and has already been expressed by others in past years, as in Community Wishlist Survey 2017/Wiktionary/Parse dumps for DICT clients (17 supports). Finally, this proposal seems very much aligned with the 2030 strategy. Noé (talk) 06:46, 20 November 2019 (UTC)
    IFried (WMF) & MusikAnimal (WMF), I support Noé's response: Wiktionaries don't need a Wikibase to be structured. I admit their structures are not absolutely identical, but in most cases it's just a different order of sections. They have the same granularity of information. I think it is very important for middle-sized Wiktionaries to be easily reused; it is really encouraging! Lyokoï (talk) 12:26, 20 November 2019 (UTC)
    @Lyokoï and Noé: Would it make sense to build a tool that allows you to export data from Dbnary to the other formats you've listed? That seems doable. Otherwise, creating and maintaining something that parses unpredictable wikitext seems like it won't scale. I will admit most of us aren't too well-versed with Wiktionaries, but just comparing English vs. French, I see the latter is more template-heavy and in a somewhat different layout, and this is what gives us pause. I apologize that we're running out of time... Voting starts in one hour! So let me know if working off of Dbnary will work, and if so we will unarchive. Thanks, MusikAnimal (WMF) (talk) 17:00, 20 November 2019 (UTC)
    @MusikAnimal (WMF): building a tool on Dbnary sounds good for me! Noé (talk) 21:40, 20 November 2019 (UTC)
    @MusikAnimal (WMF): For me too! --Lyokoï (talk) 11:45, 21 November 2019 (UTC)
    Great, I've slightly modified the proposal to make this clear, and I am moving it back into the survey. Best, MusikAnimal (WMF) (talk) 16:53, 21 November 2019 (UTC)
  • I suggest changing the title of the proposal to match the new content. It's not clear whether the current proposal is to have some kind of service as a software substitute or what. The proposal is to have a web API to Dbnary data, correct? Nemo 09:28, 22 November 2019 (UTC)
    Proposals have to be problems, not specific solutions, so that the Community Tech team can work out the best way to solve the problem. Here, the problem is how to offer more formats for Wiktionary exports. One suggestion is to build something on top of the Dbnary transformation, as a big part of the work of aligning the Wiktionaries has already been done by that project. So, it is not an API that digs directly into Dbnary. Besides, that is something that already exists on the Dbnary website. Noé (talk) 06:51, 27 November 2019 (UTC)
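
As an illustration of the "one source, several derived formats" idea discussed above, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `Entry` class is a hypothetical in-memory record and not Dbnary's actual model (Dbnary publishes RDF), loading data from Dbnary is left out entirely, and the XML output is a simplified, unvalidated fragment that only borrows TEI-style element names rather than claiming TEI Lex0 conformance.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass
class Entry:
    """Hypothetical in-memory lexical entry (not Dbnary's real model)."""
    lemma: str
    lang: str
    pos: str
    definitions: list[str]


def select_languages(entries, langs):
    """Keep only the entries whose language code is in `langs`."""
    wanted = set(langs)
    return [e for e in entries if e.lang in wanted]


def to_tsv(entries):
    """Render one tab-separated line per definition."""
    lines = []
    for e in entries:
        for d in e.definitions:
            lines.append("\t".join([e.lang, e.lemma, e.pos, d]))
    return "\n".join(lines)


def to_tei_like(entries):
    """Render a simplified TEI-style XML fragment (not schema-validated)."""
    body = ET.Element("body")
    for e in entries:
        entry = ET.SubElement(body, "entry", {"xml:lang": e.lang})
        form = ET.SubElement(entry, "form", {"type": "lemma"})
        ET.SubElement(form, "orth").text = e.lemma
        gram_grp = ET.SubElement(entry, "gramGrp")
        ET.SubElement(gram_grp, "gram", {"type": "pos"}).text = e.pos
        for d in e.definitions:
            sense = ET.SubElement(entry, "sense")
            ET.SubElement(sense, "def").text = d
    return ET.tostring(body, encoding="unicode")


if __name__ == "__main__":
    # Made-up sample data, for demonstration only.
    sample = [
        Entry("chat", "fr", "noun", ["cat"]),
        Entry("dog", "en", "noun", ["a domesticated canine"]),
    ]
    french_only = select_languages(sample, ["fr"])
    print(to_tsv(french_only))
    print(to_tei_like(french_only))
```

The point of the sketch is only the shape of the design: language selection and each output format are separate functions over the same entry list, so adding another format (for example DICT) would mean adding one more exporter rather than another wikitext parser.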

Voting