Community Wishlist Survey 2020/Wiktionary/Insert attestation using Wikisource as a corpus

Insert attestation using Wikisource as a corpus

  • Problem: Wiktionary definitions rely on attestations: sentences from corpora that illustrate the usages and meanings of words. Wikisource is an excellent corpus for Wiktionaries, especially for classic usages, but it is not easy to search its texts for a specific word. Currently, the sentence and its reference have to be copied and pasted by hand, which is a slow and tedious way to contribute. As a result, few quotations come from Wikisource (less than 3% in the French Wiktionary).
  • Who would benefit: Readers of Wiktionaries would find more examples of usage and a direct way to access the full source on Wikisource. Wiktionary contributors would have a convenient and enjoyable way to add attestations, similar to the Insert media tool that searches Wikimedia Commons, and the community may grow with new people who like to add sentences from their reading. Wikisource editors would gain a new way to showcase their Sisyphean work. More links between the two projects would increase the visibility of both in search engines, and their global audience may grow with greater connectivity. Other projects may also benefit from this feature, for example Wikipedia, to add quotations to authors' pages.
  • Proposed solution: This feature is inspired by Insert media but targets Wikisource instead of Wikimedia Commons. Instead of a search snippet offering pictures, Insert attestation would display a list of sentences from a chosen Wikisource (in the same language as the source project or in another one) that contain the targeted sequence of characters. There is no semantic or proximity matching: only exact results are returned, to keep it simple. From the displayed results, an editor would pick a sentence with a single click, and it would be added along with the appropriate source data taken from the Wikidata item associated with the Wikisource page. The feature would copy the sentence (no transclusion) and its source, ideally including the page number in the original manuscript (e.g. "page 35"). This feature may need a specific parser to identify sentence boundaries and to bold the targeted sequence of characters.
  • More comments: This feature should be accessible through both the wikitext editor and VisualEditor. It may be useful to track reuses of Wikisource content in other projects with a specific "What links here" from Wiktionary to Wikisource, similar to how Wikimedia Commons indicates reuses in other projects, but this could be a separate development. This idea was proposed in 2018 with 36 supports and in 2017 with 32 supports; a draft was proposed in 2016 with 19 supports, and the idea was first raised in a MediaWiki discussion.
  • Phabricator tickets: T139152, T157802
  • Proposer: Noé (talk) 07:33, 22 October 2019 (UTC)
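A minimal sketch of the sentence-extraction step of the proposed solution, assuming the plain text of a Wikisource page has already been fetched. The naive punctuation-based splitter is a hypothetical stand-in for the "specific parser" the proposal mentions; real texts with abbreviations, quotations or verse would need a language-aware segmenter. As the proposal asks, matching is on the exact character sequence only, and each hit is bolded with wikitext `'''` markup. The function name `find_attestations` is hypothetical.

```python
import re

# Naive sentence boundary (assumption): a run of characters ending in ., ! or ?.
# Trailing text without final punctuation is ignored by this pattern.
SENTENCE_RE = re.compile(r"[^.!?]+[.!?]+")


def find_attestations(text, target):
    """Return the sentences of `text` that contain `target` exactly,
    with every occurrence of `target` bolded using wikitext '''...'''."""
    results = []
    for match in SENTENCE_RE.finditer(text):
        sentence = match.group().strip()
        # Exact substring match: no stemming, no semantic or proximity search.
        if target in sentence:
            results.append(sentence.replace(target, f"'''{target}'''"))
    return results
```

Because matching is a plain substring test, searching for `pitel` would also return sentences containing longer words that start with it; the editor's manual check (described in the discussion below) is what filters out wrong senses or forms.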


  • My idea is: on page Foo (1), clicking something would run a search in Wikisource for sentences containing the word foo (2). Then the editor must check whether the word is used in the correct context/sense and select the part to copy into an input field (3). Sometimes corrections are needed (…): shortening a long sentence, adding a missing subject from the previous sentence... Then clicking OK inserts the example (4) with its reference.
    1. I have the word cs:wikt:pitel.
    2. A search on Wikisource gives me some examples.
    3. I select one of them: the sentence Od dávna trvající věrný pitel vína dobrého. (roughly "A long-time faithful drinker of good wine.") from Paměti.
    4. I get #* {{Příklad|cs|Od dávna trvající věrný pitel vína dobrého.}}<ref>Mikuláš Dačický z Heslova: [[s:Paměti/1601–1605|Paměti]]</ref> to copy into Wiktionary.

JAn Dudík (talk) 20:52, 7 November 2019 (UTC)
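The wikitext in step 4 can be generated mechanically once the sentence and its source are known. A minimal sketch, assuming the author, Wikisource page and link label are passed in directly (in the proposed tool they would come from the Wikidata item linked to the Wikisource page). The function name `format_example` and the optional `page_no` parameter (for the "page 35" suggestion in the proposal) are hypothetical; `Příklad` is the Czech Wiktionary example template used in step 4.

```python
from typing import Optional


def format_example(lang: str, sentence: str, author: str, ws_page: str,
                   label: str, page_no: Optional[int] = None) -> str:
    """Build a Czech Wiktionary example line with a <ref> to Wikisource."""
    # Hypothetical page-number suffix such as ", page 35"; omitted when unknown.
    page = f", page {page_no}" if page_no is not None else ""
    ref = f"<ref>{author}: [[s:{ws_page}|{label}]]{page}</ref>"
    # Doubled braces in an f-string emit the literal {{ and }} of the template call.
    return f"#* {{{{Příklad|{lang}|{sentence}}}}}{ref}"
```

Called with `"cs"`, the sentence from step 3, `"Mikuláš Dačický z Heslova"`, `"Paměti/1601–1605"` and `"Paměti"`, it returns exactly the line shown in step 4.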

  • I agree with your description, but I also think it could be done in VisualEditor without seeing any wikicode at all. It could be user-friendly and easily accessible for new users, like "Add translation" in some Wiktionaries, or like "Insert media", which is very easy to use in Wiktionary. - Noé (talk) 16:47, 8 November 2019 (UTC)
    But because Wiktionary pages are built mostly from various templates, VE is hardly usable in Wiktionary [2]. JAn Dudík (talk) 10:06, 10 November 2019 (UTC)
    The French Wiktionary uses it. It requires documenting every template with TemplateData, and it still adds several unnecessary line breaks, but it is possible to use it. Noé (talk) 11:00, 11 November 2019 (UTC)