ContribuLing 2021/Program


Program

Thursday 3rd June
Time (UTC) Session title Speaker Duration
11:00 UTC Keynote speech
Anna Belew, Endangered Languages Project 30 min
11:30 UTC Lingua Libre (Presentation)

Lingua Libre is a project developed by Wikimédia France which aims to build a collaborative, multilingual, audiovisual corpus under a free licence. Its goals are to expand knowledge about languages and in languages in an audiovisual way on the web, on Wikimedia projects and beyond, and to support the development of online language communities (particularly those of under-resourced, minority, regional, oral or signed languages), helping these communities access online information and ensuring the vitality of their languages.

Adélaïde Calais WMFr (EN), Lucas Prégaldiny (FR) (WikiLucas00) 30 min
12:00 UTC Wikidata Lexemes, Where to Find Them and How to Use Them?

Wikidata is a collaboratively edited multilingual knowledge graph hosted by the Wikimedia Foundation. It started in 2012 and now holds more than 12 billion pieces of data about 93 million concepts. In 2018, a namespace for Lexemes (units of lexical meaning) was added to accommodate the specific needs of lexicographical data.
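
As a rough illustration of where Lexemes can be found, the sketch below builds a SPARQL query for the Wikidata Query Service and the URL that would request it. It is only a sketch and is not taken from the talk: the helper names `lexeme_query` and `query_url` are hypothetical, the example asks for French (Q150) nouns (Q1084), and nothing is actually sent over the network.

```python
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def lexeme_query(language_qid: str, category_qid: str, limit: int = 10) -> str:
    """Build a SPARQL query listing lexemes of one language and lexical category.

    Hypothetical helper for illustration; the prefixes used (ontolex:, dct:,
    wikibase:, wd:) are predefined on the Wikidata Query Service.
    """
    return f"""
SELECT ?lexeme ?lemma WHERE {{
  ?lexeme a ontolex:LexicalEntry ;
          dct:language wd:{language_qid} ;
          wikibase:lexicalCategory wd:{category_qid} ;
          wikibase:lemma ?lemma .
}}
LIMIT {limit}
""".strip()


def query_url(query: str) -> str:
    """Return the GET URL that would run the query and ask for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


# Example: French (Q150) nouns (Q1084). The request itself is not sent here.
query = lexeme_query("Q150", "Q1084", limit=5)
url = query_url(query)
print(query)
print(url)
```

The same query text can be pasted into the query service's web interface; changing the two item identifiers selects another language or lexical category.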

Nicolas Vigneron (VIGNERON) 30 min
12:30 UTC BREAK 30 min
13:00 UTC Wikimedia Incubator

Wikimedia Incubator is a project of the Wikimedia Foundation which aims to organise the creation of new language wikis belonging to Wikimedia.
As the name suggests, the platform incubates new language versions of Wikimedia projects (such as the encyclopaedia Wikipedia, the travel guide Wikivoyage, the quotation collection Wikiquote or the dictionary Wiktionary) before they fully achieve the status of a wiki edition of the project in question.
The platform allows the organisation, preparation, editing, testing and coordination of the new language edition to be eventually hosted by the Wikimedia Foundation.
Wikimedia Incubator is a mandatory step before launching new language versions of Wikimedia projects.

Reda Benkhadra 30 min
13:30 UTC Kumoontun – Building collectively

Kumoontun is a word in Ayöök that means tequio, or collective work for the benefit of the population. This is the work philosophy of our collective of the same name, where we create materials and digital tools for children, parents and teachers according to the language and context of their native peoples. One of these tools is the Kumoontun app, which we developed three years ago for learning the Ayöök language. Our aim is to share and interact with other users, but also to provide didactic materials in different native languages, such as stories, narrations, videos and songs. Another characteristic that we consider essential for kumoontun to be possible is that our materials are available to the public free of charge and can be printed or translated. For example, the children's storybook "Pixk" has been translated into Popti' Maya from Guatemala, Tének from Central Mexico, English and, soon, Mixe in its different variants: Ayuujk, Ayuuk, Eyuk and Ayuk.

Zitlali Guadalupe Martínez Pérez 30 min
14:00 UTC BREAK 30 min
14:30 UTC (A) Collaborative workshop on translating digital words into African languages on the Internet

The writing workshops in African languages on the Internet are working sessions organised by the IdemiAfrica Collective. In these sessions, generic digital terms are transcribed into African languages in order to build a database of expressions that can be used to help translate major CMSs into African languages.
This is one of the activities of IdemiAfrica, a collective born of the desire to make African languages more visible on the web through the platform
During ContribuLing, we will run a writing workshop, providing participants with a reference file containing the expressions to be translated, along with resources to facilitate writing in these languages on a phone or computer.

Collectif Idemi Africa 1h30
14:30 UTC (B) Lingua Libre (Workshop)
Emma Vadillo Quesada (Eavq), Lucas Prégaldiny (WikiLucas00) 1h30
16:00 UTC End of Day 1

Friday 4th June
Time (UTC) Session title Speaker Duration
11:00 UTC Adding an endangered language to the Wiktionnaire

The Wiktionnaire (the French-language Wiktionary) is a sister project of Wikipedia. It is a collaborative online dictionary that aims to describe all words of all languages. The French-language version already describes almost 4.3 million forms from 4,800 languages, making it the largest French-language dictionary ever. Its digital nature allows it to associate a lot of information with each word: etymology, lexical field, audio pronunciations, examples of usage, illustrations, translations... It is built and maintained by a community of several hundred enthusiasts.
After an introduction to how the project works, the presentation will be divided into three parts. We will start with how rare, endangered or extinct languages are described in the Wiktionnaire, then move on to the case of a well-described language (Gaulish), and finish with a language that is still in the process of being described: Lorrain.

Lucas Lévêque (Lyokoï), Florian Cuny (Poslovitch) 30 min
11:30 UTC Wiktionary tools and experiences in the Tacawit language

Wiktionary is a lexicographical project of the Wikimedia Foundation which aims to define all words in all languages. It exists in more than 150 language editions. The term "Wiktionnaire" refers to the French version of this project, "Wiktionary" being the official name in English. It is based on a wiki system and its content is freely reusable (under CC-BY-SA).

Reda Kerbouche 30 min
12:00 UTC The Permanent Congress of the Occitan Language and Wikimedia tools: a mutual enrichment

For minority languages wishing to make progress in the field of natural language processing (NLP), the pooling of data is essential. This pooling involves the use of collaborative platforms, such as the various tools offered by the Wikimedia Foundation. In this presentation, we will focus on some of them (Lingua Libre, Wikidata, Wikipèdia in Occitan) and explain how Le Congrès uses them to develop its own tools.
Example 1: Using Lingua Libre to record words to be integrated into an Occitan multidictionary.
Example 2: Contributing to Wikidata lexemes with the aim of developing semantic analysis: creation of a bot for automatically uploading lexicons of inflected forms, and of a serious game that lets players link lexemes to the concepts they represent.
Example 3: Publishing online instructions for a teaching activity in which a mini-documentary is made in Occitan by contributing to Wikidata, Wikipèdia and Wikimedia Commons.

Aure Séguier (Unuaiga), Vincent Gleizes 30 min
12:30 UTC BREAK 30 min
13:00 UTC Extracting a massively multilingual pronunciation dictionary from Wiktionary

WikiPron (Lee et al. 2020) is a free Python software library used to mine pronunciation dictionaries from Wiktionary, a free collaborative multilingual online dictionary. Such dictionaries can be used to build grapheme-to-phoneme conversion models (e.g., Gorman et al. 2020, in preparation), a key component of speech recognition and synthesis engines. The design of WikiPron is described, as well as its scaling to support 215 languages. Ongoing quality-assurance/vetting projects are then reviewed, as are future plans to improve "upstream" (i.e., Wiktionary) data quality. Participants are shown how to install and use WikiPron, and how to make quality assurance contributions in a language of their choice.
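
To make the idea of a mined pronunciation dictionary concrete, here is a minimal sketch of loading one into memory. It assumes the two-column, tab-separated layout (word, then a pronunciation whose phones are separated by spaces); the sample entries and the helper name `load_pron_dict` are illustrative and not part of WikiPron itself.

```python
import io
from collections import defaultdict

# Illustrative sample, assumed layout: word <TAB> space-separated phones.
# A word may appear on several lines when it has several pronunciations.
SAMPLE_TSV = (
    "chat\tʃ a\n"
    "chien\tʃ j ɛ̃\n"
    "fils\tf i s\n"
    "fils\tf i l\n"
)


def load_pron_dict(lines):
    """Map each word to a list of pronunciations, each a list of phones."""
    lexicon = defaultdict(list)
    for line_number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) < 2:
            raise ValueError(f"line {line_number}: expected word<TAB>pronunciation")
        word, phones = fields[0], fields[1].split()
        if phones not in lexicon[word]:  # drop exact duplicate pronunciations
            lexicon[word].append(phones)
    return dict(lexicon)


# Hypothetical usage; a real file would be opened with open(path, encoding="utf-8").
lexicon = load_pron_dict(io.StringIO(SAMPLE_TSV))
for word, prons in lexicon.items():
    print(word, [" ".join(p) for p in prons])
```

Keeping every pronunciation of a word, rather than only the first, preserves the variation that a grapheme-to-phoneme model trained on such data would need to see.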

Lucas F.E. Ashby, Dr. Kyle Gorman 30 min
13:30 UTC Living Dictionaries: A Web Platform to Protect Endangered Languages and Maintain Global Linguistic Diversity

Living Dictionaries are collaborative multimedia web tools that can help languages survive for generations to come. Ideal for maintaining indigenous as well as diasporic languages, Living Dictionaries are never out-of-print, infinitely expandable resources. They go well beyond a static print dictionary, combining language data with digital audio recordings of native speakers, photos and videos. Living Dictionaries address the urgent need to provide comprehensive, freely accessible tech tools both for linguists engaged in documentation and for community activists engaged in grassroots conservation efforts and revitalization programs. The intended audience of this web app is inclusive, diverse and multilingual. Living Dictionary managers and contributors may create new entries, edit entries, add images, upload audio files, and record directly into dictionaries using microphones on smartphones or laptops.

Anna Luisa Daigneault, Dr. Gregory D. S. Anderson 30 min
14:00 UTC BREAK 30 min
14:30 UTC (A) Wikisource

Started in 2003, Wikisource is one of the oldest sister projects of Wikipedia. The purpose of Wikisource is to digitize free (as in public domain or free-license) works. There are several specialized tools available at Wikisource that are not present in Wikipedias. Wikisource is split into 72 language subdomains (plus a multilingual one that serves as an incubator). Recently it has experienced something of a resurgence, with more focus and more activities going on, especially in Indic languages. Moreover, each Wikisource contains a large variety of linguistic documents, such as dictionaries and grammars. Because they must be in the public domain, these documents are old (70+ years), but they can still be useful in many ways, for example for diachronic analysis comparing how words have evolved.
The workshop will demonstrate how to proofread texts on Wikisource. It will allow attendees to learn how to contribute and take part in the Wikisource community. First, we will introduce existing projects on Wikisource, proofreading statuses and how to ask for help. Then the workshop proper will start, and each participant will get to work on a few pages. Finally, we will discuss the steps needed to start a new proofreading project.

Nicolas Vigneron (VIGNERON), Antoine Srun (Assassas77) 1h30
14:30 UTC (B) Explore, Analyse and Translate Multilingual Wikidata Properties

Wikidata evolves rapidly. New properties are proposed, discussed, created and deleted regularly. It is very difficult to keep track of the latest information on the total count of properties, new properties, new data types, new WikiProjects, supported languages, etc. unless contributors write (complex) queries on the Wikidata query service. Users, especially newcomers who wish to contribute, often do not know the various ways in which they can document and contribute. For example, users often ask how to document a historical monument, person, book, software, etc. WDProp provides a visual interface to answer such questions.
It also helps users understand and improve multilingual information related to Wikidata properties. Every property has three major pieces of information that need to be translated: label, description, and aliases. A property may have none, some, or all of this information in any given language. Thus, for any given language, properties can be separated into two categories: translated properties and untranslated properties. A contributor may wish to focus on translating the labels of a particular set of properties, such as those belonging to a particular property class or a WikiProject. WDProp helps contributors find properties that require translation. Users can also visualize the historical translation process and even detect possible vandalism. Finally, users can also search for relevant WikiProjects related to their domain.
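
The split into translated and untranslated properties can be sketched in a few lines. This is only an illustration under stated assumptions, not WDProp's implementation: the `PropertyRecord` structure and both function names are hypothetical, the sample values are illustrative rather than fetched from Wikidata, and a property is assumed to count as "translated" in a language as soon as it has a label in that language.

```python
from dataclasses import dataclass, field


@dataclass
class PropertyRecord:
    """Hypothetical in-memory view of one property's multilingual terms."""
    pid: str
    labels: dict = field(default_factory=dict)        # language code -> label
    descriptions: dict = field(default_factory=dict)  # language code -> description
    aliases: dict = field(default_factory=dict)       # language code -> list of aliases


def translation_status(prop: PropertyRecord, lang: str) -> str:
    """Return 'none', 'some' or 'all' for label, description and aliases in lang."""
    present = sum(
        bool(terms.get(lang))
        for terms in (prop.labels, prop.descriptions, prop.aliases)
    )
    return {0: "none", 3: "all"}.get(present, "some")


def split_by_translation(props, lang):
    """Split property ids into (translated, untranslated) for one language.

    Assumption: having a label in the language is what makes a property translated.
    """
    translated, untranslated = [], []
    for prop in props:
        (translated if prop.labels.get(lang) else untranslated).append(prop.pid)
    return translated, untranslated


# Illustrative sample data only.
props = [
    PropertyRecord("P31", {"en": "instance of", "fr": "nature de l'élément"},
                   {"en": "class of which this item is an example", "fr": "classe de l'élément"},
                   {"en": ["is a"], "fr": ["est un"]}),
    PropertyRecord("P18", {"en": "image", "fr": "image"}, {"en": "image of the subject"}, {}),
    PropertyRecord("P999", {"en": "demo property"}, {}, {}),
]

translated, untranslated = split_by_translation(props, "fr")
print(translated, untranslated)
print([translation_status(p, "fr") for p in props])
```

A stricter definition, for instance requiring a label and a description, only changes the condition in `split_by_translation`.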

Dr. John Samuel 1h30
16:00 UTC Closing session
Damien Nouvel 30 min
16:30 UTC End of Day 2