Time (UTC)
|
Session title
|
Speaker
|
Duration
|
---|
11:00 UTC
|
Adding an endangered language to the Wiktionnaire
Description
|
---|
The Wiktionnaire (the French-language Wiktionary) is a sister project of Wikipedia. It is a collaborative online dictionary that aims to describe all words of all languages. The French-language edition already describes almost 4.3 million forms from 4,800 languages, making it the largest French-language dictionary ever. Its digital nature allows it to associate a wealth of information with each word: etymology, lexical field, audio pronunciations, usage examples, illustrations, translations, and more. It is built and maintained by a community of several hundred enthusiasts. After an introduction to how the project works, the presentation will be divided into three parts. We will start with how rare, endangered, or extinct languages are described in the Wiktionnaire, then move on to the case of a well-described language (Gaulish), and finish with a language still in the process of being described: Lorrain.
|
|
Lucas Lévêque (Lyokoï), Florian Cuny (Poslovitch)
|
30 min
|
11:30 UTC
|
Wiktionary tools and experiences in the Tacawit language
Description
|
---|
Wiktionary is a lexicographical project of the Wikimedia Foundation that aims to define all words in all languages. It exists in more than 150 language editions. The term "Wiktionnaire" refers to the French version of this project, Wiktionary being the official name in English. It is based on a wiki system and its content is freely reusable (under CC-BY-SA).
|
|
Reda Kerbouche
|
30 min
|
12:00 UTC
|
The Permanent Congress of the Occitan Language and Wikimedia tools: a mutual enrichment
Description
|
---|
For minority languages wishing to make progress in the field of NLP (natural language processing), the pooling of data is essential. This pooling involves the use of collaborative platforms, such as the various tools offered by the Wikimedia Foundation. In this presentation, we will focus on some of them (Lingua Libre, Wikidata, Wikipèdia in Occitan) and explain how Le Congrès uses them to develop its own tools. Example 1: using Lingua Libre to record words to be integrated into an Occitan multidictionary. Example 2: contributing to Wikidata lexemes with the aim of developing semantic analysis, through a bot that automatically uploads lexicons of inflected forms and a serious game that links lexemes to the concepts they represent. Example 3: publishing online a guide for a teaching activity built around making a mini-documentary in Occitan, with contributions to Wikidata, Wikipèdia, and Wikimedia Commons.
|
|
Aure Séguier (Unuaiga), Vincent Gleizes
|
30 min
|
12:30 UTC
|
BREAK
|
30 min
|
---|
13:00 UTC
|
Extracting a massively multilingual pronunciation dictionary from Wiktionary
Description
|
---|
WikiPron (Lee et al. 2020) is a free Python software library used to mine pronunciation dictionaries from Wiktionary, a free collaborative multilingual online dictionary. Such dictionaries can be used to build grapheme-to-phoneme conversion models (e.g., Gorman et al. 2020, in preparation), a key component of speech recognition and synthesis engines. The design of WikiPron is described, as well as its scaling to support 215 languages. Ongoing quality-assurance/vetting projects are then reviewed, as are future plans to improve "upstream" (i.e., Wiktionary) data quality. Participants are shown how to install and use WikiPron, and how to make quality assurance contributions in a language of their choice.
|
|
Lucas F.E. Ashby, Dr. Kyle Gorman
|
30 min
|
13:30 UTC
|
Living Dictionaries: A Web Platform to Protect Endangered Languages and Maintain Global Linguistic Diversity
Description
|
---|
Living Dictionaries are collaborative multimedia web tools that can help languages survive for generations to come. Ideal for maintaining indigenous as well as diasporic languages, Living Dictionaries are never out-of-print, infinitely expandable resources. They go well beyond a static print dictionary, combining language data with digital audio recordings of native speakers, photos, and videos. Living Dictionaries address the urgent need for comprehensive, freely accessible tech tools, both for linguists engaged in documentation and for community activists engaged in grassroots conservation efforts and revitalization programs. The intended audience of this web app is inclusive, diverse, and multilingual. Living Dictionary managers and contributors may create new entries, edit entries, add images, upload audio files, and record directly into dictionaries using microphones on smartphones or laptops.
|
|
Anna Luisa Daigneault, Dr. Gregory D. S. Anderson
|
30 min
|
14:00 UTC
|
BREAK
|
30 min
|
---|
14:30 UTC (A)
|
Wikisource
Description
|
---|
Started in 2003, Wikisource is one of the oldest sister projects of Wikipedia. The purpose of Wikisource is to digitize free (as in public domain or free-license) works. Several specialized tools are available on Wikisource that are not present on Wikipedias. Wikisource is split into 72 language subdomains (plus a multilingual one that serves as an incubator). Recently it has experienced something of a resurgence, with more focus and more activity, especially in Indic languages. Moreover, each Wikisource contains a large variety of linguistic documents, such as dictionaries and grammars. Because they must be in the public domain, these documents are old (70+ years) but can still be useful in many ways (for example, for diachronic analysis of how words have evolved). The workshop will demonstrate how to proofread texts on Wikisource. It will allow attendees to learn how to contribute and take part in the Wikisource community. First, we will introduce existing projects on Wikisource, proofreading statuses, and how to ask for help. Then the workshop proper will start, and each participant will get to work on a few pages. Finally, we will discuss the steps needed to start a new proofreading project.
|
|
Nicolas Vigneron (VIGNERON), Antoine Srun (Assassas77)
|
1 h 30 min
|
14:30 UTC (B)
|
Explore, Analyse and Translate Multilingual Wikidata Properties
Description
|
---|
Wikidata evolves rapidly. New properties are proposed, discussed, created, and deleted regularly. It is very difficult to keep track of the latest information on the total count of properties, new properties, new data types, new WikiProjects, supported languages, etc. unless contributors write (complex) queries on the Wikidata Query Service. Users, especially newcomers who wish to contribute, often do not know the various ways in which they can document and contribute. For example, users often ask how to document a historical monument, person, book, software, etc. WDProp provides a visual interface to answer such questions. It also helps users understand and improve multilingual information related to Wikidata properties. Every property has three major pieces of information that need to be translated: label, description, and aliases. A property may have none, some, or all of this information in any given language. Thus, for any given language, properties can be separated into two categories: translated properties and untranslated properties. A contributor may wish to focus on translating the labels or a particular set of properties, such as those belonging to a particular property class or a WikiProject. WDProp helps contributors find properties that require translation. Users can also visualize the historical translation process and even detect possible vandalism. Finally, users can also search for relevant WikiProjects related to their domain.
|
|
Dr. John Samuel
|
1 h 30 min
|
16:00 UTC
|
Closing session
|
Damien Nouvel
|
30 min
|
16:30 UTC
|
End of Day 2
|
---|