Grants:Project/Rapid/Language Diversity Hub and Wikitongues/Report

Movement Strategy Implementation Grant Report
Accepted

Introduction

7,000 languages are spoken or signed today, but 3,000 languages could disappear in a generation, erasing centuries of cultural, historical, and ecological knowledge. Language extinction is not inevitable. People lose their languages to economic exclusion, political oppression, and violence. However, with the right resources, you can learn the language of your ancestors, raise the next generation as native speakers, and keep your culture alive. The social process of saving an endangered or dormant ('extinct') language is called language revitalization.

To prevent language extinction, Wikitongues invests in language activists, accelerates endangered language revitalization projects, and defends diversity on a global scale. Since 2014, we've safeguarded resources in over 700 languages, or about 10% of the world's languages, and since 2021, we have kickstarted more than 40 language revitalization projects across every continent. As a Wikimedia Affiliate, we support mother-tongue contribution to Wikimedia projects, such as adding language resources to Commons, contributing lexemes to Wikidata, and creating mother-tongue versions of Wikipedia.

Sadly, a majority of endangered languages are also under-resourced, so the grassroots creation of mother-tongue materials is both a method of safeguarding cultural knowledge and a critical first step in the process of language revitalization. In that sense, mother-tongue contribution to Wikimedia projects represents a valuable opportunity for global language revitalization efforts. To date, that potential remains largely untapped. For example, only about 5% of the world’s languages and 8% of the world’s writing systems are represented on Wikipedia, a significant gap in the Wikimedia movement’s mission to effectively safeguard and disseminate the sum of human knowledge on a global scale.

Project Scope and Objectives

At Wikitongues, our core project is the Language Revitalization Accelerator, a funded fellowship for the leaders of new and early-stage language revitalization projects. In annual cycles, we help language activists identify their communities' long-term and short-term language needs and build measurable plans to meet those needs over a generation. We then supplement their work with microgrants, in-kind services, and volunteer labor.

As we have already demonstrated, mother-tongue contribution to Wikimedia can be a powerful vehicle for language revitalization. It also advances the Wikimedia mission of safeguarding and amplifying knowledge. In this project, we set out to pilot a special Wiki track of our Language Revitalization Accelerator, dedicated to mother-tongue projects that center Wikimedia contribution. We set three measurable outcomes to determine success:

  1. Support 10 Wikimedia-focused language revitalization projects with broad geographical representation
  2. Produce free resources that guide new Wikimedians through the process of mother-tongue contribution
  3. Identify roadblocks to mother-tongue contribution that, if addressed, could accelerate the growth of linguistic diversity across the Wikimedia movement

Results

Objective #1: Support 10 Wikimedia-focused language revitalization projects with broad geographical representation

The 2023-2024 cohort of our Language Revitalization Accelerator was the first to include a Wiki track. Of 21 fellows, 10 were rising Wikimedians from Africa (Nigeria, Botswana, and Benin), Asia (Indonesia and Israel), Europe (Italy), and the Americas (United States). The Wiki cohort represented languages at different levels of endangerment, from widely spoken but under-resourced (Dendi) to critically endangered (Jewish Neo-Aramaic). The fellows proposed projects that leveraged different Wikimedia platforms: Wikipedia, Wiktionary, Wikidata, and Wikimedia Commons. Six of the 10 fellows met or exceeded their mother-tongue Wikimedia goals. The other four have made significant progress and are still working toward theirs.

  • Agnes Ajuma, Igala language: Agnes set out to launch an Igala version of Wikipedia. She successfully incubated the Igala Wikipedia, which left the incubator in late 2023 with 456 articles and an active community of Igala editors.
  • Oteng Tiro Sandra Kolobetso, Sekgalagadi language: Oteng set out to launch a Sekgalagadi version of Wikipedia. She has successfully launched the project, which is currently in the incubator with 13 articles and 7 editors.
  • Amrit Sufi, Angika language: Amrit set out to safeguard Angika folklore by uploading video oral histories to the Wikimedia Commons and transcribed texts to Wikisource. Working with other members of her community, whom she helped train, she recorded and uploaded 59 Angika videos. After Amrit's fellowship, she secured additional funding from the Foundation to continue her work, with Wikitongues as co-sponsor.
  • Ross Patrick Azogbonon, Dendi language: Ross set out to create a Dendi version of Wikipedia. He has successfully launched the project, which is currently in the incubator with 30 articles.
  • Mahuton Possoupe, Fongbé language: For writing, the Fongbé language uses an extended version of the Roman alphabet, which is poorly supported by modern devices, obstructing not just Wikimedia contributions but contributions to the wider Internet. Mahuton's goal, therefore, was to develop an online Fongbé keyboard that could be used to accelerate Fongbé-language contributions. He and his team have successfully launched the keyboard and are now educating their community about its availability.
  • Martin Di Maggio, Arbëreshë language: Martin set out to design and produce a deck of cards of everyday concepts in the Arbëreshë language, with corresponding articles on Arbëreshë Wikipedia. He successfully designed, published, and distributed the cards in local stores. His team contributed articles to the Arbëreshë test Wikipedia, which has 336 articles on the incubator and a growing team of editors.
  • Faisal Ansari, Banjar language: Faisal set out to add 3,000 mother-tongue lexemes to Wikidata. So far, he has added 1,590 lexemes and is still working toward his original goal. Accessible lexeme list forthcoming.
  • Manzhuur Daanisy Ahmad, Kaidipang-Bolangitang language: Manzhuur's goal was to create a bilingual dictionary and launch a mother-tongue version of Wikipedia. To date, they have collected ~2,500 terms and verified ~800 terms with their elders, while working with Wikimedia Indonesia to get training on Wikipedia editing. They are still working toward their goal.
  • Ariel Nosrat, Jewish Neo-Aramaic: Ariel set out to build a 2,000-word dictionary for Jewish Neo-Aramaic, published on Living Dictionaries and Wiktionary. So far, he has added at least 580 Jewish Neo-Aramaic entries to Living Dictionaries, and is still working toward his goal of 2,000. When he reaches that goal, we will add the materials to Wiktionary. Ariel's dialect of Jewish Neo-Aramaic is known natively as Lishan Dodan and originated in the Kurdish region of Iran.
  • Jacqueline Brixey, Choctaw language: Jacqueline set out to build a Choctaw corpus and train language models on that corpus for English-Choctaw machine translation, as a means of rapidly expanding Choctaw Wikipedia. She has successfully trained her model and has begun recruiting volunteers to deploy it.

Objective #2: Produce free resources that guide new Wikimedians through the process of mother-tongue contribution

The aforementioned fellows were trained using these linked resources, developed by our Wikimedian-in-Residence, Tochi Precious.

Objective #3: Identify roadblocks to mother-tongue contribution that, if addressed, could accelerate the growth of linguistic diversity across the Wikimedia movement

By guaranteeing diverse representation across regions, linguistic vitality, and Wikimedia platforms, we intended to give ourselves a well-rounded picture of the barriers to mother-tongue Wikimedia contribution. We've identified the following roadblocks as especially urgent to address:

  • New language versions of Wikipedia are required to be validated by an Expert Reviewer before they are approved for leaving the incubator. While this may work for larger languages, the vast majority of languages, especially endangered languages, are under-researched, making an independent 'expert' hard to find. For some languages, the only experts are the speakers themselves. In the absence of an independent 'expert', therefore, many languages are condemned to never leaving the incubator. This could be addressed by having a tiered approval system for languages based on their endangerment level, speaker population size, or availability of independent scholarship.
  • The incubator has a steep learning curve, which can suppress enthusiasm among new volunteers. It is also a different tool from Wikipedia, meaning that rising Wikimedians must learn two technologies in order to create a mother-tongue Wikipedia version. This could be addressed with more training resources, and by transforming the incubator into a UX/UI mirror of Wikipedia.
  • Not all writing systems are equally supported, making it difficult to contribute certain languages. This challenge is beyond the scope of Wikimedia itself and applies to the Internet at large. However, writing system accessibility should be a technical priority for our movement if we're serious about expanding linguistic diversity.
  • Wikimedia projects are largely volunteer-run, which can add an extra barrier in countries and regions where disposable income and free time are scarce. Securing funding and resources for developing test projects can be a challenge. This could be addressed by cultivating a tech volunteer corps in the movement to help rising Wikimedians realize their initial goals, maintaining a special rapid grant pool for new language projects (like our Accelerator's Wiki track, but at scale), and helping new Wikimedians promote their project to attract more independent contributors and donors.
  • A successful test project needs a critical mass of content to be valuable to users. Contributors need to find reliable sources and build up a substantial amount of content to justify the launch of a new project which, in large part, requires community organizing. Teaching new Wikimedians how to build up a local volunteer base would help mitigate this challenge.

Next Steps

In addition to supporting the 2023-2024 Wiki fellows as they continue to grow their mother-tongue projects, we will complete the following objectives in the next 18 months.

  • Consolidate our research on contribution roadblocks with parallel research from the Language Diversity Hub, distilling all findings into a single report.
  • Publish the free resources described above on MetaWiki, and incorporate them into an updated draft of our Language Sustainability Toolkit.
  • Share our findings at Wikimania 2024.

Thanks to additional funding from the MSIG team, we have extended the Wiki track pilot for an additional year and recently onboarded 10 Wikimedians from under-resourced and endangered language communities. We will share their projects in the coming months.

Lessons

Over the past year, in working with these fellows and contributing more broadly to the Language Diversity Hub, we have realized that there should be better infrastructure in place for comprehensively measuring the state of linguistic diversity across all Wikimedia projects. For example, right now, it's possible to count how many language editions of Wikipedia exist and how developed each edition is (article count, etc.), but there's no obvious solution for evaluating which languages are represented on the Wikimedia Commons. To effectively expand linguistic diversity across the movement, we'll need to build a way of measuring the scope and depth of language representation across every Wikimedia project. We hope to develop this in the next 1-2 years.
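A minimal sketch of what such a tally could look like, assuming contribution counts have already been gathered per project and language. The record format, the project labels, and the function names `summarize` and `languages_represented` are hypothetical and chosen only for illustration; the sample counts are the figures reported for four fellows' projects above. It is not an existing Wikimedia tool or API.

```python
from collections import defaultdict

# Hypothetical record format: (project, language, item count).
# The counts are taken from the fellows' results reported above.
SAMPLE_RECORDS = [
    ("wikipedia", "Igala", 456),          # articles
    ("wikipedia", "Arbëreshë", 336),      # incubator articles
    ("commons", "Angika", 59),            # videos
    ("wikidata_lexemes", "Banjar", 1590), # lexemes
]


def summarize(records):
    """Return, per project, the number of languages (scope)
    and the total number of items (depth)."""
    per_project = defaultdict(lambda: {"languages": set(), "items": 0})
    for project, language, count in records:
        per_project[project]["languages"].add(language)
        per_project[project]["items"] += count
    return {
        project: {"languages": len(data["languages"]), "items": data["items"]}
        for project, data in per_project.items()
    }


def languages_represented(records):
    """Return the set of languages that appear on at least one project."""
    return {language for _, language, _ in records}


if __name__ == "__main__":
    for project, stats in summarize(SAMPLE_RECORDS).items():
        print(project, stats)
    print(len(languages_represented(SAMPLE_RECORDS)), "languages represented")
```

The hard part in practice is gathering the records themselves, especially for Commons, where language is not a built-in attribute of every file. The sketch only shows how scope and depth could be reported once that data exists.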

Budget

The budget was spent in full, as described in our proposal.