Needs assessment for documentation and revitalization of Indic languages using Wikimedia projects/Executive report

Introduction

In his book ‘Language Death’, Professor David Crystal writes, “When language transmission breaks down, through language death, there is a serious loss of inherited knowledge”. Since language is at the core of human civilization, it is imperative that we document and revitalize it. The vast majority of Indic languages are underrepresented on Wikimedia platforms. Only 7% of the world’s 7000 languages are represented in published material, and an even smaller fraction of these languages are available online.

This research set out to understand the current state of Indic languages and what can be done to increase their representation on the Wikimedia platforms. In the course of the research, we analyzed the specific needs of Indic communities for the digitization of their languages. Before beginning the digitization of a given language, it is essential to understand the needs of its language community. As native speakers would be the backbone of open source language digitization, we conducted surveys and interviews to understand their needs and what can be done to encourage them to contribute more to the digital preservation of languages.

This is the executive summary of the research.

Context

We collected the data for this research by conducting surveys and interviews. We received 139 survey responses and interviewed 15 people. Our interviewees and survey takers came from three categories: Wikimedians, indigenous/native language speakers, and language experts. In the surveys we relied mainly on free-form questions rather than multiple-choice questions, since we wanted to understand what the respondents would say when not given any cues. The respondents represented 41 different languages. The three categories overlapped: for example, most native language speaker interviewees and survey takers were also scholars of linguistics and languages, so they were aware of the need for proper representation of the variety of Indic languages on online platforms.

This research was carried out in an explanatory manner, using the qualitative method of semi-structured interviews. Secondary research, i.e. analysis of already available data, was also included. The data analysis is both inductive and deductive, with deductive analysis appearing mostly in the conclusion and recommendations section of this research. Purposive sampling and snowball sampling methods were used to choose the interview participants: initially, we spoke to participants we were already familiar with, and those participants then suggested other prospective participants.

Learnings

We found that there is little awareness among native language speakers (non-Wikimedians) of any Wikimedia project other than Wikipedia: 89% of non-Wikimedian survey takers were unaware of Wikimedia projects other than Wikipedia. This needs to be remedied, since we cannot expect people to contribute knowledge in formats that they do not know Wikimedia projects can support.

The text-centeredness of Wikimedia was pointed out by a survey respondent and several interviewees. They mentioned the unexplained removal of audio-visual content from Commons and a lack of understanding of cultures where languages are used mainly for oral communication as some of the issues that make linguistic and cultural inclusivity difficult. The inclination towards Wikipedia, while it can be explained by its brand value, downplays the importance of other Wikimedia projects and their immense potential as sites for digital preservation.

Around 70% of non-Wikimedian survey respondents chose recording folk songs and folktales as their preferred method of contributing to their language digitally. Several interviewees talked about the need to make audio-visual recordings of folk culture. For instance, native speakers of Bodo and Braj told us that oral culture in their languages is fast disappearing, so it is important to record it.

Pramod Rathor, a Braj speaker, says: “The varieties of Braj folk songs- Suddas, Languriya, Aalha, Rasiya, Malhaar, Faag etc. sung in rural areas have near to no representation on digital platforms. As people are migrating to urban areas, these forms of songs are being lost, since they are not practiced anymore.”

Linguist Bidisha Bhattacharjee in her essay ‘Role of Oral Tradition to Save Language and Cultural Endangerment’ states, “The oral tradition is a rich source of preservation of cultural heritage and it reflects through the linguistic expression and linguistic variety of people.”

Recommendations

  1. Moving beyond text-centrism and using Wikimedia projects innovatively: Only about 40% of the Wikimedians mentioned that it is possible to use Wikimedia’s sister projects for the digitization of oral culture.
  2. Citizen archivists have to be promoted to create oral culture content: This research has established the importance of oral cultural and linguistic content; the next step is to set the creation of such content in motion. Citizen archivists from the same community or region can capture the orality of the language well. The 1947 Partition Archive has successfully trained individuals and collected more than 10,000 oral histories. A training course similar to the Reading Wikipedia in the Classroom training of trainers might be useful as well.
  3. Creation of oral culture content relevant for given languages: As mentioned above, oral culture such as folk songs is disappearing fast. The Oral Culture Transcription Toolkit provides guidance on using Commons and Wikisource to document oral culture content and to represent it in textual form. However, major improvements in the technical infrastructure would make the process easier and reduce hopping between platforms.
  4. Providing needed support to interested individuals: There are certain avenues for supporting individuals and communities with internet access, equipment, and mentorship. As Eddie Avila, director of Rising Voices, mentions: “Even if there are not existing activists in one’s language, opportunities for cross-linguistic, cross-regional mentorship between activists from another language can guide and inspire interested individuals.” In the context of Wikimedia, he says:

“In terms of Wikimedia projects, policies might be tough to understand for those not familiar with the platform, but the mentoring model especially from those from the same language community can help remove some of these barriers to understanding. We have seen examples of how communities are adapting Wikimedia projects based on their own local context and approach to knowledge sharing.”

The complete report for this research project is available here.