Grants:Project/Rapid/Oral Culture Transcription Toolkit

Oral Culture Transcription Toolkit
Creating a toolkit to help Wikimedians transcribe oral knowledge on Wikimedia platforms
target: Wikimedia Commons and Wikisource
start date: September 3
end date: December 30
budget (USD): 312,000 INR / 4203 USD
grant type: Individual
grantee: Amrit Sufi
contact(s): amritsufi2(_AT_)


Project Goal


Briefly explain what you are trying to accomplish with this project, or what you expect will change as a result of this grant. Example goals include, "recruit new editors", "add high quality content", or "train existing editors on a specific skill".

UNESCO's latest interactive Atlas of the World’s Languages in Danger identifies nearly 2,500 of the 7,000-odd languages spoken across the world as endangered. With as many as 197 endangered languages among the more than 600 spoken there, India tops the list.[1] Whose Knowledge echoes the finding, claiming that only 500 of the world’s 7,000 languages are represented online, with English and Chinese dominating, and concluding that online knowledge is accessible only through colonial languages.[2]

India is a multicultural and multilingual country. However, the online presence of several of these languages, and hence of their cultures, is minimal. A good way to deal with this problem is to focus on the point where language intersects with culture. Documentation of folk songs, folktales and folk art can be a starting point here. In the future, we can also focus on material cultural artefacts, such as ornaments and utensils, that we might encounter while documenting the culture.

What necessitates action in this direction is the fact that several Indian languages fall under the vulnerable and endangered categories. In an age when communication and information access have become increasingly dependent on the internet, this project might be beneficial in the restoration of such languages. In doing so, we will also produce a toolkit and therefore a blueprint for future documentation.

In the pilot project, we would like to focus on creating a toolkit in at least a couple of Indian languages. We intend to pilot this by documenting at least two languages, at least one of which is in the vulnerable category. We will document, record and transcribe folk songs and folktales in these languages. Working with the native people and experts, we will also build a toolkit that will aid in further documentation of languages and cultures, along with creating their transcriptions and subtitles. We are going to build this toolkit on the basis of existing toolkits and also try to fill present gaps with research and interviews, with a special focus on ethical ways to approach indigenous communities as outsiders.

As a proof of concept, we have started creating a new Wikisource for Angika (one of the endangered languages of India) on the Multilingual Wikisource, which is entirely based on audio-visual content as of now.

The final outcome of the project will be a unified toolkit (translated into at least 2 Indian languages) that combines the elicitation protocol from Wikitongues (also used in Wikimedia Nigeria's language documentation project), the technical aspects of the OpenSpeaks toolkit, and the workflow for adding multimedia content to Wikimedia Commons and Wikisource, which we will map in this project.

At the end of this pilot project, we hope to see more emerging communities feel empowered to share their cultural/linguistic knowledge on Wikimedia platforms using the toolkit that we will develop. This will become a new pathway to enter the Wikimedia movement and enrich it with new knowledge formats.

Project Plan




Tell us how you'll carry out your project. What will you and other organizers spend your time doing?

September 2021 - Initial research & Documentation & Toolkit translation

October 2021 - Training with a few community members in at least 2 languages

November 2021 - Share with the larger Wikimedia community, encourage creating content in this new format and finalize the toolkit based on the feedback received

How will you let others in your community know about your project (please provide links to where relevant communities have been notified of your proposal, and to any other relevant community discussions)? Why are you targeting a specific audience?

What will you have done at the end of your project? How will you follow-up with people that are involved with your project?

  • Create a culture documentation toolkit to promote documentation of Indian languages and culture by building on already existing toolkits, e.g. Wikitongues[3] and OpenSpeaks[4]
  • Translate the toolkit into at least 2 Indian languages and engage with at least 10 community members (newbies as well as veteran Wikimedians)

Are you running any in-person events or activities? If so, you will need to complete the steps outlined on the Risk Assessment protocol related to COVID-19. When you have completed these steps, please provide a link to your completed copy of the risk assessment tool below:

We are not running any in-person activities with more than 10 people and we will make sure to follow the local guidelines while engaging with the communities offline.



How will you know if the project is successful and you've met your goals? Please include the following targets and feel free to add more specific to your project:

  1. Number of total participants - 10
  2. Number of languages the toolkit is translated into - 2
  3. Language communities engaged - 2



What resources do you have? Include information on who is organizing the project, what they will do, and if you will receive support from anywhere else (in-kind donations or additional funding).

  • User:Amrit Sufi (Research and Documentation specialist/Project Manager) is an educator of language and literature with 2 years of academic experience. As a researcher, she has published 2 research papers. She is a native speaker of Angika and also speaks Hindi and English as second and third languages.
  • User:Nitesh Gill (Community Coordinator) is a long-term Punjabi Wikimedian who has been engaging with various Indic communities in both professional and volunteer capacities. She is also studying for her PhD in Punjabi language and literature at the University of Delhi.
  • Daniel Bögre Udell is the co-founder of Wikitongues. As an advisor on this project, he will share the learnings from the recent Jewish Languages project, which Wikitongues carried out in partnership with Living Tongues. He will also share Wikitongues' metadata practices and language determination methods, and he will help adapt the elicitation protocol from Nigeria and the Jewish Languages Project to the cultural context of the Indian subcontinent. Wikitongues will also help in archiving the oral culture videos at the U.S. Library of Congress and the Internet Archive.
  • User:KCVelaga is an advisor and a learning partner on this project and he hopes to use the learnings of this project to explore potential technological solutions for streamlining the oral knowledge transcription workflow.
  • User:SGill (WMF) is the Program Officer, GLAM & Culture at the Wikimedia Foundation and he is an advisor on this project.

What resources do you need? For your funding request, list bullet points for each expense:

  • 1 full-time Research and Documentation Specialist - 50,000 INR per month * 3 = 1,50,000 INR
  • 1 part-time Community Coordinator - 25,000 INR per month * 3 = 75,000 INR
  • Translation cost = 20,000 INR
  • Remuneration for training participants = 10,000 INR
  • Travel support = 30,000 INR
  • Accommodation, when required = 10,000 INR
  • Internet support = 5,000 INR
  • Covid-19 safety supplies = 2,000 INR
  • Miscellaneous = 10,000 INR
  • Total = 3,12,000 INR (~4203 USD)
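
As a quick cross-check of the figures above, the line items can be summed in a few lines of Python. This is only an illustrative sketch: the dictionary keys are shorthand for the bullets above, and the "implied rate" is simply what the quoted USD figure works out to, not an official exchange rate.

```python
# Budget line items in INR, copied from the expense list above.
budget_inr = {
    "Research and Documentation Specialist (50,000 x 3 months)": 50_000 * 3,
    "Community Coordinator (25,000 x 3 months)": 25_000 * 3,
    "Translation": 20_000,
    "Remuneration for training participants": 10_000,
    "Travel support": 30_000,
    "Accommodation": 10_000,
    "Internet support": 5_000,
    "Covid-19 safety supplies": 2_000,
    "Miscellaneous": 10_000,
}

# Sum of all line items; this should match the stated total.
total_inr = sum(budget_inr.values())

# The proposal quotes roughly 4203 USD for the total. The rate computed here
# is only the one implied by that figure (an assumption for illustration).
quoted_usd = 4203
implied_rate = total_inr / quoted_usd

print(f"Total: {total_inr:,} INR")
print(f"Implied conversion rate: {implied_rate:.2f} INR per USD")
```

The sum of the line items matches the stated total of 3,12,000 INR.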




  •   Support This is definitely an interesting project. It would not only be beneficial to have a workflow that uses Wikimedia projects for the documentation but explore (rather rethink) the forms and the sources of content Wikisource can host. KCVelaga (talk) 06:16, 7 August 2021 (UTC)
  •   Strong support! Daniel from Wikitongues here. I'm thrilled to see efforts to make contributing language documentation and oral knowledge to Wiki projects a more accessible process. This will help to improve linguistic and cultural gaps on Wikipedia and Commons. I'm very happy to incorporate our work from Wikimedia Nigeria's Oral History project and strongly support this project. Bogreudell (talk) 15:53, 9 August 2021 (UTC)
  •   Strong support This is an amazing way of preserving our cultural values for the next generation. We should encourage more of this initiative. Olaniyan Olushola (talk) 13:23, 14 August 2021 (UTC)
  •   Strong support Oral documentation of our folklore and languages is so important, especially in this modern era. So I find this project very useful and it should proceed further. Gaurav Jhammat (talk) 03:50, 18 August 2021 (UTC)
  •   Support Jagseer S Sidhu (talk) 07:15, 16 August 2021 (UTC)
  •   Support I'm very excited about seeing the community take on this very important work. Strong support! Tnegrin (talk) 16:06, 31 August 2021 (UTC)
  •   Support fantastic project with great potential for further application across other oral communities. Doctor 17 (talk) 03:07, 12 September 2021 (UTC)