Lingua Libre/GSoC24
{Intro here}
Context
Lingua Libre is a notable open source project within the Wikimedia Foundation.
Lingua Libre SignIt is a single-page audio recording app that helps document the languages of the world. Created in 2018 with a Wikidata-like database, its stack proved too hard for Wikimedian volunteers to maintain.
Application
{Pushkar report summary here}
Impact
The GSoC24 revamp by Pushkar, built on Django + VueJS, gave a critical boost to Lingua Libre's volunteer developers. It put the project back into safe, sustainable territory. The impact is already visible: more volunteer Django developers have expressed interest in contributing to this open source project in the past 4 months than in the past 4 years.
User interface
- Home page.
- Login.
- Step 1: Add a new locutor, or update an existing locutor, with their demographic, geographic and linguistic information.
- Step 2: Select a language from your locutor's languages.
- Step 3: Hardware check, audio for oral languages or video for signed languages. If you selected a sign language in the previous step, this step checks your camera; otherwise it checks your microphone.
- Step 4: Build a word list from a local list, nearby elements, categories, or external word-generator tools; all of these sources are implemented.
- Step 5: Record audio or video, depending on the language you selected.
- Step 6: Review your recordings and select which ones you want to publish; published recordings are uploaded to the backend.
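The flow above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the step identifiers, the `Modality` enum, and the function names are hypothetical and are not taken from the Lingua Libre codebase. It shows the order of the six steps and the device choice made by the hardware check in step 3.

```python
from enum import Enum
from typing import Optional


class Modality(Enum):
    """Hypothetical language modality, used only for this sketch."""
    ORAL = "oral"
    SIGNED = "signed"


# The six recording steps, in the order listed above (names are illustrative).
STEPS = [
    "locutor",         # Step 1: add or update a locutor
    "language",        # Step 2: select one of the locutor's languages
    "hardware_check",  # Step 3: check camera or microphone
    "word_list",       # Step 4: build the word list
    "record",          # Step 5: record audio or video
    "review",          # Step 6: review and publish
]


def required_device(modality: Modality) -> str:
    """Return the device that the step 3 hardware check should test."""
    return "camera" if modality is Modality.SIGNED else "microphone"


def next_step(current: str) -> Optional[str]:
    """Return the step that follows `current`, or None after the last step."""
    index = STEPS.index(current)
    return STEPS[index + 1] if index + 1 < len(STEPS) else None
```

For example, `required_device(Modality.SIGNED)` selects the camera, and `next_step("review")` returns `None` because reviewing is the last step.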
WMFr GSoC24 delivery meetup
- Title: WMFr GSoC24 delivery meetup
- Date: 2024.09.18, 10 am.
- Participants: User:Xavier Cailleau WMFr, User:Michael Barbereau WMFr, Yug
Reception
We acknowledged the state of the LinguaLibre Django codebase at the end of GSoC24:
- WMFR was reminded of, and is thankful to Poslovitch for, the solid ground provided in Winter 2023-24: a healthy structure, unit tests, and detailed documentation.
- WMFR was shown, and is thankful to Pushkar for, the walls and roof built during GSoC24: (re)implementation of steps 1-6, a new design, new features (rich list, share list), an expanded VueJS data flow, and support for the Alpha deployment.
- We noted the overall state of https://dev.lingualibre.org, with (1) major advances but also (2) multiple local shortcomings to fix.
Django production deployment
WMFR is planning the next phase, ideally later this Fall or this coming Winter, with the aim of putting the new codebase online and switching over to it. This Winter 2024-2025 sprint will cover several major fronts. All hands are welcome, since many different skills will be needed.
- Django/VueJS: deep review (data flow integrity), fixes for local shortcomings (i18n, a few behaviors), test and documentation updates.
- Data migration:
  - Recordings data from LinguaLibre.org's Wikibase to the Commons Wikibase;
  - User data from Lingualibre.org to MariaDB.
- Wiki pages migration: migration of wiki pages to Commons. Bulk migration is possible but needs approval from Wikimedia Commons.
- Lists migration: migration within a subdomain of Commons.
- Landing website: isolate the static website.
A budget is available for freelance coding, likely split into multiple smaller missions of a few thousand euros each. The available sub-budgets and priorities should hopefully be clarified around November 2024.
More front-end work is needed
- Properly test Alpha
- Properly plug into the live Wikimedia Commons API for OAuth and uploads
- Implement i18n on translatewiki.net, GitLab, and in the code.
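As a rough illustration of the Commons API item above, the sketch below builds the request parameters for a MediaWiki Action API upload without sending anything over the network. It assumes an OAuth 2.0 access token sent as a Bearer header; the helper names are hypothetical and do not come from the Lingua Libre codebase. The file itself would be sent as the multipart `file` field of the POST request.

```python
# Wikimedia Commons Action API endpoint.
COMMONS_API = "https://commons.wikimedia.org/w/api.php"


def csrf_token_params() -> dict:
    """Parameters for the query that fetches a CSRF token before uploading."""
    return {"action": "query", "meta": "tokens", "type": "csrf", "format": "json"}


def upload_params(filename: str, csrf_token: str, wikitext: str, comment: str) -> dict:
    """Form fields for an `action=upload` POST (the file goes in a separate multipart field)."""
    return {
        "action": "upload",
        "filename": filename,   # target file name on Commons
        "text": wikitext,       # initial wikitext of the file description page
        "comment": comment,     # upload summary
        "token": csrf_token,    # CSRF token obtained with csrf_token_params()
        "format": "json",
    }


def auth_headers(access_token: str) -> dict:
    """HTTP headers for an OAuth 2.0 authenticated request (assumed auth scheme)."""
    return {"Authorization": f"Bearer {access_token}"}
```

An HTTP client would first send `csrf_token_params()` to `COMMONS_API`, then POST `upload_params(...)` together with the recording and the headers from `auth_headers(...)`.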
Database migration is needed
- Properly migrate items' data, such as writing, language, etc., to Commons.wikimedia.org
- Properly migrate speakers' data, such as learning place, language proficiency, and Wikimedia username, to the local MariaDB database.
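The speaker migration above could look roughly like the following sketch. Everything in it is an assumption made for illustration: the table and column names, the shape of the exported speaker records, and the use of Python's built-in `sqlite3` as a stand-in for MariaDB so that the example runs anywhere. A real migration would more likely go through Django models, and the SQL placeholder style depends on the MariaDB driver.

```python
import sqlite3

# Hypothetical flat schema for speaker data; column names are illustrative.
CREATE_SPEAKERS = """
CREATE TABLE IF NOT EXISTS speakers (
    id INTEGER PRIMARY KEY,
    wikimedia_username TEXT NOT NULL,
    language TEXT NOT NULL,
    proficiency TEXT,
    learning_place TEXT
)
"""

INSERT_SPEAKER = """
INSERT INTO speakers (wikimedia_username, language, proficiency, learning_place)
VALUES (?, ?, ?, ?)
"""


def flatten_speaker(item: dict) -> list:
    """Turn one exported speaker record into one row per spoken language."""
    rows = []
    for lang in item.get("languages", []):
        rows.append((
            item["username"],
            lang["code"],
            lang.get("proficiency"),     # missing values become NULL
            lang.get("learning_place"),
        ))
    return rows


def migrate(conn: sqlite3.Connection, items: list) -> int:
    """Create the table if needed, insert all rows, and return the row count."""
    conn.execute(CREATE_SPEAKERS)
    rows = [row for item in items for row in flatten_speaker(item)]
    conn.executemany(INSERT_SPEAKER, rows)
    conn.commit()
    return len(rows)
```

Calling `migrate(sqlite3.connect(":memory:"), items)` with a list of exported records is enough to try the mapping locally before targeting the real database.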
Healthy efforts to lead
- Update unit tests
- Update documentation ./doc
- Migrate statistics system
- Migrate downloadable dataset system
2025 sprints
A GSoC25 is not certain; it is to be decided in early 2025. Tech mentors are critically needed.
Name | Stack | Workload | Possible mission | Possible tech mentor(s) |
---|---|---|---|---|
LinguaLibre | Django + VueJS | 350h | Improve datasets, statistics, sound library. | ? |
LinguaLibre | Django + VueJS | 150h? | Help:Toolforge/My first Django OAuth tool | |
Machine learning | PyTorch, Django | 350h | Collaboration with WikiSpeech (WMSE) to create a pipeline from Lingualibre training data to text-to-speech models and an online API service. | ? |
Lingua Libre Bot | Pywikibot | ? | Refactor the Python bot for easier multilingual support | ? |
Flex / FieldWorks (?) | C/C++/Django | ? | Collaboration with leading lexicographic software to ease co-integration https://github.com/sillsdev/FieldWorks | ? |
Additional resources
- Lingua Libre Django
  - Phabricator master task: T361440
  - GSoC24 Lingua Libre SignIt > Mid-term mentee's post: « Halfway through the GSOC journey with Lingualibre »
  - GSoC24 Lingua Libre SignIt > Final mentee's post: « Navigating through the GSOC journey »
  - GSoC24 Lingua Libre Django > Reports: « Lingua Libre v3.0 enhancement and migration »
- SignIt
  - Phabricator master task: T361550
  - Lingua Libre SignIt > Final post: « GSoC 2024 Summary with the Wikimedia foundation »
  - GSoC24 Lingua Libre SignIt > Evaluations: https://summerofcode.withgoogle.com/organizations/wikimedia-foundation-nd/projects/details/88IXaP0O