Grants talk:Project/0x010C/LinguaLibre

Eligibility confirmed, round 2 2017

 
This Project Grants proposal is under review!

We've confirmed your proposal is eligible for round 2 2017 review. Please feel free to ask questions and make changes to this proposal as discussions continue during the community comments period, through 17 October 2017.

The committee's formal review for round 2 2017 begins on 18 October 2017, and grants will be announced 1 December. See the schedule for more details.

Questions? Contact us.

--Marti (WMF) (talk) 02:42, 4 October 2017 (UTC)

Linguistic diversity of the LinguaLibre community

Hi and thank you for this project :) You have gathered a lot of community support, but it feels like it is very centered around France and French-speaking communities, and the outreach efforts reinforce that. Is it a conscious choice (waiting for the tool to be easier to use before extending to a larger audience)? Could you tell us the language(s) in which LinguaLibre is available? Léna (talk) 20:12, 23 October 2017 (UTC)

The two volunteer developers who successively took charge of this internationalization task both got caught up in major changes in their private lives. We would indeed appreciate financial support (i.e. this grant) to clear this critical bottleneck. Yug (talk) 08:47, 24 October 2017 (UTC)
To complete what Yug just wrote: today the UI of the current version of Lingua Libre is only in French. Because of this, it is very difficult for non-French speakers to test and adopt LinguaLibre as is. So, "is it a conscious choice?" Partially yes. We essentially reached out to French-speaking communities because they are the only ones who can test the tool and see its potential.
We won't make the same mistake twice. i18n will be built in from the beginning and will be one of our main focuses for this second version of LinguaLibre, so that we can reach a larger audience.
Note however that we already have good contacts with motivated German-, Spanish- and Tamil-speaking contributors, among others, who will help us get in touch with their communities when we are ready for it!
0x010C ~talk~ 10:16, 24 October 2017 (UTC)
The current PHP LinguaLibre was primarily made for French users, as a discreet pilot project to be used by a few Wiktionary enthusiasts, for dying French dialects. French labels were hard-coded in it.
We did not expect to have that much success among international Wikipedians. We did not anticipate that many other Wikipedians and communities would get enthusiastic about it. Yug (talk) 16:10, 24 October 2017 (UTC)
General note: please prefer "endangered languages" to "dying dialects". There is a large literature on that matter, including for example the UNESCO map for languages in danger. Noé (talk) 12:52, 25 October 2017 (UTC)

Financial support from other organizations

Wikimedia France used to fully support LinguaLibre. The chapter being in a difficult situation right now, it is logical that you turn to a project grant this year. Do you plan to turn it back into a WMFr project once the chapter is back on its feet? Also, in this grant, you talk about presentations about local French languages; have you tried to obtain financial support from the organizations supporting these languages? Léna (talk) 20:12, 23 October 2017 (UTC)

I just had a call with Xenophôn, who follows the project at Wikimédia France. WMFr will continue to support LinguaLibre once it has overcome its current difficulties (i.e. for further development, hosting the servers or organizing recording sessions). But as the project starts to become international, we wish to broaden our support to other Wikimedia organizations (also to avoid ending up again in a situation like what happened this year at WMFr).
We already have some partnerships with organizations like the Office pour la langue et la culture d'Alsace, the Office public de la Langue Basque, or the Agence pour le Picard, with whom we organize workshops and recording sessions.
I hope I was clear; don't hesitate to ask me for clarifications if that's not the case!
0x010C ~talk~ 17:12, 24 October 2017 (UTC)

Wikimania travel costs

It feels that the Wikimania travel costs are low compared to other grants. I think it would be more realistic to increase this line, and to send two people to Wikimania, so you could have a LinguaLibre workshop during the whole event (as long as it is accepted by the Wikimania programme committee, of course). Léna (talk) 20:12, 23 October 2017 (UTC)

Done. It does indeed seem a good idea to have two people doing outreach for LinguaLibre at Wikimania. 0x010C ~talk~ 10:32, 24 October 2017 (UTC)
It would be a good challenge to see in how many different languages you can record a set of predetermined words at Wikimania. :) Amqui (talk) 21:45, 26 October 2017 (UTC)

Questions and concerns

I read this interesting proposal and have some questions and concerns:

  1. What will be the audio recording workflow after project completion beginning from the recording of a wav file to final upload to Commons? There is no clear description of it.
  2. You are going to develop an audio recording extension. Where is this extension intended to be used? Only on your own wiki? Or on any Wikimedia wiki?
  3. Where will your wiki (including a Wikibase instance) be hosted? On a Wikimedia server or on some external hosting service?
  4. You indicated that "... the recorded files will have to pass through our server". Is this the same server that will host the Wikibase instance?
  5. Since you are going to eventually upload the recorded audio files to Commons, you will probably need to discuss this with the Commons community. Have you attempted such a discussion? Mass uploading files to Commons without the explicit consent of the community will not be possible, but you have not even notified Commons.
  6. However, my main concern is that the project is too ambitious and unrealistic. You are going to do a lot of difficult work, like creating a new skin based on Timeless, whose development has not yet been completed. The last time, we were asked to provide around $100,000 just to finish the development of that skin, which we declined.

Ruslik (talk) 12:27, 26 October 2017 (UTC)

Hi @Ruslik0:
  1. After a word is recorded, the WAV file is sent to Lingua Libre, where it will be stashed. When the user has completed their recording session, they will be asked to review (i.e. listen to) all the audio files. Once validated, all the files will be queued to be converted to OGG and saved as local files. The user will then be asked whether they want to upload their recordings to Commons, and if so the files will be transferred from LinguaLibre to Commons using an OAuth authorization.
  2. It will first be used on LinguaLibre. If other wikis want to use it (including Wikimedia wikis), they will be able to.
  3. LinguaLibre is currently hosted by Wikimédia France, which recently committed to continue hosting us as long as we want.
  4. Yes, it is the same. The conversion part will be managed by the MediaWiki extension.
  5. We talked multiple times with trusted Commons users about the best way to import the files, often directly on IRC or during IRL meetings (recently at the French Wikiconvention, or at the Hackathon in Vienna). Two Commons administrators follow the project closely: one of them (VIGNERON) is part of our team, and another (Thibaut120094) endorsed this grant. But forgetting to notify the Commons village pump of this grant proposal is a mistake, I admit.
    Edit: It's done now.
  6. Sorry if I was not clear: we don't want to create a brand new independent skin, but to use all the work already done by an existing skin and adapt it to fit our color theme and such little things to get our own visual identity. We will not redevelop a complete skin.
I hope that answers your questions and limits your concerns. 0x010C ~talk~ 18:24, 26 October 2017 (UTC)
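Editor's illustration: the workflow in answer 1 can be read as a small state machine (stash, review, convert, upload). The sketch below is only a model of that description and not LinguaLibre code. The class and method names (`Session`, `stash`, `review`, `convert_all`, `upload_all`) are hypothetical, and the conversion and Commons transfer steps are simulated by state changes rather than real transcoding or OAuth calls.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    STASHED = "stashed"      # WAV received and held temporarily
    VALIDATED = "validated"  # the user listened to it and accepted it
    CONVERTED = "converted"  # converted to OGG and saved locally (simulated)
    UPLOADED = "uploaded"    # transferred to Commons (simulated)


@dataclass
class Recording:
    word: str
    state: State = State.STASHED


@dataclass
class Session:
    """Hypothetical model of one recording session."""
    recordings: list = field(default_factory=list)

    def stash(self, word):
        # Step 1: each recorded word arrives as a WAV file and is stashed.
        rec = Recording(word)
        self.recordings.append(rec)
        return rec

    def review(self, accepted_words):
        # Step 2: the user reviews every file; rejected ones are dropped.
        kept = []
        for rec in self.recordings:
            if rec.word in accepted_words:
                rec.state = State.VALIDATED
                kept.append(rec)
        self.recordings = kept

    def convert_all(self):
        # Step 3: validated files are queued for conversion to OGG.
        for rec in self.recordings:
            if rec.state is State.VALIDATED:
                rec.state = State.CONVERTED

    def upload_all(self, user_consents):
        # Step 4: upload happens only if the user agrees to it.
        if not user_consents:
            return 0
        count = 0
        for rec in self.recordings:
            if rec.state is State.CONVERTED:
                rec.state = State.UPLOADED
                count += 1
        return count
```

In this model a file can only reach Commons after it has been both validated and converted, and nothing leaves the local wiki unless the user consents, which matches the order of steps given in the answer.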
One comment: It would be a good idea when we identify the name of the person being recorded if we could add the Wikimedia username so it can be attributed correctly, without making it mandatory for the speakers to have an account of course. Amqui (talk) 21:47, 26 October 2017 (UTC)
I have a couple other suggestions I would like to make once this grant is approved. Should I send you a "features request" email? Amqui (talk) 00:32, 27 October 2017 (UTC)
@Amqui: That's what we want to do during the step-by-step recording process: collect as much metadata as the user is willing to give. As we will exclusively use Wikimedia accounts through OAuth to connect to LinguaLibre, we will be able to collect the Wikimedia username of the speaker.
If you have any suggestions, I'm interested! You can just write them down here in a dedicated section. 0x010C ~talk~ 09:31, 28 October 2017 (UTC)
I agree with this last comment by 0x010C, it's important that each upload be connected to a unique account by the actual person making the recording. This way, if some content problem arises, only that user will be targeted, rather than the entire OAuth application. (This is a regular problem with OAuth applications in Commons, so you've been advised well.) Nemo 12:37, 31 October 2017 (UTC)
I don't know if that's what you are saying or not, but the files should be connected to the account of the person doing the recording, not the speaker since they may or may not be Wikimedians, with proper attribution of course. Amqui (talk) 15:17, 31 October 2017 (UTC)
I had not heard before about this distinction between speaker and recorder. How come the two roles are distinct? Does the system require some very special or complicated technology? You also seem to be assuming that the person doing the recording has most of the responsibility, which was not what I expected. By the way, are you saying that you expect copyright to belong to the person who does the recording? --Nemo 16:01, 1 November 2017 (UTC)

Wiktionary usage

Compared to some previous proposals which asked for a grant to record audio and store it "somewhere", I like the section "Reuse on Wikimedia wikis" here because it promises a method, transparent to the users, which will upload suitable files to Wikimedia Commons. However, two questions remain unanswered, as far as I can see:

  1. by what method the files will be put into actual use on the French Wiktionary or other Wiktionary subdomains;
  2. how this could (or couldn't) benefit from the previous PronunciationRecording extension effort.

--Nemo 12:34, 31 October 2017 (UTC)

The files will not be only useful to Wiktionaries, but potentially to all Wikimedia projects, especially if the project to put lexicographical content on Wikidata goes forward. For instance, I already used files from Lingua Libre on the French Wikiversity there: [1].
Example of a use of Lingua Libre files on the French Wiktionary there: [2].
Furthermore, the files themselves on Wikimedia Commons are useful pedagogical content, just like the pictures there that are not currently used on Wikimedia projects. They can be reused on external projects as well.
Thanks, Amqui (talk) 15:21, 31 October 2017 (UTC)
Hi @Nemo bis:
  1. As said in the Reuse on Wikimedia wikis section, I will develop a bot that will look for freshly imported pronunciation files and add them to the Wiktionary entry of the linked word, in the subsection Prononciation (of the corresponding language, if there are several for this word). Many contributors of the French Wiktionary are motivated by this, which is why we will start with this wiki, but we wish to deploy the bot on as many Wiktionaries as possible.
    Furthermore, we will also work with other Wikimedia projects (like Wikidata, Wikipedias or Wikiversity, as Amqui said) which can also benefit from audio recordings (for example, to fill the P443 property on Wikidata).
  2. I'm not sure we can get something out of the incomplete Pronunciation Recording Gadget; in fact, we already have a JS recording studio, which works very well...
0x010C ~talk~ 22:05, 31 October 2017 (UTC)
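Editor's illustration: the core text operation of the bot described in answer 1 might look like the sketch below. It is not the planned bot and makes several assumptions: the function name `add_pronunciation` is hypothetical, the section header and the `{{écouter}}` template call are simplified guesses at French Wiktionary conventions and are passed in or built as plain strings, the function works on page text only (no fetching or saving), and it ignores the case of several language sections on the same page.

```python
def add_pronunciation(wikitext, audio_file, lang_code,
                      section_header="=== {{S|prononciation}} ==="):
    """Return wikitext with one audio line added to the pronunciation section.

    Assumption: the section header and template syntax are simplified
    placeholders, not verified against any live Wiktionary page.
    """
    new_line = "* {{écouter|lang=%s|audio=%s}}" % (lang_code, audio_file)

    # Do nothing if the file is already referenced, so that the bot can
    # safely process the same page twice.
    if audio_file in wikitext:
        return wikitext

    lines = wikitext.split("\n")
    for index, text in enumerate(lines):
        if text.strip() == section_header:
            # Insert directly under the existing section header.
            lines.insert(index + 1, new_line)
            return "\n".join(lines)

    # No pronunciation section yet: append one at the end of the page.
    return wikitext.rstrip("\n") + "\n\n" + section_header + "\n" + new_line + "\n"
```

Keeping the edit as a pure function on page text makes it easy to test offline; the parts that differ between Wiktionaries (header and template) are the ones a multi-wiki bot would need to make configurable.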
Amqui, as I said in my question I'm certainly happy that files get to Wikimedia Commons. 0x010C, ok, I had read the part about developing a bot but it wasn't clear to me that the community had already agreed with this plan: maybe you could specify it in the same section. --Nemo 15:52, 1 November 2017 (UTC)
Hi! JackPotte already wrote a bot to automatically add pronunciation to the French Wiktionary. Noé (talk) 16:03, 2 November 2017 (UTC)
But my script would need to be deeply refactored for the other Wiktionaries. JackPotte (talk) 16:43, 2 November 2017 (UTC)

Aggregated feedback from the committee for LinguaLibre

Scoring rubric and scores
(A) Impact potential: 7.3
  • Does it have the potential to increase gender diversity in Wikimedia projects, either in terms of content, contributors, or both?
  • Does it have the potential for online impact?
  • Can it be sustained, scaled, or adapted elsewhere after the grant ends?
(B) Community engagement: 7.3
  • Does it have a specific target community and plan to engage it often?
  • Does it have community support?
(C) Ability to execute: 6.5
  • Can the scope be accomplished in the proposed timeframe?
  • Is the budget realistic/efficient?
  • Do the participants have the necessary skills/experience?
(D) Measures of success: 6.5
  • Are there both quantitative and qualitative measures of success?
  • Are they realistic?
  • Can they be measured?
Additional comments from the Committee:
  • The project fits with Wikimedia's strategic priorities and has a great potential for online impact. However its long term sustainability is less clear.
  • The possible impact of this project is considerable; moreover, it can be scaled and adapted outside Wikimedia (through a MediaWiki extension, for example).
  • The approach is innovative. The potential is great but so are the risks. The success can be measured.
  • Innovative approach based on previous experiences and projects. Very clear measures of success and a potential for a large long-term impact.
  • I have doubts that the project can be accomplished in the specified timeframe with the requested budget. It includes many complicated tasks like skin and extension development and a failure to successfully complete any of them will lead to failure of the whole project.
  • Budget is extensive and detailed. Ability to execute seems assured, but the time frame is doubtful; delays should be considered.
  • The community engagement is minimal. However it will be critically important if any recorded audio files are to be uploaded to Commons.
  • Community seems interested and there is a plan to engage the whole community.
  • The project seems to be unrealistic: it is planned to develop a skin and an extension, to set up a server running a MediaWiki instance (with a new skin and a Wikibase instance) and to set up an uploading process to Commons. And all this in less than 5 months and on a budget of 30,000 euro. Unless the scope is reduced to a more realistic one, the project should not be funded.
 

This proposal has been recommended for due diligence review.

The Project Grants Committee has conducted a preliminary assessment of your proposal and recommended it for due diligence review. This means that a majority of the committee reviewers favorably assessed this proposal and have requested further investigation by Wikimedia Foundation staff.


Next steps:

  1. Aggregated comments from the committee are posted above. Note that these comments may vary, or even contradict each other, since they reflect the conclusions of multiple individual committee members who independently reviewed this proposal. We recommend that you review all the feedback and post any responses, clarifications or questions on this talk page.
  2. Following due diligence review, a final funding decision will be announced on Thursday, May 27, 2021.
Questions? Contact us at projectgrants   wikimedia  · org.


Small remarks

Hi,

First of all, thanks for all your feedback!

I just wanted to briefly clarify some points concerning the last comment:

  • As I responded to Ruslik0 on this talk page, "We don't want to create a brand new independent skin, but use all the work already done by an existing skin and adapt it to fit our color theme and such little things to get our own visual identity. We will not redevelop a complete skin." Moreover, this is not a blocking point for the project. Again, sorry if this part of the grant proposal was not clear enough...
  • As written at the top of the budget section, the project is planned to last almost 7 months (27 weeks), and not "less than 5 months". Furthermore, I will continue to invest myself in this project in my volunteer time after the end of the grant period, as I did before.

Best regards, 0x010C ~talk~ 23:08, 17 November 2017 (UTC)

MediaWiki extension?

Hello, and apologies for submitting these comments so late in the Grant process, I hope they are useful considerations nonetheless :)

First off, let me say how heartening it is to see the continued development of Lingua Libre. It would have been a sad but understandable outcome of WMFR turmoil that the project would slow down or stop. Putting this grant request together speaks for the passion and dedication of the people involved in LinguaLibre, and I’ll look forward to seeing this project continue :)

I have several questions regarding the technical choices outlined in this grant request.

  • Based on my reading of the grant request, I understand you are not planning to have this future MediaWiki extension deployed in the Wikimedia cluster (on Commons or elsewhere) − can you confirm that I got this right?
  • Regarding authentication: you are planning to let users of your wiki use their SUL account for logging-in. Are there any existing MediaWiki instances, not hosted on Wikimedia cluster, that use OAuth this way?
  • You indicate that “Direct upload from the user to Commons is not possible, for two main reasons: Lingua Libre is an external tool (we so have Same-origin policy issues)”, which I don’t understand: many external tools (either web, desktop or mobile) upload files to Commons. If the issue is linked to the upload_by_url whitelist, then surely the LinguaLibre domain could be whitelisted. Can you clarify what I’m missing?
  • More generally, I am sceptical of the decision to develop as a MediaWiki extension − among other things based on my experience as a member of the steering committee of the Extension:GWToolset, whose development as an extension was (late in the project) seen by some as a mistake. These posts by Erik Moeller (then WMF VP of Product) in October 2014 and February 2015 shed some light on that reasoning. These arguments might have less weight given that the Lingua Libre extension is not expected to be deployed on the Wikimedia cluster, but they still sound applicable to me. I don’t really understand the arguments you raise in the second section to justify that move:
    • experienced Wikimedians are intensive users of many external tools (especially those deployed in the Wikimedia Cloud);
    • GWToolset being a MediaWiki extension did not particularly help the sustainability of the project − some might argue that it significantly slowed it down;
    • I’m not sure what you mean either by “the daily-management will be ensured by contributors themselves (like on Wikimedia projects)”. I understand that some of the configuration will live in the tool itself − which is a good goal, but which could be achieved in an external tool as well.
    It does make more sense with the use of Wikibase and SPARQL that you further detail, although I am not familiar enough with Wikibase to correctly assess whether it fits your needs or whether Wikibase is easy to develop with.
  • It is unclear to me how you plan to interact with the future Structured Data on Commons (if at all). Do you expect the SDoC metadata model to be complex enough to hold the complex metadata you have? And once SDoC is deployed, do you plan to then move part or totality of your metadata to Wikimedia Commons? Also, I am not totally clear on SDoC timeline but the current documentation hints at an early extension being deployed as early as 2018.
    That being said, I don’t necessarily think that putting SDoC development as a potential blocker for Lingua Libre to move along would be wise either :)

Again, apologies for submitting these comments so late in the process − I hope they help nonetheless :)

Jean-Fred (talk) 12:24, 18 November 2017 (UTC)

Hey @0x010C:, congratulations on getting the grant! I look forward to the future development of LinguaLibre :)
I was wondering whether you had had the chance to read my questions?
Cheers, Jean-Fred (talk) 21:47, 1 January 2018 (UTC)
Hi @Jean-Frédéric:, sorry for the late reply. I did not see your questions until late, then personal obligations made me forget to answer them... So here are my answers:
  • Yes, I confirm: it is not the purpose of this grant, and it is not planned in the short or medium term.
  • If so, I'm not aware of it, but there is an existing MediaWiki extension that allows it; see mw:Extension:OAuthAuthentication.
    Edit: There is https://commonsarchive.wmflabs.org, which uses OAuth this way.
  • By “Direct upload from the user to Commons is not possible, for two main reasons: Lingua Libre is an external tool (we so have Same-origin policy issues)”, I was referring to direct upload to Commons from the client side (using AJAX) on another website (Lingua Libre). Furthermore, we want to have a simple and automated recording/upload workflow, which is why using upload_by_url is not desired.
  • I missed that WAV was in fact allowed, but as OGG is still preferred on Commons (and is much smaller without noticeable difference), I think it is preferable to use it.
  • In a few words, using a MediaWiki infrastructure instead of developing an external tool will allow contributors (the other members of the Lingua Libre team, or any Wikimedian) to create and edit campaigns, help pages, word lists, configuration options, stats, gadgets, etc. in a wiki way and without having to redevelop many things.
  • When I started this proposal (August - September 2017), SDoC was a cool idea which might come one day. At that time, the best launch date I had seen was maybe 2020 or so, without any technical detail about it. That's why I do not mention it very much in this proposal.
    Now, this project seems to be more active, with a much clearer timeline. I currently don't have enough details on it, but I will follow its development and deployment closely in order to integrate LinguaLibre with it.
For information, we will have regular video meetings with the Lingua Libre team; all the details will be sent on the lingualibre mailing list. If you wish to join us, don't hesitate.
I hope this answers most of your questions, and again, apologies for the late reply.
0x010C ~talk~ 14:02, 11 January 2018 (UTC)

Round 2 2017 decision

 

Congratulations! Your proposal has been selected for a Project Grant.

The committee has recommended this proposal and WMF has approved funding for the full amount of your request, 30 600 EUR (35 990 USD).


Next steps:

  1. You will be contacted to sign a grant agreement and set up a monthly check-in schedule.
  2. Review the information for grantees.
  3. Use the new buttons on your original proposal to create your project pages.
  4. Start work on your project!

Upcoming changes to Wikimedia Foundation Grants

Over the last year, the Wikimedia Foundation has been undergoing a community consultation process to launch a new grants strategy. Our proposed programs are posted on Meta here: Grants Strategy Relaunch 2020-2021. If you have suggestions about how we can improve our programs in the future, you can find information about how to give feedback here: Get involved. We are also currently seeking candidates to serve on regional grants committees and we'd appreciate it if you could help us spread the word to strong candidates--you can find out more here. We will launch our new programs in July 2021. If you are interested in submitting future proposals for funding, stay tuned to learn more about our future programs.


Congratulations, @0x010C:, @Xenophôn:, @Lyokoï: and @Yug:! I'm happy that your proposal was funded :) Cheers, --Jcornelius (talk) 16:45, 19 December 2017 (UTC)
