Grants talk:Project/Automatic Extraction of Multi-lingual Text and Concept Similarity

Comments of Glrx

I would decline this proposal. Is this a research project or a software development project? In either case, the proposal does not provide concrete examples of what it will attempt to do. It sounds like it wants to do research, but then it talks about benefiting WP users by taking one article and comparing it to another; failing that, it would find similar articles. The proposal does not follow the given advice about describing contemplated tests and measurements.

What is the similarity metric? What is the training corpus? What are the parameters? See en:Word2vec.

The implication is this project will find similar topics when person A writes one article in en and person B writes a similar article in ru. I need some explanation about how that matching will work and how accurate the matching will be. In practice, person A might write about jellyfish, and person B might write about медуза; in that case, finding similar articles may not be hard; word matching in titles may suffice. I want to see some rationale that argues there are many cases when more sophisticated techniques are needed to find matching articles.

There may be other techniques beyond searching for similar words. If two articles reference the same DOI, then they may cover similar topics.
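As a rough illustration of the shared-DOI idea, the sketch below extracts DOIs from two article texts and reports how much the two sets overlap. The regular expression is a simplified assumption (real DOIs permit more characters), the function names `extract_dois` and `doi_overlap` are hypothetical, and the DOIs in the usage example are made-up placeholders, not real citations.

```python
import re

# Simplified DOI pattern (an assumption; real DOIs allow a wider character set).
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"<>|}\]]+")


def extract_dois(text):
    """Return the set of DOIs found in text, lowercased, with trailing punctuation removed."""
    return {match.rstrip(".,;)").lower() for match in DOI_PATTERN.findall(text)}


def doi_overlap(text_a, text_b):
    """Jaccard overlap of the two DOI sets; 0.0 when neither text cites a DOI."""
    dois_a, dois_b = extract_dois(text_a), extract_dois(text_b)
    if not dois_a and not dois_b:
        return 0.0
    return len(dois_a & dois_b) / len(dois_a | dois_b)


# Usage with placeholder wikitext fragments (the DOIs are invented for the example).
en_text = "Jellyfish drift. {{cite journal|doi=10.1000/abc123}} See also doi:10.1000/xyz."
ru_text = "Медуза дрейфует. {{cite journal|doi=10.1000/ABC123 |title=...}}"
print(doi_overlap(en_text, ru_text))
```

A high overlap would only suggest that the two pages may cover related ground; a shared reference does not by itself show that the articles are about the same topic.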

Furthermore, the goal does not seem like a problem that WP needs solved right now. Say a WP user uses such a tool. What does she do next? If two pages are similar, is the intent to make an interlanguage link? In that case, the user should probably have skills in both languages, and that means she may be able to judge or find the similar pages herself. If there is no matching page, then the user would need multilingual skills to translate the page.

The project does not have a well-defined scope.

Glrx (talk) 23:56, 1 March 2017 (UTC)

Similarity measure

This is a similarity measure between texts based on a Markov model. It is pretty straightforward to do; the tricky part is aligning the language models. I would say split the project in two, where one part focuses solely on generalizing the building of aligned language models. The whole thing is a lot more implementation than research. One year on this seems little to me, unless some of the work is already done.
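To make the Markov-model idea concrete, here is a minimal single-language sketch, not the proposal's actual method: it trains a character-bigram model on one text and scores a second text by its average cross-entropy under that model, where lower means more similar. The function names, the add-one smoothing, and the choice of character bigrams are all assumptions made for illustration. The sketch does not attempt the hard part named above, aligning models across languages.

```python
import math
from collections import Counter


def train_bigram_model(text):
    """Count character bigrams and their left contexts in text."""
    pair_counts = Counter(zip(text, text[1:]))
    context_counts = Counter(text[:-1])
    return pair_counts, context_counts, set(text)


def cross_entropy(model, text):
    """Average bits per transition of text under the model (add-one smoothing).

    Assumes text has at least two characters.
    """
    pair_counts, context_counts, alphabet = model
    vocab_size = len(alphabet | set(text))  # smoothing covers unseen symbols too
    pairs = list(zip(text, text[1:]))
    total_bits = 0.0
    for prev, cur in pairs:
        prob = (pair_counts[(prev, cur)] + 1) / (context_counts[prev] + vocab_size)
        total_bits -= math.log2(prob)
    return total_bits / len(pairs)


# Usage: a text that follows the training pattern scores lower than one that does not.
model = train_bigram_model("abababab")
print(cross_entropy(model, "abab") < cross_entropy(model, "bbaa"))
```

Comparing texts in different languages this way would first require mapping both into a shared symbol or concept space, which is where most of the implementation effort would likely go.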

The most interesting use of this is detecting articles whose content diverges across languages; this is a real problem that needs a proper solution. --Jeblad 13:57, 1 April 2017 (UTC)

Eligibility confirmed, round 1 2017

This Project Grants proposal is under review!

We've confirmed your proposal is eligible for round 1 2017 review. Please feel free to ask questions and make changes to this proposal as discussions continue during the community comments period, through the end of 4 April 2017.

The committee's formal review for round 1 2017 begins on 5 April 2017, and grants will be announced 19 May. See the schedule for more details.

Questions? Contact us.

--Marti (WMF) (talk) 19:53, 27 March 2017 (UTC)

Wikidata

You write "C. Detection of erroneous links between concepts in different languages." We don't link concepts in different languages to each other; instead, we link every concept to the right Wikidata item. The fact that this grant proposal doesn't mention Wikidata even once suggests to me that the writers of the proposal haven't thought enough about how their proposal interacts with the existing architecture. ChristianKl (talk) 14:54, 29 May 2017 (UTC)

Round 1 2017 decision

This project has not been selected for a Project Grant at this time.

We love that you took the chance to creatively improve the Wikimedia movement. The committee has reviewed this proposal and not recommended it for funding. This was a very competitive round with many good ideas, not all of which could be funded in spite of many merits. We appreciate your participation, and we hope you'll continue to stay engaged in the Wikimedia context.

Next steps: Applicants whose proposals are declined are welcome to consider resubmitting their application in the future. You are welcome to request a consultation with staff to review any concerns with your proposal that contributed to a decline decision, and to help you determine whether resubmission makes sense for your proposal.

Over the last year, the Wikimedia Foundation has been undergoing a community consultation process to launch a new grants strategy. Our proposed programs are posted on Meta here: Grants Strategy Relaunch 2020-2021. If you have suggestions about how we can improve our programs in the future, you can find information about how to give feedback here: Get involved. We are also currently seeking candidates to serve on regional grants committees and we'd appreciate it if you could help us spread the word to strong candidates--you can find out more here. We will launch our new programs in July 2021. If you are interested in submitting future proposals for funding, stay tuned to learn more about our future programs.

Aggregated feedback from the committee for Automatic Extraction of Multi-lingual Text and Concept Similarity

Scoring rubric	Score
(A) Impact potential: Does it have the potential to increase gender diversity in Wikimedia projects, either in terms of content, contributors, or both? Does it have the potential for online impact? Can it be sustained, scaled, or adapted elsewhere after the grant ends?	4.6
(B) Community engagement: Does it have a specific target community and plan to engage it often? Does it have community support?	3.9
(C) Ability to execute: Can the scope be accomplished in the proposed timeframe? Is the budget realistic and efficient? Do the participants have the necessary skills and experience?	4.3
(D) Measures of success: Are there both quantitative and qualitative measures of success? Are they realistic? Can they be measured?	2.7
Additional comments from the Committee:

The proposal may have some relation to Wikimedia's strategic priorities, although its online impact and sustainability are hard to assess as there are no well-defined goals, measures of success or plan. The only (distant) fit with strategic priorities is via encouraging innovation. The potential for online impact is low. The proposal lacks well-defined goals and measures of success; therefore the risks seem to be very high. It is an innovative project, but there is a high risk of this project having no interest beyond a scientific one, and measures of success are vague.

The budget is actually lacking. The ability of the applicant to execute the project is hard to assess as there is no suitable track record. The applicant has little Wikimedia-related experience -- only six edits across all Wikimedia projects! The budget is unclear. I do not know what they want to cover. Most likely this will be executed within 12 months, and participants will be qualified enough to do this, but I am not sure the budget is really efficient.

Very little community involvement. No clear community support. No community engagement. The potential grantee does not seem to be active in the Wikimedia movement and may have trouble engaging the community.

The proposal is generally very poorly written - no clear goals, plan, budget or measures of success. I want to recommend rejection. I don't know how relevant this is to the community but very little is known about the grantee and I’m not quite comfortable with this grant. There needs to be more community engagement and a clear budget before funding this grant. However, the proposed subject is interesting and answers a real need. This development seems to have more of a focus on scientific research than Wikimedia projects. It is not clear whether the development will have any significant practical use, as there is no proof such a feature is wanted by the community. In addition, measures of success are vague and community engagement is almost nonexistent. Thus I do not support funding.