Grants:Programs/Wikimedia Research Fund/Using Computational Linguistics to Generate Systematic Reviews for WikiProjects: A Prototype for Invasion Biology
Affiliation or grant type
Research proposal
Description of the proposed project, including aims and approach. Be sure to clearly state the problem, why it is important, why previous approaches (if any) have been insufficient, and your methods to address it.
In this project, we aim to develop Natural Language Processing (NLP) tools that support the automated review of scientific publications and related data in a given knowledge domain. The purpose is to create and update materials that help contributors maintain and improve content on Wikimedia projects. Our current method uses n-grams to identify relevant articles within a large cluster of publications. We plan to improve this method by incorporating additional quantitative measures, which we will test with several use cases drawn from WikiProject Invasion biology. This will allow us to identify relevant literature on a chosen research topic more efficiently.
With the growing amount of scientific literature being published and made available online, it can be difficult to identify research relevant to a specific topic. NLP techniques can help create and maintain science-related content on Wikimedia projects like Wikipedia. By using computational methods, Wikimedia projects can more easily keep their science content up to date and accurate.
Creating, maintaining, or updating Wikimedia content on a scientific topic often involves conducting a literature review, similar to writing a Systematic Review (SR) for a journal. SRs rely on a set of techniques that allow researchers to gather and summarize scientific papers in a consistent and reproducible way. When two researchers independently conduct an SR on the same topic, they should arrive at a similar selection of articles for their final review. This helps to ensure the reliability and reproducibility of the review process.
The step of curating publications for an SR needs improvement because the articles are currently selected mostly by hand. To improve SRs, we will optimize our current methods and explore a scoring system based on n-grams that can reduce a large cluster of publications (LCP), like that of WikiProject Invasion biology (currently ca. 40,000 publications), to a smaller cluster of publications (SCP). By comparing publications within and outside of the SCP, we can estimate parameters like semantic similarity and relatedness across hundreds or thousands of documents. The primary test cases will be an invasive species, an invaded locality and a specific invasion type, complemented with additional examples as needed.
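As a rough illustration of the idea of reducing an LCP to an SCP with n-grams, the following minimal sketch scores each document by the fraction of its word bigrams that appear in a topic set and keeps those above a threshold. This is not the project's actual scoring system: the function names, the scoring formula, the threshold value and the toy documents are all assumptions made for the example.

```python
import re


def ngrams(text, n=2):
    """Return the list of word n-grams found in the lowercased text."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_score(text, topic_ngrams, n=2):
    """Fraction of a document's n-grams that occur in the topic set.

    Hypothetical scoring rule chosen for illustration only.
    """
    grams = ngrams(text, n)
    if not grams:
        return 0.0
    hits = sum(1 for g in grams if g in topic_ngrams)
    return hits / len(grams)


def reduce_cluster(lcp, topic_ngrams, threshold=0.1, n=2):
    """Keep the documents whose score meets the threshold (the SCP)."""
    return {
        doc_id: text
        for doc_id, text in lcp.items()
        if ngram_score(text, topic_ngrams, n) >= threshold
    }


# Toy stand-in for a large cluster of publications (invented titles).
lcp = {
    "A": "The zebra mussel is an invasive species in the Great Lakes",
    "B": "Zebra mussel colonisation alters freshwater food webs",
    "C": "Soil nitrogen dynamics in temperate grasslands",
}

# Hypothetical topic n-grams for the test case "an invasive species".
topic = {"zebra mussel", "invasive species"}

scp = reduce_cluster(lcp, topic, threshold=0.1)
print(sorted(scp))
```

In this toy run, documents A and B are retained and C is dropped. A real workflow would need to derive the topic n-grams and the threshold from data rather than fix them by hand.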
Approximate amount requested in USD.
Briefly describe what you expect to spend money on (specific budgets and details are not necessary at this time).
This budget supports the software and data aspects of integrating NLP workflows for Systematic Reviews with Wikidata. It also covers preparing the results for public dissemination and reporting to the Wikimedia community and the Wikimedia Foundation.
Besides the applicants, it supports a PhD candidate at the University of São Paulo who will perform some of the NLP-related tasks.
Address the impact and relevance to the Wikimedia projects, including the degree to which the research will address the 2030 Wikimedia Strategic Direction and/or support the work of Wikimedia user groups, affiliates, and developer communities. If your work relates to knowledge gaps, please directly relate it to the knowledge gaps taxonomy.
This project aligns with three points of the 2030 Wikimedia Strategic Direction, which are:
(1) Improve User Experience
(2) Manage Internal Knowledge
(3) Innovate in Free Knowledge
Plans for dissemination.
• Continuing our presentations through Wikimedia Research-related venues.
• Resulting scholarly publications in open-access journals.
• A PhD candidate will prepare and give short presentations about this work at different institutes of the University of São Paulo.
• Bringing together researchers who have already produced Systematic Reviews and wish to re-assess their work.
• Bringing together MSc and PhD candidates willing to produce a new Systematic Review using the proposed methods and code.
Past Contributions
Prior contributions to related academic and/or research projects and/or the Wikimedia and free culture communities. If you do not have prior experience, please explain your planned contributions.
Two applicants (Rasberry and Mietchen) are active long-term Wikimedia contributors, while Andutta (User:Fpa1981) started contributing more recently. We started this project about a year ago and have since given a few presentations at the following Wiki conferences:
I agree to license the information I entered in this form, excluding the pronouns, countries of residence, and email addresses, under the terms of Creative Commons Attribution-ShareAlike 4.0. I understand that the decision to fund this Research Fund application, and the application itself along with all the information entered by me in this form, excluding the pronouns, countries of residence, and email addresses of the personnel, will be published on Wikimedia Foundation Funds pages on Meta-Wiki and will be made available to the public in perpetuity. To make the results of my research actionable and reusable by the Wikimedia volunteer communities, affiliates and Foundation, I agree that any output of my research will comply with the WMF Open Access Policy. I also confirm that I have read the privacy statement and agree to abide by the WMF Friendly Space Policy and Universal Code of Conduct.
Please add any feedback or endorsements to the grant discussion page only. Any feedback added here may be removed.