University of Virginia/WikiCite for education
WikiCite for education is a project to better curate the metadata for academic publications in the field of education. We estimate that in all academic journals and other scholarly repositories there are 1-2 million research publications on education. Analyzing publications in academic journals is easiest because those papers share the most similarity in format, but this project may consider other documents including white papers, impact research, practice recommendations, preprints, or research notes.
The primary objective of the project is to recommend appropriate academic papers when a user describes what kind of research they want. This project will achieve this through topic tagging to identify the subjects of papers. Secondary outcomes of this project include curating the corpus of papers to analyze, describing social and ethical challenges to creating a cataloging system, and documenting the general process well enough to be a model for curating scholarly literature in any field and not only in education.
Challenge
Curation at scale is an important research direction to accelerate the availability of professional and academic research in all fields, including education. While better-funded fields including medicine have been able to fund high-quality manual curation of their research for decades, other fields including education have research collections with less annotation. Funding manual curation in education as a standalone solution would be prohibitively expensive. However, technology has advanced such that compiling a sample of well-curated education research could be the start of a machine learning process to automatically tag the rest of the papers at significantly lower resource cost than was previously possible.
By better cataloging papers, we make them more accessible to the benefit of any researcher who wants to quickly determine whether and where they can find research on any given topic.
Objective
- Consider all collections of academic papers in education
- Gather or create sufficient library cataloging data to describe the subjects for a subset of these papers
- Use technology to produce topic tags to further describe the rest of the papers
- Publish these terms to Wikidata through the WikiCite project
- Develop functionality which supports exploration of this content within WikiCite tools
Data
- ERIC academic records through a Jupyter notebook
- WikiProject Source Metadata - through this project, Wikidata has its own index of most PubMed-indexed papers
- WikiProject Scholia - through this project, Wikidata facilitates querying of its academic publication catalog
Deliverables
- A novel collection of education topic labels applied to academic papers in that field
- Publication of the same in Wikidata
- Development of the Wikidata environment to increase access to and use of this data
- Queries for browsing academic literature in Wikidata
- Integration with other Wikidata scholarly cataloging efforts, including the WikiCite project, author disambiguation, and association of papers and research with the authors' institutions
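As an illustration of the kind of browsing query listed among the deliverables, the following is a minimal sketch, not the project's actual query. It builds a SPARQL string for the Wikidata Query Service that lists scholarly articles tagged with a given topic through the "main subject" property (P921). The function name `build_topic_query` is hypothetical, and the example topic item Q8434 (assumed here to be "education") should be verified on Wikidata before use.

```python
import re

# Wikidata item for "scholarly article", used with P31 ("instance of").
SCHOLARLY_ARTICLE = "Q13442814"


def build_topic_query(topic_qid: str, limit: int = 20) -> str:
    """Return a SPARQL query listing scholarly articles on one topic.

    Hypothetical helper for illustration only; topic_qid is a Wikidata
    item identifier such as "Q8434" (assumed to be "education").
    """
    # Reject anything that is not a plain item identifier.
    if not re.fullmatch(r"Q[1-9]\d*", topic_qid):
        raise ValueError(f"not a Wikidata item identifier: {topic_qid!r}")
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    return (
        "SELECT ?paper ?paperLabel WHERE {\n"
        f"  ?paper wdt:P31 wd:{SCHOLARLY_ARTICLE} ;\n"
        f"         wdt:P921 wd:{topic_qid} .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}\n"
        f"LIMIT {limit}"
    )


if __name__ == "__main__":
    # Q8434 is an assumption for illustration; confirm the item first.
    print(build_topic_query("Q8434", limit=10))
```

The printed query can be pasted into the Wikidata Query Service to browse matching papers. Only the query text is built here, so the sketch runs without network access.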
Project summary
Analyzing the research literature and navigating Wikidata proved to be too complicated. The original idea was for students, faculty, and external collaborators to all be highly engaged in Wikidata; instead, the project proceeded as follows:
- The student research team at the University of Virginia did machine learning analysis on a corpus of scholarly publications in ERIC and What Works Clearinghouse, published a data schema of the topics covered by those publications, and documented their research in Data Schema to Formalize Education Research & Development Using Natural Language Processing (Q113057459)
- Another research team at InnovateEDU used their data schema to plan their own efforts to make educational research more accessible
- Daniel Mietchen and Lane Rasberry integrated parts of that same data schema into WikiCite to make it accessible in the output of Scholia for users searching education topics
Note: the Wikimedia editors can speak to the wiki aspects of this project, while other team members did everything else, especially the months of analysis that produced the data imported to Wikidata.
Research Team
- authors of Data Schema to Formalize Education Research & Development Using Natural Language Processing (Q113057459), who did machine learning at the School of Data Science at the University of Virginia
- contributors from InnovateEDU, who were subject matter experts regarding what is useful to educators
- Wikimedia editors Lane Rasberry (user:bluerasberry) and Daniel Mietchen (user:Daniel Mietchen), who imported data from this project to Wikidata