University of Virginia/WikiCite for education

This page documents a completed research project.


WikiCite for education is a project to better curate the metadata for academic publications in the field of education. We estimate that academic journals and other scholarly repositories together hold 1 to 2 million research publications on education. Papers in academic journals are the easiest to analyze because they are the most uniform in format, but this project may also consider other documents, including white papers, impact research, practice recommendations, preprints, and research notes.

The primary objective of the project is to recommend appropriate academic papers when a user describes the kind of research they want. The project will do this by tagging papers with topics that identify their subjects. Secondary outcomes include curating the corpus of papers to analyze, describing the social and ethical challenges of creating a cataloging system, and documenting the general process well enough that it can serve as a model for curating scholarly literature in any field, not only education.
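To illustrate how topic tags can drive recommendations, here is a minimal sketch. It assumes a hypothetical toy corpus in which each paper already carries a set of topic tags, picks out the known tags that appear in the user's free-text description, and ranks papers by Jaccard similarity between those tags and each paper's tags. The corpus, tag names, and function names are invented for illustration; this is not the project's actual method.

```python
import re

# Hypothetical toy corpus: paper title -> set of topic tags (illustrative only).
PAPERS = {
    "Peer tutoring in middle school math": {"tutoring", "mathematics", "middle school"},
    "Phonics instruction for early readers": {"literacy", "phonics", "early childhood"},
    "Online tutoring during school closures": {"tutoring", "online learning"},
}


def tags_in_description(description, vocabulary):
    """Return the known tags that appear as whole words in the user's description."""
    text = description.lower()
    return {tag for tag in vocabulary if re.search(r"\b" + re.escape(tag) + r"\b", text)}


def recommend(description, papers, top_k=3):
    """Rank papers by Jaccard similarity between their tags and the tags found in the description."""
    vocabulary = set().union(*papers.values())
    wanted = tags_in_description(description, vocabulary)
    if not wanted:
        return []
    scored = []
    for title, tags in papers.items():
        score = len(wanted & tags) / len(wanted | tags)
        if score > 0:
            scored.append((score, title))
    # Highest score first; ties broken alphabetically by title for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored[:top_k]]


print(recommend("I want research on tutoring in mathematics", PAPERS))
```

A real system would need a much larger controlled vocabulary and some handling of synonyms, since a user who writes "math" instead of "mathematics" would not match any tag in this sketch.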

Challenge

Curation at scale is an important research direction for accelerating the availability of professional and academic research in all fields, including education. Better-funded fields such as medicine have been able to pay for high-quality manual curation of their research for decades, but other fields, including education, have research collections with far less annotation. Funding manual curation in education as a standalone solution would be prohibitively expensive. However, technology has advanced to the point that a well-curated sample of education research could seed a machine learning process that automatically tags the rest of the papers at a much lower cost than was previously possible.
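One simple way to let a manually curated sample "seed" automatic tagging is k-nearest-neighbor tag propagation: each untagged abstract receives the tags held by a majority of its most similar curated abstracts. The sketch below assumes bag-of-words cosine similarity and a majority vote; the example abstracts, tags, and the choice of k are hypothetical, and this is an illustration of the general idea, not the technique the research team used.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased word counts for a piece of text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def propagate_tags(labeled, unlabeled, k=3):
    """Tag each unlabeled abstract with the tags held by a majority of its k most similar labeled abstracts.

    labeled: list of (abstract, set_of_tags) pairs from manual curation.
    unlabeled: list of abstracts that have no tags yet.
    """
    labeled_vectors = [(bag_of_words(text), tags) for text, tags in labeled]
    results = []
    for text in unlabeled:
        vector = bag_of_words(text)
        neighbours = sorted(labeled_vectors, key=lambda item: cosine(vector, item[0]), reverse=True)[:k]
        votes = Counter(tag for _, tags in neighbours for tag in tags)
        # Keep a tag only if more than half of the neighbours carry it.
        results.append({tag for tag, n in votes.items() if n > len(neighbours) / 2})
    return results


# Hypothetical curated sample and untagged abstracts.
curated = [
    ("peer tutoring improved math scores", {"tutoring", "mathematics"}),
    ("one on one tutoring for struggling math students", {"tutoring", "mathematics"}),
    ("phonics lessons improved reading fluency", {"literacy"}),
    ("reading aloud builds early literacy", {"literacy"}),
]
untagged = ["tutoring sessions raised math achievement", "daily reading practice and phonics"]
print(propagate_tags(curated, untagged))
```

In practice the quality of the propagated tags depends heavily on how representative the curated sample is, which is one reason the curation step matters.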

Better cataloging makes papers more accessible, benefiting any researcher who wants to quickly determine whether, and where, they can find research on a given topic.

Objective

  1. Consider all collections of academic papers in education
  2. Gather or create sufficient library cataloging data to describe the subjects for a subset of these papers
  3. Use technology to produce topic tags to further describe the rest of the papers
  4. Publish these terms to Wikidata through the WikiCite project
  5. Develop functionality that supports exploration of this content within WikiCite tools

Data

Deliverables

  1. A novel collection of education topic labels applied to academic papers in that field
  2. Publication of these labels in Wikidata
  3. Development of the Wikidata environment to increase access to and use of this data
    1. Queries for browsing academic literature in Wikidata
    2. Integration with other Wikidata scholarly cataloging efforts, including the WikiCite project, author disambiguation, and the association of papers and research with the authors' institutions
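As an illustration of the kind of browsing query listed above, the sketch below builds a SPARQL query for the Wikidata Query Service that lists scholarly articles (instance of, P31, scholarly article, Q13442814) whose main subject (P921) is a given topic item, newest publication date (P577) first. It assumes topic tags are recorded on papers as "main subject" statements; `Q12345` is a placeholder, not a real education topic, and the function name is hypothetical.

```python
import re


def build_topic_query(topic_qid, limit=50):
    """Return a SPARQL query listing scholarly articles whose main subject is topic_qid."""
    # Accept only well-formed item IDs so arbitrary text cannot be injected into the query.
    if not re.fullmatch(r"Q[1-9][0-9]*", topic_qid):
        raise ValueError(f"not a Wikidata item ID: {topic_qid!r}")
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    return f"""SELECT ?work ?workLabel ?date WHERE {{
  ?work wdt:P31 wd:Q13442814 ;
        wdt:P921 wd:{topic_qid} .
  OPTIONAL {{ ?work wdt:P577 ?date . }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }}
}}
ORDER BY DESC(?date)
LIMIT {int(limit)}"""


# Placeholder topic ID; substitute the Wikidata item for a real education topic.
print(build_topic_query("Q12345", limit=20))
```

The printed query can be pasted into the Wikidata Query Service interface to check its results before it is used in any tool.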

Project summary

Analyzing the research literature and navigating Wikidata proved too complicated for the original plan, in which students, faculty, and external collaborators were all to be highly engaged in Wikidata. Instead, the project proceeded as follows:

  1. The student research team at the University of Virginia performed machine learning analysis on a corpus of scholarly publications in ERIC and the What Works Clearinghouse, published a data schema of the topics covered by those publications, and documented their research in Data Schema to Formalize Education Research & Development Using Natural Language Processing (Q113057459)
  2. Another research team, at InnovateEDU, used that data schema to plan its own efforts to make educational research more accessible
  3. Daniel Mietchen and Lane Rasberry integrated parts of that same data schema into WikiCite to make it accessible in the output of Scholia for users searching education topics

Note: the Wikimedia editors can speak to the wiki aspects of this project. Other team members did everything else, including the months of analysis that produced the data imported into Wikidata.

Research Team