Grants:IdeaLab/Open Access Reader

status: idea
Open Access Reader
idea creator:
project contact:
A process to systematically have every piece of notable open access research cited in Wikipedia.
created on: 16:53, 3 March 2014

This is now proposed as a grant.

Project idea

What is the problem you're trying to solve?

There's a lot of great research being published in good-quality open access journals that isn't cited in Wikipedia. It's peer reviewed, so it should count as a reliable source. It's available for anyone to read and probably comes with pretty decent metadata too. Can we set up a process that makes it super convenient for editors to find and cite these papers?

What is your solution?

Roughly speaking, my proposed solution works like this:

  1. Pick a respected major repository, e.g. PLOS, to trial this with. More can be added over time. Ideally, piggyback on another project that is trying to aggregate open access repositories, e.g. CORE.
  2. Create a notability filter that helps decide whether a given academic paper is likely to be notable, e.g. by requiring a minimum number of (academic) citations. There are various sources for this type of statistic. We can make this filter more or less strict depending on the community's ability to cope with the output.
  3. Create a dictionary somewhere in Wikimedia that matches paper metadata to WikiProjects, e.g. documents in PLOS with the metadata keyword "Paleontology" will probably be relevant to WikiProject Paleontology. If the metadata is fine-grained enough, it may even be possible to match keywords to specific article talk pages. This dictionary will be open and editable, so the community can help populate and correct it. If there is no obvious match, the default behaviour will be to skip the paper, i.e. papers with no obvious category will be ignored. This makes it very easy to start with a completely empty dictionary and build it up slowly, one keyword at a time.
  4. Set up a process (i.e. a bot) that regularly checks for new papers that pass the notability filter and, using the dictionary, suggests them on the relevant WikiProjects or talk pages as worth adding to articles. Each suggestion can be a neat template that gives the abstract and a pre-formatted reference tag, making life very easy for editors. It could be appended to a pre-existing page or talk page (determined by the dictionary), or the bot could automatically create and append to a new page specifically created for the purpose, e.g. or similar. In effect, this would be a regular newsletter of "latest research".
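
As a rough illustration of steps 2 to 4, the sketch below filters papers by citation count, routes them through the keyword dictionary (skipping any paper with no match) and renders a suggestion containing the abstract and a ready-to-copy {{cite journal}} reference. It is a minimal Python sketch under stated assumptions: the Paper fields, the threshold of 10 citations and the single dictionary entry (including its target page name) are hypothetical, and fetching papers from a repository or posting to wiki pages is left out entirely.

```python
from dataclasses import dataclass, field


@dataclass
class Paper:
    """Hypothetical metadata record for one open access paper."""
    title: str
    journal: str
    year: int
    doi: str
    abstract: str
    keywords: list[str] = field(default_factory=list)
    citation_count: int = 0


# Step 3: community-editable dictionary from metadata keyword to target page.
# It can start empty; this single entry and its page name are illustrative.
KEYWORD_TARGETS: dict[str, str] = {
    "Paleontology": "Wikipedia talk:WikiProject Palaeontology",
}


def passes_notability(paper: Paper, min_citations: int = 10) -> bool:
    """Step 2: a deliberately simple filter on academic citation count."""
    return paper.citation_count >= min_citations


def find_targets(paper: Paper, dictionary: dict[str, str]) -> list[str]:
    """Step 3: matching target pages; an empty list means the paper is skipped."""
    targets: list[str] = []
    for keyword in paper.keywords:
        target = dictionary.get(keyword)
        if target is not None and target not in targets:
            targets.append(target)
    return targets


def format_suggestion(paper: Paper) -> str:
    """Step 4: wikitext with the abstract and a reference tag ready to copy."""
    ref = "<ref>{{cite journal |title=%s |journal=%s |year=%d |doi=%s}}</ref>" % (
        paper.title,
        paper.journal,
        paper.year,
        paper.doi,
    )
    # <nowiki> shows the raw tag on the suggestion page instead of rendering it.
    return "== %s ==\n%s\n\n<nowiki>%s</nowiki>\n" % (paper.title, paper.abstract, ref)


def build_suggestions(
    papers: list[Paper], dictionary: dict[str, str], min_citations: int = 10
) -> dict[str, list[str]]:
    """Group formatted suggestions by target page, skipping unmatched papers."""
    by_target: dict[str, list[str]] = {}
    for paper in papers:
        if not passes_notability(paper, min_citations):
            continue
        for target in find_targets(paper, dictionary):
            by_target.setdefault(target, []).append(format_suggestion(paper))
    return by_target
```

Old-style % formatting is used for the wikitext so that the template's double braces do not clash with the brace syntax of Python f-strings.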

Project goals

Academic work is some of the best content out there, but even when it's open access, its discoverability is poor. However, one thing the Wikimedian community is great at is distributed categorisation. Let's put this to work and get cutting-edge academic work cited in our encyclopaedia!

This project will be primarily about adding extra content to existing articles, rather than creating new articles based on academic work. Notability of the paper is therefore less important than finding articles to which the work is highly relevant, since the existence of an article already presupposes that its topic is notable. The key will be sufficiently specific metadata supplied by the repository.

The first step will be to check the feasibility of each of these steps by socialising the proposal. The steps above will need elaboration and refinement, but the basic premise (systematically suggesting academic materials for citation) is hopefully sound. This would lead to a more detailed specification.

Next, I'd try to build a minimal, functioning end-to-end solution:

  • One source repository, ideally a friendly one that is aware of and supports the initiative.
  • A very simple filter using the easiest available metadata, built in a modular way. Begin with emphasis on avoiding false positives.
  • A small manually populated dictionary covering just one subject area.
  • A single WikiProject on en-wp whose members are aware of and support the initiative.

Then we can test the workflow in a controlled way.
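
The "very simple filter ... built in a modular way" could be sketched as a set of small, independent checks combined so that a paper must pass all of them, which errs on the side of avoiding false positives. This is a minimal Python sketch only; the metadata field names ("citations", "peer_reviewed", "abstract") and the threshold of 20 citations are illustrative assumptions, not values taken from any repository or agreed by the community.

```python
from typing import Callable, Mapping

# Papers are plain metadata mappings here; the field names are assumptions.
Paper = Mapping[str, object]
Check = Callable[[Paper], bool]


def min_citations(n: int) -> Check:
    """Pass papers with at least n academic citations (missing counts as 0)."""
    return lambda paper: int(paper.get("citations") or 0) >= n


def is_peer_reviewed(paper: Paper) -> bool:
    """Reject anything not explicitly flagged as peer reviewed."""
    return paper.get("peer_reviewed") is True


def has_abstract(paper: Paper) -> bool:
    """The suggestion template needs an abstract to show editors."""
    return bool(paper.get("abstract"))


def all_of(*checks: Check) -> Check:
    """Combine checks conservatively: a paper must pass every one of them."""
    return lambda paper: all(check(paper) for check in checks)


# Start strict to avoid false positives; loosen later by swapping checks in or out.
simple_filter = all_of(is_peer_reviewed, has_abstract, min_citations(20))
```

Because each check is an ordinary function, a better notability signal or a check for a new source repository can be added without touching the others.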

I'll publicise this experiment at Wikimania, and try to get volunteers to improve and expand the scope of each step:

  • Adding additional source repositories.
  • Better filters that
    • more accurately identify notable papers
    • filter out work that's already cited in wp to prevent duplicated suggestions
  • A dictionary that usefully copes with
    • Multiple source repositories
    • Different levels of precision of metadata (medicine vs surgery vs angioplasty vs percutaneous coronary intervention)
    • Proposing citations to various targets: WikiProjects, talk pages, other places I haven't considered yet, perhaps to multiple destinations at once.
  • Collaborations with more WikiProjects from a variety of topic areas (including the humanities and arts as well as the sciences, as open access in these areas improves).
  • Improvements to the suggestion template:
    • Improve workflow for editors to cite suggested papers.
    • Editor feedback buttons built into the template ("This article is not relevant to this topic", "Already cited here", "Too many suggestions"). These could feed back directly into the dictionary as red flags.
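
Three of the improvements listed above (coping with metadata at different levels of precision, skipping work that is already cited, and turning editor feedback into red flags) might fit together as in this minimal Python sketch. It assumes a hypothetical community-maintained map from each term to a broader term, a set of DOIs already cited on en-wp obtained by some other means, and red flags stored as (term, target) pairs; none of these structures is specified in the proposal, and the target page name is a placeholder.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Dictionary:
    """Hypothetical keyword dictionary that understands broader/narrower terms."""
    targets: dict[str, str] = field(default_factory=dict)  # term -> target page
    broader: dict[str, str] = field(default_factory=dict)  # term -> parent term
    red_flags: set[tuple[str, str]] = field(default_factory=set)  # (term, target)

    def lookup(self, term: Optional[str]) -> Optional[str]:
        """Walk from the given term to broader ones until an unflagged target is found."""
        seen: set[str] = set()
        while term is not None and term not in seen:  # `seen` guards against cycles
            seen.add(term)
            target = self.targets.get(term)
            if target is not None and (term, target) not in self.red_flags:
                return target
            term = self.broader.get(term)
        return None

    def flag(self, term: str, target: str) -> None:
        """Record editor feedback such as "This article is not relevant to this topic"."""
        self.red_flags.add((term, target))


def route(
    doi: str, keywords: list[str], dictionary: Dictionary, already_cited: set[str]
) -> list[str]:
    """Target pages for one paper; a paper whose DOI is already cited gets none."""
    if doi in already_cited:
        return []
    targets: list[str] = []
    for keyword in keywords:
        target = dictionary.lookup(keyword)
        if target is not None and target not in targets:
            targets.append(target)
    return targets


# Levels of precision taken from the list above; the target page is a placeholder.
example = Dictionary(
    targets={"medicine": "Wikipedia talk:WikiProject Medicine"},
    broader={
        "percutaneous coronary intervention": "angioplasty",
        "angioplasty": "surgery",
        "surgery": "medicine",
    },
)
```

Falling back to a broader term means a very specific keyword still reaches some relevant WikiProject until someone adds a more precise entry, while a red flag only disables the one mapping an editor objected to.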

Relevant Resources
