This project is funded by an Individual Engagement Grant

Welcome to this project's midpoint report! This report shares progress and learnings from the Individual Engagement Grantee's first 3 months.

Summary

Open Access Reader (OAR) is a project to systematically ensure that all significant open access research is cited in Wikipedia by:

developing a tool that identifies missing citations
nurturing a volunteer community to use this tool

We'd like to apply for more funding to:

Commission the CORE team to produce the backend functionality we need ($15,000)
Build a basic UX ($?)
Start growing a test community ($?)

The long term goals for the project should be:

Growing a large, productive product user group, from within the existing WP contributor community and other aligned communities
Sustaining stable product development support

Methods and activities

How have you set up your project, and what work has been completed so far?

Describe how you've set up your experiment or pilot, sharing your key focuses so far and including links to any background research or past learning that has guided your decisions. List and describe the activities you've undertaken as part of your project to this point.

In the initial 3 month exploratory period, we validated and socialised our concept and produced a small but promising proof of concept, as well as doing the research necessary to build the first functioning prototype.

More specifically, we produced:

Strong recommendations to use CORE as the source for OA metadata.
A proof of concept generated from a static metadata dump from CORE.
Discovery of research outlining a method of matching OA articles from CORE to Wikipedia categories.
A set of wireframes for a desktop UI, with mockups coming soon.
A proposal from the CORE team to produce and support:
- the backend required to supply open access metadata in the form we require for OAR by augmenting their existing API.
- a considered and justified ranking methodology.
Discovery of Citoid, a tool to automatically generate correct citation links.
A press list for a campaign to develop a crowdsourcing community.

Overview & Motivation

In organising and programming Wikimania, I got a very wide overview of the current developments and challenges going on inside the Wikimedia movement as a whole. This project was inspired by a few strands of that experience, particularly the following observations:

The Open Access movement in academia is only a few years old, but already millions of papers a year are being released under open licenses.
Academic output is a significant source of content for Wikipedia.
Due to the sheer volume, discoverability of academic papers is very poor, despite modern tools like Google Scholar. This is a challenge even for seasoned academics, and a major reason why research tends to gather in topical silos - itself a pattern Wikipedia could help address in the long term.
Academics don't typically engage with Wikipedia much, though this is starting to change.
In contrast, OA publishers find Wikipedia to be a major source of traffic - i.e. people find academic work largely through Wikipedia (!)
Most academic papers are surprisingly intelligible to a layman; however, most laymen don't know how or where to find them (and they have historically been behind paywalls).
The data ecosystem around Wikipedia (Wikidata and Labs) is maturing quickly, but the potential isn't widely known within the WP community.
New (Wikimedia) contributors find it difficult to imagine tasks that are simultaneously within their competence, and that feel significant enough to be motivating.

Open Access Reader is based around the core functionality of taking an Open Access library, and removing from it papers that are already cited in Wikipedia. This functionality could be used:

As a tool for experienced editors, allowing subject or even article level discovery of new papers to integrate => better quality articles
As a tool for new volunteers, giving them motivating yet well defined editing tasks => new contributors
As part of a campaign to strengthen the link between the academic community and Wikimedia => expert contributors / increased legitimacy for Wikimedia
As an example of a project using open data, both from the Wikimedia ecosystem, and from the Open Access ecosystem, that can be understood by people that aren’t particularly familiar with either, perhaps inspiring similar projects from the open data community in general => more technical contributors

Furthermore, the project seems to:

have a quickly achievable MVP, or at least impressive proof of concept demo.
have an output that would be easy to measure and evaluate (users, citations added)
not replicate any similar work
be useful in perpetuity (if supported)

The main steps in the project are:

Source OA metadata
Remove existing citations
Rank papers
Filter papers
Provide a UX

Doing these things in a basic way is quite straightforward, but quality of the output will be low. Most of the exploratory period was spent researching and communicating with subject matter experts online and at events, to try and find out exactly what resources or methods already exist, and generally which parts of the project would be the most challenging.
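As a rough illustration of how basic the "remove existing citations" step can be, the sketch below treats it as a set difference on DOIs. This is only an assumption made for the example: the field names `doi` and `title`, the sample records, and the helper functions are hypothetical, and the report does not say which identifier OAR actually matches on. Papers without a DOI, or cited on Wikipedia in another form, would slip through, which is one reason the output quality of such a basic approach is low.

```python
def normalise_doi(doi):
    """Lower-case a DOI and strip common prefixes so equivalent forms compare equal."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://dx.doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def remove_already_cited(papers, wikipedia_dois):
    """Return the papers whose DOI is not among the DOIs already cited on Wikipedia."""
    cited = {normalise_doi(d) for d in wikipedia_dois}
    return [p for p in papers if normalise_doi(p["doi"]) not in cited]


# Hypothetical sample data, standing in for an OA metadata dump
# and a list of DOIs extracted from Wikipedia citations.
papers = [
    {"doi": "10.1000/a", "title": "Paper A"},
    {"doi": "10.1000/b", "title": "Paper B"},
    {"doi": "10.1000/c", "title": "Paper C"},
]
wikipedia_dois = ["doi:10.1000/B", "https://doi.org/10.1000/c"]

uncited = remove_already_cited(papers, wikipedia_dois)
print([p["title"] for p in uncited])
```

With this sample data only Paper A remains, since the other two DOIs already appear (in different written forms) in the Wikipedia list.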

Midpoint outcomes

What are the results of your project or any experiments you’ve worked on so far?

Please discuss anything you have created or changed (organized, built, grown, etc) as a result of your project to date.

Scoping out project path

Given the amount of funding, my primary aim was to invest time in reviewing the relevant expertise available, in order to inform good decisions and a viable proof of concept demo. Relevant activities were:

Attending conference
Met with Petra from CORE
Corresponded with project advisors to seek feedback on feasibility and potential pitfalls
Research on bibliometrics to inform decisions on significance filter

Proof of Concept Demo

We decided to prioritise some kind of example output. This meant that we had to tackle every step of the process at some level, and it stopped us getting distracted by difficulties in any one area. Additionally, a convincing demo helps grow support for a project, attracting talent, funding and volunteers.

Identify best Open Access Aggregator - https://tools.wmflabs.org/oar/samplemetapretty.json
Find simplistic but quick-to-implement significance filter for the sample - number of citations in other literature in that sample
Explore correspondence between aggregator papers and Wikipedia citations
Publish static output & elicit feedback
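The simplistic significance filter described above (the number of citations a paper receives from other literature in the same sample) could be sketched roughly as follows. The record fields `id` and `references` and the sample data are hypothetical, chosen for illustration rather than taken from the CORE metadata format, and the tie-breaking rule is an arbitrary choice made so that the ordering is deterministic.

```python
from collections import Counter


def rank_by_in_sample_citations(papers):
    """Rank papers by how many other papers in the same sample cite them."""
    sample_ids = {p["id"] for p in papers}
    counts = Counter({pid: 0 for pid in sample_ids})
    for p in papers:
        # set() so that a paper listing the same reference twice counts once
        for ref in set(p.get("references", [])):
            # Ignore references outside the sample and self-citations
            if ref in sample_ids and ref != p["id"]:
                counts[ref] += 1
    # Highest count first; ties broken by id
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical sample: "X" is cited but is not part of the sample.
sample = [
    {"id": "A", "references": []},
    {"id": "B", "references": ["A"]},
    {"id": "C", "references": ["A", "B", "X"]},
    {"id": "D", "references": ["A", "A", "D"]},
]

ranking = rank_by_in_sample_citations(sample)
print(ranking)
```

Because the count only sees citations inside the sample, a paper that is heavily cited elsewhere can still rank at the bottom, which illustrates why this measure is quick to implement but not very reliable.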

Creation of wireframe for UX

This was a cosmetic exercise, but it was important to demonstrate the user interface in order to show the applicability of the project to different audiences. It helps to illustrate how the API can be implemented as part of a workflow, and the potential for a relatively inexperienced user to rapidly and efficiently improve the quality of articles with highly relevant academic research.

Development of productive partnership with CORE

This is central to the ongoing viability of the project. The most cost-effective way of delivering the functionality of OAR is by piggybacking onto an established community and resource with aligned goals. Petra was prepared to provide a projected cost for the API augmentation and is invested in its successful integration; working with CORE in this way is good value and far lower risk than attempting to set up our own team from scratch. It is highly likely that this partnership can continue and support the complicated task of creating a functional topic filter.

Finances

Budget spent up to the midpoint was 3275 USD => 1971 GBP

On 10 Nov 2014:

1250 GBP was paid to User:A930913 for software development.
721 GBP was paid to User:EdSaperia for project management.

Learning

The best thing about trying something new is that you learn from it. We want to follow in your footsteps and learn along with you, and we want to know that you are taking enough risks to learn something really interesting! Please use the below sections to describe what is working and what you plan to change for the second half of your project.

What are the challenges

What challenges or obstacles have you encountered? What will you do differently going forward? Please list these as short bullet points.

What is working well

What have you found works best so far? To help spread successful strategies so that they can be of use to others in the movement, rather than writing lots of text here, we'd like you to share your finding in the form of a link to a learning pattern.

Your learning pattern link goes here

Next steps and opportunities

Simple Citation Hitlist Tool

Taking our proof of concept to something that we can deploy to users.

Design a robust significance ranking

The current system is the simplest and most convenient, but it isn't accurate or reliable. The CORE team are experts in bibliometrics and have invited us to commission them to create a better one.

Produce a system that generates a live list of most significant papers.

The current system runs off a static and partial data dump. We will build a product that runs off the live CORE and live Wikipedia databases.

UX design
Design metrics

We'll decide which actions the tool will measure and create an analytics dashboard, allowing us to evaluate results and continue to improve the design.

Share functional tool around:
- General Wikimedia communities

Via mailing lists, community spaces (village pump), active wikiprojects, etc.

- Open (Access) communities

Via mailing lists, publications, influencers.

- Potential volunteer communities

Via PR.

Mature tool with topic filtering

Create topic filtering functionality, making the tool more useful for specific topic communities (e.g. wikiprojects, university courses)

Assess paper metadata
Research, design & implement topic filtering
Update UX
Publicise within topic communities (wikiprojects, etc.)

Grantee reflection

We’d love to hear any thoughts you have on how the experience of being an IEGrantee has been so far. What is one thing that surprised you, or that you particularly enjoyed from the past 3 months?

Grants:IEG/Open Access Reader/Midpoint
