OpenAccessReader/PrioritisingSignificance

Given a list of OA papers, we should be able to find the top ten papers that should be cited in Wikipedia. Even better if we can break this down by subject, e.g. the top ten papers about paleontology. Volunteer motivation is a finite resource, so the emphasis should be on avoiding false positives: it's okay if significant papers are missed, but editors will be frustrated if they're presented with papers that are insignificant or have already been cited.

The most obvious marker of significance is the number of citations. If something has been cited 10,000 times, it almost certainly should feature somewhere in Wikipedia. However, other paper-level metrics may help improve the ranking.

How can we establish number of citations? (I suspect this should be available for most OA papers)
What other paper-level metrics can we find?
How can we use these to best rank papers by significance?
Can we de-duplicate, i.e. remove suggestions of papers that are already cited?
- This paper describes a mechanism for scraping citations: "References that anyone can edit: review of Wikipedia citations in peer reviewed health science literature"
How well does this ranking compare between subject areas?
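
As a starting point for discussing these questions, here is a minimal sketch of a citation-count ranking with de-duplication and a per-subject breakdown. It is not an agreed design: the `Paper` record, the `top_papers_by_subject` function, the `min_citations` threshold and the example DOIs are all hypothetical, and it assumes we already have a citation count and a subject label for each paper, plus a list of DOIs already cited in Wikipedia.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Paper:
    # Hypothetical record; the field names are assumptions for illustration.
    doi: str
    title: str
    subject: str
    citations: int


def normalise_doi(doi: str) -> str:
    """Lower-case a DOI and strip common prefixes so duplicates match."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/",
                   "https://dx.doi.org/", "http://dx.doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi


def top_papers_by_subject(papers, already_cited, n=10, min_citations=100):
    """Return {subject: up to n papers}, most cited first.

    Papers already cited in Wikipedia are dropped, and so are papers below
    min_citations. The threshold is an arbitrary placeholder: it trades
    missed papers for fewer false positives.
    """
    cited = {normalise_doi(d) for d in already_cited}
    by_subject = defaultdict(list)
    for paper in papers:
        if normalise_doi(paper.doi) in cited:
            continue  # de-duplicate: editors should not see this again
        if paper.citations < min_citations:
            continue  # too little evidence of significance
        by_subject[paper.subject].append(paper)
    return {
        subject: sorted(group, key=lambda p: p.citations, reverse=True)[:n]
        for subject, group in by_subject.items()
    }


# Made-up example data, for illustration only.
example_papers = [
    Paper("10.1234/example.1", "Paper A", "paleontology", 10000),
    Paper("10.1234/example.2", "Paper B", "paleontology", 5000),
    Paper("10.1234/example.3", "Paper C", "paleontology", 12),
    Paper("10.1234/example.4", "Paper D", "genetics", 800),
]
example_cited = ["https://doi.org/10.1234/EXAMPLE.2"]

for subject, ranked in top_papers_by_subject(example_papers, example_cited).items():
    print(subject, [p.title for p in ranked])
```

Ranking within each subject separately sidesteps, but does not answer, the question of whether raw citation counts are comparable between subject areas. Other paper-level metrics could later replace the `citations` sort key without changing the filtering steps.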

Petr Knoth Proposal 3rd November

Petr Knoth, a subject matter expert in (alt)bibliometrics from the CORE team, proposes:

We will experiment with the use of traditional bibliometric and Altmetric data, which are paradoxically not widely available in Open Access collections due to data sparsity and current technical limitations of publishers, assessing both their advantages and disadvantages when applied to Open Access research outputs. We will then aim at developing a new ranking method specifically designed for the Open Access domain mitigating the identified disadvantages. The underlying idea of the method is that the contribution of a research paper cannot be assessed purely based on the number of interactions in a scholarly communications network (as is the case in bibliometrics and Altmetrics), but requires evidence of uptake and quality. This will likely lead us in the direction of Semantometrics (Knoth & Herrmannova, 2014) and measures taking as evidence of contribution auxiliary data, such as patents, research data, open software, etc.

@mattjhodgkinson recommends Lagotto

Lagotto is an Open Source application started in March 2009 by the Open Access publisher Public Library of Science (PLOS). Lagotto retrieves data from a wide set of services (sources). Some of these sources represent the actual channels where users are directly viewing, sharing, discussing, citing, and recommending the works (e.g., Twitter and Mendeley). Others are third-party vendors which provide this information (e.g., CrossRef for citations).