Grants:Project/ContentMine/WikiFactMine
This project is funded by a Project Grant
Project idea
What is the problem you're trying to solve?
To make Wikidata the primary resource for identifying objects in bioscience.
ContentMine (CM) shares Wikimedia’s vision of empowering and engaging people around the world to collect, develop and disseminate open knowledge resources. For scientific information, the authoritative source of supported facts is the peer-reviewed scientific literature, and numerous Wikimedia initiatives aim to support the community in using and citing papers effectively [1][2][3]. Three key problems are (i) access to the closed literature, (ii) discovery of relevant information by editors, and (iii) linking knowledge across Wikimedia initiatives to both add meaningful relationships and reduce duplication.
CM is currently focused on extracting facts with inherently embedded citations at scale from the open and closed literature on a daily basis and releasing them into the public domain as open data. We take advantage of a UK copyright exception for text and data mining and work jointly with librarians at the University of Cambridge. This offers a solution to the access problem but we now wish to address importing those facts to Wikidata and allowing Wikidata curators and Wikipedia editors to access them in a useful form to enrich Wikimedia content with quality, peer-reviewed information and citations. From a technical standpoint, the problem can be summed up as:
Wikidata should be the primary resource for resolving objects in bioscience, but it is not yet well known and is often “very patchy”.
For example, many Wikidata entries/properties have no Wikipedia pages (see a typical stub page of “red links” for Trigonella spp.). In other cases, they have few or no recent authoritative peer-reviewed citations. For example, some enzyme-based stubs have no citations from the last 30 years, but ContentMine discovered a 2-year-old review[4] that lists some tens of relevant enzymes.
What is your solution?
To use the peer-reviewed bioscientific literature as a primary resource for informing Wikidata and Wikipedia editors of relevant citation-supported facts. We'll automatically (weekly) mine the whole scientific literature for objects which are or should be in Wikidata and offer them to editors for enhancement.
Wikimedia is already considering the semi-automatic collection of references, and we propose to extend that to collect Wikidata-based facts in context.
ContentMine software automatically crawls the daily peer-reviewed scientific literature (closed as well as open) and finds bioscientific terms and identifiers. We plan to use “dictionaries” based on Wikidata to identify these and to add Wikidata identifiers. These dictionaries would contain a list of relevant terms in forms that may be found in the literature, with a mapping to the corresponding Wikidata identifiers. The extracted text snippets, Wikidata identifiers, and supporting citation constitute a “peer-reviewed bioscientific fact in context”. We will rank the articles by the concentration of facts about a Wikidata property. Because we plan to use several dictionaries (e.g. gene, disease, drugs, organisms, chemicals), there is a rich context of supporting material available to editors.
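As a rough illustration of the data flow just described, the sketch below pairs a tiny hand-made dictionary (using QIDs quoted elsewhere in this proposal) with a naive context-window extractor. The dictionary format, field names and the `extract_facts` helper are illustrative assumptions, not ContentMine's actual implementation.

```python
import re

# Illustrative dictionary: surface forms as found in the literature, mapped to
# Wikidata identifiers. The QIDs come from the examples quoted later in this
# proposal; the real ContentMine dictionary format may differ.
DICTIONARY = {
    "citral": "Q410888",
    "candidiasis": "Q273510",
    "C. albicans": "Q310443",
}

CONTEXT = 100  # characters either side of a hit, ~200 in total (see Notes below)


def extract_facts(text, dictionary, source_id):
    """Return one 'fact in context' per dictionary hit: term, QID, snippet, citation."""
    facts = []
    for term, qid in dictionary.items():
        for match in re.finditer(re.escape(term), text):
            start = max(match.start() - CONTEXT, 0)
            end = min(match.end() + CONTEXT, len(text))
            facts.append({
                "term": term,
                "wikidata": qid,
                "snippet": text[start:end],
                "citation": source_id,  # e.g. a PMCID or DOI
            })
    return facts


if __name__ == "__main__":
    sample = ("... use of lemon essential oil-based products as natural "
              "remedies against candidiasis caused by C. albicans ...")
    for fact in extract_facts(sample, DICTIONARY, "PMC3915084"):
        print(fact["term"], fact["wikidata"], fact["citation"])
```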
The approach has been applied to diseases (e.g. a video on mining for Zika) and to invasive or endangered species, and is very useful for linking concepts by co-occurrence (e.g. which diseases are mentioned together with other diseases or with drugs).
We will scan up to 10,000 peer-reviewed articles per day and can also work retrospectively for particular subjects. The facts could be organized on a per-WikiProject basis on Wikidata (Chemistry, Molecular Biology, Taxonomy) and by disease (which currently has no dedicated WikiProject). Editors could then select one or more of these and be alerted (on a weekly basis) when “interesting” articles were published and analysed. They could also register for a sublist of Wikidata items (e.g. certain genera, drugs or diseases). Later, they could propose items for retrospective fact extraction from the literature.
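To illustrate how such per-editor feeds might work, here is a sketch that ranks a week's articles by their concentration of dictionary facts and restricts the feed to Wikidata items an editor has registered for. The fact layout, function name and the second PMCID are illustrative assumptions only.

```python
from collections import Counter


def rank_articles(facts, watched_qids=None):
    """Rank articles by how many dictionary facts they contain ("concentration
    of facts"), optionally restricted to Wikidata items an editor follows."""
    counts = Counter()
    for fact in facts:
        if watched_qids and fact["wikidata"] not in watched_qids:
            continue
        counts[fact["citation"]] += 1
    return counts.most_common()


if __name__ == "__main__":
    weekly_facts = [
        {"wikidata": "Q273510", "citation": "PMC3915084"},
        {"wikidata": "Q310443", "citation": "PMC3915084"},
        {"wikidata": "Q410888", "citation": "PMC9999999"},  # hypothetical PMCID
    ]
    # An editor registered for candidiasis (Q273510) and C. albicans (Q310443)
    print(rank_articles(weekly_facts, watched_qids={"Q273510", "Q310443"}))
```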
Notes:
Copyright. In the UK, we have the legal right to extract and publish facts from subscription material. A “fact” is the mention of an entity in context (ca. 200 characters).
Commitment to Open
ContentMine is a non-profit organization in the UK (Company Registration Number 10172863). We have been supported by the Shuttleworth Foundation, whose OpenLock provisions guarantee our commitment to the world community and prevent “selling out” to commercial interests. All software is OSI-compatible (mainly Apache2), and content is compatible with the Open Definition (either CC0 or CC BY). Many of the team have been involved with some or all of Open Knowledge International, the Mozilla Foundation, the Wikimedia Foundation and similar organizations, and share Open values.
Two Examples
Trigonella
To enhance the 2005 stub page on Trigonella, ContentMine retrieved > 300 open-access articles from EuropePMC containing "Trigonella". One contained > 130 mentions of "Trigonella", so it was clearly a definitive source, containing details of tens of species:
- Karyotype analyses of ten sections of Trigonella (Fabaceae). (PMID:24260623 PMCID:PMC3833733)
This paper would be automatically suggested to editors.
Carvone
Carvone is an important constituent of essential oils (a terpene). In a few seconds, we extracted 200 papers about carvone from EuropePMC and show one typical example:
- “The Influence of Chemical Composition of Commercial Lemon Essential Oils on the Growth of Candida Strains” Mycopathologia. 2014; 177(1-2): 29–39. doi: 10.1007/s11046-013-9723-3 PMCID: PMC3915084 M. Białoń, T. Krzyśko-Łupicka, M. Koszałkowska, and P. P. Wieczorek
We extracted some hundreds of facts-in-context using our phytochemicals dictionary, and these can be linked directly to Wikidata via their identifier:
- " … other main compounds were β-pinene (15.1 %), α-pinene (11.1 %), citral (Q410888) and its isomers (11.4 % in total), γ-terpinene (4.8 … "
[Note that β-pinene and α-pinene are distinct compounds and both link to Q2095629; ContentMine would highlight this discrepancy to Wikidata editors.]
We also extracted information using our disease and species dictionaries:
- " … use of lemon essential oil-based products as natural remedies against candidiasis (Q273510) caused by C. albicans (Q310443). Conclusions: Lemon essential oils with... "
We've promoted the use of Wikipedia/Wikidata in this project (e.g. slides 7, 12-24). These facts-in-context, including links to Wikidata pages, can be presented to editors for rapid prescreening for inclusion in or enhancement of Wikidata.
Where the editors have the right to read the whole article, our annotations (which conform to W3C annotation standards) can be automatically added to their browser (e.g. by the Hypothes.is software).
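For concreteness, a minimal sketch of what one such annotation could look like in the W3C Web Annotation model is shown below. The envelope follows the published data model; the choice of body (a link to the Wikidata item) and the selector text are our assumptions, not a published WikiFactMine format.

```python
import json

# Minimal sketch of a W3C Web Annotation (https://www.w3.org/TR/annotation-model/)
# for one extracted fact: the target quotes the paper, the body identifies the
# matching Wikidata item. Illustrative only.
annotation = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "type": "Annotation",
    "body": {
        "type": "SpecificResource",
        "source": "https://www.wikidata.org/entity/Q273510",  # candidiasis
        "purpose": "identifying",
    },
    "target": {
        "source": "https://doi.org/10.1007/s11046-013-9723-3",
        "selector": {
            "type": "TextQuoteSelector",
            "exact": "candidiasis",
            "prefix": "natural remedies against ",
            "suffix": " caused by C. albicans",
        },
    },
}

print(json.dumps(annotation, indent=2))
```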
Project goals
Promote Wikidata to the world as the first place to go for reliable identification and re-use of scientific concepts.
The overarching goal of the project is to enhance the bioscientific coverage in Wikidata so it becomes a visible and used resource both for Wikipedians and in general scientific research and discourse.
Specific goals for the 12 months funding requested are:
Goal A: Use dictionaries derived from Wikidata to index the daily literature
Goal B: Create a feed of scientific facts in context with associated citations for Wikipedia and Wikidata editors
Goal C: Promote a combination of the scientific literature, Wikidata and Wikipedia as powerful resources for the scientific community, and build and support the community of editors who facilitate interlinking between these three resources.
Project plan
Activities
Building Wikidata-based dictionaries to identify scientific articles and present them to editors.
Our activities will be broadly split into technical and community engagement activities, but these will be tightly linked.
Technical Activities
Goal A: Use dictionaries derived from Wikidata to index the daily literature
Goal B: Create a feed of scientific facts in context with associated citations for Wikipedia and Wikidata editors
Creation and deployment of 4-8 Wikidata-based dictionaries for searching the scientific literature. Daily scraping of the open and closed scientific literature to create and disseminate lists of facts based on the Wikidata dictionaries. A simplified sketch of this daily harvesting step follows the table below.
ID | Title | Description | Month | Effort (Person Months) |
---|---|---|---|---|
T1 | Disease | Extraction of disease terms from Wikidata | M1-M4 | 0.6 |
T2 | Taxonomy | Extraction of species/genera from Wikidata | M1-M4 | 0.6 |
T3 | Genes | Extraction of genes from Wikidata | M1-M4 | 0.6 |
T4 | Drugs | Extraction of drug terms from Wikidata | M1-M4 | 0.6 |
T5 | Weekly feeds | Weekly feeds of extracted facts for editors | M3-M12 | 3 |
T6 | Ingest software | Tool to ingest weekly feeds and present to editors | M2-M6 | 2.4 |
T7 | Wikidata editor | Tool to allow Wikidata editors to edit/enhance WD entries with CM facts | M3-M9 | 2.4 |
T8 | Customization & integration | Iterative integration of CM tools with WP software. | M1-M12 | 1.8 |
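The sketch below illustrates the daily harvesting step in its simplest form, querying the public EuropePMC REST API for matching open-access papers. The query string is only an example; the real pipeline also covers subscription content (via the University of Cambridge) and runs the full dictionary set over the retrieved text.

```python
import requests

EUROPEPMC = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def daily_harvest(query, page_size=25):
    """Fetch recent open-access hits for a query term from EuropePMC.
    Simplified illustration: error handling, paging and closed-access
    retrieval are omitted."""
    params = {"query": query, "format": "json", "pageSize": page_size}
    response = requests.get(EUROPEPMC, params=params, timeout=30)
    response.raise_for_status()
    return response.json().get("resultList", {}).get("result", [])


if __name__ == "__main__":
    for paper in daily_harvest('"Trigonella" AND OPEN_ACCESS:y'):
        print(paper.get("pmcid"), paper.get("title"))
```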
In detail:
To extract Wikidata-based bioscience facts on a daily basis and publish these to Wikipedia and the wider world.
To create Wikidata-based lists of search terms (“dictionaries”) used in content-mining and promote the use of Wikidata identifiers and properties (a SPARQL sketch of this dictionary-building step follows this list).
To enhance the coverage and quality of Wikidata properties.
To work with the following Wikiprojects:
- Molecular biology (particularly dictionaries for gene Q7187)
- Chemistry (in bioscience): dictionaries for drugs (International Nonproprietary Name, Q824258), pesticides (pesticide, Q131656) and chemical subclasses (e.g. alkaloids, Q70702)
- Taxonomy (dictionaries for genus (Q34740) and species (Q7432))
To work with other Wikipedia editors outside WikiProjects, by creating and deploying dictionaries for them, for example a disease-based dictionary (subclass of disease, Q12136)
To work with StrepHit to
- Communally promote the value of natural language processing for citation-based facts
- Use their tools for further refinement and accuracy of the ContentMine procedures
- Make CM facts and dictionaries available to increase the specificity and range of StrepHit
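As referenced above, a dictionary can be generated directly from Wikidata with a SPARQL query. The sketch below builds a small disease dictionary (items reachable from disease, Q12136, via subclass of, P279) using the public Wikidata Query Service. The LIMIT, the alias handling and the output format are simplifications for illustration, not the project's final tooling.

```python
import requests

WDQS = "https://query.wikidata.org/sparql"

# English labels and aliases for subclasses of disease (Q12136, via P279),
# following the dictionary-building approach described in this proposal.
QUERY = """
SELECT ?item ?itemLabel ?alias WHERE {
  ?item wdt:P279+ wd:Q12136 .
  OPTIONAL { ?item skos:altLabel ?alias . FILTER(LANG(?alias) = "en") }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 500
"""


def build_dictionary():
    """Return a {surface form: QID} mapping usable by the fact extractor sketched earlier."""
    response = requests.get(
        WDQS,
        params={"query": QUERY, "format": "json"},
        headers={"User-Agent": "WikiFactMine-sketch/0.1 (illustrative example)"},
        timeout=60,
    )
    response.raise_for_status()
    dictionary = {}
    for row in response.json()["results"]["bindings"]:
        qid = row["item"]["value"].rsplit("/", 1)[-1]
        dictionary[row["itemLabel"]["value"]] = qid
        if "alias" in row:
            dictionary[row["alias"]["value"]] = qid
    return dictionary


if __name__ == "__main__":
    terms = build_dictionary()
    print(len(terms), "disease terms")
```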
Community Engagement Activities
Past engagement
ContentMine has extensively engaged the Wikimedia community in the past. Peter Murray-Rust has been a Wikipedian for 10 years and delivered a keynote talk at Wikimania 2014 and at the Wikipedia Science Conference 2015, where we also ran a hands-on workshop. Our full-time developer Tom Arrow was awarded the 2015 Bradley-Mason prize for open chemistry by Imperial College for his participation with Wikimedia, which included taking part in a Wikimedia EU hackathon in Lyon, France, introducing students and faculty to Wikidata for storing chemical data and enriching Wikipedia chemistry pages, thus showcasing open data and the wiki approach in an undergraduate teaching environment. Since working for ContentMine, he has attended the WikiCite meeting and continues to find links between the two communities.
ContentMine has introduced Wikidata to audiences in over 40 presentations at UK and international meetings, highlighting it as a key resource for those searching for scientific knowledge and as the future of large-scale scientific data curation. We have explicitly stated our desire to tightly link the ContentMine system with Wikidata and thereby Wikipedia and other Wikimedia projects such as StrepHit. This has led to numerous conversations with active Wikimedians, including Magnus Manske and Daniel Mietchen, about possible future directions for our software. An example of this is the current effort to reconcile entries in the existing ContentMine dictionaries with Wikidata items using Mix'n'Match.
Community Engagement for CM Project
Goal C: Promote a combination of the scientific literature, Wikidata and Wikipedia as powerful resources for the scientific community and build a community of editors who facilitate interlinking between these three resources.
In the first instance, we will tackle the problem of under-recognition of Wikidata by promoting the concept of linking the scientific literature with Wikidata to the bioscience-related WikiProject groups:
- WikiProject Biology
- WikiProject Chemistry
- WikiProject Genetics
- WikiProject Molecular and Cell Biology
- WikiProject Medicine
- WikiProject Neuroscience
- WikiProject Psychology
- WikiProject Species
- WikiProject Taxonomy
- WikiProject Tree of Life
Our current capacity for community engagement is fully taken up by developing our core software pipeline and encouraging early adoption in the research community via our ContentMine Fellows. Therefore, despite significant voluntary effort already contributed by the CM team, we believe that effective community engagement and volunteer recruitment will require dedicated support. We support working in the open, and we understand the extra effort required to ensure transparency and engagement at each step of a community software development process, particularly where that community has many existing demands on its time.
We are therefore requesting funding for a Wikimedian in Residence to work jointly with ContentMine and the University of Cambridge Library, which is providing access to its subscription content for text and data mining research. Their role would be broadly to promote links between the scientific literature, Wikidata and Wikipedia, drawing on expertise from scientific librarians, researchers and interested Wikipedians. This would complement the recent and ongoing activities of Wikipedian in Residence positions at the Royal Society, the Royal Society of Chemistry, the Cochrane Collaboration, Cancer Research UK and others [5]. This role would include the following activities:
- Coordinate volunteers from all relevant WikiProjects to suggest and help compile Wikidata dictionaries.
- Find early adopter editors and curators for the 4-8 dictionaries to assess the usefulness of the results emerging and to co-design integration of tools with editing workflows, ensuring this work and the results are communicated to interested communities at each step in the process.
- Run in-person events in Cambridge and online looking at channelling scientific information from the literature into Wikidata, with the aim of recruiting more volunteer editors and developers. Cambridge offers a fantastic environment for this work, with a thriving community of students, a very international population and active Wikimedia science contributors based locally (e.g. Magnus Manske, Alex Bateman, Charles Matthews).
- Online engagement with the broader open science and open data community to ensure work is complementary and valuable, not duplicating effort or failing to engage with and assist a key group.
Our software developer will also be tightly integrated with this engagement process, and their specific role will be to contact those who have already developed relevant bots, tools and automated techniques (e.g. StrepHit), to ensure that our software solutions are compatible and complementary.
Wikimania 2017 Workshop
We aim to have a beta system by Wikimania 2017 that can be tried out with many Wikimedians in a workshop run by the two project team members. The primary activities and aims of the workshop will be:
- Demo and hands-on user workshop with active Wikipedia and Wikidata editors to get feedback and ideas for useful features.
- Developer session with others looking at semi-automated information flow into Wikidata and other related tools.
We chose Wikimania because it brings together many of the most active Wikimedia volunteers and contributors and we will be ready for broader feedback on the tools by August 2017.
Friendly space and community code of conduct
We fully endorse the Wikimedia Friendly Space Policy and have a code of conduct for our community which reflects all principles in the Wikimedia policy. This code of conduct and associated policies would apply to all activities undertaken in the course of this project.
Summary of Community Engagement Activities
ID | Title | Description | Month | Effort (Person Months) |
---|---|---|---|---|
C1 | Dictionary volunteers | Coordinate volunteers from all relevant WikiProjects to suggest and help compile Wikidata dictionaries, inform about project and gather feedback. | M1-M4 | 1 |
C2 | Co-design with editors | Co-design integration of tools with editing workflows through direct interaction with experienced and new Wikidata and Wikipedia editors in the science area. | M2-M6 | 1.5 |
C3 | Early adopters | Find and coordinate early adopter editors and curators for the 4-8 dictionaries to assess the usefulness of the results emerging. | M4-M8 | 1 |
C4 | Community building events | Run in-person events in Cambridge and online looking at channelling scientific information from the literature into Wikidata, with the aim to recruit more volunteer editors and developers. | M4-M12 | 2 |
C5 | Wikimania workshop | Organise and deliver workshop for users and developers at Wikimania 2017, follow-up on results | M7-M10 | 1 |
C6 | Project communications | Ensuring development work and results are communicated to interested communities at each step in the process. | M1-M12 | 0.5 |
C7 | Wikipedian-in-Residence | Wikipedian/Wikidata-ian in Residence activities with Cambridge University Library, to be determined flexibly with library staff and including the activities listed above. | M1-M12 | 4 |
Budget
Item | Cost | Note |
---|---|---|
Full time software developer (40 hours per week for 12 months) | 40k USD | 1 |
Full time Wikipedian-in-Residence, joint with University of Cambridge libraries (40 hours per week for 12 months) | 39k USD | 2 |
Travel for offline workshop at Wikimania 2017 (2 return flights London to Montreal) | 3k USD | 3 |
Accommodation in Montreal (two rooms for five nights) | 1.2k USD | 4 |
Total | 86.2k USD | |
Notes
[1] The average salary for a full-time software developer in Cambridge, UK is 35k GBP[6] (46.7k USD). We plan to offer a salary of up to 27k GBP; total on-costs, including mandatory employer contributions to National Insurance and a workplace pension, come to 30k GBP (40k USD).
[2] Wikimedia UK recommends a salary of £23,400-31,200 for a Wikipedian-in-Residence[7]. We plan to offer a salary of up to 26k GBP; total on-costs, including mandatory employer contributions to National Insurance and a workplace pension, come to 29k GBP (39k USD).
[3] Based on the average cost of a return flight on Skyscanner.net; the figure is taken from July 2017 because August prices were not yet published.
[4] Based on the average cost of a three-star hotel room on Trivago.
ContentMine will absorb any project management and direct costs associated with the project. We will seek funding for additional travel and local workshops in Cambridge from other sources.
Sustainability
- We expect our tools to be adopted by editors, and Wikidata-dictionaries to be used by scientists. With the increasing use of content-mining, we expect these dictionaries to become core tools.
ContentMine is a non-profit company based in the UK whose core mission is to enable anyone to perform text and data mining of the scientific literature using open source software and liberate scientific facts as open data. This project is thus completely aligned with that mission, and the daily stream of facts will continue to be supported by ContentMine. We also undertake to maintain the Wikidata-related tools for as long as our own income stream allows. If the project is successful, we hope to have gathered a community who are interested enough in using the information that we are able to find a small number of volunteer developers to help with fixes and feature requests on an on-going basis.
Opportunities for growth
We hope that the broader community of users of the information grows sufficiently over the 12 months of the project that it becomes self-sustaining, and we also see opportunities for expansion, either led by volunteer effort or through additional grant proposals:
Additional dictionaries: This is a pilot project covering a limited number of information types; future dictionary-based indexing could extend to many more categories and properties in Wikidata.
Tools for scientific publications: We see the Wikidata-enhanced dictionaries as key tools for authoring, reviewing, and reading the scientific / medical literature.
- Authoring. Creators of authoring tools (e.g. Overleaf, Authorea, WriteLatex) could use WikiFactMine to help authors find terms in their manuscripts, check spelling and link directly to Wikidata. As Wikidata strengthens its status as a neutral, authoritative and up-to-date source, many authors may come to value and use it.
- Reviewing. Publishers could help reviewers by supporting annotating manuscripts with WikiFactMine.
- Readers. Readers who need help with understanding terms would find linking to Wikipedia immediately valuable. Browser-based annotation tools such as Hypothes.is can already support this functionality using data provided by ContentMine.
Measures of success
Editors will increasingly use Wikidata-based dictionaries for adding and editing material. Scientists will use them for authoring, reviewing and ontologically-supported reading.
Goal A: Use dictionaries derived from Wikidata to index the daily literature
4-8 dictionaries successfully deployed, creating an index of facts from up to 10k papers per day. We anticipate reliably indexing around 200k terms, generating upwards of 10k facts per day by M10. Daily facts will be dumped as open data to Zenodo, and a report can be generated to measure actual figures on a monthly basis.
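A sketch of how such a monthly report could be produced from the daily dumps is shown below. The JSON-lines file naming and the per-fact `dictionary` field are assumptions about the dump format, which has not been finalised.

```python
import glob
import json
from collections import Counter

# Aggregate the (assumed) daily dump files facts-YYYY-MM-DD.jsonl, one fact per
# line, into monthly counts per day and per dictionary. Illustrative only.
def monthly_report(dump_glob="facts-2017-05-*.jsonl"):
    per_day = Counter()
    per_dictionary = Counter()
    for path in glob.glob(dump_glob):
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                fact = json.loads(line)
                per_day[path] += 1
                per_dictionary[fact.get("dictionary", "unknown")] += 1
    return per_day, per_dictionary


if __name__ == "__main__":
    days, dictionaries = monthly_report()
    print("facts per day:", dict(days))
    print("facts per dictionary:", dict(dictionaries))
```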
Goal B: Create a feed of scientific facts in context with associated citations for Wikipedia and Wikidata editors
- Feeds actively used on a weekly basis by at least 10 Wikipedia and Wikidata editors by M12
- Contributions to 1000 Wikidata entries by M12
- Contributions to 100 Wikipedia entries by M12
Goal C: Promote a combination of the scientific literature, Wikidata and Wikipedia as powerful resources for the scientific community and build a community of editors who facilitate interlinking between these three resources.
- 1-2 members of each active WikiProject actively contributing to feedback on dictionaries and tools.
- > 250 people attending in-person or online events relating to linking the scientific literature, Wikidata and Wikipedia.
Demos and resources
We now have a workflow to download the whole scientific literature that the University of Cambridge subscribes to and to legally extract facts from it. The extraction is largely through dictionaries which create "facets" of information (terms in context, often called "facts"). A demonstration output is at: doi:10.5281/zenodo.61276 (5 MB download; a lightweight demo is in preparation). These facts will be enhanced with Wikidata IDs as we bring more WD-derived dictionaries into play.
Example of extracted facts [1] showing links to Wikidata-enhanced dictionaries. Petermr (talk) 18:56, 21 September 2016 (UTC)
At https://github.com/ContentMine/amidemos there is a demo of extracted facts in table form for "aardvark" and "zika". All extracted facts are linked to English Wikipedia pages (e.g. https://rawgit.com/ContentMine/amidemos/master/zika/full.dataTables.html for Zika), and there are very few false positives. In the future, with WD-enhanced dictionaries, there will be almost no false positives for concepts such as species and drugs.
Many of the examples are available as slides, e.g. mining for "carvone" (a terpene). This also highlights the way we are advocating for Wikidata.
Interactive downstream analysis of extracted facts: The Wikidata-enhanced facts are extracted and plotted as:
- time series. Pull down the tab to show the "endangered species" dictionary, created entirely from Wikidata. (Numbers are not yet significant.)
- co-occurrences, e.g. 20 examples in the "hgnc" tab (human genes mentioned in Zika papers); a counting sketch of this analysis follows this list.
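The co-occurrence counting behind these plots can be sketched as follows, grouping facts by the paper they were extracted from. The fact layout matches the earlier illustrative sketches rather than the actual demo code.

```python
from collections import defaultdict
from itertools import combinations

# Count how often terms from the dictionaries (e.g. diseases and drugs, or the
# genes in the "hgnc" tab mentioned above) appear in the same paper.
def co_occurrences(facts):
    """Return {(qid_a, qid_b): number of papers mentioning both items}."""
    by_paper = defaultdict(set)
    for fact in facts:
        by_paper[fact["citation"]].add(fact["wikidata"])
    counts = defaultdict(int)
    for qids in by_paper.values():
        for pair in combinations(sorted(qids), 2):
            counts[pair] += 1
    return dict(counts)


if __name__ == "__main__":
    facts = [
        {"wikidata": "Q273510", "citation": "PMC3915084"},  # candidiasis
        {"wikidata": "Q310443", "citation": "PMC3915084"},  # C. albicans
        {"wikidata": "Q410888", "citation": "PMC3915084"},  # citral
    ]
    print(co_occurrences(facts))
```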
Prototype Wikidata-enhanced dictionaries. These are created either from an authority and then enhanced manually or with Mix'n'Match, or from a SPARQL query on Wikidata itself (e.g. for endangered species).
References
edit- ↑ "Wikipedia:The Wikipedia Library/Databases". Wikipedia, the free encyclopedia. 2016-07-31.
- ↑ "Wikipedia:WikiProject Open Access". Wikipedia, the free encyclopedia. 2016-03-05.
- ↑ "OAbot". tools.wmflabs.org. Retrieved 2016-08-02.
- ↑ Marmulla, Robert; Harder, Jens (2013-12-01). "Microbial monoterpene transformations – a review". Frontiers in Microbiology 5. PMC 4097962. PMID 25076942. doi:10.3389/fmicb.2014.00346.
- ↑ "Wikipedian in Residence - Outreach Wiki". outreach.wikimedia.org. Retrieved 2016-08-02.
- ↑ "Cambridge, England: Cambridgeshire City Salaries - City of Cambridge Average Salary - PayScale". www.payscale.com. Retrieved 2016-08-02.
- ↑ "Wikimedian in Residence draft job description - Wikimedia UK". wikimedia.org.uk. Retrieved 2016-08-02.
Get involved
Participants
The grant would be to ContentMine Limited, a UK non-profit for extracting facts from the scientific literature.
Peter Murray-Rust (Founder and Director of ContentMine)
Peter has been a Wikimedian since 2006 and delivered a keynote talk at Wikimania 2014 and Wikipedia Science Conference 2015, where CM also ran a hands-on workshop. Peter founded ContentMine as a Shuttleworth Foundation Fellow, and is the main software pipeline architect. He received his Doctor of Philosophy from the University of Oxford and has held academic positions at the University of Stirling and the University of Nottingham. His research interests have focused on the automated analysis of data in scientific communities. In addition to his ContentMine role, Peter is also Reader Emeritus in Molecular Informatics at the Unilever Centre, in the Department of Chemistry at the University of Cambridge, and Senior Research Fellow Emeritus of Churchill College in the University of Cambridge. Peter is renowned as a tireless advocate of open science and the principle that the right to read is the right to mine.
Tom Arrow (ContentMine Developer)
Tom was awarded the 2015 Bradley-Mason prize for open chemistry by Imperial College for his participation with Wikimedia, which included taking part in a Wikimedia EU hackathon in Lyon, France, introducing students and faculty to Wikidata for storing chemical data and enriching Wikipedia chemistry pages, thus showcasing open data and the wiki approach in an undergraduate teaching environment. Since working for ContentMine he has attended the WikiCite meeting and continues to find links between the two communities. Tom leads current development of the ContentMine web API.
Jenny Molloy (Director of ContentMine)
Jenny is a molecular biologist by training and manages ContentMine collaborations and business development. She spoke on synthetic biology at Wikipedia Science Conference 2015 and has been a long term supporter of open science.
Wikimedian advisors:
- Daniel Mietchen
- Magnus Manske
- Volunteer: I am a full-stack dev and lover of open source projects. I'm ready to work as a volunteer for this project related to scientific research in medicine. BamLifa (talk) 09:35, 16 February 2017 (UTC)
Community notification
Please paste links below to where relevant communities have been notified of your proposal, and to any other relevant community discussions.
- English WikiProject Medicine notified 2 August 2016
- [WikiProject Biology] notified 2 August 2016
- [WikiProject Chemistry] notified 2 August 2016
- [WikiProject Genetics] notified 2 August 2016
- [WikiProject Medicine] notified 2 August 2016
- [WikiProject Molecular and Cell Biology] notified 2 August 2016
[Not yet notified]
- Open Knowledge mailing list
- Open Science working group mailing list
- WikiProject Neuroscience
- WikiProject Psychology
- WikiProject Species
- WikiProject Taxonomy
- WikiProject Tree of Life
Endorsements
Do you think this project should be selected for a Project Grant? Please add your name and rationale for endorsing this project below! (Other constructive feedback is welcome on the discussion page).
- very good idea, well worth the effort (in terms of Medical articles it's important to remember we use en:Wikipedia:Identifying reliable sources (medicine), not primary sources) Ozzie10aaaa (talk) 12:08, 2 August 2016 (UTC)
- ContentMine and StrepHit are complementary efforts that share the same vision. This is a great opportunity to join forces and make Wikidata the central access point for high-quality knowledge. Hjfocs (talk) 12:10, 2 August 2016 (UTC)
- Solid proposal from a team that's best positioned to deliver much needed technical solutions for semi-automated fact extraction. This will dramatically enhance the value of Wikidata and contribute technical solutions to the issues we worked on at WikiCite. --Dario (WMF) (talk) 20:31, 2 August 2016 (UTC)
- Added concerns to the talk page[2] Doc James (talk · contribs · email) 15:32, 3 August 2016 (UTC)
- Happy to endorse, will add some comments and suggestions on the talk page. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 15:31, 4 August 2016 (UTC)
- I've always been impressed by the work done around Content Mine. With good community engagement on Wikidata I am confident this will be successful. It'll be a huge boost to one of Wikidata's biggest remaining issues for wider adoption: the lack of scientific references for a lot of data. --Lydia Pintscher (WMDE) (talk) 12:55, 5 August 2016 (UTC)
- Support There are still details to work out but so many participants in this proposal have such a good record of success that I anticipate more of the same. en:Peter Murray-Rust has been managing open content projects for longer than Wikipedia has existed. Having better exchange with databases of the en:Open Knowledge is an essential step in the development of Wikidata and the Wikimedia community will only benefit by having a new partnership with affiliates of that organization. That ContentMine is already an established project is a big help, because it means that this grant project will be able to begin with live publishing rather than readying content. Overall, I trust the reputations of the management experience of the people involved in this project. The money requested is less than the worth of the data offered, the expertise backing it, and the experience that the requesting persons have of engaging in discussions in the free culture community. Wikidata needs this particular expertise and engagement now to create a precedent for less experienced organizations who would contribute in the future. Blue Rasberry (talk) 16:53, 8 August 2016 (UTC)
- Support Very much supported. From drugs it is easily extended to metabolites which is highly interesting for WikiPathways as well as drug metabolites, which is of high relevance to the XMetDB. Egon Willighagen (talk) 12:12, 9 August 2016 (UTC)
- A lot of the facts in the scientific literature aren't available to computers. Having an automated way to make it easier for Wikidata editors to enter new facts is a great project. ChristianKl (talk) 15:23, 11 August 2016 (UTC)
- Support --Tobias1984 (talk) 17:58, 25 September 2016 (UTC)
- Support -- Julialturner (talk) 18:47, 15 February 2017 (UTC)
- Support -- GerardM (talk) 14:31, 6 May 2017 (UTC)