Grants:Project/ContentMine/WikiFactMine/Planning
This page is for rough planning of the tasks that need to be completed, broken down into individual steps. Once we have a better handle on the order of the tasks and how long each may take, it will become the Timeline. The emphasis is on the software development work rather than on the Wikimedian in residence or outreach to the scientific community.
Workflow of creating a pool of facts that is updated daily
- Port the existing workflow (called canary) to Tool Labs, if possible
- Gain access to the Elasticsearch cluster on Tool Labs
- Alter canary to run on the grid, either by:
  - securing the currently unsecured web interface and running it on the web grid, or
  - rewriting its components as command-line tools
- Ensure it runs daily without intervention
- Build a fact pool
  - Decide whether it will initially be an Elasticsearch (ES) database or a MySQL database
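A fact pool that is refreshed daily needs each day's load to be safe to re-run if a scheduled job fails halfway. Below is a minimal sketch under stated assumptions. SQLite stands in for MySQL, and the table and column names (`facts`, `fact_id`, `date_loaded`, and so on) are hypothetical, not canary's actual output format. Keying rows on a stable `fact_id` makes a repeated load a no-op.

```python
import sqlite3
from datetime import date

# Hypothetical fact-pool schema; SQLite is a stand-in for MySQL here.
SCHEMA = """
CREATE TABLE IF NOT EXISTS facts (
    fact_id     TEXT PRIMARY KEY,  -- stable ID, so re-running a day's load is harmless
    paper_id    TEXT NOT NULL,     -- e.g. a PMCID
    wikidata_id TEXT NOT NULL,     -- the matched Wikidata item
    exact       TEXT NOT NULL,     -- the matched text in the paper
    date_loaded TEXT NOT NULL      -- ISO date the fact entered the pool
)
"""


def open_pool(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the fact pool and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def load_daily_batch(conn: sqlite3.Connection, facts: list, day: date) -> int:
    """Insert one day's facts, skipping any fact_id already in the pool.

    Returns the number of rows actually inserted.
    """
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO facts VALUES (?, ?, ?, ?, ?)",
        [(f["fact_id"], f["paper_id"], f["wikidata_id"], f["exact"], day.isoformat())
         for f in facts],
    )
    conn.commit()
    return conn.total_changes - before
```

`INSERT OR IGNORE` is SQLite syntax; MySQL's closest equivalent is `INSERT IGNORE`. If the pool ends up in Elasticsearch instead, indexing each document under its `fact_id` gives the same idempotent behaviour.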
API to return facts by day loaded into fact pool
- Build an API that takes a date or date range and returns a list of facts
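The core of this API is a date-range filter over the fact pool. The sketch below assumes each fact carries a hypothetical `date_loaded` field holding a `datetime.date`, and that the API receives ISO dates such as `2016-01-01`. A single date is treated as a one-day range.

```python
from datetime import date
from typing import Iterable, List, Optional, Tuple


def parse_range(start: str, end: Optional[str] = None) -> Tuple[date, date]:
    """Parse ISO dates; if no end date is given, the range is just the start day."""
    first = date.fromisoformat(start)
    last = date.fromisoformat(end) if end else first
    if last < first:
        raise ValueError("end date is before start date")
    return first, last


def facts_by_date(facts: Iterable[dict], start: str, end: Optional[str] = None) -> List[dict]:
    """Return facts whose date_loaded falls in the inclusive range [start, end]."""
    first, last = parse_range(start, end)
    return [f for f in facts if first <= f["date_loaded"] <= last]
```

In a database-backed pool the same filter would become a `WHERE date_loaded BETWEEN ? AND ?` clause (SQL) or a `range` query (Elasticsearch) rather than a Python loop.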
Tool to present and browse facts by day
- Adapt the factvis tool so that it can also load facts from the API via AJAX, rather than only from disk or a static HTTP location
- Build an interface for selecting a date or date range to pass to the API
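The date-selection interface ultimately has to turn the user's choice into an API request. The following is a sketch of that client-side step, written in Python for illustration (factvis itself would do this in JavaScript). The endpoint URL and the `from`/`to` parameter names are hypothetical placeholders, since the API has not been built yet.

```python
import json
from datetime import date
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint; the real API location is not yet decided.
API_BASE = "https://example.org/wikifactmine/api/facts"


def build_facts_url(start: date, end: Optional[date] = None, base: str = API_BASE) -> str:
    """Build the request URL for a single day or a date range."""
    params = {"from": start.isoformat()}
    if end is not None:
        params["to"] = end.isoformat()
    return f"{base}?{urlencode(params)}"


def fetch_facts(url: str) -> list:
    """Fetch and decode the JSON list of facts returned by the API."""
    with urlopen(url) as resp:
        return json.load(resp)
```

Keeping URL construction separate from fetching makes it easy to swap the loader: the same facts can come from this URL, a file on disk, or a static HTTP location.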
API to return relevant papers when queried with a Wikidata ID
- Build an API that takes a Wikidata ID and returns all facts from papers that include this Wikidata ID
- Sort the results by the number of facts per paper
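One reading of these two steps is: collect the facts that mention the requested item, group them by paper, and put the papers with the most such facts first. The sketch below follows that reading. The field names `paper_id` and `wikidata_id` are assumptions about the fact format.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def facts_for_item(facts: Iterable[dict], wikidata_id: str) -> List[Tuple[str, List[dict]]]:
    """Group facts that mention wikidata_id by paper, most facts first."""
    by_paper: Dict[str, List[dict]] = defaultdict(list)
    for f in facts:
        if f["wikidata_id"] == wikidata_id:
            by_paper[f["paper_id"]].append(f)
    # Descending fact count; ties broken by paper ID so the output is stable.
    return sorted(by_paper.items(), key=lambda kv: (-len(kv[1]), kv[0]))
```

If the API should instead return every fact from those papers (not only the ones mentioning the item), the first pass would collect matching paper IDs and a second pass would gather all of their facts; the sorting step stays the same.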
Gadget to suggest papers relevant to article/item on Wikipedia/Wikidata
- A small sidebar tool that suggests the top n papers, ranked by occurrences of the Wikidata item
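The gadget itself would run as JavaScript on the wiki, but the ranking it displays could be computed server-side. The following is a minimal sketch of that ranking: count how many facts each paper has for the item and keep the top `n`. The default of 5 and the fact field names are illustrative assumptions.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_papers(facts: Iterable[dict], wikidata_id: str, n: int = 5) -> List[Tuple[str, int]]:
    """Return up to n (paper_id, occurrences) pairs, most occurrences first."""
    counts = Counter(f["paper_id"] for f in facts if f["wikidata_id"] == wikidata_id)
    return counts.most_common(n)
```

Returning only paper IDs and counts keeps the payload small, which suits a sidebar widget; the full facts can be fetched when the editor clicks through.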
API to return papers with a co-occurrence of Wikidata IDs
- Build an API that takes two Wikidata IDs and returns facts from all papers that contain both IDs
- Decide how to rank these co-occurrences
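The ranking is still to be decided, so the sketch below uses one simple candidate, assumed here for illustration: score each paper by the smaller of the two items' fact counts. This rewards papers that discuss both items repeatedly, not just one of them. Papers that mention only one item score zero and are dropped. Alternatives include the product of the counts or counting co-occurrence within the same sentence.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def cooccurrence(facts: Iterable[dict], id_a: str, id_b: str) -> List[Tuple[str, int, List[dict]]]:
    """Papers containing both items, ranked by min(count_a, count_b)."""
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: {id_a: 0, id_b: 0})
    matched: Dict[str, List[dict]] = defaultdict(list)
    for f in facts:
        if f["wikidata_id"] in (id_a, id_b):
            counts[f["paper_id"]][f["wikidata_id"]] += 1
            matched[f["paper_id"]].append(f)
    ranked = []
    for paper, c in counts.items():
        score = min(c[id_a], c[id_b])
        if score > 0:  # the paper must contain both IDs
            ranked.append((paper, score, matched[paper]))
    # Highest score first; ties broken by paper ID for stable output.
    ranked.sort(key=lambda t: (-t[1], t[0]))
    return ranked
```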
Tool to suggest references for unreferenced Wikidata statements
- A Distributed Wikidata Game extension for suggesting references
  - It should automatically include the reference, built from the metadata we have downloaded
  - It should show the relevant sections of an open-access paper to aid editors' decisions
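The two requirements above, an automatic reference and a relevant excerpt, can be sketched as two small helpers. The metadata keys (`wikidata_id`, `doi`, `pmcid`) are hypothetical. The reference uses the Wikidata properties P248 ("stated in"), P356 ("DOI"), P932 ("PMCID") and P813 ("retrieved"), but the flat `{property: value}` dictionary is a simplification, not the full Wikibase JSON snak format. Storing the PMCID without its "PMC" prefix is also an assumption. The 60-character context width is arbitrary.

```python
from datetime import date
from typing import Dict, Optional


def build_reference(metadata: Dict[str, str], retrieved: date) -> Dict[str, str]:
    """Build a simplified {property: value} reference from downloaded paper metadata."""
    ref = {"P813": retrieved.isoformat()}  # retrieved
    if metadata.get("wikidata_id"):
        ref["P248"] = metadata["wikidata_id"]  # stated in: the paper's own item
    if metadata.get("doi"):
        ref["P356"] = metadata["doi"]  # DOI
    pmcid = metadata.get("pmcid")
    if pmcid:
        # Assumption: the PMCID is stored as digits only, without "PMC".
        ref["P932"] = pmcid[3:] if pmcid.startswith("PMC") else pmcid
    return ref


def context_snippet(text: str, exact: str, width: int = 60) -> Optional[str]:
    """Return the first match of exact in text plus up to width characters either side."""
    i = text.find(exact)
    if i < 0:
        return None
    start = max(0, i - width)
    end = min(len(text), i + len(exact) + width)
    return text[start:end]
```

A real integration would convert the simplified reference into whatever structure the game's API or the Wikidata editing API expects.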
API to return Wikidata items related to papers
- Build an API that takes the Wikidata ID of a paper
  - Further development would allow selecting a paper by an external ID such as a DOI or PMCID
- Return a ranked list of facts from that paper
  - Initially, rank by the number of facts within that paper
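A minimal sketch of this API follows, under stated assumptions. Papers are looked up in a hypothetical in-memory index keyed by an internal paper ID, with optional `qid`, `doi` and `pmcid` fields. "Rank by the number of facts" is read here as ranking the Wikidata items found in the paper by how many facts mention each. DOI comparison is case-insensitive because DOI names are case-insensitive.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple


def resolve_paper(papers: Dict[str, Dict[str, str]], *, qid: Optional[str] = None,
                  doi: Optional[str] = None, pmcid: Optional[str] = None) -> Optional[str]:
    """Return the internal key of the paper matching any given identifier."""
    for key, ids in papers.items():
        if qid and ids.get("qid") == qid:
            return key
        if doi and ids.get("doi", "").lower() == doi.lower():
            return key
        if pmcid and ids.get("pmcid") == pmcid:
            return key
    return None


def items_for_paper(facts: Iterable[dict], paper_key: str) -> List[Tuple[str, int]]:
    """Rank the Wikidata items in one paper by how many facts mention each."""
    counts = Counter(f["wikidata_id"] for f in facts if f["paper_id"] == paper_key)
    # Most facts first; ties broken by item ID for stable output.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
```

Resolving external IDs to one internal key first means the ranking code does not change when DOI or PMCID lookup is added later.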