WikiCite 2016/Proposals/Generation of referenced Wikidata statements with StrepHit

Proposal

Background

Data quality in Wikidata is crucial, and references to trustworthy third-party sources are one way to ensure it. Many Wikidata statements are either unsourced or sourced to Wikimedia sister projects (typically Wikipedia, via bots). Adding references to such small units of information can be a cumbersome task for human editors.
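To make the two cases above concrete, here is a minimal sketch of how a statement could be flagged as lacking a third-party reference. It assumes the statement is given in the Wikibase JSON layout (a `references` list whose entries hold a `snaks` mapping keyed by property ID) and that P143 ("imported from Wikimedia project") is the property marking a Wikimedia-derived reference. The function name and the sample data are hypothetical and are not part of StrepHit.

```python
# Assumption: P143 ("imported from Wikimedia project") marks references
# that point back to a Wikimedia sister project.
WIKIMEDIA_ONLY_PROPS = {"P143"}


def needs_external_reference(claim: dict) -> bool:
    """Return True if the claim is unsourced or sourced only to Wikimedia projects."""
    for reference in claim.get("references", []):
        used_props = set(reference.get("snaks", {}))
        # Any property outside the Wikimedia-only set counts as an external source.
        if used_props - WIKIMEDIA_ONLY_PROPS:
            return False
    return True


# Hypothetical sample statements (snak contents are elided for brevity).
unsourced = {"mainsnak": {"property": "P569"}}
wikipedia_only = {
    "mainsnak": {"property": "P569"},
    "references": [{"snaks": {"P143": []}}],
}
externally_sourced = {
    "mainsnak": {"property": "P569"},
    "references": [{"snaks": {"P854": []}}],  # P854: reference URL
}
```

With these samples, the first two statements would be flagged as needing a reference, while the third would not.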

StrepHit aims to ease this effort: it is a Natural Language Processing system that reads documents from reliable Web sources and produces referenced Wikidata statements.

Aim

  1. Play with the current StrepHit dataset: biographies in English;
  2. create and fill in a Request for Comments;
  3. encourage referenced data donations through the primary sources tool.

Demo

Follow the instructions at wikidata:Wikidata:Primary_sources_tool#How_to_use to install the primary sources tool gadget and check out the StrepHit dataset.

Skills needed

  • Basic understanding of how Wikidata works;
  • communication strategies for community engagement, in order to:
    • raise awareness of StrepHit's potential impact;
    • attract new primary sources tool users.

Phabricator task

None yet.

See also

Participants