Group 8: Citoid-Wikidata integration

Room 121 (subgroup)

Etherpad: Group 8 (parent Etherpad)

Attendees

Alphabetical by first letter

Alex Kalderimis (RefMe)
Katie Filbert (Wikimedia Deutschland, Wikidata)
Marielle Volz (Wikimedia Foundation) (attending remotely)
Philipp Zumstein (Universitätsbibliothek Mannheim (Mannheim University Library))
Sebastian Karcher (Qualitative Data Repository / Zotero, Citation Style Language (CSL))

Links

Proposal: https://meta.wikimedia.org/wiki/WikiCite_2016/Proposals/Citoid_integration_for_Wikidata
Example Call for the Citoid API: https://citoid.wikimedia.org/api?format=zotero&search=http%3A%2F%2Flink.springer.com%2Fchapter%2F10.1007%2F11926078_68
Citoid service: https://citoid.wikimedia.org/
Citoid codebase on GitHub: https://github.com/wikimedia/citoid/
Citoid calling CrossRef: https://github.com/wikimedia/citoid/blob/master/lib/Scraper.js#L293
Example translator: https://github.com/wikimedia/citoid/blob/master/lib/translators/openGraph.js
DC translator test: https://github.com/wikimedia/citoid/blob/master/test/features/unit/translators/dublinCore.js#L8
Full scraper tests: https://github.com/wikimedia/citoid/blob/master/test/features/unit/scraper.js
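The example call above passes the target URL percent-encoded in the `search` parameter. A minimal Python sketch of building such a call, assuming the `/api?format=zotero&search=` endpoint shown in the example is still the one to use (the function name `citoid_query_url` is hypothetical):

```python
from urllib.parse import urlencode

# Endpoint taken from the example call above (assumption: still current).
CITOID_API = "https://citoid.wikimedia.org/api"


def citoid_query_url(search: str, fmt: str = "zotero") -> str:
    """Build a Citoid lookup URL for a search string (e.g. a URL).

    urlencode() percent-encodes ':' and '/' in the search value, which
    reproduces the encoding used in the example call.
    """
    return CITOID_API + "?" + urlencode({"format": fmt, "search": search})


if __name__ == "__main__":
    print(citoid_query_url("http://link.springer.com/chapter/10.1007/11926078_68"))
```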

Tasks

Subtask 1: Extend the Configuration

Current version: https://github.com/filbertkm/wikidata-refs/blob/master/template.json

List of Zotero item types and fields:

http://aurimasv.github.io/z2csl/typeMap.xml

List of available properties

Templates:

https://en.wikipedia.org/wiki/Template:Cite_journal/doc

Zotero Field to Wikidata property mapping for itemType journalArticle:

See list of fields here: http://aurimasv.github.io/z2csl/typeMap.xml#map-journalArticle

{
    "title": { // this should really go in the "label" field, not be a property
        "id": "P1476",
        "valuetype": "monolingualtext"
    },
    "url": {
        "id": "P973",
        "valuetype": "string"
    },
    "date": {
        "id": "P577",
        "valuetype": "time"
    },
    "DOI": {
        "id": "P356",
        "valuetype": "external identifier"
    },
    "volume": {
        "id": "P478",
        "valuetype": "string"
    },
    "issue": {
        "id": "P433",
        "valuetype": "string"
    },
    "URL": {
        "id": "P854",
        "valuetype": "URL"
    },
    "PMID": {
        "id": "P698",
        "valuetype": "external identifier"
    },
    "PMCID": {
        "id": "P932",
        "valuetype": "external identifier"
    },
    "seriesTitle": { // see series
        "id": "P1433->P1476",
        "valuetype": "string"
    },
    // The types below are items; it may not be possible to fully represent what they need in JSON
    "publicationTitle": {
        "id": "P1433",
        "valuetype": "item" // match via ISSN?
    },
    "author": {
        "id": "P50", // expects items: combine firstName and lastName for the label; several author items may need to be created
        "valuetype": "item"
    },
    "editor": {
        "id": "P98",
        "valuetype": "item"
    },
    "rights": {
        "id": "P275",
        "valuetype": "item" // match against a string
    },
    "language": {
        "id": "P407", // watch the deletion/merge discussion at https://www.wikidata.org/wiki/Property:P407
        "valuetype": "item"
    },
    "series": {
        "id": "P1433",
        "valuetype": "item" // Journal
    }
}
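To illustrate how the mapping above could be applied, a minimal Python sketch under stated assumptions: the input is a Zotero item as a flat dict, only a hypothetical subset of the fields is included (`title` is left out because it should become the label), literal values are passed through unconverted (no Wikidata time or monolingual-text formatting), and item-valued fields are only collected for a later lookup or creation step. `FIELD_MAP` and `map_zotero_item` are illustrative names, not part of Citoid or wikidata-refs.

```python
from typing import List, Tuple

# Hypothetical subset of the journalArticle mapping above:
# Zotero field -> (Wikidata property, valuetype). "title" is omitted because it
# should become the item's label rather than a statement.
FIELD_MAP = {
    "date": ("P577", "time"),
    "DOI": ("P356", "external identifier"),
    "volume": ("P478", "string"),
    "issue": ("P433", "string"),
    "PMID": ("P698", "external identifier"),
    "publicationTitle": ("P1433", "item"),
}


def map_zotero_item(zotero_item: dict) -> Tuple[List[tuple], List[tuple]]:
    """Split a Zotero item into literal claims and item-valued fields.

    Returns (claims, pending_items): claims are (property, valuetype, value)
    tuples with values passed through unconverted; pending_items are
    (property, label) pairs whose target item still has to be found or created.
    """
    claims, pending_items = [], []
    for field, value in zotero_item.items():
        if field not in FIELD_MAP or not value:
            continue  # unmapped field (e.g. itemType, title) or empty value
        prop, valuetype = FIELD_MAP[field]
        if valuetype == "item":
            pending_items.append((prop, value))
        else:
            claims.append((prop, valuetype, value))
    return claims, pending_items


if __name__ == "__main__":
    # Hypothetical example item with placeholder values.
    item = {
        "itemType": "journalArticle",
        "title": "An Example Article",
        "publicationTitle": "Example Journal",
        "volume": "12",
        "DOI": "10.1000/example",
    }
    print(map_zotero_item(item))
```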

Problems:

Handling agent names (personal and institutional names):
- Authors are represented as items. An item needs to be created for each author, with the appropriate properties set. Note that this may duplicate entities where matching items cannot be resolved.
- "published in" (P1433) also requires an item: the journal is also an item.
This matters particularly because it makes sense to search for existing items first.
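Zotero stores each creator either as `firstName`/`lastName` or, for institutional names, as a single `name` field. A minimal Python sketch of turning creators into (property, label) pairs for the author (P50) and editor (P98) items that would need to be found or created. It only drops exact duplicates within one citation; it does not search Wikidata for existing items, so the duplication problem above remains. Function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Creator roles handled here, with properties taken from the mapping above.
ROLE_TO_PROPERTY = {"author": "P50", "editor": "P98"}


def creator_label(creator: Dict[str, str]) -> str:
    """Label for a Zotero creator: the single 'name' field (e.g. an institution)
    if present, otherwise 'firstName lastName'."""
    if creator.get("name"):
        return creator["name"].strip()
    parts = (creator.get("firstName", ""), creator.get("lastName", ""))
    return " ".join(p.strip() for p in parts if p and p.strip())


def agents_to_resolve(creators: List[Dict[str, str]]) -> List[Tuple[str, str]]:
    """(property, label) pairs for creators that need a Wikidata item.

    Exact duplicates within one citation are dropped; matching against existing
    Wikidata items is NOT done here and would still be needed.
    """
    seen = set()
    result = []
    for creator in creators:
        prop = ROLE_TO_PROPERTY.get(creator.get("creatorType", ""))
        label = creator_label(creator)
        if prop is None or not label or (prop, label) in seen:
            continue  # unsupported role, empty name, or repeated creator
        seen.add((prop, label))
        result.append((prop, label))
    return result
```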

Possible model for handling fields whose valuetype is an item:

"publicationTitle": {
    "id": "P1433",
    "valuetype": "item", // match via ISSN?
    "item": { // fields of the linked item; the value of publicationTitle is implied as its label
        "ISSN": { // the ISSN field belongs to the journal, not to the journalArticle
            "id": "",
            "valuetype": ""
        }
    }
},

The ISSN should be used to link to the journal item, which can then be queried with SPARQL (Wikidata has a graph model with hierarchical types). The same applies to `series`, `seriesTitle`, `shortTitle`, `libraryCatalog` and `issue`, which are properties of other entities.
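A minimal sketch of the ISSN-based lookup, assuming the Wikidata Query Service endpoint `https://query.wikidata.org/sparql` and the Wikidata ISSN property P236. The ISSN is validated (format plus mod-11 check digit) before it is interpolated into the query, which also keeps arbitrary strings out of the SPARQL text. Function names are hypothetical, and the sketch only builds the query URL; it does not send it.

```python
import re
from urllib.parse import urlencode

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"
ISSN_RE = re.compile(r"^(\d{4})-(\d{3})([\dX])$")


def issn_is_valid(issn: str) -> bool:
    """Check the NNNN-NNNC format and the mod-11 check digit (weights 8..2)."""
    m = ISSN_RE.match(issn.upper())
    if not m:
        return False
    digits = m.group(1) + m.group(2)
    total = sum(int(d) * w for d, w in zip(digits, range(8, 1, -1)))
    check = (11 - total % 11) % 11
    return m.group(3) == ("X" if check == 10 else str(check))


def journal_by_issn_query(issn: str) -> str:
    """SPARQL query for items whose ISSN (P236) equals the given value."""
    if not issn_is_valid(issn):
        raise ValueError(f"not a valid ISSN: {issn!r}")
    return (
        "SELECT ?journal ?journalLabel WHERE {\n"
        f'  ?journal wdt:P236 "{issn.upper()}" .\n'
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}"
    )


def wdqs_url(query: str) -> str:
    """GET URL asking the query service for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})
```

If several items share an ISSN, the result has several rows, so a caller would still need a rule for choosing between them.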

Subtask 2: Improve ID import into Citoid

Additional IDs for Wikidata

e.g. JSTOR, OCLC, arXiv ID, IMDb, MR (Mathematical Reviews)
We are usually aware of these IDs (e.g. when using Citoid on JSTOR or arXiv) but are not importing them: Zotero puts them in the `extra` field, and Citoid already parses that field for PMID. Add more IDs to Zotero's `extra` field and more parsing to Citoid.
Idea:
- pack some of these IDs into the `extra` field in the Zotero translators,

https://github.com/zotero/translators/commit/046a7a584ca901744e74f586e3123b5eb9d7facc

https://github.com/zotero/translators/pull/1065

- make sure Citoid understands them.
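Since Citoid already reads PMID from Zotero's `extra` field, extending that to more IDs amounts to parsing `Label: value` lines. A minimal Python sketch, assuming one ID per line in that form; the labels in `KNOWN_ID_LABELS` are a hypothetical allowlist, because the exact labels the Zotero translators write would need to be checked against the commit and pull request linked above. This is not Citoid's actual parser.

```python
import re
from typing import Dict

# Hypothetical allowlist of labels; verify against what the translators emit.
KNOWN_ID_LABELS = {"PMID", "PMCID", "JSTOR ID", "OCLC", "arXiv", "MR"}

# "Label: value" with surrounding whitespace trimmed; the label may not contain ':'.
LINE_RE = re.compile(r"^\s*([^:]+?)\s*:\s*(\S.*?)\s*$")


def parse_extra_ids(extra: str) -> Dict[str, str]:
    """Collect known 'Label: value' ID lines from a Zotero extra field.

    Free-text lines and unknown labels are ignored; if a label repeats,
    the first occurrence wins.
    """
    ids: Dict[str, str] = {}
    for line in extra.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        label, value = m.group(1), m.group(2)
        if label in KNOWN_ID_LABELS and label not in ids:
            ids[label] = value
    return ids
```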

Support non-CrossRef DOIs in Citoid

Options

add DataCite translator
use the doi.org API
- test via http://doi-cache.dissem.in/
- make a GET request with the Accept header set to application/citeproc+json
Possible issues: different formats from different agencies
We would still need to rewrite the translator, which currently relies on COinS (!) in the API response.
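A sketch of the doi.org option using DOI content negotiation: a plain GET to the resolver with an Accept header selecting CSL JSON. It uses `application/vnd.citationstyles.csl+json`, the CSL JSON media type listed in the DOI content negotiation documentation, as an alternative to the `application/citeproc+json` mentioned above; treat the media type as an assumption to verify against each registration agency. The `resolver` parameter is a hypothetical hook for pointing at a test mirror such as the doi-cache above, assuming it accepts the same path and header. Function names are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

# CSL JSON media type for DOI content negotiation (assumption: accepted by the
# registration agency behind the DOI; the notes mention application/citeproc+json).
CSL_JSON = "application/vnd.citationstyles.csl+json"


def doi_csl_request(doi: str, resolver: str = "https://doi.org/") -> Request:
    """Build a GET request asking the DOI resolver for CSL JSON metadata."""
    # Keep '/' unescaped: DOIs use it between prefix and suffix.
    return Request(resolver + quote(doi, safe="/"), headers={"Accept": CSL_JSON})


def fetch_csl(doi: str, timeout: float = 10.0) -> dict:
    """Perform the request (network access required) and decode the JSON body."""
    with urlopen(doi_csl_request(doi), timeout=timeout) as response:
        return json.load(response)
```

Because the response may still differ between agencies (CrossRef, DataCite, ...), a translator consuming this JSON should not assume that every CSL field is present.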

Integrate general translators (Metadata, Highwire, etc.) instead of duplicating these functions

Citoid Tools: https://github.com/wikimedia/citoid/tree/master/lib/translators
It seems that the corresponding Zotero translators are not working.
We reached out to Zotero about this and are exploring further.
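Highwire Press-style `<meta name="citation_*">` tags are one of the generic embedded-metadata formats such translators read. A minimal sketch of collecting them with Python's standard `html.parser`, to show the shape of the data; it is not Citoid's or Zotero's implementation, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import Dict, List


class HighwireMetaParser(HTMLParser):
    """Collect <meta name="citation_*" content="..."> tags (Highwire Press convention)."""

    def __init__(self) -> None:
        super().__init__()
        self.fields: Dict[str, List[str]] = {}

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing <meta ... /> via the default handle_startendtag.
        if tag != "meta":
            return
        attr_map = dict(attrs)
        name = (attr_map.get("name") or "").lower()
        content = attr_map.get("content")
        if name.startswith("citation_") and content:
            # Repeated tags (e.g. one citation_author per author) become a list.
            self.fields.setdefault(name, []).append(content.strip())


def highwire_fields(html: str) -> Dict[str, List[str]]:
    parser = HighwireMetaParser()
    parser.feed(html)
    parser.close()
    return parser.fields
```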

WikiCite 2016/Report/Group 8
