Wikimedia Conference 2010/Developers' Workshop/Notes/Structured Data
This pad contains live notes from the Structured Data working group at Wikimedia Conference 2010/Developers' Workshop.
Outline
The first session was used to identify main topics to discuss today and tomorrow. The resulting topics are:
- Using MW templates for managing structured data (do we need better declarations of template parameters? datatypes? data extraction from templates, ...)
- Data import/re-use from external sources (live import vs. data integration, caching, push vs. pull, versioning issues, trust and provenance)
Participants
(based on introductory round in the morning)
- Tatiana de la O (acracia): API, RDF storage
- Gregor Hagedorn: image metadata (?), large datasets (identification in biology, matrix keys for querying data)
- Simon (GESIS): wiki for social science databases (import/export)
- Leszek Krupinski (leafnode): importing metadata properly, database access to wikis
- John Erling Blad (jblad): using external data in Wikipedia, re-using public data
- Daniel Kinzler (Duesentrieb): multilingual metadata, Commons, metadata extraction
- APPER (Chris): PersonData, Toolserver
- Lars Aronson (LA2): practical use of metadata, personal metadata
- Jonathan Gray (OKF): Open Data, Open Content, bibliographic metadata, browsing
- Inez: structured data extraction, article recommendation, (WYSIWYG background)
- Sebastien: Automatic Wikification
- Anja Jentzsch (anjeve): DBpedia
- Robert Isele: DBpedia
- Kolossos: Maps, Template Tiger
- Markus Krötzsch: Semantic MediaWiki
- Jeroen De Dauw: Semantic Maps
Use Cases
- Image metadata on Commons
- Personendaten on German WP
- Geodata on WP
- Bibliographical records (e.g. using FRBR ontology)
- online demo at http://bibliographica.org/
Using MW templates for managing structured data
- Should we use templates, or develop some other way of presenting and recording metadata?
How to manage metadata about templates?
- Proposal by Daniel:
Declare template parameters, including documentation, the relation each one expresses (e.g. an RDF property), an optional flag, etc.
- Proposal by Markus:
Semantic MediaWiki (SMW) already has a feature like the one Daniel proposed. In addition, SMW declares each property on a separate page, giving it a local name and a datatype (instead of directly using a technical URI from some external ontology)
- Properties should be first-class objects (as in RDF), existing globally and possibly used in more than one template
- Some property values can be given in multiple languages
- we want to support this only for plain text values
- internally, we treat such values as RDF does (language-tagged literals)
- Problem: not all properties have reasonable one-to-one mappings to template fields, e.g. sometimes multiple fields have to be consolidated into one property value
- Possible solution 1: Provide a parser function with which a template declares a "meaning" for its fields; implement it as an extension and introduce it incrementally to a WP project. Actual values are obtained by hooking into template transclusion: if the transcluded template has declared meanings for its parameters, those parameters are processed (advantages: no changes to core, incremental adoption/extension possible)
- Possible solution 2: Provide a parser function that processes instantiated template parameters; it is inserted into the template code, wrapped around a parameter value, and evaluated on each page where the template is used (advantage: no additional database lookup when using templates)
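A minimal Python sketch of possible solution 1, assuming declarations are kept in an in-memory registry and that the transclusion hook receives the page title, the template name and the parameter values. `FieldMeaning`, `declare`, `on_transclusion`, the parameter names and the property identifiers are all hypothetical, not an existing MediaWiki or SMW API. The sketch also shows one way to handle the consolidation problem: a declaration may list several fields plus a function that merges them into one value.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# (subject page, property identifier, value), loosely modelled on an RDF triple.
Triple = Tuple[str, str, str]


@dataclass
class FieldMeaning:
    """Declared meaning for one or more parameters of a template (hypothetical)."""
    prop: str                                     # property identifier, e.g. a wiki property page
    params: List[str]                             # template parameters feeding this property
    combine: Optional[Callable[..., str]] = None  # merges several fields into one value


# Hypothetical registry: template name -> declared meanings. In a real extension this
# would be filled by a parser function on the template page and stored in the database.
DECLARATIONS: Dict[str, List[FieldMeaning]] = {}


def declare(template: str, meaning: FieldMeaning) -> None:
    """Record a meaning declared on a template page."""
    DECLARATIONS.setdefault(template, []).append(meaning)


def on_transclusion(page: str, template: str, args: Dict[str, str]) -> List[Triple]:
    """Hook run for every template use; templates without declarations yield nothing."""
    triples: List[Triple] = []
    for meaning in DECLARATIONS.get(template, []):
        values = [args.get(p, "").strip() for p in meaning.params]
        if not all(values):
            continue  # a contributing field is missing or empty: emit nothing for it
        value = meaning.combine(*values) if meaning.combine else values[0]
        triples.append((page, meaning.prop, value))
    return triples


if __name__ == "__main__":
    declare("Infobox person", FieldMeaning("Property:Name", ["name"]))
    # Three fields consolidated into one date value (no one-to-one mapping).
    declare("Infobox person", FieldMeaning(
        "Property:Birth date", ["birth_year", "birth_month", "birth_day"],
        combine=lambda y, m, d: f"{int(y):04d}-{int(m):02d}-{int(d):02d}"))
    args = {"name": "Ada Lovelace", "birth_year": "1815", "birth_month": "12",
            "birth_day": "10", "image": "Ada.jpg"}  # "image" has no declared meaning
    print(on_transclusion("Ada Lovelace", "Infobox person", args))
```

Running the demo prints two triples for "Ada Lovelace": the name and the consolidated birth date `1815-12-10`; the `image` parameter is ignored because no meaning is declared for it. Solution 2 would instead put the equivalent of `combine` into the template code itself, so no registry lookup is needed at transclusion time.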
Which information to declare?
Core information:
- parameter name
- datatype
- unique identifier for the property (possibly from a standard vocabulary, or from the wiki)
- human-readable documentation
Auxiliary information:
- field required or not (used for editing)
- information for sanitizing inputs
- list-related attributes (e.g. separators for lists)
- ...
Basic datatypes:
- text, multilingual text, dates, numbers, wiki page names, geo coordinates, URLs
- lists of <anything>
- Units of measurement?
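The declaration fields and basic datatypes listed above can be sketched as a small record plus a parser for raw parameter values. This is a hedged Python illustration, not a proposed specification: the datatype names, the `lat,lon` coordinate syntax, ISO 8601 dates, and the names `ParamDeclaration`, `parse_scalar` and `parse_value` are assumptions. Multilingual text and units of measurement are left out because the notes leave their representation open.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any, Optional
from urllib.parse import urlparse

# Illustrative datatype names; the notes only list candidate types.
DATATYPES = {"text", "number", "date", "page", "geo", "url"}


@dataclass
class ParamDeclaration:
    # Core information
    name: str                        # template parameter name
    datatype: str                    # one of DATATYPES
    prop: str                        # unique property identifier (standard vocabulary or wiki)
    doc: str = ""                    # human-readable documentation
    # Auxiliary information
    required: bool = False           # used by editing tools
    separator: Optional[str] = None  # if set, the value is a list split on this string

    def __post_init__(self) -> None:
        if self.datatype not in DATATYPES:
            raise ValueError(f"unknown datatype {self.datatype!r}")


def parse_scalar(datatype: str, raw: str) -> Any:
    """Convert one raw wikitext value to a typed value, raising ValueError if invalid."""
    raw = raw.strip()
    if not raw:
        raise ValueError("empty value")
    if datatype == "text":
        return raw
    if datatype == "page":
        # Roughly mirrors MediaWiki's default title normalisation:
        # underscores become spaces and the first letter is upper-cased.
        title = raw.replace("_", " ").strip()
        if not title:
            raise ValueError("empty title")
        return title[0].upper() + title[1:]
    if datatype == "number":
        return float(raw)
    if datatype == "date":
        return date.fromisoformat(raw)  # assumes YYYY-MM-DD input
    if datatype == "geo":
        lat_s, lon_s = raw.split(",")   # assumes "lat,lon" in decimal degrees
        lat, lon = float(lat_s), float(lon_s)
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            raise ValueError("coordinates out of range")
        return (lat, lon)
    if datatype == "url":
        parts = urlparse(raw)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError("not an absolute http(s) URL")
        return raw
    raise ValueError(f"unknown datatype {datatype!r}")


def parse_value(decl: ParamDeclaration, raw: Optional[str]) -> Any:
    """Apply a declaration to a raw parameter value; None means the parameter is absent."""
    if raw is None or not raw.strip():
        if decl.required:
            raise ValueError(f"required parameter {decl.name!r} is missing")
        return None
    if decl.separator is not None:
        # "Lists of <anything>": split, drop empty items, parse each item.
        items = [item for item in raw.split(decl.separator) if item.strip()]
        return [parse_scalar(decl.datatype, item) for item in items]
    return parse_scalar(decl.datatype, raw)
```

In this sketch, a form-based editor could use `required` and `doc` to build input fields, while extraction code would call `parse_value` for each declared parameter; a value that raises `ValueError` could be flagged for sanitizing instead of being stored silently.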
Other issues (later)
- Data model?
- Data types?
- Mapping to external ontologies?