Wikidata/Notes/Schema.org and Wikidata

This note describes a possible relationship between Schema.org and Wikidata.

What is Schema.org

Schema.org is a project to improve general Web page markup through the use of structured data. It provides an initial vocabulary of roughly 600 terms and uses an entity-relationship (RDF) approach. Web markup is annotated in Microdata or RDFa, broadly in the style popularised by the Microformats community (although with different notations and a different community process). Schema.org is a collaboration initiated by several major search engines, but it takes wider input into the schema via a W3C-hosted discussion forum.
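As a rough illustration of what such annotated markup looks like, the following minimal sketch renders a flat set of properties as Microdata. The helper name `render_microdata` and the sample business details are assumptions made for this example and are not part of Schema.org; only the `itemscope`, `itemtype` and `itemprop` attributes and the `FoodEstablishment`, `name` and `telephone` terms come from Microdata and the Schema.org vocabulary.

```python
from html import escape


def render_microdata(item_type, properties):
    """Render a flat dict of properties as a Microdata <div>.

    `item_type` is a Schema.org class name such as "FoodEstablishment";
    `properties` maps Schema.org property names to plain-text values.
    This is an illustrative helper, not an official API.
    """
    type_url = "http://schema.org/" + escape(item_type, quote=True)
    lines = [f'<div itemscope itemtype="{type_url}">']
    for name, value in properties.items():
        prop = escape(name, quote=True)
        lines.append(f'  <span itemprop="{prop}">{escape(value)}</span>')
    lines.append("</div>")
    return "\n".join(lines)


# Sample data is invented for the example.
snippet = render_microdata(
    "FoodEstablishment",
    {"name": "Example Cafe", "telephone": "+1-555-0100"},
)
print(snippet)
```

The same description could equally be written in RDFa; Microdata is used here only because it keeps the example short.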


These projects share a concern for improving the treatment of structured data in the popular, mainstream Web. Wikidata, building on Wikipedia, has relatively centralised data but a decentralised descriptive schema built around the edits of thousands of users. By contrast, Schema.org has very decentralised data (potentially billions of pages) but relatively greater central control of its schema. There are natural limits to how far a single centralised vocabulary can handle all the descriptive tasks people might ask of it, and there are limits to the extent to which Wikipedia can, on its own, provide structured data describing everything of interest to its users. The two projects are therefore natural partners.

By defining ways for Wikipedia’s huge dataset to be used within Schema.org descriptions, we could reduce the pressure for Schema.org to include large lists of things or comprehensive type lists for all topics. And by bridging Wikipedia’s data structures to data published elsewhere in the Web, we can show techniques that allow different parties to contribute data to the Wikipedia ecosystem without necessarily copying everything into the Wikidata database. This echoes the debate around deletionism/inclusionism in the wider Wikipedia community. If other sites (perhaps MediaWiki-backed) also publish structured data using Schema.org + Wikidata markup, it may be possible to show some richer linking of information across sites.

What might this mean in practice?

Schema.org already has many classes for local businesses and services, such as various kinds of ‘FoodEstablishment’ and ‘GovernmentOrganization’. These don’t exhaust the possibilities. Schema.org aims to remain a central documentation hub for both search engines and publishers, showing simple practical markup for structured data. But Schema.org is not the best place to manage lists of kinds of food or government establishment. It is a priority for Schema.org to show how to integrate such external data, e.g. country codes, categories etc. Wikidata, as it evolves, is a natural source of such content. If we define integration points, it should be possible for the Wikipedia community’s work to make richer descriptions possible. Meanwhile, sites that provide structured data using Schema.org markup can provide data that helps grow Wikidata’s own descriptive databases.
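One possible integration point, sketched here purely as an assumption, is for a page to keep a broad Schema.org class and point at a Wikidata entity for the finer-grained category. The example below assumes that the Schema.org `additionalType` property carries that reference and that Wikidata entities are identified by URIs of the form `http://www.wikidata.org/entity/Q…`. The note itself does not prescribe either choice. The class name `WikidataLinkCollector` and the identifier `Q0000000` are placeholders invented for illustration.

```python
from html.parser import HTMLParser

# Assumed URI pattern for Wikidata entities.
WIKIDATA_ENTITY_PREFIX = "http://www.wikidata.org/entity/"


class WikidataLinkCollector(HTMLParser):
    """Collect (itemprop, entity id) pairs for links that point into Wikidata."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        prop = attr.get("itemprop")
        href = attr.get("href")
        # Keep only Microdata properties whose target is a Wikidata entity URI.
        if prop and href and href.startswith(WIKIDATA_ENTITY_PREFIX):
            entity_id = href[len(WIKIDATA_ENTITY_PREFIX):]
            self.links.append((prop, entity_id))


# Q0000000 is a placeholder, not a real Wikidata identifier.
page = """
<div itemscope itemtype="http://schema.org/FoodEstablishment">
  <span itemprop="name">Example Cafe</span>
  <link itemprop="additionalType" href="http://www.wikidata.org/entity/Q0000000">
</div>
"""

collector = WikidataLinkCollector()
collector.feed(page)
print(collector.links)
```

A consumer such as a search engine, or a tool feeding suggestions back to Wikidata, could use references collected this way to look up the richer category data without Schema.org having to enumerate every kind of establishment itself.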