
Structured data for GLAM-Wiki

How do we humans make sense of all the data that gets generated in the world? Through our memory, and through the records we keep. How do machines make sense of all the data that gets generated in the world? Through what programmers call structured data.

Structured data is a way of formally describing objects in the world and the relationships between them. It is flexible enough to accommodate what we know, yet structured enough for computers to understand.

A short, beginner-level introduction to Structured Data on Wikimedia Commons (3 minutes 43 seconds)

How does it work?

Structured data on Wikimedia projects describes things in the world (creative works, people, media files...) with properties and values. One way to understand this is to consider a driver’s license, which lists properties such as name, address, date of birth, eye color, and license number, each with a corresponding value that is specific to the individual.

Take a moment to consider what the properties and values of a creative work could be. With structured data, one can assemble as many property-value combinations as one likes. They can describe many different aspects of a creative work: when it was made, by whom, of which materials, and in which collection it is held.

Example

Given a painting, what might some of its properties be?

  • Painted by
  • The subjects depicted in the painting
  • Date of creation

What then are its corresponding values? They might be, respectively:

  • Thomas Eakins
  • boy
  • 1878
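To make the idea concrete, the painting example can be written down as property-value pairs in code. This is a minimal, illustrative sketch in Python, not how Wikimedia software actually stores data: the keys are the Wikidata property IDs P170 (creator), P180 (depicts), and P571 (inception), the values are plain strings rather than links to other items, and `add_statement` and `describe` are hypothetical helper names.

```python
# Illustrative only: keys follow Wikidata property IDs (P170 = creator,
# P180 = depicts, P571 = inception). Values are plain strings here, whereas
# on Wikidata they would usually be links to other items.
PROPERTY_LABELS = {
    "P170": "creator",
    "P180": "depicts",
    "P571": "inception",
}

painting = {
    "P170": ["Thomas Eakins"],
    "P180": ["boy"],
    "P571": ["1878"],
}


def add_statement(item: dict[str, list[str]], prop: str, value: str) -> None:
    """Append a value to a property; one property may hold several values."""
    item.setdefault(prop, []).append(value)


def describe(item: dict[str, list[str]]) -> list[str]:
    """Render each property-value pair as a readable 'label: value' line."""
    lines = []
    for prop, values in item.items():
        label = PROPERTY_LABELS.get(prop, prop)  # fall back to the raw ID
        for value in values:
            lines.append(f"{label}: {value}")
    return lines


# Usage: the painting also depicts a banjo, so "depicts" gets a second value.
add_statement(painting, "P180", "banjo")
for line in describe(painting):
    print(line)
```

Each property holds a list because a single property can have several values; a painting can depict more than one thing.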

These are just some of the details we are used to, but property-value combinations are limited only by what one can imagine describing. One might also include properties like "material used," "collection," "height," and "width."

Property-value pairs appear throughout the structured data format that Wikimedia projects use, so it is worth practicing thinking about things in this way. As the figure above shows, structured data forms a vast, open-ended network of relationships between different data items (or entities).

The values in the pairs can, themselves, branch into new property-value pairs: "Thomas Eakins," "banjo," "watercolor painting," and "Metropolitan Museum of Art" can each spawn their own branches.

In fact, structured data on Wikimedia projects forms a very large, multidimensional graph: a map of knowledge, of connected things in the world, that both humans and machines can read and derive meaning from.
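The branching of values into new property-value pairs is what turns a list of pairs into a graph. In the sketch below, each fact is a (subject, property, value) triple, and a breadth-first walk follows values outward to find every entity within a given number of hops. The triples, the plain-English property names, and the helpers `build_graph` and `reachable` are hypothetical simplifications; on Wikidata, subjects and values would be item IDs and properties would be P-numbered.

```python
from collections import deque

# Hypothetical sample triples: (subject, property, value).
TRIPLES = [
    ("painting", "creator", "Thomas Eakins"),
    ("painting", "depicts", "boy"),
    ("painting", "depicts", "banjo"),
    ("painting", "instance of", "watercolor painting"),
    ("painting", "collection", "Metropolitan Museum of Art"),
    ("Thomas Eakins", "occupation", "painter"),
    ("banjo", "instance of", "musical instrument"),
    ("Metropolitan Museum of Art", "located in", "New York City"),
]


def build_graph(triples):
    """Index triples by subject: subject -> list of (property, value)."""
    graph = {}
    for subject, prop, value in triples:
        graph.setdefault(subject, []).append((prop, value))
    return graph


def reachable(graph, start, max_depth):
    """Breadth-first walk from start; return {entity: hops} up to max_depth hops."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depths[node] == max_depth:
            continue  # do not expand beyond the depth limit
        for _prop, value in graph.get(node, []):
            if value not in depths:  # visit each entity only once
                depths[value] = depths[node] + 1
                queue.append(value)
    return depths


graph = build_graph(TRIPLES)
print(reachable(graph, "painting", 1))
print(reachable(graph, "painting", 2))
```

With `max_depth=1` the walk stops at the painting's direct values; raising the limit reaches entities such as the museum's location, which is how the graph keeps branching.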

Structured data on Wikimedia projects is multilingual

This multidimensional graph of structured human knowledge is meant to be useful to as many people as possible. Entities on Wikidata, both properties and items, are designed to be multilingual: Wikimedia volunteers add labels (similar to names) and aliases (alternative names, spellings...) to them in many languages. In this way, structured data from Wikimedia helps make information discoverable, no matter which language you search in.
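A sketch of how multilingual labels and aliases can make one entity findable in several languages. The entity below is hand-written sample data shaped like the `labels` and `aliases` fields of Wikibase's JSON format; its ID is a hypothetical placeholder, and `label_for` and `matches` are illustrative helpers, not part of any Wikimedia API.

```python
from typing import Optional

# Sample entity shaped like the "labels"/"aliases" parts of Wikibase JSON.
# The ID is a hypothetical placeholder; the labels are illustrative.
WATERCOLOR = {
    "id": "Q_EXAMPLE",
    "labels": {
        "en": {"language": "en", "value": "watercolor painting"},
        "fr": {"language": "fr", "value": "aquarelle"},
        "de": {"language": "de", "value": "Aquarell"},
    },
    "aliases": {
        "en": [{"language": "en", "value": "watercolour"}],
    },
}


def label_for(entity: dict, languages: list[str]) -> Optional[str]:
    """Return the label in the first language of the fallback chain that has one."""
    for lang in languages:
        label = entity["labels"].get(lang)
        if label:
            return label["value"]
    return None


def matches(entity: dict, text: str) -> bool:
    """True if text equals any label or alias, in any language (case-insensitive)."""
    wanted = text.casefold()
    names = [label["value"] for label in entity["labels"].values()]
    for alias_list in entity.get("aliases", {}).values():
        names.extend(alias["value"] for alias in alias_list)
    return any(name.casefold() == wanted for name in names)


print(label_for(WATERCOLOR, ["es", "fr", "en"]))  # no Spanish label, so French is used
print(matches(WATERCOLOR, "Aquarell"), matches(WATERCOLOR, "watercolour"))
```

A reader searching in German, French, or with British spelling reaches the same entity, because every label and alias points to the same item.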


Structured data on Wikimedia projects is machine-readable

Structured data on Wikimedia projects is also designed so that programmers can easily reuse and remix it. Software developers use the APIs (Application Programming Interfaces) and other data-access services of Wikidata and Wikimedia Commons to build many diverse applications.
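One common way to read a single Wikidata item programmatically is the Special:EntityData page, which returns an entity as JSON at a URL such as https://www.wikidata.org/wiki/Special:EntityData/Q42.json. The sketch below only builds that URL and parses a small hand-written response in the same shape, so it runs offline; the sample is trimmed to one label and one statement (Q42 is Douglas Adams, P31 is "instance of", Q5 is "human"), and `entity_data_url` and `item_values` are hypothetical helper names. Fetching the live data, for example with `urllib.request.urlopen`, is left out; a real client should also send a descriptive User-Agent header, as Wikimedia's policy asks.

```python
import json

ENTITY_DATA_URL = "https://www.wikidata.org/wiki/Special:EntityData/{id}.json"


def entity_data_url(entity_id: str) -> str:
    """Build the JSON export URL for a Wikidata entity."""
    return ENTITY_DATA_URL.format(id=entity_id)


def item_values(entity: dict, prop: str) -> list[str]:
    """Return the item IDs used as values of prop.

    Snaks of type 'novalue' or 'somevalue' carry no datavalue and are skipped.
    """
    ids = []
    for statement in entity.get("claims", {}).get(prop, []):
        snak = statement["mainsnak"]
        if snak.get("snaktype") == "value" and snak["datavalue"]["type"] == "wikibase-entityid":
            ids.append(snak["datavalue"]["value"]["id"])
    return ids


# Trimmed, hand-written sample in the shape Special:EntityData returns.
SAMPLE_RESPONSE = """{"entities": {"Q42": {"id": "Q42",
  "labels": {"en": {"language": "en", "value": "Douglas Adams"}},
  "claims": {"P31": [{"mainsnak": {"snaktype": "value", "property": "P31",
    "datavalue": {"type": "wikibase-entityid",
                  "value": {"entity-type": "item", "numeric-id": 5, "id": "Q5"}}},
    "type": "statement", "rank": "normal"}]}}}}"""

data = json.loads(SAMPLE_RESPONSE)
entity = data["entities"]["Q42"]
print(entity_data_url("Q42"))
print(entity["labels"]["en"]["value"], item_values(entity, "P31"))
```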

Learn more
Examples
  • Open Art Browser
  • The Astrolabe Explorer
  • Dwynwen, a Wikidata-driven interface to the collections of the National Library of Wales (blog post)
  • Interactive timeline of the Museo del Prado

Structured data on Wikipedia and other Wikimedia projects

Wikimedia projects make increasing use of structured data. You can see it in action in various projects:

Wikidata

Wikidata, a sister project of Wikipedia, is the free knowledge base of the Wikimedia ecosystem. It is the main Wikimedia project for storing freely editable, reusable, multilingual structured data.

Wikidata contains tens of millions of data items (or entities) about notable things in the world: places, people, abstract concepts, creative works...
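Items like these can also be queried in bulk. The sketch below assumes the public Wikidata Query Service SPARQL endpoint (https://query.wikidata.org/sparql) with the `query` and `format=json` parameters, and only builds the request URL so that it runs offline. The helper name `works_by_creator_url` is hypothetical, and Q42 in the usage line is only a stand-in: substitute the item ID of the creator you are interested in. The query asks for items whose P170 (creator) is that item, with English labels from the query service's label service.

```python
import re
from urllib.parse import urlencode

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# SPARQL template; CREATOR is replaced by a validated item ID.
QUERY_TEMPLATE = """SELECT ?work ?workLabel WHERE {
  ?work wdt:P170 wd:CREATOR .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 10"""


def works_by_creator_url(creator_id: str) -> str:
    """Build a WDQS request URL listing up to 10 works by the given creator item."""
    # Accept only well-formed item IDs so arbitrary text cannot be spliced into the query.
    if not re.fullmatch(r"Q[1-9][0-9]*", creator_id):
        raise ValueError(f"not a Wikidata item ID: {creator_id!r}")
    query = QUERY_TEMPLATE.replace("CREATOR", creator_id)
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


print(works_by_creator_url("Q42"))
```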

Wikimedia Commons

Wikimedia Commons is a free source of volunteer-contributed media files and their associated metadata. It contains more than 50 million files: images, audio, video...

Many of these files have been contributed by cultural institutions, from large ones such as the National Archives of the United States to very small organizations such as the Jakob Smitsmuseum in Belgium.

Wikibase

Wikibase is not a website; it is a piece of software. Wikibase powers Wikidata and the structured data on Wikimedia Commons. Because it is free software, any individual or organization can download it and use it to create their own independent structured data repositories. Many cultural institutions, such as the German National Library and Rhizome.org, host their own Wikibase.

 
Learn more

This introduction to structured data focuses on Wikimedia projects, not on individual organizations' use of Wikibase.

To learn more about Wikibase and get in touch with its community, visit its website: https://wikiba.se/

Wikipedia

Wikipedia is an encyclopedia that describes knowledge about the world in text form. Structured data is not central there, but it appears in many places: examples include the authority control information on English Wikipedia and many infoboxes on French, English, Portuguese, and other Wikipedias.

To bring structured data to other Wikimedia platforms, the Structured Data Across Wikimedia (SDAW) project is working to structure the content of wikitext pages so that machines can recognize it and relate it to other content. The aim is to make reading, editing, and searching easier and more accessible across projects, such as the many language editions of Wikipedia, and across the wider internet.