LinkedOpenData/Strategy2021/Wikidata

Purpose

Wikidata makes it easier than ever to build apps and services around Linked Open Data. It achieves this by empowering communities from all over the world to collaboratively collect and organize important data about the world.

Guiding principles

  • Openness: Wikidata’s content is free and open for anyone to use; participation is open for everyone. Our products should lead to more free and open data, and everyone should be able to join our movement. We want to reduce barriers for new participants.
  • Sustainability: Wikidata is here to stay. It is important to us that the users of our data can have confidence in us, which is why we aim for sustainability over speedy growth and quick wins.
  • Co-creation: Wikidata stands or falls based on the health of its amazing community. We prioritize providing the community with the tools and knowledge they need to find solutions that fit their needs, which will ensure that our community -- including future contributors and volunteers -- continues to have the power to care for Wikidata. Wikidata and its community are not alone. We are well connected to the other Wikimedia projects, and a growing ecosystem of tools and applications exists around Wikidata, the Wikibase Ecosystem, and the larger Linked Open Data web. Our positive impact in the world is strongest when we work together.
  • Utility: In order to have an impact on the world, Wikidata’s content must be used outside of Wikidata. Making sure Wikidata’s data is widely used means we must ensure it remains useful for re-users. This includes maintaining high-quality data as well as easy access to that data.
  • Knowledge equity: The world is beautiful and complex; Wikidata should reflect that. We want to make sure that Wikidata and its decision-making processes support equity.

Background info

Wikidata is a free, collaborative, multilingual knowledge base with a focus on verifiability. It collects structured data to provide support for Wikipedia, the other wikis in the Wikimedia movement, and anyone in the world with a need for general-purpose structured data. Wikidata is based on the Wikibase software and provides data, an ontology and links to other databases.

Wikidata’s data constitutes the basis for a wide variety of applications and services both inside and outside the Wikimedia movement. It is an increasingly important building block for much of the technology we use every day.

Wikidata has just turned nine years old and is thriving more than ever. We now need nine digits for our identifiers (welcome Q100000001), and this summer we crossed the 1.5-billion edit mark. A full 72% of Wikipedia articles use Wikidata for infobox content, auto-categorization, flagging maintenance work and other support functionality, not to mention site links, which are used in 97% of all Wikipedia articles. The Wikidata Query Service sees 11 million queries per day. Wikidata has also successfully expanded into the area of lexicographical data, forming the basis for new initiatives such as Abstract Wikipedia.

Examples of current Wikidata use

Wikidata offers basic building blocks that can be used in various ways. Here are some examples of how Wikidata’s various types of content are currently used:

  • Accessing basic information: These projects use Wikidata to retrieve basic data on specific entities.
    • DerDieDas: This game uses data from Wikidata’s Lexemes to instruct users in the correct articles for German nouns.
    • Mycroft AI: This digital personal assistant accesses data from Wikidata to answer general-knowledge questions.
    • Brave Search: This search engine uses data from Wikidata to show information boxes in its search results.
  • Augmenting other data: These projects use Wikidata to retrieve data that enriches data they already have.
    • MusicBrainz: This music encyclopedia augments its music data with Wikidata’s data on concepts (e.g., countries) that are related to, but not an essential part of, its knowledge area.
    • Kanopi: This semantic note-taking application allows users to link to Wikidata Items and easily augment their notes with data from Wikidata.
    • Open Library: This book-tracking website uses Wikidata to retrieve data on the gender, origin and other attributes of the authors in its catalog, allowing it to provide readers with statistics on biases in their reading behavior.
  • Machine learning: These projects use Wikidata as a source of training data for machine-learning systems.
    • Exploration of historical theatre photographs: Researchers used Wikidata’s data in support of a machine-learning system that performs image recognition on a collection of old theatre photographs. Wikidata’s data facilitated plausibility checks. If the image detection algorithm recognized, for example, a laptop in a historic photo, it would be marked as highly unlikely, based on when laptops were invented.
    • OpenAI: One of OpenAI’s machine-learning systems uses data from Wikidata to perform entity disambiguation: helping a computer differentiate between two entities with the same name mentioned in a text -- for example, distinguishing between Jaguar (the make of automobile) and a jaguar (the animal).
    • TXT Werk: This company is using Wikidata’s data and entities as a basis for their named-entity recognition tool, which allows users to extract entities from a given text and identify them by their Q-ID.
  • Data cleaning and reconciliation: Wikidata’s statements and ontology are used to connect and clean up a data set.
    • Quora: Their question-answering site uses connections to Wikidata to compare their ontology to Wikidata’s and then correct mistakes in both ontologies.
  • Data exploration and visualization: Wikidata’s data is used to give new insights and overviews in areas such as journalism, education and research.
    • Measuring Political Elite Networks: This research project uses Wikidata’s entities for elite organizations and persons, and the connections between them, to better understand how political power networks across the world work.
    • EqualStreetNames.Brussels: This website uses Wikidata’s gender data to explore biases in who streets are named after in Brussels. Similar websites exist for a few other cities.
    • Open Art Browser: This art website lets users explore visual art across time, movements, locations, motifs and more.
  • Gateway to LOD web: Wikidata’s links to other websites, catalogs, archives and more are used to access additional information.
    • The Science Museum: This museum uses Wikidata as a “Rosetta Stone” to provide links to other datasets, to ingest content from those datasets, and to provide richer interfaces to their collection.
  • Source of notable entities for disambiguation, cataloging, tagging and more: Wikidata’s stable identifiers are used to clearly identify concepts in a language-independent manner.
    • OCCRP: The Organized Crime and Corruption Reporting Project uses Wikidata’s Items and their labels in various languages to support their data analysis during reporting, specifically to gain a better understanding of the many different names under which a person or organization might be operating (and hiding behind).
    • Wikimedia Commons: Wikimedia’s media site uses Wikidata’s IDs and concepts for sophisticated image tagging to enable better exploration of its media archive.
    • Reddit: This community discussion site uses Wikidata’s IDs and concepts to better understand what their subcommunities are about and enable better recommendations for subcommunities to join.
    • Tom Scott: This science YouTuber tried to determine the best thing ever by running a survey among his viewers. He used Wikidata as his source of concepts to be voted on.
  • Internationalization: Wikidata is used as a source of names for various concepts across languages.
    • Mapbox: This mapping provider uses Wikidata’s labels to internationalize the names of geographic features such as cities.
  • Place for shared community work of other projects: Wikidata is used as a place where communities can work together on shared data.
    • Wikipedia infoboxes: This encyclopedia uses Wikidata as a store for general-purpose data for infoboxes, enabling shared work across project and language boundaries.
    • Election Tracker: This team uses Wikidata to encourage their users to contribute data for upcoming national election dates on their global election calendar.
    • KDE Itinerary: This open digital travel assistant uses Wikidata to retrieve data, including lists of airports, countries' varying electricity standards and traffic directions, to provide travelers with important information for their trip.
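
Several of the examples above, such as internationalization and accessing basic information, come down to reading an Item's labels and choosing one in the user's language. The following is a minimal sketch of that step, not the method used by any of the projects listed. It assumes the entity data has already been retrieved and follows the shape of the "labels" section returned by Wikidata's wbgetentities API. The sample data is hand-written and trimmed, and the helper name pick_label is hypothetical.

```python
from typing import List, Optional

# Hand-written, trimmed sample in the shape of the "labels" section of a
# wbgetentities response. Q64 is the Wikidata Item for Berlin.
SAMPLE_ENTITY = {
    "id": "Q64",
    "labels": {
        "en": {"language": "en", "value": "Berlin"},
        "de": {"language": "de", "value": "Berlin"},
        "ja": {"language": "ja", "value": "ベルリン"},
    },
}


def pick_label(entity: dict, preferred: List[str]) -> Optional[str]:
    """Return the first available label from a list of language codes.

    `pick_label` is a hypothetical helper for illustration only.
    Returns None when no label exists in any of the requested languages.
    """
    labels = entity.get("labels", {})
    for lang in preferred:
        if lang in labels:
            return labels[lang]["value"]
    return None


if __name__ == "__main__":
    # Ask for French first, then fall back to Japanese.
    print(pick_label(SAMPLE_ENTITY, ["fr", "ja"]))
```

Because the Q-ID stays the same across languages, a re-user can store only the identifier and resolve the display name when needed, falling back through a list of languages when a label is missing.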

Strategies

Empower the community to increase data quality

We must ensure that our socio-technical system helps editors increase the quality of Wikidata’s existing data and contribute new high-quality data.

Why focus here?

We want and need Wikidata’s data to be accurate and verifiable, especially as more and more technology in everyday use relies on our data in providing applications and services. This state of affairs places increased pressure on Wikidata to provide high-quality data and increases the incentives to manipulate our data. Meanwhile, the growth of Wikidata’s content is outpacing the community’s own growth and its ability to maintain that content.

How might we address this?

We can establish feedback loops with data re-users, perform automated verification of data and issue detection, and develop tools and artefacts that make project policies and guidelines easier to implement (e.g., more powerful Entity Schemas).
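
As a rough illustration of what automated verification and issue detection could look like, the sketch below runs two simple plausibility checks over a simplified record. It is not how Entity Schemas or Wikidata's existing quality tooling work. The record layout, the field names and the function find_issues are all assumptions made for this example.

```python
from datetime import date
from typing import List


def find_issues(item: dict) -> List[str]:
    """Return human-readable issues found in a simplified item record.

    The record layout (date_of_birth, date_of_death, references) is a
    hypothetical simplification, not Wikidata's actual data model.
    """
    issues = []
    born = item.get("date_of_birth")
    died = item.get("date_of_death")
    # Plausibility check: a person cannot die before being born.
    if born and died and died < born:
        issues.append("date of death precedes date of birth")
    # Verifiability check: flag records that cite no source at all.
    if not item.get("references"):
        issues.append("no references given")
    return issues


if __name__ == "__main__":
    suspicious = {
        "date_of_birth": date(1950, 5, 1),
        "date_of_death": date(1949, 1, 1),
        "references": [],
    }
    for issue in find_issues(suspicious):
        print(issue)
```

Checks of this kind only flag candidates for review. Deciding whether a flagged statement is actually wrong remains a task for editors.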

Facilitate equity in decision making

We want to ensure that diverse perspectives come into play as fundamental decisions are made for and about Wikidata. By doing so we can better support knowledge often underrepresented in knowledge graphs.

Why focus here?

We are committed to equity within Wikidata, and it plays an especially important role in our fundamental decision-making processes.

How might we address this?

We can address this by amplifying voices that are raising awareness and by bringing perspectives to the discussion that have so far been missing from it. We can create spaces for discussion and help with facilitation.

Increase re-use for increased impact

We want to make sure that anyone can use the data in Wikidata to make the world a better place. While everyone can re-use our data, we give priority to organizations and projects that align with our values and have a high impact.

Why focus here?

We want more people to benefit from the data Wikidata provides. High-impact projects that are aligned with our vision and values, along with re-users giving back to Wikidata, provide a greater collective benefit simply by reaching more people.

How might we address this?

We can leverage our new and improved APIs, documentation and showcases, and throw more support behind high-impact re-users: namely, educational re-users, those working towards knowledge equity and those that give back.

Strengthen underrepresented languages

More people need access to knowledge and technology presented in their own language, and content in that language should be accessible to all. Language data is a fundamental building block in reaching that goal.

Why focus here?

The underrepresentation of some languages in the realms of technology and of knowledge in general constitutes a significant barrier to granting as many people as possible access to the world’s knowledge in their own language.

How might we address this?

A direct path to improving that situation would be to increase access to multilingual machine-readable data, which would then serve as a basis for apps and services. Improving the user interface for lexicographical data and making lexicographical data accessible in Wiktionary will go some way toward that goal.

Enable Wikimedia Projects to share their workload

The Wikimedia Projects should be able to rely on Wikidata to share their workload across language and project family boundaries.

Why focus here?

We want all Wikimedia Projects and language versions of these projects to flourish; that’s the best and most effective path to giving every single human being access to the sum of all knowledge. To this end, we must support smaller and medium-size projects in particular and free them to rely on Wikidata as a source of basic data that is collectively maintained.

Currently, Wikidata and the other Wikimedia Projects are not tightly integrated, and the integration that exists is not yet elaborate enough to take full advantage of this workload sharing. Wikidata benefits immensely from the efforts, expertise and experience of contributors from other Wikimedia Projects -- not only in keeping its data well maintained, but also in learning from community processes and other experiences that can be applied to Wikidata.

How might we address this?

We can build out interfaces that allow users to edit Wikidata from the other Wikimedia Projects (e.g., Wikidata Bridge), improve tools to monitor and moderate content relevant to a given project, and support projects like Abstract Wikipedia, which rely on Wikidata’s data to provide content to readers in an automated fashion.

Target groups

Project shapers

Project shapers have their effect on Wikidata by facilitating and taking part in fundamental processes and decision-making. They create the rules and structures that hold the project together and are typically at the core of the community, with deep, overall understanding and care for the project.

Concerns and needs: Project shapers want to create processes and rules that others can and want to follow, so the project can scale and remain healthy.

Connection to strategy: Project shapers are integral to empowering the community to increase data quality, guiding the decision-making process required to come to agreements on data modeling and more. They play a crucial role in facilitating equity in decision-making, ensuring that a diverse set of voices is heard and their perspectives considered. We can support project shapers by creating tools and artefacts that make project policies and guidelines more actionable, as well as by supporting the creation of inclusive and open processes and guidelines to increase knowledge equity.

Gardeners

Gardeners bring Wikidata to life by enhancing Wikidata’s quality and enforcing the rules and agreed-upon structures. Typically, they have subject-matter expertise and care deeply about quality in their topic areas.

Concerns and needs: Gardeners currently have more work to do than is humanly possible. They require tools, rules and processes that support their quality work and that ensure others can contribute positively.

Connection to strategy: Gardeners are integral to empowering the community to increase data quality as those who tend the garden that is Wikidata. They also play an important role in increasing re-use for increased impact, as consistently modeled data is vital for easier re-use. We can support gardeners by creating more powerful tools and artefacts that empower their work (user interfaces that encompass project guidelines, tools to incorporate feedback from re-users, automated verification, etc.) and by attracting more editors to the project in order to lessen the workload.

Figure: Growth graphs of Wikidata Items and editorship

Representatives of diverse knowledge

Representatives of diverse knowledge share viewpoints that would otherwise be underrepresented in Wikidata’s data in the contexts of language and culture, among others, and they play an important part in bringing missing voices to the fundamental discussions shaping the project.

Concerns and needs: Representatives of diverse knowledge want to see their otherwise underrepresented knowledge reflected in Wikidata and the wider open knowledge ecosystem; they want their voices to be heard in fundamental decisions that shape the project.

Connection to strategy: Representatives of diverse knowledge are crucial in facilitating equity in decision making as those who bring important perspectives, which would otherwise be absent, to discussions and decision-making processes. We can support representatives of diverse knowledge by growing expert communities in underrepresented areas, supporting project shapers and facilitators in making fundamental decisions and bringing in otherwise missing points of view.

Small and medium-size re-users

Small and medium-size re-users are building products and services on top of Wikidata’s data. They often contribute back and use Wikidata to overcome the competitive disadvantage of not owning their own knowledge graph.

Concerns and needs: Small and medium-size re-users are looking for data that is easy to access and use without the burden of collecting and maintaining it on their own.

Connection to strategy: Small and medium-size re-users are vital for increasing re-use for increased impact as those who can build new, useful applications and services on top of Wikidata’s data, thereby increasing the impact of our data in the world. These re-users tend to be more closely aligned with Wikimedia’s mission and values and include many free and open knowledge projects. We can support small and medium-size re-users by making it easier to use Wikidata’s data and to give back to the commons.

Large re-users

Large re-users are global organizations that are integrating Wikidata’s data into the internal knowledge graphs driving their products and services. Typically, they are able to help us improve Wikidata’s data through their processes for user feedback and internal quality assurance.

Concerns and needs: Large re-users want easy access to high-quality data and are seeking ways to contribute in a meaningful way.

Connection to strategy: Large re-users are important for increasing re-use for increased impact and for empowering the community to increase data quality: they reach billions of people with their applications and services, and they have the resources to support us in securing and increasing data quality. We can support large re-users by making it easier for them to give back -- for example, by showing them how they can participate in the necessary work of maintaining Wikidata and by providing defined tools and processes to report issues they discover as they perform internal quality assurance.