User:Neil Shah-Quinn (WMF)/Data portal draft
There is a great deal of publicly available, openly licensed data about Wikimedia projects. This page is intended to help community members, developers, and researchers who are interested in analyzing raw data learn what data and infrastructure are available. If you have any questions, you might find the answer in the Frequently Asked Questions about Data.
If you wish to browse pre-computed metrics and dashboards, see statistics.
If this publicly available data isn't sufficient, you can look at the page on private data access to see what non-public data exists and how you can gain access.
If you wish to donate or document any additional data sources, you can use the Wikimedia organization on DataHub.
Data Dumps (details)
Dumps of all WMF projects for backup, offline use, research, etc.
The API provides direct, high-level access to the data contained in MediaWiki databases through HTTP requests to the web service.
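As a minimal sketch of what such an HTTP request can look like, the snippet below builds a MediaWiki Action API query for the most recent revisions of a page. The choice of English Wikipedia as the endpoint, the helper names, and the User-Agent string are assumptions made for illustration; the network call is kept in a separate function so the URL construction can be inspected on its own.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: English Wikipedia; other wikis expose the same path on their own domain.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_revisions_url(title, limit=5, endpoint=API_ENDPOINT):
    """Return a URL asking for metadata on the latest revisions of one page."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids|timestamp|user|comment",  # metadata only, no revision content
        "rvlimit": limit,
        "format": "json",
        "formatversion": "2",  # makes "pages" a list rather than an ID-keyed dict
    }
    return endpoint + "?" + urlencode(params)


def fetch_revisions(title, limit=5):
    """Perform the request (needs network access) and return the revision list."""
    request = Request(
        build_revisions_url(title, limit),
        # Hypothetical User-Agent; replace with one that identifies your tool.
        headers={"User-Agent": "data-portal-example/0.1"},
    )
    with urlopen(request) as response:
        data = json.load(response)
    return data["query"]["pages"][0].get("revisions", [])
```

Calling `fetch_revisions("Main Page", limit=3)` would return up to three dictionaries with revision IDs, timestamps, user names, and edit comments.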
Toolforge allows you to connect to shared server resources and query a copy of the database (with some lag).
Recent changes stream (details)
Wikimedia broadcasts every change to every Wikimedia wiki using the Socket.IO protocol.
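Once a client is connected, each broadcast change arrives as a JSON object. The sketch below leaves out the transport layer entirely and only shows how decoded events might be filtered. The field names used here (`type`, `wiki`, `bot`, `title`, `user`) and the sample event are assumptions for illustration and should be checked against the stream's own documentation.

```python
import json


def parse_change(raw):
    """Decode one JSON-encoded change event into a dictionary."""
    return json.loads(raw)


def is_human_edit(change, wiki=None):
    """Return True for non-bot edits, optionally restricted to one wiki.

    Assumes the event carries "type", "bot", and "wiki" fields.
    """
    if change.get("type") != "edit":
        return False
    if change.get("bot", False):
        return False
    return wiki is None or change.get("wiki") == wiki


# Made-up sample event, not real data.
sample = (
    '{"type": "edit", "wiki": "enwiki", "title": "Example", '
    '"user": "ExampleUser", "bot": false}'
)

change = parse_change(sample)
if is_human_edit(change, wiki="enwiki"):
    print(change["title"], "edited by", change["user"])
```

Keeping the filter separate from the connection code means the same function can be reused with whichever client library delivers the events.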
Analytics Dumps (details)
Raw pageview, unique device estimates, mediacounts, etc.
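To give a sense of how the raw pageview files can be processed, here is a minimal parsing sketch. It assumes each line holds four space-separated fields (project code, page title, view count, and response size); verify this against the documentation for the specific dump you download. The record type, function names, and sample lines are made up for illustration.

```python
from collections import Counter
from typing import NamedTuple, Optional


class PageviewRecord(NamedTuple):
    project: str
    title: str
    views: int


def parse_pageview_line(line: str) -> Optional[PageviewRecord]:
    """Parse one line of a pageview file; return None if it is malformed."""
    parts = line.rstrip("\n").split(" ")
    if len(parts) != 4:
        return None
    project, title, views, _response_size = parts
    try:
        return PageviewRecord(project, title, int(views))
    except ValueError:
        return None


def total_views_by_project(lines):
    """Sum view counts per project code, skipping malformed lines."""
    totals = Counter()
    for line in lines:
        record = parse_pageview_line(line)
        if record is not None:
            totals[record.project] += record.views
    return totals


# Made-up sample lines, not real traffic figures.
sample_lines = [
    "en Main_Page 100 0\n",
    "en Example 5 0\n",
    "de Beispiel 7 0\n",
    "this line is malformed\n",
]
print(total_views_by_project(sample_lines))
```

Because the real files are large, the same functions can be fed a lazily read file object (for example from `gzip.open(path, "rt")`) instead of an in-memory list.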
Reports in 25+ languages based on data dumps and server log files.
DBpedia extracts structured data from Wikipedia and allows users to run complex queries and link Wikipedia data to other data sets.
A collection of various Wikimedia-related datasets.
Editing metadata includes information such as the user, the time, and the revision comment, but does not include the content of the revision itself.
This data is available from:
Raw content data
Data that includes the raw content of page revisions is available from:
Structured content data
- Wikidata Query Service
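As a rough illustration of querying the Wikidata Query Service, the sketch below builds a SPARQL request URL and extracts labels from a response in the standard SPARQL JSON results format. The example query (items that are an instance of house cat), the helper names, and the sample response are assumptions chosen for illustration; no network request is made here.

```python
from urllib.parse import urlencode

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Example query: five items that are instances of (P31) house cat (Q146).
QUERY = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q146 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 5
"""


def build_sparql_url(query, endpoint=WDQS_ENDPOINT):
    """Return a GET URL that asks the endpoint for JSON results."""
    return endpoint + "?" + urlencode({"query": query, "format": "json"})


def extract_labels(result):
    """Pull the ?itemLabel values out of a SPARQL JSON results document."""
    bindings = result["results"]["bindings"]
    return [row["itemLabel"]["value"] for row in bindings if "itemLabel" in row]


# Made-up response fragment in the SPARQL JSON results layout.
sample_result = {
    "results": {
        "bindings": [
            {"itemLabel": {"type": "literal", "value": "Example Cat"}},
        ]
    }
}
print(extract_labels(sample_result))
```

The URL from `build_sparql_url(QUERY)` can be fetched with any HTTP client; sending a descriptive User-Agent header is advisable when doing so.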
In addition to the raw data described above, a great deal of helpful infrastructure for research and analysis is provided to people contributing to Wikimedia's mission.