Data dumps
The Wikimedia Foundation is asking for help to make as many copies as possible of all Wikimedia database dumps available. If you have access to sufficient storage and bandwidth, please volunteer to host a mirror.
About Wikimedia dumps
Wikimedia provides public dumps of our wikis' content and of related data such as search indexes and short URL mappings. Among other things, the dumps are used by researchers and offline reader projects, for archiving, for bot editing of the wikis, and to provide the data in an easily queryable format. The dumps are free to download and reuse.
Note that the data dumps are not backups, are not consistent, and are not complete. They are useful nonetheless.
What we dump and when
- Content and metadata of Wikimedia projects
- Cirrus search indexes of Wikimedia projects
- Wikidata entities
- Short URL mappings
- Dump frequency
- More...
Getting the dumps
- Warning about file sizes
- Mirrors for downloading, torrents
- download: XML/SQL dumps (wiki metadata and content)
- download: Wikidata entities
- download: other dumps and datasets
- Older dumps
- Tools for downloading
- Checking the status of a dump run
Using and re-using the dumps
- XML/SQL dump format
- Wikidata entity dumps formats: JSON and RDF
- Other dump formats
- Importing the dumps
- Tools for working with the dumps
- License for text content
- More license information
Getting help
- Xmldatadumps-l mailing list for general dumps questions
- wikitech mailing list for broader technical discussions
- Phabricator project for bug reporting (requires an account)
- #wikimedia-techconnect IRC channel for real-time chat, time zones permitting
- Help with common import issues
Contributing
- XML/SQL dumps code besides MediaWiki core
- Technical docs for the dumps
- Developer docs for the dumps
- Contributing to Wikimedia repos
- Generating dumps yourself
FAQ and further reading
- Dumps FAQ
- Wikipedia dumps help page
- Wikidata dumps information
- More...