Download tools

Downloading the XML dumps

Once you've decided what files to download, it's important to pick the correct server, probably one of the mirrors. Mirrors may be much closer to you and are usually less overloaded than dumps.wikimedia.org, which also enforces strict connection and speed limits.

For the download you can use any download manager, but you may prefer a standard command-line downloader such as wget or curl, which handle URL selection, resuming, retrying, and so on. For instance, to download the latest full dump of a wiki (Meta-Wiki in the example) from the source server, in 7z format to save on size and decompression time:

wget --recursive --no-parent --no-directories --continue --accept 7z https://dumps.wikimedia.org/metawiki/latest/

or in short:

wget -r -np -nd -c -A 7z https://dumps.wikimedia.org/metawiki/latest/
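For a single file, curl offers equivalent resuming. As a sketch (the file name follows the usual dump naming pattern; substitute the dump you actually want):

```shell
# -C - resumes a partial download where it left off;
# -O saves the file under its remote name;
# --retry 5 retries transient failures up to five times.
curl -C - -O --retry 5 https://dumps.wikimedia.org/metawiki/latest/metawiki-latest-pages-meta-current.xml.bz2
```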

If wget doesn't use the full speed of your machine and network, and you're sure you can't switch to a mirror, try the axel download accelerator (see man axel), which opens multiple connections to the same server:

axel --num-connections=3 https://dumps.wikimedia.org/metawiki/latest/metawiki-latest-pages-meta-current.xml.bz2

If you need to download several files over multiple connections, look into xargs.
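One way to do this, assuming you have collected the dump URLs into a file (the name urls.txt here is a placeholder), is to have xargs fan out one wget process per URL:

```shell
# urls.txt: one dump URL per line (placeholder file name).
# -n 1 passes one URL to each wget invocation;
# -P 3 runs up to three downloads in parallel;
# -c resumes any partially downloaded files.
xargs -n 1 -P 3 wget -c < urls.txt
```

Raise or lower -P depending on how many concurrent connections the server tolerates.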

If you need to download a lot of dumps, scripts are available such as WikiTeam's wikipediadownloader.py.

Downloading media

You can download media bundles for a project or use rsync to pick up media from one of our mirror sites.
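A minimal rsync invocation might look like the following; the mirror host and module path are hypothetical, so check your chosen mirror's documentation for the real ones:

```shell
# -a preserves file attributes and recurses into directories;
# -v lists files as they transfer; --progress shows per-file progress.
# rsync://mirror.example.org/wikimedia-media/metawiki/ is a placeholder path.
rsync -av --progress rsync://mirror.example.org/wikimedia-media/metawiki/ ./metawiki-media/
```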

Alternatively, you can use the Wikix program to read any XML dump and generate a series of parallel download scripts that run on a Linux-based system. Wikix requires curl to be installed on your system.

The WikiTeam software provides similar capabilities.

Downloading XML dumps and access logs

The open-source package QUAC includes the scripts wp-get-dumps and wp-get-access, which use rsync to download dumps and access logs from mirrors.