Download tools


Downloading the XML dumps


Once you've decided which files to download, pick the right server, which will probably be one of the mirrors. Mirrors may be much closer to you and are usually less overloaded than the source server, which also enforces strict connection and speed limits.

You can use any download manager for the download, but you may prefer a standard command-line downloader such as wget or curl, which handle URL selection, resuming, retrying and so on. For instance, to download the latest full dump of a wiki (Meta-Wiki in the example) from the source server, in 7z format to save on size and decompression time:

wget --recursive --no-parent --no-directories --continue --accept 7z

or in short:

wget -r -np -nd -c -A 7z
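
A comparable resumable download can be sketched with curl. This is only a minimal sketch under stated assumptions: the function name fetch_dump and the DRY_RUN variable are illustrative and not part of any existing tool, and the URL is whichever file you have chosen. Unlike wget with --recursive and --accept, curl does not crawl a directory listing, so it needs the full URL of each file.

```shell
# Sketch: resumable single-file download with curl.
# fetch_dump and DRY_RUN are illustrative names, not part of any existing tool.
fetch_dump() {
    # $1: full URL of one dump file (supplied by you).
    # --location        follow redirects
    # --continue-at -   resume a partially downloaded file, like wget --continue
    # --retry 5         retry transient failures up to five times
    # --remote-name     save under the file name taken from the URL
    # With DRY_RUN set, the command is printed instead of executed.
    ${DRY_RUN:+echo} curl --location --continue-at - --retry 5 --remote-name "$1"
}
```

Setting DRY_RUN=1 before calling fetch_dump prints the curl command so you can check it before starting a large transfer.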

If this doesn't use the full speed of your machine and network, and you're sure you can't switch to a mirror, try the axel download accelerator (man axel) to use more connections:

axel --num-connections=3

If you need to download several files over multiple connections, look into xargs.
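
As a rough illustration of the xargs approach, the sketch below reads one URL per line from standard input and runs a small number of downloads at once. The function name fetch_parallel and its arguments are assumptions made for this example. The -P option is provided by GNU and BSD xargs but is not required by POSIX. Keep the job count low, since the source server limits connections.

```shell
# Sketch: run several downloads in parallel with xargs.
# fetch_parallel is an illustrative name, not an existing tool.
fetch_parallel() {
    # $1: downloader command (default: wget --continue)
    # $2: number of parallel jobs (default: 3)
    downloader=${1:-wget --continue}
    jobs=${2:-3}
    # -n 1 passes one URL to each downloader invocation;
    # -P runs up to $jobs invocations at the same time.
    # $downloader is left unquoted on purpose so it splits into command and options.
    xargs -n 1 -P "$jobs" $downloader
}
```

With a file urls.txt that you have prepared (one URL per line), this would be called as fetch_parallel < urls.txt.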

If you need to download a lot of dumps, scripts are available, such as those from WikiTeam.

Downloading and decompressing the XML and SQL dumps


If you frequently download and decompress the latest dumps for a particular wiki, this Bash script automates the process. (See also the Bash and Fish completion scripts.)
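
The script mentioned above is not reproduced here. As a hedged illustration of the decompression step only, the sketch below picks a decompressor from the file extension, assuming the dumps are 7z, bzip2 or gzip archives and that 7z, bzip2 and gzip are installed. The name decompress_dump and the DRY_RUN variable are invented for this example.

```shell
# Sketch: choose a decompression command from the file extension.
# decompress_dump and DRY_RUN are illustrative names, not an existing tool.
decompress_dump() {
    # $1: path to a downloaded archive.
    # With DRY_RUN set, the command is printed instead of executed.
    case "$1" in
        *.7z)  ${DRY_RUN:+echo} 7z x "$1" ;;       # extract into the current directory
        *.bz2) ${DRY_RUN:+echo} bzip2 -dk "$1" ;;  # -d decompress, -k keep the archive
        *.gz)  ${DRY_RUN:+echo} gzip -dk "$1" ;;   # -k requires gzip 1.6 or later
        *)     echo "unknown archive type: $1" >&2; return 1 ;;
    esac
}
```

Keeping the original archive with -k costs disk space but lets you re-extract without downloading again; drop it if space is tight.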

Downloading media


You can download media bundles for a project or use rsync to pick up media from one of our mirror sites.
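
A minimal sketch of the rsync route follows. The function name sync_media and the DRY_RUN variable are made up for this example, and the source address in the comment is a placeholder: use the rsync address published by the mirror you pick.

```shell
# Sketch: copy a media directory from a mirror with rsync.
# sync_media and DRY_RUN are illustrative names; the host and path are placeholders.
sync_media() {
    # $1: rsync source, e.g. rsync://mirror.example.org/module/path/ (hypothetical)
    # $2: local destination directory
    # --archive   recurse and preserve times and permissions
    # --partial   keep partially transferred files so a rerun can resume them
    # --verbose   list files as they are transferred
    # With DRY_RUN set, the command is printed instead of executed.
    ${DRY_RUN:+echo} rsync --archive --partial --verbose "$1" "$2"
}
```

rsync itself also accepts --dry-run, which lists what would be transferred without copying anything and is a cheap way to estimate the size of a media set first.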

Alternatively, you can use the Wikix program to read any XML dump and create a series of parallel download scripts that run on a Linux-based system. Wikix requires curl to be installed on your Linux distribution.

The WikiTeam software provides similar capabilities as well.

Downloading XML dumps and access logs


The open source package QUAC has scripts wp-get-dumps and wp-get-access that use rsync to download from mirrors.