Data request limitations

Wikimedia requires users to observe certain limits on data requests to avoid placing excessive load on its servers. Where practicable, use the data dumps rather than API requests to obtain large amounts of data from WMF projects. A paid live feed once made it possible to keep one's databases in sync with Wikimedia's, but it is no longer available to new customers. Alternatives include the IRC feeds and the Toolserver.

Issues

Retrieving large numbers of page revisions

Referring to some of the features available via API query properties, Tim Starling writes: "You can use api.php with rvprop=content and rvcontinue to fetch the text of all revisions of a page. Please do this in a single thread with a substantial delay between requests, since this is a very expensive operation for our servers. Do not attempt to do it for a large number of pages, for that, use the XML download instead. Do not do it regularly or set up a web gateway which allows users to initiate these requests."[1]
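The pattern described in the quotation (one thread, continuation, a substantial delay between requests) could be sketched roughly as follows. This is a minimal illustration, not an official client: the endpoint URL, the User-Agent string, the batch size of 50, the 5-second delay, and the function names `http_get_json` and `iter_revisions` are all assumptions made for the example. It also assumes the JSON response carries a `continue` object holding the `rvcontinue` value, as current MediaWiki versions do.

```python
import json
import time
import urllib.parse
import urllib.request

# Assumptions for illustration only: endpoint and contact details are placeholders.
API_URL = "https://en.wikipedia.org/w/api.php"
USER_AGENT = "ExampleRevisionFetcher/0.1 (contact@example.org)"


def http_get_json(params):
    """Send one GET request to the API and decode the JSON response."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def iter_revisions(title, fetch=http_get_json, delay=5.0, sleep=time.sleep):
    """Yield every revision of one page, one batch at a time, in a single thread."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids|timestamp|content",
        "rvlimit": "50",
    }
    while True:
        data = fetch(params)
        for page in data.get("query", {}).get("pages", {}).values():
            for revision in page.get("revisions", []):
                yield revision
        continuation = data.get("continue")
        if not continuation:
            break  # no more batches
        # Carry rvcontinue (and any other continuation keys) into the next request.
        params = {**params, **continuation}
        # Pause before the next request; the 5-second default is an arbitrary choice.
        sleep(delay)
```

A call such as `for rev in iter_revisions("Example"): ...` walks the history of a single page. For many pages, the XML dumps remain the appropriate source, as the quotation says.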

Live mirrors

Live mirrors are forbidden. A live mirror is one that polls WMF for the page data every time a user requests that page from the mirror. A mirror that merely polls WMF for the data it needs to stay up to date is not a live mirror.

Polling API

Users must follow the User-Agent policy. If you run your requests in serial, rather than parallel, you are unlikely to put too much strain on the servers.
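As a rough sketch of what serial polling with an identifying User-Agent header might look like, the following sends requests strictly one after another. The endpoint, the User-Agent value, the one-second pause, and the names `build_request`, `send`, and `poll_serially` are assumptions for illustration; the User-Agent policy itself defines what the header should contain.

```python
import time
import urllib.parse
import urllib.request

# Assumptions for illustration only: replace with your own tool name and contact details.
API_URL = "https://en.wikipedia.org/w/api.php"
USER_AGENT = "ExampleBot/0.1 (https://example.org/bot; bot@example.org)"


def build_request(params):
    """Build a GET request that identifies the client through its User-Agent header."""
    query = urllib.parse.urlencode(params)
    return urllib.request.Request(
        f"{API_URL}?{query}", headers={"User-Agent": USER_AGENT}
    )


def send(request):
    """Perform the request and return the raw response body."""
    with urllib.request.urlopen(request) as response:
        return response.read()


def poll_serially(param_sets, sender=send, pause=1.0, sleep=time.sleep):
    """Send the requests one at a time, never in parallel, pausing between them."""
    results = []
    for index, params in enumerate(param_sets):
        if index > 0:
            sleep(pause)  # the pause length is an arbitrary choice for this sketch
        results.append(sender(build_request(params)))
    return results
```

Each request finishes before the next one starts, so the client never has more than one request in flight.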

InstantCommons

No policy has yet been established limiting mw:InstantCommons use. Individual wikis using the InstantCommons feature are considered unlikely to cause a significant increase in cost for the Wikimedia Foundation, since every file only has to be downloaded once and there are per-user bandwidth limitations.

Images

See Data dumps#Downloading Images. You should get them from a mirror if you can.

References

See also