Data dumps/2006 notes
This page is kept for historical interest. Any policies mentioned may be obsolete. If you want to revive the topic, you can use the talk page or start a discussion on the community forum.
Clusters
The wikis hosted in our Korean cluster will have a separate host, at http://download-yaseo.wikimedia.org/
Reporting
The backup runner script will generate some pretty HTML pages showing status as each file completes, so it should be easier to see what's done, what's in progress, and what has failed.
I'm about to code up this part; it shouldn't be too hard, I hope. :)
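As a rough illustration of the status page idea, here is a minimal Python sketch. It is not the backup runner's actual code: the function name `render_status_page`, the status strings, and the HTML layout are all assumptions made for this example.

```python
import html

# Hypothetical status values; the notes only mention done, in progress, and failed.
STATUSES = ("done", "in-progress", "failed")


def render_status_page(dbname, date, file_status):
    """Return an HTML page listing each dump file with its current status.

    file_status maps a file name to one of STATUSES (an assumed convention).
    """
    rows = []
    for filename, status in file_status.items():
        if status not in STATUSES:
            raise ValueError(f"unknown status: {status!r}")
        # The status doubles as a CSS class so a stylesheet could colour rows.
        rows.append(f'<li class="{status}">{html.escape(filename)}: {status}</li>')
    items = "".join(rows)
    title = html.escape(f"{dbname} dump progress on {date}")
    return (
        f"<html><head><title>{title}</title></head>"
        f"<body><h1>{title}</h1><ul>{items}</ul></body></html>"
    )
```

Regenerating the page each time a file completes would keep the report current without any extra state beyond the status mapping.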
File layout
This basic layout of file generation is complete in the script:
- public/
  - dbname/
    - YYYYMMDD/
      - dbname-YYYYMMDD-all-titles-in-ns0.gz
        - list of page names for BBC
      - dbname-YYYYMMDD-table.gz
        - SQL table dumps
      - dbname-YYYYMMDD-pages-type.xml.bz2
      - dbname-YYYYMMDD-pages-type.xml.7z
        - XML page text dumps
      - dbname-YYYYMMDD-abstract.xml.gz
        - page extracts for Yahoo
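The naming scheme above can be sketched as a small Python helper. This is only an illustration under stated assumptions: the function name `dump_paths` and its parameters are hypothetical, and the values to pass for `tables` and `page_types` are not specified in these notes.

```python
from pathlib import PurePosixPath


def dump_paths(dbname, date, tables, page_types, root="public"):
    """Build the expected file paths for one dump run of one wiki.

    tables and page_types fill the 'table' and 'type' placeholders from the
    layout; the caller supplies them, since the notes do not enumerate them.
    """
    base = PurePosixPath(root) / dbname / date
    prefix = f"{dbname}-{date}"
    names = [f"{prefix}-all-titles-in-ns0.gz"]  # list of page names
    names += [f"{prefix}-{table}.gz" for table in tables]  # SQL table dumps
    for kind in page_types:  # XML page text dumps, in two compressions
        names.append(f"{prefix}-pages-{kind}.xml.bz2")
        names.append(f"{prefix}-pages-{kind}.xml.7z")
    names.append(f"{prefix}-abstract.xml.gz")  # page extracts
    return [str(base / name) for name in names]
```

For example, `dump_paths("examplewiki", "20060101", ["page"], ["articles"])` returns five paths under `public/examplewiki/20060101/`; the wiki name, date, table name, and type here are made-up example values.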
Static URLs
There will probably also be a directory of symbolic links, giving each file a static URL that always points to its latest version. It will likely look like this:
- public/
  - dbname/
    - latest/
      - dbname-all-titles-in-ns0.gz
        - list of page names for BBC
      - dbname-table.gz
        - SQL table dumps
      - dbname-pages-type.xml.bz2
      - dbname-pages-type.xml.7z
        - XML page text dumps
      - dbname-abstract.xml.gz
        - page extracts for Yahoo
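One way the latest/ links could be maintained is sketched below in Python. This is an assumption-laden illustration, not the planned implementation: the function name `link_latest`, the use of relative link targets, and the temporary-link swap are all choices made for this example, and it assumes a POSIX filesystem.

```python
import os


def link_latest(db_dir, dbname, date):
    """Point db_dir/latest/<undated name> at each file in db_dir/<date>/.

    Returns the undated link names that were created or refreshed.
    """
    dated_dir = os.path.join(db_dir, date)
    latest_dir = os.path.join(db_dir, "latest")
    os.makedirs(latest_dir, exist_ok=True)
    dated_prefix = f"{dbname}-{date}-"
    created = []
    for filename in sorted(os.listdir(dated_dir)):
        if not filename.startswith(dated_prefix):
            continue  # skip anything that does not follow the naming scheme
        # Drop the date from the name: dbname-YYYYMMDD-x becomes dbname-x.
        undated = f"{dbname}-{filename[len(dated_prefix):]}"
        link_path = os.path.join(latest_dir, undated)
        tmp_path = link_path + ".tmp"
        if os.path.lexists(tmp_path):
            os.remove(tmp_path)
        # A relative target keeps the links valid if the tree is relocated.
        os.symlink(os.path.join("..", date, filename), tmp_path)
        # Renaming over the old link swaps it in one step on POSIX systems.
        os.replace(tmp_path, link_path)
        created.append(undated)
    return created
```

Building the new link under a temporary name and renaming it over the old one means a downloader never sees a missing link while a new dump is being published.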
Images/uploads
Not yet included; this may change in the near future.