Open main menu

Format of the sql filesEdit

These are provided as dumps of entire tables, using mysqldump. They start with various commands to set up the character set correctly for the import; they also turn off certain checks related to indexes, for speed. More importantly however, they contain a DROP TABLE IF EXISTS stanza before the inserts of the actual data. This means that if you import one of these files into an existing wiki, any data you had in that table will be lost.

Each INSERT statement contains several thousand rows of data for speed purposes.

These files contain metadata from wikipedia describing its structure and organization. They do not contain any text from the pages. The precise description of each file can be found in the Mediawiki manual: Database layout; see the list of database tables for links to a general description and the list of fields (schema) for each one.

 Sample excerpt from fr wiktionary page table dump  
-- MySQL dump 10.16  Distrib 10.1.26-MariaDB, for debian-linux-gnu (x86_64)
--
-- Host: 10.64.32.116    Database: frwiktionary
-- ------------------------------------------------------
-- Server version	10.1.38-MariaDB

/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
/*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */;
/*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */;
/*!40101 SET NAMES utf8mb4 */;
/*!40103 SET @OLD_TIME_ZONE=@@TIME_ZONE */;
/*!40103 SET TIME_ZONE='+00:00' */;
/*!40014 SET @OLD_UNIQUE_CHECKS=@@UNIQUE_CHECKS, UNIQUE_CHECKS=0 */;
/*!40014 SET @OLD_FOREIGN_KEY_CHECKS=@@FOREIGN_KEY_CHECKS, FOREIGN_KEY_CHECKS=0 */;
/*!40101 SET @OLD_SQL_MODE=@@SQL_MODE, SQL_MODE='NO_AUTO_VALUE_ON_ZERO' */;
/*!40111 SET @OLD_SQL_NOTES=@@SQL_NOTES, SQL_NOTES=0 */;

--
-- Table structure for table `page`
--

DROP TABLE IF EXISTS `page`;
/*!40101 SET @saved_cs_client     = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE `page` (
  `page_id` int(8) unsigned NOT NULL AUTO_INCREMENT,
  `page_namespace` int(11) NOT NULL DEFAULT '0',
  `page_title` varbinary(255) NOT NULL DEFAULT '',
  `page_restrictions` varbinary(255) NOT NULL,
  `page_is_redirect` tinyint(1) unsigned NOT NULL DEFAULT '0',
  `page_is_new` tinyint(1) unsigned NOT NULL DEFAULT '0',
  `page_random` double unsigned NOT NULL DEFAULT '0',
  `page_touched` varbinary(14) NOT NULL DEFAULT '',
  `page_links_updated` varbinary(14) DEFAULT NULL,
  `page_latest` int(8) unsigned NOT NULL DEFAULT '0',
  `page_len` int(8) unsigned NOT NULL DEFAULT '0',
  `page_content_model` varbinary(32) DEFAULT NULL,
  `page_lang` varbinary(35) DEFAULT NULL,
  PRIMARY KEY (`page_id`),
  UNIQUE KEY `name_title` (`page_namespace`,`page_title`),
  KEY `page_random` (`page_random`),
  KEY `page_len` (`page_len`),
  KEY `page_redirect_namespace_len` (`page_is_redirect`,`page_namespace`,`page_len`)
) ENGINE=InnoDB AUTO_INCREMENT=3904893 DEFAULT CHARSET=binary ROW_FORMAT=COMPRESSED;
/*!40101 SET character_set_client = @saved_cs_client */;

--
-- Dumping data for table `page`
--

/*!40000 ALTER TABLE `page` DISABLE KEYS */;
INSERT INTO `page` VALUES (1,0,'accueil','',0,0,0.228483556876,'20190502212211','20190502211913',25526944,5565,'wikitext',NULL),(4,8,'Disclaimers','',0,0,0.436798504291,'20050627005432',NULL,33009,7,'wikitext',NULL),(5,8,'Disclaimerpage','',0,1,0.562675562949,'20050627005432',NULL,33010,20,'wikitext',NULL),(11,8,'Showhideminor','',0,0,0.867696803467,'20060307221818',NULL,510140,120,'wikitext',NULL),(12,8,'Sitetitle','',0,0,0.45467016448,'20060213091155',NULL,403956,46,'wikitext',NULL),(13,0,'lire','',0,0,0.661605208023,'20190502205550','20190502211914',26259447,14704,'wikitext',NULL),(14,0,'encyclopédie','',0,0,0.653587898078,'20190502205550','20190502211913',26105658,7683,'wikitext',NULL),(17,3,'Romanito','',0,0,0.035887225548,'20060820153200',NULL,1116028,233,'wikitext',NULL),(18,3,'Hippietrail','',0,0,0.170748967291,'20060820153215',NULL,1116029,233,'wikitext',NULL),(19,2,'Ryo','',0,0,0.306163838276,'20130321080856',NULL,5579720,189,'wikitext',NULL),(20,2,'Romanito','',0,1,0.166625788047,'20130321080857',NULL,33024,204,'wikitext',NULL),(21,0,'manga','',0,0,0.290439131339,'20190502205550','20190502211913',26165847,4675,'wikitext',NULL),(22,2,'Ske','',0,0,0.558367359246,'20180328142044',NULL,6513307,3571,'wikitext',NULL),(23,2,'Alno','',0,0,0.523059309238,'20180819092256','20151017161740',18719282,43,'wikitext',NULL),(24,3,'Alno','',0,0,0.955483742038,'20151017161655','20151017161740',18719286,54,
'wikitext',NULL),(25,3,'Koxinga/2004-2008','',0,0,0.475421285944,'20190502205550','20190502211912',8896871,29862,'wikitext',NULL),(26,0,'ouvrage','',0,0,0.639444139153,'20190502205550','20190502211913',26232093,9635,'wikitext',NULL),(27,2,'Shaihulud','',0,0,0.149461865495,'20130321080858','20160328183603',33031,52,'wikitext',NULL),(28,3,'Tipiac','',0,0,0.106349313216,'20081116000608',NULL,33032,3233,'wikitext',NULL),(29,2,'Yggdras','',0,0,0.793165417757,'20130321080858',NULL,33033,331,'wikitext',NULL),(30,2,'Tipiac','',0,1,0.198124080057,'20170715230435','20170715230557',33034,54,'wikitext',NULL),(31,2,'Guillaumito','',0,0,0.017966994558,'20130624193903',NULL,33035,459,'wikitext',NULL),(32,2,'Koxinga','',0,0,0.131457018262,'20190502205550','20190502211912',6308014,4402,'wikitext',NULL),(35,0,'siège','',0,0,0.786991455959,'20190502205550','20190502211914',26154603,14625,'wikitext',NULL),(39,1,'siège','',0,0,0.988281379914,'20050724092221',NULL,33038,698,'wikitext',NULL),(40,0,'chaise','',0,0,0.372459553821,'20190502205550','20190502211914',26055607,11065,'wikitext',NULL),(41,0,'fauteuil','',0,0,0.838967425674,'20190502205550','20190502211913',26271245,5223,'wikitext',NULL),(42,0,'meuble','',0,0,0.923668533994,'20190502205550','20190502211914',26222182,9187,'wikitext',NULL),(43,0,'armchair','',0,0,0.946918559236,'20190502205550','20190502211913',24800422,501,'wikitext',NULL),(44,0,'mardi','',0,0,0.487491951551,'20190502205550','20190502211915',26263348,8707,'wikitext',NULL),(45,0,'lundi','',0,0,0.290082083684,'20190502205550','20190502211915',26164384,9530,'wikitext',NULL),
...
 Sample excerpt from fr wiktionary page restrictions table  
-- MySQL dump 10.16  Distrib 10.1.26-MariaDB, for debian-linux-gnu (x86_64)
--
-- Host: 10.64.32.116    Database: frwiktionary
-- ------------------------------------------------------
-- Server version	10.1.38-MariaDB

/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
/*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */;
/*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */;
/*!40101 SET NAMES utf8mb4 */;
/*!40103 SET @OLD_TIME_ZONE=@@TIME_ZONE */;
/*!40103 SET TIME_ZONE='+00:00' */;
/*!40014 SET @OLD_UNIQUE_CHECKS=@@UNIQUE_CHECKS, UNIQUE_CHECKS=0 */;
/*!40014 SET @OLD_FOREIGN_KEY_CHECKS=@@FOREIGN_KEY_CHECKS, FOREIGN_KEY_CHECKS=0 */;
/*!40101 SET @OLD_SQL_MODE=@@SQL_MODE, SQL_MODE='NO_AUTO_VALUE_ON_ZERO' */;
/*!40111 SET @OLD_SQL_NOTES=@@SQL_NOTES, SQL_NOTES=0 */;

--
-- Table structure for table `page_restrictions`
--

DROP TABLE IF EXISTS `page_restrictions`;
/*!40101 SET @saved_cs_client     = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE `page_restrictions` (
  `pr_page` int(8) NOT NULL DEFAULT '0',
  `pr_type` varbinary(255) NOT NULL DEFAULT '',
  `pr_level` varbinary(255) NOT NULL DEFAULT '',
  `pr_cascade` tinyint(4) NOT NULL DEFAULT '0',
  `pr_user` int(10) unsigned DEFAULT NULL,
  `pr_expiry` varbinary(14) DEFAULT NULL,
  `pr_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (`pr_page`,`pr_type`),
  UNIQUE KEY `pr_id` (`pr_id`),
  KEY `pr_page` (`pr_page`),
  KEY `pr_typelevel` (`pr_type`,`pr_level`),
  KEY `pr_level` (`pr_level`),
  KEY `pr_cascade` (`pr_cascade`)
) ENGINE=InnoDB AUTO_INCREMENT=12756 DEFAULT CHARSET=binary ROW_FORMAT=COMPRESSED;
/*!40101 SET character_set_client = @saved_cs_client */;

--
-- Dumping data for table `page_restrictions`
--

/*!40000 ALTER TABLE `page_restrictions` DISABLE KEYS */;
INSERT INTO `page_restrictions` VALUES (0,'edit','sysop',0,NULL,'infinity',1766),(0,'move','sysop',0,NULL,'infinity',1767),(339,'edit','autoconfirmed',0,NULL,'infinity',9925),(339,'move','sysop',0,NULL,'infinity',9926),(819,'edit','autoconfirmed',0,NULL,'infinity',9929),(819,'move','sysop',0,NULL,'infinity',9930),(843,'edit','autoconfirmed',0,NULL,'infinity',11443),(843,'move','autoconfirmed',0,NULL,'infinity',11444),(844,'edit','autoconfirmed',0,NULL,'infinity',11333),(844,'move','autoconfirmed',0,NULL,'infinity',11334),(1866,'edit','autoconfirmed',0,NULL,'infinity',303),(1866,'move','autoconfirmed',0,NULL,'infinity',304),(1868,'edit','autoconfirmed',0,NULL,'infinity',1628),(1868,'move','sysop',0,NULL,'infinity',1629),(2131,'edit','sysop',0,NULL,'infinity',1948),(2131,'move','sysop',0,NULL,'infinity',1949),(2280,'edit','sysop',0,NULL,'infinity',12666),(2280,'move','sysop',0,NULL,'infinity',12667),(2377,'move','autoconfirmed',0,NULL,'infinity',11147),(2400,'edit','autoconfirmed',0,NULL,'infinity',11166),(2400,'move','autoconfirmed',0,NULL,'infinity',11167),(2448,'edit','autoconfirmed',0,NULL,'infinity',886),(2448,'move','autoconfirmed',0,NULL,'infinity',887),(2483,'edit','autoconfirmed',0,NULL,'infinity',11168),(2483,'move','autoconfirmed',0,NULL,'infinity',11169),(2551,'edit','autoconfirmed',0,NULL,'infinity',11353),(2551,'move','autoconfirmed',0,NULL,'infinity',11354),(2552,'edit','autoconfirmed',0,NULL,'infinity',11337),(2552,'move','autoconfirmed',0,NULL,'infinity',11338),(2553,'edit','autoconfirmed',0,NULL,'infinity',10226),(2553,'move','autoconfirmed',0,NULL,'infinity',10227),(2554,'edit','autoconfirmed',0,NULL,'infinity',11319),(2554,'move','autoconfirmed',0,NULL,'infinity',11320),(2559,'edit','autoconfirmed',0,NULL,'infinity',11453),(2559,'move','autoconfirmed',0,NULL,'infinity',11454),(2560,'edit','autoconfirmed',0,NULL,'infinity',11427),(2560,'move','autoconfirmed',0,NULL,'infinity',11428),(2561,'edit','autoconfirmed',0,NULL,'infinity',10366),(2561,'move','autoconfirmed',0,NULL,'infinity',10367),(2562,'edit','autoconfirmed',0,NULL,'infinity',11423),(2562,'move','autoconfirmed',0,NULL,'infinity',11424),(2563,'edit','autoconfirmed',0,NULL,'infinity',11419),(2563,'move','autoconfirmed',0,NULL,'infinity',11420),(2564,'edit','autoconfirmed',0,NULL,'infinity',11433),
...

Format of the XML filesEdit

The main page data is provided in the same XML wrapper format that Special:Export produces for individual pages. It's fairly self-explanatory to look at, but there is some documentation at Help:Export#Export_format.

Three sets of page data are produced for each dump, depending on what you need.

The XML content files contains complete, raw text of some or all revisions, so in particular the full history files can be extremely large. Currently we are compressing the XML content files with bzip2 or lbzip2 (.bz2 files) and additionally for the full history dump 7-Zip (.7z files).

Both stub and content files contain a header which includes a link to the xml schema, the name of the wiki project, the version of MediaWiki which produced the dump, and the number and name of all of the namespaces on the wiki. This last is useful because namespaces are included for each dumped page only by namespace number.

 Sample header from frwiktionary stub dump  
<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.mediawiki.org/xml/export-0.10/ http://www.mediawiki.org/xml/export-0.10.xsd" version="0.10" xml:lang="fr">
  <siteinfo>
    <sitename>Wiktionnaire</sitename>
    <dbname>frwiktionary</dbname>
    <base>https://fr.wiktionary.org/wiki/Wiktionnaire:Page_d%E2%80%99accueil</base>
    <generator>MediaWiki 1.34.0-wmf.3</generator>
    <case>case-sensitive</case>
    <namespaces>
      <namespace key="-2" case="case-sensitive">Média</namespace>
      <namespace key="-1" case="first-letter">Spécial</namespace>
      <namespace key="0" case="case-sensitive" />
      <namespace key="1" case="case-sensitive">Discussion</namespace>
      <namespace key="2" case="first-letter">Utilisateur</namespace>
      <namespace key="3" case="first-letter">Discussion utilisateur</namespace>
      <namespace key="4" case="case-sensitive">Wiktionnaire</namespace>
      <namespace key="5" case="case-sensitive">Discussion Wiktionnaire</namespace>
      <namespace key="6" case="case-sensitive">Fichier</namespace>
      <namespace key="7" case="case-sensitive">Discussion fichier</namespace>
      <namespace key="8" case="first-letter">MediaWiki</namespace>
      <namespace key="9" case="first-letter">Discussion MediaWiki</namespace>
      <namespace key="10" case="case-sensitive">Modèle</namespace>
      <namespace key="11" case="case-sensitive">Discussion modèle</namespace>
      <namespace key="12" case="case-sensitive">Aide</namespace>
      <namespace key="13" case="case-sensitive">Discussion aide</namespace>
      <namespace key="14" case="case-sensitive">Catégorie</namespace>
      <namespace key="15" case="case-sensitive">Discussion catégorie</namespace>
      <namespace key="100" case="case-sensitive">Annexe</namespace>
      <namespace key="101" case="case-sensitive">Discussion Annexe</namespace>
      <namespace key="102" case="case-sensitive">Transwiki</namespace>
      <namespace key="103" case="case-sensitive">Discussion Transwiki</namespace>
      <namespace key="104" case="case-sensitive">Portail</namespace>
      <namespace key="105" case="case-sensitive">Discussion Portail</namespace>
      <namespace key="106" case="case-sensitive">Thésaurus</namespace>
      <namespace key="107" case="case-sensitive">Discussion Thésaurus</namespace>
      <namespace key="108" case="case-sensitive">Projet</namespace>
      <namespace key="109" case="case-sensitive">Discussion Projet</namespace>
      <namespace key="110" case="case-sensitive">Reconstruction</namespace>
      <namespace key="111" case="case-sensitive">Discussion Reconstruction</namespace>
      <namespace key="828" case="case-sensitive">Module</namespace>
      <namespace key="829" case="case-sensitive">Discussion module</namespace>
      <namespace key="1198" case="case-sensitive">Translations</namespace>
      <namespace key="1199" case="case-sensitive">Translations talk</namespace>
      <namespace key="2300" case="case-sensitive">Gadget</namespace>
      <namespace key="2301" case="case-sensitive">Discussion gadget</namespace>
      <namespace key="2302" case="case-sensitive">Définition de gadget</namespace>
      <namespace key="2303" case="case-sensitive">Discussion définition de gadget</namespace>
      <namespace key="2600" case="first-letter">Sujet</namespace>
    </namespaces>
  </siteinfo>
 Sample excerpt from frwiktionary stub dump  
  <page>
    <title>accueil</title>
    <ns>0</ns>
    <id>1</id>
    <revision>
      <id>2</id>
      <timestamp>2004-03-22T10:24:41Z</timestamp>
      <contributor>
        <username>Brion VIBBER</username>
        <id>1</id>
      </contributor>
      <comment>salut</comment>
      <model>wikitext</model>
      <format>text/x-wiki</format>
      <text id="2" bytes="25" />
      <sha1>9xee2aots6sdcilmnrjg2vozfwd26yr</sha1>
    </revision>
    <revision>
      <id>9</id>
      <parentid>2</parentid>
      <timestamp>2004-03-22T10:56:13Z</timestamp>
      <contributor>
        <username>Ryo</username>
        <id>2</id>
      </contributor>
      <minor/>
      <comment>intro</comment>
      <model>wikitext</model>
      <format>text/x-wiki</format>
      <text id="9" bytes="363" />
      <sha1>a6zcfqmiwzlirwlhm822hkz44s3xstj</sha1>
    </revision>
...
</page>
 Sample excerpt from frwiktionary content dump  
  <page>
    <title>accueil</title>
    <ns>0</ns>
    <id>1</id>
    <revision>
      <id>2</id>
      <timestamp>2004-03-22T10:24:41Z</timestamp>
      <contributor>
        <username>Brion VIBBER</username>
        <id>1</id>
      </contributor>
      <comment>salut</comment>
      <model>wikitext</model>
      <format>text/x-wiki</format>
      <text xml:space="preserve">C'est un [[dictionnaire]]</text>
      <sha1>9xee2aots6sdcilmnrjg2vozfwd26yr</sha1>
    </revision>
    <revision>
      <id>9</id>
      <parentid>2</parentid>
      <timestamp>2004-03-22T10:56:13Z</timestamp>
      <contributor>
        <username>Ryo</username>
        <id>2</id>
      </contributor>
      <minor />
      <comment>intro</comment>
      <model>wikitext</model>
      <format>text/x-wiki</format>
      <text xml:space="preserve">Bienvenue sur Wiktionary, un projet de [[dictionnaire]] écrit collectivement.

Tout un chacun est libre de modifier n'importe quelle page, sans aucune nécessité de s'enregistrer.


La version française vient de démarrer, donc ne contient encore que peu d'éléments.


Cliquez [http://wiktionary.org/wiki/Main_Page ici] pour voir la version anglaise.</text>
      <sha1>a6zcfqmiwzlirwlhm822hkz44s3xstj</sha1>
    </revision>
...
</page>
]

UnicodeEdit

The dumps may contain non-Unicode (UTF8) characters in older text revisions due to lenient charset validation in the earlier MediaWiki releases (2004 or so). For instance, zhwiki-20130102-langlinks.sql.gz contained some copy and pasted iso8859-1 "ö" characters; as the langlinks table is generated on parsing, a null edit or forcelinkupdate to the page was enough to fix it.

Adds/Changes dumpsEdit

These dumps contain 5 files per wiki:

  • status -- if the run completes successfully for that wiki, the contents should be 'done:all'
  • maxrevid -- the latest revision id in the database at the time the dump is run, minus a configured number of hours, so that we don't dump revisions so recent no one's vetted them
  • md5sums -- contains the md5sum of the two files with dump content
  • stubs-meta-hist -- contains revision metadata for all new revisions since the previous dump (as computed from the previous run's maxrevid); format is like regular XML stubs files
  • pages-meta-hist -- contains some revision metadata and all revision content of new revisions since the previous dump; format is like regular XML content files

For more on the format of the stubs and the revision content files, see the above section 'format of the xml files'.