Data Analysis/Quirks


We use this page to collect anomalies and quirks in our data that result from past mistakes or changes. It is aimed primarily at people who use Wikipedia data and gives them pointers on how to handle certain edge cases. Some of these anomalies have been reported in Bugzilla.

Data Source: XML dump files

  • Bug 27773 - Length of dump text and length field in API do not match.
  • Bug 27774 - Username to user_id match is inconsistent in revisions of dump.
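To see whether a given dump is affected by the username/user_id inconsistency, one option is to scan the contributor entries and list every username that appears with more than one id. The following is a minimal sketch, not an official tool: it assumes the usual MediaWiki export layout in which each revision has a contributor element with username and id children, and the function names and the SAMPLE data are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET
from collections import defaultdict


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def inconsistent_usernames(source):
    """Return {username: set_of_ids} for usernames seen with more than one id.

    `source` is a filename or a binary file object holding the XML dump.
    """
    ids_by_name = defaultdict(set)
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "contributor":
            continue
        fields = {local_name(child.tag): (child.text or "").strip() for child in elem}
        # Anonymous edits carry <ip> instead of <username>/<id>; skip them.
        if "username" in fields and "id" in fields:
            ids_by_name[fields["username"]].add(fields["id"])
        elem.clear()  # free memory; real dumps are large
    return {name: ids for name, ids in ids_by_name.items() if len(ids) > 1}


# Hypothetical sample data, heavily simplified compared to a real dump.
SAMPLE = b"""<mediawiki>
  <page>
    <revision><contributor><username>Alice</username><id>1</id></contributor></revision>
    <revision><contributor><username>Alice</username><id>7</id></contributor></revision>
    <revision><contributor><username>Bob</username><id>2</id></contributor></revision>
    <revision><contributor><ip>127.0.0.1</ip></contributor></revision>
  </page>
</mediawiki>"""

print(inconsistent_usernames(io.BytesIO(SAMPLE)))
```

Usernames reported by such a check are candidates for closer inspection. A username that maps to several ids may also reflect a legitimate rename, so the result should not be read as proof of an error.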

Data Source: Database