Research:Wiki archaeology

This page documents historical artifacts that affect data analysis of Wikimedia projects.

English Wikipedia

Anomalies

  • New users appear in logging table as log_type="newusers" AND log_action="newusers" between 20050907221649 and 20060421232859.
  • The archive table only contains the rev_ids for revisions to pages deleted after mid-2008. From mw:Manual:Archive_table: "Beginning in MediaWiki 1.5, the content of the pages remains in the text table; the deletion time is logged in the logging table."
  • On April 10, 2008, Brion Vibber changed Wikimedia's MediaWiki configuration code to hard-code a block on deleting the Main page, for English Wikipedia only. See line #1461 of CommonSettings.php.
  • From 2005-11-23 22:43:42 to 2012-02-29 23:01:40 information about a page move can be extracted from rev_comment with the following regex match:
WHERE rev_comment RLIKE 'moved \\[\\[([^\]]+)\\]\\] to \\[\\[([^\]]+)\\]\\] (over redirect)?:.*'
  • From 2012-02-29 23:01:40 to ??? information about a page move can be extracted from rev_comment with the following regex match:
WHERE rev_comment RLIKE '.*moved .*\\[\\[([^\]]+)\\]\\] to \\[\\[([^\]]+)\\]\\].*:.*'
  • The logging table has duplicate account-creation records for 165 users: one with a log_timestamp from 2006, and the rest with a log_timestamp starting with 20080514. The duplicates differ only in log_id. You can find the log_user for duplicate records using:
SELECT COUNT(*) AS no_dup, log_user
FROM enwiki.logging
WHERE
  log_action = "create" AND
  log_type = "newusers"
GROUP BY log_user
HAVING COUNT(*) > 1
ORDER BY COUNT(*) DESC;
  • user_registration (the user's registration date) was backfilled for some users who registered before the field was added (using their first edit timestamp), but not in a consistent way (apparently only for English Wikipedia). See bugzilla:22097#c0.
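
The two page-move patterns in the list above can also be applied outside the database, for example when processing a dump. The following is a minimal Python sketch, not a validated parser. The regexes are direct ports of the two RLIKE expressions. The function name parse_move_comment and the sample comments in the usage note are hypothetical. The sketch assumes rev_timestamp is a 14-digit MediaWiki timestamp and treats the era boundaries as UTC. A revision stamped exactly at the 2012 boundary is tried against both patterns, because both ranges above include that instant.

```python
import re
from datetime import datetime

# Era boundaries copied from the list above (assumed to be UTC).
OLD_FORMAT_START = datetime(2005, 11, 23, 22, 43, 42)
NEW_FORMAT_START = datetime(2012, 2, 29, 23, 1, 40)

# Python ports of the two RLIKE patterns. They are used with re.search,
# which is unanchored like RLIKE.
OLD_MOVE = re.compile(
    r"moved \[\[([^\]]+)\]\] to \[\[([^\]]+)\]\] (over redirect)?:.*"
)
NEW_MOVE = re.compile(
    r".*moved .*\[\[([^\]]+)\]\] to \[\[([^\]]+)\]\].*:.*"
)


def parse_move_comment(rev_timestamp, rev_comment):
    """Return (old_title, new_title) for a page-move comment, or None.

    rev_timestamp is assumed to be a 14-digit string (YYYYMMDDHHMMSS).
    """
    ts = datetime.strptime(rev_timestamp, "%Y%m%d%H%M%S")

    # Select the pattern(s) that apply to this revision's era.
    patterns = []
    if OLD_FORMAT_START <= ts <= NEW_FORMAT_START:
        patterns.append(OLD_MOVE)
    if ts >= NEW_FORMAT_START:
        patterns.append(NEW_MOVE)

    for pattern in patterns:
        match = pattern.search(rev_comment)
        if match:
            return match.group(1), match.group(2)
    return None
```

With made-up comments, parse_move_comment("20080101000000", "moved [[Foo]] to [[Bar]] over redirect: fix title") and parse_move_comment("20130101000000", "Example moved page [[Foo]] to [[Bar]]: typo") both yield the pair ("Foo", "Bar"). Any revision dated before 2005-11-23 22:43:42 yields None, since neither pattern is documented for that period.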

See also