Research:Wiki archaeology
This page is currently a draft. More information pertaining to this may be available on the talk page.
This page documents historical artifacts that affect data analysis of Wikimedia projects.
English Wikipedia
Anomalies
- New users appear in the logging table as log_type="newusers" AND log_action="newusers" (rather than log_action="create") between 20050907221649 and 20060421232859.
- The archive table only contains the rev_ids for revisions to pages deleted after mid-2008. "Beginning in MediaWiki 1.5, the content of the pages remains in the text table; the deletion time is logged in the logging table." See mw:Manual:Archive_table.
- On April 10th, 2008, Brion Vibber introduced a change to Wikimedia's MediaWiki configuration code that hard-codes a block on deleting the Main Page, for English Wikipedia only. See line #1461 of CommonSettings.php.
- From 2005-11-23 22:43:42 to 2012-02-29 23:01:40, information about a page move can be extracted from rev_comment with the following regex match:
WHERE rev_comment RLIKE 'moved \\[\\[([^\]]+)\\]\\] to \\[\\[([^\]]+)\\]\\]( over redirect)?:.*'
- From 2012-02-29 23:01:40 to ???, information about a page move can be extracted from rev_comment with the following regex match:
WHERE rev_comment RLIKE '.*moved .*\\[\\[([^\]]+)\\]\\] to \\[\\[([^\]]+)\\]\\].*:.*'
- The logging table has duplicate account-creation records for 165 users: one copy with a log_timestamp from 2006 and the rest with a log_timestamp beginning with 20080514. The duplicates differ only in log_id. The log_user values that have duplicate records can be found using:
SELECT COUNT(*) AS no_dup, log_user
FROM enwiki.logging
WHERE
log_action="create" AND
log_type="newusers"
GROUP BY log_user
-- keep only users with more than one account-creation entry
HAVING COUNT(*) > 1
ORDER BY COUNT(*) DESC;
- user_registration (the user's registration date) was backfilled, using the first edit timestamp, for some users who registered before the field was added, but not consistently (apparently only on English Wikipedia). See bugzilla:22097#c0.
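The account-creation anomaly noted above (log_action="newusers" between 20050907221649 and 20060421232859) can be handled in analysis code by accepting that action only inside the documented window. The following is a minimal Python sketch, not an established tool: the function is_account_creation and the constant names are hypothetical, it assumes timestamps are 14-digit MediaWiki strings (YYYYMMDDHHMMSS, so plain string comparison orders them correctly), and it deliberately ignores any other log_action values under log_type="newusers".

```python
# Window in which English Wikipedia logged account creations with
# log_action="newusers" (timestamps from the note above).
# Hypothetical constant names; 14-digit MediaWiki timestamps compare
# correctly as strings.
NEWUSERS_ACTION_START = "20050907221649"
NEWUSERS_ACTION_END = "20060421232859"


def is_account_creation(log_type: str, log_action: str, log_timestamp: str) -> bool:
    """Return True if a logging row looks like an account creation.

    Hypothetical helper: log_action="create" is treated as the normal form,
    and log_action="newusers" is accepted only inside the anomalous window.
    Other log_action values are not handled.
    """
    if log_type != "newusers":
        return False
    if log_action == "create":
        return True
    if log_action == "newusers":
        return NEWUSERS_ACTION_START <= log_timestamp <= NEWUSERS_ACTION_END
    return False
```

A filter like this keeps counts of new accounts per month from showing an artificial gap between September 2005 and April 2006.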
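The two rev_comment patterns for page moves noted above can also be applied outside the database, for example to revision metadata read from a dump. This Python sketch translates the RLIKE patterns into re syntax and chooses one by rev_timestamp; parse_move and the constant names are hypothetical. Because RLIKE is an unanchored match, the sketch uses re.search. Three assumptions are worth flagging: the optional " over redirect" group includes its leading space, so that comments without a redirect (such as "moved [[A]] to [[B]]: reason") also match; a revision timestamped exactly 20120229230140 is treated as using the newer format; and revisions before 20051123224342 return None, since no pattern is documented for them.

```python
import re
from typing import Optional, Tuple

# Era boundaries from the notes above, as 14-digit MediaWiki timestamps.
OLD_FORMAT_START = "20051123224342"
NEW_FORMAT_START = "20120229230140"

# Python translations of the RLIKE patterns. Assumption: the space before
# "over redirect" belongs inside the optional group.
OLD_MOVE_RE = re.compile(
    r"moved \[\[([^\]]+)\]\] to \[\[([^\]]+)\]\]( over redirect)?:.*"
)
NEW_MOVE_RE = re.compile(r".*moved .*\[\[([^\]]+)\]\] to \[\[([^\]]+)\]\].*:.*")


def parse_move(rev_comment: str, rev_timestamp: str) -> Optional[Tuple[str, str]]:
    """Return (source title, target title) if rev_comment records a page move."""
    if rev_timestamp < OLD_FORMAT_START:
        return None  # no pattern is documented for this period
    pattern = OLD_MOVE_RE if rev_timestamp < NEW_FORMAT_START else NEW_MOVE_RE
    match = pattern.search(rev_comment)  # unanchored, like RLIKE
    return (match.group(1), match.group(2)) if match else None
```

Both patterns require a ":" after the target title, so, like the SQL versions, they skip move comments that carry no separator.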
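If the duplicate account-creation rows in the logging table need to be collapsed rather than only counted, one option is to keep a single row per log_user. The Python sketch below assumes the earliest log_timestamp (the 2006 copy) is the original record and the 20080514 copies are duplicates; that is a judgment call, not something the table records. CreationRow and dedupe_account_creations are hypothetical names, and the input rows are assumed to be already restricted to log_type="newusers" and log_action="create".

```python
from typing import Dict, Iterable, List, NamedTuple


class CreationRow(NamedTuple):
    """Subset of logging columns needed here (hypothetical row type)."""
    log_id: int
    log_user: int
    log_timestamp: str  # 14-digit MediaWiki timestamp


def dedupe_account_creations(rows: Iterable[CreationRow]) -> List[CreationRow]:
    """Keep one account-creation row per log_user.

    Assumption: the earliest log_timestamp (ties broken by the lowest
    log_id) is the original record; later copies are treated as duplicates.
    """
    kept: Dict[int, CreationRow] = {}
    for row in rows:
        current = kept.get(row.log_user)
        if current is None or (row.log_timestamp, row.log_id) < (
            current.log_timestamp,
            current.log_id,
        ):
            kept[row.log_user] = row
    # Return in log_id order so the output is deterministic.
    return sorted(kept.values(), key=lambda r: r.log_id)
```

Since the duplicates differ only in log_id, any consistent tie-breaking rule yields the same user counts; the choice only matters when a row's log_id or log_timestamp is used downstream.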