Research:Monthly wikimedia editor activity dataset

This page documents a completed research project.


The page documents a dataset that describes the activity levels of registered editors across Wikimedia's projects. This dataset contains a row for every (wiki, user, month) that contains a count of all 'revisions' saved and a count of those revisions that were 'archived' when the page was deleted.

Halfaker, Aaron (2015): Wikimedia editor activity (monthly). figshare.
dx.doi.org/10.6084/m9.figshare.1553296

Methods edit

The Wikimedia internal database replicas were queried. Two queries were deployed against all wikis and then joined. The last complete month in the dataset is June, 2015.

Editor months
SELECT
    wiki,
    month,
    user_id,
    user_name,
    user_registration,
    SUM(revisions * archived) AS archived,
    SUM(revisions) AS revisions
FROM (
    SELECT
        LEFT(rev_timestamp, 6) AS month,
        DATABASE() AS wiki,
        rev_user AS user_id,
        FALSE AS archived,
        COUNT(*) AS revisions
    FROM revision
    WHERE rev_timestamp >= "201503"
    GROUP BY LEFT(rev_timestamp, 6), rev_user

    UNION ALL

    SELECT
        LEFT(ar_timestamp, 6) AS month,
        DATABASE() AS wiki,
        ar_user AS user_id,
        TRUE AS archived,
        COUNT(*) AS revisions
    FROM archive
    WHERE ar_timestamp >= "201503"
    GROUP BY LEFT(ar_timestamp, 6), ar_user
) AS editor_months
INNER JOIN user USING (user_id)
GROUP BY wiki, month, user_id
ORDER BY wiki, month;
Local user info (attached_method)
SELECT
    DATABASE() AS wiki,
    user_id AS user_id,
    user_registration AS user_registration,
    gu_id AS globaluser_id,
    lu_attached_timestamp AS user_attached,
    lu_attached_method AS attached_method
FROM user
LEFT JOIN centralauth.localuser ON 
    lu_wiki = DATABASE() AND
    lu_name = user_name
LEFT JOIN centralauth.globaluser ON
    gu_name = lu_name
GROUP BY user_id;

Sample edit

  • wiki -- The dbname of the wiki in question ("enwiki" == English Wikipedia, "commonswiki" == Commons)
  • month -- YYYYMM
  • user_id -- The user's identifier in the local wiki
  • user_name -- The user name in the local wiki (from the 'user' table)
  • user_registration -- The recorded registration date for the user in the 'user' table
  • archived -- The count of deleted revisions saved in this month by this user
  • revisions -- The count of all revisions saved in this month by this user (archived or not)
  • attached_method -- The method by which this user attached this account to their global account
wiki month user_id user_name user_registration archived revisions attached_method
aawiki 200404 3 Angela NULL 2 2 password
aawiki 200406 2 Tim Starling NULL 10 10 password
aawiki 200408 2 Tim Starling NULL 2 2 password
aawiki 200409 14 Davidcannon NULL 2 2 primary
aawiki 200412 2 Tim Starling NULL 12 12 password
aawiki 200501 21 Rich Farmbrough NULL 16 16 password
aawiki 200502 23 ` NULL 2 2 primary
aawiki 200508 62 Afar god NULL 6 10 primary
...
enwiki 200101 9161929 KlausSeistrup 20090308075444 0 1 new
enwiki 200101 9167505 ALittleLuck 20090309011538 0 1 new
enwiki 200101 10164500 IvoryRing 20090722031343 0 1 new
enwiki 200101 10164531 StasK 20090722031938 1 2 primary
enwiki 200101 10164597 ChessyPig 20090722033150 0 1 new
enwiki 200101 11327310 Dhcp058.246.lvcm.com 20091230114723 2 3 new

Note that some users who registered their accounts before 2006 may have strange 'user_registration' dates because that field was not tracked in a reasonable way until 2006. See bugzilla:22097.

Summary analysis edit

Coming soon

References edit