Research:Daily unique page creators

Daily unique page creators
Specification
A daily unique page creator is a user who completed at least n page creations across all namespaces on a given date.
WMF Standard
  • n = 1 page creation
Status
completed
SQL
SET @date = "20140101";
SET @n = 1;

SELECT
    COUNT(*) AS page_creators
FROM (
    SELECT
        rev_user,
        rev_user_text,
        COUNT(*) AS page_creations
    FROM (
        SELECT
            rev_user,
            rev_user_text
        FROM
            revision
        WHERE
            -- Half-open range: BETWEEN would also match midnight of the next day
            rev_timestamp >= @date AND
            rev_timestamp < DATE_FORMAT(DATE_ADD(@date, INTERVAL 1 DAY), "%Y%m%d%H%i%S") AND
            rev_parent_id = 0
        UNION ALL
        SELECT
            ar_user AS rev_user,
            ar_user_text AS rev_user_text
        FROM
            archive
        WHERE
            -- Half-open range: BETWEEN would also match midnight of the next day
            ar_timestamp >= @date AND
            ar_timestamp < DATE_FORMAT(DATE_ADD(@date, INTERVAL 1 DAY), "%Y%m%d%H%i%S") AND
            ar_parent_id = 0
    ) page_creations
    GROUP BY
        rev_user,
        rev_user_text
) page_creator
WHERE page_creations >= @n;

Daily unique page creators is a standardized metric used to measure the number of users who create new pages on a wiki in a given day. It's used as a proxy for editing population size.

Discussion

Identifying page creations

Unfortunately, MediaWiki does not track a history of page creation events. The rev_parent_id field, however, allows a close approximation. rev_parent_id usually points to the previous revision of the page; for the first revision of a page, rev_parent_id = 0. This metric therefore counts these parentless revisions as page creations, looking in both the revision table and the archive table (which holds revisions of deleted pages).
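The parentless-revision approach can be illustrated with a minimal sketch on toy data. This is not the production query: it assumes an in-memory SQLite database with a simplified revision table, leaves out the archive table for brevity, and the function name count_page_creators and the sample rows are hypothetical.

```python
import sqlite3
from datetime import datetime, timedelta


def count_page_creators(conn, day, n=1):
    """Count users with at least n parentless revisions on `day` (YYYYMMDD)."""
    start = datetime.strptime(day, "%Y%m%d")
    end = start + timedelta(days=1)
    # MediaWiki-style 14-character timestamps, compared as strings.
    lo = start.strftime("%Y%m%d%H%M%S")
    hi = end.strftime("%Y%m%d%H%M%S")
    row = conn.execute(
        """
        SELECT COUNT(*) FROM (
            SELECT rev_user, rev_user_text
            FROM revision
            WHERE rev_timestamp >= ? AND rev_timestamp < ?
              AND rev_parent_id = 0          -- first revision of a page
            GROUP BY rev_user, rev_user_text -- one row per unique creator
            HAVING COUNT(*) >= ?
        )
        """,
        (lo, hi, n),
    ).fetchone()
    return row[0]


# Toy data (hypothetical): a simplified stand-in for the revision table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE revision ("
    "rev_id INTEGER PRIMARY KEY, rev_user INTEGER, rev_user_text TEXT, "
    "rev_timestamp TEXT, rev_parent_id INTEGER)"
)
conn.executemany(
    "INSERT INTO revision VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, "Alice", "20140101080000", 0),     # page creation
        (2, 10, "Alice", "20140101090000", 1),     # follow-up edit, not a creation
        (3, 10, "Alice", "20140101100000", 0),     # second creation by Alice
        (4, 11, "Bob", "20140101110000", 3),       # edit only
        (5, 0, "192.0.2.1", "20140101120000", 0),  # anonymous creation
        (6, 12, "Carol", "20140102000000", 0),     # belongs to the next day
    ],
)

print(count_page_creators(conn, "20140101"))       # 2 (Alice and the anonymous user)
print(count_page_creators(conn, "20140101", n=2))  # 1 (only Alice created two pages)
```

Grouping on both rev_user and rev_user_text mirrors the query above: anonymous editors share rev_user = 0, so the user text (their IP address) is what keeps them distinct.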

Time lag

Because this is a daily metric, a full 24 hours must elapse after the beginning of the date (UTC) before an uncensored (complete) value can be calculated.
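A minimal sketch of that check, assuming dates are given as YYYYMMDD strings interpreted in UTC; the helper name is_day_complete is hypothetical and not part of the metric's definition.

```python
from datetime import datetime, timedelta, timezone


def is_day_complete(day, now=None):
    """Return True once the full UTC day `day` (YYYYMMDD) has elapsed."""
    if now is None:
        now = datetime.now(timezone.utc)
    start = datetime.strptime(day, "%Y%m%d").replace(tzinfo=timezone.utc)
    # The value is uncensored only after the whole 24-hour window has passed.
    return now >= start + timedelta(days=1)


late_evening = datetime(2014, 1, 1, 23, 59, 59, tzinfo=timezone.utc)
next_midnight = datetime(2014, 1, 2, 0, 0, 0, tzinfo=timezone.utc)
print(is_day_complete("20140101", late_evening))   # False
print(is_day_complete("20140101", next_midnight))  # True
```

A caller would compute the metric for a date only when this check returns True; any value calculated earlier may undercount.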

Analysis

Discussion

Notes