Research:Daily unique anonymous editors

Daily unique anonymous editors
Specification
A daily unique anonymous editor is an unregistered user who completed at least n edits on a given date via the same IP address.
WMF Standard
  • n = 1 edit
Status
completed
SQL
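SET @date = '20140101';  -- day to measure, as YYYYMMDD (UTC)
SET @n = 1;              -- minimum number of edits per IP address

SELECT 
    COUNT(*) 
FROM (
    SELECT
        rev_user_text,
        SUM(revisions) AS revisions
    FROM (
        -- Anonymous edits to pages that currently exist
        SELECT
            rev_user_text,
            COUNT(*) AS revisions
        FROM revision
        WHERE
            -- Half-open range, so midnight of the following day is excluded
            rev_timestamp >= @date AND
            rev_timestamp < DATE_FORMAT(DATE_ADD(@date, INTERVAL 1 DAY), '%Y%m%d%H%i%S') AND
            rev_user = 0 
        GROUP BY 1
        -- UNION ALL keeps both rows when an IP has the same count in each table
        UNION ALL
        -- Anonymous edits to pages that have since been deleted
        SELECT
            ar_user_text AS rev_user_text,
            COUNT(*) AS revisions
        FROM archive
        WHERE
            ar_timestamp >= @date AND
            ar_timestamp < DATE_FORMAT(DATE_ADD(@date, INTERVAL 1 DAY), '%Y%m%d%H%i%S') AND
            ar_user = 0 
        GROUP BY 1
    ) AS user_revisions
    GROUP BY 1
) AS editors
WHERE revisions >= @n;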

Daily unique anonymous editors is a standardized metric used to measure the number of logged-out editors who save edits to a wiki on a given day. It's used as a proxy for editing population size.
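
The counting rule can be sketched outside the database as follows. This is a minimal illustration in Python, not the production query: the function name `daily_unique_anonymous_editors`, the tuple layout, and the sample records are all assumptions made for the example.

```python
from collections import Counter
from datetime import date


def daily_unique_anonymous_editors(edits, day, n=1):
    """Count distinct IPs with at least n anonymous edits on `day` (UTC).

    `edits` is an iterable of (user_text, utc_date, is_anonymous) tuples,
    a hypothetical layout chosen for this sketch.
    """
    per_ip = Counter(
        user_text
        for user_text, edit_day, is_anonymous in edits
        if is_anonymous and edit_day == day
    )
    return sum(1 for count in per_ip.values() if count >= n)


# Hypothetical sample data using documentation-reserved IP ranges.
sample_edits = [
    ("192.0.2.1", date(2014, 1, 1), True),
    ("192.0.2.1", date(2014, 1, 1), True),
    ("198.51.100.7", date(2014, 1, 1), True),
    ("ExampleUser", date(2014, 1, 1), False),  # registered, so ignored
    ("203.0.113.5", date(2014, 1, 2), True),   # different day, so ignored
]

print(daily_unique_anonymous_editors(sample_edits, date(2014, 1, 1)))       # 2
print(daily_unique_anonymous_editors(sample_edits, date(2014, 1, 1), n=2))  # 1
```

With the standard threshold of one edit, two IP addresses qualify on 1 January 2014; raising the threshold to two edits leaves only the address that edited twice.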

Discussion

Using IP as an identifier

The current metric counts the IP addresses seen within the specified period as a proxy for distinct anonymous editors. A unique IP address doesn't necessarily identify a unique user: addresses rotate, a single address can be shared among multiple editors, and edits can arrive through proxies.
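
The two directions of error can be shown with a small sketch. The data is entirely hypothetical (true identities of anonymous editors are not available in practice), and the helper name `count_people_and_ips` is an assumption made for the example.

```python
def count_people_and_ips(edits):
    """Return (distinct people, distinct IPs) for (person, ip) edit records."""
    people = {person for person, _ in edits}
    ips = {ip for _, ip in edits}
    return len(people), len(ips)


# Hypothetical scenarios: two people behind one shared address, and one
# person whose address changes during the day.
shared_ip = [("alice", "192.0.2.1"), ("bob", "192.0.2.1")]
rotating_ip = [("carol", "198.51.100.7"), ("carol", "198.51.100.8")]

print(count_people_and_ips(shared_ip))    # (2, 1): the IP count is too low
print(count_people_and_ips(rotating_ip))  # (1, 2): the IP count is too high
```

Because the errors run in both directions, the metric is best read as an approximation of population size rather than an exact headcount.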

Time lag

Because this is a daily metric, the full 24 hours of the date (UTC) must have elapsed before a complete (uncensored) value can be calculated.
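
A job that computes the metric could guard against partial days with a check along these lines. This is a sketch under the assumption that dates are handled as Python `date` objects in UTC; `is_day_complete` is a hypothetical helper, not part of any existing tooling.

```python
from datetime import date, datetime, time, timedelta, timezone


def is_day_complete(day, now=None):
    """Return True once the whole UTC day `day` has elapsed."""
    if now is None:
        now = datetime.now(timezone.utc)
    # The day is complete at 00:00:00 UTC of the following day.
    end_of_day = datetime.combine(
        day + timedelta(days=1), time.min, tzinfo=timezone.utc
    )
    return now >= end_of_day


late_evening = datetime(2014, 1, 1, 23, 59, 59, tzinfo=timezone.utc)
next_midnight = datetime(2014, 1, 2, 0, 0, 0, tzinfo=timezone.utc)

print(is_day_complete(date(2014, 1, 1), now=late_evening))   # False
print(is_day_complete(date(2014, 1, 1), now=next_midnight))  # True
```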

Edits on deleted pages

This metric includes edits to existing pages as well as to pages that have been or will later be deleted. This makes the metric stateless: historical values will not change in the future depending on the status of a page (existing, deleted, or moved) at the time the metric is computed. Deletion-related activity is tracked via a separate set of metrics.
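
The stateless property can be illustrated with a small sketch that treats live and archived edits as two lists of IP addresses. The function name `count_editors` and the data are assumptions for the example; only the idea of summing per-IP edits across both sources comes from the definition above.

```python
def count_editors(revision, archive, n=1):
    """Count IPs with at least n edits across live and archived revisions."""
    totals = {}
    for ip in list(revision) + list(archive):
        totals[ip] = totals.get(ip, 0) + 1
    return sum(1 for count in totals.values() if count >= n)


# Hypothetical day: one IP edits twice, another edits once.
before = count_editors(revision=["192.0.2.1", "192.0.2.1", "198.51.100.7"],
                       archive=[])

# Later, a page is deleted and two of those edits move to the archive.
after = count_editors(revision=["192.0.2.1"],
                      archive=["192.0.2.1", "198.51.100.7"])

# Counting live revisions alone would change the historical value.
live_only = count_editors(revision=["192.0.2.1"], archive=[])

print(before, after, live_only)  # 2 2 1
```

Combining both sources returns the same value before and after the deletion, whereas a count restricted to live revisions drops from 2 to 1.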

Analysis

Discussion

Notes
