Research:Daily unique bot editors

Daily unique bot editors
Specification
A daily unique bot editor is a user who is a flagged bot and completed at least n edits on a given date.
WMF Standard
  • n = 1 edit
Status
completed
SQL
-- Parameters: the UTC date to measure (YYYYMMDD) and the minimum edit count.
SET @date = "20140101";
SET @n = 1;

-- Count the flagged bots whose combined edits (live + archived) reach @n.
SELECT
    COUNT(*)
FROM (
    SELECT
        rev_user,
        SUM(revisions) AS revisions
    FROM (
        -- Edits on pages that currently exist.
        SELECT
            rev_user,
            COUNT(*) AS revisions
        FROM revision
        INNER JOIN user_groups ON
            ug_user = rev_user AND
            ug_group = "bot"
        WHERE
            -- Half-open window: BETWEEN would also match midnight of the next day.
            rev_timestamp >= @date AND
            rev_timestamp < DATE_FORMAT(DATE_ADD(@date, INTERVAL 1 DAY), "%Y%m%d%H%i%S") AND
            rev_user > 0
        GROUP BY 1
        UNION ALL
        -- Edits on pages that have since been deleted.
        SELECT
            ar_user AS rev_user,
            COUNT(*) AS revisions
        FROM archive
        INNER JOIN user_groups ON
            ug_user = ar_user AND
            ug_group = "bot"
        WHERE
            ar_timestamp >= @date AND
            ar_timestamp < DATE_FORMAT(DATE_ADD(@date, INTERVAL 1 DAY), "%Y%m%d%H%i%S") AND
            ar_user > 0
        GROUP BY 1
    ) AS user_revisions
    GROUP BY 1
) AS editors
WHERE revisions >= @n;
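
The query compares against MediaWiki's 14-digit timestamps (YYYYMMDDHHMMSS). As a rough illustration of the day window it relies on, the Python sketch below computes the start and end of a date and treats the end as exclusive, so an edit made at exactly midnight belongs to one day only. The names `day_window` and `in_window` are hypothetical helpers introduced here, not part of MediaWiki.

```python
from datetime import datetime, timedelta
from typing import Tuple

MW_FORMAT = "%Y%m%d%H%M%S"  # 14-digit MediaWiki timestamp layout


def day_window(date: str) -> Tuple[str, str]:
    """Return (start, end) timestamps for a YYYYMMDD date; end is exclusive."""
    start = datetime.strptime(date, "%Y%m%d")
    end = start + timedelta(days=1)
    return start.strftime(MW_FORMAT), end.strftime(MW_FORMAT)


def in_window(timestamp: str, date: str) -> bool:
    """True if a 14-digit timestamp falls on the given date."""
    start, end = day_window(date)
    # Fixed-width digit strings sort the same way as the times they encode.
    return start <= timestamp < end
```

For example, `in_window("20140102000000", "20140101")` is false, whereas an inclusive upper bound would count that edit on both 1 and 2 January.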

Daily unique bot editors is a standardized metric that measures the number of bot accounts that edit a wiki on a given day. It is used as a proxy for the size of the editing bot population.
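
The definition can be sketched in a few lines of Python, assuming edits are available as in-memory `(user_id, timestamp)` pairs and the set of flagged bot user IDs is already known. The function name and the input shapes are illustrative assumptions, not an existing API.

```python
from collections import Counter


def daily_unique_bot_editors(edits, bot_user_ids, date, n=1):
    """Count flagged bots with at least n edits on a YYYYMMDD date.

    edits: iterable of (user_id, timestamp) pairs, where timestamp is a
    14-digit YYYYMMDDHHMMSS string (assumed input shape).
    """
    counts = Counter(
        user_id
        for user_id, timestamp in edits
        if user_id in bot_user_ids and timestamp.startswith(date)
    )
    # A bot is counted once, however many edits it made that day.
    return sum(1 for total in counts.values() if total >= n)
```

With the WMF standard of n = 1, any flagged bot that saved at least one edit on the date is counted exactly once.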

Discussion

Identifying bot accounts

Bot accounts are identified using the bot flag strategy: an account counts as a bot if it belongs to the "bot" user group. This makes bot detection straightforward and efficient, but bots that run without the flag are not counted.

Time lag

As this is a daily metric, a full 24 hours must elapse after the beginning of the date (UTC) in order to calculate an uncensored value.
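
A minimal sketch of that completeness check, assuming the caller supplies a timezone-aware current time. `is_complete` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone


def is_complete(date: str, now: datetime) -> bool:
    """True once the full UTC day for a YYYYMMDD date has elapsed.

    now must be timezone-aware (assumption of this sketch).
    """
    start = datetime.strptime(date, "%Y%m%d").replace(tzinfo=timezone.utc)
    return now >= start + timedelta(days=1)
```

A value computed while `is_complete` is still false covers only part of the day and may undercount.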

Edits on deleted pages

This metric includes edits on existing pages as well as on pages that have been or will later be deleted. This makes the metric stateless: historical values will not change in the future depending on the status of a page (existing, deleted, or moved) at the time the metric is computed. Deletion-related activity is tracked via a separate set of metrics.
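
To illustrate why counting both live and archived edits keeps the value stable, the sketch below models each table as a plain list of user IDs, one entry per edit on the date. This is a simplified stand-in for the revision and archive tables, and `count_bot_editors` is a hypothetical name.

```python
from collections import Counter


def count_bot_editors(revision_users, archive_users, bot_user_ids, n=1):
    """Count flagged bots with at least n edits across live and archived rows."""
    counts = Counter(
        user_id
        for user_id in list(revision_users) + list(archive_users)
        if user_id in bot_user_ids
    )
    return sum(1 for total in counts.values() if total >= n)


bots = {10, 20}
# Before deletion: all three edits are live.
before = count_bot_editors([10, 10, 20], [], bots)
# After the page edited by user 20 is deleted, that edit moves to the archive.
after = count_bot_editors([10, 10], [20], bots)
```

Here `before` and `after` are equal, whereas counting live edits alone would drop user 20 once the page is deleted.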

Analysis

Discussion

Notes