Research talk:Active editor spike 2015/Work log/2015-03-15
Latest comment: 9 years ago by Halfak (WMF) in topic Sunday, March 15, 2015
Sunday, March 15, 2015
editToday, I'm going to try to kick off a job to re-generate modeling monthly active editors for the new timespan.
First I want to check how far my old data goes.
> select month, sum(revisions) from editor_month GROUP BY month order by month DESC; +--------+----------------+ | month | sum(revisions) | +--------+----------------+ | 202505 | 5 | | 201406 | 8906963 | | 201405 | 26546076 | | 201404 | 21148315 | | 201403 | 23712258 | | 201402 | 21989246 |
So it looks like I want to trim off 201406 and re-generate that data.
> select count(*) from editor_month where month >= "201406"; +----------+ | count(*) | +----------+ | 190671 | +----------+ 1 row in set (13.18 sec)
It looks like that will get quite a few rows.
> delete from editor_month where month >= "201406"; Query OK, 190671 rows affected (27.89 sec)
Done OK. Now to update my data gathering query so that I can add only the missing data since may, 2014.
SELECT
wiki,
month,
user_id,
user_name,
user_registration,
SUM(revisions * archived) AS archived,
SUM(revisions) AS revisions
FROM (
SELECT
LEFT(rev_timestamp, 6) AS month,
DATABASE() AS wiki,
rev_user AS user_id,
FALSE AS archived,
COUNT(*) AS revisions
FROM revision
WHERE rev_timestamp >= "201406"
GROUP BY LEFT(rev_timestamp, 6), rev_user
UNION ALL
SELECT
LEFT(ar_timestamp, 6) AS month,
DATABASE() AS wiki,
ar_user AS user_id,
TRUE AS archived,
COUNT(*) AS revisions
FROM archive
WHERE ar_timestamp >= "201406"
GROUP BY LEFT(ar_timestamp, 6), ar_user
) AS editor_months
INNER JOIN user USING (user_id)
GROUP BY wiki, month, user_id
ORDER BY wiki, month;
... And that's going to take a bit. --Halfak (WMF) (talk) 18:57, 15 March 2015 (UTC)