Research talk:Active editor spike 2015/Work log/2015-04-08
Wednesday, April 8, 2015
editToday, I'm going to look at registration rates and activation rates broken down by mobile/desktop. To do this, I'm going to make use of Schema:PageContentSaveComplete and Schema:ServerSideAccountCreation -- both of which record the mobile/desktop source.
Registrations
edit> select left(timestamp, 6) AS month, count(*) FROM ServerSideAccountCreation_5487345 GROUP BY 1; +--------+----------+ | month | count(*) | +--------+----------+ | 201306 | 154651 | | 201307 | 306863 | | 201308 | 341139 | | 201309 | 357668 | | 201310 | 356127 | | 201311 | 350107 | | 201312 | 340671 | | 201401 | 359553 | | 201402 | 317216 | | 201403 | 323158 | | 201404 | 335925 | | 201405 | 363648 | | 201406 | 326076 | | 201407 | 421697 | | 201408 | 716665 | | 201409 | 543066 | | 201410 | 536955 | | 201411 | 525470 | | 201412 | 499713 | | 201501 | 532291 | | 201502 | 494341 | | 201503 | 527792 | | 201504 | 134090 | +--------+----------+ 23 rows in set (7.27 sec)
It looks like we have registrations going back to July, 2013.
> select left(timestamp, 6) AS month, count(*) FROM ServerSideAccountCreation_5233795 GROUP BY 1; +--------+----------+ | month | count(*) | +--------+----------+ | 201302 | 2468 | | 201303 | 332050 | | 201304 | 218731 | | 201305 | 293814 | | 201306 | 126206 | +--------+----------+ 5 rows in set (0.77 sec)
We can go back to March, 2013 with the previous version of the schema. It looks like the change only affected a description.
OK. So now to gather monthly registration rates per wiki.
SELECT
wiki,
month,
mobile,
SUM(registrations)
FROM (
(SELECT
wiki,
LEFT(timestamp, 6) AS month,
event_displayMobile AS mobile,
COUNT(DISTINCT event_userId) AS registrations
FROM ServerSideAccountCreation_5487345
WHERE event_isSelfMade = 1
GROUP BY wiki, month, mobile)
UNION ALL
(SELECT
wiki,
LEFT(timestamp, 6) AS month,
event_displayMobile AS mobile,
COUNT(DISTINCT event_userId) AS registrations
FROM ServerSideAccountCreation_5233795
WHERE event_isSelfMade = 1
GROUP BY wiki, month, mobile)
) AS wiki_month_registration_counts
WHERE month BETWEEN "201303" AND "201503"
GROUP BY wiki, month, mobile;
OK... That's running. Now to gather the edit data.
Edits
edit> select left(timestamp, 6) AS month, count(*) FROM PageContentSaveComplete_5588433 GROUP BY 1; +--------+----------+ | month | count(*) | +--------+----------+ | 201311 | 654534 | | 201312 | 23699979 | | 201401 | 29744876 | | 201402 | 23836212 | | 201403 | 24008427 | | 201404 | 22866635 | | 201405 | 28359010 | | 201406 | 24691518 | | 201407 | 26508595 | | 201408 | 25410517 | | 201409 | 23319162 | | 201410 | 26306705 | | 201411 | 26241917 | | 201412 | 27645244 | | 201501 | 26267808 | | 201502 | 23990825 | | 201503 | 31280861 | | 201504 | 6764302 | +--------+----------+ 18 rows in set (6 min 19.56 sec
What? Did we really only enable PageContentSaveComplete at the end of 2013? I thought it had predated ServerSideAccountCreation. Looks like the revision history disagrees. Oh well. Mobile hasn't been around that long anyway.
Time to write the query. --Halfak (WMF) (talk) 16:26, 8 April 2015 (UTC)
OK Query is:
SELECT
wiki,
LEFT(timestamp, 6) AS month,
rev_user AS user_id,
SUM(event_isMobile) AS mobile_edits,
SUM(!event_isMobile) AS desktop_edits
FROM revision
INNER JOIN log.PageContentSaveComplete_5588433 ON
event_revisionId = rev_id AND
wiki = DATABASE()
WHERE
rev_timestamp BETWEEN "201312" AND "201504" AND
timestamp BETWEEN "201312" AND "201504";
Since the user_id field isn't in the schema, I need to join against each wiki's revision table. :( This is a good excuse to fix up the utility that enables that. --Halfak (WMF) (talk) 17:10, 8 April 2015 (UTC)
https://github.com/halfak/Multiquery Done
Now to run that query against the whole set of DBs. --Halfak (WMF) (talk) 19:56, 8 April 2015 (UTC)