Research talk:Onboarding new Wikipedians/Rollout/Work log/2014-03-24
Monday, March 24th
editBack at it. Today, I'd like to finish off my analysis of the first 30 days of GS deployment. The last thing I did was produce a breakdown of the % of editors making it to various places in the funnel. To help me make sense of this, I'll start by making a sketch of what I found so far.
[Registration: 336,310] --> [Desktop: 273,169 (81.23%)] --> [GS wiki: 218,968 (80.16%)] --> [No CTA: 75,341 (34.41%)] \ \ '--> [Edit current: 19,982 (9.13%)] '--> (18.77%) '--> (19.84%) '--> [Suggest only: 66,767 (30.49%)] '--> [Edit & suggest: 56,878 (25.98%)]
Now, I need to figure out:
- How many of those users made an article edit in the first 24h?
- How many of those article editors made a GS edit?
- What proportion of those GS edits were reverted?
--Halfak (WMF) (talk) 15:40, 24 March 2014 (UTC)
I have the following query kicked off to gather data on users cross-wiki. I'll have to pull this into our server the contains the ServerSideAccountCreation log in order to filter it down to eligible users.
SELECT DATABASE() as wiki, user_id, SUM(rev_id IS NOT NULL) AS day_revisions, SUM(rev_id IS NOT NULL AND page_namespace = 0) AS day_main_revisions, SUM(ct_rev_id IS NOT NULL) AS day_gs_revisions FROM user LEFT JOIN revision ON rev_user = user_id AND rev_timestamp BETWEEN user_registration AND DATE_FORMAT(DATE_ADD(user_registration, INTERVAL 1 DAY), "%Y%m%d%H%i%S") LEFT JOIN page ON rev_page = page_id LEFT JOIN change_tag ON ct_rev_id = rev_id WHERE user_registration BETWEEN "20140211181300" AND "20140313181300";
OK. Here's the query that's pulling in eligible users so that I can do a set intersection between the two:
SET @deployment = "20140211181300"; CREATE TEMPORARY TABLE staging.month_gs_impression SELECT wiki, event_userId AS user_id, event_ctaType AS cta_type FROM log.GettingStartedRedirectImpression_7355552 WHERE timestamp BETWEEN @deployment AND DATE_FORMAT(DATE_ADD(@deployment, INTERVAL 30 DAY), "%Y%m%d%H%i%S") GROUP BY 1,2; CREATE UNIQUE INDEX wiki_user ON staging.month_gs_impression (wiki, user_id); CREATE TABLE staging.month_gs_user SELECT ssac.wiki, ssac.event_userId AS user_id, cta_type, timestamp AS user_registration_approx FROM log.ServerSideAccountCreation_5487345 ssac LEFT JOIN staging.month_gs_impression impression ON ssac.wiki = impression.wiki AND ssac.event_userId = impression.user_id WHERE event_isSelfMade AND NOT event_displayMobile AND ssac.timestamp BETWEEN @deployment AND DATE_FORMAT(DATE_ADD(@deployment, INTERVAL 30 DAY), "%Y%m%d%H%i%S") AND ssac.wiki IN ("astwiki", "bswiki", "cawiki", "dawiki", "dewiki", "elwiki", "enwiki", "eswiki", "fawiki", "frwiki", "fowiki", "glwiki", "hewiki", "huwiki", "iswiki", "itwiki", "kowiki", "lbwiki", "mkwiki", "mlwiki", "nlwiki", "plwiki", "ptwiki", "ruwiki", "simplewiki", "svwiki", "viwiki", "ukwiki", "zhwiki", "jawiki") GROUP BY 1,2; CREATE UNIQUE INDEX wiki_user ON staging.month_gs_user (wiki, user_id); SELECT NOW(), COUNT(*) FROM staging.month_gs_user;
After I intersect the two, then I can answer all but the revert question. --Halfak (WMF) (talk) 23:55, 24 March 2014 (UTC)