Research talk:Onboarding new Wikipedians/Rollout/Work log/2014-03-27
Thursday, March 27th
Today, I'm working on measuring the revert rate of GettingStarted edits. To do this, I plan to sample one edit per newcomer on a GS wiki during the 30 days in question. That means I'll first need to gather all edits that newcomers made in their first 24h. I'll need the set labeled by wiki, user, rev_id, page_id and gs_tagged (boolean: was the edit tagged as "gettingstarted edit"?).
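The sampling step could look roughly like the sketch below once the rows are in hand. This is only an illustration: the function name, the dict-shaped rows and the `seed` parameter are assumptions of mine, not part of any existing script. It groups rows by (wiki, user) and picks one edit uniformly at random per newcomer.

```python
import random
from collections import defaultdict


def sample_one_edit_per_user(rows, seed=0):
    """Pick one edit uniformly at random for each (wiki, user_id) pair.

    `rows` is an iterable of dicts with at least the keys
    "wiki", "user_id", "rev_id", "page_id" and "gs_edit"
    (a hypothetical in-memory form of the query output).
    """
    rng = random.Random(seed)  # seeded so the sample is reproducible
    edits_by_user = defaultdict(list)
    for row in rows:
        # user_id is only unique within a wiki, so key on both
        edits_by_user[(row["wiki"], row["user_id"])].append(row)
    # Iterate in sorted key order so the result does not depend on input order
    # of the users (only on the order of each user's own edits).
    return [rng.choice(edits_by_user[key]) for key in sorted(edits_by_user)]
```

Sampling one edit per user, rather than pooling all edits, keeps very active newcomers from dominating the revert rate.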
So first things first, I need those edits from each of the GS wikis.
SELECT
    DATABASE() AS wiki,
    rev_id,
    user_id,
    rev_page AS page_id,
    gs_edit.ct_rev_id IS NOT NULL AS gs_edit
FROM user
-- Restrict to accounts that were newly created (not attached or auto-created)
INNER JOIN logging ON
    log_user = user_id AND
    log_type = "newusers" AND
    log_action = "create"
-- Edits made within 24 hours of registration
INNER JOIN revision ON
    rev_user = user_id AND
    rev_timestamp BETWEEN
        user_registration AND
        DATE_FORMAT(DATE_ADD(user_registration, INTERVAL 1 DAY), "%Y%m%d%H%i%S")
-- LEFT JOIN keeps untagged edits so that gs_edit can be 0 as well as 1
LEFT JOIN change_tag gs_edit ON
    ct_rev_id = rev_id AND
    ct_tag = "gettingstarted edit"
INNER JOIN page ON
    rev_page = page_id
WHERE
    page_namespace = 0 AND
    user_registration BETWEEN "20140211183000" AND "20140313183000";
Time to run that cross-wiki. --Halfak (WMF) (talk) 19:43, 27 March 2014 (UTC)
Fixed the query since I forgot to join to change_tag and get "gettingstarted edit"s. --Halfak (WMF) (talk) 19:49, 27 March 2014 (UTC)
So... there's no index on user_registration in most MediaWiki databases. This means that running this query on just users in the relevant month will take quite a long time. In the meantime, I wrote a script to grab the first revision for each user and check if it was reverted. Time to go work on some schemas while I wait. --Halfak (WMF) (talk) 22:02, 27 March 2014 (UTC)
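For reference, one common way to check whether a revision was reverted is an "identity revert" test: a later revision restores content identical to a revision that came before the one in question. The sketch below is not the script mentioned above. It assumes a chronological list of (rev_id, checksum) pairs for a single page, for example built from a content hash such as rev_sha1, and the `radius` window of 15 revisions is an illustrative choice.

```python
def was_reverted(history, index, radius=15):
    """Return True if history[index] looks like it was identity-reverted.

    `history` is a chronological list of (rev_id, checksum) tuples for one
    page. A revision counts as reverted when one of the next `radius`
    revisions has the same checksum as one of the `radius` revisions that
    preceded it, and that checksum differs from the revision's own.
    """
    current_checksum = history[index][1]
    # Checksums of the page states that existed shortly before this revision
    prior = {checksum for _, checksum in history[max(0, index - radius):index]}
    for _, checksum in history[index + 1:index + 1 + radius]:
        # A later revision restored an earlier state, discarding this edit
        if checksum in prior and checksum != current_checksum:
            return True
    return False
```

Under this definition the first revision of a page is never flagged, since there is no earlier state to restore; page deletions would need to be handled separately.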