User:Halfak (WMF)/Pre-registration anonymous activity
This analysis was included in a report that can be found at: Research:Anonymous_editor_acquisition/Volume_and_impact
Using the cu_changes table, we generated a dataset that contains newcomers' pre-registration and first-session activity. See the following diagram to get a sense of the kind of activity we're interested in:
To generate this, we gathered a sample of newly registered users from the cu_changes table using this query:
SQL source code
|
---|
-- Define a 30-day window ending at the most recent change in recentchanges.
SET @month_end = (SELECT MAX(rc_timestamp) FROM recentchanges);
SET @month_start = DATE_FORMAT(@month_end - INTERVAL 30 DAY, '%Y%m%d%H%i%S');

SELECT
    user_id AS id,
    user_name AS name,
    user_registration AS registration,
    user_editcount AS editcount,
    cuc_ip AS registration_ip
FROM user
-- The account-creation entry in cu_changes carries the IP used to register.
INNER JOIN cu_changes ON
    cuc_user = user_id AND
    cuc_type = 3 AND
    cuc_actiontext LIKE 'User account % was created'
WHERE user_registration BETWEEN @month_start AND @month_end
    AND user_editcount > 0   -- keep only users with at least one edit
    AND user_id % 10 = 0;    -- keep a 1-in-10 sample by user ID
|
Using the registration_ip from this dataset, we scan the revision and archive tables looking for revisions with a user_text field that corresponds to the registration_ip. Using the edit session method with a 1-hour cutoff, we then gather the last session before registering (if any) and the first session since registering.
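The session step described above can be sketched roughly as follows. This is a minimal illustration, not the code used for the analysis: the function names, the choice to treat a gap of exactly one hour as still within a session, and the sample timestamps are all assumptions made for the example.

```python
from datetime import datetime, timedelta

CUTOFF = timedelta(hours=1)  # inactivity gap that ends a session (1 hour, per the text)


def sessionize(timestamps, cutoff=CUTOFF):
    """Group edit timestamps into sessions.

    A new session starts whenever the gap since the previous edit
    exceeds `cutoff` (assumption: a gap equal to the cutoff stays in-session).
    """
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= cutoff:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions


def sessions_around_registration(ip_edits, user_edits, registration):
    """Return (last session before registration, first session since registration).

    `ip_edits` are edits made from the registration IP, `user_edits` are edits
    made by the registered account. Either result is None if no such session exists.
    """
    pre = sessionize(ts for ts in ip_edits if ts < registration)
    post = sessionize(ts for ts in user_edits if ts >= registration)
    return (pre[-1] if pre else None, post[0] if post else None)


# Made-up timestamps, purely for illustration.
registration = datetime(2014, 1, 1, 12, 0)
ip_edits = [
    datetime(2014, 1, 1, 9, 0),
    datetime(2014, 1, 1, 11, 30),
    datetime(2014, 1, 1, 11, 50),
]
user_edits = [
    datetime(2014, 1, 1, 12, 5),
    datetime(2014, 1, 1, 12, 40),
    datetime(2014, 1, 1, 15, 0),
]
pre_session, post_session = sessions_around_registration(
    ip_edits, user_edits, registration
)
```

In this toy input, the 09:00 IP edit falls into an earlier session of its own, so only the two edits at 11:30 and 11:50 form the pre-registration session. Likewise, the 15:00 edit is more than an hour after 12:40 and is left out of the first post-registration session.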
Results
In my sample, 6.9% of newcomers who ended up making at least one edit in their first week edited as an IP before registering their account. I'm generating some histograms now to get a sense of how much they edit before and after registering.
So how do users who edit before registering behave compared to users who don't? The following plots include data from both the pre-registration session and the post-registration session.
Data Table
| Metric | pre_session = FALSE | pre_session = TRUE |
|---|---|---|
| revisions.geo_mean | 1.723199 | 4.281658 |
| revisions.geo_se | 0.01047070 | 0.03594024 |
| main_revisions.geo_mean | 1.115439 | 3.452403 |
| main_revisions.geo_se | 0.009144279 | 0.037101374 |
| revert_prop.mean | 0.2321521 | 0.3916604 |
| revert_prop.sd | 0.4122159 | 0.4287481 |
| revert_prop.se | 0.006101715 | 0.023252133 |
| productive.k | 2342 | 232 |
| n | 4564 | 340 |
| productive.prop | 0.5131464 | 0.6823529 |
| productive.se | 0.007398557 | 0.025248611 |
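As a sanity check on how the columns relate, the numbers in the table above appear consistent with productive.prop = productive.k / n, productive.se being the binomial standard error sqrt(p(1 - p) / n), and revert_prop.se = revert_prop.sd / sqrt(n). The sketch below recomputes them from the reported counts. The formulas are inferred from the figures rather than taken from the original analysis code, and the helper names are hypothetical.

```python
import math

# Values copied from the table above: (productive.k, n, revert_prop.sd)
groups = {
    "pre_session = FALSE": (2342, 4564, 0.4122159),
    "pre_session = TRUE": (232, 340, 0.4287481),
}


def binomial_se(k, n):
    """Standard error of a proportion k/n under a binomial model."""
    p = k / n
    return math.sqrt(p * (1 - p) / n)


def mean_se(sd, n):
    """Standard error of a mean, given the sample standard deviation."""
    return sd / math.sqrt(n)


for label, (k, n, sd) in groups.items():
    print(label, k / n, binomial_se(k, n), mean_se(sd, n))

# Share of sampled newcomers who had a pre-registration session.
n_with = groups["pre_session = TRUE"][1]
n_without = groups["pre_session = FALSE"][1]
share_with_pre_session = n_with / (n_with + n_without)
print(round(100 * share_with_pre_session, 1))
```

The last line reproduces the 6.9% figure quoted in the results from the two group sizes.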
What if we limit the analysis to just the post-registration session?
Data Table
| Metric | pre_session = TRUE | pre_session = FALSE |
|---|---|---|
| revisions.geo_mean | 2.178460 | 1.723199 |
| revisions.geo_se | 0.04449625 | 0.01047070 |
| main_revisions.geo_mean | 1.728482 | 1.115439 |
| main_revisions.geo_se | 0.038133895 | 0.009144279 |
| revert_prop.mean | 0.3089849 | 0.2321521 |
| revert_prop.sd | 0.4439494 | 0.4122159 |
| revert_prop.se | 0.024076534 | 0.006101715 |
| productive.k | 195 | 2342 |
| n | 340 | 4564 |
| productive.prop | 0.5735294 | 0.5131464 |
| productive.se | 0.026821492 | 0.007398557 |