Research talk:Autoconfirmed article creation trial/Work log/2017-08-16
Wednesday, August 16, 2017
Today I'm working on visualizing data about activity for newly registered accounts, whether they reach autoconfirmed status, etc. As with the work done on Aug 8, we are mainly interested in understanding whether there's a difference between accounts that are autocreated and other accounts.
Note that we combine three types of account creation ("create", "create2", and "byemail") into a single type (called "create"). First, the number of accounts created for others ("create2" and "byemail") is consistently low (see the Aug 8 work log). Second, we suspect that these generally behave like "regularly created" accounts.
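The merging of creation types described above can be sketched as a small mapping. This is only an illustration, not the code used for the analysis: the function name and the sample values are hypothetical, and the "autocreate" label is an assumption about how autocreated accounts are tagged in the data.

```python
# Creation types named in the text that are merged into one category.
MERGED_ACTIONS = {"create", "create2", "byemail"}


def normalize_creation_type(action: str) -> str:
    """Collapse the three manual creation types into "create".

    Any other value (for example an autocreation label) is passed
    through unchanged.
    """
    return "create" if action in MERGED_ACTIONS else action


# Hypothetical sample of raw creation types, for illustration only.
raw_actions = ["create", "autocreate", "byemail", "create2", "autocreate"]
normalized = [normalize_creation_type(a) for a in raw_actions]
print(normalized)
```

After this step only two groups remain, which is what the comparisons in the rest of this log rely on.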
Proportion of accounts with non-zero edits
H2 concerns the proportion of created accounts that make at least one edit. We first plot the proportion of all registered accounts that made at least one edit, measured per day:
There is a lot of information in this plot. First, we can see that the proportion is stable from 2009 until roughly halfway through 2012, with 30% (or slightly more) of all registered accounts making at least one edit. There is a spike in 2011, which appears to coincide with the missing data on autoconfirmed accounts. The proportion declines in the second half of 2012 and stays at a lower level of around 27–28% until the second half of 2014. We then see some massive drops due to the SUL finalization project, which was completed in the first half of 2015. From then on, we see a slight upward trend, but still at a lower level (20–25%) than what we saw previously.
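For reference, the per-day proportion discussed above could be computed along the following lines. This is a minimal sketch that assumes each account is available as a (registration date, edit count) pair; the function name and the sample records are hypothetical and do not come from the actual dataset.

```python
from collections import defaultdict
from datetime import date


def daily_nonzero_proportion(accounts):
    """Return {registration_date: share of accounts with at least one edit}.

    `accounts` is an iterable of (registration_date, edit_count) pairs.
    """
    registered = defaultdict(int)
    with_edits = defaultdict(int)
    for reg_date, edit_count in accounts:
        registered[reg_date] += 1
        if edit_count > 0:
            with_edits[reg_date] += 1
    return {d: with_edits[d] / registered[d] for d in sorted(registered)}


# Hypothetical records, for illustration only.
sample_accounts = [
    (date(2017, 8, 1), 0),
    (date(2017, 8, 1), 3),
    (date(2017, 8, 2), 0),
    (date(2017, 8, 2), 0),
    (date(2017, 8, 2), 1),
    (date(2017, 8, 2), 0),
]
proportions = daily_nonzero_proportion(sample_accounts)
for day, share in proportions.items():
    print(day.isoformat(), f"{share:.0%}")
```

The same function can be applied to autocreated and other accounts separately by filtering the input before calling it.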
How do the trends differ between autocreated and other accounts? In the plot below, we plot the two groups separately:
Here we can see clearly that the proportion of autocreated accounts with non-zero edits is consistently much lower than that of the rest. We can also see that the proportion of other accounts that make at least one edit is more similar across the years, whereas the overall plot showed a larger difference. For example, in the period 2012–2014 the proportion tends to be between 30% and 40%, which is also where it has been since 2016. Lastly, we can see a slight downward trend for non-zero edits by autocreated accounts, a trend that is easier to see if we plot that proportion by itself:
In the plot above, it is easier to see that the proportion has decreased from around 6% to around 4%. We know from the analysis done on Aug 8 that the raw number of autocreated accounts appears to have increased, which can explain the decrease if most (or all) of them do not make any edits.
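The dilution argument above is simple arithmetic: if the number of accounts that edit stays the same while the number of accounts grows, the proportion falls. The counts below are hypothetical and chosen only to mirror a change from about 6% to about 4%; they are not taken from the data.

```python
def nonzero_share(editing_accounts: int, total_accounts: int) -> float:
    """Share of accounts that made at least one edit."""
    return editing_accounts / total_accounts


# Hypothetical counts: the number of editing accounts is held constant
# while the total number of autocreated accounts grows by 50%.
before = nonzero_share(60, 1000)
after = nonzero_share(60, 1500)
print(f"before: {before:.1%}, after: {after:.1%}")
```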
Proportion of accounts reaching autoconfirmed status in 30 days
H3 concerns the proportion of accounts reaching autoconfirmed status, limited to the first 30 days. As before, we can generate plots of these. First, the overall proportion of registered accounts that reach the threshold:
Overall, we see a stable or slightly downward trend until SUL finalization started, and an upward trend since then. If we also compare this with the plot of the overall number of created accounts, we might conclude that the number of accounts reaching the threshold is fairly stable, while the number of created accounts fluctuates.
Perhaps more interesting is the proportion of accounts making at least one edit that reach the threshold. This is plotted below:
While there is a lot of noise in this due to the large fluctuations, we can see that there has been little change in the overall proportion. It does change from year to year and season to season (e.g. notice the increase in the second half of 2016), but tends to be in the 10–12% range.
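The two proportions discussed in this section differ only in their denominator: all registered accounts versus accounts with at least one edit. A rough sketch of both follows. It assumes that reaching the threshold within 30 days can be approximated by making at least 10 edits in the first 30 days (the account-age requirement of a few days is met well inside that window). The threshold value, the function name, and the sample data are assumptions for illustration, not the definitions used in the analysis.

```python
MIN_EDITS = 10  # assumed edit-count threshold for autoconfirmed status


def autoconfirmed_proportions(edit_counts_30d):
    """Return (share of all accounts, share of accounts with >= 1 edit).

    `edit_counts_30d` holds, per account, the number of edits made in
    the first 30 days after registration.
    """
    total = len(edit_counts_30d)
    editors = sum(1 for n in edit_counts_30d if n > 0)
    reached = sum(1 for n in edit_counts_30d if n >= MIN_EDITS)
    of_all = reached / total if total else float("nan")
    of_editors = reached / editors if editors else float("nan")
    return of_all, of_editors


# Hypothetical cohort of ten accounts registered on the same day.
cohort = [0, 0, 0, 0, 0, 0, 1, 3, 12, 0]
of_all, of_editors = autoconfirmed_proportions(cohort)
print(f"of all accounts: {of_all:.1%}, of editing accounts: {of_editors:.1%}")
```

Because most accounts never edit, the second proportion is always at least as large as the first, and it is noisier on days with few editing accounts.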
We get further insights when we split the plot into autocreated accounts and the rest:
Here it is fairly clear that the proportion of non-autocreated accounts that make at least one edit and reach the threshold is rather stable around 10%. It is also clear that a lot of the fluctuation occurs in the autocreated accounts, likely due to the low number of those types of accounts (as we saw previously, most autocreated accounts do not make any edits in the first 30 days).
The spike in September 2014 comes from two specific days (September 15 and 17) on which an unusually high number of autocreated accounts had non-zero edits (169 and 158, respectively, which is 173% and 162% of the median of 97.5 for that month). This spike is completely driven by contributors going through The Wikipedia Adventure (TWA). There were 146 autocreated accounts in total across both days, of which 117 (80.1%) went through TWA. That leaves 29 accounts that did not, which over two days is roughly on par with the median of 12.
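The percentages and the remainder for the two spike days can be re-derived from the counts quoted above with a few lines of arithmetic. The variable names are illustrative, and all numbers are the ones given in the text.

```python
median_nonzero = 97.5  # median for that month, as quoted in the text
spike_counts = {"2014-09-15": 169, "2014-09-17": 158}

# Each spike day expressed as a percentage of the monthly median.
pct_of_median = {
    day: round(100 * count / median_nonzero)
    for day, count in spike_counts.items()
}

total_accounts = 146  # autocreated accounts across both days
twa_accounts = 117    # of which went through TWA
twa_share = round(100 * twa_accounts / total_accounts, 1)
non_twa = total_accounts - twa_accounts

print(pct_of_median)  # {'2014-09-15': 173, '2014-09-17': 162}
print(twa_share)      # 80.1
print(non_twa)        # 29
```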