Research talk:Autoconfirmed article creation trial/Work log/2018-02-09

Friday, February 9, 2018

Today I'll continue working on getting final results for our hypotheses ready by wrapping up the writing for H6, and getting through H7.

H7: The average number of edits in the first 30 days since registering is reduced.

The preliminary analysis of historical data for H7 is in our August 17 work log. As with other activity-related statistics, we find it most meaningful to calculate the average for accounts that make at least one edit in their first 30 days; otherwise we are simply filtering on whether accounts edit or not.
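For reference, a minimal sketch of this filtering step in R, assuming a hypothetical data frame newcomers with one row per account and a column edits_30 holding the number of edits made in the 30 days after registration:

 ## Keep only accounts that made at least one edit in their first 30 days
 ## (hypothetical data frame and column names).
 active <- subset(newcomers, edits_30 >= 1)
 
 ## Average number of edits among those active accounts.
 mean(active$edits_30)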

[Graph: average number of edits in the first 30 days, second half of 2017]

Looking at the second half of 2017, we find an increase in the average number of edits around the time ACTRIAL starts. A similar increase is present in the second half of 2015 and 2016, so it is unclear whether the 2017 increase differs from those years. Let's focus on the last two years of activity and add a trend line to make it easier to see:

[Graph: average number of edits in the first 30 days over the last two years, with trend lines]

The graph above suggests that the activity level for autocreated accounts has increased from about 2.5 to 3 edits on average, after which it tends to fluctuate around a given mean. The activity level after the start of ACTRIAL does not appear to be out of the ordinary. For non-autocreated accounts, the average number of edits is somewhat lower and follows the general activity level of Wikipedia, and again nothing out of the ordinary appears to happen around ACTRIAL.

In order to determine whether there is a change in the average number of edits during ACTRIAL, we first compare the first two months of the trial as a whole. Based on the graphs and the analyses of previous hypotheses, we compare against the same time period of 2014, 2015, and 2016 taken together.

We first look at the number of edits for autocreated accounts and find no significant change. Because the distribution of edits is skewed (a few accounts make a large number of edits while most accounts make few), we log-transform the data before applying a t-test. Secondly, we also use the non-parametric Mann-Whitney U test on the non-transformed data. The t-test suggests a pre-ACTRIAL geometric mean of 2.57 edits, compared to 2.55 edits during ACTRIAL, a non-significant difference (t=0.616, df=8757.8, p=0.54). We find a similar result with the Mann-Whitney U test: W=45818000, p=0.67.
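As an illustration, a sketch of the two tests in R, assuming hypothetical vectors edits_pre and edits_actrial holding the per-account edit counts for autocreated accounts in the 2014–2016 comparison period and during ACTRIAL, respectively:

 ## t-test on log-transformed counts; since every account in the dataset has
 ## at least one edit, the logarithm is always defined. exp() of the estimated
 ## means recovers the geometric means reported above.
 t.test(log(edits_pre), log(edits_actrial))
 
 ## Non-parametric Mann-Whitney U test (called wilcox.test in R) on the
 ## untransformed counts.
 wilcox.test(edits_pre, edits_actrial)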

For non-autocreated accounts, we find a very small change in the average number of edits, from 2.36 pre-ACTRIAL to 2.39 during ACTRIAL. This change is statistically significant according to both the t-test (t=-3.284, df=161330, p=0.001) and the Mann-Whitney U test (W=1.5281e+10, p < 0.005). As we also discussed when looking at statistics on diversity of contributions, our datasets contain a large number of non-autocreated accounts, which means that even a small change in the mean number of edits will be statistically significant: there are 315,197 accounts in the dataset from 2014, 2015, and 2016, and 97,532 in the ACTRIAL dataset. This leads us to also approach the question from a time series perspective, to see whether the change reflects an upwards trend in previous years or is particular to 2017.

As we have done for other analyses, we switch to monthly data, as that facilitates training forecasting models. The graph of the average number of edits per month over time looks like this:

[Graph: average number of edits in the first 30 days, per month]
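A sketch of how such a monthly series can be assembled, building on the hypothetical active data frame from the earlier sketch and assuming a registration-month column reg_month (e.g. "2014-07"); the start date of the series is also an assumption:

 ## Average edits in the first 30 days, per registration month
 ## (hypothetical data frame and column names).
 monthly <- aggregate(edits_30 ~ reg_month, data = active, FUN = mean)
 
 ## Convert to a ts object so it can be fed to the forecasting functions
 ## below; the January 2009 start date is an assumption.
 edits_ts <- ts(monthly$edits_30, start = c(2009, 1), frequency = 12)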

The monthly plot perhaps makes it easier to see the lower average number of edits for non-autocreated accounts. It also appears that the average was slightly higher in the early years of the graph. In more recent years activity tends to follow a yearly cycle, while before then it might have followed a two-year cycle. This makes it unclear whether we will see a difference in the first months of ACTRIAL.

We first investigate the time series of the average number of edits for autocreated accounts. It does not appear to have a clear seasonal (yearly) cycle, and it is not stationary. Examining the ACF and PACF of the time series suggests an ARIMA(2,1,0) or ARIMA(0,1,3) model. R's auto.arima function selects an ARIMA(0,1,2) model. Checking alternative models, we find that allowing the mean to change clearly improves the fit, so we include a drift term. Using the model to forecast the first three months of ACTRIAL results in the following graph:

[Forecast graph: average number of edits per month for autocreated accounts, with the first three months of ACTRIAL]
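A sketch of the model fit and forecast using the forecast package, building on the edits_ts series above and truncating it before the start of ACTRIAL (the cut-off date is an assumption):

 library(forecast)
 
 ## Fit the ARIMA(0,1,2) model with a drift term on pre-ACTRIAL data only.
 train <- window(edits_ts, end = c(2017, 8))
 fit <- Arima(train, order = c(0, 1, 2), include.drift = TRUE)
 
 ## Forecast the first three months of ACTRIAL and plot the result together
 ## with the default 80% and 95% prediction intervals.
 fc <- forecast(fit, h = 3)
 plot(fc)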

We can see in the forecast graph that the true averages are very similar to the forecast, bringing us to the same conclusion as our previous tests: ACTRIAL has had no effect on the overall activity level of autocreated accounts.

The approach to building a forecasting model for non-autocreated accounts is similar. Examining the time series, it is not clear whether it has a seasonal (yearly) cycle. The series is stationary after differencing. The ACF and PACF of the differenced series suggest an ARIMA(0,1,2) or ARIMA(3,1,0) model as candidates. R's auto.arima function selects an ARIMA(1,1,1)(2,0,0)[12] model. We compare it against several candidate models, both with and without a seasonal component and with and without drift. None of them leads to any improvement, so we use the auto-selected model. The forecast graph for non-autocreated accounts then looks like this:

[Forecast graph: average number of edits per month for non-autocreated accounts, with the first three months of ACTRIAL]
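A sketch of the model comparison for non-autocreated accounts, assuming a monthly series nonauto_ts built in the same way as edits_ts and restricted to the pre-ACTRIAL period; the alternative models shown are examples of the candidates discussed above:

 library(forecast)
 
 ## The work log reports that auto.arima selects ARIMA(1,1,1)(2,0,0)[12].
 auto_fit <- auto.arima(nonauto_ts)
 
 ## Example alternatives: a non-seasonal candidate and one with drift.
 cand_1 <- Arima(nonauto_ts, order = c(0, 1, 2))
 cand_2 <- Arima(nonauto_ts, order = c(3, 1, 0), include.drift = TRUE)
 
 ## Compare the models by AICc; lower is better.
 sapply(list(auto = auto_fit, cand_1 = cand_1, cand_2 = cand_2),
        function(m) m$aicc)
 
 ## Forecast the first three months of ACTRIAL with the selected model.
 nonauto_fc <- forecast(auto_fit, h = 3)
 plot(nonauto_fc)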

We can see that the increase in the average number of edits in September falls outside the forecast interval. Looking back at 2016, however, there appears to have been a similar increase then, which makes the 2017 increase not out of the ordinary even though the model does not expect it. For October and November, the true values are within the 95% confidence interval of the forecast, indicating that the overall significant difference we found previously is not as important.

H7 hypothesizes that the average number of edits is reduced, but we find no indication of that occurring during ACTRIAL. Instead, activity levels are as we would expect. In conclusion, H7 is not supported.
