Research:MoodBar/Controlled experiment results

Methods

For more information on the design of the controlled experiment, see the related page.

Metrics

Still active at t days
the total number of users still active at t days since registration.[1] The metric is computed by checking for the presence of edits in the main namespace with a timestamp later than t days after the registration date.
Group size
the number of active users, that is, the size of the group over which the number of still-active users is computed, sampled by day of registration.
Retention probability
from the above two we compute the probability of retention at t days as the number of surviving users divided by the group size. This metric is grouped by day of registration, so that we have 5 distinct retention time series, one for each value of t (t = 1, 2, 5, 10, and 30 days, the levels used in the regression below).
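
As a minimal sketch of this computation in R (the language of the analysis output reported below), assuming a data frame users with one row per active user, a registration-day column reg.date, and a 0/1 column active.at.t flagging an edit in the main namespace after t days (all hypothetical names):

# Sketch: retention probability at t days, grouped by day of registration.
n.active <- aggregate(active.at.t ~ reg.date, data = users, FUN = sum)    # still active
n.total <- aggregate(active.at.t ~ reg.date, data = users, FUN = length)  # group size
retention <- n.active$active.at.t / n.total$active.at.t                   # probability per day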

Sample

We considered all active users registered since December 14, 2011, where "active" is defined as having clicked at least once on the edit button. Active users are split into three groups according to their date of registration (using the CentralAuth DB to consider only local enwiki users). Two groups have MoodBar enabled (historical and treatment), while the third is a control group. The eligibility windows corresponding to the three groups are summarized in Table 1:

Table 1: Eligibility window periods

Period     | Starting date                | Ending date             | Description
historical | December 14, 2011, 00:00:00  | May 22, 2012, 23:59:59  | MoodBar phase 3
treatment  | May 23, 2012, 00:00:00       | June 13, 2012, 23:59:59 | Enhanced tooltip deployed.[2]
control    | June 14, 2012, 00:00:00[3]   | June 29, 2012, 23:59:59 | MoodBar blackout

Data were collected until two months after the end of the eligibility window (August 28), to allow for a correct estimation of retention at 30 days.
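
A sketch of how the eligibility windows of Table 1 could be applied to user records, assuming a POSIXct registration timestamp column users$reg.ts (a hypothetical name):

# Sketch: assign each user to an eligibility window (Table 1).
breaks <- as.POSIXct(c("2011-12-14 00:00:00", "2012-05-23 00:00:00",
                       "2012-06-14 00:00:00", "2012-06-30 00:00:00"), tz = "UTC")
users$group <- cut(users$reg.ts, breaks = breaks,
                   labels = c("historical", "treatment", "control"))
# Drop registrations from June 14, 2012: the blackout code was being deployed
# and MoodBar availability cannot be ascertained for that day (see note [3]).
users <- users[as.Date(users$reg.ts, tz = "UTC") != as.Date("2012-06-14"), ]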

Results

Retention time series analysis

We define retention as a measure of activity after a given time span. We analyzed the retention of active new users over time by grouping them by registration date. The data present a strong weekly periodicity, as shown in Figure 1, where the group size time series (Fig. 1, top panel) is decomposed using moving averages into a trend, a seasonal, and a random component.[4] In this and in all plots of this section, time (sometimes labeled "Lag") is measured in weeks. The data also present a non-monotonic trend, with a rise over the first 8 weeks and a decline over the remaining 23 weeks. Because the earliest data are from December 2011, the upward trend corresponds roughly to the months of January and February, while the downward portion corresponds to the March-July period. It is in fact well known that the number of new account registrations follows an annual pattern with similar characteristics, at least for the period of the year covered by our data.[5] The outlier in the random panel corresponds to the day of the SOPA/PIPA blackout.[6]
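
The decomposition shown in Figure 1 can be reproduced with R's standard tools; a minimal sketch, assuming group.size is the daily series of Figure 1's top panel (a hypothetical name):

# Sketch: decompose the daily group-size series into trend, seasonal,
# and random components via moving averages (frequency 7 = weekly cycle).
gs <- ts(group.size, frequency = 7)
dec <- decompose(gs)   # additive moving-average decomposition
plot(dec)              # panels: observed, trend, seasonal, random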

The number of users still active is strongly correlated with the group size, as can be seen from the scatter plot in Figure 2. This happens for obvious reasons: some days of the week see more registrations than others, so when we group users by day of registration and count how many are still active at a later date, we also see a higher volume on those days. This correlation disappears when we take the ratio to compute the retention probability. However, the resulting time series still presents a slowly decaying autocorrelation, as can be seen in Figure 3, which suggests the presence of a trend in the retention time series itself. We thus detrend the retention series using exponential smoothing.[7] The resulting time series is uncorrelated (Figure 4) and can thus be analyzed by means of a regression analysis.
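
A sketch of this check-and-detrend step, assuming ret is the daily retention time series (a hypothetical name); simple exponential smoothing is obtained from HoltWinters by disabling the trend and seasonal components:

# Sketch: inspect autocorrelation, then detrend via exponential smoothing.
acf(ret)                                              # slowly decaying ACF hints at a trend
hw <- HoltWinters(ret, beta = FALSE, gamma = FALSE)   # simple exponential smoothing
sm <- fitted(hw)[, "xhat"]                            # one-step-ahead smoothed level
ret.det <- window(ret, start = start(sm)) - sm        # subtract the trend
acf(ret.det)                                          # should now show no significant lags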

Figure 1: Decomposition of the group size into trend, seasonal, and residual components via moving averages.
Figure 2: Scatter plot of the number of still-active users at different days since registration (panels) against the group size.
Figure 3: Autocorrelation plot of the daily retention probability.
Figure 4: Autocorrelation plot of the daily retention probability, centered on its mean after exponential smoothing.

Analysis of MoodBar effect on retention

In order to detect differences in retention due to the effect of MoodBar, we perform a regression analysis. Starting from the raw data (Figure 5), we pre-process the data as described in the previous section to remove the weekly seasonality and the trend. We then apply the regression analysis to the detrended and de-seasonalized dataset (Figure 6).

Figure 5: Retention probability, raw data.
Figure 6: Detrended and deseasonalized retention probability with linear fit (black line).

We fit a linear model to the data. The regressors are the age t at which the retention is measured (a 5-level categorical variable) and the user group, which is our instrumental variable and corresponds to the eligibility windows reported in Table 1. The residual deviance of the fit is 398.95 on 943 degrees of freedom, which means that the model fits the data reasonably well. The fit shows that both groups with MoodBar enabled (the historical and treatment groups) feature no statistically significant difference in retention from the control group, which had MoodBar disabled and thus unavailable. Figure 6 shows the fit of the model (solid black line) to the data (shaded area with colored contour), with almost no difference across the three groups.

Call:
glm(formula = ret ~ age + group, family = gaussian(), data = RET.det, 
    weights = group.size)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-1.83908  -0.40839  -0.06274   0.35594   2.98096  

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
(Intercept)      0.3310350  0.0036931  89.637   <2e-16 ***
age2            -0.0493948  0.0019727 -25.040   <2e-16 ***
age5            -0.0906009  0.0019727 -45.928   <2e-16 ***
age10           -0.1277728  0.0019727 -64.772   <2e-16 ***
age30           -0.1960255  0.0019727 -99.371   <2e-16 ***
grouphistorical  0.0001098  0.0035402   0.031    0.975    
grouptreatment  -0.0007302  0.0039780  -0.184    0.854    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

(Dispersion parameter for gaussian family taken to be 0.4230601)

    Null deviance: 5265.03  on 949  degrees of freedom
Residual deviance:  398.95  on 943  degrees of freedom
AIC: -4786.3

Number of Fisher Scoring iterations: 2

[1] "Residual deviance test for the GLM:"
     res.deviance  df p
[1,]     398.9457 943 1
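The residual deviance test printed at the end of the output can be reproduced as follows (a sketch; fit stands for the fitted glm object above):

# Goodness-of-fit: p-value of the residual deviance against a chi-squared
# distribution with the residual degrees of freedom.
p <- pchisq(deviance(fit), df.residual(fit), lower.tail = FALSE)
cbind(res.deviance = deviance(fit), df = df.residual(fit), p = round(p, 3))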

Discussion and summary

The results we report in this document are negative in nature: according to our analysis, there is no significant difference in retention between users with MoodBar and users without it. However, some external limitations of the present analysis lead us not to consider this result definitive. The most important is the lack of data covering more than one year of newly registered active users.[8] As can be seen in Figure 5, and quite interestingly, the retention features a peaking pattern over the 8 months of data at our disposal. In the present analysis we treat this pattern as a trend in the data and apply standard time series techniques to detrend the signal. However, we cannot completely rule out that this pattern is instead due to specific seasonal behaviors in the user population, and that it should be regarded as an annual periodicity. A simple hypothesis is that during the winter people tend to perform more indoor activities (editing Wikipedia being one of them), so the proportion of users still active is higher among users who register during this period than among those who register during the spring and summer, when people, at least in the Northern Hemisphere, tend to prefer outdoor activities to online presence.

The second limitation of the present analysis lies in the tiny effect size that we wish to detect. Every day, only around 2% of all new active users send an item of feedback. We know from a previous observational study that these users tend to enjoy higher retention probabilities, but because here we look at the average difference in retention over the whole population of active users, the effect attributable to MoodBar is very small. This makes it critical to employ the correct detrending/deseasonalization technique, which the current dataset does not allow us to validate.

In summary, we present here an analysis of an experiment on the effectiveness of MoodBar in enhancing editor retention on the English Wikipedia. Because a fully controlled experiment was impossible, and because of known self-selection biases, we resorted to manipulating an instrumental variable (the availability of MoodBar to a group of newly registered Wikipedia users) and checked whether the probability of retention at t days would decrease from the baseline scenario, in which MoodBar is enabled by default for all newly registered user accounts. We analyzed a sample of newly registered active users over an 8-month period. Our analysis shows that user retention features a clear, and unexpected, seasonal component over the observation period, as well as strong autocorrelations. After accounting for these temporal factors, our analysis shows no significant difference in retention due to the presence of MoodBar. Further collection of retention data over a longer period is desirable in order to establish whether the present result is genuine or a byproduct of the lack of data.

Code

The scripts used to collect, analyze, and present the data are available here and are distributed under the GNU General Public License, version 2.

References

  1. This is essentially survival(t) aggregated over the day of new account registration.
  2. This deployment involved an enhancement of MoodBar's UI, as proposed here, and it was required in order to increase the sample size for our experiment on the effect of MoodBar on user retention.
  3. Users registered on this day are further excluded from the analysis because deployment of the blackout code takes several hours, and during this time it is not possible to ascertain whether a user had MoodBar enabled or not.
  4. These are the residuals obtained by subtracting the trend and the seasonal components from the observed data.
  5. See here for a year-to-year comparison.
  6. These plots include the SOPA/PIPA outlier. In fact, the analysis shows that, in terms of retention, that day is not particularly different from other days.
  7. We also tried detrending using an ARIMA model, with similar results.
  8. Whether a user belongs to the active users group or not is tracked by the EditPageTracking extension.