Research:Newcomer survival models
How does long-term engagement work? What’s the difference between editors that peter out after 5 edits and those that make it to 100?
Inspired by the work that the Growth team (formerly "E3") is doing to make joining Wikipedia a better experience for new users, I sought to better understand what retention looks like. Considering the 1-edit and 5-edit thresholds currently employed, I suspected that edit sessions would be a better way of operationalizing user activity.
There are many types of "good" new contributors to Wikipedia. Very few will go on to become prolific Wikipedians with thousands of edits, while many more may only stick around to write their pet article. In directing our efforts toward retention, we'll benefit from a more nuanced understanding of which characteristics of users predict their future engagement level, independent of their experiences editing and interacting with other Wikipedians.
From previous work looking into predictors of retention, we know that two kinds of measures strongly predict whether newcomers will last a long time: behavioral measures of the users' first actions (e.g. the duration of their first session) and the reaction they receive (or lack thereof) from other Wikipedians (e.g. reverts, deletions and warnings).
In the past, E3/the Growth team has been highly successful in encouraging more newcomers to make their first edits, but we have struggled to demonstrate such success at higher thresholds. The purpose of this study is twofold: to vet the current use of a 5-edit threshold for judging medium-term engagement, and to look for predictors of long-term engagement that the Growth team might target.
Using thresholds to identify changes in engagement
Current retention and engagement measures are based on the number of edits that newcomers perform. One common strategy that has recently been employed to evaluate the effectiveness of new features in engaging newcomers is to choose a threshold of edits and measure the proportion of newcomers who cross that threshold. Two commonly used thresholds are at 1 and 5 edits. While measurements of these proportions are likely to hold good information value, it would be nice if they represented some sort of qualitative significance.
If, for example, an editor who completes their 5th edit has crossed some threshold of engagement/investment that predicts they will be more likely to survive subsequent thresholds, then that would be evidence of a qualitative threshold being located at 5 edits. While a threshold bearing mere quantitative significance is still useful for judging success, a qualitatively significant threshold would be desirable for two reasons:
- Real effects on the (somewhat nebulous concept of) medium- to long-term engagement would be most obviously measurable at such a threshold.
- The proportion crossing this threshold confers some meaning that corresponds to an experience or psychological effect that can be targeted in design.
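To make the threshold idea concrete, here is a minimal sketch of the two quantities discussed above: the share of newcomers who cross a given edit threshold, and the share who go on to cross a higher threshold given that they crossed a lower one. The function names and the edit counts are made up for illustration and are not taken from the study's data.

```python
def threshold_proportions(edit_counts, thresholds=(1, 5)):
    """Return {threshold: share of newcomers with at least that many edits}."""
    total = len(edit_counts)
    if total == 0:
        return {}
    return {t: sum(1 for c in edit_counts if c >= t) / total for t in thresholds}


def conditional_crossing(edit_counts, lower, upper):
    """Share of newcomers reaching `upper` edits among those who reached `lower`."""
    reached_lower = [c for c in edit_counts if c >= lower]
    if not reached_lower:
        return None
    return sum(1 for c in reached_lower if c >= upper) / len(reached_lower)


# Hypothetical edit counts for ten newcomers (illustrative only).
sample_counts = [0, 0, 1, 1, 2, 3, 5, 8, 20, 150]

proportions = threshold_proportions(sample_counts)
p_10_given_5 = conditional_crossing(sample_counts, 5, 10)
```

If 5 edits were a qualitative threshold, one would expect the conditional share just past it to rise noticeably compared with the conditional shares at nearby thresholds.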
I gathered a pseudo-random sample of editors who registered their accounts on English Wikipedia in February 2012 (auto-created and email-created accounts were discarded). If a user's ID was evenly divisible by 5, I included them in the set. This lets me conveniently regenerate the data while still selecting the same users, and it is sufficiently random-ish for my purposes. I tracked their editing activity using both the revision and archive tables and built summary statistics.
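The sampling rule can be sketched as follows. This is only an illustration: the record layout, the `creation_type` field and its labels are assumptions made for the example, not the schema actually queried in the study.

```python
# Assumed labels for account creation types that were discarded from the sample.
EXCLUDED_CREATION_TYPES = {"autocreate", "byemail"}


def select_sample(accounts, modulus=5):
    """Keep self-created accounts whose user ID is evenly divisible by `modulus`.

    The rule is deterministic, so re-running it selects the same users.
    """
    return [
        account
        for account in accounts
        if account["creation_type"] not in EXCLUDED_CREATION_TYPES
        and account["user_id"] % modulus == 0
    ]


# Hypothetical registration records (illustrative only).
accounts = [
    {"user_id": 10, "creation_type": "create"},
    {"user_id": 15, "creation_type": "autocreate"},
    {"user_id": 20, "creation_type": "byemail"},
    {"user_id": 22, "creation_type": "create"},
    {"user_id": 25, "creation_type": "create"},
]

sampled_ids = [account["user_id"] for account in select_sample(accounts)]
```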
In order to get a better sense of what "survival" looks like for new editors, I wanted to plot a hazard function for newcomers after some time or number of actions. Hazard functions were invented to be used in survival models for exploring the effectiveness of treatments in a medical setting, so the terminology sounds a little gruesome.
- time unit
- Survival models evaluate survival in terms of units of time. This is usually the number of hours, days, weeks or months that a patient survives, but it can also be the number of treatments, interactions, etc. In this study, I'll be using calendar time as well as the number of edits and the number of edit sessions.
- death
- Death refers to the point at which survival stops. I've operationalized editor death as 30 days with no editing activity, based on the observation that editors rarely return to edit once they have been gone for 30 days.
- hazard
- Hazard refers to the probability of "death" at a given time unit among those who have survived up to that unit.
- time horizon
- With temporal data, there's always a point in time at which the data stops. Usually this is the date when the data was gathered. My time horizon for this dataset was "20130919123723" (2013-09-19 12:37:23 in MediaWiki's YYYYMMDDHHMMSS timestamp format).
- censored data
- Data points that are not complete before the time horizon are considered "censored". In medical studies, these are usually patients who survived until the end of data collection. In my case, "death" means not editing for 30 days, so editors who edited within the 30 days before my time horizon were considered "censored". In this dataset, I had 214 censored users.
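The definitions of death, time horizon and censoring above can be combined into a small classifier. This is a sketch under stated assumptions: the function name is hypothetical, a gap of exactly 30 days is counted as death, and death is placed at the first such gap even if the editor edits again later.

```python
from datetime import datetime, timedelta

# Time horizon quoted in the text, parsed from the YYYYMMDDHHMMSS format.
TIME_HORIZON = datetime.strptime("20130919123723", "%Y%m%d%H%M%S")
DEATH_GAP = timedelta(days=30)


def classify_editor(edit_times, horizon=TIME_HORIZON, gap=DEATH_GAP):
    """Return ("dead", last_active) or ("censored", last_active).

    `edit_times` is a non-empty list of datetimes no later than `horizon`.
    """
    times = sorted(edit_times)
    # Pair each edit with the next one; the final edit is paired with the horizon.
    for current, following in zip(times, times[1:] + [horizon]):
        if following - current >= gap:
            return ("dead", current)
    # No 30-day gap was observed before data collection stopped.
    return ("censored", times[-1])


# Hypothetical editors (illustrative only).
lapsed = [datetime(2012, 2, 1), datetime(2012, 2, 3), datetime(2012, 6, 1)]
recent = [datetime(2013, 8, 25), datetime(2013, 9, 10)]

lapsed_status = classify_editor(lapsed)
recent_status = classify_editor(recent)
```

The first editor is counted as dead after the 3 February edit because the next edit comes more than 30 days later; the second is censored because their last edit falls within 30 days of the horizon.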
I suspected that by using edit sessions (see Research:Metrics/edit_sessions) as my time unit, I could gain some particularly useful insights into what survival of newcomers looks like. An edit session represents an editing interaction that a user has with Wikipedia. Sessions can be brief or long and can contain many or few activities, but I suspect that the number of times that an editor comes back to Wikipedia to make some edits will be telling of what's going on with them.
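As a rough illustration of how edits might be grouped into sessions, the sketch below starts a new session whenever the gap since the previous edit exceeds an inactivity cutoff. The one-hour cutoff and the function name are assumptions made for this example; Research:Metrics/edit_sessions is the place to check the definition actually used.

```python
from datetime import datetime, timedelta


def to_sessions(edit_times, cutoff=timedelta(hours=1)):
    """Group edit timestamps into sessions separated by more than `cutoff`."""
    sessions = []
    for timestamp in sorted(edit_times):
        if sessions and timestamp - sessions[-1][-1] <= cutoff:
            # Close enough to the previous edit: same session.
            sessions[-1].append(timestamp)
        else:
            # First edit, or a long pause: start a new session.
            sessions.append([timestamp])
    return sessions


# Hypothetical edit timestamps for one editor (illustrative only).
edits = [
    datetime(2012, 2, 1, 10, 0),
    datetime(2012, 2, 1, 10, 20),
    datetime(2012, 2, 1, 10, 50),
    datetime(2012, 2, 1, 14, 0),
    datetime(2012, 2, 1, 14, 5),
]

sessions = to_sessions(edits)
session_sizes = [len(session) for session in sessions]
```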
Figure 3 shows the declining population of surviving editors by the number of sessions they completed. Figure 4 shows the "hazard" of death by the number of sessions survived. Most readers should find figure 4 more informative. Reading the graph, we can see that 71.7% of newcomers who had at least one edit session will drop off before their second. However, of the newcomers who have a second edit session, only 42.5% will drop off before their third. Similarly, only 29.3% of newcomers who have a third session will drop off after it. Beyond this point, the gains in survival (loss in hazard) accumulate more slowly.
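The hazard read off figure 4 can be thought of as a simple ratio: among editors who completed at least k sessions, the share who "died" after exactly k. The sketch below computes that on made-up data and then chains the three percentages quoted above to estimate the share of newcomers with at least one session who reach a fourth. The function name and sample data are hypothetical, and censored editors are treated as leaving the risk set after their last observed session, which is an assumption of this example.

```python
def session_hazard(editors, max_sessions=5):
    """Return {k: hazard of death after session k}.

    `editors` is a list of (sessions_completed, died) pairs; died is False
    for censored editors.
    """
    hazards = {}
    for k in range(1, max_sessions + 1):
        at_risk = sum(1 for n, _ in editors if n >= k)
        if at_risk == 0:
            break
        deaths = sum(1 for n, died in editors if n == k and died)
        hazards[k] = deaths / at_risk
    return hazards


# Hypothetical editors: seven die after one session, one after two,
# and two are still active (censored) after three.
editors = [(1, True)] * 7 + [(2, True)] + [(3, False)] * 2
hazards = session_hazard(editors)

# Hazards quoted in the text for sessions 1, 2 and 3.
reported = [0.717, 0.425, 0.293]
reach_fourth = 1.0
for hazard in reported:
    reach_fourth *= 1.0 - hazard
```

With the quoted figures, `reach_fourth` comes out at roughly 0.115, i.e. about one newcomer in nine who has a first session would go on to a fourth, ignoring censoring.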
Time since registration
Time spent editing
Cox regression model
In order to look for predictors of survival, I constructed a Cox proportional hazards regression model using sessions as the time unit. By using sessions as a time unit, I can generate interesting statistics during each session to use as predictors of survival. This first model is very simple and merely incorporates the duration of each session in minutes (session.minutes) as well as the number of main namespace edits made (ns0_revisions).
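The model's response term, Surv(session.start, session.stop, death), takes one interval per row. One plausible way to lay out the data, sketched here as an assumption rather than as the layout actually used for merged_stats, is one row per editor per session, with the death flag set only on the final session of an editor who was not censored. The function name and the input dictionaries are hypothetical.

```python
def session_rows(user_sessions, died):
    """Build one (start, stop, death) row per session for a single editor.

    `user_sessions` is a chronological list of dicts with "minutes" and
    "ns0_revisions"; `died` is False for censored editors.
    """
    rows = []
    last_index = len(user_sessions)
    for index, session in enumerate(user_sessions, start=1):
        rows.append({
            "session.start": index - 1,
            "session.stop": index,
            # Only the final observed session of a dead editor carries the event.
            "death": int(died and index == last_index),
            "session.minutes": session["minutes"],
            "ns0_revisions": session["ns0_revisions"],
        })
    return rows


# Hypothetical editor with three sessions (illustrative only).
example_sessions = [
    {"minutes": 12.0, "ns0_revisions": 3},
    {"minutes": 4.5, "ns0_revisions": 1},
    {"minutes": 30.0, "ns0_revisions": 7},
]

dead_rows = session_rows(example_sessions, died=True)
censored_rows = session_rows(example_sessions, died=False)
```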
    coxph(formula = Surv(session.start, session.stop, death) ~ session.minutes +
        ns0_revisions, data = merged_stats)

      n= 23456, number of events= 8946

                          coef  exp(coef)   se(coef)       z Pr(>|z|)
    session.minutes -0.0102642  0.9897883  0.0006754 -15.197   <2e-16 ***
    ns0_revisions   -0.0024103  0.9975926  0.0042239  -0.571    0.568
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

                    exp(coef) exp(-coef) lower .95 upper .95
    session.minutes    0.9898      1.010    0.9885    0.9911
    ns0_revisions      0.9976      1.002    0.9894    1.0059

    Concordance= 0.6  (se = 0.011 )
    Rsquare= 0.02   (max possible= 0.998 )
    Likelihood ratio test= 462.7  on 2 df,   p=0
    Wald test            = 328.7  on 2 df,   p=0
    Score (logrank) test = 301.3  on 2 df,   p=0
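Curiously, the amount of time spent editing appears to be a better predictor of survival than the number of revisions made: session.minutes is highly significant (p < 2e-16), while ns0_revisions is not (p = 0.568). For each additional minute spent editing in a session, the hazard of "death" is multiplied by about 0.99 (exp(coef) = 0.9898), a drop of roughly 1%.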
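The arithmetic behind this reading is short enough to check directly. In a Cox model, a coefficient translates into a multiplicative change in hazard via exp(coef × change in predictor). The sketch below uses the coefficients from the model output; the helper name is made up for this example.

```python
import math

# Coefficients copied from the model output.
COEF_SESSION_MINUTES = -0.0102642
COEF_NS0_REVISIONS = -0.0024103


def hazard_ratio(coef, delta=1.0):
    """Multiplicative change in hazard when the predictor increases by `delta`."""
    return math.exp(coef * delta)


per_minute = hazard_ratio(COEF_SESSION_MINUTES)
per_ten_minutes = hazard_ratio(COEF_SESSION_MINUTES, 10)
per_revision = hazard_ratio(COEF_NS0_REVISIONS)
```

Because the effect is multiplicative, ten extra minutes do not lower the hazard by exactly ten times the one-minute effect: `per_ten_minutes` equals `per_minute` raised to the tenth power, which is roughly a 10% reduction in hazard.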
To get a sense of what this means, I produced two plots that use this model to predict the surviving population proportion and the hazard at three quantiles of session.minutes (50%, 75% and 95%).