Research:Onboarding new Wikipedians/Rollout
On February 11th, Extension:GettingStarted was deployed to 29 wikis. It has since been extended to its current state of 30 wikis, including all of the top 10 Wikipedias by pageviews.
The purpose of this study is to measure the scale at which GettingStarted operates (e.g. how many newcomers on Wikimedia projects received a GettingStarted intervention?) and to get a sense of the impact that the new features have on newcomer behavior.
RQ 1: How is GettingStarted being used?
- How many newly registered users saw each CTA?
- How many of those users edit -- through GettingStarted or otherwise?
- Are GettingStarted edits reverted more often than non-GettingStarted edits?
RQ 2: How has GettingStarted affected newcomer activation and productivity?
- How did the proportion of new editors (editor activation) change after GettingStarted was deployed?
- How did the proportion of productive new editors (editor productivity) change after GettingStarted was deployed?
Code repository: https://github.com/halfak/Measuring-the-impact-of-GettingStarted
Based on the config and the Server admin log, we can determine when GettingStarted was deployed on each wiki.
In order to measure the usage of GettingStarted, we observe and compare the number of newly registered users across Wikimedia projects with the number of users with a recorded impression of GettingStarted (see Schema:GettingStartedRedirectImpression). We also observe the number of edits made via GettingStarted through the application of a change tag: "gettingstarted edit".
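As a sketch, counting GettingStarted edits from tagged revisions might look like the following (the revision records below are illustrative; only the tag name "gettingstarted edit" comes from the text, and in production the data would come from the revision and change_tag tables):

```python
# Illustrative revision records; in practice these rows would be read
# from the database by joining revision against change_tag.
revisions = [
    {"user": "A", "tags": ["gettingstarted edit"]},
    {"user": "B", "tags": []},
    {"user": "C", "tags": ["gettingstarted edit", "mobile edit"]},
]

# Count edits carrying the "gettingstarted edit" change tag.
gs_edits = sum(1 for r in revisions if "gettingstarted edit" in r["tags"])
```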
Assuming a natural experiment
In order to address RQ 2, we'll assume that a natural experiment took place immediately after GettingStarted was deployed. We take advantage of this by comparing metrics of new user activation and productivity before and after deployment. Since the only way to take advantage of GettingStarted's functionality is to be served a CTA immediately after registering an account, there's little risk that editors who registered immediately before the deployment were exposed to the intervention.
Unlike controlled experiments, natural experiments are open to confounds that affect inference about causation. A trend taking place on a wiki independent of the GettingStarted deployment will look like an effect of GettingStarted in this analysis. Thus, it's important to keep this potential issue in mind when viewing the results.
In order to compare new editor fitness before and after deployment, we sampled newly registered users from the two weeks immediately before and after the deployment dates. Figure #Natural experiment sample periods depicts these sample periods visually.
In order to determine how many observations would need to be sampled, we performed a power analysis for several baseline rates and expected changes. Figure #Power analysis plots the p-value of a Chi-squared test for various levels of baselines and changes. We chose a minimum of 500 observations since that was the smallest number that would still let us identify significance for large effects. We define "large effects" as twice the observed effect in English Wikipedia for GettingStarted (which ranged from 1.5-3% depending on the metric, so we settled on 5%). 16 wikis had at least 500 newly registered users in the sample periods (es, fr, zh, ru, de, pt, it, fa, nl, pl, vi, sv, uk, ko, hu, he, el). We set the maximum number of observations at 2000 since most changes would appear significant at that number of observations, and setting an upper bound reduces the processing time necessary.
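The power analysis above can be sketched with a Yates-corrected chi-squared test on the expected 2x2 counts (a minimal, stdlib-only stand-in for the full analysis; the function names are our own):

```python
import math

def chi2_sf_df1(x):
    # Survival function of chi-squared with 1 degree of freedom:
    # P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0))

def expected_p_value(baseline, change, n):
    """p-value of a Yates-corrected chi-squared test on the *expected*
    2x2 counts for a before/after comparison with n users per period."""
    obs = [[baseline * n, (1 - baseline) * n],
           [(baseline + change) * n, (1 - baseline - change) * n]]
    rows = [sum(r) for r in obs]
    cols = [obs[0][j] + obs[1][j] for j in range(2)]
    total = sum(rows)
    chi2 = sum((abs(obs[i][j] - rows[i] * cols[j] / total) - 0.5) ** 2
               / (rows[i] * cols[j] / total)
               for i in range(2) for j in range(2))
    return chi2_sf_df1(chi2)
```

At n = 500 per period, a 5 percentage point change on a 5% baseline yields p < 0.01, while a 1.5 point change does not reach significance -- consistent with choosing 500 as the minimum sample size for detecting "large effects" only.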
- Boolean measures
- New editor rate (new editors / newly registered users)
- Productive new editor rate (productive new editors / newly registered users)
- Returning new editor rate (returning new editors / newly registered users)
Differences in proportions between before and after periods are identified using a en:Chi-squared test.
- Scale measures
- Revisions in 24h
- Productive edits in 24h
- Edit sessions in first week
- Time spent editing in first week
Differences in expected values between before and after periods are identified using a en:t-test over log-transformed values.
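A minimal sketch of the logged t-test (Welch's t statistic on log(1 + x)-transformed values; the function name and the normal-approximation significance threshold are our own assumptions):

```python
import math

def logged_welch_t(before, after):
    """Welch's t statistic comparing log(1 + x)-transformed samples.
    With samples of several hundred users, |t| > 1.96 corresponds
    roughly to p < 0.05 under the normal approximation."""
    def stats(xs):
        logged = [math.log1p(x) for x in xs]  # log(1 + x) handles zero counts
        n = len(logged)
        mean = sum(logged) / n
        var = sum((v - mean) ** 2 for v in logged) / (n - 1)
        return n, mean, var

    na, ma, va = stats(before)
    nb, mb, vb = stats(after)
    return (mb - ma) / math.sqrt(va / na + vb / nb)
```

The log transform is used because edit counts and session times are heavily right-skewed, which violates the t-test's normality assumption on the raw scale.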
RQ 1: How is GettingStarted being used?
What proportion of users saw/used a GettingStarted CTA?
In order to get a sense of what proportion of newly registered users were affected by the deployment of GettingStarted, we ran a set of queries to count newly registered users across all Wikimedia projects and tracked their activities as they navigated the various funnels that GettingStarted provides. Figure #Group funnel proportions displays the proportion and raw counts of users who made it to each step in the funnel.
Who saw GettingStarted's CTA? The GettingStarted experience is currently only available to desktop users. (TODO: link to design docs for GS like experience on mobile) Of the 336,310 users who registered during our 30 day period after deployment, 273,169 (81.23%) registered through the desktop interface. 218,968 of these desktop users registered on one of the 30 wikis where GettingStarted was deployed, and 143,627 of those saw a GettingStarted CTA. In other words:
42.7% of newly registered users across all projects had the opportunity to take advantage of GettingStarted.
Which CTAs did they see? Of the users who saw a change to their post-registration experience, a plurality (46.49%) saw the CTA that only asked whether they would like to see suggested tasks to perform (see Suggest only CTA). Most often, the "Edit this page" option was unavailable because the redirect target was a protected article (54.55%) or a page in the Project namespace. The next most common CTA was the combined "Edit this page or Find easy tasks" (see Edit & Suggest CTA); 39.6% of users who saw any CTA saw this one. Finally, 13.91% saw the CTA with only the option to "Edit this page" (see Edit only CTA). These users were predominantly on wikis that lacked suggested tasks (98.9%).
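The funnel arithmetic above can be checked directly (all counts and percentages come from the text; the helper is our own):

```python
# Counts reported above for the 30 day post-deployment period
registered = 336310       # newly registered users, all projects
desktop = 273169          # registered through the desktop interface
gs_wiki_desktop = 218968  # desktop registrations on the 30 GS wikis
saw_cta = 143627          # saw a GettingStarted CTA

pct = lambda part, whole: round(100.0 * part / whole, 2)

desktop_share = pct(desktop, registered)  # 81.23
cta_share = pct(saw_cta, registered)      # 42.71, i.e. ~42.7%

# CTA breakdown (suggest only, edit & suggest, edit only) should cover
# everyone who saw any CTA, so the shares should sum to ~100%.
cta_shares = [46.49, 39.6, 13.91]
```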
Reverts of GettingStarted edits
One of our concerns with tagging edits "via Getting Started edit suggestions" was that the tag might draw additional attention from Wikipedians and encourage extra scrutiny of edits made through GettingStarted. If GS-tagged edits receive extra scrutiny, we'd expect their revert rate to be higher. To check this hypothesis, we gathered all of the 1st edits performed by newcomers who registered during our 30 day period and detected which revisions were reverted within 48 hours.
Figure #Comparison of revert rates plots the difference between the revert rate of 1st edits not made through GettingStarted and the revert rate of 1st edits made through GettingStarted. Note that in all but a couple of cases, the 95% confidence interval's error bars cross the zero line, meaning there's no significant difference between the revert rates of GettingStarted and non-GettingStarted edits on those wikis. However, three wikis did see significant differences: viwiki and cawiki saw higher revert rates for GS edits, while enwiki saw lower revert rates for GS edits.
It's important to note that, with such a high number of tests at a 95% confidence cutoff, we should expect to see 1-2 wikis report a Type I error. With this in mind, the significant differences observed for viwiki and cawiki should be taken with a grain of salt. For English Wikipedia, however, we had such a large number of observations that the result is clearly significant: GettingStarted edits appear to be reverted significantly less often than non-GettingStarted edits.
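The per-wiki comparison can be sketched as a 95% Wald confidence interval for the difference between two revert rates (the function and the example counts are our own, not values from the study):

```python
import math

def revert_rate_diff_ci(reverted_a, n_a, reverted_b, n_b, z=1.96):
    """95% Wald confidence interval for the difference in revert
    rates (rate_b - rate_a) between two groups of first edits."""
    p_a, p_b = reverted_a / n_a, reverted_b / n_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    diff = p_b - p_a
    return diff - z * se, diff + z * se

# If the interval contains zero, the difference is not significant
# at the 95% level -- the case for most wikis in the figure.
```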
RQ 2: How has GettingStarted affected newcomer activation and productivity?
In order to look for evidence of changes in activation and productivity due to the introduction of GettingStarted, we used an array of metrics to measure newcomer performance before and after the deployment of GettingStarted.
The figures below plot the difference between metrics before and after the deployment. When the plotted value is above zero, an increase in the metric was observed. Overall, the results fail to demonstrate a clear difference in the before and after state of these wikis.
While some wikis show significant differences under some metrics, this type of statistical error is expected to occur with 95% confidence intervals in about 1 in 20 tests. Here, we see 10 instances of significant results out of 112 tests:
- dewiki showed a significant drop in the rate of new editors
- plwiki showed a significant increase in the rate of returning new editors
- eswiki, itwiki and plwiki showed a significant increase in the number of productive edits newcomers performed in their first day
- plwiki saw a significant increase in the number of newcomer edit sessions, while frwiki saw a significant decrease
- plwiki and ukwiki saw a significant increase in the amount of time spent editing, while frwiki saw a significant decrease
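The "about 1/20" expectation can be made concrete with a binomial model of false positives (this treats the 112 tests as independent, which is only approximately true since metrics within a wiki are correlated):

```python
import math

# 112 tests at alpha = 0.05: how many significant results would the
# null hypothesis (no real effects anywhere) be expected to produce?
n_tests, alpha = 112, 0.05
expected_false_positives = n_tests * alpha  # 5.6 tests

def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k, n + 1))

# Chance of seeing at least the observed 10 significant results
# under the (approximate) independence assumption.
p_at_least_10 = binom_sf(10, n_tests, alpha)
```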
Given the lack of a clear trend across wikis and the lack of an obvious correlation between the availability of suggested tasks in the user experience and performance outcomes, it's not clear from these results that GettingStarted is having a measurable effect in the short term. Future work may reduce noise and potential confounds by running a controlled experiment on these wikis.