Research:Task recommendations/Experiment one

This document describes the Growth team's first controlled experiment to test task recommendations for Wikipedia editors.

Research questions


This experiment will address RQ 0 for our task recommendations research, which asks: "How will personalized recommendations impact editor productivity and retention?" Our two hypotheses are:

Hypothesis 0.1: Delivering task recommendations will increase new editor activation rates.

Hypothesis 0.2: Delivering task recommendations will increase new editor retention rates.

By isolating the effect of the two delivery methods, we will also address RQ 2, which asks: "How does the delivery mechanism modulate the effect of recommendations on user behavior?" Our hypotheses include:

Hypothesis 2.1a: Recommendations delivered via a post-edit modal will increase the overall number of recommendations that are accepted, but reduce the rate of acceptance per recommendation set.

Hypothesis 2.1b: Delivering recommendations in a post-edit modal will result in lower overall productivity than delivering recommendations on demand.

Hypothesis 2.2: Delivering recommendations via a flyout will decrease the number of recommendations viewed and accepted overall, but will result in a higher rate of acceptance per recommendation set.



Experimental conditions


We will deliver this experiment to all newly-registered user accounts, with four conditions:

  1. Control: where new editors receive none of the recommendation interfaces.
  2. Post-edit + flyout: where new editors receive both the flyout and post-edit recommendations, as described in the design specification.
  3. Post-edit: where new editors will receive only the post-edit recommendations, and will not see the link to the flyout.
  4. Flyout: where new editors will receive only the flyout with recommendations.
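As a rough illustration of how new accounts could be split evenly across the four conditions above, the sketch below hashes a user ID into one of four buckets. This is not the team's actual bucketing code: the function name `assign_condition`, the condition labels, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib

# Hypothetical labels for the four conditions listed above.
CONDITIONS = ("control", "post-edit+flyout", "post-edit", "flyout")


def assign_condition(user_id: int) -> str:
    """Deterministically map a user ID to one of the four conditions.

    Hashing (an assumption of this sketch) keeps the assignment stable
    across sessions and spreads users roughly evenly over the buckets.
    """
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(CONDITIONS)
    return CONDITIONS[bucket]


if __name__ == "__main__":
    # Print the condition for a few made-up user IDs.
    for uid in (101, 102, 103, 104):
        print(uid, assign_condition(uid))
```

Because the mapping depends only on the user ID, the same account always lands in the same condition, which matters when a user is exposed to the interface over several sessions.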

Each of these conditions should represent approximately 25% of the newly-registered user population. The experiment will run for two weeks on the following projects:

  1. English Wikipedia (enwiki)
  2. German Wikipedia (dewiki)
  3. French Wikipedia (frwiki)
  4. Spanish Wikipedia (eswiki)
  5. Italian Wikipedia (itwiki)
  6. Dutch Wikipedia (nlwiki)
  7. Hebrew Wikipedia (hewiki)
  8. Russian Wikipedia (ruwiki)
  9. Swedish Wikipedia (svwiki)
  10. Chinese Wikipedia (zhwiki, localized in zh-hans)
  11. Persian Wikipedia (fawiki)
  12. Ukrainian Wikipedia (ukwiki)

For the four largest projects (English, German, Spanish, and French), we will analyze the first week's metrics independently, since statistical significance should be achieved within that period.
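This document does not say which statistical test will be used. Purely as an illustration, a difference in a rate such as activation between the control and one treatment condition could be checked with a pooled two-proportion z-test, sketched below. The function name and its parameters are hypothetical, and the test itself is an assumption of this example.

```python
import math


def two_proportion_z_test(successes_a: int, n_a: int,
                          successes_b: int, n_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for the difference between two rates.

    Group A could be the control condition and group B a treatment
    condition; "successes" could be, for example, activated editors.
    """
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled rate under the null hypothesis that both groups share one rate.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        raise ValueError("rates are degenerate (all successes or all failures)")
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

With several treatment conditions compared against one control, a correction for multiple comparisons would normally be applied on top of a test like this.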


User behavior

To gauge how effectively task recommendations improve the user experience, we will focus our measurements on editor productivity and editor retention. Because it helps us report progress on our objectives, we will also observe editor activation rates.

Use of recommendations

We will also measure how many task recommendations are requested (Schema:TaskRecommendationLightbulbClick) and delivered (Schema:TaskRecommendation), how many users they reach (Schema:TaskRecommendationImpression), and the rate at which users click to accept recommendations (Schema:TaskRecommendationClick) and go on to edit the recommended article (revision and archive tables).
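The usage measurements described above form a funnel from request to edit. A minimal sketch of how per-condition rates might be derived from aggregate counts follows. The function and parameter names are hypothetical, and the counts are assumed to have already been tallied from the EventLogging schemas and the revision and archive tables.

```python
from typing import Optional


def safe_rate(numerator: int, denominator: int) -> Optional[float]:
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def funnel_rates(requested: int, delivered_sets: int, users_reached: int,
                 clicks: int, edits: int) -> dict:
    """Compute simple funnel rates for one experimental condition.

    requested      -- recommendation requests (lightbulb clicks)
    delivered_sets -- recommendation sets delivered
    users_reached  -- distinct users who saw at least one set
    clicks         -- clicks accepting a recommendation
    edits          -- edits made to a recommended article after a click
    """
    return {
        "requests_per_user": safe_rate(requested, users_reached),
        "sets_per_user": safe_rate(delivered_sets, users_reached),
        "clicks_per_set": safe_rate(clicks, delivered_sets),
        "edits_per_click": safe_rate(edits, clicks),
    }
```

Here `clicks_per_set` corresponds to the "rate of acceptance per recommendation set" named in Hypotheses 2.1a and 2.2, while the raw `clicks` count corresponds to the overall number of accepted recommendations.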



The following EventLogging schemas correspond to this experiment: