Research:Visual editor for anonymous users, 2016

In March 2016, the WMF Editing department will conduct an A/B test of the visual editor for anonymous users on the English Wikipedia. The study plan and, later, the results will be posted here.

Methods

This study relies on the single edit tab functionality; it was not possible before that feature was deployed.

When an anonymous user clicks the edit button on a wiki with the single edit tab, they get sent to the default editor on that wiki. They also get a cookie named VEE whose content records the editor they most recently used, and therefore the one they will get the next time they open the editor: either wikitext or visualeditor. Every time they switch editors, the content of the cookie updates. Every time they open or switch the editor, the expiration resets to 30 days.
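To make this concrete, here is a minimal sketch of the cookie behaviour in Python. The cookie name VEE and the 30-day lifetime come from the description above; the function name and the dictionary-based cookie store are hypothetical stand-ins, since the real logic lives in MediaWiki's editor-switching code.

    from datetime import datetime, timedelta

    COOKIE_NAME = "VEE"        # cookie name from the description above
    COOKIE_LIFETIME_DAYS = 30  # expiry is refreshed on every open/switch

    def remember_editor(cookies, editor, now=None):
        """Hypothetical model of the cookie behaviour: store the most recently
        used editor ('wikitext' or 'visualeditor') and reset the 30-day expiry."""
        assert editor in ("wikitext", "visualeditor")
        now = now or datetime.utcnow()
        cookies[COOKIE_NAME] = {
            "value": editor,
            "expires": now + timedelta(days=COOKIE_LIFETIME_DAYS),
        }
        return cookies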

The presence of this cookie allows the software to determine whether an anonymous user has opened the editor in the past 30 days.[1] If not, we can be fairly confident that they are not an active anonymous editor who already prefers one editor to the other.

Users who have no cookie will become our experimental cohort. Instead of all being directed to the English Wikipedia's default editor (wikitext), we will randomly direct half of them to the visual editor (treatment group) and half to the wikitext editor (control group).
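A sketch of that bucketing rule under the assumptions above (the function name and the cookie store are placeholders; the real assignment will be done by the editing software):

    import random

    EDITORS = ("visualeditor", "wikitext")

    def assign_bucket(cookies):
        """Hypothetical bucketing rule: anonymous users who already have a VEE
        cookie are excluded from the cohort; everyone else is split 50/50."""
        if "VEE" in cookies:
            return None                    # already an active editor; not in the experiment
        bucket = random.choice(EDITORS)    # treatment = visualeditor, control = wikitext
        cookies["VEE"] = bucket            # remember the assignment for future visits
        return bucket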

If they save edits, we will record their group in a hidden edit tag, so we can collect all the edits made by either group and investigate patterns.
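Because the group is recorded as an edit tag, the tagged edits can later be pulled with an ordinary tag-filtered query. A sketch using the public API, with a placeholder tag name since the real tag names are not yet decided:

    import requests

    API = "https://en.wikipedia.org/w/api.php"

    def edits_with_tag(tag, limit=500):
        """Pull recent edits carrying a given edit tag via the public API."""
        params = {
            "action": "query",
            "list": "recentchanges",
            "rctag": tag,                          # filter by edit tag
            "rcprop": "ids|timestamp|user|title",
            "rclimit": limit,
            "format": "json",
        }
        return requests.get(API, params=params).json()["query"]["recentchanges"]

    treatment_edits = edits_with_tag("ve-anon-experiment-treatment")  # placeholder tag name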

Important assumptions

  • ORES scores provide a reasonably accurate ranking of the quality of anonymous edits specifically (this could be an issue because the most powerful predictor in an ORES score right now is whether an edit is anonymous).
    • Do qualitative sanity checking of ORES scores, sampling from each probability stratum (see the sketch below).
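One way to do that sanity check, sketched below: score a sample of anonymous edits with the ORES damaging model, bin them by predicted probability, and draw a handful from each stratum for manual review. The endpoint and response shape follow the public ORES v3 API; the stratum width and per-stratum sample size are arbitrary choices.

    import random
    import requests

    ORES = "https://ores.wikimedia.org/v3/scores/enwiki/"

    def damaging_probabilities(rev_ids):
        """Fetch ORES 'damaging' probabilities for a batch of revision IDs."""
        resp = requests.get(ORES, params={
            "models": "damaging",
            "revids": "|".join(map(str, rev_ids)),
        }).json()
        scores = resp["enwiki"]["scores"]
        return {rid: s["damaging"]["score"]["probability"]["true"]
                for rid, s in scores.items()}

    def sample_by_stratum(probs, width=0.1, per_stratum=5):
        """Bin revisions into probability strata (0.0-0.1, 0.1-0.2, ...) and
        draw a few from each stratum for manual review."""
        strata = {}
        for rid, p in probs.items():
            strata.setdefault(int(p // width), []).append(rid)
        return {stratum: random.sample(revs, min(per_stratum, len(revs)))
                for stratum, revs in sorted(strata.items())}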

Outcomes of interest

We will look only at edits within content namespaces. The visual editor doesn't work on most other pages, including talk pages and templates, and most of our metrics don't apply there (for example, edits to talk pages are almost never reverted, and citations and images are rarely used).

Main (success criteria) outcomes

  • ORES scores of edits made (Mann-Whitney U)
    • Likely to use the damaging model. May use the goodfaith model, although it would be surprising if the visual editor actually increased the proportion of good-faith edits.
    • Use the filter rate at 90% recall to decide whether a change is substantial?
  • Proportion of edits reverted within 48 hours (Chi-squared)
  • Net change in images and citations (Mann-Whitney U)
    • Can use the revscoring library to get this data fairly easily; a sketch of the statistical tests for these outcomes follows this list.
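A sketch of the statistical comparisons for these outcomes using scipy (the data-preparation step that produces per-edit scores and revert counts for each group is assumed, not shown):

    from scipy.stats import chi2_contingency, mannwhitneyu

    def compare_distributions(treatment_values, control_values):
        """Mann-Whitney U test for per-edit ORES scores or net changes in
        citations/images between the two groups."""
        return mannwhitneyu(treatment_values, control_values, alternative="two-sided")

    def compare_revert_rates(reverted_t, total_t, reverted_c, total_c):
        """Chi-squared test on the proportion of edits reverted within 48 hours."""
        table = [[reverted_t, total_t - reverted_t],
                 [reverted_c, total_c - reverted_c]]
        chi2, p, dof, expected = chi2_contingency(table)
        return chi2, p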

Secondary outcomes

These are things we're just tracking out of curiosity.

  • Number of edits made
    • It's not clear which direction we want this to move. If anonymous users edit more, it could be a sign of greater productivity—or it could be a sign that they're making more formatting mistakes which they have to go back and correct.
  • Number of registrations
    • It's possible to measure this using the checkuser logs for up to 3 months after the experiment. However, we're not sure whether we'll do this as part of this experiment, because (1) we already studied VE's effect on registered-user productivity in May 2015 and found no effect, and (2) it's not clear which direction we want registrations to move. A decrease could mean that anonymous users have become less excited about editing, or it could mean that they see less need to register because they no longer need an account to get the visual editor.

Not tracking

  • Blocks as a proportion of total edits (Chi-squared)
    • Since IP blocks apply to a specific IP without any additional qualification, it may be hard to link blocks to a specific fingerprinted user (who may account for only a subset of the edits from one IP). In addition, this isn't a priority because VE is unlikely to have an effect on behavior malicious enough to generate blocks; studies of VE for new registered users found no such effect.

To do

  • Decide on a procedure for bucketing users.
  • Do a power analysis to determine the necessary sample size (see the sketch below).
  • Announce plans on Wikitech and the Village Pump.
  • Consider the playing-around effect.
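As a starting point for the power analysis, here is a sketch using statsmodels for the reverted-within-48-hours outcome; the baseline revert rate and the minimum detectable difference are illustrative assumptions, not figures from the study plan.

    from statsmodels.stats.power import NormalIndPower
    from statsmodels.stats.proportion import proportion_effectsize

    # Illustrative numbers only; the real baseline revert rate and the smallest
    # difference worth detecting still need to be decided.
    baseline_rate = 0.25
    detectable_rate = 0.27

    effect_size = proportion_effectsize(baseline_rate, detectable_rate)
    n_per_group = NormalIndPower().solve_power(effect_size=effect_size,
                                               alpha=0.05, power=0.8, ratio=1.0)
    print(round(n_per_group), "edits needed in each group")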
Notes

  1. Assuming that they are using the same browser and have not cleared their cookies. These seem fairly reasonable assumptions.