This page documents a completed research project.


Methods

To test the effects of VisualEditor, we performed a controlled test on the English Wikipedia. During an 86-hour assignment period, newly registered user accounts were randomly bucketed (round robin) into two experimental conditions.

control
New users who received an odd user ID on registration were given an experience that matches the way that Wikipedia worked before VisualEditor was deployed, i.e. the default wikitext editor.
test
New users with an even user ID had VisualEditor enabled and could edit in VisualEditor (via the Edit link) or in wikitext (via the Edit source link). Section edit links for users in this condition opened the VisualEditor screen (see browser support).
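The parity rule described above can be illustrated with a minimal sketch. The function name and the example user IDs are hypothetical and chosen only for illustration; the actual bucketing was done by the wiki software, not by this code.

```python
def assign_condition(user_id: int) -> str:
    """Return the experimental condition for a newly registered user ID."""
    # Odd IDs -> control (wikitext editor only);
    # even IDs -> test (VisualEditor enabled alongside wikitext).
    return "control" if user_id % 2 == 1 else "test"


# Hypothetical user IDs, for illustration only.
example_ids = [1001, 1002, 1003, 1004]
assignments = {uid: assign_condition(uid) for uid in example_ids}
print(assignments)
```

Because user IDs are handed out sequentially, alternating on parity yields two groups of nearly equal size, which matches the near-even split reported for the two conditions.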

During the assignment period, a total of 19,145 new accounts were registered. After filtering out accounts that were "autocreated" for users who had already registered on other language wikis, 7,602 users were placed in the control condition and 7,572 users were placed in the test condition.

We monitored the activity of these users for 72 hours (the observation period) following each user's registration and generated a set of metrics from that activity to help answer our research questions. Note that on July 1st at 21:00 UTC, VisualEditor was enabled for all registered users; our data analysis stops at that point so that it is not confounded by the switch.

Metrics

Edit
A revision to an article in the main namespace (including new page creation)
Productive edit
A revision in the main namespace that was not reverted within 48 hours.
Productive user
A user who made at least one productive edit during the 72-hour observation period after registration
Block
A user who was blocked during the observation period (detected using a query to the logging table)
Edit sessions
A proxy for the number of times an editor came to Wikipedia and began editing. See Research:Metrics/edit sessions
Time spent editing
The sum total time covered by edit sessions (with a supplemental 430 seconds added per Research:Metrics/edit sessions)
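As a rough illustration of the last two metrics, the sketch below groups a user's edit timestamps into sessions and sums the time covered. The 430-second supplement comes from the definition above; the one-hour inactivity cutoff is an assumption that is not stated on this page (see Research:Metrics/edit sessions for the actual definition), and the function names and sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta

SESSION_CUTOFF = timedelta(hours=1)        # assumption: not stated on this page
SESSION_PADDING = timedelta(seconds=430)   # supplement named in the metric definition


def edit_sessions(timestamps):
    """Group edit timestamps into sessions split by gaps longer than the cutoff."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= SESSION_CUTOFF:
            sessions[-1].append(ts)   # continues the current session
        else:
            sessions.append([ts])     # gap too long (or first edit): new session
    return sessions


def time_spent_editing(timestamps):
    """Sum each session's span plus the fixed per-session supplement."""
    total = timedelta(0)
    for session in edit_sessions(timestamps):
        total += (session[-1] - session[0]) + SESSION_PADDING
    return total


# Hypothetical edits: two within ten minutes, then one several hours later.
edits = [
    datetime(2013, 6, 25, 8, 0),
    datetime(2013, 6, 25, 8, 10),
    datetime(2013, 6, 25, 12, 0),
]
print(len(edit_sessions(edits)), time_spent_editing(edits))
```

The per-session supplement means that even a session consisting of a single edit contributes a non-zero amount of editing time.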

Some of these measures rely heavily on detecting reverted revisions. We opted to use the identity revert method for detection.
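A minimal sketch of identity revert detection follows: a revision counts as a revert when its content is byte-for-byte identical to an earlier revision of the same page, and every revision saved in between is treated as reverted. The look-back radius of 15 revisions, the function names, and the sample page history are assumptions made for illustration; this is not the code used in the study, and it leaves out the 48-hour window used in the productive edit definition.

```python
import hashlib

REVERT_RADIUS = 15  # assumption: how many prior revisions to search for a match


def checksum(text: str) -> str:
    """Hash revision content so identical revisions can be matched cheaply."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def find_identity_reverts(revisions):
    """Map reverted revision IDs to the ID of the revision that reverted them.

    `revisions` is a chronological list of (rev_id, text) pairs for one page.
    """
    reverted = {}
    history = []  # (rev_id, checksum) for revisions already processed
    for rev_id, text in revisions:
        digest = checksum(text)
        window = history[-REVERT_RADIUS:]
        # Search backwards for the most recent revision with identical content.
        for offset in range(len(window) - 1, -1, -1):
            if window[offset][1] == digest:
                # Everything saved after the matching revision was undone.
                for undone_id, _ in window[offset + 1:]:
                    reverted.setdefault(undone_id, rev_id)
                break
        history.append((rev_id, digest))
    return reverted


# Hypothetical page history: revision 3 restores the content of revision 1.
page = [(1, "Original text"), (2, "Original text plus damage"), (3, "Original text")]
print(find_identity_reverts(page))
```

Matching on checksums rather than edit summaries means the method does not depend on which tool or person performed the revert.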

Timeline

2013-06-25 07:00:00 – 2013-06-28 21:00:00 (UTC)
Newly registered accounts are bucketed into experimental and control conditions (assignment period)
2013-06-28 21:00:00 – 2013-07-01 21:00:00 (UTC)
Bucketed user activity is monitored (observation period)
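The period lengths quoted in the Methods section can be checked directly from these timestamps. The variable names below are ours; the dates are the ones listed above.

```python
from datetime import datetime, timezone

assignment_start = datetime(2013, 6, 25, 7, 0, tzinfo=timezone.utc)
assignment_end = datetime(2013, 6, 28, 21, 0, tzinfo=timezone.utc)
analysis_end = datetime(2013, 7, 1, 21, 0, tzinfo=timezone.utc)

# Length of each period in hours.
assignment_hours = (assignment_end - assignment_start).total_seconds() / 3600
observation_hours = (analysis_end - assignment_end).total_seconds() / 3600

print(assignment_hours, observation_hours)  # 86.0 72.0
```

Since the analysis cutoff falls exactly 72 hours after the end of the assignment period, even the last user to be bucketed has a full 72-hour observation window.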

Results and discussion

Quantity of contribution

The analysis consistently suggests that newcomers with VisualEditor enabled performed less work than newcomers using the standard wikitext editor. Figures 1, 2 & 3 suggest that (1) the average number of article edits performed, (2) the average number of productive edits, and (3) the average number of hours spent editing by newcomers in the test condition during their first three days were substantially lower than those of newcomers in the control condition.

The smoothed histograms presented in Figures 4, 5 & 6 help us get a sense for where this difference manifests. Note the consistency with which the line for control appears above test for all but the lowest amounts of activity (3–4 article edits, 1–2 productive edits and 15 minutes of editing time).

 
Figure 1. Article edits. The geometric mean number of article revisions is plotted with standard error bars by condition.
 
Figure 2. Productive edits. The geometric mean number of productive edits is plotted with standard error bars by condition.
 
Figure 3. Hours spent editing. The geometric mean hours spent editing per user is plotted with standard error bars by condition.
 
Figure 4. Histogram of article edits. A smoothed histogram of the number of article revisions saved is plotted by experimental condition.
 
Figure 5. Histogram of productive edits. A smoothed histogram of the number of productive edits is plotted by experimental condition.
 
Figure 6. Histogram of hours spent editing. A smoothed histogram of the number of hours spent editing is plotted by experimental condition.

Discussion

While the results indicate that users in the test condition performed significantly less work compared to control, this might be due to a number of factors that affected user experience, in particular:

  • potential confusion among users who had previously edited anonymously and attempted to use wiki markup in VisualEditor
  • an inconsistent experience produced by known browser support limitations: users with a blacklisted browser did not see the VisualEditor UI when clicking on the edit link, and users with browsers that were neither blacklisted nor whitelisted may still have experienced bugs that prevented them from completing an edit in VisualEditor.

Burden and productivity of newcomers

 
Figure 7. Block proportion. The proportion of blocked users is plotted with standard error bars by condition.
 
Figure 8. Aggregate revert proportion. The aggregate proportion of article revisions that were reverted (not by self) is plotted with standard error bars by condition.
 
Figure 9. Reverts performed by agent type. The proportion of reverted main revisions is plotted by the type of agent performing the revert for both experimental conditions.

We could not find any strong evidence that VisualEditor affects the burden that newcomers place on current Wikipedians. Figure 7 does not show a significant difference between the experimental conditions in the proportion of newcomers who were blocked. Similarly, Figure 8 does not show a significant difference between the groups in the proportion of article revisions that were reverted. While the error bars appear deceptively wide in the plots, it's important to note the scale of the Y axis: both differences represent less than 0.25%.

We also did not observe a significant difference in how newcomers' revisions were reverted. Figure 9 shows that roughly similar proportions of revisions were reverted by each type of agent:

  • self - Reverts performed by the editor who saved the original edit
  • tool-assisted - Reverts performed using Rollback, WP:Huggle or WP:Twinkle
  • vandal-bot - Reverts performed by User:ClueBot NG or User:XLinkBot
  • other-bot - Reverts performed by other robots.
  • manual - Reverts that were performed manually using Wikipedia's standard interface (note that this includes use of WP:Undo)


 
Figure 10. Productive editor proportion. The proportion of editors who make at least one productive edit is plotted with standard error bars by condition.
 
Figure 11. Aggregate productive edit proportion. The aggregate proportion of productive edits is plotted with standard error bars by condition.

We could not find any evidence of a difference in the quality of work performed by editors in the two experimental conditions. Figure 10 shows a non-significant difference of less than 1% in the proportion of editors who made at least one productive edit. Figure 11 suggests that the aggregate proportion of article revisions that were productive was slightly larger for users in the test condition, but this difference is at best marginally significant (χ² = 2.46, p = 0.116), and such aggregate measures are susceptible to the effects of outliers (individuals who perform many edits).


Discussion

At the time of the experiment, VisualEditor itself had many known bugs. As a result, users in the test condition were exposed to bugs that might have caused unintentional damage to an article, which could have boosted the revert rate for this group of users. This unintentional damage manifests in two ways:

  1. “dirty diffs” and other problems between Parsoid and the VisualEditor
  2. for those users in the test group who had previously edited anonymously (and been exposed to wikitext), there is anecdotal evidence that such users were mistakenly typing wikitext into the VisualEditor, which leads to <nowiki> tags being placed and the reader being exposed to markup. It is somewhat surprising, in this situation, to see that the revert rate for the test group was not significantly boosted.

Editing ease

             control (no VE)   test (VE)
total        7602              7572
≥ 1 edit     2382              2295
proportion   0.313             0.303
 
Figure 12. Proportion of users who edit. The proportion of users who save at least one edit is plotted by experimental condition with (normal approx) standard error bars.

To look for evidence that VisualEditor changed the ease of editing, we measured the proportion of newly registered users in each condition who managed to save at least one edit in their first 72 hours. Figure 12 shows a marginal difference between the conditions: newcomers with VisualEditor enabled were slightly less likely to save at least one edit than newcomers with the wikitext editor (χ² = 1.821, p = 0.177).
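The reported statistic can be reproduced from the counts in the table above. The page does not say which test was used; the sketch below assumes a 2×2 Pearson chi-square test with Yates' continuity correction, which gives matching numbers. The function names are ours, and the standard error helper uses the normal approximation mentioned in the Figure 12 caption.

```python
import math


def yates_chi_square(success_a, total_a, success_b, total_b):
    """2x2 chi-square test with Yates' continuity correction (1 degree of freedom)."""
    a, b = success_a, total_a - success_a   # group A: saved an edit / did not
    c, d = success_b, total_b - success_b   # group B: saved an edit / did not
    n = total_a + total_b
    numerator = n * max(abs(a * d - b * c) - n / 2, 0) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = numerator / denominator
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def proportion_standard_error(successes, total):
    """Normal-approximation standard error of a proportion."""
    p = successes / total
    return math.sqrt(p * (1 - p) / total)


chi2, p_value = yates_chi_square(2382, 7602, 2295, 7572)
print(round(chi2, 3), round(p_value, 3))  # 1.821 0.177
print(proportion_standard_error(2382, 7602), proportion_standard_error(2295, 7572))
```

Without the continuity correction the same counts give a slightly larger statistic, so the match with the corrected value is what suggests the correction was applied.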

Discussion

As mentioned in the discussion of Quantity of contribution, several known and unknown VisualEditor bugs may have prevented newcomers from saving changes to articles. The decreased probability of successfully saving an edit discussed above could be the result of such bugs.

Future analysis should make use of EventLogging to identify where the edit funnel (from a click on the edit button, to a save attempt and a save success) breaks down for users with VisualEditor enabled, as a function of their browser.


General discussion

The results of the test vary. While the overall sample of actions was large enough to detect statistically significant effects, there was no statistically significant difference in the likelihood of a user being blocked or reverted. Nor was there a statistically significant difference in the aggregate proportion of productive edits, the proportion of editors who made at least one productive edit, or the number of editing sessions each editor participated in. Broadly speaking, there is no discernible difference in the quality of edits and editors.

Users with VisualEditor enabled were found to perform significantly less work by several measures: they made fewer edits to articles, made fewer productive edits, and spent less time editing than users with VisualEditor disabled.

Limitations

There were several major limitations to applying these results to an evaluation of the effectiveness of VisualEditor, both in terms of applying to newcomers overall, and in terms of applying to logged-in newcomers. In particular:

Limited observation period

Our bucketing and observation periods were very brief. Newcomers were bucketed into the experimental conditions over an 86-hour period and observed for a 72-hour period. Users who register accounts during a certain part of the week (e.g. weekdays vs. weekend) may have different outcomes with VisualEditor, and our 86-hour bucketing period was limited to weekdays.

The 72-hour observation period is limiting as well. Editors may exhibit interesting behavior more than 72 hours after registering their accounts; if so, our analysis would not capture it.

Users with previous anonymous activity

The A/B test was only applied to logged-in users. This causes two issues. The first is that the results cannot be applied to newcomers overall, a group that includes IP editors, because the standards of behavior and demographics of each group may be very different. The second is that users who create an account and then edit with it may, in part, have previously contributed from IP addresses using wiki markup. When they create accounts and are presented with VisualEditor, they have to learn not only how to use it but also how to override their previous domain knowledge of markup. If this is the case, it would not be surprising to find that a share of the users in the test group became disheartened and left, dragging down the results for the group overall.

Browser support

Browser support during the test was handled via a blacklist and a whitelist.

  • The whitelist consisted of “those browsers known to work” with the current version of the software: Firefox ≥ v11, Iceweasel ≥ 10, Safari ≥ 5, Chrome ≥ 19. Users with whitelisted browsers in the test group were displayed the regular VE interface when clicking on the Edit button.
  • The blacklist consisted of those browsers that would, under no circumstances, work - Android, Blackberry, and all Opera and IE versions. Users with a blacklisted browser only saw the wikitext editor.

Any browser not covered by those groups (such as Firefox < v11 or Iceweasel < 10) would be let through to edit using VisualEditor, albeit with a warning. This was based on feedback from community members that these browsers worked with VE, which later turned out to be incorrect. While the browsers in question only comprise about 2.1% of requests (Safari, not being independently tracked, is hard to discern), it meant that a share of the users in the test group may have been unable, because of their choice of browser, to use VisualEditor correctly, yet were presented with it anyway. The lack of instrumentation for VE to collect user-agent data kept us from logging and diagnosing browser-specific issues.
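The three-way outcome described in this section (whitelisted, blacklisted, or neither) can be summarized in a small sketch. The browser names and version thresholds are the ones listed above; the function name, the return labels, and the way a browser is identified by a name and major version are hypothetical simplifications, not VisualEditor's actual detection code.

```python
# Minimum major versions known to work (whitelist), as listed in this section.
WHITELIST_MIN_VERSION = {"firefox": 11, "iceweasel": 10, "safari": 5, "chrome": 19}

# Browsers blocked regardless of version (blacklist), as listed in this section.
BLACKLIST = {"android", "blackberry", "opera", "ie"}


def editor_for_test_user(browser: str, major_version: int) -> str:
    """Return the editing experience a test-condition user would have seen."""
    name = browser.lower()
    if name in BLACKLIST:
        return "wikitext"                  # VisualEditor never offered
    minimum = WHITELIST_MIN_VERSION.get(name)
    if minimum is not None and major_version >= minimum:
        return "visualeditor"              # regular VE interface
    return "visualeditor-with-warning"     # let through, but with a warning


# Hypothetical browsers, for illustration only.
for browser, version in [("Firefox", 22), ("Firefox", 10), ("Opera", 12)]:
    print(browser, version, editor_for_test_user(browser, version))
```

The final branch is the one that caused trouble: browsers that matched neither list were given VisualEditor even though some of them could not run it correctly.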

Summary

It appears that issues with VE that prevented newcomers from using it to complete edits had a significant effect on the experiment and the analysis. Newcomers with VE enabled performed less wiki-work and spent less time editing overall. They were also marginally less likely than users with the wikitext editor to eventually save an edit.

We didn't see any meaningful differences in the amount of burden that VE newcomers placed on current Wikipedians or in the quality/productivity of their contributions. A similar proportion of VE newcomers' revisions needed to be reverted (which is, in itself, a surprise: since VE was creating "dirty diffs" that needed to be reverted, one would expect to see a higher revert rate for users with VE), and a similar proportion of VE newcomers needed to be blocked in comparison with wikitext newcomers.