Research:Post-edit feedback/PEF-2

This page in a nutshell: This page will host the results from the second iteration of R:PEF

This page documents the results of the second iteration of the Post-edit feedback experiment. The goal of the experiment was to determine whether receiving feedback had any significant desirable or undesirable effect on the volume and quality of contributions by new registered users, compared to the control group.

We measured the effects on volume by analyzing the number of edits, contribution size and time to threshold for participants in each experimental condition; we measured the impact of the experiment on quality by looking at the rate of reverts and blocks in each experimental condition.

Prior to performing the analysis we generated a clean dataset from the entire population of participants in the experiment to filter out known outliers and focus on genuinely new registered users.

Unless otherwise noted, all analyses cover the 2 weeks following registration: the 1-week treatment period plus one supplementary week. We note where significant differences emerge between the in-treatment and post-treatment periods.

Research questions

RQ1. Does receiving feedback increase the number of edits?
RQ2. Does feedback lead to larger contributions?
RQ3. Does receiving feedback shorten the time to the second contribution?
RQ4. Does feedback affect the rate at which newcomers are blocked?
RQ5. Does feedback affect the success rate of newcomers?

Sample groups
Treatment	1 edit	5 edits	10 edits	25 edits	50 edits	100 edits
Control	3535	843	389	118	42	13
Historical	3607	853	371	99	33	11

Edit volume - Edit Count & Bytes Added

The following analysis addresses the question:

RQ1. Does receiving feedback increase the number of edits?

Edit count is the most direct measure of editor activity. For each experimental condition, we measured the total number of edits made by new editors in the first 14 days after registration. New users did not receive the treatment message until they had completed their first edit, so the first contribution of each editor in the experiment was omitted from the count.
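The counting rule described above can be sketched as follows. This is a minimal illustration, not the script used for the experiment: the function name, the example dates, and the choice to treat the 14-day boundary as exclusive are all assumptions.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(days=14)  # observation window stated in the text


def edit_count_after_first(registration, edit_times, window=WINDOW):
    """Count edits made within `window` of registration, excluding the first edit."""
    in_window = [t for t in edit_times if registration <= t < registration + window]
    # The first edit triggers the treatment message, so it is not counted as an outcome.
    return max(len(in_window) - 1, 0)


# Hypothetical editor: edits on days 0, 1, 3 and 20 after registration.
registration = datetime(2012, 1, 1)
edits = [registration + timedelta(days=d) for d in (0, 1, 3, 20)]
count = edit_count_after_first(registration, edits)
print(count)
```

Here the edit on day 20 falls outside the window and the first edit is dropped, so two edits are counted for this editor.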

RQ2. Does feedback lead to larger contributions?

The bytes added are computed in four ways for each editor:

Net - the net sum of bytes added or removed
Positive - the sum of bytes added

Below are the means of bytes changed, normalized by edit count, for each group. We applied a logarithmic transformation to bytes changed in order to work with normally distributed data. To take the logarithm of the "Net" byte counts, the net-negative samples (about 15% of the total) were omitted. The total bytes added by each editor were normalized by that editor's edit count; for example, an editor who made five edits contributing 100, 200, 300, 50, and 50 bytes yields the sample $(100+200+300+50+50)/5=140$. Furthermore, the byte count data were verified to be log-normal under the Shapiro–Wilk test for each treatment and bytes-added metric ( $\alpha=0.05$ ), and t-tests were performed over the transformed data sets.
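A minimal sketch of the per-editor normalization and log transformation described above, under stated assumptions: the function names and sample values are hypothetical, and samples equal to zero are dropped along with negative ones because the logarithm is undefined for both (the text only mentions net-negative samples).

```python
import math


def mean_bytes_per_edit(byte_changes):
    """Normalize an editor's total byte change by that editor's edit count."""
    return sum(byte_changes) / len(byte_changes)


def log_transform(samples):
    """Natural log of each sample; non-positive samples are omitted."""
    return [math.log(x) for x in samples if x > 0]


# Hypothetical editors: one net-positive, one net-negative, one net-zero.
per_editor = [
    mean_bytes_per_edit([120, 80, 40]),  # 240 bytes over 3 edits
    mean_bytes_per_edit([-40, 10]),      # net-negative, dropped before the log
    mean_bytes_per_edit([0]),            # zero, also dropped (assumption)
]
logged = log_transform(per_editor)
print(per_editor, logged)
```

The normality check itself is not reproduced here; in practice it would be run on the transformed values with a statistics package.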

Finally, the sample was sub-sampled according to the milestones reached, and the analysis was run separately for each of these sub-samples.
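The sub-sampling step could look like the sketch below. It assumes that an editor "reaches" a milestone when their total edit count is at least that value, which makes the groups nested, in line with the shrinking counts in the sample-group table. The editor names and counts are hypothetical.

```python
MILESTONES = (1, 5, 10, 25, 50, 100)  # thresholds from the sample-group table


def subsample_by_milestone(edit_counts, milestones=MILESTONES):
    """Map each milestone to the editors whose edit count reached it."""
    return {
        m: [editor for editor, n in edit_counts.items() if n >= m]
        for m in milestones
    }


# Hypothetical edit counts per editor.
edit_counts = {"editor_a": 1, "editor_b": 7, "editor_c": 120}
groups = subsample_by_milestone(edit_counts)
for milestone, editors in groups.items():
    print(milestone, editors)
```

Each group can then be passed through the same analysis independently.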

At least one edit:

R Output Edit Count


[1] "Processing Metric edit_count ..."
[1] "Processing treatment control ..."
[1] "Sample Size: 5067"
[1] "Mean: 5.73810933491218"
[1] "Processing treatment historical ..."
[1] "Sample Size: 5106"
[1] "Mean: 4.76615746180964"
[1] "T-test for treatment1 historical under edit_count"

	Welch Two Sample t-test

data:  t1 and ctrl 
t = -1.332, df = 6475.449, p-value = 0.1829
alternative hypothesis: true difference in means is not equal to 0 
95 percent confidence interval:
 -2.4024405  0.4585367 
sample estimates:
mean of x mean of y 
 4.766157  5.738109
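For readers who want to see how the t statistic and the fractional degrees of freedom in a Welch two-sample test arise, the sketch below computes both from raw samples using only the Python standard library. It is an illustration of the general technique, not the code that produced the R output above; the sample data are made up, and the p-value is omitted because it requires the t-distribution CDF.

```python
import math
from statistics import mean, variance


def welch_t(x, y):
    """Return (t, df) for Welch's unequal-variance two-sample t-test."""
    nx, ny = len(x), len(y)
    # Squared standard error of each sample mean (sample variance / n).
    vx, vy = variance(x) / nx, variance(y) / ny
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t, df


# Hypothetical samples with different variances.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
t, df = welch_t(x, y)
print(t, df)
```

As in the R output, a negative t indicates that the mean of the first sample is lower than the mean of the second.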