Research:Detox/Harassment and User Retention



One key factor in the success of open collaboration platforms like Wikipedia is their ability to continually attract and retain new volunteer contributors. In this study, we explore how negative interactions, such as experiencing personal attacks, aggression, and toxicity, affect the behavior and retention of new contributors on Wikipedia. In particular, we investigate the following research questions:

  • Do newcomers in general show reduced activity after experiencing harassment? 
  • Does a newcomer's gender affect how they behave after experiencing harassment?
  • Has the effect of harassment on a newcomer's experience changed over the history of the platform?
  • How do good faith newcomers behave after experiencing harassment?
  • How does experiencing harassment compare to previously studied barriers to newcomer socialization?



We investigate the above research questions by evaluating various regression models on a sample of observational data. We do not attempt to make any causal claims, but we address several potentially confounding factors by controlling for them in our models. At a high level, we model newcomers' engagement in the second month after registering as a function of a set of features characterizing their experience during the first month after registering. In particular, our dependent variable for all models in the study is the number of days a user made at least one edit in any Wikipedia namespace in the second month after registering their account (m2_days_active). To investigate each of the research questions, we analyze regression models using a subset of the following independent variables:

Features/Independent Variables


m1_days_active: the number of days a user made at least one edit in any namespace in the first month after registering

m1_received_harassment: indicator for whether another user left a harassing comment on the newcomer's user talk page in the newcomer's first month after registering. We identify harassing comments using classifiers from prior work trained to detect personal attacks, aggressive tone, and unhealthy or "toxic" contributions. We consider a comment to be harassment if at least one of the three classifiers gave a score above the threshold maximizing its F1 score.

m1_made_harassment: indicator for whether the newcomer left a harassing comment on another user's user talk page in the first month after registration. A harassing comment is defined the same way as above.
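To make the labeling rule above concrete: each classifier is given the cutoff that maximizes its F1 score on labeled data, and a comment counts as harassment if any of the three scores clears its classifier's cutoff. The sketch below is purely illustrative (the function names and toy data are ours, not the project's code):

```python
import numpy as np

def f1_max_threshold(scores, labels):
    """Return the score cutoff that maximizes F1 on a labeled sample.

    scores: classifier probabilities; labels: 0/1 ground truth.
    Illustrative sketch only; the study's tooling may differ.
    """
    best_t, best_f1 = 0.5, -1.0
    for t in np.unique(scores):
        pred = scores >= t
        tp = np.sum(pred & (labels == 1))
        fp = np.sum(pred & (labels == 0))
        fn = np.sum(~pred & (labels == 1))
        if tp == 0:
            continue
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t

def is_harassment(attack, aggression, toxicity, thresholds):
    """A comment is harassment if any classifier score clears its threshold."""
    return any(s >= t for s, t in zip((attack, aggression, toxicity), thresholds))
```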

has_gender: indicator for whether the gender of the user is known. Registered Wikipedia users have the ability to report their gender in their user preferences, but the vast majority do not do so. This may be because reporting their gender is not important to them, because they do not want to report a gender, or because they are simply unaware of the feature. Furthermore, there is anecdotal evidence of users deliberately reporting an incorrect gender. Overall, this means we should expect users who report their gender to be quite different from the average user, and we cannot be sure that reported genders are correct. Another caveat for the following analysis is that we do not know when the user reported their gender; they may have changed their user preference after our two-month interval of interest, in which case a signal from outside our time intervals of interest leaks into the models.

is_female: an indicator for whether the user reported a female gender. The caveats from the above discussion of gender data in Wikipedia apply here as well.

m1_warnings: a count of the user warnings a user received in their first month after registration. Warning messages were identified by matching any of the standard Wikipedia user warning templates within the Wikipedia Talk Corpus, an open data set of all Wikipedia user talk page comments.
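For illustration, standardized user warnings on English Wikipedia use templates with a shared "uw-" prefix (e.g. {{uw-vandalism1}}). A hypothetical matcher over talk page comments might look like the following; the regex and helper are our sketch, and the study's actual template list may be broader:

```python
import re

# Illustrative pattern: standardized user-warning templates such as
# {{uw-vandalism1}} or {{uw-test2|Some Article}}. The study's template
# list may be broader than this sketch.
WARNING_RE = re.compile(r"\{\{\s*uw-[a-z]+\d?\s*(?:\||\}\})", re.IGNORECASE)

def count_warned_comments(comments):
    """Count talk page comments containing at least one warning template."""
    return sum(bool(WARNING_RE.search(c)) for c in comments)
```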

m1_fraction_ns0_deleted: the fraction of edits the user made in their first month after registration to Wikipedia articles that were eventually deleted. An edit is deleted when the page the edit was made on is deleted.

m1_fraction_ns0_reverted: the fraction of edits the user made in their first month after registration to Wikipedia articles that were eventually reverted. An edit is reverted if another user subsequently restores the page to a state from before the edit.
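Putting the pieces together, the modeling setup regresses m2_days_active on a subset of the features above. The sketch below fits a plain OLS model on synthetic data; all coefficients and sample sizes here are invented for illustration, and the study's actual model specifications may differ:

```python
import numpy as np

# Synthetic stand-in for the study's sample: activity in month two as a
# function of month-one activity and harassment. Coefficients are made up.
rng = np.random.default_rng(0)
n = 500
m1_days_active = rng.integers(0, 31, size=n).astype(float)
m1_received_harassment = (rng.random(n) < 0.1).astype(float)
m2_days_active = (0.5 * m1_days_active
                  - 2.0 * m1_received_harassment
                  + rng.normal(0.0, 1.0, size=n))

# Design matrix with an intercept column, then ordinary least squares.
X = np.column_stack([np.ones(n), m1_days_active, m1_received_harassment])
coefs, *_ = np.linalg.lstsq(X, m2_days_active, rcond=None)
intercept, b_days_active, b_harassed = coefs
```

With enough data, the fitted coefficients recover the (made-up) generating values: roughly 0.5 for month-one activity and a negative coefficient for harassment.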



We collected the above set of features for a mixed sample of 52,389 newcomers from 2001-2015. It mainly consists of a random sample of 50,000 newcomers who made at least one edit in their first month. However, we also included all newcomers who both made an edit and received harassment in their first month. This has the benefit of giving more accurate coefficient estimates for m1_received_harassment, the main independent variable of interest in this work, at the cost of giving potentially biased coefficient estimates for the remaining independent variables compared to a true random sample.
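The sampling scheme above can be sketched as the union of the two groups; the IDs and the harassment rate below are fabricated for illustration:

```python
import random

# Hypothetical population: IDs of newcomers who edited in month one,
# with a made-up 2% flagged as having received harassment.
random.seed(0)
active_newcomers = set(range(100_000))
harassed_newcomers = {u for u in active_newcomers if u % 50 == 0}

# 50,000 newcomers at random, plus every harassed active newcomer; the
# set union de-duplicates newcomers who land in both groups.
random_sample = set(random.sample(sorted(active_newcomers), 50_000))
study_sample = random_sample | harassed_newcomers
```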



The analysis is still a work in progress, but you can review our work on GitHub.