Research talk:Detox

Discuss Applications


Offer an API like that of ORES to let the community make their own applications. If editors want to know their civility score, let them request a bot through established channels, or write JavaScript to tell them.

Hold a contest for editors to cooperate to achieve the greatest number of new vital, high readership, or high importance good articles with the least amount of incivility per talk page bytes added.

Study the articles with the most and least reverts per time and compare them to the articles with the most and least talk page incivility.

Publish the civility scores of prominent Wikipedians and Foundation officials because Wikipediocracy will anyway.

Investigate the correlations, if any, between the time to resolve disputes, the extent to which resolved disputes persist over time, and the extent of incivility per byte of talk page content.

Award the most civil editors valuable cash prizes. EllenCT (talk) 02:52, 16 July 2016 (UTC)

Create a Detox tool talk page filter for editors to set in their user space. Slowking4 (talk) 15:45, 5 August 2016 (UTC)

Discuss Papers


Hi, the way you are using the term "anonymity" both in the WMF blog post and in the paper itself strikes me as inappropriate. I commented on the blog post:

Interesting work, but as others have said, your reasoning in the passage “Registered users make two-thirds (67%) of attacks on English Wikipedia, contradicting a widespread assumption that anonymity is the primary contributor to the problem.” is deeply flawed.

Take the example of Qworty, whose exploits were covered by Andrew Leonard in Salon:

or Johann Hari:

Both had longstanding, very active registered Wikipedia user accounts. Yet their anonymity – no one knew their identity for years – was a key factor in enabling both their editing behaviour and their abrasive style of interaction (note Hari’s subsequent apology).

Safeguarding the anonymity of registered users is one of Wikipedia’s most cherished values (Wikipedia is similar to Reddit in this respect). This establishes the notion in contributors’ minds that they can never be “found out” if they do something underhanded, or morally questionable, and that their actions in Wikipedia are not part of their actions in the “real world” that they have to be personally accountable for.

This feeling of “safety” is reinforced by the fact that anyone trying to identify them will be banned from Wikipedia. In the Wikipedia value system, trying to hold people personally accountable for their actions is actually considered “wrong” – a complete reversal of the values that apply in most other societal contexts. (IP editors are less anonymous than registered users in one respect: their IP address gives away their approximate location.)

Of course there are many good reasons for contributor anonymity, but you cannot close your eyes to the downsides. They are deeply rooted in human psychology – it’s the same mechanism that causes people to do all manner of unpalatable things they would publicly disavow as long as they feel safe from repercussions: from hit-and-run accidents to opportunity theft to lying … or swearing at other drivers from the safety of one’s own car.

So please, look at this again!

Thanks for writing this argument out - much easier to read and respond here. If you look at the paper, I think you'll be pleased to see that we do show that unregistered edits are more likely to be personal attacks than edits by registered users. But the interesting thing is that the total number of attacks in unregistered (anonymous) comments is lower, so unregistered accounts contribute less to the total number of personal attacks than registered users do. W.r.t. your examples: again, I recommend you read the paper, where we also show that many of the personal attacks are by contributors who have a large number of edits. Lucas Dixon.

Dario pointed me to the actual paper on Twitter (and to this page), saying the presentation in the paper itself is more nuanced. But I find the wording there similarly problematic. It says, "Wikipedia users can make edits either under a registered username or anonymously. In the latter case, the edits are attributed to the IP address from which they were made."

This elides the fact that user registration in Wikipedia is anonymous. Usernames in Wikipedia (much like user names in Reddit, or the pseudonyms used in past centuries for the publication of anonymous pamphlets) are explicitly designed to safeguard contributors' anonymity, with strict site policies designed to protect authors from unwanted disclosure of their real names.

I appreciate the desire for clarity, and I agree the blog post could provide a definition, like we do in the paper. But I don't agree with your criticism of the paper here: you selected one sentence, but if you read the rest of the paper you'll see we also write: "Wikipedia users can make edits either under a registered username or anonymously. In the latter case, the edits are attributed to the IP address from which they were made" - and we provide a footnote about the consequences of IP address usage, which I think is also clear. The paper also clearly states that anonymous contributions still have a higher proportion of attacks than registered edits. The interesting result is that the majority of personal attacks on the platform are not from unregistered contributions, but from registered accounts. I think the blog post does convey that, although as you point out, and I agree, the term "anonymous" is not well defined, and many people have different interpretations. I'll clarify that in the paper abstract to avoid further confusion there too. Lucas Dixon

Registration of a user account in such an environment (as opposed to sites like Facebook asking for real-name registration) cannot be considered equivalent to non-anonymity. In practice, registered users are more anonymous in Wikipedia than IP editors, whose publicly posted IP addresses freely give away their locations, whereas it is considered a severe breach of Wikipedia site policy to even speculate on a registered user's location.

"registered users are more anonymous in Wikipedia than IP editors": that's a particularly interesting point, although not one I expect most contributors think about. I wonder how one would tell how many contributors think about this when making edits? The point that anyone can identify your location from the IP is also false for users behind proxies (which is pretty much everyone on a mobile phone), and similarly for users of anonymization tools like Tor, or VPNs (probably a minority in most countries). So my expectation is that most people who contribute would still consider registered users less anonymous than unregistered ones (claim 1), and the reputation associated with and built up by an account is likely to be the key characteristic that affects behavior (claim 2). Real-name policies would add more reputation to claim 2, and private posting would maybe reduce it. So overall, I still find the notion of registered users being less anonymous than unregistered ones reasonable. But I'm curious about other arguments in this space. Lucas Dixon

To investigate the effect of anonymity in Wikipedia, you would have to compare the relatively small population of Wikipedia users who have voluntarily disclosed their actual (and demonstrably true) identity to the much larger population of pseudonymous contributors, and/or compare users' pre-disclosure and post-disclosure behaviour.

I think it is interesting to try and measure levels of anonymity, but quite difficult, as it's hard to define how "real" someone's online identity is; I guess a few possible metrics are whether the account has a real photo, real name, correct DOB, etc. Lucas Dixon

Would it be possible to tidy up the language around this issue in future publications? Regards, Andreas JN466 17:06, 10 February 2017 (UTC)

I'm curious what you are suggesting; I think the language in the paper is specific and accurate. Lucas Dixon
You see, this is what happens when you use the word "anonymous" to mean something it doesn't. Ars Technica now tells its readers, "New study of Wikipedia comments reveals most attackers aren’t anonymous." Nowhere does the piece explain that the actual truth is the exact opposite: most of the attackers ARE anonymous contributors. Just to be clear: writing under a pseudonym, while hiding your actual identity, is a form of anonymous speech (see e.g. ).
I think how 'wrong' this is depends crucially on the result of the discussion above: do people consider registered users less anonymous than unregistered users? However, I do agree that this is likely to get mixed up with real-name policies etc., which are a separate discussion that is less clearly relevant to our research. Lucas Dixon
Both IP editors and pseudonymous editors are generally considered to be editing anonymously. Cf. [1].
Pseudonymous editors are –
  • less anonymous in that they may accumulate a consistent contributions history over time, and
  • more anonymous in that Wikipedia policy sternly forbids any speculation on their identity.
Unregistered (IP) editors are –
  • more anonymous in that the person's contributions may be spread across many different IP addresses, and
  • less anonymous in that any Wikipedian is entitled to point out on Wikipedia that an IP editor is editing from an IP address belonging to a company or government department – Wikipedia actually provides ready-made tools for such research (Whois, Geolocate etc.) – whereas making the same assertions about a registered editor (i.e. speculating about their location and place of work) would be a policy violation potentially worthy of a site ban. Andreas JN466 02:06, 12 February 2017 (UTC)
Thanks for writing this out; my analysis is basically the same. The other dimension to anonymity is who it is w.r.t.; the observer is important. E.g. Wikimedia has the IP of all users over some period of time. So anonymous to Wikimedia is different from anonymous w.r.t. other contributors; similarly for your ISP, etc. The important point of analysis is who has what information. Having said all that, I think the primary distinctions w.r.t. identity are: 1. is there a common identity/pseudonym that ties contributions together, and 2. is there some platform-enforced link to real names or other characteristics of citizenship (e.g. DOB, photo, etc.) - we can then ask how effective the enforcement is etc. So overall, I think (1) is the most important hallmark of most online notions of identity; but I also agree with your point that many analyses contrast anonymity with (2), so not being clear that one is considering (1) does lead to confusion. Lucas Dixon.
Note that I'm not claiming that Wikipedia editors editing under their real names are automatically less likely to engage in toxic behaviour than anonymous contributors. While in the cases described in the press articles linked above, anonymity was an acknowledged factor contributing to poor behaviour, we can all think of counterexamples, i.e. real-name editors that have very abrasive styles of interaction. The point is that this is simply not what you investigated, and the reporting makes it look like it is. Andreas JN466 15:14, 11 February 2017 (UTC)
Again, I would recommend reading the paper and ipython notebooks: we do show that unregistered edits are *more* likely to be personal attacks than edits by registered users - but the interesting thing is that the total number of attacking unregistered comments is the minority; which suggests that the notion of forcing users to register would address only the minority of personal attacks; that is what we investigated. Lucas Dixon.
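The distinction drawn in this exchange, between the rate of attacks within a group and that group's share of all attacks, can be sketched with a few lines of arithmetic. The comment volumes and per-1,000 rates below are invented for illustration only (they are not figures from the paper); they were simply picked so that the registered share lands near the 67% quoted earlier in this thread.

```python
# Hypothetical volumes and rates, invented purely to illustrate the arithmetic.
# They are NOT taken from the paper or its notebooks.
COMMENTS = {"registered": 900_000, "unregistered": 100_000}
ATTACKS_PER_1000 = {"registered": 5, "unregistered": 22}


def attack_counts(comments, attacks_per_1000):
    """Absolute number of attacking comments per group."""
    return {g: comments[g] * attacks_per_1000[g] // 1000 for g in comments}


def share_of_attacks(counts, group):
    """Fraction of all attacks that come from one group."""
    return counts[group] / sum(counts.values())


counts = attack_counts(COMMENTS, ATTACKS_PER_1000)
registered_share = share_of_attacks(counts, "registered")

print(counts)  # {'registered': 4500, 'unregistered': 2200}
print(f"registered share of all attacks: {registered_share:.0%}")  # 67%
```

With these made-up inputs the unregistered group has the higher attack rate (22 versus 5 per 1,000 comments) yet still accounts for the minority of attacks, because it posts far fewer comments overall.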

My suggestion, Lucas, would be to leave the word anonymous out of it. For example, you could refer to "unregistered users, whose edits are publicly attributed to their IP address" vs. "registered users, most of whom choose not to disclose their identity, but contribute under a pseudonym". Andreas JN466 01:53, 12 February 2017 (UTC)

Thanks! We've updated the blog posts to clarify this further. And the paper already says that unregistered edits are attributed to IP, but I'll have a look at the abstract, and try to avoid further potential confusion there. Lucas Dixon.
Thanks for looking into it, Lucas, and thanks to you and your colleagues for liaising with Ars Technica for the update to their article. --Andreas JN466 13:31, 15 February 2017 (UTC)

Discuss Getting Involved


  • (moved from the research page) I am very interested in taking part in this project, but would you please direct me to where I would best register my interest? I don't see a link here and I would like to know where I can put my name on the table. Would you please also leave a message at my talk page or ping me so that I know where to respond? I find this project really fascinating and I think it's probably quite useful. The trick will be in defining what is actually harassment or personal attack, versus what is calling out harassment or personal attack done by others. It's really important not to let this thing boomerang back onto the people who are trying to make a friendlier, more collaborative environment. There is definitely a huge problem within Wikipedia of people being toxic, obstructionist editors instead of collaborative ones. There is a great problem of lack of integrity in dialogue on talk pages. And there is really no enforcement mechanism at all that actually works. We have these boards and we have administrators, but in my experience they really don't seem to work. Instead I think that we have roving gangs of editors enforcing certain agendas by being toxic and wikilawyering. SageRad (talk) 12:34, 30 May 2016 (UTC)
    Hi SageRad, after your comment the page Research:Detox/Participants was created, where you can state your interest in participating. Basvb (talk) 09:06, 26 June 2016 (UTC)

Discuss Other


Some thoughts -

  1. Is there so much harassment that we need machine tools to identify it?
  2. Human editors already detect a lot of harassment, and so far as I understand, could make an identical "harassment identification" to this tool.
  3. If reports from humans are not accepted, and so far as I know, they are not, then why is it useful to have an automated process to replicate something that can be done readily by humans?
  4. There was a proposal in the talk for a "chill out" button, which I think would be a human-equivalent tagging tool equivalent to machine detection. I wonder how human-detection and reporting would compare to machine collection.
  5. I wonder why it seems desirable to accept machine reports, but not human reports of the same behavior.
Blue Rasberry (talk) 15:17, 25 June 2016 (UTC)
1. Yes 2. Perhaps 3. Men don't like women telling them that they are using harassing language. The theory is that they might accept it better from a machine. 4. Interesting idea 5. See answer to 3. --WiseWoman (talk) 20:03, 15 July 2016 (UTC)

Re (2): is it possible to prevent abuse of human reporting systems? EllenCT (talk) 20:32, 15 July 2016 (UTC)

1, 2, 3: the data seems to indicate that only between 10 and 20 percent of personal attacks (depending on what threshold you choose for personal attacks) have a warning or blocking response within a week. --iislucas (talk)
5: I think human reporting is also valuable, and even more so if cross-checked to create unbiased, high-quality machine models. --iislucas (talk)
This bug-laden tool will be the source of many jokes before it is ultimately abandoned as unworkable. I certainly hope the WMF isn't sinking very much money into it. (Aggressive 0.03, Friendly 0.05.) Machine censorship of potty words has been around for 15 or 20 years, good luck reinventing the wheel. (Aggressive 0.01, Friendly 0.08.) More friendly advice for you in the Wikipediocracy thread... Carrite (talk) 01:01, 21 September 2016 (UTC)

Why am I reminded of the Morality Box in Demolition Man? How many credits will we get fined for writing Ach du Scheiße? Grüße vom Sänger ♫(Reden) 12:53, 17 February 2017 (UTC)



Apparently the tool works just by scoring short sequences of letters. This does seem a simple-minded approach: the WMF participants in its development will no doubt be aware of the impressive performance over years of the vandalism detector ClueBot NG, so why not adapt ClueBot's basic neural net methodology for this somewhat similar purpose of detecting incivility? Noyster (talk) 19:51, 21 September 2016 (UTC)

The tool did use a neural net, specifically a many-layered perceptron. There are difficulties with the scoring: a clearly obnoxious comment ranking at 97% can be brought down to 10% by the addition of one paragraph of neutral text, then to 3% by another. Rich Farmbrough 17:16 26 June 2017 (GMT).
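One possible reason for the dilution described above is that features built from short letter sequences are often normalised by comment length. The toy scorer below is not the Detox model and makes no claim about its internals; it is a minimal sketch, with a hypothetical `FLAGGED` trigram set and a hypothetical `toy_score` function, that only shows how a length-normalised score falls as neutral text is appended.

```python
# Toy illustration only: NOT the Detox model. The flagged trigrams and the
# scoring rule are assumptions made up for this sketch.
FLAGGED = {"idi", "dio", "iot"}  # hypothetical trigrams from one rude word


def char_ngrams(text, n=3):
    """Return all overlapping character n-grams of the lowercased text."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def toy_score(text, n=3):
    """Fraction of the comment's character n-grams that are flagged."""
    grams = char_ngrams(text, n)
    if not grams:
        return 0.0
    hits = sum(1 for g in grams if g in FLAGGED)
    return hits / len(grams)


rude = "You idiot."
neutral = " Thank you for the detailed sources on the article history."

score_rude = toy_score(rude)                         # 3 of 8 trigrams flagged
score_one_pad = toy_score(rude + neutral)            # same 3 hits, more trigrams
score_two_pads = toy_score(rude + neutral + neutral)

print(score_rude, score_one_pad, score_two_pads)
```

The number of flagged trigrams stays at three in every case; only the denominator grows, so each added neutral paragraph pushes the score down even though the rude sentence is unchanged.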



While the researchers may release the metadata under any license they choose, the actual text of the comments is released under the appropriate license prevailing when it was made (GFDL or CC-by-SA 3.0 and later I believe).

Rich Farmbrough 17:12 26 June 2017 (GMT).

UX tweak in labeling interface


I was having a great time labeling edits damaging/good faith; very cool workflow.

One problem is that if I want to check the current article version and click on the blue article title, it reloads the same tab and you lose progress in your workset.

Instead, it should default to open in a new tab. This will help with verification and accuracy and maintain users in their workflow.

Thanks and great work! Jake Ocaasi (WMF) (talk) 21:07, 4 July 2017 (UTC)

Ocaasi (WMF), are you referring to the Wiki labels interface? If so, I believe the best places to provide design feedback are the Wiki labels Phabricator board or the issues page on GitHub. Cheers, Jmorgan (WMF) (talk) 17:47, 10 July 2017 (UTC)


  • That writing was fucking great. ---> Scores 0.99 on aggression and 0.98 on attack.
  • Your sheer incompetency is mind-boggling. Go on. ---> Scores 0.28 on aggression and 0.25 on attack.

  • To be honest, this "research proposal" just looks like someone with an axe to grind rather than someone with a genuine concern, and I wouldn't be surprised if one of the long-term cranks who occasionally try to piggyback their pet grudges onto legitimate discussions about gender coverage is somehow involved ---> Scores 0.01 on aggression as well as attack.
  • An arbitration caase would be less of a shitsshow, but you haven't got a good enough casus belli to get ArbCom to accept the case. --> Scores 0.22 on attack and 0.15 on aggression.

  • Jimmy may well have sensible people whispering in one ear, sadly it's the people who are whispering in his other ear that are the root cause of this smallish clusterfuck (and pretty much all the other clusterfucks Jimmy has managed to generate) --> Scores 0.43 on aggression and 0.46 on attack.
  • Hey, I wrote some stupid shit 20 years ago, too. The stuff I did was even worse. --> Scores 0.99 on aggression and 0.95 on attack

-- Winged Blades of Godric (talk) 15:34, 17 February 2019 (UTC)

  • I tried it out and got similar responses to WBG. The tool picks up swear words, but not the context. And it doesn't respond to typical harassment phrases such as "stop - you don't know what you're doing" and "I think you have mental health issues", which it only counts as 15% and 12% harassment. For the tool to become useful it would need the committed input of the enwiki community: people experienced with typical harassment on Wikipedia who can suggest phrases. The problem, though, is often not the phrase, because "I see you're editing the Foo article again" can be positive ("and I'm pleased you're back, we need your help", 1% aggressive) or negative ("and you're messing it up as usual", 6% aggressive). Harassment is felt when somebody feels hounded. So it may not be the phrase itself; it may be how many times somebody is contacted over their good faith editing. And the tool is unable to tell the difference between "You've been given several warnings by four different admins, yet you are still making gross errors in BLP articles. I have no option but to block you." (50% aggressive) and "I've warned you about messing with my article. You are incompetent and shouldn't be editing Wikipedia." (23% aggressive). However, having a tool that puts up alerts that admins could look into might be useful. SilkTork (talk) 14:51, 22 June 2019 (UTC)

Who is the Foundation contact for this project?


The main page for the project lists User:Ewulczyn (WMF) as the Foundation contact. That account was globally locked on 4th May 2017, and the userpage says he no longer works for the Foundation. Who is the Foundation contact now? DuncanHill (talk) 17:07, 22 June 2019 (UTC)

Tool deleted

Tracked in Phabricator:
Task T226427

The tool has now been deleted. I am concerned that papers based on it may still be circulating, and its findings relied upon. As has been shewn, it produced results that not only failed the "Scunthorpe" test, but were homophobic and racist. I would appreciate some feedback about withdrawing anything based on it. DuncanHill (talk) 13:53, 25 June 2019 (UTC)

See also this discussion on en-wiki about some of the unacceptable results the tool was producing. DuncanHill (talk) 10:22, 26 June 2019 (UTC)
I'll copy here the gist, the proof that the classifications were homophobic: «"I am gay" scores as 90% attack, 90% aggressive. "I am straight" as 0% attack, 1% aggressive.»
WMF took it down. Nemo 10:13, 11 November 2022 (UTC)

Media coverage


Mistakes similar to Detox are becoming more common and the press coverage can be quite harsh. Nemo 10:05, 11 November 2022 (UTC)
