Research:Detox/Resources
Detox-Specific Resources
Relevant Wikipedia Policies
Community Discussions/Proposals on Talk Page Abuse and Toxicity
WMF Projects/Discussions on Harassment
Data Sources
Wikimedia Sources
- Administrators Board
- Users Blocked for Harassment (query)
- Edit Filter (overview; rules, e.g., 294 and 478)
- Deleted or suppressed talk page comments (very promising)
- Wikilabels (very promising)
External Publicly Available Sources
- Stanford Politeness Corpus
- MPQA: a corpus annotated for private states (e.g., beliefs, emotions, sentiments, and speculations)
- Kaggle competition to detect insults (very promising)
- Internet Argument Corpus: a set of 390,704 posts in 11,800 discussions extracted from the online debate site 4forums.com. Annotations include degree of agreement with a previous post, cordiality, audience direction, combativeness, assertiveness, emotionality of argumentation, and sarcasm.
- Alignment and Authority in Wikipedia Discussions (AAWD) Corpus: a set of English, Russian, and Mandarin Wikipedia talk page threads annotated for agreement/disagreement and other social cues.
Other Potential Sources
- Contact the League of Legends team about a training corpus
- Crowdsource annotations using CrowdFlower, Mechanical Turk, or a similar platform
Related Work
- Abuse in Online Games
- A Computational Approach to Politeness
- A Sentiment Analysis Approach for Online Dispute Detection
- Antisocial Behavior in Online Discussion Communities
- How Community Feedback Shapes User Behavior
- Online Harassment Resource Guide
- Like trainer, like bot? Inheritance of bias in algorithmic content moderation (research using data from the Detox project)
Talk Page Parsing Utilities
Distilled notes on some of the above resources are available on a separate page.