Research:Detox/Resources
Detox-Specific Resources
Relevant Wikipedia Policies
Community Discussion/Proposals on Talk Page Abuse and Toxicity
WMF Projects/Discussion on Harassment
Data Sources
Wikimedia Sources
- Administrators Board
- Users Blocked for Harassment (query)
- Edit Filter (about, rules e.g. 294, 478)
- Deleted or suppressed talk page comments (very promising)
- Wikilabels (very promising)
External Publicly Available Sources
- Stanford Politeness Corpus
- MPQA: A corpus with annotations for private states (e.g. beliefs, emotions, sentiments, speculations).
- Kaggle competition to detect insults (very promising)
- Internet Argument Corpus: A set of 390,704 posts in 11,800 discussions extracted from the online debate site 4forums.com. Annotations include degree of agreement with a previous post, cordiality, audience direction, combativeness, assertiveness, emotionality of argumentation, and sarcasm.
- Alignment and Authority in Wikipedia Discussions (AAWD) Corpus: A set of English, Russian, and Mandarin Wikipedia talk page threads annotated for agreement/disagreement and other social cues.
Other Potential Sources
- Contact "League of Legends" team for training corpus
- Crowd-source using CrowdFlower, Mechanical Turk, or similar.
Related Work
- Abuse in Online Games
- A Computational Approach to Politeness
- A Sentiment Analysis Approach for Online Dispute Detection
- Antisocial Behavior in Online Discussion Communities
- How Community Feedback Shapes User Behavior
- Online Harassment Resource Guide
- Like trainer, like bot? Inheritance of bias in algorithmic content moderation (research with the data from the Detox project)
Talk Page Parsing Utilities
For distilled notes on some of the above resources, click here.