Research:Detox/Resources
Detox Specific Resources
Relevant Wikipedia Policies
Community Discussion / Proposals on Talk Page Abuse and Toxicity
WMF Projects / Discussion on Harassment
Data Sources
Wikimedia Sources
- Administrators Board
- Users Blocked for Harassment (query)
- Edit Filter (about, rules e.g. 294, 478)
- Deleted or suppressed talk page comments (very promising)
- Wikilabels (very promising)
External Publicly Available Sources
- Stanford Politeness Corpus
- MPQA: a corpus with annotations for private states (e.g. beliefs, emotions, sentiments, speculations)
- Kaggle competition to detect insults (very promising)
- Internet Argument Corpus: a set of 390,704 posts in 11,800 discussions extracted from the online debate site 4forums.com. Annotations include degree of agreement with a previous post, cordiality, audience direction, combativeness, assertiveness, emotionality of argumentation, and sarcasm.
- Alignment and Authority in Wikipedia Discussions (AAWD) Corpus: a set of English, Russian, and Mandarin Wikipedia talk page threads annotated for agreement/disagreement and other social cues.
Other Potential Sources
- Contact the "League of Legends" team for a training corpus
- Crowd-source using CrowdFlower, Mechanical Turk, or similar.
Related Work
- Abuse in Online Games
- A Computational Approach to Politeness
- A Sentiment Analysis Approach for Online Dispute Detection
- Antisocial Behavior in Online Discussion Communities
- How Community Feedback Shapes User Behavior
- Online Harassment Resource Guide
- Like trainer, like bot? Inheritance of bias in algorithmic content moderation (research with the data from the Detox project)
Talk Page Parsing Utilities
For distilled notes on some of the above resources, click here.