Automated detection of Wikipedia misconduct!, 15-minute video presentation of the research paper!
This page documents a completed research project.

Automatic Detection of Online Abuse was a 2018-19 English Wikipedia research project in the School of Data Science at the University of Virginia. In the project, student researchers considered 1 million user accounts in English Wikipedia which administrators had blocked for misconduct. Using machine learning, the researchers identified patterns of misconduct in those accounts, then developed a predictive model for identifying those misconduct patterns in any Wikipedia account. In doing so, this project modeled an automated system for identifying misconduct to supplement human labor and review.



Wikimedia projects are community-based publishing platforms including Wikipedia the encyclopedia, Wikimedia Commons the media repository, and Wikidata the structured data collection. Typical users engage constructively on the platforms in collaboration with other users to advance the mission of sharing knowledge for free with digital publishing.

When users misbehave, a Wikimedia process allows them to be blocked, which means that their account loses the ability to post to Wikimedia projects. Prior to getting blocked, users engage in some automatically recorded behavior in Wikimedia projects, such as a hostile interaction, spamming, or a history of conflict. Human evaluation of that behavior confirms a blockable offense so that a block may be enacted.

Relying on human evaluation works in some ways but is not a solution which scales with the growth of Wikimedia projects. Problems with human evaluation include lack of consistency in the application of blocks, failures to identify bad behavior in the environment, bias to overlook bad behavior in some places but not others, and an overall inability to characterize the scope of the problem of blockable offenses.

Research Objective


To this end, the aim of this project is to empower the Wikimedia Foundation by enhancing its Trust and Safety processes using a data-driven approach. This will enable a safe and secure environment for community users to engage in an open exchange of ideas and resources. The goal is to identify the relevant data sources and develop ETL (extract, transform, and load) techniques to process the data for model building. The sourced data will then be analyzed to understand user behavior and the blocking ecosystem on the Wikipedia platform. We use the data to study historical user account blocks on the basis of different types of user behavior and activity, and to identify trends and patterns in user behavior for blocked versus non-blocked users. In this analysis of problematic users, we consider the circumstances before the block, including the text of messages posted, Wikimedia content editing such as possibly posting spam links, and interaction with other users. We build upon this analysis by developing two models that leverage natural language processing along with machine learning algorithms. To that end, a tool or model will be developed that does the following:

  1. Detects and flags abusive content on the English Wikipedia platform. The model will generate a "toxicity score" for each edit (or comment) made by a user, assessing the abusive nature of that edit.
  2. Assesses the probability that a user account might merit a block, in order to flag accounts which might be engaging in problematic behavior. This will pave the way for the early detection of problematic users in the community.

The expectation is that this tool will enhance the Wikimedia Foundation’s existing Trust & Safety processes and serve as a preemptive measure in combating harassment online. With increased accuracy in automatic harassment detection and the ability to flag problematic users in advance, the tool will assist human Wikimedia administrators in objectively issuing blocks to user accounts. The broader Wikimedia community will continue to confirm the correctness of the recommended blocks when issued. Overall, this mechanism should help resolve conflicts in less time and so protect community users more effectively.



Anyone may participate in Wikimedia projects by publishing content and socializing for the purpose of developing content. Wikimedia projects have various rules for governing civility in social interactions. When the Wikimedia community governance process determines that a user's activities are disturbing the peace of community spaces, users may take action, with the most extreme response in Wikimedia projects being a block or ban that prevents the user from participating in the projects.

The Wikimedia Foundation and Wikimedia community have sought to address harassment in various ways. Responses include the Wikimedia community's own responses, the Wikimedia Foundation's Community health initiative, and the efforts described on the Wikimedia Foundation blog.

A 2017 partnership between Google and the Wikimedia Foundation produced the single highest-profile analysis of harassment in Wikimedia projects.

For general context to apply machine learning to Wikipedia consider this work:

Technical Approach


This project seeks to develop an objective, data-driven approach to predict when a particular user account has engaged in the sort of activities that can result in the account getting blocked. The training data for this research will include a list of user accounts which have received a block, along with a record of all of their activities. Python will be the primary language for the analysis and model building. All the data is made publicly available by Wikipedia in the form of XML dumps and database tables. Once the data has been obtained, extensive pre-processing will be applied to improve its quality. In the pre-processing module, text processing resources and techniques such as the WordNet corpus and spell-correction algorithms will be applied. In order to build a model that classifies user accounts as blocked or not, the model will consider the circumstances before the enacted block, including the text of messages posted, a user’s Wikimedia content edits such as posting spam links, interaction with other community users, etc. To understand what leads to an account block, feature extraction will be applied to every post associated with each account to measure different aspects of user activity. This will include local feature extraction using text mining techniques such as TF-IDF to down-weight unimportant words and reduce the dimensionality of the feature space, as well as text representations such as n-grams and bag-of-words to detect the use of offensive words. This project will explore various classification methods such as logistic regression, neural networks (CNN, RNN), SVM, etc.
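
As a rough illustration of the TF-IDF weighting mentioned above, the following sketch computes per-comment term weights using only the Python standard library. It is not the project's actual feature pipeline: the tokenizer, the smoothed inverse-document-frequency formula, and the example comments are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a comment and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def tfidf(documents):
    """Return one {term: weight} dict per document.

    Term frequency is the count divided by the document length; the
    inverse document frequency is a smoothed variant (an assumption here):
    idf = ln((1 + N) / (1 + df)) + 1.
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of comments that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty comments
        weights.append({
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return weights


# Hypothetical talk-page comments, invented for illustration only.
comments = [
    "thanks for the helpful edit",
    "you are an idiot and your edit is garbage",
    "please see the talk page before you edit",
]
vectors = tfidf(comments)
```

In this toy corpus the word "edit" occurs in every comment and so receives the minimum inverse document frequency, while a word that occurs in only one comment, such as "idiot", is weighted more heavily within that comment. A classifier trained on such vectors would still need labelled data to learn which heavily weighted terms actually indicate abuse.
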
The core methodology for building a model to classify an account as blocked or not is to first train candidate models on the data corresponding to labelled account blocks, talk page comments, and their corresponding features, and then choose a model after testing each of them with cross-validation and comparing model metrics such as R-squared, ROC, AUC, etc. The finalized model will be applied to the test dataset. As the dataset is time series in nature (each activity is associated with a timestamp), a probability “at-risk” score will be calculated for each user using the n-nearest user comments/edits. Two reasonable thresholds over this score will then be chosen, which will determine the classification of a user account as a confirmed block, an “at risk” account, or a normal account.
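
The two-threshold classification described above might be sketched as follows. This is a minimal sketch under stated assumptions, not the project's implementation: it reads "n-nearest" as the n most recent edits, takes the at-risk score to be the mean per-edit toxicity over that window, and uses placeholder threshold values. The names `Edit`, `at_risk_score`, and `classify` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Edit:
    timestamp: datetime
    toxicity: float  # assumed per-edit score in [0, 1] from a toxicity model


def at_risk_score(edits, n=3):
    """Mean toxicity of the user's n most recent edits (0.0 if there are none)."""
    recent = sorted(edits, key=lambda e: e.timestamp)[-n:]
    if not recent:
        return 0.0
    return sum(e.toxicity for e in recent) / len(recent)


def classify(score, at_risk_threshold=0.5, block_threshold=0.8):
    """Map a score to one of three labels using two placeholder thresholds."""
    if score >= block_threshold:
        return "confirmed block"
    if score >= at_risk_threshold:
        return "at risk"
    return "normal"


# Hypothetical edit history: one calm early edit followed by hostile ones.
history = [
    Edit(datetime(2018, 9, 1), 0.1),
    Edit(datetime(2018, 9, 2), 0.9),
    Edit(datetime(2018, 9, 3), 0.9),
    Edit(datetime(2018, 9, 4), 0.9),
]
label = classify(at_risk_score(history, n=3))
```

Because only the most recent n edits enter the score, a user whose behavior deteriorates over time is flagged even if earlier activity was constructive. In practice the thresholds would be tuned on held-out data rather than fixed by hand.
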


Monday 27 August 2018
Students select research projects from an available pool
Monday 3 September 2018
Confirmation of match between students and projects
Friday 28 September 2018
Proposal presentation
May 2019
Project ends




  1. Research Proposal
  2. Quarterly Research Progress Presentations
  3. Data Product - A model that detects abusive content in the Wikipedia user community and a model that predicts and flags users who are at risk of getting blocked in the future.
  4. Technical Paper published in the IEEE SIEDS 2019 journal
  5. Research Poster for IEEE SIEDS 2019
  6. Presentation of research at the IEEE SIEDS Conference in Charlottesville, Virginia
  7. Presentation of research at the Applied Machine Learning Conference, Tom Tom Summit 2019
  8. Other media
  9. Research Paper - File:Automatic Detection of Online Abuse and Analysis of Problematic Users in Wikipedia preprint.pdf
  10. Paper Detailing Ethical Implications of this research
  11. Data and Code Artefacts
  12. Powerpoint Presentation - Automatic Detection of Online Abuse in Wikipedia
  13. Video presentation Automated detection of Wikipedia misconduct!
  14. bluerasberry (31 July 2019). "Most influential medical journals; detecting pages to protect". The Signpost. 



Research Team


The research team for this project comprised three students: Charu Rawat, Arnab Sarkar, and Sameer Singh. All three graduated with a master’s degree in Data Science from the Data Science Institute at the University of Virginia. The students worked independently on specific aspects of the project and also collaborated as a group to develop and produce the project deliverables. Each member of the team brought a diverse range of skills and ideas, all of which contributed towards achieving the goals outlined for this project.

Awards and Honours


This research was presented at the IEEE SIEDS 2019 Conference, Charlottesville, USA on 26 April 2019, where it won the Best Paper Award in the "Data Science for Society Track". The paper was also published in the IEEE SIEDS Journal on 13 June 2019. The research was also selected to be presented at the Applied Machine Learning Conference, Tom Tom Festival 2019 on 11 April 2019.



Automatic Detection of Online Abuse/notes

  1. Special:Blocklist does not function correctly and hasn't since 2014. Garbage In = Garbage Out. See T199174 at phabricator for details.