Research:NLP to reduce false positives in vandalism prediction

08:18, 15 March 2017 (UTC)
Duration:  2017-05 – 2017-07

This page is an incomplete draft of a research project.
Information is incomplete and is likely to change substantially before the project starts.

Since the Objective Revision Evaluation Service (ORES) was introduced to Wikipedia, the detection of bad-faith and damaging edits has become more efficient. ORES is more accurate than Huggle and STiki, the previous generation of Wikipedia's automated counter-vandalism tools. Still, a sizable number of false positives (sloppy edits made in good faith) are mistakenly labeled by ORES as made in bad faith. One reason is that the current machine learning models weigh heavily against anonymous editors. Advanced natural language processing (NLP) techniques may help improve accuracy and reduce false positives.

Reducing false positives is very important for growing the community of Wikipedia editors. As previous research has shown, a large share of the edits that an algorithm may take for vandalism are in fact the first attempts of recently joined editors. Discouraging these editors by mislabeling and disparaging their good-faith input lowers the survival rate of new editors. The output of this work would be a new NLP module for the current version of ORES that accounts for language patterns specific to intentionally misleading edits and thereby improves vandalism prediction.

Methods

We plan to invest in advanced NLP strategies aiming at improving the accuracy of the labeling. Potential directions of the research include:

  • Syntactic validation. The syntactic structure of damaging bad-faith edits is very often broken, which could make it a good feature for detecting them. We will parse the edited sentences with the syntactic parsers of NLP libraries such as NLTK and spaCy and flag glaringly malformed sentences.
  • Detecting out-of-context tokens. Many damaging bad-faith edits are out of context. Insertions of this type may be detected with classifiers based on context-aware statistical language models (Naive Bayes, recurrent neural networks, etc.). Another possibility is information-theoretic analysis, i.e., measuring the mutual information between the two revisions (before and after the edit).
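
As a rough illustration of the second direction, the sketch below scores the tokens an edit inserts by how surprising they are given the pre-edit text. It is a deliberately simple stand-in, not the planned method: it uses an add-one-smoothed unigram model rather than the Naive Bayes or recurrent models named above, and the tokenizer and all function names (`tokenize`, `inserted_tokens`, `out_of_context_score`) are hypothetical choices made for this example.

```python
import math
import re
from collections import Counter

# Assumption: a crude lowercase word tokenizer is good enough for a sketch.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


def inserted_tokens(before, after):
    """Tokens that occur more often in `after` than in `before`."""
    diff = Counter(tokenize(after)) - Counter(tokenize(before))
    return list(diff.elements())


def mean_surprisal(tokens, context_counts, vocab_size):
    """Average -log2 P(token) under an add-one-smoothed unigram model."""
    if not tokens:
        return 0.0
    total = sum(context_counts.values())
    bits = 0.0
    for tok in tokens:
        p = (context_counts[tok] + 1) / (total + vocab_size)
        bits += -math.log2(p)
    return bits / len(tokens)


def out_of_context_score(before, after):
    """Higher scores mean the inserted tokens fit the old revision less well."""
    context = Counter(tokenize(before))
    added = inserted_tokens(before, after)
    vocab_size = len(set(context) | set(added))
    return mean_surprisal(added, context, vocab_size)


if __name__ == "__main__":
    before = ("The cat is a small domesticated carnivorous mammal. "
              "The cat is kept as a pet.")
    on_topic = before + " The cat is a mammal."
    off_topic = before + " lol buttface rules"
    print(out_of_context_score(before, on_topic))
    print(out_of_context_score(before, off_topic))
```

A score like this would be only one input feature among many. In practice the language model would be estimated from far more text than a single revision, and the threshold separating in-context from out-of-context insertions would have to be tuned on labeled edits.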


We will evaluate our work in several ways, but we will focus on reducing false positives for at-risk populations, specifically anonymous editors and newcomers. We will establish whether strengthening the signal we get from NLP techniques can help mitigate the negative effects that the current machine prediction of vandalism has on these populations.
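
One way to make this evaluation concrete is to compute the false positive rate separately for each editor population: of the edits that human labelers judged to be good-faith, the fraction the model flagged as damaging. The sketch below assumes labeled records of the form (group, predicted damaging, actually damaging). The function name, the record layout, and the sample data are all hypothetical and are not drawn from ORES.

```python
from collections import defaultdict


def false_positive_rate_by_group(records):
    """Return {group: FP / (FP + TN)} over edits that were not damaging.

    `records` is an iterable of (group, predicted_damaging, actually_damaging).
    Groups with no good-faith edits are omitted from the result.
    """
    false_positives = defaultdict(int)
    negatives = defaultdict(int)
    for group, predicted, actual in records:
        if not actual:  # only good-faith edits can be false positives
            negatives[group] += 1
            if predicted:
                false_positives[group] += 1
    return {g: false_positives[g] / n for g, n in negatives.items()}


if __name__ == "__main__":
    # Made-up records, for illustration only.
    sample = [
        ("anonymous", True, False),
        ("anonymous", False, False),
        ("anonymous", True, True),
        ("registered", False, False),
        ("registered", False, False),
        ("registered", True, True),
        ("newcomer", True, False),
    ]
    for group, rate in false_positive_rate_by_group(sample).items():
        print(group, rate)
```

Comparing these per-group rates before and after adding NLP features would show whether the gap between anonymous or new editors and established editors actually narrows.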

Expected Impact

Results from this work:

  • will directly impact the ability of people to contribute to Wikipedia without unnecessary scrutiny and prejudice
  • will be publishable in venues that are interested in work patterns in Wikipedia and sensitive to issues of fairness and justice

Timeline

Please provide in this section a short timeline with the main milestones and deliverables (if any) for this project.

Policy, Ethics and Human Subjects Research

It's very important that researchers do not disrupt Wikipedians' work. Please add to this section any consideration relevant to ethical implications of your project or references to Wikimedia policies, if applicable. If your study has been approved by an ethical committee or an institutional review board (IRB), please quote the corresponding reference and date of approval.

Results

Once your study completes, describe the results and their implications here. Don't forget to make status=complete above when you are done.

References