Research:Revision scoring as a service/Archived

21:23, 23 August 2014 (UTC)
Duration: 2014 – ??
This page documents a completed research project.

This page has been moved.
The text of this page has been moved to: mw:Wikimedia Scoring Platform team.
This research project has graduated to a team within the Technology Department at the Wikimedia Foundation. Please see mw:Wikimedia Scoring Platform team for up-to-date information.

The following information is kept for historical reasons.

Many of Wikipedia's most powerful tools rely on machine classification of edit quality. Regrettably, few of these tools publish a public API for consuming the scores they generate, and those that do are only available for English Wikipedia. In this project, we'll construct a public, queryable API of machine-classified scores for revisions. We believe that providing such a service will make it much easier to build powerful new wiki tools and to extend current tools to new wikis.

Contact us


We're an open project team. There are many ways you can get in contact with us.



Current tools


English Wikipedia has many machine learning classifiers that are applied to individual edits for quality control:

  • ClueBot NG is a powerful counter-vandalism bot that uses Bayesian language learning machine classification.
  • Huggle triages edits for human review locally, based on extremely simple metadata.
  • STiki calculates its own vandalism probabilities using metadata, consumes those of ClueBot NG, and makes both available as "queues" for human review in a GUI tool.

Availability of scores


All of these tools rely on machine-generated revision quality scores, yet obtaining such scores is not trivial in most cases. STiki is the only system that provides a queryable API to its internal scores. ClueBot NG provides an IRC feed of its scores, but no querying interface. Huggle is the only one of these tools that runs outside of English Wikipedia, but it produces no feed or querying service for its internal scores.

Importance of scores


This lack of a general, accessible revision quality scoring service hinders the development of new tools and the expansion of current tools to non-English wikis. For example, Snuggle uses STiki's web API to perform its own detection of good-faith newcomers. Porting a system like Snuggle to non-English wikis would require a similar queryable source of revision quality scores.

Scoring as a service


We can do better. In this project, we'll develop and deploy a general-purpose scoring service that provides access to quality scoring algorithms and pre-generated models through a queryable web API. We believe such a system would make it easier to develop new tools and to port current tools to new projects.





The system has four scoring models:

  • Reverted: This model is trained automatically on reverted and non-reverted edits.
  • Damaging: This model predicts whether an edit is damaging or not. It's trained on user-labelled damaging edits and is more accurate than the reverted model.
  • Good faith: This model predicts whether or not an edit was made in good faith.
  • wp10: This model rates an article on the wp10 rating scale.
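The "Reverted" model can be trained without human labelers because its labels come from the edit history itself. One common way to derive such labels, assumed here rather than described on this page, is identity-revert detection: when a revision restores the exact text of an earlier revision, the edits in between are treated as reverted. The SHA-1 comparison and the default look-back radius of 15 revisions below are illustrative choices, and `label_reverted` is a hypothetical helper, not part of any project code.

```python
import hashlib


def sha1(text: str) -> str:
    """Checksum of a revision's full text, used to spot identical page states."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def label_reverted(texts: list[str], radius: int = 15) -> list[bool]:
    """Return labels[i] == True if revision i was undone by an identity revert.

    Revision j is an identity revert if it restores the exact text of an
    earlier revision k (with k < j - 1); every revision strictly between k
    and j is labelled as reverted. At most `radius` intervening revisions
    are allowed (the radius is an assumed, illustrative limit).
    """
    checksums = [sha1(t) for t in texts]
    labels = [False] * len(texts)
    for j, checksum in enumerate(checksums):
        # Scan backwards from j - 2 to j - radius - 1 (but never below 0);
        # k == j - 1 would be a null edit, not a revert.
        for k in range(j - 2, max(j - radius - 2, -1), -1):
            if checksums[k] == checksum:
                for i in range(k + 1, j):
                    labels[i] = True
                break  # use the most recent matching state only
    return labels
```

For a history whose texts are `["A", "B", "C", "A", "D"]`, the fourth revision restores the first, so the second and third are labeled as reverted. These boolean labels would then serve as the training targets.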

Objective revision evaluation service (ORES)

Main article: ORES

The primary way that wiki tool developers will take advantage of this project is through a RESTful web service and scoring system we call ORES. ORES generates scores for revisions on request. For example, a single request can ask for the score of the "reverted" model for revision #34854258 in English Wikipedia.
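A client for such a request can be sketched in Python using only the standard library. The base URL `https://ores.wikimedia.org/v3/scores`, the `models`/`revids` query parameters and the nested response layout are assumptions modeled on a later version of the public ORES API, not details given on this page, and the endpoint may no longer be live. `fetch_score` needs network access, so the parsing step is kept separate and can be exercised against a sample dictionary.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL (ORES v3 style); illustrative only.
ORES_BASE = "https://ores.wikimedia.org/v3/scores"


def score_url(wiki: str, model: str, rev_id: int) -> str:
    """Build a URL asking one model to score one revision on one wiki."""
    query = urllib.parse.urlencode({"models": model, "revids": rev_id})
    return f"{ORES_BASE}/{wiki}/?{query}"


def extract_probability(response: dict, wiki: str, model: str, rev_id: int) -> float:
    """Pull P(true) for the model out of an assumed v3-style response body."""
    score = response[wiki]["scores"][str(rev_id)][model]["score"]
    return score["probability"]["true"]


def fetch_score(wiki: str, model: str, rev_id: int) -> float:
    """Perform the HTTP request; requires network access and a live service."""
    with urllib.request.urlopen(score_url(wiki, model, rev_id)) as resp:
        body = json.load(resp)
    return extract_probability(body, wiki, model, rev_id)
```

For the example above, `score_url("enwiki", "reverted", 34854258)` yields `https://ores.wikimedia.org/v3/scores/enwiki/?models=reverted&revids=34854258`. Keeping URL construction and parsing as pure functions lets a tool test its integration without calling the service.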

Revscoring library


To support ORES, and to serve Python developers who would rather apply revision scoring models directly, we provide a high-quality Python library designed to make it easy to construct new, powerful scoring strategies.
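A scoring strategy commonly has two parts: a feature extractor that turns a revision into numbers, and a trained model that maps those numbers to a probability. The sketch below shows that shape without the library. Everything in it is hypothetical: the `Revision` class, the three features, the bad-word list and the logistic weights are illustrative stand-ins, not the library's API, and a real model would learn its weights from labeled revisions.

```python
import math
import re
from dataclasses import dataclass

# Hypothetical word list, for illustration only.
BAD_WORDS = {"stupid", "dumb", "lol"}


@dataclass
class Revision:
    """Minimal stand-in for a revision and its parent (not a library type)."""
    parent_text: str
    text: str
    user_is_anon: bool


def extract_features(rev: Revision) -> list[float]:
    """Turn a revision into a fixed-length numeric feature vector."""
    added_chars = max(len(rev.text) - len(rev.parent_text), 0)
    words = re.findall(r"[a-z']+", rev.text.lower())
    parent_words = re.findall(r"[a-z']+", rev.parent_text.lower())
    # Net number of listed bad words introduced by this edit.
    badword_delta = sum(w in BAD_WORDS for w in words) - sum(
        w in BAD_WORDS for w in parent_words
    )
    return [float(rev.user_is_anon), math.log1p(added_chars), float(max(badword_delta, 0))]


# Hypothetical, hand-picked weights; a trained model would fit these.
WEIGHTS = [1.2, 0.1, 2.5]
BIAS = -3.0


def score(features: list[float]) -> float:
    """Logistic model: estimated probability that the edit is damaging."""
    z = BIAS + sum(w * x for w, x in zip(WEIGHTS, features))
    return 1.0 / (1.0 + math.exp(-z))
```

With these made-up weights, an anonymous edit to `"Hello world."` that appends `" lol stupid"` scores roughly 0.97, while a registered user's edit appending `" More text."` scores roughly 0.06. Separating features from the model means new features can be reused across many models and wikis.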

Wiki labels


Main article: Wiki labels

Most models will need to be trained on a per-wiki, per-language basis. If a community wants scores for its wiki, we would ask it to provide a random sample of labeled revisions with which we can train and test new models. To make this easier, we are building a human computation interface for crowd-sourcing this type of data. Because labeling is a common need, we are designing the system so that it generalizes to a wide range of hand-coding and labeling problems.
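Drawing the random sample and holding part of it out for testing can be sketched as follows. The helper name `sample_and_split`, the 20% test fraction and the fixed seed are assumptions for illustration, not the project's actual sampling protocol.

```python
import random


def sample_and_split(rev_ids, sample_size, test_fraction=0.2, seed=0):
    """Draw a random sample of revision IDs and split it into train/test sets.

    A seeded generator keeps the sample reproducible, so the same revisions
    can be handed to labelers and later re-used for training and testing.
    """
    rng = random.Random(seed)
    sample = rng.sample(list(rev_ids), sample_size)
    n_test = round(sample_size * test_fraction)
    # Labeled revisions in the first slice are held out for evaluation only.
    return sample[n_test:], sample[:n_test]
```

Calling `sample_and_split(range(1000), 100)` returns 80 training IDs and 20 disjoint test IDs. Holding out a test set keeps the accuracy estimate for a new wiki's model honest, since the model never sees those labels during training.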



Tools that use ORES


Other possible uses


See also