Research:Revert
A revert is a type of edit which removes the effects of a previous edit. This action typically results in the article being restored to a version that existed sometime previously. A partial revert involves reversing only part of a prior edit, while retaining other parts of it. In Wikipedia, reverts are commonly used to remove inappropriate changes to articles. This page describes types of reverting actions and revert detection methods.
Types of reverting actions
Identity revert
An identity revert is an edit to an article that creates a new revision exactly matching a previous revision, thereby removing the changes made by any intervening edits. According to work by Kittur et al. [1] and Flöck et al. [2], this is the most common type of revert. Yasseri et al. have used this definition to devise a measure of controversy [3].
For example, in the sequence of revisions below (1-3), revision #3 reverts revision #2 by creating an exact copy of revision #1:
- "This is an article"
- "This is not an article"
- "This is an article"
In this case, revision #3 is referred to as the reverting revision, revision #2 is the reverted revision and #1 is the reverted-to revision.
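Identity reverts can be found mechanically by comparing checksums of revision content. The following is a minimal sketch, assuming the full text of every revision is available in chronological order; the function names are hypothetical, and unlike the History-based code further down this page it keeps an unbounded history. Applied to the three example revisions, it reports revision #3 as reverting #2 back to #1.

```python
import hashlib


def sha1(text):
    """Hex SHA-1 digest of a revision's text."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def find_identity_reverts(texts):
    """Return (reverting, [reverted...], reverted_to) tuples of 1-based revision numbers.

    A revision is an identity revert if its checksum matches an earlier revision
    that is not its immediate predecessor; the revisions in between are reverted.
    """
    reverts = []
    last_seen = {}  # checksum -> 0-based index of the most recent revision with it
    for i, text in enumerate(texts):
        digest = sha1(text)
        if digest in last_seen and last_seen[digest] < i - 1:
            j = last_seen[digest]
            reverted = list(range(j + 2, i + 1))  # 1-based numbers strictly between j and i
            reverts.append((i + 1, reverted, j + 1))
        last_seen[digest] = i
    return reverts


revisions = ["This is an article", "This is not an article", "This is an article"]
print(find_identity_reverts(revisions))  # [(3, [2], 1)]
```

A revision whose checksum equals that of the immediately preceding revision is treated as a no-op rather than a revert, since it undoes nothing.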
Full revert plus changes
Another possible editing pattern is for an editor to restore an old revision, as in an identity revert, but to make a further change before saving it. The intermediate edit's contributions are thus fully reverted, but no identical revision is produced.
For example:
- "This is an article"
- "This is not an article"
- "This is an encyclopedia article"
Revision #3 removes all changes made by revision #2 (the word "not"), but it also adds the word "encyclopedia". In this case, revision #3 is clearly a reverting revision and revision #2 a reverted revision, but revision #1 was not exactly reverted-to.
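One crude way to recognize this pattern is to treat each revision as a bag of words: revision #3 counts as a full revert plus changes if it undoes every word-level change that revision #2 made to revision #1 while still differing from revision #1. This is a simplified sketch with hypothetical helper names, not the diff-based algorithm of Flöck et al. [2]; because it ignores word order and position, it can be fooled when the same word appears in several places.

```python
from collections import Counter


def word_changes(old, new):
    """Bag-of-words diff: (words added, words removed) when going from old to new."""
    old_words, new_words = Counter(old.split()), Counter(new.split())
    return new_words - old_words, old_words - new_words


def is_full_revert_plus_changes(before, target, after):
    """True if `after` undoes every word change `target` made to `before`,
    yet is not an exact copy of `before` (it also makes changes of its own)."""
    added, removed = word_changes(before, target)
    if not added and not removed:
        return False  # target changed no words, so there is nothing to revert
    after_added, after_removed = word_changes(target, after)
    # Every word target added must be removed again, and every word it removed restored
    fully_undone = not (added - after_removed) and not (removed - after_added)
    return fully_undone and after != before


print(is_full_revert_plus_changes(
    "This is an article",
    "This is not an article",
    "This is an encyclopedia article",
))  # True
```

An exact identity revert returns False here, because it produces no changes of its own beyond undoing the target edit.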
Partial revert
A partial revert refers to an edit that removes some part of a change made by another revision, but not the entire change.
For example:
- "This is an article"
- "This is not an encyclopedia article"
- "This is an encyclopedia article"
In this case, revision #2 makes two changes, adding the words "not" and "encyclopedia" to the article. Revision #3 removes only the word "not". Revision #3 is therefore a partially reverting revision and revision #2 was partially reverted, but revision #1 was not exactly reverted-to.
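Partial reverts can be graded rather than just flagged. The sketch below, a hypothetical bag-of-words simplification rather than an established method, computes the fraction of the word-level changes made by a target revision that a later revision undoes: 1.0 means all of them (a full revert, possibly plus changes), a value strictly between 0 and 1 indicates a partial revert, and 0.0 means none. For the example above, revision #3 undoes one of the two words added by revision #2, giving 0.5.

```python
from collections import Counter


def revert_fraction(before, target, after):
    """Fraction of `target`'s word-level changes to `before` that `after` undoes."""
    b, t, a = (Counter(text.split()) for text in (before, target, after))
    added, removed = t - b, b - t              # changes made by target
    after_added, after_removed = a - t, t - a  # changes made by after
    # A word added by target and then removed by after (or vice versa) is undone
    undone = (added & after_removed) + (removed & after_added)
    total = sum(added.values()) + sum(removed.values())
    return sum(undone.values()) / total if total else 0.0


print(revert_fraction(
    "This is an article",
    "This is not an encyclopedia article",
    "This is an encyclopedia article",
))  # 0.5
```

Because only word counts are compared, a word removed from a different position than the one the target edit touched is still counted as undone; position-aware approaches such as the one in [2] avoid this.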
Detection
Edit tags
The MediaWiki software automatically adds edit tags to reverts performed using the "undo" (examples) and "rollback" (examples) features. In 2020, an additional "manual revert" tag was introduced to mark reverts made without either of these two features (examples); see phab:T256001 for details. Like any edit tag, these can also be queried in the change_tag database table (for Wikimedia projects, see also Research:Data regarding database access).
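As a sketch of how these tags might be used programmatically, the snippet below builds a MediaWiki Action API query for a page's revisions with their tags and labels each revision. The tag names (mw-undo, mw-rollback, mw-manual-revert, and mw-reverted for the reverted-edit tag) are assumed to be MediaWiki's internal names for these built-in tags and should be checked against Special:Tags on the target wiki; the helper functions are hypothetical. The network request itself is left out so that the parsing can be tried on a sample response.

```python
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"
# Assumed internal names of MediaWiki's built-in revert-related tags (check Special:Tags)
REVERTING_TAGS = {"mw-undo", "mw-rollback", "mw-manual-revert"}
REVERTED_TAG = "mw-reverted"


def revisions_query_url(title, limit=50):
    """Build an Action API URL listing a page's revisions together with their tags."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids|tags",
        "rvlimit": limit,
        "format": "json",
        "formatversion": 2,  # pages come back as a list instead of a dict keyed by page id
    }
    return API_URL + "?" + urlencode(params)


def label_revisions(response):
    """Map each revid to 'reverting', 'reverted', 'reverting+reverted' or None."""
    labels = {}
    for page in response["query"]["pages"]:
        for rev in page.get("revisions", []):
            tags = set(rev.get("tags", []))
            kinds = []
            if tags & REVERTING_TAGS:
                kinds.append("reverting")
            if REVERTED_TAG in tags:
                kinds.append("reverted")
            labels[rev["revid"]] = "+".join(kinds) or None
    return labels


# Hand-written sample shaped like a formatversion=2 response (not real data)
sample = {"query": {"pages": [{"title": "Example", "revisions": [
    {"revid": 3, "tags": ["mw-undo"]},
    {"revid": 2, "tags": ["mw-reverted"]},
    {"revid": 1, "tags": []},
]}]}}
print(label_revisions(sample))  # {3: 'reverting', 2: 'reverted', 1: None}
```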
Researchers have also developed various methods for detecting reverts from the revision history itself, several of them well before these tags existed:
Identity revert via checksum with history
Research by Kittur et al. suggests that 94% of reverts can be detected by matching the MD5 checksum of each revision's content against the checksums of earlier revisions [1]. Yasseri et al. detected some 5 million reverts using this method in their study of edit wars [3].
Python code:

```python
# `revisions` must hold the page's revisions in chronological (oldest-first) order,
# each a dict with at least 'revid' and 'sha1', e.g. the "revisions" list returned by
# http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=User:EpochFail&rvprop=ids|sha1&format=json&rvdir=newer&rvlimit=500
revisions = []


class History:
    '''
    A data structure for efficiently storing and retrieving a
    limited number of historical records
    '''

    def __init__(self, maxlen=15):
        '''maxlen specifies the maximum amount of history to keep'''
        self.maxlen = maxlen
        self.d = {}  # Dictionary to allow fast lookup based on keys
        self.l = []  # List to preserve order for history

    def add(self, key, value):
        '''Adds a new key-value pair. Returns the discarded value, if any.'''
        self.l.append((key, value))
        self.d.setdefault(key, []).append(value)
        if len(self.l) > self.maxlen:
            # Drop the oldest record from both the list and the dictionary
            okey, ovalue = self.l.pop(0)
            self.d[okey].pop(0)
            if len(self.d[okey]) == 0:
                del self.d[okey]
            return ovalue
        return None

    def __contains__(self, key):
        '''Checks if the key is contained in the history using the "in" keyword'''
        return key in self.d

    def get(self, key):
        '''Gets the most recently added value for a key'''
        return self.d[key][-1]

    def upTo(self, key):
        '''Yields values added after the most recent occurrence of key, newest first'''
        for okey, ovalue in reversed(self.l):
            if okey == key:
                break
            yield ovalue


history = History(15)  # History capped at 15 revisions (common practice)
for rev in revisions:
    if 'sha1' not in rev:
        continue  # Content hidden (e.g. by revision deletion): no checksum to compare
    if rev['sha1'] in history:  # Identity revision found in history
        reverted = list(history.upTo(rev['sha1']))
        if len(reverted) > 0:  # Found reverted revisions
            print("reverting: %s, reverted: %s, reverted-to: %s" % (
                rev['revid'],
                [r['revid'] for r in reverted],
                history.get(rev['sha1'])['revid']
            ))
        # Otherwise a no-op: same checksum as the previous revision
    history.add(rev['sha1'], rev)
```
Revert patterns
Revert patterns in the edit history can be used to judge whether a revert did indeed remove inappropriate changes to an article. Research by Kiesel et al. [4] suggests that 6% of all identity reverts are so-called pseudo reverts (to a blank page or to the previous revision). They also analyzed cases in which it is unclear whether inappropriate changes were removed, for example when editors revert their own work (9%) or when reverts are part of edit wars (11%).
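The function below is a rough, hypothetical sketch of how the pseudo-revert and self-revert cases described above might be flagged once an identity revert has been detected. The field names ('user', 'size') and the rules themselves are assumptions for illustration, not Kiesel et al.'s actual method; edit-war detection, which requires looking at patterns across many reverts, is not attempted.

```python
def classify_identity_revert(reverting, reverted, reverted_to):
    """Roughly label an identity revert (hypothetical rules, for illustration).

    Each revision is a dict with 'user' and 'size' (page length in bytes);
    `reverted` lists the revisions between `reverted_to` and `reverting`.
    """
    if reverted_to["size"] == 0:
        # Checksum matched an earlier empty revision, e.g. the page was blanked again
        return "pseudo revert: to a blank page"
    if not reverted:
        # Same content as the immediately preceding revision: nothing is undone
        return "pseudo revert: to the previous revision"
    if all(r["user"] == reverting["user"] for r in reverted):
        return "self-revert"
    return "other"


print(classify_identity_revert(
    {"user": "Alice", "size": 120},
    [{"user": "Alice", "size": 150}],
    {"user": "Bob", "size": 120},
))  # self-revert
```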
Full and partial revert detection via diffing
Diffing-based strategies have been developed for detecting partial reverts and full reverts plus changes [2]. These strategies increase the accuracy of revert detection at the cost of performance, because computing the differences between revisions is far more expensive than comparing checksums. While Kittur et al.'s work suggests that only 6% of reverts are not identifiable via identity match [1], more recent work by Flöck et al. suggests that the diff-based method identifies 12% more reverts ("full reverts plus changes") than the checksum method alone, and can additionally identify partial reverts that full-revision checksum approaches cannot detect [2].
See [2] for Python code demonstrating such a strategy.
Cutoffs for time to revert and edit radius
Since a revision could in principle be reverted years after it was originally saved, observations taken at any point in time necessarily miss reverts that have not happened yet (the data are right-censored). To minimize this issue and compare editors' contributions fairly, it is necessary to choose a cutoff time and count only reverts that occurred within that period after the original edit. 48 hours is a common cutoff, as research suggests that, at least on the English Wikipedia, nearly all reverts take place within 48 hours [5]. It is also common practice to count only reverts that happen within a certain number of subsequent edits on the same page. In the mwreverts package this is called the "radius", with a default value of 15, though at least one example analysis, and other analyses based on it, have used a lower value of 5.
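The two cutoffs can be applied as a filter on reverts that have already been detected. The sketch below assumes chronological revisions carrying a datetime 'timestamp' and uses the 48-hour window and a radius of 15 edits as defaults; the functions are hypothetical, and the radius is measured here as the number of edits from the reverted to the reverting revision, which approximates but is not necessarily identical to how mwreverts applies its radius.

```python
from datetime import datetime, timedelta


def parse_mw_timestamp(ts):
    """Parse an API-style timestamp such as '2020-01-01T12:00:00Z'."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")


def within_cutoffs(revisions, reverted_idx, reverting_idx,
                   max_age=timedelta(hours=48), radius=15):
    """Return True if a detected revert passes both the time and edit-radius cutoffs.

    `revisions` is the page history in chronological order; indices refer to it.
    """
    edits_later = reverting_idx - reverted_idx  # edits from the reverted to the reverting one
    elapsed = (revisions[reverting_idx]["timestamp"]
               - revisions[reverted_idx]["timestamp"])
    return 0 < edits_later <= radius and elapsed <= max_age


history = [
    {"revid": 1, "timestamp": parse_mw_timestamp("2020-01-01T00:00:00Z")},
    {"revid": 2, "timestamp": parse_mw_timestamp("2020-01-01T06:00:00Z")},
    {"revid": 3, "timestamp": parse_mw_timestamp("2020-01-04T00:00:00Z")},
]
print(within_cutoffs(history, 0, 1))  # True: 1 edit and 6 hours later
print(within_cutoffs(history, 0, 2))  # False: 72 hours later
```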
Edit tag for reverted edits
In 2020, an edit tag marking reverted edits (as opposed to reverting edits) was introduced in MediaWiki (examples); see phab:T254074 for details.
Data sources
Several datasets can already be used for revert identification, and more are always in demand:
- wikitech:Analytics/Data Lake/Edits/Mediawiki history has fields such as "revision_is_identity_reverted".
- Yasseri et al. provide about 5 million identity reverts, detected by analysing the text of the articles.
See also
- Research:Content persistence, in some senses a generalization
- mw:Manual:Reverts ("This page serves as an overview of various types of reverts supported and recognized by MediaWiki, mainly for developers.")
- The "Global revert rate" (across all Wikimedia projects) is one of the Wikimedia Foundation's core metrics, updated monthly under mw:Wikimedia_Product#Metrics (6.7% as of January 2020).
- Wikistats reports on article revert trends (2010)
- Revert counts per Wikipedia project (2016)
- Edit & revert trends for top 50 Wikipedias (2016)
- en:Wikipedia:Reverting
- mwreverts Python library providing a set of utilities for detecting reverts (see also notebook demonstrating its usage on PAWS)
- phab:T152434 "Add method to Revision to check if it was a Revert, and whether an edit was Reverted"
- phab:T216297 "Develop method for identifying reverts in EventBus data"
- Search for coverage of research results about reverts in the archives of the Wikimedia Research Newsletter (2011-)
- Reverted contribution rate Quarry query
References
- [1] Aniket Kittur, Bongwon Suh, Bryan A. Pendleton & Ed H. Chi (2007). He says, she says: conflict and coordination in Wikipedia. In Proceedings of CHI '07, 453–462. DOI: 10.1145/1240624.1240698.
- [2] Fabian Flöck, Denny Vrandecic & Elena Simperl (2012). Reverts Revisited: Accurate Revert Detection in Wikipedia. In Proceedings of HT '12, June 25–28, 2012, Milwaukee, Wisconsin, USA.
- [3] T. Yasseri, R. Sumi, A. Rung, A. Kornai & J. Kertész (2012). Dynamics of conflicts in Wikipedia. PLoS ONE. DOI: 10.1371/journal.pone.0038869.
- [4] Johannes Kiesel, Martin Potthast, Matthias Hagen & Benno Stein (2017). Spatio-temporal Analysis of Reverted Wikipedia Edits. In Proceedings of ICWSM '17, 122–131.
- [5] R. Stuart Geiger & Aaron Halfaker (2013). When the Levee Breaks: Without Bots, What Happens to Wikipedia's Quality Control Processes? In Proceedings of WikiSym '13.
Further reading
- Fabian Flöck, Denny Vrandecic and Elena Simperl. Reverts Revisited – Accurate Revert Detection in Wikipedia. HT’12, June 25–28, 2012, Milwaukee, Wisconsin, USA. (The paper discusses different revert types and also contains an overview of the "state-of-the-art in revert detection". Review in the May 2012 Wikimedia Research Newsletter: "New algorithm provides better revert detection")
- Ekstrand, Michael D.; Riedl, John T. (2009-10-25). "rv you're dumb: identifying discarded work in Wiki article history". Proceedings of the 5th International Symposium on Wikis and Open Collaboration. WikiSym '09 (Orlando, Florida: Association for Computing Machinery): 1–10. ISBN 978-1-60558-730-1. doi:10.1145/1641309.1641317.
- Sumi, R., & Yasseri, T. (2011, October). Edit wars in Wikipedia. In 2011 IEEE Third International Conference on Privacy, Security, Risk and Trust and 2011 IEEE Third International Conference on Social Computing (pp. 724-727). IEEE.