Wikimedia Research Newsletter
Vol: 12 • Issue: 12 • December 2022

Graham's Hierarchy of Disagreement in talk page disputes.

By: Tilman Bayer

"How to disagree well: Investigating the dispute tactics used on Wikipedia"

Graham's hierarchy of disagreement

This paper,[1] presented earlier this month at the Empirical Methods in Natural Language Processing conference, applies a modified version of Graham's hierarchy of disagreement to classify talk page comments on the English Wikipedia. As explained by the authors:

"[English] Wikipedia recommends the hierarchy of disagreement formulated by Graham (2008) as a guide for constructive dispute resolution [in the w:Wikipedia:Dispute resolution policy]. Graham’s hierarchy posits that there are seven levels of disagreement, ranging from namecalling (at the bottom) to refuting the central point. [...] Despite its popularity, this hierarchy has not been verified empirically."

The authors call these levels "rebuttal tactics", and distinguish them from a second category of dispute tactics, "attempts to promote understanding and consensus (referred to as coordination tactics)." Coordination tactics are classified with a separate set of "non-disagreement labels", compiled from comment types identified in several previous research publications about Wikipedia talk pages (e.g. a paper by Ferschke et al. that was summarized in our March 2012 issue: "Understanding collaboration-related dialog in Simple English Wikipedia"):

  • "Bailing out" ("An indication that an editor is giving up on a conversation and will no longer engage.")
  • "Contextualisation" (where "an editor 'sets the stage' by describing which aspect of the article they are challenging. This does not directly disagree with anyone")
  • "Asking questions"
  • "Providing clarification"
  • "Suggesting a compromise"
  • "Coordinating edits" to the article page ("This can signal that a compromise has been found.")
  • "Conceding / recanting"
  • "I don’t know" (i.e. "Admitting that one is uncertain. This signals that an editor is receptive to the idea that there are unknowns which may impact their argument.")
  • "Other"

The authors provide a dataset "of 213 disputes (comprising 3,865 utterances) on Wikipedia Talk pages, manually annotated with the dispute tactics employed in the process of resolving a disagreement between editors", allowing multiple labels for each comment ("up to three rebuttal strategies and two resolution strategies per utterance", see examples below).

These discussions are drawn from the authors' own "WikiDisputes" dataset, "which is annotated according to whether the dispute was resolved without the need for a moderator." This allows the researchers to identify relations between specific dispute tactics and the risk of a conversation escalating. For example, they

find that a lower mean rebuttal level in a disagreement is correlated with less constructive dispute resolutions, providing empirical validation of the ordering proposed by Graham (2008) and recommended by Wikipedia to its editors.
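
As a rough illustration of the quantity being compared, the following minimal sketch computes a mean rebuttal level per dispute. It assumes each utterance is represented as a list of integer rebuttal levels (possibly empty) and that all labels in a dispute are pooled before averaging; the paper's exact aggregation may differ, and the function name and toy labels are hypothetical.

```python
from statistics import mean


def mean_rebuttal_level(dispute):
    """Average all rebuttal levels in one dispute.

    `dispute` is a list of utterances; each utterance is a list of integer
    rebuttal levels (empty if it carries only coordination labels).
    Assumption: labels are pooled across utterances before averaging.
    Returns None if the dispute has no rebuttal labels at all.
    """
    levels = [level for utterance in dispute for level in utterance]
    return mean(levels) if levels else None


# Toy disputes with made-up labels (not taken from the dataset).
toy_resolved = [[6], [5, 4], [], [5]]
toy_escalated = [[4], [1], [3], [1, 0]]

print(mean_rebuttal_level(toy_resolved))
print(mean_rebuttal_level(toy_escalated))
```

In this toy data the escalated dispute has the lower mean, which is the direction of the correlation reported in the paper.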

In particular, they examine the effect of personal attacks, finding e.g. that conversations can still recover after a personal attack happens:

"We define recovery in terms of having an utterance labeled as rebuttal level 5 or higher and no further personal attacks. By this definition, half of the disputes were found to recover after a personal attack, indicating that personal attacks do not necessarily result in conversational failure."


"Of the escalated disputes with personal attacks, only 44.3% are found to recover, whereas 59.2% of resolved disputes recover post attack. This indicates that although personal attacks also occur in non-escalated disputes, participants are better adept at moving beyond them. We further find that immediate retaliation (i.e. a personal attack being followed by another personal attack) occurred in 25.7% of cases. In disputes where at least one personal attack had occurred, the probability that the initial offender will re-offend in the same conversation is 53%, while the probability of another user using a personal attack at some point subsequently is 64%."
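
The recovery definition quoted above can be sketched as follows. This is one possible reading, not the authors' code: it assumes that rebuttal levels 0 and 1 count as personal attacks, that "no further personal attacks" is measured from the first attack onward, and that each utterance is a list of integer rebuttal levels. The function name and parameters are hypothetical.

```python
def recovers_after_attack(dispute, attack_levels=frozenset({0, 1}), recovery_level=5):
    """Check whether a dispute recovers after its first personal attack.

    `dispute` is a list of utterances, each a list of integer rebuttal levels.
    Assumptions: levels in `attack_levels` are personal attacks, and recovery
    means that, after the first attack, no later utterance contains an attack
    and at least one later utterance reaches `recovery_level` or higher.
    Returns None if the dispute contains no personal attack.
    """
    def is_attack(utterance):
        return any(level in attack_levels for level in utterance)

    first_attack = next(
        (i for i, utterance in enumerate(dispute) if is_attack(utterance)), None
    )
    if first_attack is None:
        return None

    later = dispute[first_attack + 1:]
    no_further_attacks = not any(is_attack(utterance) for utterance in later)
    reaches_level = any(
        level >= recovery_level for utterance in later for level in utterance
    )
    return no_further_attacks and reaches_level


# Toy examples with made-up labels.
print(recovers_after_attack([[5], [1], [4], [6]]))  # attack, then a level 6 reply
print(recovers_after_attack([[1], [5], [0]]))       # a second attack follows
```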

The study proceeds to use machine learning for automatically classifying talk page comments with these multi-labels. A BERT-based model performed best (according to three different performance metrics), but still struggled with some of the labels:

"The label most frequently correctly predicted is coordinating edits (111 of 137 cases), which is also the most common label in the training set. The next most correctly predicted label, proportionally, is contextualisation (75%, or 24 of 32 cases), despite not being a commonly used label. This is likely due to the additional positional information available to the model, since this label is often applied to the first utterance in a conversation. On the other hand, refutation and refuting the central point are never correctly predicted (out of 44 cases), with counterargument often mistakenly predicted instead."
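
To make the "proportionally" comparison concrete, the short sketch below turns the counts quoted above into shares of correctly predicted cases. Only the counts come from the quote; the dictionary layout and function name are illustrative, and the last entry pools the two labels that the quote reports together.

```python
# Counts taken from the quoted passage: (correctly predicted, total cases).
reported_counts = {
    "coordinating edits": (111, 137),
    "contextualisation": (24, 32),
    "refutation / refuting the central point": (0, 44),
}


def share_correct(correct, total):
    """Fraction of cases with a given label that were predicted correctly."""
    return correct / total


for label, (correct, total) in reported_counts.items():
    print(f"{label}: {correct}/{total} = {share_correct(correct, total):.1%}")
```

Coordinating edits comes out at roughly 81%, ahead of the 75% for contextualisation, which matches the ordering described in the quote.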

Lastly, they apply this to the separate task of predicting whether a conversation will escalate, already examined in their earlier paper that gave rise to the "WikiDisputes" dataset. Namely, they use "multitask training with escalation as the main task and tactics as the auxiliary task, such that the features that are predictive of dispute tactics are incorporated in the escalation predictions." This improves upon their earlier prediction algorithm, "indicating that knowledge of these dispute tactics is useful for tasks beyond classifying the tactics employed."

The following table (adapted from Figure 1 in the paper) shows the labeling of several comments by two different users in one talk page discussion:

"An example from the WikiTactics dataset"

| Utterance | Coordination tactic(s) | Rebuttal tactic(s) |
|---|---|---|
| The community put WP:ENGVAR in place exactly because there is no rational way to resolve a style dispute like this. The notion is that if English style X is established in article, don't change it without prior consensus. Without that [policy], articles would be beset by endless edit wars over style issues that would become a time sink across the encyclopedia. | Contextualisation | |
| Hi, I am aware of WP:ENGVAR and would like to point out to you the policy says that one should "use the variety found in the first post-stub revision that introduced an identifiable variety". In the case of this article, that is "a herb", which was introduced in the original article. I will leave the current wording for a few weeks to see if anyone else decides to weigh in, and intend to then change the page to align with policy. | Suggesting a compromise | DH6: Refutation |
| It is impossible to get local consensus on this kind of thing, which is why ENGVAR exists. Leave it alone, or waste the community's time with an RfC but stop wasting your time and mine making useless arguments here. I don't care if it says "an" or "a" - what is not acceptable is messing with it. | | DH4: Repeated argument; DH3: Policing the discussion |
| I admit that when I made those edits, I didn't realise it was actually a ENGVAR issue but rather just a mistake, hence my zeal in making the changes. To emphasise: the policy exists to unambiguously resolve these debates and for this article, it should be "a herb". I see no real arguments for the contrary, and for what it's worth, my having made policy-incorrect edits (in good faith), doesn't diminish the fact that policy is clear on this one. | Conceding / recanting | DH4: Repeated argument |
| I have warned you to walk away from being a style warrior and wasting everyone's time. You will do as you will. | | DH1: Ad hominem attack |
| No one further has weighed in on this and so I am making the change in accordance with policy, as I have done on each of the herb-related pages. Each of these articles is now in accordance with WP:ENGVAR. Please do not edit it without an RFC or DR. We are now within the spirit and letter of policy on each of these pages and I hope we can draw a line under this ridiculous matter. | Coordinating edits | DH3: Policing the discussion |


Other recent publications

Other recent publications that could not be covered in time for this issue include the items listed below. Contributions, whether reviewing or summarizing newly published research, are always welcome.

"Analyzing Digital Discourses: Between Convergence and Controversy"

From the abstract:[2]

"This study analyses Wikipedia’s sites for negotiating convergence, conflict and identity, concentrating on two aspects. First, convergence and conflict at the macro-level of intercultural comparison are investigated using the example of the construction of concepts of nationalism, citizenship, identity and tribe in their English and German language versions. Second, the English articles serve as a basis to examine the types of convergence and conflict tendencies at the micro-level of the Talk-section."

From the paper's section on talk pages:

"[...] in our data, criticism of content (81 instances/31% of all 259 conflictual codings) is the most frequent conflictual category [...], followed by general metapragmatic criticism concerning clarity and more general stylistic features [...], metapragmatic criticism related to Wikipedia's principles (each comprising about half of the total of 81 metapragmatic tokens), or a mixture of both [...].

Giving reasons for disagreeing is the mitigating strategy used most frequently in all four Talk-sections, followed by suggesting, inviting and hedged imperatives to induce further improvement of an article, agreement and additional explanation to clarify an issue [...]."

A Discursive Perspective on Wikipedia: More than an Encyclopaedia? (book)

From the publisher's description:[3]

"This book provides a concise yet comprehensive guide to Wikipedia for researchers and students of linguistics, discourse and communication studies [...]. Drawing on Herring's situational and medium factors, as well as related developments in (critical) discourse studies, the author studies the online encyclopaedia both theoretically and empirically, examining its origins, production and consumption before turning to a discussion of its societal significance and function(s)."

"What’s hot and what's not in lay psychology: Wikipedia’s most-viewed articles"

From the abstract:[4]

"We studied views of articles about psychology on 10 language editions of Wikipedia from July 1, 2015, to January 6, 2021. We were most interested in what psychology topics Wikipedia users wanted to read, and how the frequency of views changed during the COVID-19 pandemic and lockdowns. [...]. We made two important observations. The first was that during the pandemic, people in most countries looked for new ways to manage their stress without resorting to external help. [...] We also found that academic topics, typically covered in university classes, experienced a substantial drop in traffic, which could be indicative of issues with remote teaching."

"Building a Public Domain Voice Database for Odia"

From the abstract and paper:[5]

"The pilot detailed in this paper is about creating a large freely-licensed public repository of transcribed speech in the Odia language as such a repository was not known to be available. The strategy and methodology behind this process are based on the OpenSpeaks project [which is hosted on the English Wikiversity].

"The 'Methodology' section details the process of collecting words [from a dump of Odia Wikipedia], compiling a wordlist [making use of Wikidata lexeme forms to generate additional forms], recording the pronunciation of those words, and uploading the speech data to Wikimedia Commons using Lingua Libre."


  1. Kock, Christine De; Vlachos, Andreas (December 2022). "How to disagree well: Investigating the dispute tactics used on Wikipedia". Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. pp. 3824–3837.
  2. Kleinke, Sonja; Landmann, Julia (2021). "Cross-Cultural Observations on English and German Wikipedia Entries at the Interface of Convergence and Controversy". In Johansson, Marjut; Tanskanen, Sanna-Kaisa; Chovanec, Jan. Analyzing Digital Discourses: Between Convergence and Controversy. Cham: Springer International Publishing. pp. 135–162. ISBN 9783030846022.
  3. Kopf, Susanne (2022). A Discursive Perspective on Wikipedia: More than an Encyclopaedia?. Cham: Springer International Publishing. ISBN 9783031110238.
  4. Ciechanowski, Kaśmir; Banasik-Jemielniak, Natalia; Jemielniak, Dariusz (2022-10-12). "What’s hot and what's not in lay psychology: Wikipedia’s most-viewed articles". Current Psychology. ISSN 1936-4733. doi:10.1007/s12144-022-03826-0. 
  5. Panigrahi, Subhashish (2022-04-25). "Building a Public Domain Voice Database for Odia" (PDF). Companion Proceedings of the Web Conference 2022. WWW '22. New York, NY, USA: Association for Computing Machinery. pp. 1331–1338. ISBN 9781450391306. doi:10.1145/3487553.3524931. 
