Research:Wikipedia Edit Types/Content Maintenance

Tracked in Phabricator:
Task T334760

Editing on Wikipedia is often treated as synonymous with content creation, but many productive edits add no new facts and instead maintain and curate existing content. Automatically distinguishing between these high-level actions (or intents) is not easy, which makes it difficult to quantify and account for the work that goes into maintaining Wikipedia content. This project seeks to develop methods for assigning the extracted edit types to these high-level categories of contribution types. This will enable us to better understand the balance of work on Wikipedia and to identify gaps and challenges in maintaining content.

Many taxonomies of contribution types that touch on the difference between content generation and other forms of edits have been created (see for example this list of related work). A few goals guide the taxonomy used in this project:

  • Simplicity: we will stick to a relatively simple taxonomy so that we can reasonably map the edit types to these contribution types with rules, without training complex models. Future work may explore richer taxonomies of contribution types or editor intentions, as in Yang et al.[1]
  • Coverage: the taxonomy should reasonably cover any edit to Wikipedia. It does not need to cover log actions however.
  • Creation vs. Maintenance: the main goal of the taxonomy is to bring data to the question of how much edit activity is content creation as opposed to the many tasks required to support that content. We also consider complementary taxonomies, however, such as those related to the size, perceived difficulty, or impact of a given edit.

Contribution Types

The high-level contribution types considered in this work are:

  • Content creation: adding new facts to a Wikipedia article, for example through new template parameters or new sentences.
  • Content annotation: creating linkages through Wikimedia to help curate or add metadata to the article.
  • Content maintenance: reworking existing content without adding new information.
  • Vandalism / Patrolling: edits that are part of reverts and so don't affect the page content.

Note that contribution types are not mutually exclusive: for example, a single edit could add both a new sentence (creation) and a category (annotation).
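Because the taxonomy is meant to be applied with simple rules rather than a trained model, the mapping can be sketched in a few lines of Python. This is only an illustration: the edit-type labels, the `is_revert` flag, the rule sets, and the function name are all assumptions made for the example, not the project's actual vocabulary or implementation.

```python
from typing import Iterable, Set, Tuple

# Hypothetical (node type, action) pairs. These labels are illustrative
# assumptions, not the edit-type vocabulary actually used by the project.
CREATION_RULES = {("Sentence", "insert"), ("Template parameter", "insert")}
ANNOTATION_NODES = {"Category", "Wikilink"}


def contribution_types(
    edit_types: Iterable[Tuple[str, str]], is_revert: bool = False
) -> Set[str]:
    """Map the edit types of one edit to a set of contribution types.

    A set is returned because the categories are not mutually exclusive.
    """
    # Edits that are part of a revert are treated as patrolling only.
    if is_revert:
        return {"vandalism/patrolling"}

    result = set()
    for node, action in edit_types:
        if (node, action) in CREATION_RULES:
            result.add("content creation")
        elif node in ANNOTATION_NODES:
            result.add("content annotation")
        else:
            # Anything else reworks existing content (assumed default).
            result.add("content maintenance")
    return result


# An edit that inserts a sentence and a category gets two labels.
labels = contribution_types([("Sentence", "insert"), ("Category", "insert")])
```

Returning a set keeps the multi-label nature of the taxonomy explicit; treating every unmatched edit type as maintenance is one possible way to satisfy the coverage goal, not necessarily the one used here.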

Initial Results

See task T334760 for some initial results of applying these taxonomies to a subset of edits from French Wikipedia. Some high-level takeaways:

  • Content creation occurs in only about 20% of edits.
  • Overall, fewer than half of edits change the text of the article; the share ranges from 40% of edits by experienced editors (100+ edits) to 60% of edits by IP editors and newcomers (<10 edits).
  • Content creation is more common on desktop than on mobile interfaces.
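A breakdown by editor experience like the one above could be computed roughly as follows. This is a minimal sketch: the record fields (`edit_count`, `is_ip`, `changes_text`), the function names, and the "intermediate" group for editors with 10 to 99 edits are assumptions for illustration; only the <10 and 100+ thresholds come from the takeaways above.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def editor_group(edit_count: int, is_ip: bool) -> str:
    """Bucket an editor by prior edit count (thresholds from the text)."""
    if is_ip:
        return "IP"
    if edit_count < 10:
        return "newcomer"
    if edit_count >= 100:
        return "experienced"
    # Assumed label: the source does not name the 10-99 edits range.
    return "intermediate"


def share_changing_text(edits: Iterable[Mapping]) -> Dict[str, float]:
    """Return, per editor group, the fraction of edits that change text."""
    totals = defaultdict(int)
    changed = defaultdict(int)
    for edit in edits:
        group = editor_group(edit["edit_count"], edit["is_ip"])
        totals[group] += 1
        if edit["changes_text"]:
            changed[group] += 1
    return {group: changed[group] / totals[group] for group in totals}


# Toy records for illustration only; these are not project data.
toy_edits = [
    {"edit_count": 250, "is_ip": False, "changes_text": True},
    {"edit_count": 250, "is_ip": False, "changes_text": False},
    {"edit_count": 3, "is_ip": False, "changes_text": True},
    {"edit_count": 0, "is_ip": True, "changes_text": False},
]
shares = share_changing_text(toy_edits)
```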

References

  1. Yang, Diyi; Halfaker, Aaron; Kraut, Robert; Hovy, Eduard (2017). "Identifying Semantic Edit Intentions from Revisions in Wikipedia" (PDF). Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing: 2000–2010. doi:10.18653/v1/D17-1213. Retrieved 15 October 2021.