Research:Prioritization of Wikipedia Articles

Duration:  2020-06 – ??

This page documents a research project in progress.
Information may be incomplete and change as the project progresses.
Please contact the project lead before formally citing or reusing results from this page.


This page encompasses a number of projects all related to the prioritization of wiki work. From a technical standpoint, the research generally focuses on three core technologies: methods for determining article importance, methods for determining article quality, and methods for building lists of Wikipedia articles to prioritize based on their importance and quality. From an ethics and governance standpoint, the research must address large questions such as how we measure and support equity within Wikipedia content and how to provide tools that are universal (can be used by all language communities) but contextual / distributed (each language community can operationalize their own topics and concepts of importance and quality).

Background

Ideally, all artifacts in a peer production community would be of the highest possible quality. However, all peer production communities — even the very large English Wikipedia community — have a limited number of contributors and all contributors have a limited amount of available time. Given these limitations, some artifacts necessarily will be of lower quality.

— Warncke-Wang et al.[1]

Knowledge equity: As a social movement, we will focus our efforts on the knowledge and communities that have been left out by structures of power and privilege. We will welcome people from every background to build strong and diverse communities. We will break down the social, political, and technical barriers preventing people from accessing and contributing to free knowledge.

Prioritization of content -- i.e., ranking Wikipedia articles by their importance -- is a necessary component of efforts aimed at achieving Knowledge Equity. It acknowledges that additional support is required to overcome limited resources and the systemic biases that arise in the absence of organized efforts to cover diverse topics. For editors and communities willing to focus on reducing knowledge gaps within Wikipedia, the Wikimedia Foundation can provide tools and guidance to support these efforts and move us closer to Knowledge Equity.

This concept of prioritization -- and relatedly, importance -- is neither new nor unique to the Wikimedia Foundation; it can be found in many places across the wikis, e.g., Vital Articles lists, WikiProject importance assessments, and criteria for inclusion in offline wikis. It is already embedded in many of the technologies that recommend content to read or edit on the wikis. There are indications, however, that these existing approaches -- ranking content by pageviews, by centrality, or even randomly -- do not promote knowledge equity. How we approach prioritization should therefore be reevaluated in line with the Wikimedia Foundation's commitment to anti-racist, pro-inclusion technologies[2] and with the commitment to identifying topics for impact made by Movement Strategy.

Technologies

There are three core technologies that need to be in place for tools to be built that can effectively prioritize content according to a wide range of criteria and needs. Each is described below:

Article Importance

The goal is to provide a wide variety of approaches to ranking Wikipedia articles by their importance (to readers, to equity, to other articles, etc.).

Article Quality

Prioritization also requires some indication of an article's current quality and therefore of the expected benefit from improving it. Articles that show the greatest misalignment -- i.e., high importance but low quality -- are generally the content that should be prioritized most highly. For more background, see ORES Article Quality models and Warncke-Wang et al.[1]
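As a rough illustration of the misalignment idea, the sketch below ranks articles by the gap between an importance score and a quality score. It is a minimal sketch under stated assumptions, not the project's method: the `Article` type, the `misalignment` and `prioritize` helpers, the normalization of both scores to [0, 1], and the simple difference used as the gap are all hypothetical, and the example articles are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    title: str
    importance: float  # assumption: normalized to [0, 1], higher = more important
    quality: float     # assumption: normalized to [0, 1], higher = better quality


def misalignment(article: Article) -> float:
    """Gap between importance and quality (assumed formulation).

    A large positive value means the article is important but low quality.
    """
    return article.importance - article.quality


def prioritize(articles, top_n=None):
    """Return articles sorted from most to least misaligned."""
    ranked = sorted(articles, key=misalignment, reverse=True)
    return ranked if top_n is None else ranked[:top_n]


# Made-up example data, for illustration only.
articles = [
    Article("A", importance=0.9, quality=0.2),  # important, low quality
    Article("B", importance=0.9, quality=0.9),  # important, already good
    Article("C", importance=0.3, quality=0.1),  # less important, low quality
]

for article in prioritize(articles):
    print(article.title, round(misalignment(article), 2))
```

A plain difference treats importance and quality as equally weighted; a tool could just as well weight them differently or compare ranks instead of raw scores.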

Topic Spaces

While there are use cases that require prioritizing content across an entire Wikipedia language edition, people will also often be interested in prioritizing content that is relevant to a specific topic space -- e.g., articles related to ocean sustainability. For this to be effective, there will need to be technologies that assist people in building lists of content (akin to campaign worklists or WikiProjects). Closely related is language-agnostic topic classification, though that approach depends on a pre-defined taxonomy, while topic spaces or list-building allow for more ad-hoc, user-defined topics.
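To make the distinction concrete, the sketch below restricts prioritization to a user-defined list of titles rather than a whole language edition. It is a hypothetical sketch: the `rank_topic_space` helper, the score format, the importance-minus-quality ordering, and all of the scores are assumptions made for illustration, not data or tooling from this project.

```python
def rank_topic_space(scores, topic_space, top_n=None):
    """Rank the articles in a user-defined list by importance minus quality.

    scores: dict mapping title -> (importance, quality), both assumed in [0, 1].
    topic_space: iterable of titles chosen by the user (a hypothetical worklist).
    Titles that have no scores are skipped rather than guessed at.
    """
    members = set(topic_space)
    candidates = [
        (title, importance - quality)
        for title, (importance, quality) in scores.items()
        if title in members
    ]
    # Largest gap first: important articles that are currently low quality.
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    ranked = [title for title, _gap in candidates]
    return ranked if top_n is None else ranked[:top_n]


# Made-up scores, for illustration only.
scores = {
    "Coral reef": (0.8, 0.7),
    "Overfishing": (0.9, 0.3),
    "Marine protected area": (0.6, 0.2),
    "Association football": (0.95, 0.9),
}

# An ad-hoc, user-defined topic space; "Seagrass meadow" has no scores here.
ocean_sustainability = [
    "Overfishing",
    "Coral reef",
    "Marine protected area",
    "Seagrass meadow",
]

print(rank_topic_space(scores, ocean_sustainability))
```

Because the list is just a set of titles, it can come from any source, such as a campaign worklist or a WikiProject, without depending on a fixed taxonomy.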

Current Systems

There are already a variety of recommender systems and related technologies that explicitly prioritize articles for editing or creation. I have been analyzing these systems to understand what impact they have on outcomes like equity, and how much editor interest affects which recommendations are actually acted upon.

References

  1. Warncke-Wang, Morten; Ranjan, Vivek; Terveen, Loren; Hecht, Brent (2015). "Misalignment Between Supply and Demand of Quality Content in Peer Production Communities" (PDF). Proceedings of the 9th International Conference on Web and Social Media, ICWSM 2015. Retrieved 18 August 2020.
  2. Maher, Katherine; Uzzell, Janeen (4 June 2020). "We stand for racial justice." Medium. Retrieved 18 August 2020.