Research:Characterizing Existing Practices for Identifying and Mitigating Knowledge Gaps

Duration:  2020-07 – ??

This page documents a research project in progress.
Information may be incomplete and change as the project progresses.
Please contact the project lead before formally citing or reusing results from this page.


This is the first study of my proposed dissertation work, which aims to evaluate approaches for identifying knowledge gaps. This work is related to the Wikimedia Foundation's strategic direction for 2030, which identifies knowledge gaps as a primary focus of research and development in the next five years [1]. The second study of my dissertation work is outlined here. If you are interested in participating, please fill out this screener and consent form. Participants must be Wikipedia editors who speak English and will be compensated for their time. Additionally, feel free to reach out to me at my user page or maddock@u.northwestern.edu if you have any thoughts or suggestions. Thanks!

Study Title: Characterizing Existing Practices for Identifying and Mitigating Knowledge Gaps

PI: Darren Gergle

IRB Study #: STU00212033

Overview

As researchers and community members have better characterized and documented Wikipedia's biases, organizations and groups of editors have attempted to add and improve content which cultivates equitable and balanced knowledge representation. As one example, the organization Whose Knowledge identifies as a “global campaign to center the knowledge of marginalized communities.” In 2019 Whose Knowledge led multiple initiatives to add missing images of influential women to Wikipedia, particularly focusing on influential women of color.

In this research I plan to document and characterize existing methodologies for identifying knowledge gaps, with the end goal of understanding how tools could augment these processes. While prior research has not developed methods for systematically identifying latent knowledge gaps, lack of academic work does not necessarily indicate that these methods do not exist. Editors have undoubtedly developed methods for identifying and mitigating knowledge gaps. Specifically, I aim to answer the following research questions:

RQ 1: What processes do editors currently use to identify and mitigate knowledge gaps?

RQ 2: How can socio-technical systems better facilitate existing knowledge gap identification and mitigation processes?

Specific Areas of Study

Identification of knowledge gaps

Existing practices can inform successful tool design. Editors may leverage processes for identifying knowledge gaps which could be augmented to work more effectively or scaled to work across the entire encyclopedia. Although these processes are insufficient at present given the breadth of Wikipedia, they represent a reasonable starting point for this work. However, current research does not investigate the extent to which editors systematically identify gaps, if at all. What process do editors use to identify missing information?

Triage

Not all knowledge gaps are created equal; some missing content only leads to incomplete information, while systematically excluded content can lead to inequitable under-representation. For instance, while lower numbers of articles about female historical figures would indicate problematic under-representation and gender bias, an incomplete article about Western Military History (which is a traditionally over-produced topic [2]) would not. Critically, in this example overproduction of Western Military History is related to the demographics of most Wikipedia editors, while women historical figures represent a minority interest. This connection between under-representation within the editor community and underproduction can lead to systematic knowledge gaps. When editors identify a knowledge gap, how do they triage the missing information and determine which gaps most deserve attention?

Editor Workflow Integration

Due to the amount of work required to maintain and improve Wikipedia, editors must allocate their attention efficiently. Editors must use their limited time to add new content, improve existing content, and police vandalism, in addition to performing a host of community building, management, and newcomer socialization tasks. As a result, systems and tools that add to editors’ workloads tend not to be adopted or are quickly abandoned. For example, while the Article Feedback Tool (AFT) provided some useful information, its lack of editor workflow integration and high volume of non-actionable feedback meant that editors had to spend large amounts of time searching for helpful suggestions. In order to avoid the adoption problems that befell prior reader-sourced systems, this project must integrate with and support current practices for adding missing content. How can a tool for knowledge gap identification augment and integrate into existing editor workflows?

Methodology

I will use an Asynchronous Remote Community (ARC) study of Wikipedia editors and analysis of page creation log data to understand existing practices for identifying knowledge gaps. At a high level, ARC studies create a small, private, online community with participants from the population of interest. Over the course of several weeks, participants take part in an online, asynchronous focus group. Using this methodology, I will conduct a single four-week study with participants recruited from Wikipedia editor communities. I will specifically target editors from communities that aim to identify and fill knowledge gaps, such as WikiProject Women writers/Missing articles, WikiProject Missing encyclopedic articles, WikiProject Notability, WikiProject Requested articles, and WikiProject Intertranswiki.

Through the ARC, I aim to understand which processes exist for identifying and mitigating knowledge gaps. For instance, some editors might choose to allocate time to articles where known gaps exist, some might add information to articles where they are subject experts, and others might actively look for new knowledge gaps to add to “Articles for Creation”. Characterizing these different processes will inform system design and ensure that the tool augments a variety of different workflows.

How to Participate

If you are interested in participating, please fill out this screener and consent form.

References

  1. Zia, L., Johnson, I., Mansurov, B., Morgan, J., Redi, M., Saez-Trumper, D., & Taraborelli, D. (2019). Knowledge Gaps – Wikimedia Research 2030. Figshare. https://doi.org/10.6084/m9.figshare.7698245
  2. Warncke-Wang, M., Ranjan, V., Terveen, L., & Hecht, B. (2015, April 21). Misalignment Between Supply and Demand of Quality Content in Peer Production Communities. Ninth International AAAI Conference on Web and Social Media. https://www.aaai.org/ocs/index.php/ICWSM/ICWSM15/paper/view/10591