Research:Understanding Curious and Critical Readers

Tracked in Phabricator:
Task T293036
16:48, 15 October 2021 (UTC)
Dani S. Bassett
David Lydon-Staley
Shubhankar Patankar
Perry Zurn
Duration:  2021-September – ??

This page documents a research project in progress.
Information may be incomplete and change as the project progresses.
Please contact the project lead before formally citing or reusing results from this page.

As part of the strategic direction “Knowledge as a Service”, the Wikimedia Research team is developing new approaches to support users of the Wikimedia projects in preserving the reliability and integrity of those projects[1] (including addressing the problem of disinformation and misinformation[2]). So far, the main focus has been on the content itself or on supporting editors[3][4]. Here we want to approach this problem from the perspective of readers who are consuming the content:

The rise of misinformation and disinformation keeps us up at night. No law or fancy new AI is going to solve the problem. We all have to be a little more vigilant, a little more thoughtful, a little more careful when sharing information -- and every once in a while we need to call bullshit when we see it.


Every day, Wikimedia projects are accessed by millions of users from all over the world who have different motivations and seek different information[5]. The question is how we can empower these readers to be more curious and inquisitive about the information they find on Wikimedia projects, so that they become more resilient to unreliable content as well as to misinformation and disinformation. To approach this question, we need to understand how readers of Wikipedia are curious in general, as well as how readers can engage critically with the information they encounter there.

Research questions

Curious readers

How are readers curious when seeking information on Wikipedia?

Curiosity is considered a multi-faceted trait that includes aspects such as deprivation sensitivity, joyous exploration, and social curiosity, rather than a single scale from low to high curiosity. While curiosity is generally regarded as beneficial to learning, there are now indications that there is a “dark side of curiosity”[6]. Specifically, one facet of curiosity called deprivation sensitivity was associated with errors in discerning the novelty and quality of information. A recent study showed that participants with different traits pursue different strategies when browsing pages on Wikipedia[7]. By characterizing the knowledge networks constructed during information seeking, the authors showed how to quantify different curiosity profiles from browsing sessions. However, the study was restricted to a small sample of a few hundred readers of English Wikipedia in the US, so it is hard to generalize the findings.

Therefore, we want to explore how this framework can be applied to quantify the different forms of curiosity of Wikipedia readers across countries and languages. This would allow us to characterize whether readers adapt their information-seeking strategies when they encounter controversial or unreliable content. Finally, we can empower readers to explore and discover content on Wikipedia using different curiosity styles by using the generative models developed to describe the growth of knowledge networks within a well-founded theoretical framework.

Critical readers

How can we quantify the degree to which readers are critically engaging with information on Wikipedia?

One of the key mechanisms for ensuring the reliability of content on Wikipedia is the use of citations to back up facts with reliable sources (Verifiability), though direct engagement with citations has been found to be generally low[8]. Another approach aims at showing the reader “how the sausage is made” by pointing to other elements of the article, such as discussions on talk pages or the version history, which can provide information about the trustworthiness of the content. In fact, using UNESCO’s information and literacy framework, the Wikipedia Education Team has recently compiled a guide for teachers on how to evaluate an article’s quality and reliability based on such clues. Similar recommendations can also be found in the Civic Online Reasoning curriculum from Stanford's History Education Group, which describes how to use Wikipedia when reading laterally as a strategy to evaluate online information. Surveys have found that learning more about the process behind Wikipedia can lead to a decrease in trust[9]; however, it was also shown that trust can go up or down depending on how the material is presented[10]. A more recent study showed that a compound trustworthiness indicator consistently drives readers’ perception towards the true quality of the article[11].

Therefore, our aim is to understand how much readers (not editors) engage with additional non-content elements of an article to critically assess its reliability. We will specifically focus on the extent to which talk pages, version history, or pages from other namespaces related to reliability (such as templates or policies) are part of reading sessions. We will investigate how this varies with the topic and quality of an article, as well as with how controversial it is. This could provide insights into which features readers already use most when they encounter controversial content. Furthermore, analyzing readers from different countries and projects will likely reveal important differences in how process-related elements of articles are used across the globe. This is especially important given that most survey-based studies focus on readers of English Wikipedia based in the United States.



To better understand the framing and scope of curiosity and critical reading in the context of Wikipedia, we will conduct a literature review. For the other research questions, we will analyze webrequest logs to study reading patterns.

We will build a first dataset from reading sessions that capture the knowledge-building networks over longer time periods (e.g. app users). Using the proposed network-based approaches, we can characterize different types of curiosity in Wikipedia and how they vary across countries and languages. We can then explore models of curiosity to empower readers to be curious in ways that allow them to better examine the reliability of content.
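As a rough illustration of what a network-based description of a reading session could look like, the sketch below builds a small graph from the ordered list of pages in one session and computes two standard summary statistics: edge density and the average clustering coefficient. It is a minimal sketch under stated assumptions, not the method of the cited study: linking consecutively viewed pages is a simplification chosen here for illustration, and the function names and the session format (a plain list of page titles) are hypothetical.

```python
from itertools import combinations


def build_session_network(pages):
    """Build an undirected graph from one session (hypothetical format:
    an ordered list of page titles). Nodes are the distinct pages; an
    edge links two pages viewed one after the other (an assumption)."""
    graph = {page: set() for page in pages}
    for src, dst in zip(pages, pages[1:]):
        if src != dst:  # ignore reloads of the same page
            graph[src].add(dst)
            graph[dst].add(src)
    return graph


def edge_density(graph):
    """Fraction of all possible page pairs that are directly linked."""
    n = len(graph)
    if n < 2:
        return 0.0
    n_edges = sum(len(nbrs) for nbrs in graph.values()) / 2
    return n_edges / (n * (n - 1) / 2)


def average_clustering(graph):
    """Mean local clustering coefficient over all nodes.
    Nodes with fewer than two neighbours contribute 0."""
    if not graph:
        return 0.0
    total = 0.0
    for nbrs in graph.values():
        k = len(nbrs)
        if k < 2:
            continue
        linked_pairs = sum(1 for a, b in combinations(nbrs, 2) if b in graph[a])
        total += linked_pairs / (k * (k - 1) / 2)
    return total / len(graph)


# A session that keeps returning to earlier pages versus one that moves on.
revisiting = build_session_network(["A", "B", "C", "A", "D", "B"])
linear = build_session_network(["A", "B", "C", "D", "E"])
print(edge_density(revisiting), average_clustering(revisiting))
print(edge_density(linear), average_clustering(linear))
```

In this toy example the session that revisits earlier pages yields a denser, more clustered network than the session that only moves forward. Whether such statistics separate curiosity styles in real reading sessions is exactly what the planned analysis would need to establish.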

We will build a second dataset from reading sessions that capture engagement with non-content elements such as talk pages. A first proxy will be based on the number of clicks extracted from webrequest logs. We will build models to assess which elements are most useful for readers when they encounter, for example, controversial content.
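The click-based proxy could, for instance, start from a simple classification of each requested URL in a session. The sketch below is only an illustration: the actual webrequest log schema is not described here, so the input is assumed to be a plain list of URL strings, and the function names and category labels are hypothetical. It relies on the standard MediaWiki URL forms (`/wiki/<Title>` and `/w/index.php?title=<Title>&action=history`) and on English-language namespace prefixes, which other language editions localize.

```python
from collections import Counter
from urllib.parse import parse_qs, unquote, urlsplit

# English namespace prefixes mapped to illustrative labels (assumption:
# other language editions use localized prefixes and need their own map).
PREFIX_TO_KIND = {
    "Talk": "talk",
    "Wikipedia": "policy",
    "Template": "template",
}


def classify_request(url):
    """Label one requested URL as history, talk, policy, template,
    article, or other."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    if query.get("action") == ["history"]:
        return "history"
    if "title" in query:
        title = query["title"][0]
    elif parts.path.startswith("/wiki/"):
        title = unquote(parts.path[len("/wiki/"):])
    else:
        return "other"
    prefix, sep, _ = title.partition(":")
    if sep and prefix in PREFIX_TO_KIND:
        return PREFIX_TO_KIND[prefix]
    # Simplification: any title without a listed prefix counts as an article.
    return "article"


def summarize_session(urls):
    """Count how many requests of each kind appear in one session."""
    return Counter(classify_request(u) for u in urls)


session = [
    "https://en.wikipedia.org/wiki/Climate_change",
    "https://en.wikipedia.org/wiki/Talk:Climate_change",
    "https://en.wikipedia.org/w/index.php?title=Climate_change&action=history",
    "https://en.wikipedia.org/wiki/Wikipedia:Verifiability",
]
print(summarize_session(session))
```

Per-session counts of this kind could then be aggregated by topic, article quality, or country to compare how often process-related pages appear in reading sessions.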


  1. Scoping the problem space and identifying potential collaborators
  2. Generating datasets and starting exploratory analysis
  3. Refining specific questions around quantifying curiosity types and critical reading in Wikipedia
  4. Writing a report and communicating results



  1. Zia, L., Johnson, I., Mansurov, B., Morgan, J., Redi, M., Saez-Trumper, D., & Taraborelli, D. (2019). Knowledge Integrity - Wikimedia Research 2030.
  2. Saez-Trumper, D. (2019). Online Disinformation and the Role of Wikipedia. In arXiv [cs.CY]. arXiv.
  3. Halfaker, A., & Geiger, R. S. (2019). ORES: Lowering Barriers with Participatory Machine Learning in Wikipedia. In arXiv [cs.HC]. arXiv.
  4. Morgan, J. (2019). Patrolling on Wikipedia.
  5. Lemmerich, F., Sáez-Trumper, D., West, R., & Zia, L. (2019). Why the World Reads Wikipedia: Beyond English Speakers. Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, 618–626.
  6. Zedelius, C. M., Gross, M., & Schooler, J. (2021). Inquisitive but Not Discerning: Deprivation Curiosity is Associated with Excessive Openness to Inaccurate Information.
  7. Lydon-Staley, D. M., Zhou, D., Blevins, A. S., Zurn, P., & Bassett, D. S. (2020). Hunters, busybodies and the knowledge network building associated with deprivation curiosity. Nature Human Behaviour.
  8. Piccardi, T., Redi, M., Colavizza, G., & West, R. (2020). Quantifying Engagement with Citations on Wikipedia. In arXiv [cs.CY]. arXiv.
  9. Towne, W. B., Kittur, A., Kinnaird, P., & Herbsleb, J. (2013). Your process is showing: controversy management and perceived quality in wikipedia. Proceedings of the 2013 Conference on Computer Supported Cooperative Work, 1059–1068.
  10. Kittur, A., Suh, B., & Chi, E. H. (2008). Can you ever trust a wiki? impacting perceived trustworthiness in wikipedia. Proceedings of the 2008 ACM Conference on Computer Supported Cooperative Work, 477–480.
  11. Kuznetsov, A., Novotny, M., Klein, J., Saez-Trumper, D., & Kittur, A. (2020). Templates and Trust-o-meters: Towards a widely deployable indicator of trust in Wikipedia. Unpublished.