Research:Understanding Curious and Critical Readers

Tracked in Phabricator:
Task T293036
16:48, 15 October 2021 (UTC)
Dani S. Bassett
David Lydon-Staley
Shubhankar Patankar
Perry Zurn
Duration:  2021-September – ??

This page documents a research project in progress.
Information may be incomplete and change as the project progresses.
Please contact the project lead before formally citing or reusing results from this page.

As part of the strategic direction “Knowledge as a Service”, the Wikimedia Research team is developing new approaches to support users in preserving the reliability and integrity of Wikimedia projects[1], including addressing the problem of disinformation and misinformation[2]. So far, the main focus has been on the content itself or on supporting editors[3][4]. Here we want to approach this problem from the perspective of the readers who consume the content:

The rise of misinformation and disinformation keeps us up at night. No law or fancy new AI is going to solve the problem. We all have to be a little more vigilant, a little more thoughtful, a little more careful when sharing information -- and every once in a while we need to call bullshit when we see it.

— Bergstrom, C. T., & West, J. D. (2021). Calling bullshit: the art of skepticism in a data-driven world.

Every day, Wikimedia projects are accessed by millions of users from all over the world, with different motivations and seeking different information[5]. The question is how we can empower these readers to be more curious and inquisitive about the information they find on Wikimedia projects, so that they become more resilient to unreliable content as well as to misinformation and disinformation. To approach this question, we need to understand how readers of Wikipedia are curious in general, as well as how they can engage critically with the information they encounter there.

Research questions


Curious Readers


How are readers curious when seeking information on Wikipedia?

Curiosity is considered a multi-faceted trait that includes aspects such as deprivation sensitivity, joyous exploration, and social curiosity, rather than a single scale from low to high. While curiosity is generally regarded as beneficial to learning, there are now indications of a “dark side of curiosity”[6]. Specifically, one facet of curiosity called deprivation sensitivity was associated with errors in discerning the novelty and quality of information. A recent study showed that participants with different curiosity traits pursue different strategies when browsing pages on Wikipedia[7]. By characterizing the knowledge networks constructed during information seeking, the authors showed how to quantify different curiosity profiles from browsing sessions. However, the study was restricted to a small sample of a few hundred readers of English Wikipedia in the US, so it is hard to generalize the findings.

Therefore, we want to explore how this framework can be applied to quantify the different forms of curiosity of Wikipedia readers across countries and languages. This would allow us to characterize whether readers adapt their information-seeking strategies when encountering controversial or unreliable content. Finally, by using the generative models developed to describe the growth of knowledge networks, we can empower readers to explore and discover content on Wikipedia using different curiosity styles within a well-founded theoretical framework.

Critical Readers


How can we quantify the degree to which readers are critically engaging with information on Wikipedia?

One of the key mechanisms for ensuring the reliability of content on Wikipedia is the use of citations to back up facts with reliable sources (Verifiability), though direct engagement with citations has been found to be generally low[8]. Another approach aims at showing the reader “how the sausage is made” by looking at other elements of the article, such as discussions on talk pages or the version history, which can provide information about the trustworthiness of the content. In fact, using UNESCO’s information and literacy framework, Wikipedia’s Education Team has recently compiled a guide for teachers on how to evaluate an article’s quality and reliability based on such clues. Similar recommendations can also be found in the Civic Online Reasoning curriculum from Stanford's History Education Group, which describes how to use Wikipedia when reading laterally as a strategy to evaluate online information. Surveys have found that learning more about the process behind Wikipedia can lead to a decrease in trust[9]; however, it was also shown that trust can go up or down depending on how the material is presented[10]. A more recent study showed that a compound trustworthiness indicator consistently drives readers’ perception towards the true quality of the article[11].

Therefore, our aim is to understand how much readers (not editors) engage with additional non-content elements of an article to critically assess its reliability. We will specifically focus on the extent to which talk pages, version history, or pages from other namespaces related to reliability (such as templates or policies) are part of reading sessions. We will investigate the variability with respect to the topic and the quality of an article, as well as how controversial it is. This could provide insights into which features readers already use the most in the case of controversial content. Furthermore, analyzing readers from different countries and projects will likely reveal important differences in how process-related elements of articles are used across the globe. This is especially important given that most survey-based studies focus on readers of English Wikipedia based in the United States.





To better understand the framing and scope of curiosity and critical reading in the context of Wikipedia, we will conduct a literature review. For the other research questions, we will analyze webrequest logs to study reading patterns.

We will build a first dataset from reading sessions that capture the knowledge-building networks over longer time periods (e.g. app users). Using the proposed network-based approaches, we can characterize different types of curiosity in Wikipedia and how they vary across countries and languages. We can explore models of curiosity to empower readers to be curious in ways that allow them to better examine the reliability of content.
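To illustrate the network-based characterization mentioned above, the following is a minimal sketch of how a single reading session could be turned into a knowledge network and summarized. It assumes, purely for illustration, that nodes are the pages visited in a session and that two pages are connected when a hyperlink exists between them in either direction. The function names, the `links` input, and the two summary measures are hypothetical choices for this sketch, not the project's actual pipeline.

```python
from itertools import combinations


def build_knowledge_network(visited_pages, links):
    """Return an undirected adjacency map over the pages visited in one session.

    `links` is a hypothetical set of (source, target) hyperlink pairs. An edge is
    added when a link exists in either direction between two visited pages.
    """
    nodes = list(dict.fromkeys(visited_pages))  # unique pages, in visiting order
    adjacency = {page: set() for page in nodes}
    for a, b in combinations(nodes, 2):
        if (a, b) in links or (b, a) in links:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def edge_density(adjacency):
    """Fraction of possible page pairs that are connected."""
    n = len(adjacency)
    if n < 2:
        return 0.0
    n_edges = sum(len(neighbours) for neighbours in adjacency.values()) / 2
    return n_edges / (n * (n - 1) / 2)


def average_clustering(adjacency):
    """Mean local clustering coefficient (nodes with < 2 neighbours count as 0)."""
    if not adjacency:
        return 0.0
    total = 0.0
    for neighbours in adjacency.values():
        k = len(neighbours)
        if k < 2:
            continue
        closed = sum(1 for u, v in combinations(neighbours, 2) if v in adjacency[u])
        total += closed / (k * (k - 1) / 2)
    return total / len(adjacency)


# Toy session: three closely linked pages and one unrelated page.
session = ["Cat", "Lion", "Tiger", "Jazz", "Cat"]
links = {("Cat", "Lion"), ("Lion", "Tiger"), ("Tiger", "Cat")}
network = build_knowledge_network(session, links)
print(edge_density(network), average_clustering(network))
```

In a sketch like this, a session that stays within a tightly linked group of pages yields higher clustering, while a session that jumps between unrelated pages yields a sparser network. Measures of this kind are one plausible way to distinguish curiosity styles such as the "hunters" and "busybodies" discussed in the cited work[7].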

We will build a second dataset from reading sessions that capture engagement with non-content elements such as talk pages. A first proxy will be based on the number of clicks extracted from webrequest logs. We will build models to assess which elements are most useful for readers when encountering, e.g., controversial content.
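As a rough illustration of the click-based proxy described above, the sketch below classifies requested URLs into content and non-content categories and counts how many sessions include each category at least once. The input format (a mapping from session identifier to a list of URL paths), the category names, and the English-language namespace prefixes are assumptions made for this example. The actual webrequest log schema and sessionization are not described here and will differ.

```python
from collections import Counter
from urllib.parse import parse_qs, unquote, urlsplit

# Hypothetical mapping from English-language namespace prefixes to coarse categories.
NAMESPACE_CATEGORIES = {
    "Talk": "talk",
    "Wikipedia": "policy",
    "Template": "template",
}


def classify_request(url):
    """Assign a coarse category to one requested URL (simplified URL shapes)."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    # Version history is requested via the `action=history` query parameter.
    if query.get("action", [""])[0] == "history":
        return "history"
    if "title" in query:
        title = query["title"][0]
    else:
        title = unquote(parts.path.split("/wiki/", 1)[-1])
    prefix, sep, _ = title.partition(":")
    if sep:
        # Unknown prefixes (e.g. a colon inside an article title) count as content.
        return NAMESPACE_CATEGORIES.get(prefix, "content")
    return "content"


def engagement_by_session(sessions):
    """Count, per category, the number of sessions with at least one such request."""
    counts = Counter()
    for urls in sessions.values():
        for category in {classify_request(url) for url in urls}:
            counts[category] += 1
    return counts


# Toy input: three sessions with hypothetical identifiers.
sessions = {
    "s1": ["/wiki/Earth", "/wiki/Talk:Earth"],
    "s2": ["/wiki/Moon", "/w/index.php?title=Moon&action=history"],
    "s3": ["/wiki/Star_Trek:_Voyager"],
}
print(engagement_by_session(sessions))
```

Counting sessions rather than raw clicks is a design choice of this sketch: it avoids a few very active sessions dominating the proxy. Either quantity could be broken down further by article topic, quality, or country.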


  1. Scoping the problem space and identifying potential collaborators
  2. Generating datasets and starting exploratory analysis
  3. Refining specific questions around quantifying curiosity types and critical reading in Wikipedia
  4. Writing report and communication of results



[2022-04] A first exploratory analysis to understand how readers engage with the version history and the talk page of an article. Detailed analysis: Research:Understanding_Curious_and_Critical_Readers/Reader_interactions_with_talk-pages_and_version-history

[2022-07] A first exploratory analysis of the knowledge networks of Wikipedia readers. Previous studies (KNOT) have shown how knowledge networks capture different aspects of the curiosity of readers. We find that the knowledge networks of Wikipedia readers are structurally very similar to the ones observed in the KNOT study. We are therefore more confident that this approach can be generalized to the general population of Wikipedia readers. Detailed analysis: Research:Understanding Curious and Critical Readers/Knowledge Networks of Wikipedia Readers

[2023-06] We performed a more thorough analysis of the knowledge networks of Wikipedia readers. In particular, we extend the previous exploratory analysis in several ways: i) we account for systematic differences in the average size of the networks by generating a biased sample via propensity-score matching, which leads to more comparable populations of networks; ii) we compare the knowledge networks of readers from different countries and language versions; iii) we provide a more quantitative and systematic comparison of the networks, revealing a better picture of the similarities and nuanced differences. Taken together, this analysis provides the first replication-type study for the network-based approach to capturing the curiosity of readers in Wikipedia. Specifically, we not only show the consistency in the measurement of knowledge networks between lab-based studies and observations "in the wild", but also identify new aspects in these knowledge networks, such as previously hypothesized curiosity types. More details: Research:Understanding Curious and Critical Readers/Detailed analysis of knowledge networks of wikipedia readers

[2023-12] We wrote a paper, "Architectural styles of curiosity in global Wikipedia mobile app readership"[12], which is available as a preprint. We replicate previous findings characterizing different curiosity types, which allows us to generalize the framework of knowledge networks to the larger population of readers in Wikipedia. We uncover systematic differences across countries and languages associated with population-level indicators of well-being, education, and equality, highlighting the need to take into account the local context of readers.




  1. Zia, L., Johnson, I., Mansurov, B., Morgan, J., Redi, M., Saez-Trumper, D., & Taraborelli, D. (2019). Knowledge Integrity - Wikimedia Research 2030.
  2. Saez-Trumper, D. (2019). Online Disinformation and the Role of Wikipedia. In arXiv [cs.CY]. arXiv.
  3. Halfaker, A., & Stuart Geiger, R. (2019). ORES: Lowering Barriers with Participatory Machine Learning in Wikipedia. In arXiv [cs.HC]. arXiv.
  4. Morgan, J. (2019). Patrolling on Wikipedia.
  5. Lemmerich, F., Sáez-Trumper, D., West, R., & Zia, L. (2019). Why the World Reads Wikipedia: Beyond English Speakers. Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, 618–626.
  6. Zedelius, C. M., Gross, M., & Schooler, J. (2021). Inquisitive but Not Discerning: Deprivation Curiosity is Associated with Excessive Openness to Inaccurate Information.
  7. Lydon-Staley, D. M., Zhou, D., Blevins, A. S., Zurn, P., & Bassett, D. S. (2020). Hunters, busybodies and the knowledge network building associated with deprivation curiosity. Nature Human Behaviour.
  8. Piccardi, T., Redi, M., Colavizza, G., & West, R. (2020). Quantifying Engagement with Citations on Wikipedia. In arXiv [cs.CY]. arXiv.
  9. Towne, W. B., Kittur, A., Kinnaird, P., & Herbsleb, J. (2013). Your process is showing: controversy management and perceived quality in wikipedia. Proceedings of the 2013 Conference on Computer Supported Cooperative Work, 1059–1068.
  10. Kittur, A., Suh, B., & Chi, E. H. (2008). Can you ever trust a wiki? impacting perceived trustworthiness in wikipedia. Proceedings of the 2008 ACM Conference on Computer Supported Cooperative Work, 477–480.
  11. Kuznetsov, A., Novotny, M., Klein, J., Saez-Trumper, D., & Kittur, A. (2020). Templates and Trust-o-meters: Towards a widely deployable indicator of trust in Wikipedia. Unpublished.
  12. Zhou, D., Patankar, S., Lydon-Staley, D. M., Zurn, P., Gerlach, M., & Bassett, D. S. (2023). Architectural styles of curiosity in global Wikipedia mobile app readership.