Research:Content gaps on Wikipedia/Matrix

Sample content gap matrix - Gender gap

One useful way to test this matrix framework is to take a well-documented content gap (one that has been researched extensively) and try to fit each piece of existing research into one cell of the matrix.

Below is a sample content gap matrix that contains a mix of identified content gaps and hypothetical content gaps (marked "Hypothetical"). This matrix focuses on the binary gender gap, arguably the best-known and most thoroughly researched content gap in the Wikimedia movement. The matrix demonstrates that there is no single "gender gap". Instead, gaps can manifest in a variety of ways in content about, or of interest to, men and women.

| | Selection | Extent | Framing |
|---|---|---|---|
| Internal comparison | There are more biographies of men than women on English Wikipedia (Klein et al. 2016) | Biographies of women scientists are lower quality, on average, than Wikipedia articles as a whole (Halfaker 2017) | The lead sections of biographies of women use terms like “divorce”, “family” and “spouse” more than biographies of men (Wagner et al. 2016) |
| External comparison | Women scholars with high h-index are less likely to have Wikipedia articles compared to men scholars with comparable h-index (Schellekens et al. 2019) | Wikipedia biographies of notable men and women are both longer and more equal in length, on average, than biographies of those same people in Encyclopedia Britannica (Reagle and Rhue 2011)* | Articles about professions on German Wikipedia are substantially more likely to include pictures of men than women, regardless of the relative proportion of men and women in that profession according to labor statistics (Zagovora et al. 2017) |
| Reader needs | Hypothetical: Wikipedia articles are less likely to show up in the first page of results of Google searches by women vs. searches by men. | Top hits from Wikipedia’s internal search for topic keywords related to women’s interests (sourced from a magazine corpus) were more likely to be redirects than keywords sourced from men’s magazines (Menking et al. 2017) | Hypothetical: Women seeking health information on Wikipedia are less satisfied with the way medical information is presented than are men seeking health information on Wikipedia. |

Works cited in this matrix

Halfaker, A. (2017). Interpolating Quality Dynamics in Wikipedia and Demonstrating the Keilana Effect. Proceedings of the 13th International Symposium on Open Collaboration - OpenSym ’17, 1–9.

Lam, S. (Tony) K., Uduwage, A., Dong, Z., Sen, S., Musicant, D. R., Terveen, L., & Riedl, J. (2011). WP:clubhouse?: an exploration of Wikipedia’s gender imbalance. In Proceedings of the 7th International Symposium on Wikis and Open Collaboration (pp. 1–10). New York, NY, USA: ACM.

Klein, M., Gupta, H., Rai, V., Konieczny, P., & Zhu, H. (2016). Monitoring the Gender Gap with Wikidata Human Gender Indicators. In Proceedings of the 12th International Symposium on Open Collaboration (OpenSym ’16), Article 16, 9 pages. New York, NY, USA: ACM.

Menking, A., McDonald, D. W., & Zachry, M. (2017). Who Wants to Read This?: A Method for Measuring Topical Representativeness in User Generated Content Systems. Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing - CSCW ’17, 2068–2081.

Reagle, J., & Rhue, L. (2011). Gender Bias in Wikipedia and Britannica. International Journal of Communication, 5, 21.

Schellekens, M. H., Holstege, F., & Yasseri, T. (2019). Female scholars need to achieve more for equal public recognition, 1–6.

Wagner, C., Graells-Garrido, E., Garcia, D., & Menczer, F. (2016). Women through the glass ceiling: gender asymmetries in Wikipedia. EPJ Data Science, 5(1), 5.

Zagovora, O., Flöck, F., & Wagner, C. (2017). “(Weitergeleitet von Journalistin)”: The Gendered Presentation of Professions on Wikipedia. WebSci 2017 - Proceedings of the 2017 ACM Web Science Conference, 83–92.

Sample content gap matrix - Spanish Civil War

Another useful way to test the matrix framework is to take a specific Wikipedia topic or concept about which there are not any well-known gaps, and to attempt to create hypotheses about plausible or potential gaps for each of the cells in the matrix. The matrix below provides an example of this strategy, using the topic Spanish Civil War. All examples are hypothetical.

| | Selection | Extent | Framing |
|---|---|---|---|
| Internal comparison | Spanish Wikipedia contains more articles about the Spanish Civil War than English Wikipedia. | Articles about the Spanish Civil War on Spanish Wikipedia contain more images than articles about the Spanish Civil War on English Wikipedia. | Articles about Francisco Franco on Spanish Wikipedia present his rise to power in a more sympathetic way than similar articles on English Wikipedia. |
| External comparison | Wikipedia contains fewer articles about important events in the Spanish Civil War than Salvado’s Historical Dictionary of the Spanish Civil War. | Wikipedia articles about important events in the Spanish Civil War are shorter on average than sections devoted to these events in Salvado’s Historical Dictionary of the Spanish Civil War. | Spanish Wikipedia articles about the Spanish Civil War use more patriotic images, and fewer images of violence and devastation, than a recent museum exhibit about the war. |
| Reader needs | Articles about the Spanish Civil War that receive a high volume of page views on Spanish Wikipedia are under-represented in other Wikipedia language editions. | Articles about the Spanish Civil War in English Wikipedia overwhelmingly cite Spanish-language sources. | Articles about the Spanish Civil War are not heavily linked to articles about the Catalan Independence Movement. |
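
The cell-by-cell strategy described in this section could be tracked with a small data structure. The sketch below is only an illustration, not part of the framework itself: the names `COMPARISONS`, `DIMENSIONS`, `empty_cells`, and `spanish_civil_war` are hypothetical, and the choice of a dictionary keyed by (row, column) pairs is an assumption. It stores hypotheses per cell and lists the cells that still need one.

```python
from itertools import product

# Row and column labels taken from the matrices in this section.
COMPARISONS = ("Internal comparison", "External comparison", "Reader needs")
DIMENSIONS = ("Selection", "Extent", "Framing")


def empty_cells(matrix):
    """Return the (comparison, dimension) cells that have no entry yet."""
    return [
        cell
        for cell in product(COMPARISONS, DIMENSIONS)
        if not matrix.get(cell)
    ]


# Hypothetical, partially filled matrix for the Spanish Civil War topic.
spanish_civil_war = {
    ("Internal comparison", "Selection"): [
        "Spanish Wikipedia contains more articles about the Spanish Civil War "
        "than English Wikipedia."
    ],
    ("External comparison", "Extent"): [
        "Wikipedia articles about important events in the Spanish Civil War "
        "are shorter on average than sections devoted to these events in "
        "Salvado's Historical Dictionary of the Spanish Civil War."
    ],
}

# List every cell that still needs a plausible gap hypothesis.
for comparison, dimension in empty_cells(spanish_civil_war):
    print(f"Needs a hypothesis: {comparison} / {dimension}")
```

Keying the dictionary by (row, column) pairs keeps each study or hypothesis attached to exactly one cell, which mirrors how the matrices above are filled in.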