Research:Cover Women

1 June
Associate Professor at FIMA, Universitat de Barcelona
Miquel Centelles, Laura Fernández
Duration:  2024-06 – 2025-06
gender, front-page, gate keeping, intersectionalities, diversity

This page documents a planned research project.
Information may be incomplete and change before the project starts.

This proposal presents a research project that will examine the most popular Wikipedia page. This page, known as the main page, or the front page from a communication perspective, will be analyzed across the seven longest-standing Wikipedia editions: English, German, Catalan, French, Portuguese, Italian, and Spanish. Grounded in a gender and intersectional perspective, this study will examine the daily content, the newsroom guidelines (principles and standards that guide the dissemination of information), and insights from the volunteer community. The examination will draw on communication theories such as gatekeeping and agenda-setting. Beyond academic research, our goal is to contribute actively to editing communities by addressing the daily challenges and needs involved in curating front-page content.



Wikipedia, as a key player in the public sphere, transforms information dissemination. Still, Wikipedia grapples with persistent gender bias in both editing and content (Antin et al., 2011; Bear and Collier, 2016; Wagner et al., 2016; Hinnosaar, 2019; Minguillón et al., 2021; Ferran-Ferrer, Boté-Vericad, et al., 2023), alongside additional prejudices (Redi et al., 2021; Beytía et al., 2022). Bias in contributions perpetuates imbalances in content coverage and discourages diversity, which further exacerbates the issue (Worku et al., 2020). Scholars highlight the need for a comprehensive understanding of Wikipedia's knowledge production culture to address these biases and make Wikipedia more robust, reliable, and transparent (Menking and Erickson, 2015). Reducing gender and other intersectional biases requires more than acknowledging Wikipedia as a mirror of societal biases: it involves addressing the platform's deeper logic embedded in its techno-scientific project (Ford and Wajcman, 2017). We have selected the most popular Wikipedia page for analysis. This page, commonly referred to as the main page, or the front page from a communication perspective, is available in every language edition of the encyclopedia, and it is the object of our study.

We will investigate possible gender and intersectional bias in its daily content, in its newsroom guidelines (principles and standards that govern the dissemination of information), and in the insights from the volunteer community who decide which information is disseminated to the public on the main page. This research will draw on two communication theories: gatekeeping, which examines the process by which information is filtered, selected, and ultimately presented to the public (Barzilai-Nahon, 2009), and agenda-setting (McCombs and Shaw, 1972), which studies the effect of that selection on which issues the public perceives as important.

Therefore, the research questions that we address are:
● Research Question 1 (RQ1): What insights do interviews with volunteer gatekeepers (editors of the main page of Wikipedia) provide on decision-making, biases, and strategies affecting the visibility of gender and intersectionality-related content on Wikipedia's front page, particularly regarding how their preferences and interests shape the topics featured?
● RQ2: How does gatekeeping impact gender gaps in content representation on digital platforms, specifically in the peer production of knowledge (the decision-making system on what content is suitable and what is not) within newsrooms or editorial policies, and why is understanding this phenomenon crucial for addressing gender disparities?
● RQ3: How does agenda setting influence the selection of frames and sentiment adopted by Wikipedia pages concerning specific issues or events, and how does it shape the focus and intensity of user edit activity within Wikipedia?
● RQ4: How prevalent is gender and intersectional bias in the content featured on Wikipedia's front pages?

This research is necessary to draw further attention to the need for systemic change within the platform's newsroom/editorial practices, in order to address disparities in gender and diversity representation in online knowledge and to foster a more inclusive and diverse digital information landscape.


To test the feasibility of this proposal across seven language editions of Wikipedia, we have already conducted a micro-project with a sample of the English and Spanish Wikipedias to assess the viability of the global project. Specifically, we examined: a) whether there are open and formalized recommendations and guidelines that determine which content is published on the main page, and whether the publication criteria can be analyzed; b) whether, with data wrangling techniques, we could work with the biographies published on the Wikipedia main pages and analyze them from a gender and intersectional perspective using the properties of Wikidata; c) how easily the community that performs gatekeeping tasks can be contacted. We found this community easy to reach, and we have begun to prepare the relevant questions to understand the decision-making process and editorial practices and to identify the issues that may be relevant to understanding the phenomenon.

The results of this previous trial work, with two language editions, will be published soon (Ferran-Ferrer et al., 2024). The trend is not encouraging if we take into account that bias in contributions perpetuates imbalances in content coverage and discourages diversity, which further exacerbates the issue (Worku et al., 2020). To address this, scholars stress the importance of understanding Wikipedia's knowledge production culture to tackle its gender gap (Menking and Erickson, 2015). Addressing this issue requires delving into the foundational principles driving the platform's techno-scientific project (Ford and Wajcman, 2017; Geiger, 2017), necessitating the recognition and dismantling of exclusionary practices (Menking and Rosenberg, 2021). Communication theories like gatekeeping and agenda-setting provide valuable frameworks for understanding Wikipedia's potential biases. Gatekeeping theory, focusing on information filtering processes, is applied to scrutinize stories selected for the Front Page, which attracts millions of readers monthly (Barzilai-Nahon, 2009; Wikimedia, 2023). Gatekeeping theory has previously been applied to Wikipedia by researchers to further understand biases in content selection and presentation (Li and Farzan, 2020) and to advocate for a reorganization of online spaces to democratize content and encourage dialectical gatekeeping that could reduce racial and other disparities (Ezell, 2021). Additionally, drawing from agenda-setting theory, we examine how Wikipedia's main page influences viewers and shapes news hierarchy, including its agenda-building power (McCombs and Shaw, 1972; Ren and Xu, 2023). Agenda setting can impact the choices of frames and sentiment adopted by Wikipedia pages regarding a particular issue or event (Lee, 2018) and it can play a role in shaping the focus and intensity of user edit activity in Wikipedia (Mahabir et al., 2018).
This study goes beyond affirming Wikipedia's reflection of reality to delve into its systemic challenges (Ford and Wajcman, 2017). It analyses not only main page content selection but also newsroom guidelines, including interviews with gatekeepers, to enhance understanding and address systemic issues.



This research proposal outlines a study on gender representation and biases on Wikipedia's main page (or front page from a communication perspective), the most visited Wikipedia page, which received 46.8 billion visits last November on the English edition (Wikimedia, 2023). We will conduct a comparative analysis across the seven longest-standing Wikipedia editions (English, German, Catalan, French, Portuguese, Italian, and Spanish, all launched in 2001), employing a mixed-methods approach. Grounded in gender and intersectionality, the study will analyze daily content, editorial/newsroom guidelines, and insights from volunteer communities using communication theories such as gatekeeping (Barzilai-Nahon, 2009) and agenda-setting (McCombs and Shaw, 1972). Our aim is not only academic research but also an active contribution to editing communities by addressing the daily challenges of curating front-page content. Therefore, the project's work team already includes seven working groups of Wikipedia users involved in gender, one for each language edition, and the chapters of all the Wikipedias analyzed in this project (see Table 2).

The first stage of the project will be:
a) To conduct a scoping review, a systematic literature review using the SALSA framework (Grant and Booth, 2009), to analyze the academic publications from 2001 to 2024. This review will concentrate on examining Wikipedia within the framework of a communication ecosystem.

Then, we will employ a triangulation methodology:
b) In-depth interviews with volunteer editors of the front page from all seven Wikipedia editions to ascertain decision-making processes, biases, and strategies that influence the visibility of content related to gender and other intersectionalities. The interviews will be conducted in person or online, in the native languages of the volunteer participants. We plan to conduct around five interviews per language edition. Contact with the volunteers will be made through discussion pages related to editing the main page, as well as through the user groups participating in the project, for example via calls from the chapters to their networks. The interview transcriptions will be coded and analyzed using qualitative data analysis software, and a specific codebook will be generated to facilitate the coding. This methodological approach will address RQ1 and RQ3.
c) Newsroom guidelines: We will apply content analysis to the main-page, or front-page, editorial guidelines of each language edition, and we will explore what guides the decision-making of the gatekeepers who determine story prominence. The content of these guidelines will be coded and analyzed using qualitative data analysis software, and a specific codebook will be generated to facilitate the coding. This research strategy will address RQ2. The qualitative analysis of agenda setting and gatekeeping practices (RQ1-3) will be conducted independently with two codebooks, one for the interviews and one for the editorial policies. However, each codebook will encode elements specific to gatekeeping and agenda setting to obtain evidence that corresponds to the theoretical framework.
d) Main-page content quantitative analysis: We will scrutinize the content (biographies) on the front page of each of the seven language editions over ten years, using data wrangling techniques. To do so, we will first identify the sections of the main page that are consistently present across all Wikipedias and are easily comparable. Wikipedia's front pages regularly feature changing content, offering a snapshot of current events, featured articles, and useful links. Note that these main pages are maintained by volunteers and may evolve in format and content over time. For each language edition, a dedicated method will be used to retrieve the content and data of its main page from the past ten years, as the URLs of previous main pages cannot be obtained from the dumps. Quantitative analysis will begin with scraping, using the open-source tool OpenRefine to reconcile the URIs found in the sections of the Wikipedia front pages in each language edition. This process will enrich them with specific properties from Wikidata to obtain the values of the properties selected for study, such as P21 (sex or gender), P106 (occupation), P172 (ethnic group), P103 (native language), and others. OpenRefine, used in various contexts and applications, is essential for this research as it enables the preparation and analysis of vast amounts of data. This method will address RQ4.

Table 1 offers a comprehensive overview of the research proposal.
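The enrichment step in d) can be illustrated with a short Python sketch. This is not the OpenRefine workflow the project plans to use. It is a hypothetical equivalent that assumes the front-page biographies have already been reconciled to Wikidata item IDs. The function names `build_query` and `share_by_value` and the sample rows are assumptions made for illustration only. The sketch builds a SPARQL query for the four properties named above and tallies distinct items per property value.

```python
from collections import Counter
from typing import Dict, Iterable, List

# Wikidata properties named in the proposal.
PROPERTIES = {
    "P21": "sex or gender",
    "P106": "occupation",
    "P172": "ethnic group",
    "P103": "native language",
}


def build_query(qids: Iterable[str], properties: Iterable[str] = tuple(PROPERTIES)) -> str:
    """Build a SPARQL query that returns the selected properties for each item.

    Hypothetical helper: it only assembles the query text and sends nothing.
    """
    properties = list(properties)
    values = " ".join(f"wd:{qid}" for qid in qids)
    select = " ".join(f"?{pid}" for pid in properties)
    # OPTIONAL keeps items that lack a property, so gaps stay visible.
    optionals = "\n".join(
        f"  OPTIONAL {{ ?item wdt:{pid} ?{pid} . }}" for pid in properties
    )
    return (
        "PREFIX wd: <http://www.wikidata.org/entity/>\n"
        "PREFIX wdt: <http://www.wikidata.org/prop/direct/>\n"
        f"SELECT ?item {select} WHERE {{\n"
        f"  VALUES ?item {{ {values} }}\n"
        f"{optionals}\n"
        "}"
    )


def share_by_value(rows: List[Dict[str, str]], pid: str) -> Counter:
    """Count distinct items per value of one property.

    Multi-valued properties yield several result rows per item, so values
    are collected per item first. Items without the property count as "unknown".
    """
    values_per_item: Dict[str, set] = {}
    for row in rows:
        item_values = values_per_item.setdefault(row["item"], set())
        if row.get(pid):
            item_values.add(row[pid])
    counts: Counter = Counter()
    for item_values in values_per_item.values():
        if item_values:
            counts.update(item_values)
        else:
            counts["unknown"] += 1
    return counts


if __name__ == "__main__":
    # Illustrative rows only, shaped like simplified query results.
    sample_rows = [
        {"item": "Q7186", "P21": "Q6581072"},  # duplicated row, as a
        {"item": "Q7186", "P21": "Q6581072"},  # multi-valued join would produce
        {"item": "Q937", "P21": "Q6581097"},
        {"item": "Q0000"},  # placeholder item with no P21 statement
    ]
    print(build_query(["Q7186", "Q937"]))
    print(share_by_value(sample_rows, "P21"))
```

Counting distinct items, not result rows, avoids inflating a category when a biography has several occupations or languages, and keeping an explicit "unknown" bucket makes missing Wikidata statements visible instead of silently dropping them.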

Expected output


The specific research outputs that we envision for our proposed project include, but are not limited to:
● Scientific publications: We will draft scientific publications for each research question and assess whether the approach should be comparative across all editions or whether it is better to separate them by smaller communities, editorial process typologies, etc. The best dissemination approach will be decided once the study is completed.
● Data set: The data set emerging from RQ4 will be made available as downloadable dumps and will be accessible via public APIs and a SPARQL endpoint.
● Participation in at least these conferences:
○ Wiki Workshop
○ Wikimania
○ WikiWomenCamp
○ Each user group and chapter will participate in national or regional events with Cover Women results.
● Tools to support the editorial tasks of gatekeeping, namely:
○ Guidelines for content selection on front pages that are attuned to intersectionality and gender diversity;
○ Bots and AI assistants that facilitate the content selection process for front pages, with a focus on acknowledging intersectionality and gender differences.
Both tools will be developed with the collaborative environment and consensus-driven approach characteristic of Wikipedia in mind.
● Resources aimed at enhancing the archiving and curation of main-page content across all Wikipedia editions outlined in this proposal.

Community impact plan


The project aligns with the Wikimedia Movement's 2030 strategy by focusing on delivering knowledge as a service and addressing equity in knowledge and communities overlooked by structures of power and privilege. Furthermore, the Cover Women project will involve a team of 5 researchers: professors from the University of Barcelona, one from the UOC, and a PhD student. The team brings a multidisciplinary perspective, with members from the fields of communication, semantic web, digital humanities, and computer science. Additionally, this project proposal has been designed according to the needs of various activist groups working on gender equality on Wikipedia, as well as with the boards of the chapters involved in each language edition. See Table 2 for the communities we expect to reach. These user groups are made up of Wikipedia editors who work to achieve a better Wikipedia by introducing a gender perspective. Since we are working with 7 different editions of Wikipedia, we considered that having a user group of female editors for each edition and representation from each chapter's board would help achieve the project's objectives and meet the real needs of the communities.

This project will provide:
a) Decade-long insights into gender and intersectionality content representation on Wikipedias' front pages (RQ4).
b) Bias trends, going beyond descriptive statistics (RQ4).
c) Editorial strategies for gatekeeping and agenda setting (RQ1-3).
d) Guidelines for ethical content selection using AI and bots (RQ3).
e) Technical guidance to enhance data archives on main pages (RQ4).
f) Collaborative work with volunteers that ensures inclusivity, integrating advocate perspectives for a consensus-driven approach (RQ1).

Built on in-depth interviews and a stable and lasting collaboration with Wikipedia chapters and user groups, this work addresses gender identity under-representation.
We will utilize Wikipedia's consensus-based decision-making approach to address our research questions. This method prioritizes addressing the legitimate concerns of its editors and finding a middle ground, all while adhering to Wikipedia's established policies and guidelines. In this context, it is crucial to consider that consensus naturally evolves among editors as they make changes, the importance of quality arguments in determining consensus, the allowance for consensus to evolve based on new evidence, and the acknowledgment of decisions beyond the scope of editor consensus. This methodology underscores Wikipedia's emphasis on collaboration, incremental progress, and communal harmony in managing a large crowd-sourced encyclopedia.



