Research:Metrics for quantifying the gender content gap

Duration:  2020-05 – 2020-06
This page documents a completed research project.


As part of the knowledge gaps program, the work on the taxonomy of knowledge gaps (Phab:T242172) revealed the difficulties and caveats of measuring the different gaps. First, there is not just one way of measuring a gap, as described by previous research on content gaps, in particular in the form of the content gap matrix, which captures different aspects of the same gap. Second, the definition of a baseline (i.e. the point at which the gap is considered closed) is ambiguous.

Therefore, we take an in-depth look at the content gender gap in order to get an overview of the different ways in which this gap has been studied and measured. Since it is one of the best-documented and most-studied gaps, our aim is to gain deeper insight into the complexities of measuring content gaps and into possible solutions that could inform the measurement of other, less-studied gaps.


Methodology

For this, I scanned a wide range of sources that address the problem of how to measure the gender content gap in Wikimedia projects (see the extended bibliography).

I describe the different metrics that have been used, including both the data used and the measurement procedure, along with the corresponding findings.

Data

What data are used to measure content gender gaps? The most common choice is to look at articles on humans (biographies), which can easily be identified across all Wikipedias via Wikidata. How is the relevant content (with respect to gender) identified? The most common approaches are i) the gender property in Wikidata and ii) comparing the numbers of gendered pronouns in the text of the articles.

Biographies. Most of the works look at biographies of persons and their gender (men, women, and sometimes other genders).

Occupations. Some works also consider occupation titles with respect to gender.

  • Percentage of men and women working in different occupations based on census data, e.g. (Garg et al. 2018)
  • List of gendered professions in the male and female form (Zagovora et al. 2017); note that this goes beyond grammatical gender and can involve different word forms, such as Lehrer-Lehrerin (‘teacher’) or Krankenpfleger-Krankenschwester (‘nurse’) in German.

Identifying gender content.

  • Wikidata
    • The gender of a Wikidata item is encoded in the property sex or gender (P21), with values such as male (Q6581097), female (Q6581072), intersex (Q1097630), transgender female (Q1052281), and transgender male (Q2449503)
    • Since almost every Wikipedia article has a Wikidata item, this can be used for many of the Wikipedia biographies
  • DBPedia
    • DBPedia has a gender property. It comes from automatic parsing of infoboxes but is not always present. For example, (Graells-Garrido et al. 2015) report that the infobox of Simone de Beauvoir lacks the gender metadata. Thus, many studies using DBPedia data infer the gender from pronouns (see below).
  • Gender pronouns in the text
    • (Reagle&Rhue 2011) proposed to guess the gender of an article's subject by comparing the numbers of masculine (he/his) and feminine (she/her) pronouns. With N_f and N_m denoting the counts of feminine and masculine pronouns, they classify based on the quantity x = (N_f-N_m)/(N_f+N_m): gender=woman if x>0.25, gender=man if x<-0.25, and gender=unknown if -0.25≤x≤0.25.
    • (Bamman&Smith 2014) evaluated this method on a random test set of 500 articles, reporting a precision of 100% and a recall of 97.6%.
    • ORES articletopic models incorporate this feature to help predict the Culture.Biography.Women topic.
  • https://genderize.io/
    • API to predict the gender associated with a person's given name, though care should be taken given the ethical tensions involved in predicting a person's gender identity from their name or from how they present in the world (see, e.g., the paper by Raji et al.)
  • https://quicksilver.primer.ai/
    • QuickSilver auto-generates biographies of notable women scientists that could be added to Wikipedia.
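The pronoun heuristic of (Reagle&Rhue 2011) can be sketched in a few lines of Python. This is a minimal sketch, not the authors' code: the tokenization (lowercase alphabetic runs) and the extra pronoun forms beyond he/his and she/her are assumptions; the 0.25 threshold and the formula for x are the ones stated above.

```python
import re

# Assumed pronoun sets: the text names he/his and she/her; the extra forms
# (him, himself, hers, herself) are an illustrative extension.
MASCULINE = {"he", "him", "his", "himself"}
FEMININE = {"she", "her", "hers", "herself"}


def pronoun_gender(text: str, threshold: float = 0.25) -> str:
    """Classify a biography as 'woman', 'man', or 'unknown' from pronoun counts."""
    tokens = re.findall(r"[a-z]+", text.lower())
    n_m = sum(token in MASCULINE for token in tokens)
    n_f = sum(token in FEMININE for token in tokens)
    if n_m + n_f == 0:
        return "unknown"  # no pronouns at all: no evidence either way
    # x ranges from -1 (only masculine pronouns) to +1 (only feminine pronouns)
    x = (n_f - n_m) / (n_f + n_m)
    if x > threshold:
        return "woman"
    if x < -threshold:
        return "man"
    return "unknown"


print(pronoun_gender("She studied physics. Her thesis won a prize."))  # woman
```

Articles that mention several people of different genders end up near x = 0 and are labeled unknown, which is one reason the reported recall is below 100%.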

Metrics

This section gives an overview of the different metrics used for quantifying the content gender gap, with a brief description of the data, the approach, and the findings for each. In organizing the wide variety of metrics, I follow the approach proposed by (Morgan 2019) along the dimensions of the content gap matrix. The subsections focus on different aspects of the gap: Selection (whether the content exists or not), Extent (how much coverage it has of a certain type), and Framing (whose priorities and perspectives are reflected). For each metric, I also indicate the baseline of comparison: Internal (within Wikimedia projects), External (in comparison to or based on an external source), or Interest (in comparison to or based on the needs of readers, editors, etc.). For some metrics this assignment might be ambiguous; however, the framework provides a rough organization of the different approaches.

Selection

The most common approach to measuring the content gender gap is in terms of selection, that is, the coverage of each gender in terms of the number of articles (or items).

The number of biographies on men and women in Wikimedia projects

Calculate the fraction of biographies on women (with respect to all biographies). Studies typically report that 15-20% of biographies are on women.
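As a minimal sketch of this basic count, assume the P21 values of all biography items have already been retrieved (biographies are typically selected in Wikidata as items with instance of (P31) human (Q5); the retrieval step is not shown). The function simply tallies the values it is given; items with several P21 values or none would need an explicit policy, which is left open here. The sample data are hypothetical.

```python
from collections import Counter

# P21 (sex or gender) values listed in the Data section above.
P21_LABELS = {
    "Q6581097": "male",
    "Q6581072": "female",
    "Q1097630": "intersex",
    "Q1052281": "transgender female",
    "Q2449503": "transgender male",
}


def gender_shares(p21_values):
    """Return the share of biographies per P21 value (unlisted QIDs -> 'other')."""
    counts = Counter(P21_LABELS.get(qid, "other") for qid in p21_values)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}


# Hypothetical sample: four men and one woman.
sample = ["Q6581097"] * 4 + ["Q6581072"]
print(gender_shares(sample))  # {'male': 0.8, 'female': 0.2}
```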

Wikidata.

  • Internal comparison
    • Denelezh reports the fraction of biographies on women in Wikidata. They also stratify on year of birth, country of citizenship, and occupation.
    • WHGI reports the fraction of biographies on women in Wikidata. They also stratify on culture, country of birth, and date of birth. (Klein & Konieczny 2015) report the fraction of women per country as the WIGI index.
    • WDCM reports the number of biographies with a given gender. They also stratify on profession and place of birth.
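Stratified reporting of this kind amounts to computing the same fraction separately within each group. The following is a minimal sketch, not the code of Denelezh, WHGI, or WDCM; the record fields (gender, birth_year) and the choice of birth decade as the stratum are hypothetical.

```python
from collections import defaultdict


def women_fraction_by_stratum(records, stratum_of):
    """Fraction of women among biographies, computed separately per stratum.

    records: iterable of dicts with a 'gender' key (e.g. 'female', 'male').
    stratum_of: function mapping a record to its stratum (decade, country, ...).
    """
    totals = defaultdict(int)
    women = defaultdict(int)
    for record in records:
        stratum = stratum_of(record)
        totals[stratum] += 1
        if record["gender"] == "female":
            women[stratum] += 1
    return {stratum: women[stratum] / totals[stratum] for stratum in totals}


# Hypothetical records stratified by decade of birth.
people = [
    {"gender": "male", "birth_year": 1851},
    {"gender": "female", "birth_year": 1857},
    {"gender": "male", "birth_year": 1962},
    {"gender": "male", "birth_year": 1968},
]
print(women_fraction_by_stratum(people, lambda r: r["birth_year"] // 10 * 10))
# {1850: 0.5, 1960: 0.0}
```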

Wikipedia and other projects.

  • Internal comparison
    • From the analysis of biographies in Wikidata, one obtains the corresponding number of articles in every Wikimedia project via the sitelinks (Denelezh, WHGI, WDCM, User:Ijon/Content gap, Gray 2019, Klein et al. 2016, Konieczny&Klein 2018), often with the implicit assumption that a given language edition should have equal numbers of biographies about men and women.
    • (Graells-Garrido et al. 2015) report the fraction of women biographies in English Wikipedia based on data from DBPedia. They also stratify on birth year and on different sub-classes of persons in DBPedia (e.g. Athlete, Artist).
    • (Wagner et al. 2016) report the fraction of women biographies in 20 different Wikipedias following the approach of (Graells-Garrido et al. 2015). Interestingly, they also regress a measure of notability (the number of language editions an article appears in) on gender and other independent variables, showing that, by this measure, women are more notable than men.
    • (Vitulli 2018) subjectively states that there are more Wikipedia articles on women mathematicians today than in 2013
  • External comparison. Based on an externally generated list, one calculates the coverage, that is, the fraction of men and women, respectively, who have an article in a given project. Comparing coverage of such a list is often done to control for notability, e.g. when the external list consists of award winners, who are notable by definition. There is no consensus on the findings here: some studies find that coverage of women is (slightly) higher, while many others still find that coverage of men is higher. This difference in results is hypothesized to be due to variation over time and by language, and to the organized work of various WikiProjects (Halfaker 2017).
    • (Reagle&Rhue 2011) generate lists of notable persons from 6 different sources and report the fraction of men and women, respectively, who do not have an article on English Wikipedia (missing articles). They show that articles on women are more likely to be missing than articles on men, but that coverage is better than in Encyclopedia Britannica.
    • (Wagner et al. 2015) generate lists of notable persons from 3 different sources and report the fraction of men and women who have an article in 6 different Wikipedias. They find that women have slightly better coverage in all of these Wikipedias. This is surprising since it contradicts the findings of many other studies.
    • (Young et al. 2016) generate a list of notable persons from a list of CEOs and report the fraction of men and women, respectively, who are missing an article in English Wikipedia. They find that the fraction of missing articles is higher for men than for women.
    • (Adams et al. 2019) generate a list of notable persons from sociology faculty at R1 universities in the US and report the fraction of men and women, respectively, who have an article on English Wikipedia. They find that men are twice as likely to have a Wikipedia article. They also perform a regression analysis of the probability of having an article, with gender as an independent variable and different control variables for notability (such as the h-index).
    • (Schellekens et al. 2019) generate a list of notable persons from the top 10k scientists in three different fields on Google Scholar and report the fraction of men and women, respectively, who have an article on English Wikipedia. They find that the fraction of men with articles is much larger than that of women. They also perform a regression analysis of the probability of having an article, with gender, notability (h-index), and the interaction between the two as independent variables, and find that the probability of having an article is much larger for men even when controlling for the h-index.
    • (WHGI, Klein&Konieczny 2015, Klein et al. 2016, Konieczny&Klein 2018) report the correlation coefficient between their internal metric for the content gender gap (% of female items in Wikidata or articles in Wikipedia, e.g. stratified by country) and external gender indicators such as the United Nations Gender Development Index. They show that there are significant correlations. This not only validates the indicator but also helps assess what it measures, by comparing external indicators with high and low correlation coefficients; for example, with respect to empowerment in terms of positions of power.
  • Interest comparison. Calculating the coverage of articles that are of interest to men and women, respectively.
    • (Menking et al. 2017) generate a list of keywords from magazines that are considered of interest to men and women and report the fraction of missing articles when searching for these keywords on English Wikipedia. They find that topics of interest to women are less likely to be covered.
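The external-comparison coverage described above can be sketched as follows, assuming the hard part, matching each person on the external list to an article (for example via Wikidata identifiers or name matching), has already been done. The list, names, and matches are hypothetical; the fraction of missing articles reported by some studies is simply 1 minus the coverage.

```python
def coverage_by_gender(people, has_article):
    """Fraction of people on an external list who have an article, per gender.

    people: iterable of (name, gender) pairs from the external list.
    has_article: set of names that have an article in the project studied.
    """
    covered, total = {}, {}
    for name, gender in people:
        total[gender] = total.get(gender, 0) + 1
        covered[gender] = covered.get(gender, 0) + (name in has_article)
    return {gender: covered[gender] / total[gender] for gender in total}


# Hypothetical external list (e.g. award winners) and article matches.
winners = [("A", "female"), ("B", "female"), ("C", "male"), ("D", "male")]
coverage = coverage_by_gender(winners, has_article={"A", "C", "D"})
print(coverage)               # {'female': 0.5, 'male': 1.0}
print(1 - coverage["female"])  # fraction of women with a missing article: 0.5
```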

The number of articles on men and women in Wikimedia projects undergoing the deletion process

This is interesting because it sheds light on where the gender content gap arises (in the creation or in the deletion of articles on women). The data show that articles on men and women are deleted with similar probabilities, suggesting that the gap originates in the creation of articles.

  • Internal comparison
    • (Gray 2019) reports the fraction of biographies in English Wikipedia that were nominated at Articles for Deletion (AFD) and survived, for men and women, respectively.
    • (Manske 2019) reports the fraction of biographies in English Wikipedia that were actually deleted, for men and women, respectively. Follow-up to (Gray 2019).
    • (Adams 2019) reports the fraction of articles (based on a wikiprojects list) in English Wikipedia that were deleted for men and women, respectively.
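Whether deletion rates differ between men and women is a comparison of two proportions. The studies above do not necessarily use this test; the sketch shows one standard choice, a two-sided two-proportion z-test with pooled variance, under the assumption that articles are independent observations. The counts in the example are hypothetical.

```python
import math


def two_proportion_ztest(k1, n1, k2, n2):
    """Two-sided z-test for the difference between proportions k1/n1 and k2/n2."""
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)  # common proportion under the null hypothesis
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("both groups have proportion 0 or 1; the test is undefined")
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical counts: deleted articles out of created articles, per gender.
z, p = two_proportion_ztest(k1=30, n1=1000, k2=120, n2=4000)
print(round(z, 3), round(p, 3))  # 0.0 1.0 (identical 3% deletion rates)
```

With the very large samples available for Wikipedia, even tiny differences in deletion rates become significant, so the rates themselves should be reported alongside the p-value.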

The number of articles using the male or female word form in the page title in Wikimedia projects

  • Internal comparison
    • (Zagovora et al. 2017) count the fraction of articles on professions that use the male or the female form in the title. Note that this goes beyond grammatical gender: in German Wikipedia, for example, the official job title for nurse is different for women (‘Krankenschwester’) and men (‘Krankenpfleger’). They show that most articles have male titles or that the female title is a redirect.

The amount of interest received by articles on men and women in Wikimedia projects

Most studies show that interest in articles on women is actually higher than in articles on men, though with some exceptions. Along similar lines, content gaps have been studied more generally by capturing the misalignment between production (supply) and consumption (demand), e.g. identifying topics where demand exceeds supply (e.g. LGBT) and vice versa (e.g. military history) (see, e.g., the paper by Warncke-Wang et al.).

  • Interest comparison
    • (Wagner et al. 2015) generate a list of notable persons from 3 different sources and report the fraction of men and women, respectively, which appeared on the front-page of English Wikipedia. They find no significant differences between men and women.
    • (Wagner et al 2016) consider biographies of men and women in 6 different Wikipedias and measure the search volume in google search in terms of the number of regions and the number of months (above a given threshold). They find that women are of interest to more regions and in more months from google search volume.
    • (Young et al. 2016) generates a list of notable persons consisting of CEOs of women and women and reports the difference in the number of pageviews, edits, and recency more generally. They report that biographies of women were viewed more, have more edits and are edited more recently.
    • (Hinnosaar 2019) conducts a survey with AmazonTurkers asking them to choose biographies they would like to edit and reports the number of pageviews for men and women biographies chosen by the respondents. They find no differences between men and women. They do, however, demonstrate that biographies of men in English Wikipedia receive fewer pageviews on average than articles about women and while 26% of biographies of men receive no pageviews on any given day, only 16% of biographies of women receive no pageviews on any given day.
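The day-level statistic reported by (Hinnosaar 2019), the share of biographies receiving no pageviews on a given day, can be sketched as follows once daily pageview counts per article have been collected (for example from the Wikimedia pageviews API; collection is not shown). This is a minimal sketch with a hypothetical data layout; pooling all article-days is one of several reasonable aggregation choices and need not match the original study.

```python
def zero_view_share(daily_views, gender):
    """Share of (article, day) observations with zero pageviews, per gender.

    daily_views: dict mapping article title -> list of daily pageview counts.
    gender: dict mapping article title -> gender label.
    """
    zero, total = {}, {}
    for title, views in daily_views.items():
        g = gender[title]
        total[g] = total.get(g, 0) + len(views)
        zero[g] = zero.get(g, 0) + sum(1 for v in views if v == 0)
    return {g: zero[g] / total[g] for g in total if total[g] > 0}


# Hypothetical four days of pageviews for two biographies.
views = {"A": [0, 3, 0, 1], "B": [5, 0, 2, 2]}
print(zero_view_share(views, {"A": "male", "B": "female"}))
# {'male': 0.5, 'female': 0.25}
```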

(WDCM) quantifies the usage of items on men and women in different Wikimedia projects via the wbc_entity_usage table. Results vary strongly across projects.

Extent

The length of articles on men and women in Wikimedia projects

Since length is one of the major indicators of an article's quality (see, e.g., the paper by Warncke-Wang et al.), a difference in length can reveal a gap in the extent of articles on men and women. Reports show relatively small differences (usually articles on women are slightly longer), but some works show the opposite effect.

  • Internal comparison
    • (Graells-Garrido et al. 2015) compare the mean length (in number of characters, obtained from the DBPedia record of the article) of articles on men and women, respectively, in English Wikipedia. They find that articles on women are slightly longer than those on men (6013 vs 5955). This is statistically significant, but the effect size is extremely small. Moreover, it does not hold across all sub-categories.
    • (Konieczny&Klein 2018) compare the mean length (in bytes) of articles on men and women, respectively, in 25 different Wikipedias. They find that articles about women are consistently 10% smaller than articles about men.
    • (Gray 2019) reports the size, in bytes of wikicode, of articles on men and women, respectively, in English Wikipedia, finding that articles on women are slightly longer.
  • External comparison
    • (Reagle&Rhue 2011) generate lists of notable persons from 6 different sources and report the difference in length (in number of words, without markup) between articles on men and women, respectively, in English Wikipedia. They find that article length does not differ significantly. Compared with Encyclopedia Britannica, the articles were longer and more equal in length across genders.
    • (Wagner et al. 2015) generate lists of notable persons from 3 different sources and report the difference in length (in the number of words) of articles on men and women. They find that articles on women are slightly longer.
    • (Young et al. 2016) generate a list of notable persons from a list of CEOs and report the difference in length (in the number of words) of articles on men and women. They find that articles on women are longer.
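Several results above are "statistically significant, but the effect size is extremely small": with hundreds of thousands of biographies, tiny differences pass significance tests. A minimal sketch of two complementary summaries, the relative difference of the means and Cohen's d with a pooled standard deviation, is given below. Cohen's d is a standard effect size; the cited studies do not necessarily use it, and the input lengths are hypothetical.

```python
import math
import statistics


def compare_lengths(women, men):
    """Compare article lengths (bytes, characters, or words) between two groups.

    Returns the relative difference of the means and Cohen's d with a pooled
    standard deviation; both groups need at least two articles.
    """
    mean_w, mean_m = statistics.mean(women), statistics.mean(men)
    n_w, n_m = len(women), len(men)
    pooled_sd = math.sqrt(
        ((n_w - 1) * statistics.variance(women) + (n_m - 1) * statistics.variance(men))
        / (n_w + n_m - 2)
    )
    relative_diff = (mean_w - mean_m) / mean_m
    cohens_d = (mean_w - mean_m) / pooled_sd
    return relative_diff, cohens_d


rel, d = compare_lengths([900, 1000, 1100], [1000, 1100, 1200])  # toy lengths
print(round(rel, 3), round(d, 3))  # -0.091 -1.0
```

For the means reported by (Graells-Garrido et al. 2015), 6013 vs 5955 characters, the relative difference is (6013 - 5955)/5955, i.e. about 1%.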

The quality of articles on men and women in Wikimedia projects

Articles in Wikipedia are classified into different categories reflecting quality (stub, start, C, B, GA, FA).

  • Internal comparison
    • (Halfaker 2017) calculates the average article quality using a measure called weighted sum, which assigns the ordinal integer values 0 (Stub) to 5 (FA) to the quality classes, and compares ~5000 articles on women scientists with English Wikipedia overall. The average quality of the articles on women scientists was lower before 2014 but exceeded the overall average after that.
    • (Vitulli 2018) qualitatively states that many of the articles on women mathematicians are stubs.
  • External comparison
    • (Young et al. 2016) generate a list of notable persons from a list of CEOs and report the difference in quality-class assignments between biographies of men and women. They find no statistically significant differences.
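The weighted-sum measure maps the quality classes to the ordinal values 0 to 5 given above. The sketch below assumes each article comes with a probability for each class (as an article-quality model would output); for an article with a single assessed class (probability 1), the score is simply that class's value. The dictionary input format is an assumption, not the format of any specific tool.

```python
# Ordinal values from the text above: 0 (Stub) to 5 (FA).
QUALITY_SCALE = {"Stub": 0, "Start": 1, "C": 2, "B": 3, "GA": 4, "FA": 5}


def weighted_sum(class_probabilities):
    """Expected quality value of one article, given a probability per class."""
    return sum(QUALITY_SCALE[cls] * p for cls, p in class_probabilities.items())


def average_quality(articles):
    """Mean weighted-sum score over a collection of articles."""
    scores = [weighted_sum(a) for a in articles]
    return sum(scores) / len(scores)


# Hypothetical: one article predicted as half Stub / half Start, one surely FA.
print(average_quality([{"Stub": 0.5, "Start": 0.5}, {"FA": 1.0}]))  # 2.75
```

Comparing this average for a set of biographies of women with the average for a reference set, year by year, gives the kind of trend described for (Halfaker 2017).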

The number of references in articles on men and women in Wikimedia projects

  • External comparison
    • (Young et al. 2016) generate a list of notable persons from a list of CEOs and report the difference in the number of references and in the diversity of references (distribution over different classes) in the articles on men and women. They find that articles on women have more references and higher diversity.

The number of images in articles on men and women in Wikimedia projects

  • External comparison
    • (Young et al. 2016) generate a list of notable persons from a list of CEOs and report the difference in the number of images in the articles on men and women. They find no significant differences.

The structural properties of articles on men and women in the network of links in Wikimedia projects

By treating the links among articles as a network, one can calculate different network measures that reveal structural differences between articles on men and women.

The number of outgoing links
  • Internal comparison
    • (Graells-Garrido et al. 2015) compare the mean out-degree (number of outgoing links, obtained from the DBPedia record of the article) of articles on men and women, respectively, in English Wikipedia. They find that articles on women have slightly fewer outlinks than those on men (39.4 vs 42.1). This is statistically significant, but the effect size is extremely small. Moreover, it does not hold across all sub-categories.

Centrality

One common way to assess the importance of a node in a network (here: an article in the link network) is to calculate measures of centrality such as PageRank. Looking at the most central articles, we can calculate the fraction of articles on women; the argument is that, in the absence of structural differences, this number should approximately reflect the overall fraction of articles on women.

  • Internal comparison
    • (Eom et al. 2015) count the fraction of women biographies in the top-100 articles for 24 different Wikipedias. They find that 5.2-10% of the 100 most central articles are on women but that there are strong variations across projects.
    • (Graells-Garrido et al. 2015) count the fraction of women biographies in the top-k articles for English Wikipedia. They find that articles on women are strongly underrepresented in the top-k articles (for varying k); e.g., for k=1000, women make up less than 10% of the articles (compared with 15% overall).
    • (Wagner et al. 2016) count the fraction of women biographies in the top-k articles for English Wikipedia. Compared to the overall fraction of women articles, they find underrepresentation among the most central articles (k=100) and an overrepresentation among less central articles (k=10,000). They also compare the centrality-values of the 30 most central biographies of men and women, respectively, finding that women have smaller values.
  • External comparison
    • (Wagner et al. 2015) generate lists of notable persons from 3 different sources and report the difference in centrality based on the distribution of in-degree and k-coreness. They find that according to these measures, articles on men are more central for most languages.
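The top-k comparison can be sketched with a small power-iteration PageRank written from scratch, so the example has no dependencies (a library such as NetworkX provides an equivalent `pagerank` function). This is a minimal sketch, not the code of any cited study: the ranking is restricted to biographies before taking the top k, which is an assumption, and the link graph is a made-up toy example.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank on a non-empty dict {article: [linked articles]}."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by articles without outgoing links is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for source, targets in links.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new[target] += share
        rank = new
    return rank


def women_fraction_top_k(rank, gender, k):
    """Fraction of women among the k highest-ranked biographies."""
    biographies = sorted((a for a in rank if a in gender), key=rank.get, reverse=True)
    top = biographies[:k]
    if not top:
        raise ValueError("no biographies to rank")
    return sum(gender[a] == "female" for a in top) / len(top)


# Hypothetical toy link graph: three articles link to M, one links to W.
links = {"A": ["M"], "B": ["M"], "C": ["M"], "D": ["W"]}
rank = pagerank(links)
gender = {"M": "male", "W": "female"}
print(women_fraction_top_k(rank, gender, k=1))  # 0.0
print(women_fraction_top_k(rank, gender, k=2))  # 0.5
```

Comparing the result for several values of k with the overall fraction of women among biographies reproduces the shape of the analyses above: a value well below the overall fraction at small k indicates underrepresentation among the most central articles.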

Gender-assortativity

By counting the number of links from gender to gender (w-w, m-m, w-m, m-w), one can assess whether there is an asymmetry in the connectivity between articles on men and women.

  • Internal comparison
    • (Graells-Garrido et al. 2015) calculate a self-focus ratio for English Wikipedia. They find that articles on women are more likely to link to other articles on women.
  • External comparison
    • (Wagner et al. 2015) generate lists of notable persons from 3 different sources and report a measure of the asymmetries in the connectivity between genders. They find that articles on persons of the same gender tend to link to each other and that articles about women link more to articles about men than the other way around.
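The raw ingredient of these measures is the table of links between biographies by gender (w-w, m-m, w-m, m-w). The sketch builds that table; the same-gender share computed from it is a simple illustrative statistic, not the self-focus ratio of (Graells-Garrido et al. 2015) nor the asymmetry measure of (Wagner et al. 2015), whose exact definitions are given in the respective papers. The biographies and links are hypothetical.

```python
from collections import Counter


def gender_link_counts(links, gender):
    """Count links between biographies by (source gender, target gender)."""
    counts = Counter()
    for source, targets in links.items():
        if source not in gender:
            continue  # only links that start at a biography
        for target in targets:
            if target in gender:
                counts[(gender[source], gender[target])] += 1
    return counts


def same_gender_share(counts, g):
    """Share of biography links from gender g that point to the same gender."""
    outgoing = sum(n for (src, _), n in counts.items() if src == g)
    return counts[(g, g)] / outgoing if outgoing else 0.0


# Hypothetical biographies and links between them.
gender = {"W1": "female", "W2": "female", "M1": "male", "M2": "male"}
links = {"W1": ["W2", "M1"], "W2": ["M1"], "M1": ["M2"], "M2": ["M1", "W1"]}
counts = gender_link_counts(links, gender)
print(round(same_gender_share(counts, "female"), 3))  # 0.333
print(round(same_gender_share(counts, "male"), 3))    # 0.667
```

To judge whether a same-gender share is high, it should be compared with a baseline such as the fraction of that gender among all biographies.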

The imbalance in the properties used to describe men and women in Wikidata

Instead of counting the difference in the number of items on men and women in Wikidata, one can compare the differences in the description of the items by counting the occurrence of the properties across men and women items.

  • Internal comparison
    • (User:Magnus_Manske 2020) counts the fraction of women items for each of 1323 different properties used in the description of humans in Wikidata (out of the total number of items using the corresponding property). Of the 488 properties used in at least 1000 items, the majority (>450) underrepresent women (median 0.143, mean 0.22, standard deviation 0.236).
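The per-property count can be sketched as follows, assuming the set of property IDs used by each human item has already been extracted (for example from a Wikidata dump; extraction is not shown). The 1000-item cutoff mirrors the one mentioned above; this is not the code behind the cited analysis, and the toy items are hypothetical (hence the lowered cutoff in the example).

```python
import statistics
from collections import defaultdict


def women_share_per_property(items, min_items=1000):
    """Share of women among human items using each property.

    items: iterable of (gender, property_ids) pairs, one per human item.
    Only properties used by at least `min_items` items are returned.
    """
    total = defaultdict(int)
    women = defaultdict(int)
    for gender, properties in items:
        for prop in set(properties):  # count each property once per item
            total[prop] += 1
            if gender == "female":
                women[prop] += 1
    return {p: women[p] / total[p] for p in total if total[p] >= min_items}


# Hypothetical toy data (hence min_items=1 instead of 1000).
items = [
    ("female", {"P106", "P26"}),
    ("male", {"P106", "P54"}),
    ("male", {"P106", "P54"}),
    ("male", {"P106"}),
]
shares = women_share_per_property(items, min_items=1)
print(shares["P106"], shares["P54"], shares["P26"])  # 0.25 0.0 1.0
print(statistics.median(shares.values()))            # 0.25
```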

The number of times articles refer to men and women

For any article, one can compare the number of mentions of men and the number of mentions of women.

  • Internal comparison
    • (Sen et al.) calculate the gender focus of each article by comparing the number of outgoing links to articles on women and men, respectively. They show that, in the map of Wikipedia articles, there are many more areas with a gender focus towards men.
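The exact definition used by (Sen et al.) is not given here. As an assumption, the sketch below scores an article's gender focus with the same normalized difference as the pronoun heuristic of (Reagle&Rhue 2011), applied to the article's outgoing links to biographies instead of pronouns. The article titles and genders are hypothetical.

```python
def gender_focus(out_links, gender):
    """Gender focus of one article from its outgoing links to biographies.

    Returns a value in [-1, 1]: +1 if all linked biographies are of women,
    -1 if all are of men, None if the article links to no biographies.
    """
    n_w = sum(1 for t in out_links if gender.get(t) == "female")
    n_m = sum(1 for t in out_links if gender.get(t) == "male")
    if n_w + n_m == 0:
        return None
    return (n_w - n_m) / (n_w + n_m)


# Hypothetical article linking to one woman, two men, and one non-biography.
gender = {"W1": "female", "M1": "male", "M2": "male"}
print(round(gender_focus(["W1", "M1", "M2", "Topic"], gender), 3))  # -0.333
```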

The number of redirects on topics of interest to men and women in Wikimedia projects

  • Interest comparison
    • (Menking et al. 2017) generate a list of keywords from magazines that are considered of interest to men and women and report the fraction of redirects when searching for these keywords on English Wikipedia. They find that topics of interest to women are more likely to be redirects.

Framing

Association between words/topics and gender

Quantify the association between individual words or topics with articles on men and women, respectively.

  • Internal comparison
    • (Bamman&Smith 2014) extract classes of events (or topics) for articles in English Wikipedia using an LDA-model and assign a z-score for the gender-imbalance of each class by counting the fraction of men and women articles associated with each class. They find an overrepresentation of women in classes related to, e.g. fashion or marriage, and an underrepresentation for classes related to army or law.
    • (Graells-Garrido et al. 2015, Wagner et al. 2016) calculate the association between words and gender via the point-wise mutual information for articles on men and women in English Wikipedia. They find that the words most associated with women are about arts, gender, and family, and those most associated with men are about sports.
    • (Graells-Garrido et al. 2015, Wagner et al. 2016) group words used in the overview of biographies in English Wikipedia into different categories and quantify the representation in men and women biographies.
    • (Graells-Garrido et al. 2015) use categories from the LIWC-dictionary and find that men are more associated with cognitive processes and women with sexuality.
    • (Wagner et al. 2016) use the categories Family, Gender, Relationship, and Other, and find that the first three are much more prominent in biographies of women.
    • (Konieczny&Klein 2018) quantify to what degree the appearance of the word ‘celebrity’ in an article's text predicts whether the article is about a woman. Looking at articles from 7 different Wikipedias, they show that, overall, the word ‘celebrity’ is a significant predictor.
    • (Brun et al. 2020) predict the gender of articles from a vector representation of adjectives with 54.6% accuracy (vs 50% baseline). Looking at the most predictive features they find adjectives associated with women (for example “beautiful”, “profit”, or “cross”) and men (for example “offensive”, “hard”, “certain”)
  • External comparison
    • (Wagner et al. 2015) calculate associations between words and gender for English Wikipedia articles on a list of notable persons from 3 different sources. Consistent with the findings of the internal comparison, they find that the most indicative words for men relate to certain domains (sports or professions), while those for women are about women (husband, female, woman). Grouping words into the categories Family, Gender, Relationship, and Other, they find that 30% of the most indicative words for women fall into the first three, compared with only 0-4% for men.
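Point-wise mutual information compares how often a word and a gender co-occur with how often they would co-occur if independent: PMI(w, g) = log P(w, g) / (P(w) P(g)), so positive values mean the word appears in biographies of gender g more often than expected. A minimal sketch, assuming document-level presence (each word counted once per biography) and the natural logarithm; the cited papers may count tokens differently or restrict the text (e.g. to the overview section). The documents are hypothetical.

```python
import math
from collections import Counter


def word_gender_pmi(docs):
    """Point-wise mutual information between words and gender.

    docs: list of (gender, words) pairs, one per biography; each word is
    counted once per biography (document-level presence).
    Returns {(word, gender): PMI} for pairs that co-occur at least once.
    """
    n = len(docs)
    gender_count = Counter(g for g, _ in docs)
    word_count = Counter(w for _, words in docs for w in set(words))
    joint = Counter((w, g) for g, words in docs for w in set(words))
    return {
        (w, g): math.log((c / n) / ((word_count[w] / n) * (gender_count[g] / n)))
        for (w, g), c in joint.items()
    }


docs = [
    ("female", ["actress", "husband"]),
    ("male", ["football", "husband"]),
    ("male", ["football"]),
]
pmi = word_gender_pmi(docs)
print(round(pmi[("actress", "female")], 3))  # log(3) = 1.099
print(round(pmi[("football", "male")], 3))   # log(1.5) = 0.405
```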

Subjectivity in the description of men and women biographies

  • Internal comparison
    • (Wagner et al. 2016, Brun et al. 2020) compare subjectivity and positivity in adjectives contained in biographies of men and women in English Wikipedia using a subjectivity lexicon.
    • (Wagner et al. 2016) find that articles on men contain more abstract adjectives in the description of positive aspects, but effect sizes are small.
    • (Brun et al. 2020) find that adjectives for men are weakly subjective, though effect sizes also seem to be small.

The number of images depicting men and women in articles of job professions in Wikimedia projects

  • External comparison
    • (Zagovora et al. 2017) show that, in articles on professions, the majority of images depict men, even when accounting for the fraction of men and women working in those jobs according to official labor statistics.

Association bias in embeddings trained on Wikipedia/Wikidata

Using machine-learning models such as word2vec, one can generate embeddings of words from Wikipedia. Each word is mapped into a vector space of roughly 100 dimensions such that semantically and syntactically related words are close to each other. These embeddings are known to capture biases, such as gender bias, that exist in the original data. Thus, one can use the embeddings to quantify the framing aspect of the content gender gap.

  • Internal comparison
    • (Garg et al. 2018) consider word embeddings trained on articles in English Wikipedia (among other sources) and quantify the gender association bias by calculating the distance between specific words (jobs or adjectives) and words that represent women (she/female) and men (he/male), respectively. A bias exists if the average distance to the words representing men differs from the average distance to those representing women. They validate the bias measure for job titles, showing that it correlates with occupational statistics on the percentage of women. They then find strong biases for individual adjectives; for example, adjectives describing competence are more associated with men than with women.
    • (Fisher et al. 2019) extend the approach of (Garg et al. 2018) to embeddings of items in Wikidata.
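A minimal sketch of a distance-based association bias in the spirit of (Garg et al. 2018), not their exact formulation: each group of representative words (she/female, he/male) is summarized by the centroid of its unit-normalized vectors, and the bias of a word is the difference between its Euclidean distances to the two centroids. Loading actual embeddings trained on Wikipedia is assumed and not shown; the 2-dimensional vectors are made up.

```python
import math


def _unit(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _centroid(vectors):
    units = [_unit(v) for v in vectors]
    return [sum(xs) / len(units) for xs in zip(*units)]


def association_bias(word_vec, women_vecs, men_vecs):
    """Distance to the men centroid minus distance to the women centroid.

    Positive: the word sits closer to the words representing women;
    negative: closer to those representing men; 0: no association bias.
    """
    w = _unit(word_vec)
    return math.dist(w, _centroid(men_vecs)) - math.dist(w, _centroid(women_vecs))


# Toy 2-d vectors standing in for real ~100-dimensional embeddings.
women = [[1.0, 0.1], [0.9, 0.0]]  # e.g. vectors for "she", "female"
men = [[0.1, 1.0], [0.0, 0.9]]    # e.g. vectors for "he", "male"
print(association_bias([1.0, 0.2], women, men) > 0)  # True: closer to women
print(association_bias([0.2, 1.0], women, men) < 0)  # True: closer to men
```

Averaging this score over a set of words (for example, adjectives describing competence) gives a group-level bias that can then be compared with external statistics, as done for job titles above.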

Recommendations

Based on the review above, we coalesce best practices and also indicate some overlooked dimensions of the gender gap.

Aspects:

  • Multiple metrics should be used when measuring a given content gap as they will highlight different facets and challenges to closing the gap.

Baseline of Comparison:

Missing dimensions:

  • All of the research above analyzes the content gender gap as one between men and women. While the sparsity of data makes it difficult to include non-binary identities in this work, these identities should not be left out of the discussion or of the work to close the gaps (e.g., campaigns to write biographies).

Research on gender gaps generally focuses on biographies as the most explicit proxy for gender representation in content. Many articles that are not biographies also have a clear gender dimension (e.g., Women's Health) and should likewise be tracked and prioritized for improvement in order to reduce the gender gap.

References