Research:Wikidata Gender Diversity

00:11, 27 September 2022 (UTC)
Marta Fioravanti
Beatrice Melis
Duration: 2022-09 – 2023-08
Wikidata, gender, modeling, data, community

This page documents a research project in progress.
Information may be incomplete and change as the project progresses.
Please contact the project lead before formally citing or reusing results from this page.

Wikidata Gender Diversity (WiGeDi) will study gender diversity in Wikidata, focusing in particular on marginalized gender identities. It will examine how the current Wikidata ontology model represents gender, and the extent to which this representation is fair and inclusive. It will analyse the data stored in the knowledge base to gather insights and identify possible gaps. Finally, it will look at how the community has handled the move towards the inclusion of a wider spectrum of gender identities. A web application will be created to share the results publicly in a user-friendly way.


The Wikidata Gender Diversity (WiGeDi) project aims to investigate the issue of gender diversity in the Wikidata knowledge base, focusing in particular on the marginalized identities of trans, non-binary, and gender non-conforming people. All previous studies about this subject in Wikimedia projects have focused on the gender gap, defined as the gap in the representation of women versus that of men. Some of these studies (e.g. the ones by Konieczny and Klein) have acknowledged the existence of trans and non-binary people, but no research has looked specifically at how marginalized gender identities are represented, or how accurate and complete the current representation is.

Our initial study on this subject (Metilli D. & Paolini C., "Non-binary gender representation in Wikidata", to be published in Ethics in Linked Data, Litwin Books, 2022; also presented at WikidataCon 2021) shows that gender modeling in Wikidata has a very complex history. Important lessons can be learned from it about how the community has approached the representation of marginalized gender identities, and about which steps remain to be taken to make Wikidata a more inclusive project.

The WiGeDi project aims to center marginalized gender identities by performing a broad analysis of gender diversity in Wikidata from three different, complementary perspectives:

  • the modeling question, looking at how the Wikidata ontology has evolved to support a more inclusive representation of gender, e.g., by updating the properties that directly or indirectly express gender; we aim to analyze the Wikidata ontology to identify representational issues and potential areas of improvement;
  • the data question, computing statistics about non-binary gender representation in the knowledge base, and analyzing its effectiveness and accuracy from a quantitative point of view;
  • the community question, examining how the Wikidata community has handled the evolution towards a more inclusive gender representation, with particular attention to user discussions about the topic.
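As an illustration of the data question, the following is a minimal sketch of how per-value counts for the Wikidata property P21 ("sex or gender") might be retrieved and summarized. It is not the project's actual method. The function names `fetch_gender_counts` and `summarize`, the User-Agent string, and the "share outside the items male and female" measure are assumptions made for this example. The query scans all human items and may exceed the public query service's time limit, so it should be read as a starting point only.

```python
import json
import urllib.parse
import urllib.request
from collections import Counter

# Public Wikidata Query Service endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Count human items (P31 = Q5) per value of P21 ("sex or gender").
# Illustrative only: a full scan like this may time out on the public service.
GENDER_COUNT_QUERY = """
SELECT ?gender (COUNT(?person) AS ?count) WHERE {
  ?person wdt:P31 wd:Q5 ;
          wdt:P21 ?gender .
}
GROUP BY ?gender
ORDER BY DESC(?count)
"""


def fetch_gender_counts(query=GENDER_COUNT_QUERY, endpoint=WDQS_ENDPOINT):
    """Run the query and return the parsed SPARQL JSON result (hypothetical helper)."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(
        url,
        # Placeholder User-Agent; replace with real contact details before use.
        headers={"User-Agent": "wigedi-sketch/0.1 (illustrative example)"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


def summarize(sparql_json, binary_ids=("Q6581097", "Q6581072")):
    """Tally counts per P21 value and compute the share of statements whose
    value is neither of the items 'male' (Q6581097) and 'female' (Q6581072).

    This share is a crude, assumed measure: it depends on how gender is
    modeled in Wikidata, which is itself one of the questions under study.
    """
    counts = Counter()
    for row in sparql_json["results"]["bindings"]:
        # Entity URIs look like http://www.wikidata.org/entity/Q48270.
        qid = row["gender"]["value"].rsplit("/", 1)[-1]
        counts[qid] += int(row["count"]["value"])
    total = sum(counts.values())
    other = total - sum(counts[qid] for qid in binary_ids)
    share = other / total if total else 0.0
    return counts, share
```

A possible usage would be `counts, share = summarize(fetch_gender_counts())`. Keeping the network call separate from the tallying step allows the summary logic to be checked against saved query results without contacting the service.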

Our project aims to answer all these questions by publishing a web application containing a real-time dashboard about gender diversity in Wikidata, an annotated timeline of gender modeling since the launch of Wikidata in 2012, and a browsable repository of gender-related user discussions (see section Dissemination).



Policy, Ethics and Human Subjects Research






  • Metilli D. & Paolini C. (in press) "Non-binary gender representation in Wikidata". In: Provo A., Burlingame K. & Watson B.M., Ethics in Linked Data, Litwin Books.
  • Metilli D. & Paolini C. (2021) "Non-binary gender identities in Wikidata". Presentation at WikidataCon 2021.

External links