Research:Surveys on the gender of editors

Tracked in Phabricator:
Task T227793

Created

23:04, 25 June 2019 (UTC)

Contact

Isaac Johnson

Wikimedia Foundation

Collaborators

Leila Zia

Wikimedia Foundation

Duration: 2019-January – ??

Research:Projects

This page documents a completed research project.

The goal of this work is to explore the QuickSurveys tool as an approach to surveying editors. It would allow for ongoing sampling, more control over which types of editors (e.g., by number of edits or age of account) see a survey, greater privacy, and lower barriers to responding. We are specifically focusing on the gender of editors, a core concern related to building a more diverse community.

Past Efforts

There have been a variety of past efforts to gather data on editor background, motivations, tasks, etc. For the most recent data, see the Community Engagement Insights report (2018). Other resources for past research include this list on meta, a strategy document from 2016, and a summary of the gender gap. This work builds upon three past efforts: the 2013 micro-surveys, which also presented a very low barrier to participation; research by Hill and Shaw [1] that sought to re-weight the results of the UNU-MERIT survey; and recent reader surveys that employed various techniques to account for selection bias. For some insights into microsurveys of editors in general, see task T89970.

Survey Design

There are two options for surveying with QuickSurveys: internal (a single-question survey hosted purely on-wiki) and external (the respondent is provided with a link to a survey on a site like Google Forms). For this gender survey, we are avoiding the external option for two reasons:

It raises privacy concerns because the survey is hosted off-wiki. This is often a necessary trade-off, but here we have a single question, and one that is sensitive for certain populations, so we wish to avoid it.
It creates an additional burden to answering the survey. The respondent has to agree to take a survey and then load the survey in a new tab before they can respond. We have concerns that these additional clicks could substantially (and perhaps differentially) limit participation given the extra effort to respond. This would reduce our sample size as well as potentially introduce bias.

We are therefore going with the internal survey option, which best preserves privacy and has a much lower burden to answering. Per the requirements described below, the functionality for open-text responses was added to Internal QuickSurveys for this survey.

Question Design

The current question is below -- the first three options are pre-set and the last option is an open-text field where the respondent can enter any text. Users can also just ignore the survey as it does not prevent them from reading articles.

What is your gender?
[] Man
[] Woman
[] Prefer not to say
[] Other (please describe)...

The design of this question is based on some background reading and internal feedback regarding surveying individuals about gender. While not perfect, we consider it reasonable given some of our constraints:

We use Man/Woman instead of Male/Female as we are asking about gender and not sex.
We include "Prefer not to say" as an easy means of dismissing the question if an individual does not feel comfortable sharing this information.
It is necessary to include the option to self-identify on surveys of gender. The current guidance seems to be that "In another way" or "Prefer to self-describe" are good ways to provide an open-text option -- "Other" by itself, and especially without the option to write in a response, can sometimes be perceived as dismissive. We ended up choosing "Other (please describe)...", as it was felt to be the clearest in the context of the QuickSurveys tool.
There are only a limited number of pre-set responses that we can have in a given survey and adding additional options like "Non-binary / Third-gender" would complicate translation and our ability to cross-compare the results from different languages.
From a pilot survey of readers on English Wikipedia, we found that 3% chose not to say and 1% provided open-ended responses to a question about gender.

A few outside resources:

HCI Guidelines for Gender Equity and Inclusivity: https://www.morgan-klaus.com/sigchi-gender-guidelines
Equality and Human Rights Commission: https://www.equalityhumanrights.com/sites/default/files/rr75_final.pdf
UX Collective: https://uxdesign.cc/designing-forms-for-gender-diversity-and-inclusion-d8194cf1f51
Gender Identity in U.S. Surveillance Group (excellent resource despite the somewhat alarming name): https://williamsinstitute.law.ucla.edu/wp-content/uploads/geniuss-report-sep-2014.pdf

Sampling

The plan is that the initial surveys will sample randomly from all editors in a given language community regardless of edit count or registration date. This will provide a baseline for the gender balance in a given language community. Users who log in more frequently will be more likely to see the survey and respond, but stratifying the results by edit count in the analysis and running the survey for at least one week will help to guard against this bias.
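The stratification step mentioned above could look roughly like the following. This is a minimal sketch, not the project's analysis code: the function name `shares_by_bucket` and the example responses are hypothetical and exist only to illustrate computing answer shares separately within each edit bucket.

```python
from collections import Counter, defaultdict


def shares_by_bucket(responses):
    """Return {edit_bucket: {answer: share}} for (edit_bucket, answer) pairs."""
    counts = defaultdict(Counter)
    for bucket, answer in responses:
        counts[bucket][answer] += 1
    return {
        bucket: {answer: n / sum(tally.values()) for answer, n in tally.items()}
        for bucket, tally in counts.items()
    }


# Made-up responses for illustration only; these are NOT survey results.
example = [
    ("1-4 edits", "Woman"),
    ("1-4 edits", "Man"),
    ("1000+ edits", "Man"),
    ("1000+ edits", "Man"),
    ("1000+ edits", "Woman"),
    ("1000+ edits", "Man"),
]
shares = shares_by_bucket(example)
```

Reporting shares per bucket keeps frequent visitors, who tend to fall in the higher edit buckets, from dominating a single pooled estimate.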

Following the initial survey, follow-up surveys can be run that are aimed only at new editors -- e.g., editors who have registered their account in the previous month. This allows for continuous surveying of the community (to determine if the gender balance is changing) without showing the same survey to users repeatedly or storing who has already seen a given survey.
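As an illustration of the "new editors" criterion, an eligibility check based on registration date might be sketched as follows. The function name `is_new_editor` and the 30-day default window are assumptions made for this example (the text only says "the previous month"); this is not how QuickSurveys itself expresses the filter.

```python
from datetime import datetime, timedelta, timezone


def is_new_editor(registered_at, now, window_days=30):
    """True if the account was registered within the last `window_days` days.

    `window_days=30` is an assumed stand-in for "the previous month".
    """
    age = now - registered_at
    return timedelta(0) <= age <= timedelta(days=window_days)


# Hypothetical dates for illustration.
now = datetime(2019, 7, 1, tzinfo=timezone.utc)
recent = datetime(2019, 6, 15, tzinfo=timezone.utc)
older = datetime(2019, 1, 1, tzinfo=timezone.utc)
```

Because eligibility depends only on the registration date, an account ages out of the window on its own, so no record of who has already seen the survey is needed.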

Privacy

Along with standard page view logs, QuickSurveys logs information when a survey is seen and when a survey is responded to. The most pertinent information is the edit bucket (i.e., "0 edits", "1-4 edits", "5-99 edits", "100-999 edits", "1000+ edits"), but no information is logged that links to a user's account, and no efforts will be made to link survey data to user accounts. This additional information can be used to debias survey results and provide more nuanced results (e.g., editor gender stratified by country or by edit bucket).
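To make the bucketing concrete, the mapping from an exact edit count to the coarse labels listed above can be sketched as below. The function name `edit_bucket` is hypothetical; only the bucket labels come from this page.

```python
def edit_bucket(edit_count):
    """Map an exact edit count to one of the coarse bucket labels."""
    if edit_count < 0:
        raise ValueError("edit_count must be non-negative")
    if edit_count == 0:
        return "0 edits"
    if edit_count <= 4:
        return "1-4 edits"
    if edit_count <= 99:
        return "5-99 edits"
    if edit_count <= 999:
        return "100-999 edits"
    return "1000+ edits"
```

Logging only the bucket, rather than the exact count, keeps the logged value too coarse to single out an individual account.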

With respect to sampling, criteria such as edit count and age of account can be set as filters regarding which accounts are eligible to see a survey, but the actual sampling of users who meet the criteria is based on a browser token and is not related to the user's account. No information about which user accounts are included or not included is stored, so if a user logs in on multiple browsers, they will be re-sampled independently on each.
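One way to picture token-based sampling is the sketch below. It is an assumption-laden illustration, not the QuickSurveys implementation: the function `in_sample`, the use of SHA-256, and the `coverage` parameter are all hypothetical choices made for this example.

```python
import hashlib


def in_sample(browser_token, survey_name, coverage):
    """Decide whether a browser token falls into the sampled fraction.

    The token is hashed to a number in [0, 1) and compared to `coverage`,
    so no account identifier is involved in the decision.
    """
    key = f"{survey_name}:{browser_token}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    fraction = int.from_bytes(digest[:8], "big") / 2**64
    return fraction < coverage
```

The decision is a deterministic function of the token, so a given browser gets a consistent answer without anything being stored, while a second browser with a different token is sampled independently.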

The privacy policy for the first round of surveys can be seen here: https://foundation.wikimedia.org/wiki/2019_Editor_Gender_Survey_Privacy_Statement

Results

Main article: Research:Surveys_on_the_gender_of_editors/Report