Wikimedia Diversity Conference 2013/Documentation/toolset for impact

Session: Alyssa Wright and Siko Bouterse // Creating a toolset for impact

Abstract

Setting up experiments and demonstrating the impact of diversity (and the lack thereof) can help us communicate its importance and make progress towards equitable representation in our community. But fear and inertia can stop us from trying new things. Measuring diversity, from current demographics to initiative outcomes, can be in tension with the ethos of privacy found at the heart of open source. How do we track demographics to improve diversity while preserving the privacy we value? How can we work towards diversity in ways that don't alienate existing communities, yet still demonstrate the impact of our approaches? The goal of this interactive session is to develop a common toolset for diversity experimentation and measurement that balances community concerns with grounded strategies for implementation.

Starting point / Insights

Why collect data? It defines the problem and gets everyone to acknowledge that there is one, e.g. 9% of editors are women, missing content, etc. This grounds us: it makes us realise there's a problem. That's the top-down method: the overall trends. There are also sets of information and data from the ground up. Grassroots programs in individual countries, for example, won't show up in the overall numbers; it will take a long time to change the 9% figure, but the small programs can help. Data is a tool that talks about facts, but it also carries a narrative and can be shaped towards your goal. People are now listening to data, so it's how they will judge funding, the validity of initiatives, etc. It's important to be part of the conversation from the beginning.

  • What is data's role in diversity?
  • Think about an initiative that you think has been successful.
  • What was the problem?
  • How do you know it was a success? Could you measure it?
  • What was the role of data in this process?

Example: Teahouse

  • Problem: 9% of WP editors are female (survey) and most new editors don't come back after first few edits.
  • Hypothesis: A welcoming support space can help retain more new editors, particularly women.
  • It's a project around the gender gap, but it is also about Wikipedia users generally and keeping them around. There are a lot of reasons the gender gap exists; we picked one slice, which was new editor retention.
  • Ran for a year, seeing some success. Data was collected from DB queries. New editors were visiting the Teahouse, making more edits, editing for longer, and staying around about twice as long (see the retention sketch below).
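The notes don't record the actual queries. Below is a minimal Python sketch of the kind of retention comparison described, assuming registration dates and edit timestamps have already been pulled from the wiki's user and revision tables; the cohort contents and data-loading step are placeholders, not something shown in the session.

 from datetime import datetime, timedelta

 # Hypothetical input shape, assembled elsewhere from user/revision table queries:
 # {username: {"registered": datetime, "edits": [datetime, ...]}}

 def retention_rate(cohort, days=30):
     """Share of editors with at least one edit more than `days` days after registering."""
     if not cohort:
         return 0.0
     retained = sum(
         1 for user in cohort.values()
         if any(ts > user["registered"] + timedelta(days=days) for ts in user["edits"])
     )
     return retained / len(cohort)

 def average_edits(cohort):
     """Mean number of edits per editor in the cohort."""
     return sum(len(u["edits"]) for u in cohort.values()) / len(cohort) if cohort else 0.0

 # Made-up example: compare Teahouse guests against other new editors from the same period.
 teahouse_cohort = {
     "ExampleUserA": {"registered": datetime(2012, 3, 1),
                      "edits": [datetime(2012, 3, 2), datetime(2012, 5, 1)]},
     "ExampleUserB": {"registered": datetime(2012, 3, 5),
                      "edits": [datetime(2012, 3, 6)]},
 }
 control_cohort = {
     "ExampleUserC": {"registered": datetime(2012, 3, 3),
                      "edits": [datetime(2012, 3, 3)]},
 }
 print(retention_rate(teahouse_cohort), retention_rate(control_cohort))
 print(average_edits(teahouse_cohort), average_edits(control_cohort))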

Is that success? If we want to retain women in particular, that's harder to measure. Involves collecting private and sensitive information. Most people don't even set their gender preferences.

Surveys: are they the only way you're generating your data?

Talking to women, they said they liked the Teahouse and said why, and we incorporated their comments. Does the fact that 29% of Teahouse guests were women mean we're making a measurable impact? Hard to say.

Question: How do you define new editors? The standard term is "someone with fewer than 10 edits". Answer: After they make 5 edits, they are sent a Teahouse invite.

The role of data in the Teahouse: it helped us in some key ways:

  • Convinced the en-wp community to try something new. If it worked, great; if it didn't, it would go away.
  • Convinced them to continue after the pilot.
  • Encouraged other communities to try something new (e.g. French community)

[Break everybody up into smaller groups (5 people). Every group thinks about a project which was, is, or might be successful. Later we get back together and talk about it.]

BREAKOUT RESULTS

Education Group - Diversity of Content

  • Works with universities and Wikipedia. Looking to measure whether content is addressing gaps (e.g. philosophy).
  • Solutions: content contributors - Wikimetrics; miniproject tags - content areas; qualitative focus groups - subject matter experts.
  • Challenges: no good way to measure the volunteer hours put into the project (even the definition can be a challenge) or how much effort it really took. Is it an effective use of volunteer time? Would it be more effective if they did 1:1 mentoring? Surveys can be problematic. Universities have to learn about Wikipedia - is that volunteer time, or what they are required to do as instructors?
  • Question: Is there a change in systemic bias? How would you measure that? Answer: There is no good way to measure article quality computationally. We created a 24-point scale based on different article aspects (neutrality, readability, etc.). It takes lots of time and effort (a rough sketch of such a rubric follows below).
  • Question: Are you trying to assess the effectiveness of volunteer hours? Answer: We're asking people to volunteer to support students (professors, librarians, etc.), so how much of the volunteer's time did it take to get program A to work vs. program B? Which one made the most effective use of the volunteer hours?
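The exact breakdown of the 24-point scale isn't recorded in these notes. Purely for illustration, here is a small Python sketch assuming six aspects each scored 0-4 by a human reviewer; the aspect names and per-aspect maximum are placeholders, not the group's actual rubric.

 # Illustrative only: the real 24-point scale's aspects and weighting were not captured here.
 ASPECTS = ["neutrality", "readability", "sourcing", "completeness", "structure", "illustrations"]
 MAX_PER_ASPECT = 4  # six aspects x 4 points = 24-point maximum

 def article_quality_score(ratings):
     """Sum manual 0-4 ratings for each aspect into a single 0-24 score."""
     score = 0
     for aspect in ASPECTS:
         value = ratings.get(aspect, 0)
         if not 0 <= value <= MAX_PER_ASPECT:
             raise ValueError(f"{aspect} rating must be between 0 and {MAX_PER_ASPECT}")
         score += value
     return score

 print(article_quality_score({"neutrality": 3, "readability": 4, "sourcing": 2}))  # 9 out of 24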

Targeted Editathon (Daria)

  • Background: Ada editathon - almost exclusively attended by women due to the way it's advertised.
  • Role of data / Solutions: Gender breakdown is visible. Collecting usernames and sometimes separate emails. When framing the editathon, data was used as a narrative to motivate people to come and to attract media attention. Measuring content as representative of successful diversity - female Royal Society scientists all have articles.
  • Challenges: Wikimetrics is hard to understand the first time. Consistency of personal info: people forget their usernames or don't have email. People don't want to be tracked; privacy (though less of an issue for newbies).
  • Question: What is the goal - to increase contributor diversity or content diversity? Answer: Maybe they are linked, and it's both. Where do we put our energy in measuring: content or contributors? It would be great to track media - one way is to acknowledge where ideas come from / partners in the event.

Question: How are you determining which topic areas are underrepresented? Answer: Generally we look at what has been tagged - each article has an A, B, C, etc. class - and then we look at which topic areas are under-covered or have no coverage or no major articles in that category.

Question: Is that a manual process? Is it working? Answer: It's working, but it is a challenge. We are looking at big topics and not at very specific subgroups - philosophy, for example.
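A rough Python sketch of how part of that tally could be automated, assuming assessment data has been exported as (topic, quality class) pairs from WikiProject assessment tags; the export step and the sample data are assumptions, not something described in the session.

 from collections import Counter, defaultdict

 def coverage_by_topic(assessments):
     """Count how many assessed articles each topic area has at each quality class."""
     table = defaultdict(Counter)
     for topic, quality in assessments:
         table[topic][quality] += 1
     return table

 # Made-up example input; in practice this would come from WikiProject assessment tags.
 sample = [("Philosophy", "Stub"), ("Philosophy", "Stub"), ("Philosophy", "C"),
           ("Medicine", "B"), ("Medicine", "GA")]
 for topic, counts in coverage_by_topic(sample).items():
     total = sum(counts.values())
     strong = counts["FA"] + counts["GA"] + counts["A"] + counts["B"]
     print(f"{topic}: {dict(counts)} ({strong}/{total} at B-class or better)")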

Anecdotal evidence is both a solution and a challenge.

Content needs to matter as much as contributors. We need diverse, high-quality content to serve our end users. It's not just what content is covered but how it's included - adding another point of view. The conversation around diversity of content is often framed around coverage of topics, not around diversity within an article (e.g. shoes vs. high heels). Treating people's experience as a first-class citizen in the world of data - case studies?

Television coverage in India led to hundreds of new users. Would be great to track them to see retention.

Internship Program (Sumana)

  • Role of data / Solutions: Inspired by data from the GNOME Summer of Code internships. At the yearly GNOME conference, women's attendance increased to 15%+. We were able to replicate what another community had done; good evaluation techniques influence others.
  • Techniques: Mentorship. Directed outreach towards women. People don't need to be programmers - more openness about what expertise can apply.
  • Measurements: Applicants went from 1-2 per internship round before to 7-8 currently. People correcting themselves on gender on the mailing list. Women felt comfortable being identified as women; someone even came out as a woman. Had a goal from the beginning: 1/3 of the women have stuck around, and one has been hired.
  • Challenges: Community metrics regarding contributions to open-source communities. Would like to look more at the individual arc of interns. Don't know if we planted a seed that others get to harvest (Mozilla, OKFN). Personal stories can be problematic, almost illegal - we would need to ask people to opt into it. Could start measuring it by all the misgenderings on the mailing list. Manual counts - working now for women and South Asian participants, but not sustainable. When we do Google Summer of Code we're buying into Google's corporate structure - we give Google intern info, and they get feedback on survey results.

Need to be clear about where that data goes from the beginning

When people know the purpose of the data, privacy may not be so much of a concern. TRUST: do you trust how the data is going to be used? It's very important to be transparent about how that data will be used. In some countries (e.g. Germany) it is illegal to collect data without telling the user how it will be used.

Challenges for GLAM: difficult to measure impact and cost.

Keilana's group

What was the problem? Working on editathons about women scientists, especially with young women who are training to be scientists and doctors. The school (Loyola) is mostly female. Tapping women's groups and women's science groups. Hypothesis: providing a regular safe space for women to edit will increase retention. Even edits made in the workshops alone are considered success. Open house from 13:00 to 18:00.

How do you know it was a success? Could you measure it? Exit survey: collecting usernames, then checking articles worked on, motivating factors, what they enjoyed, and suggestions for the process (time, food, etc.). To earn free food, attendees had to add one sentence and one reference; all did more. Have done one event so far, and want to do one every two weeks. Want to measure rate of return - only one event so far, but several attendees said they'd like to come back. Will send out emails and advertise on Facebook. The retention metric here is just in-event work - not expecting many to want to do it regularly on their own (a sketch of a simple return-rate measure follows below).
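A minimal Python sketch of the return-rate measure, assuming the usernames from the exit survey have been matched to edit timestamps fetched afterwards (for example via the MediaWiki API's list=usercontribs); the usernames and dates below are made up.

 from datetime import datetime

 def return_rate(attendee_edits, event_end):
     """Share of attendees with at least one edit after the event ended."""
     if not attendee_edits:
         return 0.0
     returned = sum(
         1 for edits in attendee_edits.values()
         if any(ts > event_end for ts in edits)
     )
     return returned / len(attendee_edits)

 # Made-up example: two attendees, one of whom edited again after the 13:00-18:00 open house.
 event_end = datetime(2013, 11, 9, 18, 0)
 attendees = {
     "ExampleUserA": [datetime(2013, 11, 9, 15, 30), datetime(2013, 11, 20, 12, 0)],
     "ExampleUserB": [datetime(2013, 11, 9, 16, 10)],
 }
 print(f"Return rate: {return_rate(attendees, event_end):.0%}")  # 50%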

What was the role of data in this process? It shows what you should do more of and what can be handed over to others - to create something that can continue after a volunteer has moved on.

Overall Challenges

  • Measuring coverage quality in underrepresented topics (manual, anecdotal)
  • Number of volunteer hours / effort (students and trainers)

Overall Solutions

TOOLS

  • Portal: https://meta.wikimedia.org/wiki/Programs:Evaluation_portal - a library of tools in one spot.
  • Wikimetrics: a way to look at an entire group and measure what they have been doing (pages added, edits, etc.) - see the sketch below.
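This is not the Wikimetrics interface itself, just a small Python sketch of the kind of cohort-level metric it produces, assuming revision records are available as (username, page title, is-new-page) tuples; the field names and sample data are assumptions.

 from collections import defaultdict

 def cohort_metrics(cohort_usernames, revisions):
     """Per-user edit and page-creation counts for everyone in the cohort."""
     stats = defaultdict(lambda: {"edits": 0, "pages_created": 0})
     for username, page_title, is_new_page in revisions:
         if username in cohort_usernames:
             stats[username]["edits"] += 1
             if is_new_page:
                 stats[username]["pages_created"] += 1
     return dict(stats)

 # Made-up revision data for two cohort members.
 revisions = [("ExampleUserA", "Ada Lovelace", False),
              ("ExampleUserA", "Grace Hopper", True),
              ("ExampleUserB", "Wikipedia:Teahouse", False)]
 print(cohort_metrics({"ExampleUserA", "ExampleUserB"}, revisions))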

Learning patterns: https://meta.wikimedia.org/wiki/Programs:Evaluation_portal/Library#learning-patterns - a way to capture key learnings / anecdotal evidence. "Asking the right questions."

Education program

  • Does it help improve diversity of content?
  • Focus on particular topics that are not as well covered.
  • The goal is more diverse content.
  • Wikimetrics demo tomorrow (Sunday) at 2pm during the coffee break in the upstairs lounge.