Research talk:Voice and exit in a voluntary work environment/Team work effect:a viability test

Feedback request

@Iadmc: @Stuartyeates: @Jane023: @Rosiestep: @SPatton (WMF): @Rich Farmbrough: @Neotarf: Hi all. You've all communicated with us about this research earlier in 2017 at Research talk:Voice and exit in a voluntary work environment. I'm following up with you as we have made some good progress on a proposed framework we would like to do some experiments with, to assess the viability of the direction we're moving towards. We have documented the suggested framework at Research:Voice_and_exit_in_a_voluntary_work_environment/Team_work_effect:a_viability_test. If you are interested, please read it and share your feedback with us.

To give you a sense of the next steps and timeline: We will be requesting feedback from the people who may be interested in this research over the coming 2 weeks. We will then review the feedback and incorporate as much of it as possible. Where we can't incorporate something, we will try to explain to you and others why that is the case. Then we will spend another 2 weeks finalizing the decision on which communities may be good candidates for this first version of the experiment. (We don't want to exhaust many communities, since this experiment will serve as a proof of concept of whether the direction is correct or not.)

I'm hoping that we arrive at a place where we can run this test in early April 2018 and have the results before the end of June 2018 (hopefully earlier, but this kind of experiment may take longer, as we rely on people being willing to participate).

We look forward to hearing anything you want to share. --LZia (WMF) (talk) 19:51, 2 February 2018 (UTC)

Feedback

1. The survey option 'Doctoral Degree' should be 'Doctoral Degree (PhD)' to make a clear distinction in the case of medical doctors.
2. I would encourage you to match groups of three, rather than two, since there is much more opportunity for positive group dynamics in a group of three.
3. The editor community has historically been quite adamant that they should be kept informed. A page outlining the project should be created once all the details are settled, and all emails should contain a link to it as 'further information'.

cheers Stuartyeates (talk) 22:31, 2 February 2018 (UTC)

Stuartyeates, thanks for the feedback. Responses are below:

1. Done
2. There is one main issue I can see with teaming people up in groups of 3: for this specific iteration, we're talking about a relatively small-scale experiment, and finding matches in groups of 3 can become really hard. It's more likely to find 2 people interested in a topic than 3. Once we have a clearer baseline for groups of 2 and a solid pipeline we can run experiments in, we should definitely consider groups of 3. For example, it would be great to assess the potential trade-off between group dynamics and coordination failure/complexities in groups of 3 versus 2.
3. This is a hard one. :) There are a few things we absolutely do:

Have a privacy statement for the experiment/emails/surveys always linked from the emails and survey where the user can learn how the data that is going to be collected will be used.
Have a clear description of the project for the more experienced Wikimedia community.

We know that we cannot tell the participants details about the research, because their knowing that we are, for example, testing for a group effect can affect their responses. We would basically lose the ability to say what we learned from the study. Suppose this were to run on enwiki: what level of information would you want to see provided to the user, knowing this complexity?

Thanks! :) --LZia (WMF) (talk) 19:12, 6 February 2018 (UTC)

teaming up

Hey LZia, thanks for putting this up and asking for input. I was pointed here by someone who came across it, and it looks like a promising direction. I was just curious about one thing. You seem to assume at some point that the participant is aware that you'll be trying to team them up later. However, it is not clear to me at which point you inform them of that intention. Or am I overlooking something? Was this supposed to be part of the welcoming email, or is there a message before that? I can imagine it's a tricky balance between using it to encourage them to participate in the survey, transparency, and not influencing them too much before the survey. Effeietsanders (talk) 08:41, 3 February 2018 (UTC)

Hi Effeietsanders. You're right. As you know, the answer as to where we reveal this information depends on the research questions. For this specific test, one of the things we want to assess is whether newcomers would engage with a survey, and if so, whether they will respond to the questions. We intentionally don't want to create an incentive (or disincentive) for the person to participate in the survey. If this direction turns out to be promising, then we may want to consider bringing that information into the first email. Does this make sense to you? --LZia (WMF) (talk) 19:12, 6 February 2018 (UTC)

Some feedback

I think the survey asked too many/too detailed questions of a new editor. I think some people (women?) may be inclined to ignore it because they don't want to divulge anything about themselves. As for those who complete the survey, it's too bad that only one-third will actually get paired with someone. I wonder what the reaction(s) might be of those who don't get paired? --Rosiestep (talk) 01:59, 4 February 2018 (UTC)

Hi Rosiestep. Good points.

Regarding the length and details of the survey: One of the outcomes of this first test should be for us to understand if newcomers interact with this kind of survey at all. From the research perspective, we have a couple of options to understand the user's confidence: ask them, or observe them in an experimental setting. The latter is very complicated to set up in our environment and it can create all sorts of biases unless very carefully designed. If we go with the former, we then need to ask the users some questions. Let's come back to this after the test to see what we can learn from it.
Regarding unintended reactions/feelings: You're right. We've tried a few things to alleviate this risk: setting clear expectations that the chances of finding a match are low, and still letting them know that their contributions are very valuable regardless of whether they're matched or not. We may still create a feeling of frustration in some of the participants, though. The issue is that we need to balance the need for reliable results (and for that we need proper grouping of individuals as described in Overview) against possibly creating this feeling in the participants. Do you have recommendations for directions we can look into to alleviate this risk further?

--LZia (WMF) (talk) 19:25, 6 February 2018 (UTC)

Hi LZia (WMF), I don't know that I have recommendations at the moment, but will give this some careful thought. --Rosiestep (talk) 23:49, 6 February 2018 (UTC)

Great. Thank you, Rosiestep! :) --LZia (WMF) (talk) 00:00, 7 February 2018 (UTC)

J-Mo's feedback

This looks promising! A few questions/comments below, in no particular order.

Do you have a citation to motivate your risk-aversion question? I know I've seen something about gendered stances towards risk-taking, but I don't see anything cited about that here.
Curious why you're asking the question about discretionary time. Neither Collier & Bear (2012) nor Hargittai & Shaw (2014), both of which you cite, found any connection between discretionary time, gender, and likelihood of contributing.
Also curious why you're asking about editing confidence, taste for competitiveness, and preconceptions of Wikipedia. Collier and Bear and Shane-Simpson & Gillespie-Lynch (2017) both find significant gender-mediated differences here. Do you expect a different result?
...which leads, I guess, to a more general question about asking about tech skills, confidence, free time, competition: is the intent behind these questions to replicate or otherwise verify existing findings, or do you believe that these are the most important factors mediating participation in the intervention? If you're looking for factors that mediate participation in the intervention, a question like "why did you join Wikipedia?" might be more useful than, say, the question about free time. According to the 2012 post-registration survey, a lot of new editors join specifically to write a particular article, and many create accounts for non-editing-related reasons. Knowing their specific intent at the time of registration (or whenever you deploy the survey) may be as useful, or more, than some of these other measures in explaining whether they are interested in direct collaboration with another newcomer.
For your Likert-scale questions, you might consider providing an "N/A" or "not sure" option in addition to the 1-5 rating, especially for questions where the respondent may not feel they have enough background knowledge or direct experience to make a judgement. I'm thinking in particular about the "preexisting expectations of the Wikipedia community" question; a lot of new editors don't even know that Wikipedia is a community, and even if they do, they may not feel confident in evaluating its friendliness right off the bat. Giving them a way to opt out of the question can reduce noise in your data; people won't feel obligated to make a guess. I also imagine that responses to this question, in particular, will be strongly mediated by how many interactions they have had in their brief wiki-career. Asking someone the question before their first edit will probably yield a very different response than asking them after they've been reverted and warned a few times :) So perhaps consider controlling for # of reverts and talk page welcome and/or warning templates when analyzing these responses, esp. if you plan to survey editors after they've made a few edits.
Why use free-form field for age, instead of ranges? Do you need that level of granularity? My gut tells me people will be more comfortable providing accurate information if it feels less potentially identifying/invasive.
Perhaps add a "prefer not to say" option to the gender question so that people who would rather not disclose their gender still feel comfortable completing the survey, and are less tempted to lie.
In the survey email, you may want to put the "please take our survey" line right at the top. People read unsolicited robo-emails quickly, and may miss the CTA if it's not prominent. Perhaps also a less dry/impersonal reason for the survey would help motivate responses. I think I understand your theory that leading with "complete this survey and get recommendations!" might bias who responds. However, people may be more likely to complete the survey if you lead with an appeal that is pithy and targeted, e.g. "We want to know how to make Wikipedia better for new editors like you", rather than the current 3 sentences that start "Wikipedia is made out of people like you..." which is long and very general.

Hope this feedback helps! I look forward to seeing the results of this study. Cheers, Jmorgan (WMF) (talk) 22:51, 6 February 2018 (UTC)

Hey J-Mo!

Thank you so much for such terrific feedback -- this is super useful. :) A few points:

1. Please see here for the reference on the risk aversion question. This survey item has been validated experimentally on a representative sample of the German population.

2. About why we need such a long questionnaire at all and whether this will put people off. Agreed, but there are a few advantages of having this questionnaire in place -- at least in this very early phase. First and foremost, those questions will help us identify the reasons why we see the effects we do. Second, our sample size will be relatively small in this pilot phase, so we'd need to check that our randomization procedure into the 3 treatments really delivered comparable samples with respect to those important characteristics. Third, replication is useful: existing studies of how gender differences impact Wikipedia retention usually focus on a limited set of explanatory factors. Our comprehensive approach will help us (i) evaluate the stability of existing results, and (ii) disentangle the effect of factors that could be correlated (e.g., self-confidence and taste for competition). That said, we agree fully with you that there is a cost to this strategy in terms of the attractiveness of the survey. One of the goals of this pilot will be to see whether people are actually willing to engage with it. If and when we deploy this at a larger scale -- and depending on what the pilot delivers -- we might very well drop those questions altogether, as it may increase overall participation, and there should be no correlation between participants' characteristics and the treatment (since the treatment is allocated at random and will be based on a larger sample).

3. We agree with your suggestions on how to reduce noise in the data (e.g., by adding "N/A" to the Likert scales) and diminishing concerns about privacy (e.g., by giving age ranges instead of precise ages). We will implement them.

4. Thank you also for your comments with respect to the e-mail heading. "We want to know how to make Wikipedia better for new editors like you" seems like a much better way to engage people right from the start.

5. We will definitely keep the idea of controlling for # of reverts, talk page welcome, and/or warning templates in mind. Those could be important mediating factors for some of the questions we ask. SalimJah (talk) 13:14, 8 February 2018 (UTC)

Sounds good. Thanks for the reply, SalimJah. Jmorgan (WMF) (talk) 17:22, 8 February 2018 (UTC)

@Jmorgan (WMF): Thanks for the feedback. One question we're hoping you and others can help us with is what kind of task should be suggested to the editor as part of the follow-up email. For now, we have a placeholder, which we filled with the lead paragraph of an article that needs expansion (think of a stub with a very short lead paragraph). Thanks! --LZia (WMF) (talk) 17:34, 8 February 2018 (UTC)

Yeah, this is tough to say @LZia (WMF):. Seems like you want something lightweight that requires little policy or technical expertise, but feels satisfying and meaningful, and which is unlikely to be reverted. Copyedits (fixing typos, correcting awkward grammar) could be good, if you can detect them well and if there are enough of them within a particular topic area. I suppose many (most?) spelling mistakes are corrected automatically by bots, or else are quickly identified by more experienced editors. At least I rarely find spelling mistakes when I'm reading Wikipedia. I don't know of an equivalent to a #newbiebugs tag on English Wikipedia, like you might find in a large FLOSS project. An alternative might be link recommendation: you have already done work in this area, the rules around when/where to add links are reasonably intuitive, and it's pretty easy to do with VE. Your suggestion about expanding lead sections sounds promising too, especially on low-traffic stubs where even a sentence or two can make a clear improvement. You don't necessarily need to do background research to write a lead sentence: just rephrase/summarize the content that already appears below. Still, there are lots of instructions and considerations involved in creating a 'good' lead, so you may want to consider writing a more concise, accessible set of suggestions to guide people in the task. Jmorgan (WMF) (talk) 20:06, 8 February 2018 (UTC)

I'm assuming that guidelines for wikilinks and lead sections are relatively consistent across different Wikipedia languages. Jmorgan (WMF) (talk) 20:10, 8 February 2018 (UTC)

The results of the Growth team experiments with GettingStarted could help you brainstorm:

Jmorgan (WMF) (talk) 20:16, 8 February 2018 (UTC)

@Jmorgan (WMF): Got you. We shall look into these. I like the ideas of link recommendation and typo-fixing; for the latter, it seems we should do some more work (I somehow assumed there are templates we can easily use for that). Thanks! --LZia (WMF) (talk) 22:52, 8 February 2018 (UTC)

Awesome! By the way, I ran across this research a few weeks ago. Looks relevant to the current study (at least as 'related work'). Paper is linked at the bottom of the blog post. Cheers, Jmorgan (WMF) (talk) 01:36, 9 February 2018 (UTC)
