Wikimedia monthly activities meetings/Quarterly reviews/Growth/September 2014

The following are notes from the Quarterly Review meeting with the Wikimedia Foundation's Growth team, September 17, 2014, 10AM - 11:30AM PDT.

Attending

Present (in the office): Steven Walling, Tilman Bayer (taking minutes), Dario Taraborelli, Toby Negrin, Howie Fung, Lila Tretikov, Erik Moeller, Kaity Hammerstein, Terry Chay; participating remotely: Aaron Halfaker, Rob Moen, Sam Smith, Matthew Flaschen

Please keep in mind that these minutes are mostly a rough transcript of what was said at the meeting, rather than a source of authoritative information. Consider referring to the presentation slides, blog posts, press releases and other official material.

Presentation slides from the meeting

Agenda

 
[slide 2]

  • Goals and Annual Plan targets
  • Our roadmap for the quarter
  • What we shipped
  • What's next


Steven: Welcome
will cover the last few months (still one more sprint left in this quarter)
will look back at goals (as set out in the 14/15 Engineering roadmap)

 
[slide 3]

(introduces remote team members)
Moiz can't attend today, but Kaity is here from UX team

Annual Plan Targets

Three targets: ambitious/mid/low

 
[slide 4]

  • Acquisition: increase new registrations by 23%, or
  • Activations: increase the rate at which new registrations become active editors by 23%, or
  • Retention: increase the (month-to-month) survival rate by 87%,
or some combination of the above (as this would stall the editor decline)

Last Quarter

 
[slide 5]

Key user groups for this quarter, related to these targets:
1. anons (who may have edited before; ask them to sign up): ~350k/month
2. registered with 0 edits: ~38k/month
3. made their first edits; get them into active editor status

on enwiki: 38k editors with 1+ edits/month, i.e. about 25% of the 150k activated/month

 
[slide 6]

Focused first on anon editors, improve signup, ...
experiment, see what works
then for this q: task suggestions, ...

 
[slide 7]

Stretch goal: collab with Mobile Web and Apps team on mobile version
Followup goal (next Q): Personalized task suggestions and notifications for new editors as default (on Wikipedias where Onboarding is present: 30 Wikipedias = 81% of new registrations)
Followup stretch goal: task suggestions and notifications for very active (100+) editors

 
[slide 8]

1. Completed A/B testing of version 2 of anonymous editor acquisition
2. Prototypes, usability testing (finished design), and a first A/B test of personalized recommendations


1. Anonymous editor acquisition

 
[slide 9]

Convince anon users that registration is beneficial for them and for WP
in the first attempt, we managed to increase registrations a lot, but there was a productivity hit

 
[slide 10]

second attempt: CTA in pre-edit workflow, take measures to reduce negative hit on productivity immediately afterwards

 
[slide 11]

v1: sign up button vs. no thanks (text).

 
[slide 12]

v2: ... easy to dismiss
in v1 it might not have been clear that the option to edit without signup still exists
Toby: CTA does not explain why one would want to register
Steven: in post-edit experiment, we did that (listing reasons)
Assumption: people are already used to registering on other websites (say, Twitter)
v2a: sign up button and continue editing button
Aaron: it [listing reasons] might still work, we just didn't test it
Dario: benefits still listed on signup page?
Steven: no, just signaling social proof ;)

 
[slide 13]

Still got a significant increase in the proportion of users who click edit and register (see Research:Asking anonymous editors to register/Study 2, "Hyp 2: reduced registration rate")

 
[slide 14]

"increase reg by 23%" target was reached, but did not avoid productivity hit

 
[slide 15]

still caused a 25%[?] decrease in editor productivity (as measured in 1+ unreverted edits within 48h)
The theory was confirmed that making the continue-editing option clear reduces the negative impact - but it did not bring it to 0 (-8%)

 
[slide 16]

Lila: but it appears we can optimize this
Steven: yes, there might be a workflow
Howie (re slide 15): IP != unique users, right?
Steven: right, it's about "tokened" users (within one edit workflow)
Lila: and we can't tell whether IP user has edited before?
Steven: we can *sort of*
Aaron: we had two weeks of token tracking before the experiment. We used that to filter out logged-out editors who have an account
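
[For illustration: a minimal sketch of that filtering step. The event shape and field names are assumptions, not the actual EventLogging schema.]

```python
# Hypothetical sketch: drop browser tokens that were ever observed logged
# in, so the analysis counts only genuinely anonymous editors.
def anonymous_only(events):
    """events: iterable of dicts like {"token": str, "user_id": int or None}."""
    tokens_with_account = {
        e["token"] for e in events if e.get("user_id") is not None
    }
    return [
        e for e in events
        if e.get("user_id") is None and e["token"] not in tokens_with_account
    ]
```
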
Erik: CTA would come up only once? or would I get it repeatedly as power user?
Steven: yes, only once - we set a cookie
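
[For illustration: a hypothetical sketch of the once-only logic. The cookie name is an assumption; the real feature is client-side JavaScript in MediaWiki.]

```python
# Hypothetical sketch: remember dismissal in a cookie so the signup CTA
# is shown at most once per browser.
CTA_COOKIE = "acqCtaSeen"  # assumed name, not the extension's actual cookie

def should_show_cta(request_cookies):
    """request_cookies: dict of cookie name -> value from the request."""
    return CTA_COOKIE not in request_cookies

def mark_cta_seen(response_cookies):
    """Call after showing the CTA; response_cookies is sent back to the browser."""
    response_cookies[CTA_COOKIE] = "1"
```
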
Lila: this is a narrow slice of the whole process, what about rest?
Steven: should look at the long-term productivity gain (say, 3 months)
it's a good hunch, but untested
Matthew: recall the necromancy experiment (https://meta.wikimedia.org/wiki/Research:Necromancy )
Steven: yes, way back before we (Growth/E3) had a dev team, reached out to some inactive users by (plaintext) email - hacky
Lila: doesn't sound too bad
Steven: sent it from our @wikimedia addresses. Responses we got pointed to the kind of retention issues for very active users that are hard to tackle from software perspective (community conflict, less time because of new job, etc.)
Also, no control group
Dario: also, need confirmed email address for this - not all users have this
Steven: the other thing to be aware of re activation experiment: what would long term impact be? already saw diminishing returns in second experiment
the long-term anon editors (who are receptive to this) might be a limited group
both tests ran for a week
Toby: Wonder about focus on registration - what about retention afterwards?
Steven: yes, looking at that too, just not in these experiments
We could also show stuff in the user interface that one needs to register for (e.g. watchlist on mobile)
Lila: you improved attrition by factor of 3, so might be possible to reduce it to, say, 1% or whatever we consider acceptable
Howie: I'm missing the 1+ and 5+ numbers in the benefits measurement?
Steven: Aaron tried to convert into absolute numbers - we got about 4700 new registrations for that week, 6500 for...
Aaron: https://meta.wikimedia.org/wiki/Research:Asking_anonymous_editors_to_register/Study_2#Hyp.C2.A02_reduced_registration_rate
Howie: so it's about the difference between v1 and v2? yes
Toby: seems that retention, rather than reluctance to register, is the main narrative
Steven: on desktop, we do have a decrease in registrations (as a percentage of total views/unique visitors...)

 
[slide 17]

 
[slide 18]

V3: decide after hitting save, then save either as anon or under the (new) account
Also, offer to log in for those who already have an account
Lila: that would be so helpful to me - I often forget to log in ;)
Steven: already worked on cookie...
Howie, Kaity: there should be an explicit login path
Lila: already did post-edit CTA?
Steven: yes - no negative effect on productivity, but no huge benefits either
Lila: would really like to test (version 3)
Steven: for "hardcore" anon users (who don't want to register/become part of community at all), one CTA ok[?]
Matt: one problem: uses API call, which responds with captcha
Erik: other problem: insecure connections
did we ever try sending to HTTPS immediately on edit[?]
talk with Tim and Mark
would "log in and save" button go to separate page? no, just different modal

Steven: other option: do with mobile web and apps. they dropped it from first release, but...
Dario: the tests rest on 2 assumptions: that we have a large pool of anons, and that it doesn't stay the same over time
not confirmed
Steven: yes, especially #2
Dario: so in a year or so, positive effect might go away
Erik: Wikia has this too
horizon - might be several months of work
Matt: might get an implementation in less time, but usability test, polish, etc, would add time

2. Task Recommendations

 
[slide 19]

Steven:
Goal: increase editor activation rates by providing personalized suggestions of what to do next.

 
[slide 20]

did a quick sprint on this in July

 
[slide 21]

Matt built API for recommendations
did some usability testing (at Wikimania and afterwards), qualitative tests
prepared first A/B tests
Mobile also does something like that for Wikidata (building on Magnus Manske's game)
but scalability problems
recommendations infrastructure based on CirrusSearch
Lila: what's the algorithm - do we use some (third-party) open source software, or build it ourselves?
(Steven:) kind of both
Matt: takes last edits, uses text similarity with that
Lila: are we training it in any way, e.g. if user clicks "no thanks"?
Matt: no
Toby: it's a ranking model, so search is perfect for it
could adjust ranking per user
Steven: could look at categories, co-editing [with other users]...
Optimal version is probably a combination of all these
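
[For illustration: a rough sketch of the CirrusSearch-backed approach, not the extension's actual code. It queries the public MediaWiki search API with the morelike: keyword, seeded with the titles of a user's recent edits; the example title is an assumption.]

```python
# Hedged sketch: "more like these pages" suggestions via the MediaWiki
# search API, which CirrusSearch backs.
import requests

API = "https://en.wikipedia.org/w/api.php"

def morelike(seed_titles, limit=5):
    """Return article titles textually similar to the seed articles."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": "morelike:" + "|".join(seed_titles),
        "srlimit": limit,
        "format": "json",
    }
    resp = requests.get(API, params=params, timeout=10)
    resp.raise_for_status()
    return [hit["title"] for hit in resp.json()["query"]["search"]]

print(morelike(["Beekeeping"]))
```
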

 
[slide 22]

to test effectiveness, did some hand-coding of recommendations, compared with the article that new editors came from
the first five recommendations were very similar
Aaron: found that the first 15 recommendations should be safe to serve

 
[slide 23]

Steven:
usability testing:
did some at Wikimania (with Abby's help) - unfortunately lost videos due to tech problems
then I did some at usertesting.com

Post-edit recommendations usability test clip

"your edit was saved. edit a similar article"
performance on beta labs was bad, this had a major impact
user was asked to describe what they would do next
this was a typical response: first thing they notice is the recommendation (+edit confirmation)
user's instinct is to click on the suggested edit link (curiosity)
Lila: keep them engaged, give feedback
does the algorithm take into account how long ago the page was edited? no
so users are most likely to drop off *after* going through to the edit link?
Steven: yes, but didn't test that in this test
Lila: as new editor, I would need to be guided
Steven: would have had Guided Tour even before that
main blocker is figuring out (programmatically) what the [actionable] issues are with a page
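
[For illustration: one plausible starting point, not the team's implementation - list a page's transcluded templates via the API and match them against known maintenance templates. The template names checked here are assumptions.]

```python
# Hedged sketch: detect actionable issues by checking which maintenance
# templates a page transcludes.
import requests

API = "https://en.wikipedia.org/w/api.php"
MAINTENANCE = {"Template:Citation needed", "Template:Copy edit", "Template:Unreferenced"}

def page_issues(title):
    params = {
        "action": "query",
        "prop": "templates",
        "titles": title,
        "tllimit": "max",
        "format": "json",
    }
    pages = requests.get(API, params=params, timeout=10).json()["query"]["pages"]
    templates = {
        t["title"] for page in pages.values() for t in page.get("templates", [])
    }
    return sorted(templates & MAINTENANCE)

print(page_issues("Beekeeping"))
```
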

Recommendations flyout usability test clip

video: (user clicks through recommendations)
Lila: did we explain why these are recommended?
Steven: "based on your last edit"
Toby: so why was "beekeeping" in there? [which the user wondered about]
Steven: only had ~50 Wikipedia pages imported to beta labs, and beekeeping had lots of European history, like her last edit ;)
Dario: is there more filtering? so we have text similarity, then filter by maintenance templates. what else?
Steven: part of the theory is personalization - user more interested in topic than in task
in GettingStarted, focus was on ensuring task is doable by new user
Toby: we can do a lot of testing with this - Google iterates a zillion times a day ;)
Howie: yes, this is to establish a baseline first
Steven: for completely new users, this tells them to make some edits first (so as to have data to base recommendations on)
Dario: are we tagging recommended edits?
Steven: no

 
[slide 24]

right now testing 4 variants on 12 Wikipedias
Lila: the Germans already complained about it ;)
Steven: yes, that was a bug
Erik: the oldest and most experienced Wikipedians got some of them ;)
Steven: it's really meant for new users
Matt: should take into account more than just last edit - this was just first implementation
Dario: could also evaluate type of edit (new page created, typo fixed, ..)
Steven: It's still possible that this post-edit recommendation really annoys some cohort
Howie: doing funnel analysis ...? yes, later
Steven: should be careful with revision tags, because they might have an effect in themselves (by influencing feedback from other users)
I regret that we introduced tags for GS without A/B testing (the tags)
Erik: tags can be good and bad, but for e.g. VisualEditor they are very helpful for uncovering problems
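
[For illustration: a minimal sketch of how tagged edits can be pulled back out for analysis via the API. The tag name used here is an assumption.]

```python
# Hedged sketch: fetch recent edits carrying a given change tag.
import requests

API = "https://en.wikipedia.org/w/api.php"

def tagged_edits(tag, limit=50):
    params = {
        "action": "query",
        "list": "recentchanges",
        "rctag": tag,
        "rclimit": limit,
        "rcprop": "title|ids|user|timestamp",
        "format": "json",
    }
    data = requests.get(API, params=params, timeout=10).json()
    return data["query"]["recentchanges"]

for rc in tagged_edits("gettingstarted edit"):  # assumed tag name
    print(rc["timestamp"], rc["title"])
```
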

 
[slide 25]

Steven: What's next (slide)
...

 
[slide 26]

 
[slide 27]

 
[slide 28]

Three things to consider [after Steven's departure]:
Products, Practices, People
GettingStarted (GS) and GuidedTours (GT) can't be allowed to rot, regardless of what happens to the team
Erik: GS recommendations still based on old system, basically relying on maintenance templates
Steven: they impact different parts of UX
not high maintenance to retain ability to suggest copyeditable pages
Erik: although it needs to be scaled to other languages
Steven: we should split recommendations stuff into separate extension
Toby: who would own these extensions?
Steven: Rob, Sam, Matt know the code best
no opinion on who from Features team
Matt: realistically, won't have as much time to work on this after team switch
Steven: prioritize cleanup work now over new tests
Lila: discussion is bigger than this, of course
Erik: Maryana says Elasticsearch-based recommendations are helpful for their team; she's more skeptical about SuggestBot-type stuff
Toby: should be search-based; I know the search team would love to work on this
Steven: yes, they have been very supportive
Erik: think about how the two recommendation systems can be more integrated, e.g. categories
Aaron: Nik (from search team) says it should not be complicated to include categories
Lila: important to determine which team will own which feature