Notes from the Quarterly Review meeting with the Wikimedia Foundation's Discovery team, January 21, 2016, 8:45 - 9:30 AM PT.

Please keep in mind that these minutes are mostly a rough paraphrase of what was said at the meeting, rather than a source of authoritative information. Consider referring to the presentation slides, blog posts, press releases and other official material.

Discovery

Slide 1 (KPIs)

KPIs

User satisfaction - doubled in Q2 (15% to 28%)
Doubling is a result of implementing a stats change
Zero results rate - 33% to 26% (normal variance)
Began collecting last fiscal year - still collecting comparative data

Slide 2 (Search language detection)

Ran an A/B test to adjust search queries based on detected language
Quantitative analysis showed that people search in non-English languages on English Wikipedia
Surveys showed the users actively do this
Ran A/B test but the impact was not statistically significant
Language detecting library was not good at detecting short queries
Lila: Are we looking at other language detection options?
Yes, we have a couple of leads on new libraries that could be used
Lila: We could prompt the user if we're not sure of the language
Lila: How common is this problem?
On enwiki, about 7% of queries that get no results are not in English
Generally, each change we make seems to have a large impact for a small number of people
Making the experience better is a LOT of small changes
Lila: ?
We could index all wikis in one place, but would need more hardware
Wes: How many A/B tests were run?
Ran one test for this specific goal
Since the A/B test results were disappointing, the team re-aligned to the completion suggester
Dropped zero results by 10%
Feedback has been very positive
This was an absolute minimum product, so can't become default yet
Wes: Will completion suggester become default this quarter?
Yes. Incremental rollout.
Should improve zero results; earlier tests showed maybe 10%
Hopefully would improve satisfaction, but we don't have qualitative evidence on that yet
Based on user feedback, we are optimistic
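
For readers unfamiliar with what "not statistically significant" means for an A/B test like the language detection one above, here is a minimal sketch of one common check, a two-proportion z-test. This is not necessarily the analysis the team ran; the function name and all counts are hypothetical and are not figures from the meeting.

```python
import math

def two_proportion_z_test(hits_a, total_a, hits_b, total_b):
    """Return (z, two-sided p-value) for the difference between two rates.

    hits_* could be, for example, the number of queries that returned
    zero results in each test group (an assumption for illustration).
    """
    rate_a = hits_a / total_a
    rate_b = hits_b / total_b
    # Pooled rate under the null hypothesis that both groups behave the same.
    pooled = (hits_a + hits_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: control group A versus test group B.
z, p = two_proportion_z_test(hits_a=2600, total_a=10000, hits_b=2650, total_b=10000)
is_significant = p < 0.05  # 0.05 is a conventional threshold, assumed here
```

With small differences between groups, as in these made-up counts, the p-value stays above the threshold and the result would be reported as not significant.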

Slide 3 (Portal)

Measure usage of the portal, perform an A/B test, decrease the time users spend searching on the page
Portal has been migrated to git/gerrit
Used to be a static page with some CSS and JS on Meta-Wiki and a script named extract2.php that hit the API and pushed out the page
Lila: Is this typical for other MediaWiki pages?
Yes, version control is not common for pages like this
We engaged with the primary maintainer (mxn), who makes 90%+ of edits to the portal
He was very supportive of the migration
Other (mostly uninvolved) people objected; took a couple months to work through that
We did launch the first A/B test right before the end of the quarter, but too late to hit this goal
Early results are showing a 5% increase in conversion with no loss of interaction
Lila: What is 5% in actual pages?
- ___ million per week
Wes: Be sure to specify bot/non-bot traffic
In this case, it's virtually all non-bot
After migrating to gerrit, we were able to add instrumentation

Slide 4 (improve satisfaction metric)

Iterate on user satisfaction by running a QuickSurvey
Lila: How did we create the User Satisfaction metric? Is it industry standard?
Yes, Google uses something similar
Planned to run a survey on the result page they ended up on, asking if they were satisfied, and compare with our quantitative satisfaction data
Quick survey was delayed, and then the deployment freeze and fundraiser cut into the available time to do a survey like this
Lila: If it is an industry standard then let's make sure to benchmark against it
Wes: Quick survey is being used by multiple teams
We had a couple extra requirements. It just wasn't ready
Lila: When will we know?
After we have several weeks of data, so might not have answers until Q4
Lila: On all these slides, we're not showing impact as clearly as we should. To align with FDC format, the template should bring out impact more clearly
Wes: Trevor has started a page to explore different formats to improve the template
Tomasz: Distinguish between outputs and outcomes
Greg (via Etherpad): That sounds good - let's connect offline to discuss possible revisions before next QRs

Slide 5 (evaluate maps/wdqs)

We had done beta-level deployments, so wanted to review user feedback
Maps was rolled out to some Wikivoyage sites; got good feedback
We made some improvements based on feedback
Both WDQS and maps: Request rate spiked on launch and has now settled to a normal level that is growing organically
We have <1 engineer on WDQS and <1 on maps
Android app "nearby" was a flat list; now is a map using our tile server
Lila: No map for San Diego. Are we going to have a bot add maps?
Any place that already had a map got the new one automatically. Adding new maps is a manual process
We have prototyped adding maps via Visual Editor, interactively/visually (Thanks to Ed Sanders, who did prototyping before a )
Lila: How are we measuring success/impact?
We measure number of tiles served, and number of users who see tiles (pageviews on wikivoyage, android)
Eventually we want to measure discovery: getting from a map to somewhere else. There are technical challenges to that.
Wes: What about page performance?

Lila: It needs to be an automatic way of cross-connecting projects
WDQS: Some of the most prominent consumers are planning to switch to our service.
We plan to continue to suppor
We will upgrade to Blazegraph 2.0 when released (this quarter)
Add geospatial searches
Lila: Who are the main consumers of WDQS?
Mostly previous users of Magnus's WQS which was on labs
Our initial goal was to move these users, and they have
We are enabling people to build more tools on top of this
WDQS has beta-level support, which is a step up from labs
Lila: Are we looking at adding write features?
Discovery is not; Wikidata folks are looking into it
Lila: Will we get wikidata folks into this process?
Wes: That is the intent

Slide 6 (referrer metrics)

We have a dashboard for referrer traffic (Google, Bing, etc)
Lila: I was told that 2/3 was from google, which is different from this graph
Mental note for Greg: Bring attn to that stat in public display
Lila: Interesting trends, like google holiday dip
Sylvia: Who are the "non-search"?
Includes clicks on internal links; any direct link from anywhere on the web
Lila: Can we separate internal page views?
Getting to wikipedia direct vs article queries (clicks from within vs. from without)
Wes: ? (missed what he said that Sylvia +1'd)
We got a lot done, but didn't quite complete the goal as written
Lila: Want to follow up with more questions later
Tomasz: Shifting our analyst was the right choice, but was painful. Threw off our planning. Consider as a systemic issue moving forward.
Wes: Yes, we're aware and working on it
Kudos to Dario for writing up a rationale, and sharing appropriately. Process went well.
Lila: Not just looking at wikipedia; looking at all projects
Non-wp projects may be hard to find. Maybe that explains the difference above related to "2/3 of traffic"
Lila: Can we surface commons, sources, etc. at a much higher rate? It's coming up in the strategy consultation too
Yup
Lila: Goal is not to cannibalize any project
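
As an illustration of the referrer breakdown discussed above (search engines versus "non-search", and Lila's question about separating internal page views), here is a minimal sketch of how a referrer URL could be bucketed. It is not the logic behind the team's dashboard; the domain lists, category names and function name are assumptions made for this example and are deliberately incomplete.

```python
from urllib.parse import urlparse

# Assumed, incomplete lists for illustration only.
SEARCH_ENGINE_NAMES = {"google", "bing", "yahoo", "duckduckgo"}
INTERNAL_DOMAINS = ("wikipedia.org", "wikimedia.org", "wikivoyage.org")

def classify_referrer(referrer):
    """Bucket a referrer URL as 'none', 'internal', 'search engine' or 'external'."""
    if not referrer:
        return "none"
    host = (urlparse(referrer).hostname or "").lower()
    if not host:
        return "none"
    # Clicks from within our own projects.
    if any(host == d or host.endswith("." + d) for d in INTERNAL_DOMAINS):
        return "internal"
    # Match on a whole hostname label so that e.g. google.co.uk also counts.
    if SEARCH_ENGINE_NAMES.intersection(host.split(".")):
        return "search engine"
    # Any other direct link from elsewhere on the web.
    return "external"

counts = {}
for ref in ["https://www.google.com/search?q=x",
            "https://en.wikipedia.org/wiki/Main_Page",
            "https://example.com/page",
            ""]:
    bucket = classify_referrer(ref)
    counts[bucket] = counts.get(bucket, 0) + 1
```

Splitting "non-search" into internal and external buckets like this is one possible way to answer the "clicks from within vs. from without" question.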

Slide 7 (portal migration)

Mostly delayed due to un-involved users
mxn corrected this by saying that he was involved
2 factors exacerbated the problem
No CL support for Discovery
CL hired
not enough product support
PM hired
In the end, nobody objected. But it took 2+ months.
Met with wikidata folks in Germany
Wikidata folks are pushing ahead, with our support
Trip was incredibly worthwhile

Slide 8 (improving analysis)

We now have a standard set of metrics and reporting that PMs can use, thanks to our data analysts' work

Wes: Editing is also using Discovery's A/B testing processes?
They are at least interested
Hired our PM first working day of Q3
Dedicated Discovery CL starts next week

Slide 9

Slide 10

Completion suggester is still fundamentally a prefix search using Elasticsearch, but compensates for small errors, such as two typos
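
To make "prefix search that compensates for small errors" concrete, here is a minimal pure-Python sketch of typo-tolerant prefix matching. It only illustrates the idea and is not how the completion suggester or the underlying search engine implements it; the function names, the sample titles and the default of two allowed typos are assumptions for this example.

```python
def prefix_edit_distance(query, title):
    """Smallest edit distance between `query` and any prefix of `title`."""
    # prev[j] holds the distance between the query so far and title[:j].
    prev = list(range(len(title) + 1))
    for i, q_char in enumerate(query, start=1):
        curr = [i]
        for j, t_char in enumerate(title, start=1):
            cost = 0 if q_char == t_char else 1
            curr.append(min(prev[j] + 1,          # drop a query character
                            curr[j - 1] + 1,      # add a title character
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    # Taking the minimum over all columns lets the query match any prefix.
    return min(prev)

def suggest(query, titles, max_typos=2):
    """Return titles whose prefix matches `query` within `max_typos` edits."""
    scored = []
    for title in titles:
        distance = prefix_edit_distance(query.lower(), title.lower())
        if distance <= max_typos:
            scored.append((distance, title))
    # Closest matches first, ties broken alphabetically.
    return [title for _, title in sorted(scored)]

# Hypothetical page titles.
titles = ["Albert Einstein", "Alberta", "Berlin", "Albania"]
suggestions = suggest("albret", titles)
```

A fixed typo budget is very permissive for short queries, so a real system would likely scale the allowed number of typos with query length.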

Appendix (portal screenshots)

Shows an A/B/C test we ran on the portal
"This draws people to the search box, but the search feature isn't great"
"This improves the search process"
We plan to push the improved search dropdown to production
Wes: When presenting results, distinguish between desktop and mobile
Apps team did qualitative surveys, and people liked images
Lila: We should create lists of missing images that community could help fill in
Lila: Are the descriptions coming from wikidata? (top 100, top 1000)
Yes, good idea.
Quim: Interesting idea. We'll talk.
Lila: Rollout plans?
We have plans for a lot of potential changes
By the end of this quarter, we would like to push this for everyone. Will bring desktop experience closer to mobile. 5% improvement is substantial (12% of people who left the page without doing anything; millions of people)
Lila: Could we prompt users if they search in the wrong language?
We are looking to improve the existing language picker
Lila: Does anyone click on the language links around the globe?
About 10%
Placement of languages around the globe is known not to be great (mxn doesn't like it)
There is a reason they are arranged as they are, but could be better
Lila: Need to dig deeper into SEO/referrals, drive more awareness
Wes: we are working with partnerships to understand the full funnel
Lila: Discovery's role is not clear. Make mission clear; improve messaging
Lila: Recommend sending a monthly update to wikimedia-l (or wherever is appropriate)

Appendix slides
