Wikimedia monthly activities meetings/Quarterly reviews/Research, Design Research, Analytics, and Performance, October 2016

Notes from the Quarterly Review meeting with the Wikimedia Foundation's Technology I: Research, Design Research, Analytics Engineering, Performance teams, October 20, 10:00 - 11:30 AM PT.

Please keep in mind that these minutes are mostly a rough paraphrase of what was said at the meeting, rather than a source of authoritative information. Consider referring to the presentation slides, blog posts, press releases and other official material.

Present (in the office): Dario Taraborelli, Abbey Ripstra, Rob Lanphier, Michelle Paulson, Megan Neisler (taking minutes), Tilman Bayer; participating remotely: Maggie, Aaron H., Abbey, Dan, Jaime V., Joady, Jonathan, Joseph, Katherine, Luca (Elukey), Mark, Nathaniel, Nuria, Ori, Wes, Andrew Otto, Darian Patrick, Marcel Ruiz Forns

Research and Data

Dario: Team has stayed the same in size the past quarter but we increased the number of collaborators. We now have more than twice the coverage.

Objective: Broaden ORES usage

Dario: ORES reached production as a service for Q4. First and primary goal was to add extension. Rolled out to 8 wikis (Reads slide). Prepared and released article score dataset. Purpose is to make sure we have data for the research committee. Believe this is going to have large impact on research. Also made data available for use in ElasticSearch to better help filter and search articles. We completed a number of community announcements on availability of this feature. Acknowledgements to Amir and Sabyasachi.

Successes and Misses: ORES

Dario: Hosted a session at the Product and Tech onsite on using revision scoring in production and sharing management of this project. Had some very useful early conversations, and follow-up discussions are continuing. A number of new tools are using ORES. Adoption in the community is growing. Started exploration into new signal sources, which we will continue next quarter.

Successes and Misses: Discussion modeling

Dario: Started a collaboration with Jigsaw and the CE team on discussion modeling. This is the big focus for the next fiscal year. The first major outputs were seen this quarter. The team designed and evaluated aggressiveness models on article talk page comments.
Dario: Two presentations and an article in Wired. The personal attack data visualization project will help push the project even further.

Successes and Misses: Open Notebooks infrastructure

Dario: Collaboration with ops, more of an infrastructure goal. A large part was deferred to Q3. We plan on doing this using open source software that supports Python and R-based code. It will be shareable by researchers. We realized on the ops and research side that we weren't ready for launch. The work to get new hardware in place and write end-user documentation will take more time. Expecting to launch in Q3.

Successes and Misses: Open Notebooks infrastructure (continued)

Dario: Growing a population of students to be early adopters of the system when it launches.

Objective: Productize the article recommendation API

Dario: Worked on this in the last couple of months. The goal of a production-level API by the end of the quarter was missed. We realized the set of requirements needs to be expanded. Ori provided substantial support. You can still use the application.

Successes and Misses: Productize the article recommendation API

Objective: Research discoverability

Dario: This was a stretch goal. We wanted to spend time creating a landing page for Research on how to collaborate with the team. This was missed: my time was focused on Wikicite and the incorporation of design researchers into product teams.

Wikicite, outreach and acknowledgements

Dario: Wikicite took place in May. We published a full report that is now available on Commons. Gave a number of presentations at conferences.

Core workflows and metrics

Dario: Completed four new MOUs for research collaborators. Continuing monthly cadence of research showcases after being on hold.
Katherine: Looking forward to reading the Wikicite report.
Wes: When is the next Wikicite?
Dario: We were originally planning for January but decided to push to Q4. Looking to do it as part of the Hackathon in Vienna.
Katherine: Productization of the article recommendation API. What is the current discovery pathway to find the recommendation service and integrate it into broader products?
Dario: I can give a broader update. We are doing work that is meant to solidify the API. There is already integration with Content Translation. There are other contributor recommendations that can be integrated. We are determining the types of recommendations we'd like to support (such as stubs or article expansion) and opportunities to integrate with product interface design. We have been primarily collaborating with the Editing team on this, but we are also looking at collaborating with the Reading team on reader recommendations.
Katherine: Want to call out detox work with Jigsaw. Meant a lot to the CE team. Excited to see where this goes to address harassment.
Dario: Last quarter was focused on understanding sources and this quarter will look into the impact. Consider that it took a year researching impact of quality control on retention (the "Decline" paper). So researching the impact of toxicity will take time too. There is little literature on retention of volunteers.
Katherine: Will be useful to have that data.

Design Research

Abbey: Team went down by 1. There are 3 full-time people on the design research team. Daisy moved to the Editing team and Sherah to the Reading team. Jonathan and I are doing research outside of product timelines. Samantha is remaining on the core design research team.

Objective: Evaluative Design Research

Abbey: First goal was about evaluative design research. (Reads Slide)

Objective: Evaluative Design Research (continued)

Abbey: Daisy and Sherah were involved. Some hovercard usability testing to improve on mobile, feature testing experience, testing various Android workflows. Daisy worked with the Discovery team on changes to the portal and language dropdown.

Objective: Generative Design Research

Abbey: All the new reader activities conducted. All the data collected was analyzed in field and shared as soon as we got back. Findings from Mexico were integrated into Nigeria and India findings.
Abbey: Reached out to the team, looked at all the findings and asked people to advise on their most important finding. Prioritized two findings. The first is that people are accessing info online and sharing it offline. The next finding, for concept generation, is about affordability, and we will be addressing it next quarter.
Abbey: The mobile web team has built prototypes for offline access to do some evaluative testing on.
Abbey: Clear cross-team collaboration is important for concept development and evaluation. It allowed us to build systemic concepts and address them with diverse perspectives and expertise.

Objective: New Readers Research completed and shared

Abbey: Shared findings with the core team and held a 2-day workshop with Reboot (a little bit of concept generation conducted).
Abbey: Zack led the presentation of New Readers findings to community and staff.

Generative Design Research (continued)

Abbey: Led by Jonathan, on new user support. Edit Review improvements, and evaluating the impact of Wiki Adventure and Teahouse (social context). We spoke with targeted users before committing resources. Evaluating previous experience with how we onboard people can advise how we do it in the future.
Michelle: How is the Wiki Adventure being pushed?
Jonathan: It is not highly visible right now because the software needs to be updated. It hasn't had any support, so it has not been updated. We've been evaluating historical data.
Michelle: Does data indicate we should be spending some time to update?
Jonathan: Data indicates it does not have positive impact on new editor retention in its current version (one could potentially make changes and test if they improve this).
Dario: Happy we're finding negative results. Helps us prioritize and we can shift focus and inform project strategies.
Michelle: What has community use of Teahouse been like over the past year?
Jonathan: Teahouse is doing very well. Initial results show it does keep newcomers around. By most measures, the Teahouse experience has improved. People are more welcoming to new editors.

Objective: Research Capacity Building

Abbey: Part of this was about the tooling used for evaluative research. UserZoom has more tools for research than usertesting.com, but it's also a little more complex. It took a lot of work to engage the service this quarter, and now we're beginning to use it.

Abbey: Participant recruiting. Samantha is leading the effort and did some social media outreach with Jeff Elder. She did recruiting for three projects. She is currently on medical leave. We're assessing how it's going, with the design team pitching in to support each other in her absence.

UserZoom

Abbey: Used for hovercard usability study. 100 participants on UZ panel.

Objective: Research Data Mapping

Abbey: Jonathan is leading. Doing analysis of all the sources and ensuring that all study data is in compliance with WMF data policies.
Abbey: We're reviewing and doing housecleaning. Creating a plan for what data tidying we need to do whenever we run studies.
Abbey: Building a process for releasing New Readers research. Also did a Trello cleanup led by Aaron. Feels good to do this type of work.

Core workflows and metrics

Abbey: Participant recruiting is a key workflow.

Other achievements in Q4

Abbey: Did some work on personas. Created one persona for Mexico (only 15 interviews). Broader personas from Nigeria and India are being used for concept development for New Readers (understand their technical concept and abilities).
Abbey: We had a set of personas from the past (pragmatic personas). The Reading team is going to move two pragmatic personas forward.

Appendix

Abbey: Places you can find our work.
Michelle: In the studies I browsed through, the number of people per study is relatively low. Any concern about skewed percentages? I was looking through an open and a closed study as examples.
Abbey: Diminishing returns. If you are working with a specific group of people, 5 people are fine to determine patterns. With other research, such as generative research, you need a larger number of participants.
Tilman: In Reading, we've discussed the relation of small-group user testing to A/B testing with larger numbers. Determining preferences between two options usually needs a larger group of people. But if everyone in a smaller group (say 5 people) says it's a problem, then an experiment with 100 or 1000 people is very unlikely to change the conclusion that there is a problem.
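The sample-size reasoning in this exchange can be made concrete with a little arithmetic. The following is a rough sketch, not something presented at the meeting: it assumes participants are independent and that a problem affects a fixed share p of users, and the rates used (30% and 10%) are hypothetical values chosen only for illustration.

```python
def chance_problem_observed(p: float, n: int) -> float:
    """Probability that at least one of n independent participants
    hits a problem that affects a share p of all users."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1.0 - (1.0 - p) ** n


def chance_all_report(p: float, n: int) -> float:
    """Probability that all n independent participants hit the problem."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 0:
        raise ValueError("n must be non-negative")
    return p ** n


if __name__ == "__main__":
    # Hypothetical rate: a problem affecting 30% of users is very likely
    # to show up at least once in a 5-person study.
    print(round(chance_problem_observed(0.3, 5), 3))
    # Hypothetical rate: if only 10% of users were affected, seeing all
    # 5 participants report the problem would be extremely unlikely.
    print(chance_all_report(0.1, 5))
```

Under these assumptions, a unanimous result in a small group is hard to explain unless the problem is widespread, which is the intuition behind Tilman's remark. Comparing two design options is a different question and still needs the larger samples he mentions.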

Analytics Engineering

Nuria: Presenting the 4 goals we worked on this quarter. The team has been a little short-staffed.

Operational Excellence

Nuria: The main goal is the pageview API. It is a carry-over goal from last quarter because we needed to put a lot of work into loading data. The work paid off.
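For readers unfamiliar with the pageview API mentioned here, the sketch below shows how a per-article request URL might be assembled. The URL layout follows the publicly documented Wikimedia REST API pattern as I understand it; the helper name `per_article_url`, its defaults, and the example article are illustrative assumptions, and the current API documentation should be checked before relying on it. The sketch only builds the URL and does not send a request.

```python
from urllib.parse import quote

# Assumed base path of the per-article pageviews endpoint.
BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def per_article_url(project: str, article: str, start: str, end: str,
                    access: str = "all-access", agent: str = "user",
                    granularity: str = "daily") -> str:
    """Build a per-article pageviews URL (hypothetical helper).

    start and end are dates written as YYYYMMDD strings.
    """
    # Titles use underscores instead of spaces and must be URL-encoded.
    title = quote(article.replace(" ", "_"), safe="")
    return "/".join([BASE, project, access, agent, title,
                     granularity, start, end])


if __name__ == "__main__":
    print(per_article_url("en.wikipedia", "Albert Einstein",
                          "20161001", "20161020"))
```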

Per article latency graphs

Nuria: Latency in graph. Looks like it goes to zero. Below one second. We are able to sustain low latencies but need to add more capacity.

Objective: Better tools to access data

Nuria: We are likely to miss some targets because we set high goals. This one was completed in the first couple of weeks of Q2.

Graph: Pageviews daily (bar graph per country)

Nuria: Can get graph in a matter of seconds.

Graph: Pageviews daily (timeseries per country)

Nuria: The patterns for Mexico and Spain are somewhat different.

Wikistats 2.0

Nuria: Biggest goal this past quarter. We were able to successfully reconstruct edit history from the start of the wiki. Several community wishlist items, and other products, will benefit from this project when it is complete.

Public Event Stream POC

Nuria: The event stream endpoint is operational. The infrastructure is not productionised yet. It is meant to be fully public.

Other successes and misses

Nuria: A variety of things the team is doing. Kafka was upgraded on the non-analytics cluster (also used in fundraising).
Nuria: An org asked us for a specific data set. We collaborated with Research to compile a cached dataset.

Graph: User agent breakdowns (timeseries)

Nuria: We developed a blog post showing the browser data we have. Most others require people to buy this data. We are giving it away for free. We get thousands of data per day. It was not one of the goals but a byproduct of our work that others benefit from.
Michelle: Legal team is very excited about Wikicite. We use often in our work.
Dario: Should introduce pivot to ?? as well.
Nuria: It's a high-level tool, so everyone in this meeting will benefit from its usage. Eventually we'll also have edit data.
Dario: What's the timeline for ??? events?
Nuria: 3 months before it is usable.

Graph: Visits over time (to our user agent breakdowns)

Performance Team

Graph: Daily median save timing (timeseries)

Ori: One of our key metrics. It's an editor metric. The x-axis extends before and after the quarter (the highlighted area shows the quarter). We made a substantial reduction (1/5 of a second). Towards the end of the quarter there was a slight regression, but we fixed that.
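As an illustration of the metric plotted on this slide, the sketch below computes a daily median from raw timing samples. It is a minimal example and not the Performance team's actual pipeline; the input shape (date string, milliseconds) and the sample values are made up for illustration.

```python
from collections import defaultdict
from statistics import median


def daily_median(samples):
    """Group (day, milliseconds) samples by day and return the median per day.

    samples: iterable of (day, ms) pairs, e.g. ("2016-09-01", 1200).
    """
    by_day = defaultdict(list)
    for day, ms in samples:
        by_day[day].append(ms)
    # Sort by day so the result reads like a time series.
    return {day: median(values) for day, values in sorted(by_day.items())}


if __name__ == "__main__":
    # Made-up save timings in milliseconds.
    samples = [
        ("2016-09-01", 1200), ("2016-09-01", 900), ("2016-09-01", 5000),
        ("2016-09-02", 1000), ("2016-09-02", 800),
    ]
    print(daily_median(samples))
```

A median is a common choice for this kind of graph because a single very slow save (the 5000 ms sample above) barely moves it, whereas it would pull a daily mean up noticeably.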

Graph: Daily median first paint time (timeseries)

Ori: The time it takes for any content to appear. Less of an interesting story. There was a regression we caused and then reverted. Overall, it's pretty flat. Even the minor fluctuations are less significant than in the previous graph.

Objective: Thumbor

Ori: Thumbor is revamped thumbnailing, moved into a standalone component. The goal was to get it running in production. It has been packaged, and various plugins and debugging capabilities have been deployed. It handles any request to have an image resized. With the current thumbnail stack, any size above a certain resolution won't render. We have to make sure we don't cause regressions with Thumbor. The goal for next quarter is to identify failed renderings and compare them to the output of MediaWiki.

Ori: Collaboration with Filippo has been effective and is a model that should be replicated. Filippo is engaged day to day. The end result is maintainable and has substantial buy-in from ops.

Objective: Performance Inspector (1 of 2: on wiki performance inspector tool)

Ori: Carried over for two quarters. The feature is complete but has not become a beta feature. I've been an editor since 2005, so I've been able to wing it when deploying new features. Since Peter doesn't have that background, it was wrong to think we could launch without product and community support. Peter has also taken a few periods of absence.

Dario: This lack of design and product support for tech teams also emerged in the past quarterly review. A couple of teams are releasing end-user-facing functionality. I think it's something we need to address, and we should come up with a plan to get support from design research.
Abbey: I second.
Ori: I don't completely agree about rolling out features, but I'm open to being corrected. It's a very specialized feature that adds functionality that didn't exist before. I thought we should get something out and get validation that it's worth dedicating resources to before reaching out to other teams.
Rob: Also a matter of managing risk. Taking risks in all areas. It's a question of how much risk we want to take in each area.

Objective: Performance Inspector (2 of 2: Optimise critical rendering path)

Ori: Necessary for basic layout of text on the page. Still not done. Blocked on refactors and making model execution asynchronous. We had to roll that back and review why we didn't catch that in testing, and find a way to avoid regressions. Revised patches are going out next week.

Ori: Also, loading of user and site CSS was de-duplicated.

Objective: Multi-DC

Ori: Several defects were uncovered in the database layer that were the focus of Aaron's work this quarter. They have been fixed, largely thanks to MASTER_GTID_WAIT. When the application updates the database and then reads the data back, the risk is that a read from a slave might not return the new data. That was also fixed.
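The read-after-write risk described here can be illustrated with a toy simulation. This is only a sketch of the general idea (wait until the replica has applied the write's replication position before reading from it) and not MediaWiki's implementation. The `Primary` and `Replica` classes and their methods are hypothetical, and no real database is involved. In MariaDB, the `MASTER_GTID_WAIT()` function plays the role that `wait_for` plays below; one simplification is that `wait_for` drives replication itself instead of blocking until a replication thread catches up.

```python
class Primary:
    """Toy primary: an ordered log of writes (hypothetical, in-memory)."""

    def __init__(self):
        self.log = []

    def write(self, key, value):
        self.log.append((key, value))
        # The log length stands in for a replication position (e.g. a GTID).
        return len(self.log)


class Replica:
    """Toy replica that applies the primary's log with a lag."""

    def __init__(self, primary):
        self.primary = primary
        self.applied = 0   # number of log entries applied so far
        self.data = {}

    def replicate(self, max_events=1):
        """Apply up to max_events pending log entries."""
        end = min(len(self.primary.log), self.applied + max_events)
        for key, value in self.primary.log[self.applied:end]:
            self.data[key] = value
        self.applied = end

    def wait_for(self, position, max_rounds=10):
        """Return True once the replica has applied `position`.

        Simplification: this loop performs the replication itself,
        with max_rounds acting as a crude timeout.
        """
        for _ in range(max_rounds):
            if self.applied >= position:
                return True
            self.replicate()
        return self.applied >= position

    def read(self, key):
        return self.data.get(key)


if __name__ == "__main__":
    primary = Primary()
    replica = Replica(primary)
    pos = primary.write("title", "new text")
    print(replica.read("title"))   # stale read: the write has not arrived yet
    if replica.wait_for(pos):
        print(replica.read("title"))   # now reflects the write
```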