Wikimedia monthly activities meetings/Quarterly reviews/Research, Design Research, Analytics, and Performance, April 2016
Notes from the Quarterly Review meeting with the Wikimedia Foundation's Technology I: Research, Design Research, Analytics, Special Projects, Performance teams, April 14, 8:00 - 9:30 AM PT.
Please keep in mind that these minutes are mostly a rough paraphrase of what was said at the meeting, rather than a source of authoritative information. Consider referring to the presentation slides, blog posts, press releases and other official material.
Present (in the office): Abbey, Dario, Maggie, Daisy, Stephen L, Tilman, Gabriel, Chris Steipp, Rob L, Kevin Leduc, Geoff Brigham, Ori, Zhou, Katherine
Participating remotely: Aaron Halfaker, aotto, Dan, Jonathan M, Joseph Allemandou, Luca Toscano, Marcel Ruiz Forns, Mark Bergsma, mviswanathan, Nuria Ruiz, Samantha, Wes
Research and Data
Slide 1
Slide 2
Dario: new addition to the team: Nathaniel (full stack engineer). Still working with 2 research fellows, but many of our volunteer collaborators left in the quarter, which affected the team's productivity.
Slide 3 - Objective: Revision scoring
Dario read through goals on slides
- Edit typing: working on an edit taxonomy for this goal; will continue in Q4
- Partially achieved, second half is still in progress
- Papers submitted or on track for submission
Learnings - extending revision scoring to other languages requires hand coding and substantial community support, which makes it harder than initially expected. We are also still working with Ops on addressing the blockers that need to be removed to productize the service.
Slide 4 - other achievements
Swagger-based documentation
The service now reports its own performance metrics, e.g. filter rate, which measures how much work we're saving for our curation community.
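For context, a minimal Python sketch of the filter-rate idea (an illustration only, not the revision scoring service's actual code; the scores and threshold below are invented): given damaging-edit probabilities from a model and a review threshold, the filter rate is the share of edits the curation community no longer needs to review.
<syntaxhighlight lang="python">
# Illustrative sketch only, not the revision scoring service's implementation.
# Filter rate: the fraction of edits scored below the review threshold,
# i.e. the work saved for human patrollers.

def filter_rate(scores, threshold):
    below = sum(1 for s in scores if s < threshold)
    return below / len(scores)

# Hypothetical damaging-edit probabilities for a batch of recent edits.
scores = [0.02, 0.10, 0.85, 0.05, 0.40, 0.01, 0.93, 0.07]
print(f"filter rate at threshold 0.5: {filter_rate(scores, 0.5):.0%}")  # 75%
</syntaxhighlight>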
Slide 5 - Article recommendations
Dario: we wanted to run a campaign; didn't get there, but improved usability. It's ready now for a campaign; hoping to test this in the coming days, working on an announcement with Comms.
Slide 6 - Reader segmentation
3rd major goal - qualitative plus quantitative research. Gave an early presentation to the Reading team and we expect to publish the results more broadly after Q4.
Slide 7 - Other successes and misses
Organized WikiCite in Berlin
Organized a joint research workshop at 2 major conferences, to be held in Q4
Deployed referer policy in collaboration with Ops
Slide 8 - Core workflows and metrics
Hosted public showcases.
Maggie: Community Engagement really appreciates the work of the team
Wes: good progress toward actual application
Katherine: agree, great to see move towards applications; glad to see increased collaboration, excited to see readership data
Slide 9 - Appendix
Design Research
Slide 10
Slide 11 - Production Work
Abbey: making sure we build what is usable. Exploratory research with non-technical users. Daisy and Sherah did workflow testing with the iOS app; findings are being used to iterate on the app. Daisy did research on the new editor experience, about the new editor education flow on VE.
Contextual inquiry in Mexico. Learned about doing contextual inquiries in the field, and a lot from and about the people who participated in the research. See the March monthly metrics for a high-level description of our findings.
Slide 12 - Production Work
Slide 13 - Mentoring
Abbey: mentoring people. Sherah has been working on research for the Reading team. Seeing how the mentoring works, but may do it with others.
Worked with May and Volker to understand the needs of people who build UI on wiki around UI libraries. (Focus groups)
Currently working on the program toolkits with Jaime, Maria, Edward and Subha on the CE team, to better understand how our toolkits work for people who organize and run programs. There will be iteration on the toolkits from what we learn.
Slide 14 - Personas
Personas - this will be an ongoing thing. Didn't achieve our current goal, because we didn't get the analysis we hoped to get done.
Slide 15 - External Collaborations
Jonathan has been leading a collaboration with UW, 200 responses; we're going to see what we can learn about students seeking information. Generalizing experience from Mexico and UW. In general, as we do more exploratory research, we will be able to compare the various groups of people (participants) we learn from.
Slide 16 - Other successes and misses
Reboot is a consultancy which has done many contextual inquiries.
On methodology of inquiry. Readers in Nigeria and India. Working on building a database to find commonalities and differences of research in various regions.
Research (Design Research and Research and Data) team offsite - good for figuring out how we're going to work together
Slide 17 - Core workflows
Samantha really helped with recruiting in Mexico, in addition to her ongoing recruiting for production work with product teams.
Monthly metrics about Mexico deep dive.
CSCW workshop "Breaking into new data-spaces"
Slide 18 - Appendix
Geoff: product design work and this are theoretically all attached to Product. How much - what are the efficiencies that make it into product? Do you see misses?
Abbey: we do research, and Product does design. We work closely with Product. We do have misses (when product goes out the door before research). Not that everything needs research to be done, but we do want there to be no usability issues when product goes into production. We're getting better and better at iterating and working with teams. VE was working well for a long time, but now there is no designer on the team, and we are seeing the difference.
Jonathan: often when our work doesn't make it into product, the reasons have more to do with organizational changes that affect product teams' priorities.
E.g. at times in Collab and the UX standardization team, our work was not integrated or was put on hold because product changed.
Geoff: let's have a side conversation
Katherine: regarding the generative value of your research, how do you prioritize? Do people come to you, or do you have a backlog?
Abbey: so each official product team, we have biweekly meetings. we always have topics for discussion, and prioritized list of projects. we have a hit list of things that need research work. We work with the teams on what we have bandwidth for. We also have our Phab board, and people can add projects. Weekly triage (backlog grooming). you can look at that board and see how we're working. We also had a design workshop. All service (teams that build user facing functionality but do not have design supprt) depts (e.g. Security, Legal,...) have some user facing functionality they are building, so to the degree that we can (with our bandwidth) we work with those teams to iterate their user facing functionality.
We're working with Sarah on publishing reports (e.g. on wiki) and better communicating our work more widely.
Analytics Engineering
Slide 19
Nuria: we would like to figure out how to track velocity better. Likely next quarter we will be reporting ops work separately. We have 6 full-time engineers and 1 full-time manager.
Slide 20 - Uniques
Nuria: spent quite a bit of time on this this quarter. We were able to deliver unique devices daily and monthly, per country and per project (desktop and mobile). We now publish the unique devices data as downloadable files, starting January 2016. We have the data split by country internally; publishing that externally is quite tricky due to privacy reasons.
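As an illustration of how the published unique devices counts can be consumed, a minimal Python sketch, assuming a unique-devices endpoint of this shape on the public Analytics REST API (the project, access-site and date range below are arbitrary examples):
<syntaxhighlight lang="python">
# Minimal sketch: fetching monthly unique-devices counts for one project.
# The endpoint shape, project and date range are illustrative assumptions.
import requests

URL = ("https://wikimedia.org/api/rest_v1/metrics/unique-devices/"
       "en.wikipedia.org/all-sites/monthly/20160101/20160401")

resp = requests.get(URL, headers={"User-Agent": "uniques-example/0.1"})
resp.raise_for_status()
for item in resp.json().get("items", []):
    print(item["timestamp"], item["devices"])
</syntaxhighlight>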
Slide 21 (graph)
Mobile: over half of our uniques. In Indonesia, over 80%.
Slide 22 - Wikistats 2.0
This is part of Wikistats 2.0, our project to replace the work that Erik Z. has been doing on http://stats.wikimedia.org for the better part of a decade.
Browser data: https://browser-reports.wmflabs.org/#all-sites-by-os - soon to be deployed on a new production domain. By OS, by browser, and in combination. Analytics would greatly benefit from working with a designer; UI projects take longer because we don't spend enough time on design cycles - building mocks, for example.
Slide 23- User Agent Breakdowns (graph)
One goal we missed (though 99% there); we weren't able to do this. Sanitizing data is very difficult. We got an OK from Research, but we need to work on this with Security.
Need to work with security so that we know what we're doing works.
Slide 24 - sanitization
Chris: I didn't realize you were waiting for me. Do you need help?
Nuria: right now, it's on us. we have some research to do. we want to make sure we have something to propose, and then want Chris to review
Slide 25 - Operational Excellence
A lot of time spent working in Operations. We maintain several systems: EventLogging, the pageview API, the cluster.
We need more DBA help on the team.
Mark: hope to have that sorted out within a month
Slide 26
Piwik - kinda like Google Analytics. (demo of the dashboard) Self-hosted; a smart platform for small websites, and it works well for our small sites, but not with large amounts of traffic.
Slide 27 - Piwik (example screenshot)
Slide 28
Public by default
Removed (hashed) IPs from EventLogging data to collect as little PII as possible.
We're trying to make data more sanitized from the start. Group editing data. Pageview stats - our homework is to have very good data for edits, too.
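To illustrate the kind of approach involved (a sketch only, not the actual EventLogging change), one common way to avoid storing raw IPs is to keep at most a salted one-way hash, with the salt rotated and then discarded:
<syntaxhighlight lang="python">
# Illustrative sketch, not the actual EventLogging implementation: replace a
# raw client IP with a salted one-way hash so events carry as little PII as
# possible. The salt is rotated and never persisted, so old values cannot be
# re-identified later by re-hashing candidate IPs.
import hashlib
import secrets

SALT = secrets.token_bytes(32)  # rotated periodically, never written to disk

def pseudonymize_ip(ip: str) -> str:
    return hashlib.sha256(SALT + ip.encode("utf-8")).hexdigest()

event = {"schema": "ExampleSchema", "client_ip": "198.51.100.23"}
event["client_ip"] = pseudonymize_ip(event["client_ip"])
print(event)
</syntaxhighlight>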
Katherine: thank you for helping the whole organization. I hope we'll be able to get the design support and the security help
Personally (from Comms), thanks for your help with Piwik, especially when we melted it down ;)
Nuria: glad we're able to deploy it for the right-size projects
Special projects
Slide 29
Slide 30 - Begin a Community Consultation
Kevin: I led a virtual team including Michelle, Tiffany, Juliet, Johan, Edward and David S. with a goal to start a consultation on uniques
Out of our measures of success, which were milestones, we:
- hosted an internal brownbag
- did not run a survey
- did not start a consultation
The Analytics team had been debating using unique tokens for years.
Unique tokens: postponing for at least half a year, to see if the new ED wants to pick up the issue. We avoided a consultation that would have been costly both resource-wise and in community goodwill - Feb-March would not have been a good time to consult the community about this.
Slide 31 - Other successes and misses
What did we learn?
Having unique device counts tipped the scales away from implementing a unique token. We have a very valuable dataset now. It takes a long time to develop metrics, and to understand and use them; dumping a whole bunch of new metrics wasn't something we could do. There are alternatives to unique tokens and they should be used. The Reading team now has a way of instrumenting their code. If we did want to do unique tokens, we would need more support among the staff, and it was clear that it didn't have that support. We're handing over the metrics to the Reading team.
Geoff: Is there a process to know whether our current data collection is giving us what we need?
Kevin: we don't have a process. we don't know the gaps.
Dario: there needs to be a phase of data analysis. we need to know what we have before we iterate further.
Tilman: You said the Reading team *now* has a way to instrument, what did that refer to?
Kevin: I'm referring to session metrics. The Reading team can implement that on the client side and report back; the implementation still has to be scheduled.
Katherine: thank you. Socially and technically, we know this is complex. Thank you, to you and the team, for making the tradeoff. To be able to explore something and then step back from it is a sign of a mature decision-making process.
Geoff: good learning about communicating with staff. It felt like there were feedback loops
Kevin: yes from execs to product managers
Performance
Slide 32 (team)
Ori: 5 full-time employees
Slide 33 KPI: first paint time (graphs and charts)
Ori: first paint is the time from navigating to a page to the user seeing content rendered. We've been flat this quarter... slight regression. We have a good idea why, but it's not something we can do anything about at the moment. Box plot: whiskers are at the extremes (10th/90th percentile), the line in the middle is the median. On the whole our 1-year graph looks pretty good.
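As a worked example of the summary statistics shown on these box plots (the sample values are made up; this is not the team's dashboard code), the median and 10th/90th percentile whiskers can be computed from first-paint samples like so:
<syntaxhighlight lang="python">
# Illustrative sketch: summarizing first-paint samples (milliseconds) into the
# median and 10th/90th percentile whiskers described above. Values are invented.
import statistics

samples_ms = [310, 450, 520, 610, 700, 820, 950, 1200, 1800, 2600]

def percentile(data, p):
    """Nearest-rank style percentile; good enough for a rough summary."""
    ordered = sorted(data)
    k = max(0, min(len(ordered) - 1, round(p / 100 * (len(ordered) - 1))))
    return ordered[k]

print("p10:", percentile(samples_ms, 10))
print("median:", statistics.median(samples_ms))
print("p90:", percentile(samples_ms, 90))
</syntaxhighlight>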
Slide 34 SPDY usage vs time
Investigated the regression in first paint; it inversely correlates with the % of client connections that use SPDY.
(Google rolled out a protocol that improves the efficiency of connections, SPDY, which was later standardized as HTTP/2. The standard that was adopted is slightly different. Browsers are dropping SPDY and adding HTTP/2. Clients on older browsers won't get the benefit of either.)
Right now we only support SPDY; HTTP/2 is planned for later this month.
I.e., long story short: the regression is largely due to browser support changes.
Slide 35 KPI: Page save time
Page save time
From Save to the edited article loading.
Better news: we're getting gains. Wide array of gains. Still fairly significant gains. The fastest connections editing the hardest pages are seeing the biggest benefit.
Slide 36 Performance inspector
Tool
Semi-hidden debugging tool picked up by editors.
Our perf characteristics are not just a function of our code; there is a high degree of variance between projects, since local admins have the ability to load gadgets by default by modifying CSS and JS.
I.e., differing CSS and JavaScript code on different wikis. We've had bad regressions on various wikis because of this. The idea was to make information available to those that can use it - make it compelling and actionable to the right people. An indication of how long a page would take to load on a 2G connection, for example.
We have an early prototype, see the short video in this slide, but development is ongoing.
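To give a sense of the kind of back-of-the-envelope estimate such a tool can surface (a sketch only, not the Performance Inspector's actual logic; the connection figures are assumed round numbers):
<syntaxhighlight lang="python">
# Rough sketch of estimating page load time on a slow connection from
# transferred bytes; not the Performance Inspector's actual logic.
EDGE_BANDWIDTH_BITS_PER_S = 120_000   # assumed ~120 kbit/s effective throughput
EDGE_RTT_S = 0.6                      # assumed round-trip time
REQUEST_COUNT = 25                    # assumed number of requests
PAGE_BYTES = 900_000                  # e.g. ~900 kB of transferred assets

transfer_s = PAGE_BYTES * 8 / EDGE_BANDWIDTH_BITS_PER_S
latency_s = REQUEST_COUNT * EDGE_RTT_S  # crude bound, ignores parallelism
print(f"~{transfer_s + latency_s:.0f} s to load on a 2G-like connection")
</syntaxhighlight>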
Carrying over to next quarter
[[File:WMF Research & Design Research & Analytics & Special Projects & Performance teams quarterly review Q3 2015-16.pdf|thumb|380px|page=37|slide 37]]
Slide 37 - High availability for MediaWiki / Leaner mobile web
Two added goals
1. A narrower and deeper subset: taking MediaWiki, which was not written for multi-concurrency, and chasing down bugs and race conditions that prevent us from geographically distributing servers (across data centers). April 19 switchover: a read-only test that simulates some of the conditions.
2. Goal of making the mobile website leaner. Segment based on bandwidth availability.
Made a lot of interesting progress, but the ultimate feedback from designers was that even on high speed connections, high density solutions weren't worth it, so we disabled it.
Slide 38 - Granular Performance Dashboards
Substantially more data
Geo granularity
(demo of Grafana) NavigationTiming by geolocation
We can compare site performance on a per-country basis. A cool off-the-shelf tool we were able to deploy. Ongoing work on ...
Slide 39 - Contributors
MediaWiki availability. Aaron and Timo are prolific.
Slide 40 - Problems and Prospects
Problems:
- Number 9 :-)
- We can't always anticipate what is going to be the problem
- Too much knowledge is bundled in too few people. We provide guidance, e.g. Timo is part of ArchCom and gives input to the Reading team on lazy loading of images. Team members have a lot of expertise. Productize/projectize our work. Still don't know the impact of performance on the wider ...
RobLa: compared to before, when there used to be a lot of lore trapped in a few experts' heads, you've done a good job of socializing this, giving people the ability to see the impact of what they do. That's huge.
Dario: often work not supported by designers
Geoff: because we don't have enough designers?
Dario: no designer assigned to Research, Perf, ... so we ask designer from Product team to help out
Geoff: is the reason you want design that you want more people to use things, to make them more accessible?
Abbey: yes, to enable users to find and use it - findable, usable.
Abbey: there's expertise.
Nuria: teams work better when they work on their core expertise. E.g. in Analytics, we are engineers. If you have to do UI and you don't know that, you step outside your core expertise; not efficient.
Dario: it makes sense that ... we have a low designer-to-developer ratio for a lot of our work.
Ori: at least here there is no risk of doing actual harm when doing experiments [as opposed to end-user-facing work].
Bigger risk: we try an idea and reject it in the absence of design research; we toss ideas for "not having value".
Abbey: if you have something you already built, we can do heuristic eval, or a usability test. For a designer..
Wes: Do you use OOUI?
Ori: yes...what you saw in the prototype was based on OOUI.
Wes: Volker is working with the Editing team; good to get that solidified.
Kevin: what has been the community reaction to reducing image quality?
Ori: there hasn't been a negative reaction I'm aware of. The rollout correlated with a lot of other things and wasn't advertised loudly or compellingly.
Tilman: it was on Tech News, that was the only time we highlighted it publicly
Wes: might be good to follow up on
Dario: it is noticeable - on graphs rendered as thumbnails, it's noticeable.
Ori: it doesn't apply to PNGs. If you just consider the impact on pages: 1.8 MB down to 900 kB on the w:Barack Obama article.
Wes: Design Research assessment?
Ori: not done here, but by others. Lazy loading is related, but not entirely.
Gabriel: lazy loading infrastructure should provide an opportunity to properly support high/low speed targeting in JS
Ori: good point...may allow for reintroducing in some specific instances
Gabriel: browsers don't take bandwidth into account; the internal mechanism for that is kind of poor.
Maggie: could you give slides?
Katherine: thank you. Not sure about the goal of making the Performance team obsolete, but agree with the move to empower people.
Design comments have been noted & related conversations with Heather and Arthur