Wikimedia monthly activities meetings/Quarterly reviews/Multimedia/January 2015

The following are notes from the Quarterly Review meeting with the Wikimedia Foundation's Multimedia team, January 28, 2015, 10:15 - 10:45 PST.

Please keep in mind that these minutes are mostly a rough transcript of what was said at the meeting, rather than a source of authoritative information. Consider referring to the presentation slides, blog posts, press releases and other official material.

Present: Gergő Tisza, Gilles Dubuc, Erik Möller, Mark Holmquist, Fabrice Florin, Rob Lanphier, Guillaume Paumier, Tilman Bayer (taking minutes), Rachel diCerbo, Howie Fung (as of slide 9), Toby Negrin (from 10:40)

Participating remotely: none

Welcome, agenda, team intro

[slide 2]

Agenda

[slide 3]

Gilles:
Team overview
(Fabrice just left us)

What we said

[slide 5]

What we did

Deployed all MV improvements that were planned
The one big change in MV: Reorg of metadata panel

[slide 8]

error logging is what keeps us busy right now
Erik: sample, or full stream? full stream
Erik: could we handle a case where there are errors on every page view?
Gilles: should be able to handle it just like for PHP errors
at least on Beta labs - may have to sample in production
Erik: consider there's lots of language-level customization, user gadgets etc in production
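
The trade-off between logging the full error stream and sampling it can be sketched as below. This is only an illustration under stated assumptions, not the team's implementation: the function names `should_report` and `estimate_total` and the 1% rate are hypothetical, chosen for the example.

```python
import random


def should_report(sample_rate: float, rng: random.Random) -> bool:
    """Decide whether a single error event is forwarded to the error logger.

    sample_rate=1.0 corresponds to the "full stream" option,
    anything lower to sampling. (Hypothetical helper, for illustration.)
    """
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0 and 1")
    return rng.random() < sample_rate


def estimate_total(reported: int, sample_rate: float) -> float:
    """Scale a sampled count back up to an estimate of the true error count."""
    if sample_rate <= 0.0:
        raise ValueError("cannot estimate from a sample rate of 0")
    return reported / sample_rate


if __name__ == "__main__":
    rng = random.Random(0)   # fixed seed so the run is repeatable
    events = 100_000         # assumed worst case: one error per page view
    rate = 0.01              # assumed 1% sample
    reported = sum(should_report(rate, rng) for _ in range(events))
    print("reported:", reported)
    print("estimated total:", estimate_total(reported, rate))
```

With a rate of 1.0 every event is forwarded, so the load on the logger grows with page views; a lower rate caps that load at the cost of only estimating totals.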

[slide 9]

(Gilles:)
upload pipeline
right now, cleaning code base
don't know yet which areas are most promising for improvement efforts

[slide 10]

structured data
Wikidata team was still occupied with other tasks, which is why we decided to put this on hold for now
RobLa, Gilles, Guillaume: We still need to announce that the Structured data project is on hold. However the Wikidata query service is making progress both on the WMF and WMDE side, and this will be used by Structured data in the future. Keegan is working on an announcement with Maarten.

[slide 11]

Guillaume: Metadata cleanup

[slide 12]

Gilles: Had to do firefighting, e.g. for thumbnail chaining
Erik: Are we still using thumbnail chaining in production?
Gilles: No
Erik: what about a 10MB JPEG that has never been scaled?
Gilles: 99.5% of Varnish misses find images that are already rendered in Swift
we overestimated the phenomenon (of thumbnails not being ready)
so that case [is insignificant]
Erik: I'm suspicious of that, what if we turn off optimization completely?
Gilles: already did, to verify it really only affects this tiny proportion of images
(discussion about pre-rendering)
MV is inefficient with vertical images
browser is going to scale down because it's height-constrained
proper pre-rendering in MV will require revisiting of this approach
Erik: aside from percentage, consider also UX on upload
Gilles: yes, pre-rendering will fix that, and we can turn it back on. I only turned it off to gather data for a month or so
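
The remark about vertical images being height-constrained can be illustrated with a little arithmetic. The sketch below assumes, purely for illustration, a viewer that scales an image to fit inside the viewport without upscaling and that requests thumbnails by viewport width; the function names and the 1280x800 viewport are hypothetical and not taken from the MV code.

```python
def fitted_size(img_w: int, img_h: int, view_w: int, view_h: int) -> tuple[int, int]:
    """Size at which an image is shown when scaled to fit the viewport (never upscaled)."""
    scale = min(view_w / img_w, view_h / img_h, 1.0)
    return round(img_w * scale), round(img_h * scale)


def oversize_factor(requested_w: int, displayed_w: int) -> float:
    """How many times more pixels are fetched than are actually displayed."""
    return (requested_w / displayed_w) ** 2


if __name__ == "__main__":
    view_w, view_h = 1280, 800  # assumed viewport

    # Landscape image: the fit is limited by height here too, but only slightly.
    print(fitted_size(4000, 3000, view_w, view_h))

    # Portrait image: height is the limiting dimension, so the displayed
    # width ends up far below the viewport width.
    shown_w, shown_h = fitted_size(3000, 4000, view_w, view_h)
    print(shown_w, shown_h)

    # If the thumbnail were requested at viewport width (assumption),
    # the browser would have to scale it down considerably.
    print(round(oversize_factor(view_w, shown_w), 2))
```

Under these assumptions, a width-based request for a portrait image fetches several times more pixels than are shown, which is why pre-rendering would need to take the height constraint into account.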

What we learned

[slide 14]

structured data

[slide 15]

MV
(Gilles:) User design research was critical
not a lot of feedback recently, but that doesn't mean there is consensus; would need to run a survey or similar

Metrics & other key accomplishments

[slide 17]

Guillaume: The good news: In three months, we got rid of over a third of the files missing machine-readable metadata across all wikis. Most of this progress was driven by editing file templates on the wikis with the most files. Over this period we gained 3 percentage points in the total proportion of files with machine-readable metadata.

The challenge is that we’ve now exhausted the low-hanging fruit (templates that were on lots of pictures), and we’ve reached the point where most of the remaining files don’t have templates. This means that we need to add the templates ourselves to structure information that is currently in raw wikitext, which will take more time. This is going to be done by running focused campaigns using bots on large sets of files whenever possible.

[slide 18]

Gilles: this is a review of this area from another perspective
Trend may not entirely be due to cleanup drive, e.g. might also be that more recent uploads have better metadata
but it's great from the MV perspective nevertheless

[slide 19]

Erik: among readers, it [enabled rate] is presumably higher

[slide 20]

Gergő: caveat: spike at end of September might be an artefact
Erik: interesting that disable is trending downwards
Fabrice: there was a slump in image views in December due to holidays

[slide 21]

MV vs. file page

[slide 22]

Gilles:
UploadWizard
put a lot of work into complexity measurements
Erik: ...
Mark: ... did some refactoring, not sure it's worth it
Erik: mostly, would like to get rid of jQuery UI

[slide 23]

Gilles: UW funnel analysis
good news is that we didn't make things worse ;)

[slide 24]

What's next

[slide 26]

Sentry (JS error logging)
Erik: again, in the worst-case scenario where every page view generates an error, this 100% approach might need to be revised
Gilles: can still decide to sample

[slide 27]

Ops wants us to move away from Swift, not store 3 copies of each thumbnail
RobLa: other problem with Swift: high latency
Gilles: yes, saw that in stats
can't tell though if smaller thumbnails are better, because all our stats are cumulative
didn't know that Swift is the major cause of slowdown for cache misses
Toby: how big are [typical] thumbnails?
Gilles: can easily be 1-2MB
want to increase UW test coverage
refactoring is done with that in mind
Erik: worry that Sentry could take a long time to deploy
should think about general JS error logging strategies that can be employed directly
thinking back to the times where a single EventLogging error[?] took down the site
timelines should sync up, Mark is working in the dark right now
ETA for Sentry?
Gergő: there are two parts: catching uncaught errors and manually adding error catching to UW JS code. The second can be done sooner because UW usage scale is much smaller

Asks

[slide 30]

(Gilles:)
Analytics:...
Toby: looking forward to working with you on this

[slide 31]

(Gilles:)
Product:
better scoping, e.g. for MV, we put work into features that didn't make it in eventually
qualitative metrics; the survey we did wasn't that good

[slide 32]

C-level
Erik: well-scoped work for next few months, e.g. UW
then conversation on whether to merge/split team
Gilles: and/or perhaps merge our reporting with other team
Erik, Gilles: (discussion on whether Varnish hits dashboard shows significant performance regression ...)
Toby: do general site perf dashboards show similar regressions on images?
Erik: yes, might
I'm worried about turning off the pre-rendering, even if it's just a few thousand users that experience like 15-20 sec delays
Gilles: decide on whether to spend that time
turn it back on in February
Fabrice: ...