Research:Module storage performance

Created

November 23, 2013

Contact

Halfak (WMF)

Collaborators

Ori.livneh

Duration: 2013-11 – 2013-11

Wikimedia supported

Open access
via meta.wikimedia.org

Open source
via github.com

This page documents a completed research project.

This page in a nutshell: "Module storage" is a new MediaWiki feature that caches ResourceLoader modules in localStorage. The intent is to improve performance by using localStorage as a cache. In this study, we explore how well this feature performs in practice.

In this study, we assess the impact of caching ResourceLoader modules in localStorage on page load time.

Background

MediaWiki’s user interface comprises modules of CSS, JavaScript, and localization strings. These modules specify the layout and style of page contents and the labeling and behavior of its interactive components, and their cumulative byte-size is typically several times greater than the byte-size of the article text that they accompany. While your browser is loading these modules, page elements may be missing, unstyled, or unresponsive to user input. Page loading time is neither pleasant nor useful, and there is a growing body of evidence correlating long loading times with diminished user interest, satisfaction, and participation. We want it to be as short as possible.

MediaWiki optimizes page loading time by concatenating interface modules, so that they can be sent in bulk, and by supplying hints to the user’s browser about how module contents could be cached locally and reused for subsequent page views. Although generally effective, these optimization strategies sometimes work against each other. Composite module data that is sent in bulk is cached in bulk and must be discarded in bulk whenever one of its parts requires an update, causing the browser to throw out and re-retrieve module data that has not changed since it was last downloaded. There exists no standard, programmatic interface to native browser caches; thus, although MediaWiki’s JavaScript core is able to tease apart concatenated modules, it cannot be used to improve native caching behavior.

Modern browsers do, however, implement a generic interface for persisting data locally, called localStorage. Although localStorage was not designed to serve as an application cache, its programming interface is flexible enough that an application cache could be implemented on top of it in JavaScript. Change 86867 provided an implementation of a MediaWiki module cache that uses localStorage for persistence and that is able to decompose bulk module data into discrete modules and cache them separately, making cache updates more granular.
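The idea of a more granular cache can be illustrated with a minimal sketch. This is not the code from Change 86867: the class `ModuleStore`, the helper `modules_to_fetch`, and the JSON entry layout are hypothetical names chosen for illustration, and a plain Python dictionary stands in for localStorage (which likewise maps string keys to string values).

```python
import json
from typing import Dict, List, Optional


class ModuleStore:
    """Caches each module separately, keyed by name and tagged with a version."""

    def __init__(self, backing: Dict[str, str]) -> None:
        # `backing` is a stand-in for localStorage: string keys, string values.
        self.backing = backing

    def set(self, name: str, version: str, source: str) -> None:
        # Store version and source together so staleness can be checked on read.
        self.backing[name] = json.dumps({"version": version, "source": source})

    def get(self, name: str, version: str) -> Optional[str]:
        # Return the cached source only if the cached version matches.
        raw = self.backing.get(name)
        if raw is None:
            return None
        entry = json.loads(raw)
        return entry["source"] if entry["version"] == version else None


def modules_to_fetch(store: ModuleStore, wanted: Dict[str, str]) -> List[str]:
    """List the modules (name -> current version) that are missing or stale."""
    return [name for name, version in wanted.items()
            if store.get(name, version) is None]
```

With two modules cached at version `v1`, asking for `{"a": "v1", "b": "v2"}` reports only `b` as needing a download. A bulk cache entry holding both modules would instead have to be discarded and fetched again in full.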

In this study, we'll experimentally explore the effects that module storage has on performance in practice on Wikimedia wikis.

Methods

In order to explore the effects of module storage, we ran a controlled experiment where we randomly sampled 0.1% of browsers and randomly split them between "control" and "test" conditions.

control: Module storage was disabled and users' browsers were expected to perform all caching.
test: Module storage was enabled and cached users' JavaScript in localStorage.

TODO: Ori discusses bucketing algorithm in details
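The bucketing algorithm used in the experiment is not described on this page. Purely as an illustration of how a 0.1% sample can be drawn and split into two conditions, the following sketch hashes a per-browser token. The function `assign_bucket`, the token itself, and the use of SHA-256 are all assumptions made for this example and should not be read as the method actually deployed.

```python
import hashlib
from typing import Optional

SAMPLE_RATE = 0.001  # 0.1% of browsers, as stated in the experiment description


def assign_bucket(browser_token: str) -> Optional[str]:
    """Return "control", "test", or None if the browser is not sampled.

    Hypothetical scheme: the assignment is a pure function of the token,
    so the same browser always lands in the same bucket.
    """
    digest = hashlib.sha256(browser_token.encode("utf-8")).digest()
    # The first 8 bytes give a value in [0, 1) that decides sample membership.
    sample_draw = int.from_bytes(digest[:8], "big") / 2 ** 64
    if sample_draw >= SAMPLE_RATE:
        return None
    # A separate byte decides the condition, so the split is not tied
    # to the value used for sampling.
    return "test" if digest[8] % 2 == 0 else "control"
```

A deterministic assignment of this kind keeps a returning browser in the same condition across page views, which matters when later page loads are compared against earlier ones.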

Sample

Between 07:42 UTC on Nov. 20th, 2013 and 23:17 on Nov. 23rd, 2013, we gathered 1.49 million load timings for 381,860 unique readers using Schema:ModuleStorage.

Load time density. The density of load times is plotted by experimental group and load index.

Load index

In order to compare the performance of module storage against browser-based caches, we sought to measure both the pre-cache performance (the first time that a reader loads the site) and the post-cache performance in both experimental conditions. We assume that the first recorded page load during the experiment reflects pre-cache performance and that, from the second load onward, we are observing post-cache performance. However, we did not want to stop at the second page load: there are reasons to believe that performance might continue to improve after it [1][2], so we indexed and compared all subsequent page loads as well.
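As a rough sketch of how such an index could be derived, the function below orders each reader's recorded loads by time and numbers them from zero. The tuple layout `(reader_id, timestamp, load_ms)` and the name `index_loads` are assumptions made for this example; they are not the fields of Schema:ModuleStorage.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Event = Tuple[str, float, float]    # (reader_id, timestamp, load_ms)
Indexed = Tuple[str, int, float]    # (reader_id, load_index, load_ms)


def index_loads(events: Iterable[Event]) -> List[Indexed]:
    """Number each reader's loads in time order, starting at index 0."""
    by_reader: Dict[str, List[Tuple[float, float]]] = defaultdict(list)
    for reader_id, timestamp, load_ms in events:
        by_reader[reader_id].append((timestamp, load_ms))

    indexed: List[Indexed] = []
    for reader_id, loads in by_reader.items():
        loads.sort()  # earliest timestamp first
        for load_index, (_, load_ms) in enumerate(loads):
            # Index 0 is treated as the pre-cache load; 1 and up as post-cache.
            indexed.append((reader_id, load_index, load_ms))
    return indexed
```

Under this numbering, index 0 is a reader's first recorded load during the experiment, which is only assumed to be a pre-cache load, as noted above.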

Load time statistic

In order to measure the differences in load time, we needed a statistic that gives a stable summary of the distribution of load timings. To choose an appropriate statistic, we plotted the density of load timings split by the type of load. With the x axis on a logarithmic scale, the figure "Load time density" shows two clear, overlapping log-normal distributions: one for the first, pre-cache page load (index=0) and one for the post-cache page loads (index=1-9).

This log-normal distribution of load timings suggests that a geometric mean would provide a solid, stable description of the distribution.
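The geometric mean is the exponential of the arithmetic mean of the logarithms, which is why it suits data that look normal on a log scale. A minimal sketch follows; the function name and the sample values are illustrative and are not taken from the study's data.

```python
import math
from typing import Sequence


def geometric_mean(values: Sequence[float]) -> float:
    """Return exp(mean(log(x))) for strictly positive values."""
    if not values:
        raise ValueError("geometric mean of an empty sequence is undefined")
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean requires strictly positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))
```

For example, `geometric_mean([100, 1000, 10000])` is approximately 1000, whereas the arithmetic mean of the same values is 3700. A single very slow load pulls the arithmetic mean upward far more than it moves the geometric mean.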

Results

Grouped analysis: which one is faster?

Load time by condition. The geometric mean load time is plotted by experimental group and load index.

Module storage is faster.

Load time: cached vs. not cached. The geometric mean load time is plotted for cached and non-cached requests to Wikipedia.

Why the descending load timings?

Load times by quantile. The geometric mean load time is plotted for each load index by the quantile for number of recorded loads that the reader falls into.

Readers who load slower tend to browse less.

Differences between browsers

Mobile vs. non-mobile load times. Load time density is plotted by browser on mobile platforms for each experimental condition.

Mobile doesn't benefit from caching as much or as consistently as non-mobile.

Load time density by browser (non-mobile). Load time density is plotted by browser on non-mobile platforms.

Load time density by browser (mobile). Load time density is plotted by browser on mobile platforms.

References

[1] Predictive optimization: "Chrome learns the network topology as you use it...the predictor relies on historical browsing data, heuristics, and many other hints from the browser to anticipate the requests."
[2] The higher the page view index, the more likely it is that there had been a previous page view in the same browser session. That in turn means that more page resources are available in RAM, that the load is less likely to be affected by TCP slow-start, and that a persistent connection is more likely to have been established before the current page view.
