Research talk:Reading time/Work log/2018-11-15

Thursday, November 15, 2018

Next phase of modeling

The models so far show that (1) readers read for longer on desktop than on mobile, (2) readers from lower-HDI countries read for longer than readers from higher-HDI countries, and (3) the gap between desktop and mobile is greater for lower-HDI countries than for higher-HDI countries.

I think these are interesting results, but all three need to be explored further before we can draw strong conclusions.

First, I want some assurance that (1) is not being driven by measurement differences between mobile and desktop. In particular, I am worried that the gap might be due not to reading behavior but to people on desktop leaving tabs open that they aren't reading. Obviously there is only so much we can do to control for this, and there is also a conceptual issue about treating what we are measuring as "reading": if I leave a tab open as a reference while I work on another task (off the computer), I may not be reading, but I am still "using" the page. To address this concern I want to do some more robustness checks. Also, since tabs that lose focus may be killed on mobile devices, we may have measurement error on mobile that introduces a negative bias.
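A minimal sketch of one such robustness check, assuming page views are available as (device, seconds visible) pairs: recompute the desktop-mobile gap in mean log reading time after dropping views above a cutoff, and see whether the gap survives. The records, the field layout, and the one-hour cutoff are all hypothetical, and plain Python is used purely for illustration.

```python
import math
from statistics import mean

# Hypothetical page views: (device, seconds the page was visible).
views = [
    ("desktop", 30.0), ("desktop", 45.0), ("desktop", 7200.0),
    ("mobile", 20.0), ("mobile", 25.0), ("mobile", 40.0),
]

def device_gap(views, max_seconds=None):
    """Desktop minus mobile mean log reading time, optionally dropping long views."""
    if max_seconds is not None:
        views = [(d, s) for d, s in views if s <= max_seconds]
    by_device = {"desktop": [], "mobile": []}
    for device, seconds in views:
        by_device[device].append(math.log(seconds))
    return mean(by_device["desktop"]) - mean(by_device["mobile"])

gap_all = device_gap(views)
gap_trimmed = device_gap(views, max_seconds=3600)  # assumed cutoff: one hour
print(f"gap (all views): {gap_all:.2f}, gap (views <= 1h): {gap_trimmed:.2f}")
```

If the gap shrinks sharply as the cutoff tightens, that would suggest idle tabs account for part of it; a gap that stays roughly stable would be more reassuring.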

Second, I want to test some different explanations for (2). Specifically, (2) might be driven by (a) readers reading in their non-primary languages and/or by (b) a greater informational need in lower-HDI contexts. Plausibly some combination of (a) and (b) is in effect. I think this is quite interesting because it points to the need to improve Wikipedia quality in all language editions. To provide some insight into this question I will build a variable that indicates whether someone is reading from a country where the language edition they are reading is a common primary language, and add it to the models.
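The same-language variable could look roughly like the sketch below. The lookup table, its handful of entries, and the function name `same_language` are all hypothetical; a real table would have to come from an external source on primary languages by country.

```python
# Hypothetical lookup: language edition -> ISO country codes where that
# language is a common primary language. These entries are only illustrative.
PRIMARY_LANGUAGE_COUNTRIES = {
    "en": {"US", "GB", "AU"},
    "es": {"ES", "MX", "AR"},
    "pt": {"PT", "BR"},
}

def same_language(wiki_language, country_code):
    """Return 1 if the edition's language is a common primary language in the country, else 0."""
    return int(country_code in PRIMARY_LANGUAGE_COUNTRIES.get(wiki_language, set()))

# Hypothetical views: (language edition, reader's country).
views = [("en", "US"), ("en", "BR"), ("pt", "BR"), ("es", "US")]
indicators = [same_language(lang, country) for lang, country in views]
print(indicators)
```

Note that this sketch codes editions missing from the table as 0; treating them as missing values instead may be the safer choice when the variable goes into the models.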

Finally, I was surprised to find (3). I had imagined the "device gap" hypothesis to be a little too technologically deterministic, and expected that people with high information needs would read on whatever device is available. I want to do some more robustness checks for (3).

TODO List

[Done] Fit M1 for all wikis (right now we are using the "secret weapon"). Fitting a model for all wikis is not doable using lm given the amount of RAM on the notebook machine. I can fit the model on a small sample, but the fixed effects for wiki absorb so much variation that the relationships we are interested in are not well estimated.

[ ] Make a better visualization of the "secret weapon" plots for M1.

[ ] Diagnose M2. Do the residuals look OK? Does it seem like we have a reasonable model specification? What robustness checks should we run?

[ ] Fit robustness checks to address measurement differences between mobile and desktop (e.g. removing long views).

[Benched] Build same-language variable and add to M1 and M2.

[X] Do robustness checks for (3). (Is our specification for HDI correct? What if we control for other UN indicators instead, e.g. literacy?)

[X] global north vs global south
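
For reference, a rough sketch of the "secret weapon" approach from the list above, assuming the term is used in the usual sense of fitting the same small model separately for each wiki and then comparing the coefficients across wikis. The data are hypothetical, and the one-predictor regression of log reading time on a desktop indicator is a simplified stand-in for M1, whose actual specification is not given here.

```python
import math
from collections import defaultdict
from statistics import mean

# Hypothetical views: (wiki, is_desktop, seconds visible).
views = [
    ("enwiki", 1, 60.0), ("enwiki", 1, 40.0), ("enwiki", 0, 20.0), ("enwiki", 0, 30.0),
    ("eswiki", 1, 50.0), ("eswiki", 1, 70.0), ("eswiki", 0, 45.0), ("eswiki", 0, 35.0),
]

def ols_slope(x, y):
    """Slope of a one-predictor least-squares fit of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

# Group the observations by wiki so each wiki gets its own fit.
by_wiki = defaultdict(list)
for wiki, is_desktop, seconds in views:
    by_wiki[wiki].append((is_desktop, math.log(seconds)))

# One desktop coefficient per wiki; these are what the plots would show.
desktop_effect = {}
for wiki, rows in by_wiki.items():
    x = [r[0] for r in rows]
    y = [r[1] for r in rows]
    desktop_effect[wiki] = ols_slope(x, y)

for wiki, slope in sorted(desktop_effect.items()):
    print(f"{wiki}: desktop coefficient = {slope:.3f}")
```

Because each wiki is fit on its own, memory use is bounded by the largest single wiki rather than by the pooled data, which is the practical appeal when a pooled fit with wiki fixed effects does not fit in RAM.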
