Research talk:Reading time/Work log/2018-10-02

Monday, October 1, 2018 / Tuesday, October 2, 2018

Investigate the 40 second period error DONE

It turns out that the 40 second intervals were due to a bug in computing the deltas, so they are not a problem.

Made cleaner plots of discrepancies

 
This chart shows the distribution of discrepancies between event timestamps and measured dwell times on Wikipedia. The 40 second intervals are positive (except around 0) and appear to decay exponentially. The discrepancies compare the time between server log events and the total length of time recorded by the browser. The discrepancies between visible length and the timestamps are similar.
 
As above, with dwell times measured in visible length, and the axis constrained. 
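The discrepancy plotted in these charts can be sketched as the server-side interval minus the client-side dwell time. This is only an illustration: the function name, the millisecond unit for the client timer, and the sign convention (positive when the server interval is longer) are assumptions, not taken from the actual analysis code.

```python
from datetime import datetime

def discrepancy_seconds(first_event, last_event, client_dwell_ms):
    """Server-side interval minus client-measured dwell time, in seconds.

    Assumed convention: a positive value means the server log interval
    is longer than what the browser timer recorded.
    """
    server_interval = (last_event - first_event).total_seconds()
    return server_interval - client_dwell_ms / 1000.0

# Hypothetical page view: two server log events 45 s apart,
# with the browser reporting 42.5 s of dwell time.
start = datetime(2018, 10, 1, 12, 0, 0)
end = datetime(2018, 10, 1, 12, 0, 45)
example = discrepancy_seconds(start, end, 42500)
```

The same function applies to visible length if that value is passed in place of the total dwell time.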


Look by IP block

When grouping by IPv4 block, there are not any obvious discrepancies. When comparing IPv4 to IPv6 it becomes clear that most of the errors are coming from IPv4.

 
Comparison of the discrepancies by the first digit of the IPv4 address. Addresses starting with 1, 2, and 7 are somewhat more common, but these might be overrepresented in the logs as well.

TODO: Do these as a proportion of all events in the group.
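A minimal sketch of the normalization in this TODO, assuming each event is available as a pair of IPv4 address and a flag marking whether it shows a discrepancy. The function names and the input shape are hypothetical.

```python
from collections import Counter

def first_digit(ipv4):
    """Group key: the first character of the dotted-quad address."""
    return ipv4.strip()[0]

def error_proportion_by_group(events):
    """events: iterable of (ipv4_address, is_error) pairs.

    Returns, per first-digit group, the share of events flagged as errors,
    so that groups that are simply more common in the logs do not dominate.
    """
    totals = Counter()
    errors = Counter()
    for ip, is_error in events:
        group = first_digit(ip)
        totals[group] += 1
        if is_error:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}

# Made-up sample using documentation and private address ranges.
sample = [
    ("10.0.0.1", True),
    ("172.16.0.5", False),
    ("192.0.2.7", True),
    ("198.51.100.2", False),
    ("203.0.113.9", False),
]
proportions = error_proportion_by_group(sample)
```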

 
As above, comparing IPv4 to IPv6. Most of the errors come from IPv6.
  • Look by Geolocation (Mountain View, Redmond, Country, region, city)
 
Variation in reading time discrepancy by country. The y axis is the average magnitude of the discrepancy between times on the server logs and client side timers. The x axis shows country codes. There is quite a bit of variation in the amount of discrepancy by country, but so far no clear pattern.
 
Variation in the proportion of client timers that measure more time than logevents suggest is possible. The y axis is the average proportion of views where times on the server logs are shorter than the client side timers. The x axis shows country codes. There doesn't appear to be much variation. The countries at the high and low end have smaller sample sizes.
  • Inform engineering of findings
  • Maybe fallback to Webrequest table if we need more information
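The two per-country measures plotted above (average magnitude of the discrepancy, and proportion of views where the client timer exceeds the server interval) could be computed along these lines. This is a sketch under assumptions: the tuple layout and field names are hypothetical, and the sample count is returned alongside because the extremes had small samples.

```python
from collections import defaultdict

def summarize_by_country(views):
    """views: iterable of (country_code, server_interval_s, client_dwell_s)."""
    # Per country: [number of views, sum of |discrepancy|, views with client > server]
    acc = defaultdict(lambda: [0, 0.0, 0])
    for country, server_s, client_s in views:
        entry = acc[country]
        entry[0] += 1
        entry[1] += abs(server_s - client_s)
        if client_s > server_s:
            entry[2] += 1
    return {
        country: {
            "n": n,
            "mean_abs_discrepancy": total / n,
            "prop_client_longer": longer / n,
        }
        for country, (n, total, longer) in acc.items()
    }

# Made-up values for illustration only.
sample_views = [
    ("US", 30.0, 28.0),
    ("US", 10.0, 12.0),
    ("DE", 20.0, 19.5),
]
summary = summarize_by_country(sample_views)
```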

Improve Workflow

Filtering data for analysis

  • Exclude bots and spiders.
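One possible way to do this filtering, sketched as a simple keyword heuristic on the user agent string. The log does not say how bots and spiders are identified, so the pattern, the `user_agent` field name, and the choice to keep events with a missing user agent are all assumptions.

```python
import re

# Hypothetical heuristic: flag user agents that name themselves as automated.
BOT_PATTERN = re.compile(r"bot|spider|crawler", re.IGNORECASE)

def is_probable_bot(user_agent):
    """True if the user agent string matches the keyword heuristic."""
    return bool(BOT_PATTERN.search(user_agent or ""))

def exclude_bots(events):
    """Keep only events whose 'user_agent' field does not look automated."""
    return [e for e in events if not is_probable_bot(e.get("user_agent"))]

# Made-up events for illustration.
sample_events = [
    {"id": 1, "user_agent": "Mozilla/5.0 (compatible; Googlebot/2.1)"},
    {"id": 2, "user_agent": "Mozilla/5.0 (X11; Linux x86_64) Firefox/62.0"},
    {"id": 3, "user_agent": None},
]
kept = exclude_bots(sample_events)
```

A keyword match like this misses automated clients that present ordinary browser user agents, so it is at best a first pass.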