Research talk:Reading time/Work log/2018-10-04

Thursday, October 4, 2018

Notes from meeting

  • The implications of tabbed browsing behavior are a concern. Evaluating this may require comparing visible time with total time and looking within sessions.
  • We want a statistical model so we can measure uncertainty in changes in the metric.
  • Multimodal distributions suggest interesting cognitive psychology phenomena (activity, action, operation).
  • Aaron thinks we will see a lognormal distribution (of session lengths) with most of the density in the 1-5 minute interval.
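The lognormal expectation in the last note can be checked numerically once parameters are fitted. The following is a minimal sketch, not the analysis used in this project: the function names and the example parameters (a 120 second median and a log-scale standard deviation of 0.6) are illustrative assumptions, not measured values.

```python
import math
import statistics

def fit_lognormal(durations_s):
    """Estimate (mu, sigma) of a lognormal from positive durations in seconds.

    Uses the mean and population standard deviation of the log durations.
    """
    logs = [math.log(d) for d in durations_s if d > 0]
    return statistics.fmean(logs), statistics.pstdev(logs)

def lognormal_cdf(x, mu, sigma):
    """CDF of a lognormal with log-scale mean mu and log-scale std sigma."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))

def mass_in_interval(lo_s, hi_s, mu, sigma):
    """Probability mass of the lognormal between lo_s and hi_s seconds."""
    return lognormal_cdf(hi_s, mu, sigma) - lognormal_cdf(lo_s, mu, sigma)

# Hypothetical parameters, chosen only for illustration.
example_mu = math.log(120.0)   # median of 120 seconds (assumed)
example_sigma = 0.6            # log-scale spread (assumed)

# Share of session lengths falling between 1 and 5 minutes.
share_1_to_5_min = mass_in_interval(60.0, 300.0, example_mu, example_sigma)
```

With real session lengths, `fit_lognormal` would supply `mu` and `sigma`, and `mass_in_interval(60, 300, mu, sigma)` would give the share of density in the 1-5 minute interval.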

References

Research:Mobile_sessions

Research:Characterizing_Wikipedia_Reader_Behaviour

Research:Activity_session

Aaron's paper about sessions and activity theory

Server log vs browser time discrepancies

We were concerned about a strange 40-second periodicity in the discrepancy between the time elapsed between server log events and the time the page was open in the browser. It turned out that this was caused by a bug in how dates were parsed.

 
This chart shows the distribution of discrepancies between event timestamps and measured dwell times on Wikipedia. The discrepancies are centered around 0 and appear to decay exponentially. The discrepancies between visible length and the timestamps are similar.

We can see that the discrepancy is often negative. In these cases the time measured by the browser is greater than the time between the log events. This could be because network latency adds noise to the timestamps of the log events. Log events are also recorded at a lower resolution than the browser time. Negative discrepancies are a bigger concern than positive ones because they would not exist if our measurements were accurate enough. The plot below takes a closer look at the negative discrepancies.
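To make the sign convention concrete, here is a minimal sketch of how the discrepancy could be computed. It assumes each page view is available as a pair of millisecond values (time between server log events, total time measured by the browser); that record layout and the sample values are hypothetical and are not taken from the actual data.

```python
def discrepancies_ms(records):
    """Return log-interval minus browser-time for each (log_ms, browser_ms) pair.

    A negative value means the browser measured more time than elapsed
    between the server log events.
    """
    return [log_ms - browser_ms for log_ms, browser_ms in records]

def negative_share(diffs):
    """Fraction of discrepancies that are strictly negative."""
    if not diffs:
        return 0.0
    return sum(1 for d in diffs if d < 0) / len(diffs)

# Made-up example values in milliseconds, for illustration only.
sample_records = [(5000, 5200), (3000, 2900), (1000, 1000), (8000, 9000)]
sample_diffs = discrepancies_ms(sample_records)
sample_negative_share = negative_share(sample_diffs)
```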

 
Time between server log timestamps and the span of time the page was open in the browser (total time). The X axis shows the time between server log events minus the time span measured by the browser. When these values are negative, the time measured by the browser is greater than the time between the log events. This chart shows the distribution of the negative discrepancies.

The top left plot shows a histogram at millisecond resolution with the total number of views on the Y axis, and the top right plot scales this axis by the total number of negative errors. The bottom two plots show the cumulative histogram of the negative discrepancies. We can see that in 50% of the cases the error is 1 second or less, and only 5% of cases have a discrepancy greater than 6 seconds. There appears to be a fat, long tail, as the cumulative distribution heads to 0 very slowly.
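The cumulative shares quoted above (within 1 second, beyond 6 seconds) can be read off an empirical distribution of the negative discrepancies. The sketch below shows one way to compute them; the function name and the sample values are hypothetical and do not reproduce the 50% and 5% figures from the real data.

```python
def share_within(diffs_ms, threshold_ms):
    """Fraction of negative discrepancies whose magnitude is <= threshold_ms.

    Non-negative discrepancies are ignored, matching a plot restricted
    to the negative side of the distribution.
    """
    magnitudes = [-d for d in diffs_ms if d < 0]
    if not magnitudes:
        raise ValueError("no negative discrepancies to summarize")
    return sum(1 for m in magnitudes if m <= threshold_ms) / len(magnitudes)

# Made-up discrepancies in milliseconds, for illustration only.
example_diffs = [-200, -800, -1500, -7000, 300]

within_1s = share_within(example_diffs, 1000)          # share with error of 1 s or less
beyond_6s = 1.0 - share_within(example_diffs, 6000)    # share with error above 6 s
```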
