Research talk:Reading time/Work log/2018-10-04
Thursday, October 4, 2018
Notes from meeting
- The implications of tabbed browsing behavior are a concern. Comparing visible and total time and looking within sessions may be necessary to evaluate it.
- We want a statistical model so we can measure uncertainty in changes in the metric.
- Multimodal distributions suggest interesting cogpsych phenomena (activity, action, operation)
- Aaron thinks we will see a lognormal distribution (of session lengths) with most of the density in the 1-5 minute interval.
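One way to check the lognormal expectation above is to fit a lognormal to observed durations and compute how much probability mass falls between 1 and 5 minutes. The following is a minimal sketch, not the project's actual analysis: the function names are hypothetical, the sample is synthetic (drawn with an arbitrary median of 120 seconds and log-scale spread of 0.8), and it assumes durations are measured in seconds.

```python
import math
import random
import statistics


def fit_lognormal(durations_s):
    """Maximum-likelihood lognormal fit: mean and population std of log-durations."""
    logs = [math.log(d) for d in durations_s if d > 0]
    return statistics.fmean(logs), statistics.pstdev(logs)


def lognormal_cdf(x, mu, sigma):
    """P(X <= x) for a lognormal with log-mean mu and log-std sigma."""
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))


def mass_between(lo_s, hi_s, mu, sigma):
    """Probability mass of the fitted lognormal in the interval [lo_s, hi_s]."""
    return lognormal_cdf(hi_s, mu, sigma) - lognormal_cdf(lo_s, mu, sigma)


# Synthetic stand-in for real durations; the parameters are assumptions.
rng = random.Random(0)
sample = [rng.lognormvariate(math.log(120), 0.8) for _ in range(10_000)]

mu, sigma = fit_lognormal(sample)
print(f"fitted mu={mu:.3f}, sigma={sigma:.3f}")
print(f"mass in 1-5 minutes: {mass_between(60, 300, mu, sigma):.3f}")
```

With real data, a multimodal histogram of the log-durations would be a sign that a single lognormal is too simple and that a mixture model may be needed.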
References
Research:Characterizing_Wikipedia_Reader_Behaviour
Aaron's paper about sessions and activity theory
Server log vs browser time discrepancies
We were concerned about a strange 40 second periodicity in the difference between the time between server log events and the time the page was open. It turns out this was due to a bug in how dates were parsed.
The discrepancy is often negative, meaning the time measured by the browser is greater than the time between the log events. This may be because network latency adds noise to the timestamps of the log events. Log events are also recorded at a lower resolution than the browser time. Negative discrepancies are a bigger concern than positive ones because they would not exist if our measurements were accurate enough. The plot below takes a closer look at the negative discrepancies.
The top left plot shows a histogram of the negative discrepancies at millisecond resolution, with the total number of views on the Y axis; the top right plot scales this axis by the total number of negative errors. The bottom two plots show the cumulative histogram of the negative discrepancies. In 50% of cases the error is 1 second or less, and only 5% of cases have a discrepancy greater than 6 seconds. The tail appears fat and long, as the cumulative distribution heads to 0 very slowly.
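The cumulative figures quoted above can be computed directly from paired measurements. The sketch below is illustrative only: the function name and input values are made up, and it assumes the discrepancy is defined as the time between server log events minus the browser-measured time, both in milliseconds, which matches the sign convention described in this section.

```python
def negative_discrepancy_fractions(server_gaps_ms, browser_times_ms,
                                   thresholds_ms=(1000, 6000)):
    """For each threshold, return the fraction of negative discrepancies
    whose magnitude is at most that threshold."""
    # Discrepancy = server log gap minus browser time (assumed convention).
    discrepancies = [s - b for s, b in zip(server_gaps_ms, browser_times_ms)]
    # Keep only the negative cases and work with their magnitudes.
    magnitudes = [-d for d in discrepancies if d < 0]
    if not magnitudes:
        return {}
    n = len(magnitudes)
    return {t: sum(1 for m in magnitudes if m <= t) / n for t in thresholds_ms}


# Made-up example values, not real measurements.
server_gaps = [5000, 10000, 3000, 8000, 2000]
browser_times = [5500, 9000, 3200, 15000, 4000]
print(negative_discrepancy_fractions(server_gaps, browser_times))
```

On the real data, the fraction at 1000 ms would correspond to the 50% figure and one minus the fraction at 6000 ms to the 5% figure.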