Research talk:Reading time/Work log/2018-11-02

Saturday, November 3, 2018


Page unloaded event differences between mobile and desktop?

As discussed in the meeting with Jon, one possible limitation of the data may threaten our ability to make a fair comparison between mobile and desktop readers: do mobile browsers fire pageUnloaded events when readers close or switch apps? If they do not, we will have a large number of pageLoaded events without matching pageUnloaded events, and the missing data will be correlated with mobile usage. Even if mobile browsers do fire pageUnloaded events, they may do so in situations when we might expect the visiblelength counter to be updated. This would lead to downward bias in mobile reading times.
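To make the concern concrete, here is a minimal Python sketch of what "loaded without unloaded" means per page token. The events are made up for illustration and are not ReadingDepth data. The helper names (`classify`, `share_not_unloaded`) are hypothetical, and the labels simply mirror the categories used in this log.

```python
from collections import Counter

# Made-up events: (pagetoken, is_mobile, action). Not real ReadingDepth data.
events = [
    ("t1", False, "pageLoaded"), ("t1", False, "pageUnloaded"),
    ("t2", True, "pageLoaded"),
    ("t3", True, "pageLoaded"), ("t3", True, "pageUnloaded"),
    ("t4", True, "pageLoaded"),
]


def classify(events):
    """Label each page token by its count of load and unload events."""
    loaded, unloaded, mobile = Counter(), Counter(), {}
    for token, is_mobile, action in events:
        mobile[token] = is_mobile
        if action == "pageLoaded":
            loaded[token] += 1
        elif action == "pageUnloaded":
            unloaded[token] += 1
    labels = {}
    for token in mobile:
        if loaded[token] == 1 and unloaded[token] == 1:
            labels[token] = "one_each"
        elif loaded[token] == 1 and unloaded[token] == 0:
            labels[token] = "not_unloaded"
        else:
            labels[token] = "other"
    return labels, mobile


def share_not_unloaded(events):
    """Fraction of page tokens per platform that never reported an unload."""
    labels, mobile = classify(events)
    totals, missing = Counter(), Counter()
    for token, label in labels.items():
        totals[mobile[token]] += 1
        if label == "not_unloaded":
            missing[mobile[token]] += 1
    return {is_mobile: missing[is_mobile] / totals[is_mobile] for is_mobile in totals}


shares = share_not_unloaded(events)  # keyed by is_mobile (True / False)
```

If the share differs sharply between `True` and `False`, the missing unload events are correlated with platform, which is the threat to the mobile versus desktop comparison described above.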

I wrote this query to compare, for mobile and desktop, how often a page token's pageLoaded and pageUnloaded events fail to match up.

-- Per platform, count page tokens by their pattern of pageLoaded / pageUnloaded events.
SELECT COUNT(DISTINCT pagetoken) AS NReaders,
       Mobile,
       SUM(IF(one_each, 1, 0)) AS n_one_each,
       SUM(IF(not_unloaded, 1, 0)) AS n_not_unloaded,
       SUM(IF(loaded_more_than_1x, 1, 0)) AS n_loaded_more_than_1x,
       SUM(IF(unloaded_more_than_1x, 1, 0)) AS n_unloaded_more_than_1x
FROM
( SELECT pagetoken,
         Mobile,
         (SUM(Nloaded) == 1) AND (SUM(Nunloaded) == 1) AS one_each,
         (SUM(Nloaded) == 1) AND (SUM(Nunloaded) == 0) AS not_unloaded,
         SUM(Nloaded) > 1 AS loaded_more_than_1x,
         SUM(Nunloaded) > 1 AS unloaded_more_than_1x
  FROM
  ( SELECT pagetoken,
           action,
           Mobile,
           COUNT(*) AS N,
           SUM(IF(action == "pageLoaded", 1, 0)) AS Nloaded,
           SUM(IF(action == "pageUnloaded", 1, 0)) AS Nunloaded
    FROM ( SELECT event.pagetoken AS pagetoken, event.action AS action, webhost LIKE "%.m.%" AS Mobile
           FROM nathante.cleanReadingData
           WHERE event.namespaceid == 0 ) g
    GROUP BY pagetoken, action, Mobile
  ) h
  -- Group by page token only (not action), so load and unload counts land in the same row.
  GROUP BY pagetoken, Mobile
) i
GROUP BY Mobile

# as_pandas is presumably impyla's helper (from impala.util import as_pandas)
dt = as_pandas(hive_cursor)

# Share of page tokens in each category
dt['p_one_each'] = dt['n_one_each'] / dt['nreaders']
dt['p_not_unloaded'] = dt['n_not_unloaded'] / dt['nreaders']
dt['p_unloaded_more_than_1x'] = dt['n_unloaded_more_than_1x'] / dt['nreaders']

dt = dt.drop(columns=['n_loaded_more_than_1x'])

	nreaders	mobile	n_one_each	n_not_unloaded	n_unloaded_more_than_1x	p_one_each	p_not_unloaded	p_unloaded_more_than_1x
0	6	None	3	2	1	0.500000	0.333333	0.166667
1	448722947	False	424857529	21928737	1935907	0.946815	0.048869	0.004314
2	940421259	True	400006656	535123953	5288834	0.425348	0.569026	0.005624

As suspected, the incidence of pageLoaded events without pageUnloaded events is high on mobile: about 57% of mobile page tokens, compared with about 5% on desktop.
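As a quick sanity check on those shares, the arithmetic can be redone directly from the counts in the table above. This is only a sketch: the counts are copied from the table, while the dictionary layout and variable names are illustrative.

```python
# Counts copied from the results table above (page tokens per platform).
counts = {
    "desktop": {"nreaders": 448722947, "n_not_unloaded": 21928737},
    "mobile": {"nreaders": 940421259, "n_not_unloaded": 535123953},
}

# Share of page tokens with a pageLoaded event but no pageUnloaded event.
shares = {
    platform: c["n_not_unloaded"] / c["nreaders"]
    for platform, c in counts.items()
}

# How many times more common the pattern is on mobile than on desktop.
ratio = shares["mobile"] / shares["desktop"]

for platform, share in shares.items():
    print(f"{platform}: {share:.1%} of page tokens never reported an unload")
print(f"mobile / desktop ratio: {ratio:.1f}")
```

On these counts the pattern is more than ten times as common on mobile as on desktop.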

@Jon (WMF): --- FYI

Nevertheless, I am fitting the models that we talked about earlier. -- The preceding unsigned comment was added by Groceryheist (talk) 03:03, 3 November 2018 (UTC)

Thanks for looking into this and quantifying these concerns!

Also CCing Timo Tijhof, with whom I had a chat about this recently in Portland. He mentioned that Google has proposed a new browser feature that addresses such issues, the "Page Lifecycle API". Actually, from https://developers.google.com/web/updates/2018/07/page-lifecycle-api it seems that this is already live in the most recent versions of Chrome? Jon, would this be something we could try using in the ReadingDepth schema? Regards, Tbayer (WMF) (talk) 02:39, 4 November 2018 (UTC)
