Research talk:Reading time/Work log/2018-11-18
Sunday, November 18, 2018
Wrapping up Robustness Checks for Model 3
How sensitive was the analysis of HDI to the choice of development instead of other variables like education levels? I fit the models from yesterday but used the variable "mean years of schooling" instead of HDI. I chose years of schooling because there was actually a lot of missing data for literacy in the UN dataset. This variable is the average number of years spent in school by the adult population of a country. The results have roughly the same interpretation as with HDI. It's important to note that years of schooling is a component of HDI, so this result isn't that surprising. I standardized MeanSchooling before I added it to the model.
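As a minimal sketch of the standardization step, the snippet below converts a variable to z-scores. The input values are hypothetical (not taken from the UN data), and the use of the sample standard deviation (n-1 denominator) is an assumption; the log does not say which convention was used.

```python
from statistics import mean, stdev

def standardize(values):
    """Return z-scores: subtract the mean, then divide by the sample standard deviation."""
    mu = mean(values)
    sigma = stdev(values)  # assumption: sample SD (n-1 denominator)
    return [(v - mu) / sigma for v in values]

# Hypothetical mean-years-of-schooling values for five countries.
mean_schooling = [4.0, 6.5, 9.0, 11.5, 13.0]
z = standardize(mean_schooling)
print([round(v, 3) for v in z])
```

After this transformation a coefficient on MeanSchooling reads as the change in the outcome per one standard deviation of schooling.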
Term | model 2 | model 3 | model 3 with quadratic MS | model 3 with cubic MS | model 3 with quartic MS | model 3 with quadratic MS:mobile | |
---|---|---|---|---|---|---|---|
Intercept | 9.9485 (0.0077)*** | 10.0198 (0.0078)*** | 10.0304 (0.0078)*** | 10.0324 (0.0078)*** | 10.0469 (0.0078)*** | 10.0511 (0.0078)*** | |
mobile | -0.1985 (0.0011)*** | -0.3101 (0.0017)*** | -0.3001 (0.0017)*** | -0.2999 (0.0017)*** | -0.2939 (0.0017)*** | -0.3026 (0.0019)*** | |
MeanSchooling | -0.0733 (0.0007)*** | -0.1431 (0.0011)*** | -0.0965 (0.0017)*** | -0.0992 (0.0018)*** | -0.0398 (0.0023)*** | -0.0239 (0.0026)*** | |
mobile : MeanSchooling | | 0.1252 (0.0015)*** | 0.1141 (0.0015)*** | 0.1140 (0.0015)*** | 0.1099 (0.0015)*** | 0.0836 (0.0027)*** | |
mobile : MeanSchooling^2 | | | | | | 0.0248 (0.0021)*** | |
MeanSchooling^2 | | | -0.0394 (0.0010)*** | -0.0447 (0.0018)*** | -0.1211 (0.0025)*** | -0.1372 (0.0028)*** | |
MeanSchooling^3 | | | | 0.0042 (0.0011)*** | -0.0438 (0.0016)*** | -0.0429 (0.0016)*** | |
MeanSchooling^4 | | | | | 0.0440 (0.0010)*** | 0.0444 (0.0010)*** | |
Revision length (bytes) | 0.1677 (0.0005)*** | 0.1673 (0.0005)*** | 0.1677 (0.0005)*** | 0.1677 (0.0005)*** | 0.1684 (0.0005)*** | 0.1685 (0.0005)*** | |
time to first paint | -0.0157 (0.0006)*** | -0.0157 (0.0006)*** | -0.0156 (0.0006)*** | -0.0156 (0.0006)*** | -0.0156 (0.0006)*** | -0.0156 (0.0006)*** | |
time to dom interactive | 0.0036 (0.0009)*** | 0.0037 (0.0009)*** | 0.0036 (0.0009)*** | 0.0036 (0.0009)*** | 0.0036 (0.0009)*** | 0.0036 (0.0009)*** | |
sessionlength | -0.0001 (0.0000)*** | -0.0001 (0.0000)*** | -0.0001 (0.0000)*** | -0.0001 (0.0000)*** | -0.0001 (0.0000)*** | -0.0001 (0.0000)*** | |
lastinsessionTRUE | 0.6190 (0.0011)*** | 0.6161 (0.0011)*** | 0.6152 (0.0011)*** | 0.6152 (0.0011)*** | 0.6152 (0.0011)*** | 0.6152 (0.0011)*** | |
nthinsession | 0.0002 (0.0000)*** | 0.0002 (0.0000)*** | 0.0002 (0.0000)*** | 0.0002 (0.0000)*** | 0.0002 (0.0000)*** | 0.0002 (0.0000)*** | |
dayofweekMon | 0.0952 (0.0020)*** | 0.0951 (0.0020)*** | 0.0950 (0.0020)*** | 0.0949 (0.0020)*** | 0.0943 (0.0020)*** | 0.0942 (0.0020)*** | |
dayofweekSat | 0.0087 (0.0020)*** | 0.0060 (0.0020)** | 0.0057 (0.0020)** | 0.0057 (0.0020)** | 0.0058 (0.0020)** | 0.0056 (0.0020)** | |
dayofweekSun | 0.0271 (0.0020)*** | 0.0252 (0.0020)*** | 0.0244 (0.0020)*** | 0.0244 (0.0020)*** | 0.0246 (0.0020)*** | 0.0244 (0.0020)*** | |
dayofweekThu | 0.0487 (0.0020)*** | 0.0484 (0.0020)*** | 0.0482 (0.0020)*** | 0.0482 (0.0020)*** | 0.0479 (0.0020)*** | 0.0478 (0.0020)*** | |
dayofweekTue | 0.0283 (0.0020)*** | 0.0286 (0.0020)*** | 0.0286 (0.0020)*** | 0.0286 (0.0020)*** | 0.0278 (0.0020)*** | 0.0277 (0.0020)*** | |
dayofweekWed | 0.0674 (0.0020)*** | 0.0671 (0.0020)*** | 0.0669 (0.0020)*** | 0.0669 (0.0020)*** | 0.0664 (0.0020)*** | 0.0663 (0.0020)*** | |
usermonth4 | 0.0099 (0.0097) | 0.0098 (0.0097) | 0.0098 (0.0097) | 0.0098 (0.0097) | 0.0098 (0.0097) | 0.0098 (0.0097) | |
usermonth5 | 0.0123 (0.0096) | 0.0116 (0.0096) | 0.0112 (0.0096) | 0.0112 (0.0096) | 0.0111 (0.0096) | 0.0110 (0.0096) | |
usermonth6 | -0.0075 (0.0099) | -0.0080 (0.0098) | -0.0083 (0.0098) | -0.0083 (0.0098) | -0.0084 (0.0098) | -0.0086 (0.0098) | |
usermonth7 | -0.0463 (0.0098)*** | -0.0469 (0.0098)*** | -0.0465 (0.0098)*** | -0.0465 (0.0098)*** | -0.0462 (0.0098)*** | -0.0463 (0.0098)*** | |
usermonth8 | -0.0105 (0.0098) | -0.0111 (0.0098) | -0.0110 (0.0098) | -0.0111 (0.0098) | -0.0108 (0.0098) | -0.0108 (0.0098) | |
usermonth9 | 0.0421 (0.0077)*** | 0.0426 (0.0077)*** | 0.0426 (0.0077)*** | 0.0426 (0.0077)*** | 0.0427 (0.0077)*** | 0.0426 (0.0077)*** | |
usermonth10 | -0.0012 (0.0076) | -0.0035 (0.0076) | -0.0039 (0.0076) | -0.0038 (0.0076) | -0.0022 (0.0076) | -0.0022 (0.0076) | |
R2 | 0.0520 | 0.0527 | 0.0528 | 0.0528 | 0.0530 | 0.0530 | |
Adj. R2 | 0.0520 | 0.0526 | 0.0528 | 0.0528 | 0.0530 | 0.0530 | |
Num. obs. | 9873641 | 9873641 | 9873641 | 9873641 | 9873641 | 9873641 | |
RMSE | 14.3861 | 14.3810 | 14.3799 | 14.3799 | 14.3785 | 14.3784 | |
***p < 0.001, **p < 0.01, *p < 0.05 |
Marginal effects plot of higher-order polynomial for education levels by mobile. This chart is used to interpret a regression model predicting the amount of time a Wikipedia page is visible in readers' browsers. It is a marginal effects plot showing how the model-predicted relationship between the education level of the country in which the reader is located and the amount of time they spend reading depends on whether they visit the mobile or desktop site. Readers read longer on desktop than on mobile devices, and longer in less educated contexts; the gap between desktop and mobile is greater in less educated contexts.
Marginal effects plot of higher-order polynomial for HDI by mobile. Same as the plot to the left, except with HDI instead of MeanSchooling.
So we can see clearly by comparing the two plots that our substantive results do not seem to depend on the choice of HDI instead of education level. The only difference is that in the highest-education contexts people may read for somewhat longer, but my intuition is that this is an artifact of using a high-degree polynomial, which may be overfitting. As with HDI, out-of-sample predictive performance improves as we increase the degree of the polynomial of MeanSchooling, but increasing the order of the interaction term hurts out-of-sample predictive performance.
Rmse | Rsqr | name |
---|---|---|
1.677632 | 0.0536712 | model 2 |
1.678465 | 0.0570225 | model 3 |
1.679589 | 0.0563081 | model 3 with quadratic MS |
1.679736 | 0.0562688 | model 3 with cubic MS |
1.679118 | 0.0579871 | model 3 with quartic MS |
1.679154 | 0.0576906 | model 3 with quadratic MS:mobile |
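For reference, here is a minimal sketch of how held-out RMSE and R-squared of this kind can be computed. The exact definition of "Rsqr" used for the table above is not stated in this log; the version below (1 minus residual sum of squares over total sum of squares) is an assumption, and the data are hypothetical.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between held-out outcomes and predictions."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def r_squared(actual, predicted):
    """1 - SS_res / SS_tot, computed on the held-out data (an assumed definition)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Hypothetical held-out log dwell times and model predictions.
held_out = [9.2, 10.1, 8.7, 11.0, 9.9]
predicted = [9.5, 9.9, 9.3, 10.4, 9.8]
print(rmse(held_out, predicted), r_squared(held_out, predicted))
```

Lower RMSE and higher R-squared both indicate better prediction, so the two columns can disagree when comparing models.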
Model Diagnostics: Residual Plots
Model 3 residuals QQ plot. This chart tests the assumption of OLS regression that the residuals of the model are normal. The plot shows that a normal distribution fits most of the distribution well; however, extreme values (both high and low) of residuals are more common than a normal distribution would predict. This is not a great surprise. While the reading time data do not perfectly fit a log-normal distribution, we are using a log-normal distribution to model reading times.
My conclusion from the residual plots is that the model fits the data about as well as can be expected. I don't see any need for alarm. We don't have perfectly normal data so we don't have perfectly normal residuals. Because of the clustering in the predicted vs residuals plot, I think we should add a mobile:lastinsession interaction.
Model 3 residuals Histogram. This chart plots residuals of model 3. It provides similar information to the QQ plot: the residuals have too many extreme high and low values to be perfectly fit by a normal distribution.
Model 3 residual values plotted against predicted values. This chart plots the predicted values of model 3 against the residuals. This type of plot is useful for identifying heteroskedasticity in the data that is not accounted for in the model. As we see 2 clusters of data points, this does suggest heteroskedasticity, and we should report heteroskedasticity-consistent (Huber-White) standard errors for publication. The clustering appears to be a result of very long last-in-session reading times.
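To illustrate what heteroskedasticity-consistent standard errors change, here is a minimal sketch for a one-predictor regression. It compares the classical OLS standard error of the slope with the White (HC0) version. This is an illustration on hypothetical toy data, not the procedure used for the models in this log, and HC0 is only one of several variants.

```python
import math

def ols_slope_with_se(x, y):
    """Fit y = a + b*x by OLS; return (slope, classical SE, HC0 robust SE)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dx = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in dx)
    slope = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Classical variance assumes every residual shares one variance.
    classical_var = sum(e * e for e in resid) / (n - 2) / sxx
    # HC0 weights each squared residual by its own squared deviation in x.
    hc0_var = sum((d * e) ** 2 for d, e in zip(dx, resid)) / sxx ** 2
    return slope, math.sqrt(classical_var), math.sqrt(hc0_var)

# Hypothetical toy data whose spread grows with x.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.0, 2.1, 2.8, 4.4, 4.5, 6.9, 5.8, 9.6]
slope, se_classical, se_hc0 = ols_slope_with_se(x, y)
print(slope, se_classical, se_hc0)
```

The point estimate is identical in both cases; only the standard error, and therefore the significance stars, can change.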
Adding mobile:lastinsession to Model 3
Adding this term to the model improves the model's fit and predictive performance. More importantly, it leads to a quite striking change in conclusions. We now find that, accounting for the terms in the model:
- Mobile readers dwell for longer on average.
- This is true in countries that are more developed than average.
- However, in less developed countries, the "device gap" reemerges and mobile readers have shorter dwell times than desktop readers.
Term | model 2 | model 3 | model 3 with quadratic HDI | model 3 with cubic HDI | model 3 with quartic HDI | model 3 with quadratic HDI:mobile | |
---|---|---|---|---|---|---|---|
Intercept | 9.7900 (0.0077)*** | 9.8504 (0.0078)*** | 9.8534 (0.0078)*** | 9.8197 (0.0078)*** | 9.8326 (0.0078)*** | 9.8321 (0.0079)*** | |
mobile | 0.1134 (0.0015)*** | 0.0174 (0.0023)*** | 0.0284 (0.0023)*** | 0.0347 (0.0023)*** | 0.0373 (0.0023)*** | 0.0387 (0.0023)*** | |
Human Development Index | -0.0891 (0.0009)*** | -0.1501 (0.0014)*** | -0.0883 (0.0022)*** | -0.0195 (0.0026)*** | 0.0063 (0.0028)* | -0.0006 (0.0034) | |
mobile : HDI | | 0.1064 (0.0019)*** | 0.0931 (0.0019)*** | 0.0847 (0.0019)*** | 0.0838 (0.0019)*** | 0.0950 (0.0036)*** | |
mobile : HDI^2 | | | | | | -0.0104 (0.0028)*** | |
HDI^2 | | | -0.0518 (0.0014)*** | 0.0201 (0.0020)*** | -0.0628 (0.0042)*** | -0.0560 (0.0046)*** | |
HDI^3 | | | | -0.0807 (0.0016)*** | -0.0904 (0.0017)*** | -0.0908 (0.0017)*** | |
HDI^4 | | | | | 0.0392 (0.0018)*** | 0.0390 (0.0018)*** | |
Revision length (bytes) | 0.1679 (0.0005)*** | 0.1678 (0.0005)*** | 0.1678 (0.0005)*** | 0.1677 (0.0005)*** | 0.1678 (0.0005)*** | 0.1677 (0.0005)*** | |
time to first paint | -0.0161 (0.0006)*** | -0.0160 (0.0006)*** | -0.0157 (0.0006)*** | -0.0155 (0.0006)*** | -0.0156 (0.0006)*** | -0.0156 (0.0006)*** | |
time to dom interactive | 0.0031 (0.0009)*** | 0.0031 (0.0009)*** | 0.0030 (0.0009)*** | 0.0031 (0.0009)*** | 0.0031 (0.0009)*** | 0.0031 (0.0009)*** | |
sessionlength | -0.0001 (0.0000)*** | -0.0001 (0.0000)*** | -0.0001 (0.0000)*** | -0.0001 (0.0000)*** | -0.0001 (0.0000)*** | -0.0001 (0.0000)*** | |
lastinsessionTRUE | 0.9461 (0.0015)*** | 0.9412 (0.0015)*** | 0.9404 (0.0015)*** | 0.9404 (0.0015)*** | 0.9401 (0.0015)*** | 0.9402 (0.0015)*** | |
nthinsession | 0.0002 (0.0000)*** | 0.0002 (0.0000)*** | 0.0002 (0.0000)*** | 0.0002 (0.0000)*** | 0.0002 (0.0000)*** | 0.0002 (0.0000)*** | |
dayofweekMon | 0.0891 (0.0020)*** | 0.0891 (0.0020)*** | 0.0890 (0.0020)*** | 0.0892 (0.0020)*** | 0.0890 (0.0020)*** | 0.0891 (0.0020)*** | |
dayofweekSat | 0.0148 (0.0020)*** | 0.0130 (0.0020)*** | 0.0127 (0.0020)*** | 0.0121 (0.0020)*** | 0.0123 (0.0020)*** | 0.0124 (0.0020)*** | |
dayofweekSun | 0.0316 (0.0020)*** | 0.0304 (0.0020)*** | 0.0298 (0.0020)*** | 0.0290 (0.0020)*** | 0.0292 (0.0020)*** | 0.0293 (0.0020)*** | |
dayofweekThu | 0.0507 (0.0020)*** | 0.0506 (0.0020)*** | 0.0505 (0.0020)*** | 0.0506 (0.0020)*** | 0.0504 (0.0020)*** | 0.0504 (0.0020)*** | |
dayofweekTue | 0.0307 (0.0020)*** | 0.0310 (0.0020)*** | 0.0310 (0.0020)*** | 0.0313 (0.0020)*** | 0.0310 (0.0020)*** | 0.0311 (0.0020)*** | |
dayofweekWed | 0.0708 (0.0019)*** | 0.0707 (0.0019)*** | 0.0705 (0.0019)*** | 0.0707 (0.0019)*** | 0.0705 (0.0019)*** | 0.0705 (0.0019)*** | |
usermonth4 | 0.0095 (0.0096) | 0.0096 (0.0096) | 0.0096 (0.0096) | 0.0098 (0.0096) | 0.0097 (0.0096) | 0.0097 (0.0096) | |
usermonth5 | 0.0105 (0.0096) | 0.0102 (0.0096) | 0.0099 (0.0096) | 0.0097 (0.0096) | 0.0098 (0.0096) | 0.0098 (0.0096) | |
usermonth6 | -0.0095 (0.0098) | -0.0098 (0.0098) | -0.0099 (0.0098) | -0.0099 (0.0098) | -0.0099 (0.0098) | -0.0098 (0.0098) | |
usermonth7 | -0.0482 (0.0098)*** | -0.0489 (0.0098)*** | -0.0485 (0.0098)*** | -0.0480 (0.0098)*** | -0.0479 (0.0098)*** | -0.0478 (0.0098)*** | |
usermonth8 | -0.0128 (0.0097) | -0.0134 (0.0097) | -0.0133 (0.0097) | -0.0126 (0.0097) | -0.0127 (0.0097) | -0.0126 (0.0097) | |
usermonth9 | 0.0385 (0.0076)*** | 0.0391 (0.0076)*** | 0.0390 (0.0076)*** | 0.0393 (0.0076)*** | 0.0392 (0.0076)*** | 0.0392 (0.0076)*** | |
usermonth10 | 0.0023 (0.0075) | 0.0021 (0.0075) | 0.0012 (0.0075) | 0.0005 (0.0075) | 0.0007 (0.0075) | 0.0007 (0.0075) | |
mobileTRUE:lastinsessionTRUE | -0.6636 (0.0021)*** | -0.6568 (0.0021)*** | -0.6575 (0.0021)*** | -0.6584 (0.0021)*** | -0.6582 (0.0021)*** | -0.6584 (0.0021)*** | |
R2 | 0.0613 | 0.0616 | 0.0617 | 0.0620 | 0.0620 | 0.0620 | |
Adj. R2 | 0.0613 | 0.0616 | 0.0617 | 0.0619 | 0.0620 | 0.0620 | |
Num. obs. | 9873641 | 9873641 | 9873641 | 9873641 | 9873641 | 9873641 | |
RMSE | 14.3154 | 14.3131 | 14.3121 | 14.3102 | 14.3099 | 14.3099 | |
***p < 0.001, **p < 0.01, *p < 0.05 |
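The conclusions listed before the table can be checked by hand from the interaction terms. The sketch below sums the mobile-related coefficients from the "model 3" column. It assumes that HDI is standardized (so 0 is an average country and +1 or -1 is one standard deviation above or below), that the outcome is log dwell time, and that mobile and lastinsession are 0/1 indicators; none of these is stated explicitly in this section.

```python
import math

# Coefficients copied from the "model 3" column of the table above.
B_MOBILE = 0.0174
B_MOBILE_HDI = 0.1064
B_MOBILE_LAST = -0.6568

def mobile_effect(hdi_z, last_in_session=False):
    """Model-implied difference in the outcome, mobile minus desktop, other terms held fixed."""
    effect = B_MOBILE + B_MOBILE_HDI * hdi_z
    if last_in_session:
        effect += B_MOBILE_LAST
    return effect

for hdi_z in (-1.0, 0.0, 1.0):
    effect = mobile_effect(hdi_z)
    # exp() turns a log-scale difference into a ratio, assuming a log outcome.
    print(f"HDI z={hdi_z:+.0f}: effect {effect:+.4f}, ratio {math.exp(effect):.3f}")
```

Under these assumptions, for views that are not last in session the mobile effect is 0.0174 at average HDI, 0.1238 one standard deviation above, and -0.0890 one standard deviation below, which matches the sign pattern in the list. Setting `last_in_session=True` adds -0.6568, which makes the sum negative at all three HDI values.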
Out of sample predictions
Rmse | Rsqr | name |
---|---|---|
1.677372 | 0.0542021 | model 2 |
1.675276 | 0.0700483 | model 3 |
1.677113 | 0.0688259 | model 3 with quadratic HDI |
1.676338 | 0.0718099 | model 3 with cubic HDI |
1.677631 | 0.0741799 | model 3 with quartic HDI |
1.677640 | 0.0743893 | model 3 with quadratic HDI:mobile |
Residuals for M3v2
We still have clusters in the predicted vs residuals plot. Adding the interaction term has improved things to some extent. However, it appears that it may be difficult to fully correct for the desktop + last-in-session patterns. This motivates checking if our results are robust to very long reading times.
Robustness check: removing long dwell times
Removing dwell times longer than 1 hour improves the model diagnostics, but the qualitative conclusions from the above model are robust.
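A minimal sketch of this filtering step follows. It assumes dwell times are stored in milliseconds, and the record layout and the field name `dwell_ms` are hypothetical.

```python
ONE_HOUR_MS = 60 * 60 * 1000  # assumption: dwell times are recorded in milliseconds

def drop_long_dwell_times(rows, limit_ms=ONE_HOUR_MS):
    """Keep only page views whose dwell time does not exceed the limit."""
    return [row for row in rows if row["dwell_ms"] <= limit_ms]

# Hypothetical page-view records; field names are illustrative only.
views = [
    {"dwell_ms": 12_000, "mobile": True},
    {"dwell_ms": 4_200_000, "mobile": False},  # 70 minutes, so it is dropped
    {"dwell_ms": 95_000, "mobile": False},
]
kept = drop_long_dwell_times(views)
print(len(views) - len(kept), "view(s) removed")
```

For scale, the regression table in this section reports 9787783 observations, compared with 9873641 in the full-sample tables, so the filter removes 85858 page views.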
Marginal Effects Plot
Regression Tables
Term | model 2 | model 3 | model 3 with quadratic HDI | model 3 with cubic HDI | model 3 with quartic HDI | model 3 with quadratic HDI:mobile | |
---|---|---|---|---|---|---|---|
Intercept | 9.7791 (0.0074)*** | 9.8431 (0.0074)*** | 9.8465 (0.0074)*** | 9.8099 (0.0075)*** | 9.8208 (0.0075)*** | 9.8213 (0.0075)*** | |
mobile | 0.1312 (0.0014)*** | 0.0302 (0.0022)*** | 0.0430 (0.0022)*** | 0.0499 (0.0022)*** | 0.0521 (0.0022)*** | 0.0508 (0.0022)*** | |
Human Development Index | -0.0911 (0.0009)*** | -0.1556 (0.0014)*** | -0.0844 (0.0021)*** | -0.0095 (0.0025)*** | 0.0124 (0.0027)*** | 0.0188 (0.0032)*** | |
mobile : HDI | | 0.1117 (0.0018)*** | 0.0963 (0.0019)*** | 0.0871 (0.0019)*** | 0.0863 (0.0019)*** | 0.0759 (0.0034)*** | |
mobile : HDI^2 | | | | | | 0.0097 (0.0027)*** | |
HDI^2 | | | -0.0596 (0.0013)*** | 0.0186 (0.0019)*** | -0.0517 (0.0040)*** | -0.0581 (0.0044)*** | |
HDI^3 | | | | -0.0878 (0.0015)*** | -0.0960 (0.0016)*** | -0.0956 (0.0016)*** | |
HDI^4 | | | | | 0.0333 (0.0017)*** | 0.0335 (0.0017)*** | |
Revision length (bytes) | 0.1643 (0.0004)*** | 0.1643 (0.0004)*** | 0.1643 (0.0004)*** | 0.1642 (0.0004)*** | 0.1642 (0.0004)*** | 0.1642 (0.0004)*** | |
time to first paint | -0.0250 (0.0006)*** | -0.0248 (0.0006)*** | -0.0245 (0.0006)*** | -0.0243 (0.0006)*** | -0.0243 (0.0006)*** | -0.0243 (0.0006)*** | |
time to dom interactive | 0.0034 (0.0008)*** | 0.0034 (0.0008)*** | 0.0032 (0.0008)*** | 0.0034 (0.0008)*** | 0.0033 (0.0008)*** | 0.0033 (0.0008)*** | |
sessionlength | -0.0001 (0.0000)*** | -0.0001 (0.0000)*** | -0.0001 (0.0000)*** | -0.0001 (0.0000)*** | -0.0001 (0.0000)*** | -0.0001 (0.0000)*** | |
lastinsessionTRUE | 0.7958 (0.0014)*** | 0.7905 (0.0014)*** | 0.7895 (0.0014)*** | 0.7895 (0.0014)*** | 0.7893 (0.0014)*** | 0.7892 (0.0014)*** | |
nthinsession | 0.0002 (0.0000)*** | 0.0002 (0.0000)*** | 0.0002 (0.0000)*** | 0.0002 (0.0000)*** | 0.0002 (0.0000)*** | 0.0002 (0.0000)*** | |
dayofweekMon | 0.0795 (0.0019)*** | 0.0795 (0.0019)*** | 0.0794 (0.0019)*** | 0.0796 (0.0019)*** | 0.0795 (0.0019)*** | 0.0795 (0.0019)*** | |
dayofweekSat | 0.0098 (0.0019)*** | 0.0078 (0.0019)*** | 0.0075 (0.0019)*** | 0.0069 (0.0019)*** | 0.0071 (0.0019)*** | 0.0071 (0.0019)*** | |
dayofweekSun | 0.0333 (0.0019)*** | 0.0320 (0.0019)*** | 0.0313 (0.0019)*** | 0.0305 (0.0019)*** | 0.0307 (0.0019)*** | 0.0307 (0.0019)*** | |
dayofweekThu | 0.0524 (0.0019)*** | 0.0524 (0.0019)*** | 0.0522 (0.0019)*** | 0.0523 (0.0019)*** | 0.0522 (0.0019)*** | 0.0522 (0.0019)*** | |
dayofweekTue | 0.0332 (0.0019)*** | 0.0334 (0.0019)*** | 0.0334 (0.0019)*** | 0.0338 (0.0019)*** | 0.0336 (0.0019)*** | 0.0336 (0.0019)*** | |
dayofweekWed | 0.0693 (0.0019)*** | 0.0692 (0.0019)*** | 0.0690 (0.0019)*** | 0.0691 (0.0019)*** | 0.0690 (0.0019)*** | 0.0690 (0.0019)*** | |
usermonth4 | 0.0024 (0.0092) | 0.0025 (0.0092) | 0.0025 (0.0092) | 0.0027 (0.0092) | 0.0026 (0.0092) | 0.0026 (0.0092) | |
usermonth5 | -0.0002 (0.0091) | -0.0005 (0.0091) | -0.0009 (0.0091) | -0.0011 (0.0091) | -0.0011 (0.0091) | -0.0011 (0.0091) | |
usermonth6 | -0.0127 (0.0094) | -0.0129 (0.0094) | -0.0131 (0.0094) | -0.0132 (0.0093) | -0.0131 (0.0093) | -0.0131 (0.0093) | |
usermonth7 | -0.0498 (0.0093)*** | -0.0506 (0.0093)*** | -0.0501 (0.0093)*** | -0.0495 (0.0093)*** | -0.0495 (0.0093)*** | -0.0495 (0.0093)*** | |
usermonth8 | -0.0148 (0.0093) | -0.0154 (0.0093) | -0.0153 (0.0093) | -0.0146 (0.0093) | -0.0146 (0.0093) | -0.0146 (0.0093) | |
usermonth9 | 0.0384 (0.0073)*** | 0.0389 (0.0073)*** | 0.0388 (0.0073)*** | 0.0392 (0.0073)*** | 0.0391 (0.0073)*** | 0.0391 (0.0073)*** | |
usermonth10 | -0.0020 (0.0072) | -0.0023 (0.0072) | -0.0033 (0.0072) | -0.0041 (0.0072) | -0.0039 (0.0072) | -0.0039 (0.0072) | |
mobileTRUE:lastinsessionTRUE | -0.5163 (0.0020)*** | -0.5092 (0.0020)*** | -0.5099 (0.0020)*** | -0.5109 (0.0020)*** | -0.5107 (0.0020)*** | -0.5105 (0.0020)*** | |
R2 | 0.0518 | 0.0522 | 0.0524 | 0.0527 | 0.0527 | 0.0527 | |
Adj. R2 | 0.0518 | 0.0522 | 0.0524 | 0.0527 | 0.0527 | 0.0527 | |
Num. obs. | 9787783 | 9787783 | 9787783 | 9787783 | 9787783 | 9787783 | |
RMSE | 13.5945 | 13.5919 | 13.5905 | 13.5882 | 13.5879 | 13.5879 | |
***p < 0.001, **p < 0.01, *p < 0.05 |
Diagnostic Plots
Model 3 residuals histogram (long times removed). This chart plots residuals for a robustness check of model 3 in which dwell times over 1 hour have been removed. It provides similar information to the QQ plot: the residuals have too many extreme high and low values to be perfectly fit by a normal distribution.
Model 3v2 residual values plotted against predicted values (long times removed). This chart plots the predicted values of model 3, with long dwell times removed, against the residuals. This type of plot is useful for identifying heteroskedasticity in the data that is not accounted for in the model. Removing long dwell times makes this data seem much more Gaussian.
Model 3 residuals QQ plot (no long times). This chart tests the assumption of OLS regression that the residuals of the model are normal. The plot shows that a normal distribution fits most of the distribution well. However, removing very long dwell times means that there are fewer long dwell times than a normal distribution would predict. This is not a great surprise. While the reading time data do not perfectly fit a log-normal distribution, we are using a log-normal distribution to model reading times.
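As a sketch of what the QQ plots in this log compare, the snippet below pairs each sorted residual with the quantile a fitted normal distribution would give at the same rank. The residual values are hypothetical, and the (i + 0.5) / n plotting positions are one common convention, assumed here for illustration.

```python
from statistics import NormalDist, mean, stdev

def qq_points(residuals):
    """Pair each sorted residual with the fitted-normal quantile at the same rank."""
    n = len(residuals)
    fitted = NormalDist(mean(residuals), stdev(residuals))
    ordered = sorted(residuals)
    # (i + 0.5) / n keeps every probability strictly between 0 and 1.
    return [(fitted.inv_cdf((i + 0.5) / n), r) for i, r in enumerate(ordered)]

# Hypothetical residuals with one extreme value on each side.
residuals = [-6.0, -1.2, -0.7, -0.3, 0.0, 0.2, 0.5, 0.9, 1.4, 7.0]
for theoretical, observed in qq_points(residuals):
    print(f"{theoretical:+.2f}  {observed:+.2f}")
```

Points near the diagonal (theoretical roughly equal to observed) indicate agreement with the normal distribution; departures at the ends correspond to the tail behaviour described in the captions.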