Research talk:Teahouse long term new editor retention/Work log/20151020
Tuesday, October 20, 2015
Today I'm extending my analysis to include multivariate regressions. I'll be using a logistic regression to look for differences in the proportion of surviving newcomers. I'll include the preinvite statistics I worked on last time to control for random effects around the effect of invitation. I expect the effect of invitation to become more prominent after controlling for these random effects. I also expect to see some interactions between the invitation condition and either initial investment (# of edits preinvite) or negative feedback. A positive relationship would suggest that newcomers who are highly invested and get negative feedback gain more "survivalness" from the invite.
Checking for bucketing bias
The following list of Wilcoxon and Chi^2 tests checks for significant differences in the preinvite predictors between conditions. Scalars are summarized by their quantiles (0%/25%/50%/75%/100%), logicals by their proportion.
 edits control=5/6/7/11/215 invited=5/6/7/12/241 W=18158354.5 p=0.597
 main_edits control=0/4/6/9/212 invited=0/4/6/9/241 W=18327891 p=0.181
 talk_edits control=0/0/0/0/35 invited=0/0/0/0/37 W=17914856.5 p=0.188
 user_edits control=0/0/0/1/74 invited=0/0/0/1/232 W=17823485.5 p=0.163
 user_talk_edits control=0/0/0/0/24 invited=0/0/0/0/90 W=17958249 p=0.444
 wp_edits control=0/0/0/0/19 invited=0/0/0/0/79 W=18091557 p=0.581
 other_edits control=0/0/0/0/96 invited=0/0/0/0/125 W=18104004.5 p=0.552
 vandal_warning control=0.121 invited=0.123 X-squared=0.153 p=0.696
 spam_warning control=0.027 invited=0.026 X-squared=0.081 p=0.776
 copyright_warning control=0.003 invited=0.003 X-squared=0.304 p=0.582
 general_warning control=0.221 invited=0.214 X-squared=0.57 p=0.45
 block control=0.001 invited=0.002 X-squared=0.729 p=0.393
 welcome control=0 invited=0 X-squared=0 p=1
 csd control=0.035 invited=0.031 X-squared=1.234 p=0.267
 deletion control=0.051 invited=0.047 X-squared=0.696 p=0.404
 afc control=0 invited=0 X-squared=NaN p=NaN
 teahouse control=0 invited=0 X-squared=NaN p=NaN
No significant differences here.
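As a quick consistency check on the list above, the reported p-values for the logical predictors can be recomputed from the reported Chi^2 statistics. This is a minimal sketch that assumes each test has 1 degree of freedom (a 2x2 table of condition by flag); the function name `chi2_sf_1df` is made up for illustration and is not part of the original analysis code.

```python
import math

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-squared statistic with 1 degree of freedom.

    For 1 df, P(X > x) = erfc(sqrt(x / 2)), so only the standard library is needed.
    """
    return math.erfc(math.sqrt(x / 2.0))

# (statistic, reported p) pairs copied from the list above.
reported = {
    "vandal_warning": (0.153, 0.696),
    "general_warning": (0.57, 0.45),
    "block": (0.729, 0.393),
    "csd": (1.234, 0.267),
}

for name, (x2, p) in reported.items():
    print(f"{name}: statistic={x2} reported p={p} recomputed p={chi2_sf_1df(x2):.3f}")
```

If the 1 df assumption holds, the recomputed values should agree with the reported ones up to rounding of the statistic.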
Predicting 1+ edits
Now to build some logistic models that account for these preinvite predictors.
 3 to 4 weeks
 Coefficients:
                                 Estimate Std. Error z value  Pr(>|z|)
 (Intercept)                      3.40157    0.28915  11.764   < 2e-16 ***
 grpinvited                       0.45260    0.31556   1.434  0.151491
 log(edits + 1)                   0.36207    0.12951   2.796  0.005181 **
 log(main_edits + 1)              0.09796    0.04883   2.006  0.044832 *
 log(talk_edits + 1)              0.19515    0.07423   2.629  0.008563 **
 log(user_edits + 1)              0.02817    0.04907   0.574  0.565918
 log(user_talk_edits + 1)         0.22418    0.06657   3.368  0.000758 ***
 log(wp_edits + 1)                0.13143    0.09053   1.452  0.146561
 general_warningTRUE              0.52963    0.19096   2.773  0.005547 **
 csdTRUE                          1.52286    0.72156   2.111  0.034813 *
 deletionTRUE                     0.41958    0.37541   1.118  0.263717
 grpinvited:log(edits + 1)        0.23479    0.12669   1.853  0.063855 .
 grpinvited:general_warningTRUE   0.04394    0.21149   0.208  0.835434
 grpinvited:csdTRUE               1.09061    0.75507   1.444  0.148633
 grpinvited:deletionTRUE          0.21884    0.42406   0.516  0.605816
 ---
 Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 (Dispersion parameter for binomial family taken to be 1)
 Null deviance: 8869.9 on 14765 degrees of freedom
 Residual deviance: 8564.5 on 14751 degrees of freedom
 AIC: 8594.5
 Number of Fisher Scoring iterations: 6
First, the obvious effects. We see the usual suspects here. The more edits you make (overall, but especially on talk pages), the more likely you are to be retained. We also see some substantially negative effects of warning messages and CSD notifications.
We see a negative effect of invitation here, but it looks like the combined effect of grpinvited:log(edits + 1) counteracts that for editors who had saved about 6 edits or more (log(x+1)≈2, x≈6) when the invite was posted. For any editor who saved more than 6 edits (highly motivated), it looks like the invite might be substantially improving retention, scaling with how much editing they are doing. But the effect remains insignificant (marginal at p = 0.064).
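The break-even point can be worked out directly from the two invitation coefficients in the 3 to 4 week model. This is a rough sketch: the signs (negative for grpinvited, positive for the interaction) are taken from the prose above, and the names `B_INVITED`, `B_INTERACTION` and `invite_log_odds` are made up for illustration.

```python
import math

# Magnitudes from the "3 to 4 weeks" model; signs follow the prose above
# (a negative main effect of invitation and a positive interaction).
B_INVITED = -0.45260
B_INTERACTION = 0.23479

def invite_log_odds(edits):
    """Net change in log-odds of survival attributable to the invite,
    ignoring the warning/csd/deletion interaction terms."""
    return B_INVITED + B_INTERACTION * math.log(edits + 1)

# The net effect is zero where log(edits + 1) = -B_INVITED / B_INTERACTION.
break_even_log = -B_INVITED / B_INTERACTION
break_even_edits = math.exp(break_even_log) - 1

print(f"log(edits + 1) at break-even: {break_even_log:.2f}")
print(f"edits at break-even: {break_even_edits:.1f}")
```

Under these assumptions the crossover sits just below 6 preinvite edits, which is where the log(x+1)≈2 figure comes from.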
Counter to my suspicions, I don't think we're seeing solid evidence of an interaction between being invited to the Teahouse and surviving despite negative feedback (csd & warning). This could be due to too few observations.
Just for the sake of making sure that my previous analysis wasn't totally off, let's try the model with just the invite as a predictor.
 3 to 4 weeks (single predictor)
 Coefficients:
              Estimate Std. Error z value Pr(>|z|)
 (Intercept)   2.44393    0.06633  36.843   <2e-16 ***
 grpinvited    0.14830    0.07369   2.012   0.0442 *
 ---
 Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 (Dispersion parameter for binomial family taken to be 1)
 Null deviance: 8869.9 on 14765 degrees of freedom
 Residual deviance: 8865.7 on 14764 degrees of freedom
 AIC: 8869.7
 Number of Fisher Scoring iterations: 5
Sure enough. Getting the invite looks significant on its own. OK! Now to try the long-term retention outcomes.
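To get a feel for the size of the single-predictor effect, the log-odds coefficients can be turned back into survival proportions. This is only a sketch: it assumes a negative intercept (surviving newcomers are the minority) and a positive invite coefficient, and the helper name `inv_logit` is made up for illustration.

```python
import math

def inv_logit(x):
    """Map log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

# Magnitudes from the single-predictor "3 to 4 weeks" model.
# ASSUMPTION: the intercept is negative and the invite effect is positive.
INTERCEPT = -2.44393
B_INVITED = 0.14830

p_control = inv_logit(INTERCEPT)
p_invited = inv_logit(INTERCEPT + B_INVITED)
odds_ratio = math.exp(B_INVITED)

print(f"control survival:  {p_control:.3f}")
print(f"invited survival:  {p_invited:.3f}")
print(f"odds ratio:        {odds_ratio:.2f}")
```

Under those sign assumptions, this works out to roughly 8% survival for the control group versus roughly 9% for the invited group.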
 1 to 2 months
 Coefficients:
                                 Estimate Std. Error z value  Pr(>|z|)
 (Intercept)                      3.38622    0.24595  13.768   < 2e-16 ***
 grpinvited                       0.06769    0.27003   0.251  0.802051
 log(edits + 1)                   0.50057    0.11032   4.537   5.7e-06 ***
 log(main_edits + 1)              0.14552    0.04364   3.335  0.000854 ***
 log(talk_edits + 1)              0.17592    0.06710   2.622  0.008744 **
 log(user_edits + 1)              0.05333    0.04358   1.224  0.220999
 log(user_talk_edits + 1)         0.11024    0.06179   1.784  0.074370 .
 log(wp_edits + 1)                0.07868    0.08374   0.940  0.347399
 general_warningTRUE              0.50829    0.15991   3.179  0.001480 **
 csdTRUE                          0.95765    0.46844   2.044  0.040918 *
 deletionTRUE                     0.76776    0.35457   2.165  0.030360 *
 grpinvited:log(edits + 1)        0.01426    0.10815   0.132  0.895065
 grpinvited:general_warningTRUE   0.10018    0.17865   0.561  0.574961
 grpinvited:csdTRUE               0.57811    0.50508   1.145  0.252384
 grpinvited:deletionTRUE          0.23804    0.39010   0.610  0.541725
 ---
 Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 (Dispersion parameter for binomial family taken to be 1)
 Null deviance: 11229 on 14765 degrees of freedom
 Residual deviance: 10845 on 14751 degrees of freedom
 AIC: 10875
 Number of Fisher Scoring iterations: 5
 2 to 6 months
 Coefficients:
                                 Estimate Std. Error z value  Pr(>|z|)
 (Intercept)                      3.19216    0.23718  13.459   < 2e-16 ***
 grpinvited                       0.08069    0.26070   0.310   0.75694
 log(edits + 1)                   0.43819    0.10755   4.074  4.61e-05 ***
 log(main_edits + 1)              0.18349    0.04316   4.252  2.12e-05 ***
 log(talk_edits + 1)              0.21654    0.06491   3.336   0.00085 ***
 log(user_edits + 1)              0.03663    0.04303   0.851   0.39460
 log(user_talk_edits + 1)         0.05446    0.06177   0.882   0.37797
 log(wp_edits + 1)                0.11562    0.08142   1.420   0.15563
 general_warningTRUE              0.53228    0.15243   3.492   0.00048 ***
 csdTRUE                          0.55369    0.37929   1.460   0.14435
 deletionTRUE                     0.69683    0.32396   2.151   0.03148 *
 grpinvited:log(edits + 1)        0.01059    0.10513   0.101   0.91977
 grpinvited:general_warningTRUE   0.10878    0.17073   0.637   0.52403
 grpinvited:csdTRUE               0.04731    0.42470   0.111   0.91130
 grpinvited:deletionTRUE          0.14502    0.36032   0.402   0.68732
 ---
 Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 (Dispersion parameter for binomial family taken to be 1)
 Null deviance: 11889 on 14765 degrees of freedom
 Residual deviance: 11496 on 14751 degrees of freedom
 AIC: 11526
 Number of Fisher Scoring iterations: 5
Similar story here, but now the effect of the invite isn't even marginally significant. On to the 5+ measures.
Predicting 5+ edits
Same story as above, except survival only counts when there are 5+ edits in the survival period.
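The two outcome definitions differ only in the edit threshold. A minimal sketch, with the hypothetical names `survived` and `min_edits` (the actual outcome columns are not shown in this log):

```python
def survived(edits_in_period, min_edits=1):
    """Return True if a newcomer counts as surviving a given period.

    edits_in_period: number of edits saved during the survival period
    min_edits: 1 for the "1+ edits" outcome, 5 for the "5+ edits" outcome
    """
    return edits_in_period >= min_edits

# A newcomer with 3 edits in the period survives under 1+ but not under 5+.
print(survived(3, min_edits=1), survived(3, min_edits=5))
```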
 3 to 4 weeks
 Coefficients:
                                 Estimate Std. Error z value  Pr(>|z|)
 (Intercept)                      4.65842    0.39320  11.847   < 2e-16 ***
 grpinvited                       0.61702    0.42874   1.439   0.15011
 log(edits + 1)                   0.44489    0.17306   2.571   0.01015 *
 log(main_edits + 1)              0.20667    0.06855   3.015   0.00257 **
 log(talk_edits + 1)              0.16150    0.10187   1.585   0.11289
 log(user_edits + 1)              0.17094    0.06630   2.578   0.00993 **
 log(user_talk_edits + 1)         0.21420    0.08913   2.403   0.01625 *
 log(wp_edits + 1)                0.19019    0.11490   1.655   0.09787 .
 general_warningTRUE              0.78168    0.30218   2.587   0.00969 **
 csdTRUE                          1.42153    1.01750   1.397   0.16239
 deletionTRUE                     0.29195    0.52487   0.556   0.57806
 grpinvited:log(edits + 1)        0.26699    0.16513   1.617   0.10591
 grpinvited:general_warningTRUE   0.22127    0.33150   0.667   0.50447
 grpinvited:csdTRUE               1.26795    1.05635   1.200   0.23001
 grpinvited:deletionTRUE          0.48490    0.60548   0.801   0.42322
 ---
 Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 (Dispersion parameter for binomial family taken to be 1)
 Null deviance: 5101.0 on 14765 degrees of freedom
 Residual deviance: 4814.7 on 14751 degrees of freedom
 AIC: 4844.7
 Number of Fisher Scoring iterations: 7
Again, we see a lack of significant independent effect for the invitation. Again, we also see the nearly marginal interaction with log(edits + 1) (p = 0.106), suggesting that the invitation might be more effective for newcomers who save a lot of edits before getting the invitation.
On to the long-term outcomes:
 1 to 2 months
Regression with multicollinearity problem


 Coefficients:
                                 Estimate Std. Error z value  Pr(>|z|)
 (Intercept)                      4.07573    0.32035  12.723   < 2e-16 ***
 grpinvited                       0.40777    0.34891   1.169   0.24253
 log(edits + 1)                   0.39092    0.14264   2.741   0.00613 **
 log(main_edits + 1)              0.24877    0.05703   4.362  1.29e-05 ***
 log(talk_edits + 1)              0.20831    0.08312   2.506   0.01221 *
 log(user_edits + 1)              0.16667    0.05522   3.018   0.00254 **
 log(user_talk_edits + 1)         0.18157    0.07557   2.403   0.01627 *
 log(wp_edits + 1)                0.15650    0.09935   1.575   0.11521
 general_warningTRUE              0.55499    0.22216   2.498   0.01249 *
 csdTRUE                         12.75334  135.48352   0.094   0.92500
 deletionTRUE                     1.16388    0.59382   1.960   0.05000 *
 grpinvited:log(edits + 1)        0.21278    0.13657   1.558   0.11923
 grpinvited:general_warningTRUE   0.07856    0.24671   0.318   0.75015
 grpinvited:csdTRUE              12.36522  135.48374   0.091   0.92728
 grpinvited:deletionTRUE          0.43928    0.63691   0.690   0.49038
 ---
 Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 (Dispersion parameter for binomial family taken to be 1)
 Null deviance: 7482.1 on 14765 degrees of freedom
 Residual deviance: 7085.4 on 14751 degrees of freedom
 AIC: 7115.4
 Number of Fisher Scoring iterations: 14
Yikes! Here, we're seeing too much correlation between getting a 'csd' message and being invited. I'm going to need to drop the csd predictor.
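One way to see that the csd estimate in this model carries no usable information is to compare its Wald statistic and interval width with the same term in the 3 to 4 week 5+ model. This is a hedged sketch using only the magnitudes reported in the two tables (signs are not needed for the comparison); the helper name `wald_summary` is made up for illustration.

```python
def wald_summary(estimate, std_error, z_crit=1.96):
    """Return the Wald z statistic and the half-width of an approximate 95% interval."""
    return estimate / std_error, z_crit * std_error

# csdTRUE in the "3 to 4 weeks" 5+ model versus the "1 to 2 months" 5+ model.
z_ok, half_ok = wald_summary(1.42153, 1.01750)
z_bad, half_bad = wald_summary(12.75334, 135.48352)

print(f"3 to 4 weeks:  z={z_ok:.3f}  interval half-width={half_ok:.1f}")
print(f"1 to 2 months: z={z_bad:.3f}  interval half-width={half_bad:.1f}")
```

An interval that spans hundreds of log-odds units, together with the jump to 14 Fisher scoring iterations, suggests the fit for this term did not settle on a meaningful value.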
 Coefficients:
                                 Estimate Std. Error z value  Pr(>|z|)
 (Intercept)                      4.07257    0.31851  12.786   < 2e-16 ***
 grpinvited                       0.41388    0.34713   1.192   0.23315
 log(edits + 1)                   0.39039    0.14205   2.748   0.00599 **
 log(main_edits + 1)              0.24097    0.05671   4.249  2.14e-05 ***
 log(talk_edits + 1)              0.18318    0.08288   2.210   0.02710 *
 log(user_edits + 1)              0.15665    0.05482   2.858   0.00427 **
 log(user_talk_edits + 1)         0.17481    0.07540   2.318   0.02043 *
 log(wp_edits + 1)                0.15835    0.09922   1.596   0.11051
 general_warningTRUE              0.62899    0.22163   2.838   0.00454 **
 deletionTRUE                     1.28957    0.59291   2.175   0.02963 *
 grpinvited:log(edits + 1)        0.22144    0.13593   1.629   0.10329
 grpinvited:general_warningTRUE   0.02041    0.24601   0.083   0.93387
 grpinvited:deletionTRUE          0.52833    0.63556   0.831   0.40581
 ---
 Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 (Dispersion parameter for binomial family taken to be 1)
 Null deviance: 7482.1 on 14765 degrees of freedom
 Residual deviance: 7100.9 on 14753 degrees of freedom
 AIC: 7126.9
 Number of Fisher Scoring iterations: 6
 2 to 6 months
 Coefficients:
                                 Estimate Std. Error z value  Pr(>|z|)
 (Intercept)                     4.313717   0.293284  14.708   < 2e-16 ***
 grpinvited                      0.363524   0.320604   1.134  0.256848
 log(edits + 1)                  0.672387   0.128262   5.242  1.59e-07 ***
 log(main_edits + 1)             0.160990   0.051039   3.154  0.001609 **
 log(talk_edits + 1)             0.249009   0.074702   3.333  0.000858 ***
 log(user_edits + 1)             0.002232   0.051036   0.044  0.965120
 log(user_talk_edits + 1)        0.047366   0.073726   0.642  0.520577
 log(wp_edits + 1)               0.166464   0.092988   1.790  0.073429 .
 general_warningTRUE             0.731064   0.209830   3.484  0.000494 ***
 deletionTRUE                    0.977154   0.467432   2.090  0.036575 *
 grpinvited:log(edits + 1)       0.089354   0.124897   0.715  0.474347
 grpinvited:general_warningTRUE  0.033336   0.232662   0.143  0.886070
 grpinvited:deletionTRUE         0.286314   0.510743   0.561  0.575081
 ---
 Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 (Dispersion parameter for binomial family taken to be 1)
 Null deviance: 8502.0 on 14765 degrees of freedom
 Residual deviance: 8129.1 on 14753 degrees of freedom
 AIC: 8155.1
 Number of Fisher Scoring iterations: 6
Well, the direction and scale of the coefs don't change. We don't see independent significance in the effect of the invitation or its interaction with previous activity.
Again, just to check my sanity, let's try the 2 to 6 month regression with the bucket as the single predictor.
 Coefficients:
              Estimate Std. Error z value Pr(>|z|)
 (Intercept)   2.52590    0.06867   36.78   <2e-16 ***
 grpinvited    0.16681    0.07617    2.19   0.0285 *
 ---
 Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 (Dispersion parameter for binomial family taken to be 1)
 Null deviance: 8502 on 14765 degrees of freedom
 Residual deviance: 8497 on 14764 degrees of freedom
 AIC: 8501
 Number of Fisher Scoring iterations: 5
Sure enough, there's the significant effect I saw in the simple Chi^2 test. Halfak (WMF) (talk) 18:54, 20 October 2015 (UTC)