Research talk:Active editor spike 2015/Work log/2015-03-22
Sunday, March 22, 2015
editGot new data. I'm currently updating the bits that tell me how a user registered. That will be important for doing the breakdowns, but for now I can look at overall stats.
Query
|
---|
> select count(*) from editor_month; +----------+ | count(*) | +----------+ | 41013773 | +----------+ 1 row in set (0.02 sec) |
Let's look at enwiki's January's
Query
|
---|
> select month, count(*) from editor_month where wiki = "enwiki" and month like "%01" and revisions >= 5 group by 1; +--------+----------+ | month | count(*) | +--------+----------+ | 200101 | 4 | | 200201 | 137 | | 200301 | 561 | | 200401 | 1724 | | 200501 | 7406 | | 200601 | 34839 | | 200701 | 72916 | | 200801 | 61884 | | 200901 | 56696 | | 201001 | 52340 | | 201101 | 48680 | | 201201 | 45255 | | 201301 | 43505 | | 201401 | 41935 | | 201501 | 42920 | +--------+----------+ 15 rows in set (16.60 sec) |
How about february's?
Query
|
---|
> select month, count(*) from editor_month where wiki = "enwiki" and month like "%02" and revisions >= 5 group by 1; +--------+----------+ | month | count(*) | +--------+----------+ | 200102 | 8 | | 200202 | 168 | | 200302 | 616 | | 200402 | 2359 | | 200502 | 7446 | | 200602 | 36901 | | 200702 | 73340 | | 200802 | 61864 | | 200902 | 54648 | | 201002 | 50614 | | 201102 | 47619 | | 201202 | 44829 | | 201302 | 40983 | | 201402 | 39732 | | 201502 | 41925 | +--------+----------+ 15 rows in set (16.44 sec) |
Definitely something showing up in the year-over-year. Let's try all the wikis.
Query big wikis
|
---|
> SELECT -> wiki, -> last_year.ae AS last_year_ae, -> this_year.ae - last_year.ae AS change_abs, -> (this_year.ae - last_year.ae)/last_year.ae AS change_prop -> FROM ( -> SELECT -> wiki, -> COUNT(*) as ae -> FROM editor_month -> WHERE month LIKE "2015%" AND (month LIKE "%01" OR month LIKE "%02") -> AND revisions >= 5 -> GROUP BY wiki -> ) AS this_year -> LEFT JOIN ( -> SELECT -> wiki, -> COUNT(*) as ae -> FROM editor_month -> WHERE month LIKE "2014%" AND (month LIKE "%01" OR month LIKE "%02") -> AND revisions >= 5 -> GROUP BY wiki -> ) AS last_year USING (wiki) -> WHERE last_year.ae >= 10000 -> ORDER BY change_prop DESC -> LIMIT 50; +--------------+--------------+------------+-------------+ | wiki | last_year_ae | change_abs | change_prop | +--------------+--------------+------------+-------------+ | wikidatawiki | 10282 | 2229 | 0.2168 | | commonswiki | 20338 | 1475 | 0.0725 | | enwiki | 81667 | 3178 | 0.0389 | | frwiki | 11720 | -367 | -0.0313 | | dewiki | 16672 | -1139 | -0.0683 | +--------------+--------------+------------+-------------+ 5 rows in set (18.09 sec) |
We see wikidatawiki expressing a high growth rate given it's size at 20 points. Commonswiki and enwiki also had substantial growth, but frwiki and dewiki saw decline.
What about the next step down in wiki-size?
Query middle wikis
|
---|
> SELECT -> wiki, -> last_year.ae AS last_year_ae, -> this_year.ae - last_year.ae AS change_abs, -> (this_year.ae - last_year.ae)/last_year.ae AS change_prop -> FROM ( -> SELECT -> wiki, -> COUNT(*) as ae -> FROM editor_month -> WHERE month LIKE "2015%" AND (month LIKE "%01" OR month LIKE "%02") -> AND revisions >= 5 -> GROUP BY wiki -> ) AS this_year -> LEFT JOIN ( -> SELECT -> wiki, -> COUNT(*) as ae -> FROM editor_month -> WHERE month LIKE "2014%" AND (month LIKE "%01" OR month LIKE "%02") -> AND revisions >= 5 -> GROUP BY wiki -> ) AS last_year USING (wiki) -> WHERE last_year.ae BETWEEN 1000 and 10000 -> ORDER BY change_prop DESC -> LIMIT 50; +----------+--------------+------------+-------------+ | wiki | last_year_ae | change_abs | change_prop | +----------+--------------+------------+-------------+ | trwiki | 1489 | 511 | 0.3432 | | idwiki | 1098 | 351 | 0.3197 | | arwiki | 1992 | 590 | 0.2962 | | ukwiki | 1701 | 358 | 0.2105 | | kowiki | 1850 | 321 | 0.1735 | | cawiki | 1152 | 164 | 0.1424 | | zhwiki | 5130 | 664 | 0.1294 | | eswiki | 9901 | 1205 | 0.1217 | | metawiki | 1725 | 171 | 0.0991 | | ptwiki | 3893 | 307 | 0.0789 | | cswiki | 1585 | 61 | 0.0385 | | jawiki | 8528 | 249 | 0.0292 | | svwiki | 1959 | 2 | 0.0010 | | itwiki | 6636 | -113 | -0.0170 | | hewiki | 1919 | -41 | -0.0214 | | plwiki | 3432 | -137 | -0.0399 | | nlwiki | 3248 | -147 | -0.0453 | | fiwiki | 1342 | -86 | -0.0641 | | ruwiki | 9204 | -627 | -0.0681 | | fawiki | 2255 | -194 | -0.0860 | | huwiki | 1389 | -133 | -0.0958 | | nowiki | 1167 | -196 | -0.1680 | +----------+--------------+------------+-------------+ 22 rows in set (17.14 sec) |
There's a lot more shrinkage happening here, but still the majority of wikis saw growth. Once we go beneath this level we'll have too many wikis to look at individually, so I'd like some summary statistic. I think I'm going to go with the overall absolute change sum for each group.
Query totals
|
---|
> SELECT -> IF(jan_counts.last_year_ae < 1000, "small", -> IF(jan_counts.last_year_ae BETWEEN 1000 AND 10000, "medium", "large")) AS size, -> SUM(jan_counts.last_year_ae) last_january, -> SUM(jan_counts.change_abs) AS january_change, -> SUM(jan_counts.change_abs)/SUM(jan_counts.last_year_ae) AS january_change_prop, -> SUM(feb_counts.last_year_ae) last_february, -> SUM(feb_counts.change_abs) AS february_change, -> SUM(feb_counts.change_abs)/SUM(feb_counts.last_year_ae) AS february_change_prop -> FROM ( -> SELECT -> wiki, -> last_year.ae AS last_year_ae, -> this_year.ae - last_year.ae AS change_abs, -> (this_year.ae - last_year.ae)/last_year.ae AS change_prop -> FROM ( -> SELECT -> wiki, -> COUNT(*) as ae -> FROM editor_month -> WHERE month LIKE "2015%" AND (month LIKE "%01") -> AND revisions >= 5 -> GROUP BY wiki -> ) AS this_year -> LEFT JOIN ( -> SELECT -> wiki, -> COUNT(*) as ae -> FROM editor_month -> WHERE month LIKE "2014%" AND (month LIKE "%01") -> AND revisions >= 5 -> GROUP BY wiki -> ) AS last_year USING (wiki) -> ) AS jan_counts -> LEFT JOIN ( -> SELECT -> wiki, -> last_year.ae AS last_year_ae, -> this_year.ae - last_year.ae AS change_abs, -> (this_year.ae - last_year.ae)/last_year.ae AS change_prop -> FROM ( -> SELECT -> wiki, -> COUNT(*) as ae -> FROM editor_month -> WHERE month LIKE "2015%" AND (month LIKE "%02") -> AND revisions >= 5 -> GROUP BY wiki -> ) AS this_year -> LEFT JOIN ( -> SELECT -> wiki, -> COUNT(*) as ae -> FROM editor_month -> WHERE month LIKE "2014%" AND (month LIKE "%02") -> AND revisions >= 5 -> GROUP BY wiki -> ) AS last_year USING (wiki) -> ) AS feb_counts USING (wiki) -> GROUP BY 1; +--------+--------------+----------------+---------------------+---------------+-----------------+----------------------+ | size | last_january | january_change | january_change_prop | last_february | february_change | february_change_prop | +--------+--------------+----------------+---------------------+---------------+-----------------+----------------------+ | large | 52253 | 2761 | 0.0528 | 49752 | 1892 | 0.0380 | | medium | 50333 | 480 | 0.0095 | 46438 | 2001 | 0.0431 | | small | 22602 | -248 | -0.0110 | 19827 | 1603 | 0.0808 | +--------+--------------+----------------+---------------------+---------------+-----------------+----------------------+ 3 rows in set (33.47 sec) |
size | January | February | ||||
---|---|---|---|---|---|---|
total ae | yoy change | yoy prop change | total ae | yoy change | yoy prop change | |
large | 52253 | 2761 | 0.0528 | 49752 | 1892 | 0.0380 |
medium | 50333 | 480 | 0.0095 | 46438 | 2001 | 0.0431 |
small | 22602 | -248 | -0.0110 | 19827 | 1603 | 0.0808 |
OK. So it looks like the trend is shared across big and small wikis when examined as groups. Next, I'll be working on figuring out which sub-group is most affected by this trend. --Halfak (WMF) (talk) 17:15, 22 March 2015 (UTC)