Research talk:Measuring article importance/Work log/2014-10-23
Thursday, October 23, 2014
I want to gather a set of datasets:
- WikiProject importance classifications
- Count of inlinks per page
- Page view rate per page
WikiProject importance classifications
See http://quarry.wmflabs.org/query/794 (Article importance classes for English Wikipedia).
> select importance, COUNT(DISTINCT page_id) from importance_classification group by importance;
+------------+-------------------------+
| importance | COUNT(DISTINCT page_id) |
+------------+-------------------------+
| Top        |                   35639 |
| High       |                  144344 |
| Mid        |                  584461 |
| Low        |                 2359667 |
| Unknown    |                 1794587 |
+------------+-------------------------+
5 rows in set (5.79 sec)
> select LEFT(last_update, 4) AS year, COUNT(DISTINCT page_id) from importance_classification group by year;
+------+-------------------------+
| year | COUNT(DISTINCT page_id) |
+------+-------------------------+
| 2006 |                   22813 |
| 2007 |                  166217 |
| 2008 |                  536320 |
| 2009 |                  786836 |
| 2010 |                  681561 |
| 2011 |                 1070609 |
| 2012 |                  802999 |
| 2013 |                  590660 |
| 2014 |                  587419 |
+------+-------------------------+
9 rows in set (6.80 sec)
Interestingly, many importance ratings are quite old: more were last updated in 2011 than in any other year. It's surprising that, for the most part, importance ratings are updated at a fairly consistent rate from year to year. I suspect that most importance ratings don't change. --Halfak (WMF) (talk) 14:58, 23 October 2014 (UTC)
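To put rough numbers on the observations above, here is a minimal Python sketch that turns the two result sets into shares. The counts are copied from the query output; the helper names (`shares`, `share_before`) and the 2012 cutoff are illustrative assumptions, not part of the original analysis. One caveat: each query counts distinct pages per group, so a page rated by several WikiProjects can appear in more than one row, and the denominators here are sums of row counts rather than counts of distinct pages.

```python
# Counts copied from the two query results in this log entry.
by_importance = {
    "Top": 35639,
    "High": 144344,
    "Mid": 584461,
    "Low": 2359667,
    "Unknown": 1794587,
}
by_year = {
    2006: 22813,
    2007: 166217,
    2008: 536320,
    2009: 786836,
    2010: 681561,
    2011: 1070609,
    2012: 802999,
    2013: 590660,
    2014: 587419,
}


def shares(counts):
    """Return each group's fraction of the summed row counts."""
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()}


def share_before(counts_by_year, cutoff_year):
    """Fraction of the summed row counts with a year earlier than cutoff_year."""
    total = sum(counts_by_year.values())
    older = sum(v for year, v in counts_by_year.items() if year < cutoff_year)
    return older / total


importance_share = shares(by_importance)
old_share = share_before(by_year, 2012)  # hypothetical cutoff for "quite old"

for label, fraction in importance_share.items():
    print(f"{label:8s} {fraction:6.1%}")
print(f"last updated before 2012: {old_share:.1%}")
```

Under that caveat, a bit under half of the per-class rows are "Low", and roughly 62% of the per-year rows were last updated before 2012.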
Count of inlinks per article
See http://quarry.wmflabs.org/query/806 (Inlink counts for all English Wikipedia articles).
Query is still running. I'll come back later to post an update. --Halfak (WMF) (talk) 14:58, 23 October 2014 (UTC)
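The text of query 806 is not reproduced in this log, so the following is only a sketch of how an inlink count could be computed. It assumes tables shaped like MediaWiki's `page` and `pagelinks` tables (columns `page_id`, `page_namespace`, `page_title`, `pl_from`, `pl_namespace`, `pl_title`) and runs against a toy in-memory SQLite database with made-up rows, not against the real replicas. It does not resolve redirects or filter by the namespace of the linking page.

```python
import sqlite3


def inlink_counts(conn):
    """Count incoming links for every main-namespace page, including zero."""
    query = """
        SELECT p.page_id, p.page_title, COUNT(pl.pl_from) AS inlinks
        FROM page AS p
        LEFT JOIN pagelinks AS pl
          ON pl.pl_namespace = p.page_namespace
         AND pl.pl_title = p.page_title
        WHERE p.page_namespace = 0
        GROUP BY p.page_id, p.page_title
        ORDER BY inlinks DESC, p.page_id
    """
    return conn.execute(query).fetchall()


# Toy data (hypothetical pages and links) to exercise the query.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE page (
        page_id INTEGER PRIMARY KEY,
        page_namespace INTEGER,
        page_title TEXT
    );
    CREATE TABLE pagelinks (
        pl_from INTEGER,
        pl_namespace INTEGER,
        pl_title TEXT
    );
    INSERT INTO page VALUES (1, 0, 'Alpha'), (2, 0, 'Beta'), (3, 0, 'Gamma');
    INSERT INTO pagelinks VALUES (2, 0, 'Alpha'), (3, 0, 'Alpha'), (1, 0, 'Beta');
    """
)

counts = inlink_counts(conn)
for page_id, title, inlinks in counts:
    print(page_id, title, inlinks)
```

The `LEFT JOIN` keeps pages with no incoming links in the result with a count of zero, which matters if the inlink counts are later joined against the importance classifications.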
Page view rate
I need to think carefully about this one. I'll ping a few people to see if they have worked through the implications yet. --Halfak (WMF) (talk) 14:58, 23 October 2014 (UTC)