Research talk:Understanding Wikidata's Value/Work log/2017-05-18

Thursday, May 18, 2017

Yesterday, I refactored the scripts that extract and aggregate Wikibase usage from wikis. These utilities should be (nearly) done.

Today, I'm starting on a script to download Wikipedia page view data for a specific time range. The user of this utility should be able to specify a start time and an end time, each down to the hour, and download all page view logs for that period. The implementation will parse the HTML of the dumps.wikimedia.org pages that list the page view logs and select only the logs that fall between the start and end times.
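The filtering step could look roughly like the sketch below. It is not the finished utility, only an illustration under a few assumptions: that hourly logs are linked from the listing page with names of the form pageviews-YYYYMMDD-HH0000.gz, that the end time is inclusive, and that a simple regular expression is enough to pull the links out of the HTML. The function name select_logs and the sample listing are made up for this example, and no network access is involved.

```python
import re
from datetime import datetime

# Assumed file-name pattern for hourly logs, e.g. "pageviews-20170518-130000.gz".
LOG_LINK = re.compile(r'href="(pageviews-(\d{8})-(\d{2})0000\.gz)"')


def select_logs(listing_html, start, end):
    """Return names of hourly logs whose timestamp lies in [start, end].

    Hypothetical helper: the end time is treated as inclusive here.
    """
    selected = []
    for name, day, hour in LOG_LINK.findall(listing_html):
        stamp = datetime.strptime(day + hour, "%Y%m%d%H")
        if start <= stamp <= end:
            selected.append(name)
    return sorted(selected)


# Made-up excerpt standing in for a downloaded listing page.
sample_listing = """
<a href="pageviews-20170518-110000.gz">pageviews-20170518-110000.gz</a>
<a href="pageviews-20170518-120000.gz">pageviews-20170518-120000.gz</a>
<a href="pageviews-20170518-130000.gz">pageviews-20170518-130000.gz</a>
<a href="projectviews-20170518-120000">projectviews-20170518-120000</a>
"""

picked = select_logs(
    sample_listing,
    datetime(2017, 5, 18, 12),
    datetime(2017, 5, 18, 13),
)
print(picked)
```

In the real script, the listing HTML would come from fetching each relevant monthly listing page, and the selected names would then be joined to that page's URL for download.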

For our own purposes, I've been using a shell script to download the year's worth of data needed for our study. Downloading that much data with the script takes about 3 days. We've been aggregating each month's page views using mwviews aggregate, which takes about a day per month of views.

Today, I'll also finish my report on the work presented at CHI 2017 (which I attended last week).
