Research talk:Understanding Wikidata's Value/Work log/2017-06-05

Monday, June 5, 2017

Today, I'll be working on finishing up a script that processes the "stub-meta-current" dumps to produce page ids for all page titles across wikis. This script will handle redirects too, and it will be parallelized. Given that some of the stub-meta-current files are huge (e.g. the enwiki one has data for 40+ million pages/redirects), we'll process the 10 largest wikis separately to minimize memory usage.
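For reference, here is a minimal sketch of what the core of such a script might look like. It is not the actual script. It assumes the dump uses the usual MediaWiki export layout (a `<page>` element with `<title>`, `<id>` and an optional `<redirect title="..."/>` child), and it assumes that "handling redirects" means giving a redirect title the page id of its final target. The function names and the sample page ids are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Tiny invented sample in the export layout; the ids are NOT real page ids.
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page><title>Douglas Adams</title><ns>0</ns><id>8091</id>
    <revision><id>1</id></revision></page>
  <page><title>Douglas adams</title><ns>0</ns><id>9000</id>
    <redirect title="Douglas Adams" />
    <revision><id>2</id></revision></page>
</mediawiki>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def read_pages(stream):
    """Yield (title, page_id, redirect_target_or_None) for each <page>."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title = page_id = target = None
        # Only direct children are inspected, so <revision><id> is ignored.
        for child in elem:
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "id":
                page_id = int(child.text)
            elif name == "redirect":
                target = child.get("title")
        yield title, page_id, target
        elem.clear()  # drop the page's children to keep memory use down


def build_title_to_id(pages):
    """Map every title to a page id (assumption: redirects get the target's id)."""
    ids, redirects = {}, {}
    for title, page_id, target in pages:
        if target is None:
            ids[title] = page_id
        else:
            redirects[title] = target
    resolved = dict(ids)
    for title, target in redirects.items():
        seen = {title}
        # Follow redirect chains, stopping if a loop is detected.
        while target in redirects and target not in seen:
            seen.add(target)
            target = redirects[target]
        if target in ids:
            resolved[title] = ids[target]
    return resolved


if __name__ == "__main__":
    pages = read_pages(io.BytesIO(SAMPLE.encode("utf-8")))
    print(build_title_to_id(pages))
```

Since each wiki's dump is independent, one file per worker process (e.g. via `multiprocessing.Pool`) is one plausible way to parallelize this, with the largest wikis run on their own so that only one big title map is in memory at a time.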

I'll also finish/send out a proposal for funding for July-August/September.

I'll work on a utility that will parse page view data into namespace-titles. Time permitting, I'll then work on a script that will take namespace-title page view data and merge it with page ids so that we have page view data keyed by page id.
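A rough sketch of both steps, under several assumptions: each page view line is space-separated as `project title views bytes`, titles use underscores and carry the namespace as a `Prefix:`, and page ids are available as a dict keyed by `(namespace number, title)`. The namespace map below is a hypothetical, partial one for a single wiki (a real one would come from that wiki's site info), and all function names are illustrative.

```python
from collections import Counter

# Hypothetical, partial namespace map for one wiki (prefix -> namespace number).
NAMESPACES = {"Talk": 1, "User": 2, "Category": 14}


def split_namespace(raw_title, namespaces=NAMESPACES):
    """Turn 'Talk:Some_page' into (1, 'Some page'); default to namespace 0."""
    title = raw_title.replace("_", " ")
    prefix, sep, rest = title.partition(":")
    if sep and rest and prefix in namespaces:
        return namespaces[prefix], rest
    # No known prefix: treat the whole string as a main-namespace title.
    return 0, title


def parse_line(line):
    """Parse one assumed 'project title views bytes' line."""
    project, raw_title, views = line.split(" ")[:3]
    ns, title = split_namespace(raw_title)
    return project, ns, title, int(views)


def views_by_page_id(lines, page_ids):
    """Sum views per page id; lines whose title has no known id are skipped."""
    totals = Counter()
    for line in lines:
        _project, ns, title, views = parse_line(line)
        page_id = page_ids.get((ns, title))
        if page_id is not None:
            totals[page_id] += views
    return totals
```

This ignores details such as percent-encoded titles and case normalization of the first letter, which a real utility would likely need to handle before the lookup.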
