Research talk:Wikipedia article creation/Work log/Friday, November 15th
Return to "Wikipedia article creation/Work log/Friday, November 15th" page.
Friday, November 15thEdit
I don't have much time to hack on this today. I've been waiting for one of my queries to finish. This is the one that gathers all of the pages that *have not* been deleted along with the rev_id of their first edit. Instead of waiting on that to finish, I decided to split up the work.
First, I'll get all of the page stats from the revision table:
CREATE TABLE halfak.nov13_page_revision_stats SELECT rev_page AS page_id, MIN(rev_id) AS first_rev_id, COUNT(*) AS revisions, MIN(rev_timestamp) AS first_edit, MAX(rev_timestamp) AS last_edit FROM revision WHERE rev_timestamp < "20131105000000" GROUP BY 1
Then I'll join it with page to get non-archived pages with stats:
SELECT page_id, page_namespace, page_title, revisions, first_edit, last_edit, False AS archived, first_rev_id FROM halfak.page_revision_stats INNER JOIN page USING (page_id) WHERE rev_timestamp < "20131105000000" GROUP BY 1,2,3;
I've kicked the query off so, hopefully, it will be done by the end of the day.