Research talk:Are the bots really fighting/Work log/2017-03-15
Latest comment: 7 years ago by EpochFail in topic Wednesday, March 15, 2017
Wednesday, March 15, 2017
editI'm doing some data cleanup today. I have two problems.
- I was only including bot reverts that included bots that currently are flagged.
- I was not grabbing the last reverted edit before the reverting edit but rather the most distant reverted edit from the reverting edit.
So let's fix both!
First, I grabbed all of the usernames from en:Wikipedia:List_of_Wikipedians_by_number_of_edits/Unflagged_bots and combined that with a list that Staeiou curated, the list of bots flagged in enwiki.user_groups and the bots that used to be flagged from enwiki.user_former_groups.
dbstore staging -e "SELECT user_id, user.user_name FROM enwiki.user INNER JOIN tmp_geiger_bot_usernames_enwiki_20170315 g ON user.user_name = g.user_name UNION SELECT user_id, user.user_name FROM enwiki.user INNER JOIN tmp_unflagged_bot_usernames_enwiki_20170315 u ON user.user_name = u.user_name UNION SELECT user_id, user.user_name FROM enwiki.user INNER JOIN enwiki.user_groups ON ug_user = user_id WHERE ug_group = 'bot' UNION SELECT user_id, user.user_name FROM enwiki.user INNER JOIN enwiki.user_former_groups ON ufg_user = user_id WHERE ufg_group = 'bot';"
This gave me a dataset of 1417 bots (compared to the 314 I had before).
I also cleaned up the bot2bot query.
SELECT reverted.*, page.page_namespace AS page_namespace
FROM staging.enwiki_reverted_20140820 AS reverted
INNER JOIN enwiki.page ON rev_page = page_id
INNER JOIN staging.enwiki_unified_bot_20170315 AS reverted_bot ON
rev_user = reverted_bot.user_id
INNER JOIN staging.enwiki_unified_bot_20170315 AS reverting_bot ON
reverting_user = reverting_bot.user_id
WHERE
rev_user != reverting_user AND
rev_revert_offset = revisions_reverted
UNION ALL
SELECT reverted.*, archive.ar_namespace AS page_namespace
FROM staging.enwiki_reverted_20140820 AS reverted
INNER JOIN enwiki.archive ON rev_id = ar_rev_id
INNER JOIN staging.enwiki_unified_bot_20170315 AS reverted_bot ON
rev_user = reverted_bot.user_id
INNER JOIN staging.enwiki_unified_bot_20170315 AS reverting_bot ON
reverting_user = reverting_bot.user_id
WHERE
rev_user != reverting_user AND
rev_revert_offset = revisions_reverted;
The trick is in the lines that say rev_revert_offset = revisions_reverted. This makes sure that the reverted edit we select is the *last* edit in the string of reverted edits. The query is running now. I should have data soon. --EpochFail (talk) 22:30, 15 March 2017 (UTC)