User:EpochFail/Journal/2011-10-11
Tuesday, Oct 11th
editI'm gathering huggle data for the last experiment. First up is matching the huggle message events with message reading events.
CREATE TABLE halfak.huggle_posting_mk2
SELECT
posted.user_id AS user_id,
posted.user_text AS user_text,
posted.TIMESTAMP AS time_posted,
MIN(consumed.TIMESTAMP) AS time_consumed
FROM halfak.huggle_posting_action_mk2 posted
LEFT JOIN halfak.huggle_posting_action_mk2 consumed
ON posted.user_text = consumed.user_text
AND posted.TIMESTAMP <= consumed.TIMESTAMP
AND consumed.action = "read"
WHERE posted.action = "received"
GROUP BY posted.user_id, posted.user_text, posted.TIMESTAMP;
--mysql> select time_consumed IS NULL, count(*) from halfak.huggle_posting_mk2 group by 1;
--+-----------------------+----------+
--| time_consumed IS NULL | count(*) |
--+-----------------------+----------+
--| 0 | 7842 |
--| 1 | 6206 |
--+-----------------------+----------+
--2 rows in set (0.03 sec)
CREATE INDEX user_text_timestamp_idx ON halfak.huggle_posting_mk2 (user_text, time_posted);
DROP TABLE halfak.huggling_agg_mk2;
CREATE TABLE halfak.huggling_agg_mk2
SELECT
ha2.rev_id,
def,
personal,
nodirectives,
exp_case != 'none' as experimental,
reverted_id,
ha2.rev_page,
ha2.rev_user,
ha2.rev_user_text,
ha2.rev_timestamp,
page_id,
warned_user,
IF(hp2.time_posted IS NULL, ha2.rev_timestamp, MIN(hp2.time_consumed)) AS message_consumed,
COUNT(a.ar_rev_id) > 0 AS deleted_revs
FROM staeiou.huggling_agg_mk2 ha2
LEFT JOIN halfak.huggle_posting_mk2 hp2
ON ha2.warned_user = hp2.user_text
AND hp2.time_posted >= ha2.rev_timestamp
LEFT JOIN enwiki.archive a
ON ha2.warned_user = a.ar_user_text
GROUP BY ha2.rev_id;
SELECT
experimental,
deleted_revs,
message_consumed IS NOT NULL as message_consumed,
warned_user NOT RLIKE "[0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}" as registered,
count(*)
FROM halfak.huggling_agg_mk2
GROUP BY 1, 2, 3
WITH ROLLUP;
--+--------------+--------------+------------------+------------+----------+
--| experimental | deleted_revs | message_consumed | registered | count(*) |
--+--------------+--------------+------------------+------------+----------+
--| 0 | 0 | 0 | 0 | 1488 |
--| 0 | 0 | 1 | 0 | 2060 |
--| 0 | 0 | NULL | 0 | 3548 |
--| 0 | 1 | 0 | 0 | 60 |
--| 0 | 1 | 1 | 1 | 366 |
--| 0 | 1 | NULL | 1 | 426 |
--| 0 | NULL | NULL | 1 | 3974 |
--| 1 | 0 | 0 | 0 | 1917 |
--| 1 | 0 | 1 | 1 | 1984 |
--| 1 | 0 | NULL | 1 | 3901 |
--| 1 | 1 | 0 | 1 | 107 |
--| 1 | 1 | 1 | 0 | 504 |
--| 1 | 1 | NULL | 0 | 611 |
--| 1 | NULL | NULL | 0 | 4512 |
--| NULL | NULL | NULL | 0 | 8486 |
--+--------------+--------------+------------------+------------+----------+
--15 rows in set (0.05 sec)
Everything looks good. I spot checked a couple and they look to be correct. It also appears that registered users are more likely to read their messages which makes a lot of sense.
Now it is time to generate the HTML files for Jonathan.
20:28, 11 October 2011 (UTC)
I actually just re-worked the above to include a field for editors with deleted revisions. It looks like we have 504 instances of an editors who read their message, but have deleted revisions. That's a lot more than I thought, but I'm trudging forward. We'll deal with them later. :\
20:56, 11 October 2011 (UTC)
SELECT
warned_user NOT RLIKE "[0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}" as registered,
count(*)
FROM halfak.huggling_agg_mk2
WHERE exp_case != "none"
AND NOT deleted_revs
AND message_consumed IS NOT NULL
GROUP BY 1
WITH ROLLUP;
--+------------+----------+
--| registered | count(*) |
--+------------+----------+
--| 0 | 1793 |
--| 1 | 191 |
--| NULL | 1984 |
--+------------+----------+
--3 rows in set (0.06 sec)
It looks like we are going to have 1984 users to code with our volunteers (along with the 504 noted above).
21:02, 11 October 2011 (UTC)
halfak@internproxy:~/data/newbie_warnings/mk2$ mysql -hdb1047 -e "SELECT warned_user, message_consumed FROM huggling_agg_mk2 WHERE NOT deleted_revs AND experimental AND message_consumed IS NOT NULL" -N halfak > hugglings.message_consumed.no_deleted.tsv
halfak@internproxy:~/data/newbie_warnings/mk2$ wc hugglings.message_consumed.no_deleted.tsv 1984 3992 57118 hugglings.message_consumed.no_deleted.tsv
I better limit my after-coding to only those editors who performed work afterward.
DROP TABLE halfak.huggling_agg_revs_after;
CREATE TABLE halfak.huggling_agg_revs_after
SELECT
ha2.rev_id,
ha2.warned_user,
ha2.message_consumed,
COUNT(DISTINCT r.rev_id)+COUNT(DISTINCT a.ar_rev_id) AS revs_after
FROM halfak.huggling_agg_mk2 ha2
LEFT JOIN enwiki.revision r
ON r.rev_user_text = warned_user
AND r.rev_timestamp > ha2.message_consumed
LEFT JOIN enwiki.archive a
ON a.ar_user_text = warned_user
AND a.ar_timestamp > ha2.message_consumed
GROUP BY ha2.rev_id;
CREATE INDEX rev_idx ON halfak.huggling_agg_revs_after (rev_id);
ALTER TABLE halfak.huggling_agg_mk2 ADD COLUMN revs_after INT;
UPDATE halfak.huggling_agg_mk2 ha2, halfak.huggling_agg_revs_after hara
SET ha2.revs_after = hara.revs_after
WHERE ha2.rev_id = hara.rev_id;
halfak@internproxy:~/data/newbie_warnings/mk2$ mysql -hdb1047 -e "SELECT warned_user, message_consumed FROM huggling_agg_mk2 WHERE NOT deleted_revs AND experimental AND message_consumed IS NOT NULL AND revs_after > 0" -N halfak > hugglings.message_consumed.no_deleted.revs_after.tsv
OK... Well this is weird, but I can't seem to write a file to /a on internproxy. I'll move on to script writing in the meantime.
00:06, 12 October 2011 (UTC)
I wrote a script to take advantage of Stu's script for gathering contrib histories for users in HTML files. It is working and I've done a couple of test runs, but now I'm stuck on internproxy again. I have Asher working on the problem and he said it doesn't look good. Well... we've got our datasets and I have my code committed so hopefully we can catch up in the morning.
Singing off. --EpochFail 00:06, 12 October 2011 (UTC)