WSoR datasets/reverted
The reverted dataset contains information about reverted revisions, who they were reverted by, which revision was reverted back to and whether the revert was for vandalism.
Location
editdb42:halfak.reverted_20110115
Fields
edithalfak@internproxy:~/Sandbox/wsor$ mysql -h db42 -e "EXPLAIN reverted_20110115;SELECT * FROM reverted_20110115 LIMIT 3" halfak +----------------+----------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +----------------+----------------+------+-----+---------+-------+ | rev_id | int(11) | YES | MUL | NULL | | | username | varbinary(255) | YES | | NULL | | | user_id | int(11) | YES | | NULL | | | comment | varbinary(255) | YES | | NULL | | | rvtg_id | int(11) | YES | MUL | NULL | | | rvtg_username | varbinary(255) | YES | | NULL | | | rvtg_user_id | int(11) | YES | | NULL | | | rvtg_comment | varbinary(255) | YES | | NULL | | | rvtto_id | int(11) | YES | | NULL | | | rvtto_username | varbinary(255) | YES | | NULL | | | rvtto_user_id | int(11) | YES | | NULL | | | rvtto_comment | varbinary(255) | YES | | NULL | | | is_vandalism | tinyint(1) | YES | MUL | NULL | | | revs_reverted | int(11) | YES | | NULL | | +----------------+----------------+------+-----+---------+-------+ +-------------+----------------+---------+-----------------------------------------------------+-----------+---------------+--------------+--------------------------------------------------------------------------------------------------------------------------------------------------+-----------+----------------+---------------+-----------------------------------------------------+--------------+---------------+ | rev_id | username | user_id | comment | rvtg_id | rvtg_username | rvtg_user_id | rvtg_comment | rvtto_id | rvtto_username | rvtto_user_id | rvtto_comment | is_vandalism | revs_reverted | +-------------+----------------+---------+-----------------------------------------------------+-----------+---------------+--------------+--------------------------------------------------------------------------------------------------------------------------------------------------+-----------+----------------+---------------+-----------------------------------------------------+--------------+---------------+ | 261295963 | 88.109.188.145 | 0 | | 261309052 | Numyht | 4660442 | [[WP:UNDO|Undid]] revision 261295963 by [[Special:Contributions/88.109.188.145|88.109.188.145]] ([[User talk:88.109.188.145|talk]]) rv unsourced | 260061081 | Red Sismey | 1991048 | /* Acting */ | 0 | 1 | | 269434849 | 75.53.209.45 | 0 | /* Total Nonstop Action Wrestling (2008-present) */ | 269442144 | JakeDHS07 | 743183 | /* Total Nonstop Action Wrestling (2008-present) */ week by week as well as incorrect | 269376850 | PCE | 2202642 | /* Total Nonstop Action Wrestling (2008-present) */ | 0 | 1 | | 280667690 | 86.44.89.61 | 0 | | 280684923 | TheFBH | 5122755 | [[WP:UNDO|Undid]] revision 280667690 by [[Special:Contributions/86.44.89.61|86.44.89.61]] ([[User talk:86.44.89.61|talk]]) | 279423089 | TheFBH | 5122755 | /* In wrestling */ | 0 | 1 | +-------------+----------------+---------+-----------------------------------------------------+-----------+---------------+--------------+--------------------------------------------------------------------------------------------------------------------------------------------------+-----------+----------------+---------------+-----------------------------------------------------+--------------+---------------+
Each row represents a reverted revision.
- Reverted Revision - One of the reverted revisions
rev_id
: The identitifier of the reverted revision PRIMARY KEYusername
: The username of the editor who created the reverted revision (same as revision.rev_user_text)user_id
: The identifier of the editor who created the reverted revision (same as revision.rev_user)comment
: The comment left by the reverted editor (same as revision.rev_comment)
- Reverting Revision - The revision that performed the revert.
rvtg_id
: The identitifier of the reverting revisionrvtg_username
: The username of the editor who performed the revert (same as revision.rev_user_text)rvtg_user_id
: The identifier of the editor who performed the revert (same as revision.rev_user_text)rvtg_comment
: The comment left by the reverting editor
- Reverted to Revision - The revision that was re-instated by the reverting edit.
rvtto_id
: The identitifier of the revision that was reverted back torvtto_username
: The username of the editor who created the reverted to revisionrvtto_user_id
: The identifier of the editor who created the reverted to revisionrvtto_comment
: The comment left by the editor who created the reverted to revision
- Meta
is_vandalism
: True whenrvtg_comment
matched D_LOOSE/D_STRICT regexprevs_reverted
: The total number of revisions that were reverted in this action.
Reproduction
editTo reproduce this dataset, install Wikimedia Utilities and run this script.
For example:
$ python reverts.py --output_prefix=test1/ enwiki.*.7z Aug-29 16:51:36 INFO Starting test1/ run... Aug-29 16:51:36 INFO Creating output file: test1/revert.tsv Aug-29 16:51:36 INFO Creating output file: test1/reverted.tsv Aug-29 16:51:36 INFO Prcoessing... Aug-29 16:51:36 INFO Processing dump file enwiki-20110829-pages-meta-history.sample.xml. |.||||..|.|....|......|........|..........|...........|.|.|.|.|...|.|||...|.|.|
This will produce two output files that can be loaded into the database test1/revert.tsv
and test1/reverted.tsv
. Create the table to load into:
CREATE TABLE reverted (
rev_id INT UNSIGNED,
rvtg_id INT UNSIGNED,
rvtto_id INT UNSIGNED,
is_vandalism BOOL,
revs_reverted INT(2)
);
This gets the base dataset into the MySQL so it can be denormalized. To denormalize, run the following command:
CREATE TABLE reverted_denorm
SELECT
rvtd.rev_id,
rvtd.rev_user_text as username,
rvtd.rev_user as user_id,
rvtd.rev_comment as comment,
rvtg.rev_id as rvtg_id,
rvtg.rev_user_text as rvtg_username,
rvtg.rev_user as rvtg_user_id,
rvtg.rev_comment as rvtg_comment,
rvtto.rev_id as rvtto_id,
rvtto.rev_user_text as rvtto_username,
rvtto.rev_user as rvtto_user_id,
rvtto.rev_comment as rvtto_comment,
reverted.is_vandalism,
reverted.revs_reverted
FROM reverted
LEFT JOIN revision rvtd
ON reverted.rev_id = rvtd.rev_id
LEFT JOIN revision rvtg
ON reverted.rvtg_id = rvtg.rev_id
LEFT JOIN revision rvtto
ON reverted.rvtto_id = rvtto.rev_id;
Finally, move the denormalized table over the original.
DROP TABLE reverted;
ALTER TABLE reverted_denorm RENAME TO reverted;
Notes
editThe current dataset is complete up to the January 2011 dump.