Research talk:Revision scoring as a service/Work log/2016-01-31


Sunday, January 31, 2016

Got some of the edits labeled for vandalism (488536, so about 50%) and got impatient, so I had an early look. Since I randomized the order in which the edits are processed, this partial set should be roughly representative.

> select anon_user, trusted_user, trusted_edits, client_edit, merge_edit, COUNT(*) AS edits, SUM(reverted) AS reverted, SUM(reverted)/COUNT(*) AS prop FROM wikidata_nonbot_reverted_sample GROUP BY 1,2,3,4,5 ORDER BY edits;
+-----------+--------------+---------------+-------------+------------+--------+----------+--------+
| anon_user | trusted_user | trusted_edits | client_edit | merge_edit | edits  | reverted | prop   |
+-----------+--------------+---------------+-------------+------------+--------+----------+--------+
|         0 |            1 |             0 |           0 |          1 |      4 |        0 | 0.0000 |
|         0 |            1 |             0 |           1 |          0 |      9 |        0 | 0.0000 |
|         1 |            0 |             0 |           0 |          1 |     22 |        0 | 0.0000 |
|         0 |            1 |             0 |           0 |          0 |     34 |        0 | 0.0000 |
|         0 |            1 |             1 |           1 |          0 |    414 |        1 | 0.0024 |
|         0 |            0 |             0 |           0 |          1 |    866 |        8 | 0.0092 |
|         0 |            1 |             1 |           0 |          1 |   3355 |       10 | 0.0030 |
|         0 |            0 |             1 |           0 |          1 |   3994 |       20 | 0.0050 |
|         0 |            0 |             0 |           1 |          0 |   4012 |       64 | 0.0160 |
|         0 |            0 |             1 |           1 |          0 |   5664 |       44 | 0.0078 |
|         1 |            0 |             0 |           0 |          0 |   6914 |      499 | 0.0722 |
|         0 |            0 |             0 |           0 |          0 |  15546 |      123 | 0.0079 |
|         0 |            1 |             1 |           0 |          0 | 195032 |      222 | 0.0011 |
|         0 |            0 |             1 |           0 |          0 | 252670 |      891 | 0.0035 |
+-----------+--------------+---------------+-------------+------------+--------+----------+--------+
14 rows in set (0.42 sec)

Anonymous edits clearly have the highest revert rate (7.2%). Client edits also show up here, but I can't see how they would be vandalism on Wikidata or would even need to be reviewed here; if they are vandalism, they are vandalism on the originating wiki. We'll want to think about those, but I think we can just exclude them by default. Merges are also an interesting case, but it's hard to see what's going on with all of these dimensions broken out, so let's look at them by themselves.
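
Before excluding client edits by default, a quick breakdown of them alone might be worth running (a sketch against the same table, in the style of the merge-edit query below; I haven't run this one):

> select client_edit, COUNT(*) AS edits, SUM(reverted) AS reverted, SUM(reverted)/COUNT(*) AS prop FROM wikidata_nonbot_reverted_sample GROUP BY client_edit;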

Merge edits

> select merge_edit, COUNT(*) AS edits, SUM(reverted) AS reverted, SUM(reverted)/COUNT(*) AS prop FROM wikidata_nonbot_reverted_sample WHERE NOT client_edit GROUP BY merge_edit;
+------------+--------+----------+--------+
| merge_edit | edits  | reverted | prop   |
+------------+--------+----------+--------+
|          0 | 470196 |     1735 | 0.0037 |
|          1 |   8241 |       38 | 0.0046 |
+------------+--------+----------+--------+
2 rows in set (0.30 sec)

It looks like merge edits are "reverted" at about the same rate as regular edits. Luckily, only 38 of them were reverted in the entire set (!!!), so we can review those manually.

OK! No intentional damage, and a lot of the "good faith reverteds" were actually kind of debatable. I had to learn a bit about Wikidata merge policy in order to figure out what was right/wrong here. --EpochFail (talk) 22:04, 31 January 2016 (UTC)

Trusted users

OK. Now to look into edits by "trusted" users. These are users who have saved a lot of edits or have attained a high-level user right on Wikidata.

> select trusted_edits OR trusted_user AS trusted, COUNT(*) AS edits, SUM(reverted) AS reverted, SUM(reverted)/COUNT(*) AS prop FROM wikidata_nonbot_reverted_sample WHERE NOT client_edit AND NOT merge_edit GROUP BY trusted;
+---------+--------+----------+--------+
| trusted | edits  | reverted | prop   |
+---------+--------+----------+--------+
|       0 |  22460 |      622 | 0.0277 |
|       1 | 447736 |     1113 | 0.0025 |
+---------+--------+----------+--------+
2 rows in set (0.33 sec)

So, trusted users get reverted about an order of magnitude less often than non-trusted users, and they save the vast majority of the "human" edits in Wikidata. That's great, provided we can treat their reverted edits as not really being reverted for "damage". So I think it is time to do some spot checking. We should probably aim for ~100 edits to get a sense of how often (if at all) such reverted edits are actually damaging and/or vandalism. --EpochFail (talk) 22:55, 31 January 2016 (UTC)
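
To draw that spot-check sample, something like the following should work (an untested sketch; it assumes the sample table carries a rev_id column for looking the edits up):

> select rev_id FROM wikidata_nonbot_reverted_sample WHERE (trusted_user OR trusted_edits) AND NOT client_edit AND NOT merge_edit AND reverted ORDER BY RAND() LIMIT 100;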

I'm going to do this one in etherpad: https://etherpad.wikimedia.org/p/wikidata_reverted_trusted_user_edits --EpochFail (talk) 23:01, 31 January 2016 (UTC)

I have evaluated 70 of the hundred and run out of steam. Gonna call it a night and pick this back up tomorrow. --EpochFail (talk) 00:37, 1 February 2016 (UTC)

Here we go!

--EpochFail (talk) 16:22, 1 February 2016 (UTC)

I'm just realizing that there are a few "good faith mistakes" that are really "good edits that could be better". E.g., adding a country to a person is good, but it would be better if it were added as their nationality. Gonna fix that now. --EpochFail (talk) 16:25, 1 February 2016 (UTC)

So, only 7 out of 98 reverted main-namespace edits were even "damaging", and those were all clearly good-faith mistakes. I think that we can declare this set not worth review. --EpochFail (talk) 16:29, 1 February 2016 (UTC)

Summary

OK. Time to summarize what I think we have learned. It looks like we can exclude merges and trusted-user edits from review if we are looking for intentional vandalism, but we should probably include them if we are also looking for "good-faith mistakes". It also seems like there are a few common patterns that we can probably pick up on, so I added some feature requests for them.

One thing that I also realized is that a lot of reverts are really content moves. E.g., if one adds "country": "USA" to a person, the change that moves that value to "nationality" appears as two edits: one that removes the whole claim and another that adds a new "nationality": "USA" claim. This looks like a revert, but it is really just an improvement to a good, but not perfect, edit.

So, how many edits does this remove from our set?

> select NOT (trusted_edits OR trusted_user OR client_edit OR merge_edit) AS needs_review, COUNT(*) AS edits, SUM(reverted) AS reverted, SUM(reverted)/COUNT(*) AS prop FROM wikidata_nonbot_reverted_sample GROUP BY needs_review;
+--------------+--------+----------+--------+
| needs_review | edits  | reverted | prop   |
+--------------+--------+----------+--------+
|            0 | 466076 |     1260 | 0.0027 |
|            1 |  22460 |      622 | 0.0277 |
+--------------+--------+----------+--------+
2 rows in set (0.25 sec)

It looks like only about 4.6% of edits (22460 of 488536) might actually need review. That's great! --EpochFail (talk) 18:06, 1 February 2016 (UTC)
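
For the record, that proportion can also be computed in one query rather than by hand (an untested sketch against the same table; MySQL sums booleans as 0/1):

> select SUM(NOT (trusted_edits OR trusted_user OR client_edit OR merge_edit))/COUNT(*) AS prop_needs_review FROM wikidata_nonbot_reverted_sample;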
