Research:STiki 1 million reverts review

Created: 14:00, 13 October 2016 (UTC)
Contact: no affiliation

This page is an incomplete draft of a research project.
Information is incomplete and is likely to change substantially before the project starts.


STiki has reverted 1 million edits. What's up with all that? Answering that question is the goal of this project: we'll explore and describe the role that STiki has played on the English Wikipedia.

Data

On 2016-OCT-16, STiki will complete its first full backup since crossing the 1M revert threshold (it's a MySQL DB). I will pull that file and post it somewhere in case you want a local copy. I'll probably also get that static snapshot running on my server and can give you server/database access (though the machine is no powerhouse). This should give us a consistent copy to work from, one that doesn't interfere with STiki's live operation and doesn't have constantly growing tables while we try to do data analysis. It's around 20GB compressed, the vast majority of which is the metadata/features stored alongside every NS0 edit, plus hyperlink data from when STiki tried to play the anti-spam game for a bit (this could probably be omitted). I don't recall the uncompressed size. West.andrew.g (talk) 21:26, 14 October 2016 (UTC)
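
If we end up querying the hosted snapshot rather than shipping the flat file around, here is a minimal sketch of poking at it from Python. It assumes the pymysql driver; the host, login, and database name are placeholders, not the real access details:

    import pymysql

    # Placeholder connection details; substitute whatever access gets handed out.
    conn = pymysql.connect(host="localhost", user="stiki_ro",
                           password="...", database="stiki")

    with conn.cursor() as cur:
        # List every table with its approximate on-disk size, to see where
        # the ~20GB lives (expectation: the per-edit metadata/feature tables
        # and the anti-spam hyperlink data).
        cur.execute("""
            SELECT table_name,
                   ROUND((data_length + index_length) / 1024 / 1024) AS size_mb
            FROM information_schema.tables
            WHERE table_schema = 'stiki'
            ORDER BY size_mb DESC
        """)
        for name, size_mb in cur.fetchall():
            print(f"{name}: {size_mb} MB")

    conn.close()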

Explaining critical tables and columns

TABLE: "feedback" : This is where every press of the "classification" buttons is recorded (except for "pass"). The opaque column in this table is "LABEL", which can take on the following values:

QUEUE      Innocent  Guilty  Good-faith
STiki      -1        1       5
CBNG       -2        2       10
WikiTrust  -3        3       15
Spam       -4        4       20
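
Since LABEL folds both the source queue and the classification into a single integer, a small decoder is handy when reading "feedback" rows. A minimal sketch in Python, derived from the value table above (the function name and return shape are my own):

    # Decode the "feedback" LABEL column into (queue, classification), per the
    # table above: negatives are innocent, 1-4 are guilty, and multiples of 5
    # are good-faith reverts.
    QUEUES = {1: "STiki", 2: "CBNG", 3: "WikiTrust", 4: "Spam"}

    def decode_label(label: int) -> tuple[str, str]:
        if label < 0:
            return QUEUES[-label], "innocent"
        if label in (1, 2, 3, 4):
            return QUEUES[label], "guilty"
        if label in (5, 10, 15, 20):
            return QUEUES[label // 5], "good-faith"
        raise ValueError(f"unexpected LABEL value: {label}")

    assert decode_label(-3) == ("WikiTrust", "innocent")
    assert decode_label(10) == ("CBNG", "good-faith")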

TABLE: "log_client" : If a client does something that requires a database change, it is done through a stored procedure, and all those calls are recorded here. We don't actually record the parameters, just the name of the stored procedure. The "user" in this case is also the database user, not the Wikipedia user. We are able to link these using some session matching. The initial idea of the table was to do audits and make sure no one was using the exposed API/procedures to mass classify edits as innocent or place a "hold/reservation" on great quantities of edits so the queue would be useless to others. It's important to realize that STiki's queues are "synchronized". Any action in one queue performs the same action in all other queues. The client actions impacting the database are:

queue_fetch_*: The client saying "I need some edits!" from a queue. The server will return the 10 available RIDs with the highest priority that the user hasn't ignored. It will also place a TTL "reservation" on these RIDs such that they are unavailable to others.
feedback_insert: The user has pressed "vandalism", "guilty", or "ignore" and we need to record that in the "feedback" table.
queue_delete: Because the user has classified an edit, we need to dequeue it so no one else gets it.
queue_wipe: Remove one's reservation on some RIDs. Done if a user switches queues mid-session or, more often, on a clean shut-down of STiki.
oe_insert: If the classification is "vandalism" we have an "offending edit" (OE), and this gets recorded specially because OEs are used in calculating reputation metrics for articles/editors.
queue_ignore: A user has pressed the "pass" button for an edit. This is the only place passes are recorded.
queue_resurrect: A weird corner case that captures -some- uses of the "back" button. If a user classified an edit as "innocent", went "back", then classified it as "pass", we need to do some DB unwinding, which this handles.
leaderboard: When a user generates a version of the leader-board from inside the client.
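
As an example of the sort of audit this table was meant to enable, a sketch that tallies stored-procedure calls per database user, which would surface anyone mass-classifying or mass-reserving edits. The column names ("user", "sp_name") are guesses; the snapshot's actual schema should be checked first:

    import pymysql

    conn = pymysql.connect(host="localhost", user="stiki_ro",
                           password="...", database="stiki")

    with conn.cursor() as cur:
        # Tally calls per (database user, stored procedure). "user" and
        # "sp_name" are assumed column names, not confirmed against the schema.
        cur.execute("""
            SELECT user, sp_name, COUNT(*) AS calls
            FROM log_client
            GROUP BY user, sp_name
            ORDER BY calls DESC
            LIMIT 25
        """)
        for db_user, sp, calls in cur.fetchall():
            print(f"{db_user:20s} {sp:25s} {calls}")

    conn.close()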

A timeline of STiki's history

Date         UNIX TS     Description
2010-FEB-26  1267209791  First STiki classification (testing by west.andrew.g)
2010-DEC-25  1293253809  The "log_client" table comes online
2016-OCT-21  1477023316  Last STiki classification per snapshot

Research questions

  • How long does the average anti-vandal session last? How many classifications is that? Does vandalism hit-rate affect session length or frequency? (A sessionization sketch follows this list.)
  • Who are the anti-vandals using the tool? How do they distribute geographically? Where are they in their wiki careers? When STiki use stops, are they quitting Wikipedia, or have they found alternative tasks (is it a gateway drug?)?
  • Has issuing barnstars for classification thresholds achieved anything? Are we able to gamify anti-vandals for throughput?
  • STiki briefly hosted an anti-spam queue. No one used it. Reviews took a long time. Does this tell us anything?
  • At some point STiki integrated a "good faith revert" function alongside vandalism/innocent/pass. How did this affect "vandalism" presses?
  • Can a history of STiki + CBNG edit scoring tell us anything about Wikipedia's vandalism propensities in a longitudinal fashion?
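
For the session questions above, a minimal sessionization sketch in the spirit of Geiger & Halfaker's edit sessions (cited below): sort one user's classification timestamps and cut a new session at any gap over a threshold. The one-hour cutoff is a conventional choice, not something dictated by the STiki data:

    # Split one user's classification timestamps (UNIX seconds) into sessions,
    # starting a new session whenever the gap exceeds the threshold.
    def sessionize(timestamps: list[int], gap: int = 3600) -> list[list[int]]:
        sessions: list[list[int]] = []
        for ts in sorted(timestamps):
            if sessions and ts - sessions[-1][-1] <= gap:
                sessions[-1].append(ts)
            else:
                sessions.append([ts])
        return sessions

    # Three classifications a minute apart, then one two hours later:
    demo = [1477000000, 1477000060, 1477000120, 1477007320]
    assert len(sessionize(demo)) == 2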

Methods

Results

References

  • AGW's PhD dissertation (I know, right?) - The content of Chapter 6 has never been published in a conference/journal and is fair game. It was an 11th hour attempt to shove a bit of STiki data into the document. Obviously we need to go much deeper.
  • Geiger, R. S., & Halfaker, A. (2013, February). Using edit sessions to measure participation in Wikipedia. CSCW (pp. 861-870). ACM. -- To compare STiki anti-vandal sessions with those project-wide.
  • Geiger, R. S., & Halfaker, A. (2013, August). When the levee breaks: without bots, what happens to Wikipedia's quality control processes? OpenSym (p. 6). ACM. -- STiki users review edits that make it past ClueBot NG and Huggle.
  • https://www.washingtonpost.com/graphics/politics/2016-election/presidential-wikipedias/ - The mainstream media looks at STiki scoring in the context of 2016 US presidential election