User:GeneralNotability/Paper sockfarm AAR

Summary

Over the course of a year, hundreds of socks spammed references to several academic papers across more than a dozen Wikipedias. Investigating them led to the discovery of several large groups of socks engaged in widespread reference spamming. This sock farm was significantly more sophisticated than most.

Relevant discussions

Involved domains









Known papers

This is drawn from papers related to Wikipedia published by Włodzimierz Lewoniewski, Krzysztof Węcel, and Witold Abramowicz, since all of the socking here appears to revolve around their papers.


Papers by other authors

These papers are associated with the sock group but aren't associated with the same authors as the above papers. I'm unclear why they were included - whether they were other papers the sockfarm was trying to push or whether they were just added to make it less obvious that the Lewoniewski papers were the actual goal.

Unused papers

Sock behavior

I have lumped socks into multiple informal "groups." All are interested in spamming Lewoniewski papers, though their methodology differs. Analysis of IP edits was done separately (since IPs don't use CentralAuth, which messed with my analysis tool). Thankfully, there were a manageable number of IP edits. See User:GeneralNotability/Paper_sockfarm_AAR/IP_Edits for my commentary there.

Group 1: manual edits

See w:en:Wikipedia:Sockpuppet investigations/Lomtikov. As described in the SPI, these socks appear to be engaged in meatpuppetry and/or COI editing to push Lewoniewski papers. They also created or significantly edited several articles related to him: w:en:BIS conference, w:en:Uladzimir Levaneuski, w:en:Interstudent, w:en:National Strike Committee of Belarus, and w:en:Witold Abramowicz (a frequent co-author on these papers). Unlike group 2, most of these edits appear to be manual work rather than automated or semi-automated. In 2018 and 2019, we do start seeing behavior similar to group 2 from these accounts as well (use of what appear to be templates to insert generated statements into multiple articles); see for example w:en:Special:Diff/911706130 from Lomtikov.

Several of these accounts also create userpages on non-English wikis which only contain babel boxes, which is a behavior we see in group 2. If we assume that these are honest, the editors primarily speak Eastern European languages - Polish, Russian, Belarusian, Ukrainian. This agrees with the geolocation of most of the IPs associated with this farm. Unlike the bots, these babel entries appear realistic, with at least one -5 or -N entry and the listed languages corresponding to languages that the account actually contributed to. Between the IP geolocation, the contribution distribution, and the fact that these languages align with the location of the paper authors, I am moderately confident that these are indeed the major languages used by people associated with this farm.

Group 2: mass spam

This group was the meat of the campaign: hundreds of throwaway accounts inserting refspam across a number of wikis. The first users in this group appeared in August 2019. There was a burst of activity in August-September 2019 to spam "Analysis of References Across Wikipedia Languages," "Modeling Popularity and Reliability of Sources in Multilingual Wikipedia," and "Multilingual Ranking of Wikipedia Articles with Quality and Popularity Assessment in Different Topics." The spam farm appears to have been inactive for several months after that, then activity resumed in May 2020 and has continued at a steady pace. The last observed activity at time of writing was 2 August 2020; it is unknown whether this is actually a stop to activity or the group moving to a new paper which we do not know about.

The accounts all had names which (to my ignorant American eye) look like a name that might belong to a native speaker of whatever language they were spamming (for example, "Yue Qingshuang" on zhwiki or "Yelena Kokareva" on ruwiki). These accounts inserted references to papers listed above and linked to the spam domains. They also displayed a strong interest in adding or updating Alexa search ranks, as well as listing Alexa ranks in the article body when they added the papers. Some accounts gained autoconfirmed and then edited protected pages; I suspect that the primary purpose of updating the Alexa ranks was to make non-suspicious edits in order to become autoconfirmed.

All sock edits appeared to make use of a template of sorts (translated into each spammed language), substituting in the name of the relevant wiki (or multiple wikis), source(s), and relevant numbers. Numbers appear to have been localized, possibly by automated means. Compare, for example, w:es:Special:Diff/126596637 and w:en:Special:Diff/960292027 (both edits to the ISBN page) - they referred to the local Wikipedia's language, changed the number appropriately, and used the language-appropriate decimal separator. We also saw use of refname and refgroup.
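
As a rough illustration of how a reviewer might spot this kind of templated insertion within a single wiki, the sketch below masks every number (with either decimal separator) and compares what is left of the sentence. The function names and the sample sentences in the usage note are hypothetical and are not taken from the actual diffs or from my analysis tool.

```python
import re

# Matches integers and decimals using "." or "," as a separator,
# e.g. "12.5", "12,5", or "1,234.5".
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")


def mask_numbers(text: str) -> str:
    """Replace every number with a placeholder so only the sentence skeleton remains."""
    return NUMBER.sub("<NUM>", text)


def same_skeleton(a: str, b: str) -> bool:
    """True when two inserted sentences differ only in their numbers."""
    return mask_numbers(a) == mask_numbers(b)
```

For example, `same_skeleton("Ranked 12,5 in 2019", "Ranked 3,75 in 2020")` is true because both reduce to the same skeleton. This only compares edits written in the same language; matching a template across languages would need more than number masking.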

I originally described this group as "bots," but on further review I think this was semi-automated editing. The activity rate (~5 accounts created/day) is more in line with manual operation and there is too much complexity to the way the edits were performed (some creating new sections for the spam and some attaching it to relevant existing sections, for example) for me to believe that this was fully automated.

I do not think that the people who created the language templates were native English speakers. See, for example, w:en:Special:Diff/912311175 - the phrasing is awkward and suggests either machine translation or a non-native speaker. I am not proficient enough in any of the other languages used to identify which one might be the native language.

Other interesting behavior:

  • On enwiki, socks created a userpage with one or two infoboxes. On other wikis, socks generally created a fairly nonsensical babel box with one or two languages (I say nonsensical because they generally had low proficiency in whatever languages they listed, never listed a language as native, and the choice of language rarely had anything to do with the language of the wiki they were editing).
  • Some socks targeted the same article across multiple languages; see for example Special:CentralAuth/Jelena Abbou, who edited the same article on shwiki and srwiki.
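
The babel-box pattern described above can be written as a simple heuristic. This is a minimal sketch under my own assumptions, not the check I actually ran: entries are assumed to be strings such as "pl-N" or "fr-2", a bare language code is treated as native, and the function names are hypothetical.

```python
# Proficiency suffixes assumed for babel entries; "N" means native.
LEVELS = {"0", "1", "2", "3", "4", "5", "N"}


def parse_entry(entry: str) -> tuple[str, str]:
    """Split an entry such as 'ru-4' into (language, level)."""
    entry = entry.strip()
    lang, sep, level = entry.rpartition("-")
    if sep and level.upper() in LEVELS:
        return lang.lower(), level.upper()
    # Assumption: a bare code with no level suffix means native.
    return entry.lower(), "N"


def looks_implausible(entries: list[str], wiki_lang: str) -> bool:
    """Flag boxes with no -5/-N entry and no language matching the wiki."""
    parsed = [parse_entry(e) for e in entries]
    if not parsed:
        return False  # no babel box at all is a different signal
    has_fluent = any(level in {"5", "N"} for _, level in parsed)
    matches_wiki = any(lang == wiki_lang for lang, _ in parsed)
    return not has_fluent and not matches_wiki
```

A box like `["fr-1", "de-2"]` on zhwiki would be flagged, while `["pl-N", "ru-4"]` on plwiki would not. A flag is only a hint for human review, since plenty of good-faith users have sparse babel boxes.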

Anomalies

A handful of the users had odd names, such as "Luciana Cruz ES," "Pamela Diego US," "Gigi Jojo KA," and "Isabella EN." There was also "Yulia Over 9000," which is my favorite of the odd names but doesn't fit any pattern. I suspect that this is some kind of bug where a language or country code from the name list they were using was accidentally appended to the username: Cruz edited eswiki, Jojo edited kawiki, and Diego and Isabella edited enwiki.
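
To check the suffix hypothesis against a list of accounts, something like the following sketch would do. The regular expression, the function names, and the one-entry country mapping are my own assumptions for illustration; only the username and wiki pairings come from the observations above.

```python
import re
from typing import Optional

# Trailing token of exactly two capital letters, e.g. "Luciana Cruz ES" -> "ES".
SUFFIX = re.compile(r"^.+\s([A-Z]{2})$")

# Hypothetical mapping for suffixes that look like country codes rather than
# language codes; only the one pairing reported above is included.
COUNTRY_TO_LANG = {"US": "en"}


def trailing_code(username: str) -> Optional[str]:
    """Return the two-letter suffix of a username, or None if there is none."""
    match = SUFFIX.match(username)
    return match.group(1) if match else None


def suffix_matches_wiki(username: str, wiki: str) -> bool:
    """True when the username's suffix corresponds to the wiki it edited."""
    code = trailing_code(username)
    if code is None:
        return False
    lang = COUNTRY_TO_LANG.get(code, code.lower())
    return wiki == f"{lang}wiki"
```

With these assumptions, `suffix_matches_wiki("Luciana Cruz ES", "eswiki")` and `suffix_matches_wiki("Pamela Diego US", "enwiki")` both hold, while "Yulia Over 9000" has no two-letter suffix and is correctly left out of the pattern.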

Sock list

False positives were mostly low-edit users adding 'wikirank.net' in good faith or translating an article which already contained one of the citations.

Open questions

  • Who is running this sock farm? The obvious answer is "Lewoniewski or someone working with him," but that might be too obvious. I currently have three possible hypotheses:
    1. The obvious answer. The papers had a common set of authors between them (Lewoniewski, Węcel, Abramowicz), so the operator is one of them or someone working with them, and the sock farm was set up to promote their papers.
    2. This whole thing is a en:joe job by someone trying to discredit Lewoniewski. Possible, but given how extensive and sophisticated this farm was, I doubt it.
    3. The entire sock farm is part of an academic study about spamming Wikipedia. Off the wall, but these folks do write about Wikipedia a lot in their papers...if it's actually this one, I would like to have some words with whoever approved the study (at least in the US, this sort of study would probably require a review board if it were going to be published).
  • What was the goal of the sock farm? Ties into the above question. Spamming references to papers, unlike traditional spam, probably isn't going to lead to financial gain, and I would be concerned if "number of citations in Wikipedia" is a metric that somebody is using to judge researchers.
  • Are there other papers out there which we haven't identified? This list was generated by going through known publications by Lewoniewski et al., but if this farm is spamming other authors' papers, then we won't be picking that up.

Acknowledgements

Thanks to Smartse for initially finding the sock farm, User:Beetstra for giving me access to COIBot's database so that I could pull the statistics for this, and Praxidicae for putting in a lot of the labor to remove the cross-wiki spam.