The reason for CU request

 
Regular night on Croatian Wikipedia
 
Sockpuppeting, one after another, Jan 31 2020.
 
Sockpuppeting, one after another, Feb 7 2020.

A long time ago I noticed a large number of users with a similar writing style. They shared some common characteristics, such as writing about Croats outside Croatia, including biographies of people (Croats from Bosnia, Montenegro...) that most other users would find irrelevant for Wikipedia. Other common subjects of interest included Croatian documentary films (about 5-6 users) and hills in Bosnia (also about 5-6 users). The 3 most important behavioral characteristics are:

  1. Contributing in waves, just enough to satisfy the voting criteria (200 edits in total, and also 50 edits in the article namespace in the last 12 months). The user would come and make 40 or 50 edits in 1 or 2 hours and then disappear for a month or more, sometimes up to a year. You can see that in the table below, where 37 suspected sockpuppets appeared only 50 times in total: most of them just once in 90 days, the others 2 or maybe 3 times.
  2. Coming one after another: they would come one after another in one night, but never 2 at the same time. Kubura very often works at night, but never together with his sockpuppets!
  3. As many as 30 of them voted "Strong oppose" on the hr wiki RFC, and 12 of them voted against ~riley in the 2020 steward elections.

That is why I created a list of some 90 potential sockpuppets (I'm quite certain about 70; the other 20 have a low edit count and are hard to judge). I publicly warned Kubura several times to stop sockpuppeting. He didn't complain much, but he preferred to ignore it and keep a low profile.

I tried to propose stricter voting rules that would make maintaining 50 or 60 eligible sockpuppet voters much harder, but Kubura strongly disagreed and effectively stopped the discussion. So I decided to clear things up by finally creating a CU request. I picked 37 potential sockpuppets, i.e. only the ones active in the 90 days prior to the request.

The request and the results


On August 11, I created a local CU request on hr wiki. Immediately, all 37 users stopped contributing, although as many as 19 of them were active in the 12 days prior to the check (see table below)! Although Vodomar was active in the days before the check, he went missing after the CU request was filed. The same happened with the other local CU, Ex13. Vodomar returned on August 27, checked 33/38 users, made just a scaffolding table for the results, promised to come back soon, and then disappeared for a full 45 days. In the meantime, Ex13 reappeared some 45-50 days after the CU request, but refused to do the check because "Vodomar already did it".

Vodomar published the results about 2 months later, only after he lost his CU rights due to Ex13 giving up his CU rights and also losing community support (voting showed it was below 70%). He gave the true IP ranges of 33/38 users (5 of them were out of reach) and added RudnikU, who was not even in the request. The data was later deleted by a steward, but I have a screenshot of these results. I made a table to the right based on that screenshot. The table is incomplete, because some users (Tomec, Herhel) are missing, although they were within the range of CU. Vodomar later "fixed" the data, but the last version is missing as many as 4 users, so I'm going to go with this one. Small changes in the data don't change the point I will try to make.

After the original data was hidden, Vodomar posted his analysis, saying (I used Google Translate):

According to the survey, there are 15 unique x IP subnets (A.B.x.x) that are 65,000 wide IP addresses, of which there are 4 unique ISPs with three geographically different areas. No IP address was repeated in any collaborator meaning A.B.C.D, nor did A.B.C appear in different collaborators, but the pair A.B.C is repeated in eastern collaborators several times. This means that any users i.e. any pair User-A and User-B do not have the same A.B.C.D nor A.B.C, but User-A can have the same A.B.C multiple times.

Although I put A.B for contributors and who are currently deleted from the above contributors, they did so for the privacy of contributors. But I put it all in order to make the answers transparent in order to increase trust within the community so that no one would think that as a CU I am in any way non-transparent or cultivate some prejudices or add more education to someone or lean to one side . The community is divided as it is anyway, and if one part of the community does not trust the CU or the administrators who are on wikipedia.

Checking a large number of collaborators over 90 days is a job that is neither affected nor fruitful, because there are no ready-made tools for this and they have to develop on their own without help. The data that is downloaded from wikipedia does not have a good format, so it should first be downloaded, processed and put in the tool. Then the UA data must be put in a tool that needs to be edited with dates and look at statistical correlations in editing etc. If one really wants to go to extremes then one needs to take lexical tools to examine the correlation between the ways of expressing different contributors. Just to have overturned every stone that the “damaged” associates throw towards the check and which the CU needs to give a transparent answer because everything is thrown into doubt as a set or prejudice that the CU has or does not have.

Without any inclination and prejudice towards what has been examined so far, it cannot be concluded that the examined associates are the same person.

— Vodomar

Partial truth in Vodomar's report


What he didn't say is that out of the 4 unique ISPs with three geographically different areas, 2 unique ISPs with 2 geographically different areas were related to RudnikU, who was not even in my original list of 38 users. When asked about this, CU Vodomar stated that user RudnikU appeared when he did a reverse lookup on one of the IP addresses. That, however, cannot hold true, because the table of matches specifically excludes user RudnikU from all the other IP ranges (see table below).

However, when user RudnikU is removed from the data, there are no longer "four unique ISPs and three geographically distinct areas", as stated in the original conclusion. With his exclusion, there are two unique internet providers (one only for Umuthi GayndeGi) and a single geo-area.

In the original request, user Umuthi GayndeGi was added with a footnote explaining that his pattern of behaviour is somewhat dissimilar from the other users. Of all the users, he is the only one that has not ceased all activity after the CU request was posted (Aug 11). He is the only user from the list that has made an edit after that date (he has an edit on Sep 11). For that reason, he can be removed from the list due to not being a sockpuppet. We are now left with just 1 ISP and 1 geographical location.

Statistical analysis of internet providers and geo-data of 32 users


All users that remain edit from the same ISP and the same geo-area. Croatia has two main ISPs, so the odds of a single user connecting from the same ISP as Kubura are approx. 50%. If we round the geo-area to the widest possible region, Dalmatia, where Kubura is from, without specifying the geo-area further, we get the odds of a single other user connecting from the same area to be around 15-20%. If we multiply those, treating the two as independent (0.5 x 0.2 = 0.1), we get the chance of a single user both being from Kubura's region and having the same ISP: approx. 10%. The odds of all 30 users meeting the two conditions are 1 : 10^30, or 1 : 1,000,000,000,000,000,000,000,000,000,000 (a one with 30 zeros). In other words, the odds are astronomical.
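The arithmetic above can be reproduced with a short Python sketch. It uses only the figures given in the paragraph (50% for the ISP, the upper estimate of 20% for the region, 30 users) and, like the paragraph, assumes that ISP and region are independent of each other and that the users are independent of one another; the variable names are illustrative.

```python
from fractions import Fraction

# Estimates taken from the paragraph above; they are assumptions, not measurements.
p_same_isp = Fraction(1, 2)     # two main ISPs, so about 50%
p_same_region = Fraction(1, 5)  # upper end of the 15-20% estimate for Dalmatia
n_users = 30                    # number of users used in the text

# Chance that one unrelated user matches both ISP and region,
# assuming the two properties are independent.
p_single = p_same_isp * p_same_region

# Chance that all n_users unrelated users match,
# assuming the users are independent of one another.
p_all = p_single ** n_users

print(f"single user: {float(p_single):.0%}")
print(f"all {n_users} users: 1 in {p_all.denominator:,}")
```

Using exact fractions avoids floating-point rounding for a number this small. With the lower regional estimate of 15% the result would be even smaller.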

Let me show you a visual representation of the odds that these 32 users are indeed different people...

In the Dalmatia region, where all the users come from, live somewhere around 1/5 to 1/6 of the inhabitants of Croatia. So let's represent Dalmatians with the number 5 on a die, while 1, 2, 3, 4 and 6 represent other parts of Croatia. When Vodomar checked IPs and geographical locations, here is what he got:

[Row of dice images]

Regarding internet providers, we have a duopoly between A1 and T-com. Let's assume the chance that a user will access Wikipedia through the provider that Kubura uses is about 50%, and represent it with a coin. This is what happened when Vodomar checked the internet providers of all 32 users:

[Rows of coin images]

Why Kubura doesnt have IP overlaps

Analysis of CU results of Kubura and 37 potential sockpuppets, based on results published by Vodomar, now hidden. The numbered columns are IP ranges (65,536 IPs each).

| First edit | Username | Editcount | Last edit before CU check | Number of days user was active | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10, 11 and 12 | 13 and 14 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2005-06-29 | Kubura | 158,719 | 2020-08-27 | 90 | Yes | Yes | No | No | Yes | Yes | Yes | Yes | Yes | No | No |
| 2011-06-02 | Orašnik | 682 | 2020-08-10 | 1 | Yes | No | No | No | No | No | No | No | No | No | No |
| 2014-01-20 | Neadin | 477 | 2020-08-09 | 2 | No | Yes | No | No | Yes | No | No | No | No | No | No |
| 2010-12-17 | Tomec | 1410 | 2020-08-05 | 1 | No | No | No | No | No | No | No | No | No | No | No |
| 2012-04-15 | Rikovers | 1259 | 2020-08-05 | 2 | No | Yes | No | No | No | No | Yes | No | No | No | No |
| 2015-08-08 | Stijenor NGC | 10746 | 2020-08-04 | 4 | No | No | No | No | No | Yes | No | No | Yes | No | No |
| 2011-10-18 | Arraque | 1960 | 2020-08-04 | 2 | No | Yes | No | No | No | No | No | No | No | No | No |
| 2012-04-27 | Radion | 600 | 2020-08-04 | 1 | No | No | No | No | Yes | No | No | No | No | No | No |
| 2014-04-19 | Gretim | 2001 | 2020-08-03 | 1 | No | No | No | No | Yes | No | No | No | No | No | No |
| 2010-12-06 | Jarebika | 1556 | 2020-08-03 | 1 | No | Yes | No | No | No | No | No | No | No | No | No |
| 2015-06-16 | Umuthi GayndeGi | 305 | 2020-08-03 | 4 | No | No | No | No | No | No | No | No | No | Yes | No |
| 2010-08-09 | Fleezer | 1186 | 2020-08-02 | 4 | No | Yes | No | No | No | No | No | No | Yes | No | No |
| 2013-01-20 | Demet | 483 | 2020-08-02 | 1 | No | No | No | No | No | No | Yes | No | No | No | No |
| 2014-09-07 | Cvicang | 1586 | 2020-08-01 | 2 | No | No | No | No | No | No | No | No | Yes | No | No |
| 2011-06-03 | TekstViler | 671 | 2020-08-01 | 1 | No | No | No | No | No | No | No | Yes | No | No | No |
| 2015-05-21 | Verud | 692 | 2020-08-01 | 2 | No | No | No | No | No | Yes | No | No | Yes | No | No |
| 2008-09-05 | Šedrvan | 1627 | 2020-07-31 | 3 | No | No | No | Yes | No | No | Yes | No | Yes | No | No |
| 2014-02-10 | Anfiets | 1299 | 2020-07-31 | 1 | No | No | No | No | No | No | No | No | Yes | No | No |
| 2011-01-11 | Hergel | 707 | 2020-07-30 | 1 | No | No | No | No | No | No | No | No | No | No | No |
| 2013-01-10 | Kumordinar Žorž | 453 | 2020-07-30 | 1 | No | Yes | No | No | No | No | No | No | No | No | No |
| 2014-09-16 | Tobaccobox | 181 | 2020-07-27 | 1 | No | No | No | No | No | No | No | Yes | No | No | No |
| 2010-12-27 | Uršul | 1899 | 2020-07-26 | 2 | No | No | No | No | No | No | Yes | No | Yes | No | No |
| 2011-01-02 | Zerukruhaivina | 2853 | 2020-07-22 | 1 | Yes | No | No | No | No | No | No | No | No | No | No |
| 2010-08-11 | Gjiuh | 1206 | 2020-07-22 | 1 | No | No | No | Yes | No | No | No | No | No | No | No |
| 2011-06-05 | Dvastaorla | 572 | 2020-07-22 | 1 | No | Yes | No | No | No | No | No | No | No | No | No |
| 2011-07-21 | Leteći oleandar | 497 | 2020-07-19 | 1 | No | No | No | No | No | No | No | No | Yes | No | No |
| 2016-05-30 | Kartervaen | 1021 | 2020-07-18 | 1 | No | No | No | No | No | No | No | No | Yes | No | No |
| 2015-12-21 | ImeldoMax | 698 | 2020-07-16 | 1 | No | No | No | No | No | No | Yes | No | No | No | No |
| 2011-11-28 | XaneZeggi | 445 | 2020-07-13 | 2 | No | No | Yes | No | No | No | No | No | Yes | No | No |
| 2014-09-16 | Malatrad | 1952 | 2020-07-11 | 1 | No | No | No | No | No | No | No | No | Yes | No | No |
| 2015-09-15 | Vinko Ml. | 554 | 2020-06-22 | 1 | No | No | No | No | No | No | No | No | Yes | No | No |
| 2011-08-07 | Pantagana | 1770 | 2020-05-29 | 1 | No | No | No | No | No | No | No | Yes | No | No | No |
| 2016-01-09 | MairMoon | 225 | 2020-05-29 | 1 | No | No | No | No | Yes | No | No | No | No | No | No |
| 2012-02-06 | Soljenko | 5001 | 2020-05-26 | 0 | | | | | | | | | | | |
| 2015-08-22 | Hari Tre | 233 | 2020-05-26 | 0 | | | | | | | | | | | |
| 2011-06-01 | Kutni Rez | 845 | 2020-05-25 | 0 | | | | | | | | | | | |
| 2010-12-05 | SkelaLeop | 2898 | 2020-05-21 | 0 | | | | | | | | | | | |
| 2016-02-09 | Donher | 607 | 2020-05-15 | 0 | | | | | | | | | | | |
| 2018-05-15 | RudnikU | 1138 | 2020-07-14 | 7 | No | No | No | No | No | No | No | No | No | No | Yes |

Vodomar claims that he didn't find any IP match, not even a common A.B.C for any of the users. Here is why...

Kubura edits through a provider that uses 11 IP ranges, each with 65,536 IP addresses, which means his provider randomizes between 720,896 IP addresses and 2,816 A.B.C ranges. As proof, look at the users who appeared only twice in the 90-day period yet have two distinct IP ranges, such as Neadin, Rikovers, Verud, Uršul and XaneZeggi, or even three appearances from three different ranges (Šedrvan).

To elaborate, when checked, he will appear once as A.B.1.xxx, another time as C.D.50.xxx, and yet another time as E.F.66.xxx. There are 11 ranges, each with a fixed A.B, and each A.B contains 256 different A.B.C combinations, so the 11 ranges together contain 11 x 256 = 2,816 different A.B.C combinations. Kubura appeared a total of 79 times, while the other users appeared a total of 50 times. Within the 2,816 possible A.B.C values, the 50 ranges of the 30 users never overlapped with Kubura's 79 (he himself does have some A.B.C overlaps among these 79 IP ranges), which is not surprising, as together they covered only around 129 of the 2,816 possible A.B.C ranges in 90 days, or only about 4.6%.
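The figures in the two paragraphs above can be checked with a few lines of Python. This is only a restatement of the text's own arithmetic: the count of 11 ranges and the appearance counts (79 and 50) are taken from the text, and the variable names are illustrative.

```python
# Figures taken from the text above.
RANGES = 11                  # A.B ranges attributed to the provider
ADDRS_PER_RANGE = 256 * 256  # addresses in one A.B.x.x range
ABC_PER_RANGE = 256          # A.B.C blocks in one A.B.x.x range

total_addresses = RANGES * ADDRS_PER_RANGE
total_abc = RANGES * ABC_PER_RANGE

kubura_appearances = 79
other_appearances = 50

# Upper bound on distinct A.B.C blocks seen: every appearance in a new block.
observed_abc = kubura_appearances + other_appearances
coverage = observed_abc / total_abc

print(f"addresses: {total_addresses:,}")
print(f"A.B.C blocks: {total_abc:,}")
print(f"observed blocks: at most {observed_abc} ({coverage:.1%} of the pool)")
```

The 129 is an upper bound, since the text notes that some of Kubura's own appearances share an A.B.C block.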

More on the overlapping IP ranges: the provider does not issue all of the 11 ranges with equal frequency. Kubura appears in 9 of the 11 ranges, along with all the other users (except Umuthi GayndeGi and RudnikU). If each column is sorted in turn, it becomes apparent how many users belong to each IP range. The fewest users belong to ranges 3 (only XaneZeggi) and 4 (Šedrvan and Gjiuh). Kubura himself never appeared in those two ranges in 79 total appearances. The reason is that the ISP issues those ranges very rarely. However, XaneZeggi appears with Kubura in range 9, while Šedrvan appears in ranges 4, 7 and 9, which confirms that range 4 is part of the same set of IP addresses issued to Kubura. Gjiuh appears only in range 4, where Kubura is not present, because Gjiuh appeared only once, on 22 July. Most users appear only once in the 90 days, which in turn places them in only one of Kubura's IP ranges.

Proposal for additional analysis


I have reason (which I will not disclose for privacy reasons) to believe that he uses 2 different computers, one during the day and the other during the night. If that is the case, you will find, regardless of the user, one set of IP ranges during the day and another during the night. Of course, some overlaps are possible, even a complete overlap. But if you find one IP range used extensively during the night, for many different users, and never during the day, it will mean it's the same person with 2 computers.
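The proposed check could be sketched roughly as follows in Python. Everything here is hypothetical: the record layout (user, hour of day, IP range), the definition of night as 22:00 to 05:59, the `min_users` threshold and the sample data are assumptions made for illustration, not the format of real CU data.

```python
from collections import defaultdict

# Assumption: "night" means 22:00-05:59 local time.
NIGHT_HOURS = set(range(0, 6)) | {22, 23}

def night_only_ranges(records, min_users=2):
    """Return ranges seen at night for at least min_users users and never by day."""
    day_ranges = set()
    night_users = defaultdict(set)
    for user, hour, ip_range in records:
        if hour in NIGHT_HOURS:
            night_users[ip_range].add(user)
        else:
            day_ranges.add(ip_range)
    return sorted(
        ip_range
        for ip_range, users in night_users.items()
        if ip_range not in day_ranges and len(users) >= min_users
    )

# Made-up sample records: (user, hour of edit, IP range label).
sample = [
    ("user_a", 23, "range_x"),
    ("user_b", 1, "range_x"),
    ("user_c", 2, "range_x"),
    ("user_a", 14, "range_y"),
    ("user_b", 23, "range_y"),
    ("user_c", 3, "range_z"),
]

print(night_only_ranges(sample))
```

In the sample, only `range_x` qualifies: `range_y` is also used during the day, and `range_z` is used at night by a single user only.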

Remaining sockpuppets


I believe there are at least 50 to 70 more unblocked sockpuppets of Kubura. To protect the validity of the current RFAs on hr wiki, I will just list the 32 possible sockpuppets with voting rights, in case they show up at the last minute: Soljenko, Anto Kalin, SkelaLeop, Pasac, Potoksedeslav, Chiartop, Gretim, Bruckermann, Malatrad, Pantagana, Jarebika, Ejnal, Gracij, Gjiuh, Hinko~hrwiki, Rondinfront, Nilski krokodil, Kutni Rez, Orašnik, Pomorac na bijelome brodu, Šimungr, Moderni futurist, TekstViler, Žvane, Donher, Va Kozali morčić, Vinko Ml., Ustajala voda, Pas s maslom, Guburljuk, Hari Tre, MairMoon. They have all been inactive since at least August 3rd, except Orašnik, who briefly appeared in October.

Lasta, sorry for jumping in between, but I am still more than confident that Cikola is also one of them. You know "our" (Cikolas and mine) history after last two votings. Also pls. take a look here: [1] - from June 2020 active only for votings on hr.wiki - all of them in same pattern. Best regards Mark7747 (talk) 11:07, 20 November 2020 (UTC)