Requests for comment/Hiding the number of Russian/Belorussian/Kazakh contributors on the statistics map

This is a subpage; for more information, see the Requests for comments page.


Wikimedia Stats displays a heatmap of Russian Wikipedia contributors, but excludes three of the four countries with the highest number of editors (Russia, Belarus and Kazakhstan) because of the wikitech:Country protection list.

As a result, Russian propagandists are now spreading the erroneous claim that Russian Wikipedia (which they take to mean "Wikipedia in Russia" rather than "Wikipedia in the Russian language") is edited mostly from Ukraine, the USA, and Western Europe, which they regard as “hostile countries”. From this they deduce that "Russian Wikipedia is an American/Ukrainian/anti-Russian/liberal propagandistic tool".

Examples

We, ruwiki contributors, believe that this policy does a lot of harm and no good: it leads Russian Internet users to believe that ruwiki is an alien resource, while Russia, Belarus and Kazakhstan have so many ruwiki contributors that it is impossible to extract any information about a particular user from such generalized statistics. We request that this policy be rescinded, at least for the combination of ruwiki and Russia/Belarus/Kazakhstan.

Local discussions about this:

MBH (talk) 05:18, 17 September 2023 (UTC)[reply]

Discussion

  • Support. Statistics were not needed to put me in prison. Pessimist (talk) 05:46, 17 September 2023 (UTC)[reply]
  • Support. As far as I see, exclusion of several countries is not even mentioned in the map description. So, now the map is simply a lie. Sneeuwschaap (talk) 06:07, 17 September 2023 (UTC)[reply]
  •   Support. Currently this does significantly more harm than good. Well very well (talk) 06:31, 17 September 2023 (UTC)[reply]
  • Support. This is a very strange decision, since the statistics are anonymized, it is simply the number of users in a specific territory, without disclosing personal data. Книжная пыль (talk) 06:57, 17 September 2023 (UTC)[reply]
  •   Support, I don't see any significant harm from only numbers, while excluding Russia, Belarus and Kazakhstan from the statistics just destroys it. «RF_22»/ talk 07:17, 17 September 2023 (UTC)[reply]
  •   Support. Don't give others a false impression of Wikipedia. Футболло (talk) 08:49, 17 September 2023 (UTC)[reply]
  •   Support. The full arguments are stated above: as it stands, the map provides false and absurd information and is being used by anti-Wikipedia propaganda. Generalized statistics do not pose a danger to a specific user. Demetrius Talpa (talk) 09:40, 17 September 2023 (UTC)[reply]
  • I agree, it was always interesting to know what benefits disabling the map for users from Russia brings. Iniquity (talk) 10:07, 17 September 2023 (UTC)[reply]
  •   Comment: The provided examples are links leading to social networks and bloggers that spread Russian state propaganda. Take, for instance, Fritzmorgen's blog. He happens to be the founder of the Russian propagandistic wiki-based platform known as “Ruxpert,” and it is highly likely that he will continue to spread disinformation aimed at discrediting Wikipedia, regardless of whether statistics are implemented or not. Personally, I am indifferent to the content they share there. These social media posts appear to have little impact and are primarily directed towards an audience that already supports the Russian invasion of Ukraine. Their attempts to discredit Wikipedia persist because our articles are based on objective information provided by international experts, rather than state propaganda. Anyway, I   Support the proposal for implementing statistics on the number of Russian, Belarusian, and Kazakh contributors, or, as an alternative, for completely disabling statistics for the Russian Wikipedia. – Mariâ Magdalina (talk) 11:37, 17 September 2023 (UTC)[reply]
  •   Support. Per above. Rampion (talk) 11:59, 17 September 2023 (UTC)[reply]
  •   Support. As a Kazakhstani user, i.e. a user living in a country that’s included in the list, I cannot see how disabling the edit count could possibly help protect editors from state oppression. If the hazard of being arrested/prosecuted really exists, then hiding this data is the most useless thing that the Wikimedia Foundation can do to protect users from endangered countries, and now it turns out that it’s harmful to Wikimedia's reputation in these very countries. As shown in the topic starter’s message, the lack of data concerning the number of Wikipedia users from Russia, Belarus, and Kazakhstan is being actively used in state propaganda, especially in Russia. These claims, despite their absurdity and the absence of any logical basis, can lead to Wikipedia being even less popular in these countries, especially in Russia, where such rhetoric can find quite a remarkable number of supporters. Therefore, I believe that this kind of statistics shouldn’t be hidden, for the sake of the project. Written by TakingOver // Talk // Contributions 12:24, 17 September 2023 (UTC)[reply]
  •   Oppose: the policy of the WMF on this is sound and does not need changing; what needs changing is the user interface of Wikistats. You can reflect that some of the data is unavailable without putting editors potentially at risk. That is what my task on Phabricator already asked for. stjn[ru] 12:50, 17 September 2023 (UTC)[reply]
    Why do you think editors are at any risk because of this data? MBH (talk) 13:01, 17 September 2023 (UTC)[reply]
    You have mentioned the "sound policy of WMF" for not publishing the discussed stats, which, as I understand, is distilled in this concise phrase:
    "WMF does not release aggregations of sensitive data in countries identified by independent organizations as potentially dangerous for journalists or internet freedom."
    As for me, the reason seems dubious without additional comments/explanations. Here, I share the confusion that MBH has already expressed earlier. ZharV (talk) 18:57, 19 September 2023 (UTC)[reply]
  •   Support. It's better to close this data entirely than to present distorted information. There are no real or potential risks for users from publishing generalized statistics. --V1adis1av (talk) 15:25, 17 September 2023 (UTC)[reply]
  • Partially support. The current representation of statistics for Ru-Wikipedia does more harm than good, and even more harm since Feb. 2022. The Ru-Wikipedia community cannot use it at all, because they do not see most of the edits, but ill-wishers can. If Wikimedia cannot show statistics from Russia, please hide statistics for Ru-Wikipedia and other Ru-projects entirely.
    Mariâ_Magdalina - yes, all of the current posts are from social networks and bloggers, but the situation is getting worse: for example, Рыбарь is not a simple blogger; he is also a television presenter on the Соловьёв Live TV channel, which replaced the Russian Euronews TV channel after Feb. 2022. So far he has not used the mentioned article in his TV review, but he can. Alex Spade (talk) 15:42, 17 September 2023 (UTC)[reply]
  •   Support This page shows only rounded numbers, it's hard to understand how this can be used to persecute specific editors. AndyVolykhov (talk) 20:41, 17 September 2023 (UTC)[reply]
  •   Support, by issuing this map WMF did a lot of harm to normal wikimedians. Not a clever decision was it. --Ssr (talk) 15:53, 18 September 2023 (UTC)[reply]
  •   Strong oppose --Novak Watchmen (talk) 17:51, 18 September 2023 (UTC)[reply]
    As far as I see, most supporters try to provide an explanation for their position. I would expect an explanation for the opposing ("strongly opposing", no less) votes as well. Deinocheirus (talk) 18:43, 18 September 2023 (UTC)[reply]
    Just a vote? --SHB2000 (talk | contribs) 12:33, 28 September 2023 (UTC)[reply]
  •   Support Hiding general statistical data causes several kinds of harm. 1. Outside users of the statistics may draw false conclusions about the role of other states in the life of ru-Wikipedia. This is exactly what is happening now: over the last six months several people have approached me with the reproach that Russian-language Wikipedia is edited exclusively by contributors from outside Russia. 2. Wikipedia users themselves have no tool for analyzing the work of ru-Wikipedia. For example, from time to time I analyze activity in different Wikipedia language editions, and every time I have to skip the statistics of the Russian-language edition: they are defective and do not reflect reality. 3. Many more such kinds of harm can be found, but I cannot find any benefit. General statistics on traffic and on the activity of different countries in the life of ru-Wikipedia cannot threaten Wikipedia's authors, but hiding them does enormous damage to the image of ru-Wikipedia and to ordinary readers' attitude toward it. In essence, looking at the skewed statistics, they (not understanding their nature) say outright: so Putin was right when he said that Western forces interfere in the workings of the internet, including Wikipedia. And I, as a Wikipedian who is often asked how Wikipedia functions, have nothing to answer, because outwardly it looks exactly as Putin says. Hiding the general statistics plays into the hands of Wikipedia's enemies in Russia. VladimirPF (talk) 18:24, 18 September 2023 (UTC)[reply]
  •   Support. Obvious example of security theater. DenBkh (talk) 19:12, 18 September 2023 (UTC)[reply]
  •   Support. RG72 (talk) 08:38, 19 September 2023 (UTC)[reply]
  •   Strong support. 1) Such protection poses more dangers for users in Russia. 2) Censorship, hiding data and misleading are not the wiki way.--Ctac (talk) 08:59, 19 September 2023 (UTC)[reply]
  • Partially support per Alex Spade. By the way, I didn't know that our edition is edited from South Africa, Chile, Mali and Jordan; interesting.--⚡𝙎𝙥𝙚𝙚𝙙 𝙤𝙛 𝙇𝙞𝙜𝙝𝙩⚡ / СО 13:28, 19 September 2023 (UTC)[reply]
  •   Support In an omitted way: WMF should disclose the number of editors in a country; however, the community should decide on a threshold at which the number of editors becomes subject to statistical secrecy.--A09 (talk) 18:37, 19 September 2023 (UTC)[reply]
  •   Support. Hiding statistics cannot help to protect editors but creates lots of unwanted rumors. I am smiling (talk) 06:23, 21 September 2023 (UTC)[reply]
  •   Support. Very confusing Statistics now. — The preceding unsigned comment was added by Kaiyr (talk)
  •   Support. Wikimedia Stats was designed to be a source of accurate and reliable information, and one that wasn't supposed to cause legal/moral issues on the ground – hiding the stats for RU, BY and KZ does the exact opposite of that. As a tangent, what was the reason for hiding the stats apart from potential political issues? --SHB2000 (talk | contribs) 12:33, 28 September 2023 (UTC)[reply]
  •   Support Definitely support. First, what is the reason for hiding? Are there any reasons? It gives misinformation, misrepresentation; it just gives false information, it's just a lie. Also, what is the condition for including territories or not? Why is Kazakhstan blinded and Ukraine not? Is there any clear explanation? The-ultimate-square (talk) 15:30, 1 October 2023 (UTC)[reply]
    The conditions are listed on the page of the Country Protection List. In a nutshell, the country either has to be marked as “Not Free” by the Freedom on the Net organization or to have low scores according to Reporters without Borders’ report. Written by TakingOver // Talk // Contributions 15:55, 1 October 2023 (UTC)[reply]
    What is confidential about the total number of edits to a particular Wikipedia? What confidential data can be disclosed if it is stated that editors from Russia made +100500 edits in a month or a year? VladimirPF (talk) 16:29, 1 October 2023 (UTC)[reply]
    I honestly don't understand this myself; see my comment on this above. I simply answered one of the questions asked by The-ultimate-square, that's all. I was not the one who set this, to put it mildly, strange rule, so it is not for me to know exactly whom this maximally abstract number would help identify. Written by TakingOver // Talk // Contributions 16:57, 1 October 2023 (UTC)[reply]
  •   Strong support Nothing to add — IмSтevan talk 23:26, 3 October 2023 (UTC)[reply]
  •   Strong support per The-ultimate-square. And it looks much more like discrimination - not protection. -- Екатерина Борисова (talk) 12:27, 10 October 2023 (UTC)[reply]
  •   Weak support. In any case, a disclaimer should be placed at the top of this map stating the reasons why users from Russia, Belarus, Kazakhstan, etc. are hidden on the map. Right now this map is mostly useful to conspiracy theorists of various kinds, who have repeatedly published disinformation about Wikipedia. --Mitte27 (talk) 10:00, 14 October 2023 (UTC)[reply]
  •   Strong support but ensuring that all measures to protect Russian and Belarusian editors will be enacted. I think that statistics shouldn't be hidden. As a result of the blanking, the reputation of Wikipedia in Russia is harmed, because that map, in this form, is used to produce false claims about Wikipedia and its community in the pro-Russian media. I also believe that pageviews statistics for the 35 countries should be available (except China and North Korea), given that there is enough anonymisation in the previous standard. NikosLikomitros (talk) 15:02, 19 October 2023 (UTC)[reply]
  •   Support This will not stop people from being arrested, and there is no other purpose for it. QuicoleJR (talk) 14:38, 2 November 2023 (UTC)[reply]
  •   Support Better to also do so for zh projects. --Liuxinyu970226 (talk) 12:06, 12 November 2023 (UTC)[reply]
    If you mean Mainland China, definitely no. It is a terrible idea. Thanks. SCP-2000 16:02, 8 December 2023 (UTC)[reply]
    Why? Mainland China has so many zhwiki users that it's impossible to get information about a particular user from these stats too. MBH (talk) 18:05, 8 December 2023 (UTC)[reply]
    Most Mainland China users access Wikipedia through a VPN (i.e. via other countries). Only very few users can bypass the Great Firewall and access it directly through some weird technical methods. Thanks. SCP-2000 03:11, 9 December 2023 (UTC)[reply]
    @SCP-2000 If you aren't afraid of arrests by the police, just oppose me, so that the users who use that thing can be freely disclosed and targeted for arrest. Liuxinyu970226 (talk) 04:25, 7 January 2024 (UTC)[reply]

WMF response

  • Hello all! My name is Hal, and I'm a senior privacy engineer at WMF. Thank you all for raising this important issue. The Privacy teams at the Foundation were already in the process of updating the Country Protection List prior to this RfC, and community voices and feedback are very helpful in that work. We’re aware the current approach is flawed in its inability to recognize nuance, and isn’t meeting the needs of some communities. At the same time, WMF cares a lot about what data we release and how we release it, because even statistics that seem to be anonymous can in reality be highly revealing. Our goal is to provide accurate information about our projects, in line with our value of transparency, without putting our community in danger. We will continue our work updating the Country Protection List — we plan on releasing more information in a safe way and to create a process which ensures a consistent approach that takes nuance into account as needed. We hope that this work will address some of the concerns raised at this conversation. Stay tuned for more updates. --HTriedman (WMF) (talk) 19:32, 27 November 2023 (UTC)[reply]

    At the same time, WMF cares a lot about what data we release and how we release it, because even statistics that seem to be anonymous can in reality be highly revealing.

I can perhaps understand how the statistic could be revealing in small communities (e.g. if one user stops editing and there is then one fewer user from a country, this deanonymizes them). But the Russian Wikipedia community is just too large for anyone to deduce any meaningful deanonymizing information from it.
I also see that the current statistic rounds up each value to a multiple of 10 — doesn't this absolutely destroy such a possibility of deanonymization?
Well very well (talk) 06:58, 28 November 2023 (UTC)[reply]

  • Greetings Hal, thank you for your informative update on the ongoing efforts to refine the Country Protection List. It's reassuring to witness the Foundation's commitment to transparency and the acknowledgment of the need for nuance in this crucial matter. While the privacy concerns you highlighted are indeed paramount, some community members may seek further clarification on how revealing the number of Wikipedia users from a specific country could pose a risk.
    One consideration that comes to mind is the potential for Wikitech personnel to access IP information, raising concerns about the misuse of such data for malicious purposes. – Mariâ Magdalina (talk) 11:04, 28 November 2023 (UTC)[reply]
    Hi @Мария Магдалина: and @Well very well: These are great questions, thank you both for raising them! I think there are two considerations here —
    (1) As we release this data, WMF wants to avoid en:Data re-identification and en:Reconstruction attacks, which are both made possible by the "Fundamental Law of Information Recovery". This law states that "Giving overly accurate answers to too many questions will inevitably destroy privacy," and holds true even when rounding to the nearest multiple of 10. There are many examples of this being exploited in the wild, the highest-profile of which was an attack that showed it was possible to reconstruct (on a person-by-person level) 30-40% of the US population strictly based on high level aggregated statistics released in the 2010 US Census (explainer). To that end, we will be using en:Differential privacy to statistically mask the contributions of individual editors.
    (2) Since we were already in the process of rethinking the country protection list to account for differential privacy, we will release any additional data about Russian, Belorussian, and Kazakh contributor activity alongside other updates to that policy. --HTriedman (WMF) (talk) 20:02, 28 November 2023 (UTC)[reply]
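The point in (1) about rounding can be illustrated with a small sketch. This is a hypothetical, minimal Python example and not WMF's actual pipeline: the function names, the counts, and the choice of the textbook Laplace mechanism are all assumptions made for illustration. It shows that deterministic rounding to a multiple of 10 can flip when a single editor is added near a rounding boundary, whereas a differentially private count adds random noise so that one editor's presence does not deterministically change the published value.

```python
import random


def round_to_10(n: int) -> int:
    """Deterministically round to the nearest multiple of 10 (halves round up)."""
    return ((n + 5) // 10) * 10


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential samples."""
    # expovariate takes a rate; rate = 1/scale gives mean = scale.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def dp_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Textbook Laplace mechanism for a counting query.

    Assumes each editor changes the count by at most 1 (sensitivity 1).
    """
    return true_count + laplace_noise(1 / epsilon, rng)


if __name__ == "__main__":
    # Hypothetical counts: the same country with and without one target editor.
    without_target, with_target = 1234, 1235

    # Rounding: the published value jumps from 1230 to 1240, so an observer
    # who already knows the other 1234 editors learns the target is present.
    print(round_to_10(without_target), round_to_10(with_target))

    # Laplace noise: both releases are noisy, so a single comparison no
    # longer reveals the target's presence with certainty.
    rng = random.Random(0)
    print(dp_count(without_target, 1.0, rng), dp_count(with_target, 1.0, rng))
```

Smaller values of `epsilon` add more noise and give stronger protection at the cost of accuracy; which parameters the real data releases use is not stated in this thread.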
    @HTriedman (WMF) Based on the rethinking process for the country protection list, is there any possibility that we will again see pageviews statistics for some countries which are safer for editors? For example, Uzbekistan, where the state actively promotes Wikipedia. Countries like Uzbekistan have millions of pageviews, which effectively anonymises the statistics fully. Regarding Russian, Chinese and Belarusian editors, all measures to protect them should be taken, but for countries like Uzbekistan, I think pageviews statistics should be available. NikosLikomitros (talk) 13:26, 2 December 2023 (UTC)[reply]
    Hi @NikosLikomitros! Yes, using differential privacy to release pageview statistics by country about some countries on the CPL is something we're anticipating doing in the near future. Whether or not Uzbekistan is released depends on the final CPL policy we adopt, but it's possible. HTriedman (WMF) (talk) 19:02, 2 December 2023 (UTC)[reply]
    @HTriedman (WMF) Thank you for the response. I am happy to see that, at least for some countries, a change may be made. If this can be said in public, in approximately how many months from now should we expect the final decisions on the CPL's final list and policy? NikosLikomitros (talk) 00:00, 3 December 2023 (UTC)[reply]
    Hi @NikosLikomitros — this is an active area of work, and is moving relatively quickly. The timeline depends on a variety of internal approval processes within WMF, but I would expect to see some final approvals early next calendar year. HTriedman (WMF) (talk) 18:56, 4 December 2023 (UTC)[reply]
    Thank you for informing us. I am looking forward to seeing which countries they will be. NikosLikomitros (talk) 19:44, 4 December 2023 (UTC)[reply]
  •   Comment This seems to be more of a UX issue: the maps produced must make it clear that these 12 countries have been removed, not treat them visually as "no editors". Doing it shouldn't be too hard TBH. Amir (talk) 19:54, 6 December 2023 (UTC)[reply]
    This has been changed now. Would that answer your concerns? Amir (talk) 13:22, 7 December 2023 (UTC)[reply]
    I don't see how that fixes the issue mentioned in the opening of the RfC. In absence of correct information, false information is easier to spread. Without statistics about the real editors of ru.wiki, propagandists can easily make up their own without fear of being contradicted. Nemo 07:37, 8 December 2023 (UTC)[reply]
    Well, it does counteract the most bizarre propagandistic statement that "Wikipedia is only edited in Ukraine/USA/Western Europe" -- but the DP solution would still be much better... Well very well (talk) 08:36, 8 December 2023 (UTC)[reply]
  • 25 Jan 2024 Update: Hello all! I'm coming back to this thread with three updates.
  1. Firstly, as of this week the WMF Legal Privacy, Privacy Engineering, and Human Rights teams have updated the Foundation's country protection list policy. It's now known as the Country and Territory Protection List, and it states WMF can now publish statistics about some medium and higher risk countries using stringent differential privacy guarantees. That includes Russia, Belarus, and Kazakhstan.
  2. Secondly, we published an explanatory Diff post about this policy change, which includes some graphs from an exploratory differentially-private analysis of ruwiki editing behavior from 2018-2023.
  3. Finally, we're actively working on integrating this policy into existing differentially-private data releases, such as daily DP geo-pageviews and monthly DP geo-editors. I'm hoping to have the new policy reflected in those data releases by early next month.
Please let me know if you have any questions, and thank you for the feedback in this RfC! HTriedman (WMF) (talk) 22:38, 25 January 2024 (UTC)[reply]
@HTriedman (WMF) Thank you for the information. Is there any timeframe regarding when statistics will become available again? This month or one of the next? Because, I see that Tajikistan, Palestine and Macau (though it will be protected alongside 7 countries) are also not available. NikosLikomitros (talk) 12:14, 1 February 2024 (UTC)[reply]
Please note that Macau is not available, as this territory is deemed highest risk. Thanks. SCP-2000 14:45, 1 February 2024 (UTC)[reply]
@SCP-2000 Yes, I know it, and I have noted it above. I agree that China and Macau should be protected. But I am curious how some countries that are not in the protection list are also not appearing. NikosLikomitros (talk) 16:41, 1 February 2024 (UTC)[reply]
Hi @NikosLikomitros! So sorry for the long wait to respond to your comment here — this month, I've been working on integrating the new Country and Territory Protection list into our existing differentially-private data releases. I'm happy to announce that going forward, medium and higher risk countries will be included in the DP geo-pageviews (from 15 Feb 2024 on) and the DP geoeditors_monthly data releases (from February 2024 on). For more information about the specifics of the datasets and the CTPL-related updates, please consult the READMEs of the respective data releases (geo-pageviews, geoeditors_monthly).
Unfortunately, the datasets are not as easily accessible as stats.wikimedia.org or pageviews.wmcloud.org, but you can download the data for a given day or month as a TSV file, or else use code from the example notebook I've created for demonstration purposes to analyze this data. Hope this helps answer your question! (edit: posted early by accident) HTriedman (WMF) (talk) 00:13, 22 February 2024 (UTC)[reply]
@HTriedman (WMF) No problem, it's good that we are finally receiving statistics again under DP rules. I am very keen on Wikimedia statistics and dynamics, which is why I am very interested in the issue. These datasets give some data but are difficult to navigate. Is it possible that those statistics will go live in March (as stats.wikimedia.org reflected the previous country protection list when the June statistics became available)? NikosLikomitros (talk) 01:20, 22 February 2024 (UTC)[reply]
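For readers who download one of the TSV files mentioned above, the following is a minimal sketch of how such a file could be summarized with the Python standard library. The column names (`wiki`, `country`, `month`, `editors`) and the sample rows are invented placeholders, not the real schema or real data; the READMEs of the respective data releases describe the actual fields.

```python
import csv
import io
from collections import defaultdict

# Invented placeholder rows with hypothetical column names; the real files
# use the schema documented in the data releases' READMEs.
SAMPLE_TSV = (
    "wiki\tcountry\tmonth\teditors\n"
    "ruwiki\tAA\t2024-02\t120\n"
    "ruwiki\tBB\t2024-02\t45\n"
    "otherwiki\tAA\t2024-02\t10\n"
)


def editors_by_country(tsv_text: str, wiki: str) -> dict[str, int]:
    """Sum the (hypothetical) `editors` column per country for one wiki."""
    totals: defaultdict[str, int] = defaultdict(int)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        if row["wiki"] == wiki:
            totals[row["country"]] += int(row["editors"])
    return dict(totals)


if __name__ == "__main__":
    # Print countries for the placeholder wiki, largest count first.
    totals = editors_by_country(SAMPLE_TSV, "ruwiki")
    for country, count in sorted(totals.items(), key=lambda kv: -kv[1]):
        print(country, count)
```

To use this on a downloaded file, read the file's text and pass it in place of `SAMPLE_TSV`, after adjusting the column names to the real ones.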
@NikosLikomitros WMF is currently working on figuring out long-term ownership of these datasets, and we need to figure that out prior to making them available to these kinds of websites — so I'm honestly not sure when they will go live on those websites. HTriedman (WMF) (talk) 16:44, 22 February 2024 (UTC)[reply]