IP Editing: Privacy Enhancement and Abuse Mitigation


Summary

In recent years, internet users have become increasingly aware of how important it is to understand how their personal information is collected and used. Governments of many countries have enacted laws to protect user privacy. The Wikimedia Foundation's legal and public policy teams are continually monitoring legal developments around the world and studying how we can best protect user privacy and respect user expectations while upholding the values of the Wikimedia movement. Against this background, we need to investigate and begin making technical improvements to our projects. We need to work with you to get this done.

MediaWiki currently exposes the IP addresses of unregistered contributors in page histories and logs. This undoubtedly harms their anonymity, and can even put them at risk of government persecution. Although we notify contributors that their IP address will be made public, few understand the consequences of that disclosure. We are working to improve the anonymity of unregistered contributors by de-identifying their IP addresses, so that readers cannot see them, just as readers cannot see the IP addresses of registered users. This will introduce an automatically generated, human-readable "IP masking" username. We are still discussing exactly how to implement this. You can leave your thoughts on the talk page.

The Wikimedia projects have good reason to store and publish IP addresses: they play an important role in fighting vandalism and harassment. Patrollers, administrators, and staff use them to identify and block vandals, sockpuppet accounts, editors with conflicts of interest, and other bad actors.

We want to work with you to find ways to protect user privacy while keeping our anti-vandalism tools working as they do today. The most important part of this is developing new anti-vandalism tools. Once that work is complete, we will shift our focus to hiding IP addresses, including limiting the number of people who can see other users' IP addresses and reducing how long IP addresses are stored in databases and logs. Note that a key part of this work is ensuring that our wikis retain anti-vandalism tools of equal or greater capability and are not left exposed to abuse.

The Wikimedia Foundation's goal is to build a set of anti-vandalism tools that removes the need for everyone to have direct access to IP addresses. As these tools improve, we will be able to hide the IP addresses of unregistered users. We are well aware that this change will affect current anti-vandalism workflows, and we want to ensure that the new tools can still fight vandalism effectively, protect the projects from abuse, and support community oversight.

We can only reach this goal by working with CheckUsers, stewards, administrators, and others involved in anti-vandalism work.

This is a very challenging problem. If we fail, our ability to protect the wikis from vandalism will suffer, which is why this project has been postponed repeatedly over the years. But given evolving data privacy standards on the internet, new laws, and changing user expectations, the Wikimedia Foundation believes now is the time to address it.

Updates

30 August 2021

Hello everyone. This is an update on the statistics from Portuguese Wikipedia since it prohibited edits by unregistered users. We have published a detailed report on the impact report page. The report includes metrics derived from the data, as well as a survey we ran among active Portuguese Wikipedia editors.

In summary, the report shows positive change. We did not observe any significant negative impact during the data collection period. In light of this, we are encouraging an experiment on two or more projects to see whether similar changes occur. Every project has its own circumstances, and what holds on Portuguese Wikipedia may not hold elsewhere. We would like to run a time-limited experiment on two projects that prohibits edits by unregistered users. We estimate it will take roughly eight months to collect enough data to observe significant changes. After that, we will end the experiment, allow unregistered users to edit again, and analyze the data collected. Once the data is published, each community can decide for itself whether to continue prohibiting edits by unregistered users.

We are calling this the "disable IP editing" experiment. You can find the timeline and details on that page. Please use that page and its talk page for further discussion of the experiment.

10 June 2021

Hello everyone. It has been a few months since the last update on this project. In that time we have had conversations with many users, both in the editing communities and within the Foundation. We discussed the project's impact on anti-vandalism work with experienced community members, and we have carefully considered every concern raised in those discussions. Many others support the proposal, seeing it as a step toward improving the privacy of unregistered users and reducing the legal risk of exposing IP addresses.

Earlier, we were not sure what shape this project would take; our goal at the time was to understand what IP addresses mean to the communities. Since then we have received a great deal of feedback on this question through conversations across many languages and communities. We are very grateful to everyone who took the time to help us understand how anti-vandalism work is done effectively on their wikis or in cross-wiki contexts.

Proposal for sharing IP addresses only with those who need them

We now have some fairly concrete proposals for this project that would keep anti-vandalism work effective while avoiding the disclosure of IP addresses to people who do not need them. Please note that these are proposals, not final decisions or things that will necessarily happen; we simply want your opinion: what do you think will work? What won't? What other approaches are there? We discussed these ideas with experienced Wikimedia community members and refined them in collaboration with our Legal department. In summary:

  • CheckUsers, stewards, and administrators would be able to see complete IP addresses, but only after opting in through their preferences to an agreement not to share them with unauthorized parties.
  • Editors involved in anti-vandalism work, as vetted by the community, could be granted a right to see IP addresses. This could be handled in a way similar to adminship on our projects. These editors would need an account that is at least one year old and has at least 500 edits.
  • All users with an account at least one year old and with at least 500 edits would be able to access partially masked IPs, with the tail octet of the IP address hidden, again only after opting in through their preferences to an agreement not to share them with unauthorized parties.
  • All other users would not be able to see the IP addresses of unregistered contributors.
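
As a rough illustration of the partial masking in the third bullet, hiding the tail octet of an IPv4 address might look like the sketch below. This is only a sketch of the idea: the exact masking scheme, the placeholder text, and any IPv6 analogue are assumptions, not decisions.

```python
import ipaddress

def partially_mask(ip_string: str) -> str:
    """Sketch of the proposed partial mask: hide the tail octet of an
    IPv4 address. The IPv6 branch (keeping the first three groups) is
    an assumed analogue, not part of the proposal text."""
    ip = ipaddress.ip_address(ip_string)
    if ip.version == 4:
        octets = str(ip).split(".")
        return ".".join(octets[:3] + ["xxx"])
    groups = ip.exploded.split(":")
    return ":".join(groups[:3] + ["xxxx"] * 5)
```

For example, `partially_mask("192.0.2.123")` would yield `192.0.2.xxx`, which still supports range-level reasoning while hiding the most device-specific part of the address.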

Access to IP addresses would be logged so that it can be audited when needed, similar to how CheckUser actions are logged today. In this way we hope to balance users' privacy needs against the community's need for information in anti-vandalism work. We want to provide the information to those who need it, but we need a process that ensures only those who genuinely need it can obtain it, and that access is recorded.

We would like to hear your opinion on this proposal. Please share your thoughts on the talk page.

  • What do you think will work?
  • What do you think won’t work?
  • What other ideas can make this better?

Tool development updates

As you may already know, we are developing new tools to soften the impact of masking IP addresses and, more broadly, to provide better anti-vandalism tools for everyone. It is no secret that the anti-vandalism tools our projects offer the communities fall short in many ways, and there is plenty of room for improvement. We want to build tools that help community members doing anti-vandalism work do it more efficiently. We also want to lower the barrier to entry for that work.

We have talked about some of our ideas for these tools before; below is a recent update. Note that development of these tools has slowed in recent months, because our team is making major changes to SecurePoll to support the upcoming WMF Board elections.

IP Info feature

 
An example of the IP Info feature

Information about IP addresses is in constant use, and we are building a tool that surfaces it. Patrollers, administrators, and CheckUsers currently rely on external websites for this information. We hope to simplify their work by integrating it directly into our sites. We recently finished a prototype of the tool and ran a round of user testing to validate our approach. Most of the editors we interviewed found the tool useful and said they would want to keep using it in the future. You can read the latest updates on the project page. We would like your input on the following questions:

  • When reviewing an IP's edits, what information do you look for? Which pages do you use to find it?
  • What information is most useful to you?
  • When you share IP-related information with others, what kinds of information do you think could put the anonymous editor at risk?

Editor matching feature

In earlier discussions this project has also been called "Nearby editors" or "Sockpuppet detection". We have been trying to find a suitable name that people unfamiliar with the term "sockpuppet" can understand.

This project is still at an early stage. One project in the Wikimedia Foundation's research program concerns assisting the detection of two editors with similar editing characteristics. This would help connect unregistered editors who appear under different auto-generated usernames. When we first started discussing this project, there were many voices in support of it. We also learned about some of the risks of building such a feature.

We plan to build a prototype soon and share it with the communities. There is a very rough project page for this work, and we hope to update it soon. If you have any thoughts about this project, please leave a message on its talk page.

Data on Portuguese Wikipedia disabling IP edits

Portuguese Wikipedia banned unregistered editors from making edits to the project last year. Over the last few months, our team has been collecting data about the repercussions of this move on the general health of the project. We have also talked to several community members about their experience. We are working on the final bits to compile all the data that presents an accurate picture of the state of the project. We hope to have an update on this in the near future.

Previous updates

30 October 2020

We have updated the FAQ with more questions that have been asked on the talk page. The Wikimedia Foundation Legal department added a statement on request to the talk page discussion, and we have added it here on the main page too. On the talk page, we have tried to explain roughly how we think about giving the vandal fighters access to the data they need without them having to be CheckUsers or admins.

15 October 2020

This page had become largely out of date and we decided to rewrite parts of it to reflect where we are in the process. This is what it used to look like. We’ve updated it with the latest info on the tools we’re working on, research, fleshed out motivations and added a couple of things to the FAQ. Especially relevant are probably our work on the IP info feature, the new CheckUser tool which is now live on four wikis and our research into the best way to handle IP identification: let us know what you need, the potential problems you see and if a combination of IP and a cookie could be useful for your workflows.

Tools

As mentioned previously, our foremost goal is to provide better anti-vandalism tools for our communities, giving our vandal fighters a better moderation experience while also working towards making the raw IP address string less valuable to them. Another important reason to do this is that IP addresses are hard to understand and are really only useful to tech-savvy users. This creates a barrier for new users without a technical background to enter functionary roles, as working with IP addresses has a steep learning curve. We hope to get to a place where we have moderation tools that anyone can use without much prior knowledge.

The first thing we decided to focus on was to make the CheckUser tool more flexible, powerful and easy to use. It is an important tool that services the need to detect and block bad actors (especially long-term abusers) on a lot of our projects. The CheckUser tool was not very well maintained for many years and as a result it appeared quite dated and lacked necessary features.

We also anticipated an uptick in the number of users who opt-in to the role of becoming a CheckUser on our projects once IP Masking goes into effect. This reinforced the need for a better, easier CheckUser experience for our users. With that in mind, the Anti-Harassment Tools team spent the past year working on improving the CheckUser tool – making it much more efficient and user-friendly. This work has also taken into account a lot of outstanding feature requests by the community. We have continually consulted with CheckUsers and stewards over the course of this project and have tried our best to deliver on their expectations. The new feature is set to go live on all projects in October 2020.

The next feature that we are working on is IP Info. We decided on this project after a round of consultation on six wikis which helped us narrow down the use cases for IP addresses on our projects. It became apparent early on that there are some critical pieces of information that IP addresses provide which need to be made available for patrollers to perform their roles effectively. The goal for IP Info, thus, is to quickly and easily surface significant information about an IP address. IP addresses provide important information such as location, organization, possibility of being a Tor/VPN node, rDNS, and listed range, to mention a few examples. By being able to show this quickly and easily, without the need for external tools that not everyone can use, we hope to make it easier for patrollers to do their job. The information provided is high-level enough that we can show it without endangering the anonymous user. At the same time, it is enough information for patrollers to be able to make quality judgements about an IP address.
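
For instance, the rDNS data mentioned above is obtained by querying a PTR record for a reversed form of the address. As a small illustration (IP Info's actual data sources are not specified here), Python's standard library can build the PTR lookup name:

```python
import ipaddress

def rdns_name(ip_string: str) -> str:
    # The PTR record name a resolver would be queried for to obtain
    # the hostname; e.g. IPv4 maps into the in-addr.arpa zone.
    # Illustrative only -- the actual network lookup is omitted.
    return ipaddress.ip_address(ip_string).reverse_pointer
```

For example, `rdns_name("192.0.2.1")` returns `1.2.0.192.in-addr.arpa`, the name that would then be resolved to a hostname such as an ISP or university domain.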

After IP Info we will be focusing on a "finding similar editors" feature. We'll be using a machine learning model, built in collaboration with CheckUsers and trained on historical CheckUser data, to compare user behavior and flag when two or more users appear to be behaving very similarly. The model will take into account which pages users are active on, their writing styles, editing times, etc., to make predictions about how similar two users are. We are doing our due diligence in making sure the model is as accurate as possible.
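
As a toy illustration only (the actual model, its features, and its training are not described in this document), comparing two editors by cosine similarity over simple feature counts, such as which pages they touched and at what hours they edited, could look like:

```python
import math
from collections import Counter

def similarity(edits_a, edits_b):
    """Toy sketch, not the Foundation's model: cosine similarity over
    page and hour-of-day features. Each `edits_*` is a list of
    (page, hour) pairs, one per edit."""
    def features(edits):
        f = Counter()
        for page, hour in edits:
            f[("page", page)] += 1   # which pages the user is active on
            f[("hour", hour)] += 1   # when the user tends to edit
        return f
    fa, fb = features(edits_a), features(edits_b)
    dot = sum(fa[k] * fb[k] for k in fa)
    norm = math.sqrt(sum(v * v for v in fa.values())) \
         * math.sqrt(sum(v * v for v in fb.values()))
    return dot / norm if norm else 0.0
```

Two editors who touch the same pages at the same hours score near 1.0; editors with no overlap score 0.0. A production model would use far richer features (writing style, revert patterns) and learned weights rather than raw counts.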

Once it’s ready, there is a lot of scope for what such a model can do. As a first step we will be launching it to help CheckUsers detect socks easily without having to perform a lot of manual labor. In the future, we can think about how we can expose this tool to more people and apply it to detect malicious sockpuppeting rings and disinformation campaigns.

You can read more and leave comments on our project page for tools.

Motivation

We who are working on this are doing this because the legal and public policy teams advised us that we should evolve the projects’ handling of IP addresses in order to keep up with current privacy standards, laws, and user expectations. That’s really the main reason.

We also think there are other compelling reasons to work on this. If someone wants to help out and doesn't understand the ramifications of their IP address being publicly stored, their desire to make the world and the wiki a better place results in inadvertently sharing their personal data with the public. This is not a new discussion: we've had it for about as long as the Wikimedia wikis have been around. An IP address can be used to find out a user's geographical location and institution and other personally identifiable information, depending on how the IP address was assigned and by whom. This can sometimes mean that an IP address can be used to pinpoint exactly who made an edit and from where, especially when the editor pool is small in a geographic area. Concerns around exposing IP addresses on our projects have been raised repeatedly by our communities, and the Wikimedia movement as a whole has been talking about how to solve this problem for at least fifteen years. Here's a (non-exhaustive) list of some of the previous discussions that have happened around this topic.

We acknowledge that this is a thorny issue, with the potential for causing disruptions in workflows we greatly respect and really don’t want to disrupt. We would only undertake this work, and spend so much time and energy on it, for very good reason. These are important issues independently, and together they have inspired this project: there’s both our own need and desire to protect those who want to contribute to the wikis, and developments in the world we live in, and the online environment in which the projects exist.

Research

 
A Wikimedia Foundation-supported report on the impact of IP masking on our communities.

The impact of IP masking

IP addresses are valuable as a semi-reliable partial identifier that users cannot easily change themselves. Depending on the internet service provider and device configuration, however, the information an IP address provides is not always reliable, and using it effectively requires considerable technical knowledge and fluency, though administrators are not currently required to demonstrate such skill. This technical information can also be used to corroborate additional information (so-called "behavioral information"), and can significantly shape the administrative action eventually taken.

On the community side, whether to allow unregistered users to edit has long been a fiercely debated topic. So far, these discussions have tended to problematize unregistered editing: they usually revolve around how to stop vandalism rather than around preserving pseudonymous editing and lowering the barrier to editing. Because unregistered users are often associated with vandalism, editors tend to be biased against them, and that bias is also reflected in the algorithms of tools such as ORES. Communication with unregistered users is also a significant problem, mainly because of the lack of notifications and because there is no guarantee that the same person will keep seeing messages sent to a given IP talk page.

As for the potential impact of hiding IP addresses: once IPs are hidden, administrators' workflows will be significantly affected, and CheckUsers' workload may increase in the short term. We expect administrators' ability to control vandalism to be seriously affected. We can mitigate this by providing equivalent or better anti-vandalism tools, but the transition from old tools to new ones will take time, during which administrators will fight vandalism less effectively than before. To equip administrators properly, we must take care to preserve, or provide alternatives to, certain functionality that currently depends on IP information:

  • The effectiveness of blocks, and estimating appropriate additional block settings
  • Ways to surface similarities or patterns among unregistered users, such as geographic similarity or a shared institution (for example, edits coming from the same high school or university)
  • The ability to target a group of unregistered users, such as a vandal hopping between IPs within a particular range
  • Location- or institution-specific actions (not necessarily blocks), such as determining whether edits come from an open proxy or an open venue like a school or library.

Depending on how we handle temporary accounts and identify unregistered users, we may be able to improve communication with them. If we hide IP addresses while continuing to let logged-out users edit, the discussions and concerns around unregistered editing, anonymous vandalism, and bias against unregistered users are unlikely to change significantly.

CheckUser workflows

While designing the new Special:Investigate tool, we talked with CheckUsers on several projects. Based on those conversations, and on working through real cases, we broke the typical CheckUser workflow into five stages:

  • Triage: assess the feasibility and complexity of a case.
  • Profiling: characterize a user's behavior patterns in order to identify the person behind multiple accounts.
  • Checking: use the CheckUser tool to examine IP addresses and user agents.
  • Judgment: compare the technical information against the behavioral profile built during the profiling stage to determine what administrative action to take.
  • Closing: report the results of the check in public and, where necessary, private venues, and archive the information appropriately for future use.

We also worked with members of the Trust and Safety team to understand the role the CheckUser tool plays in Wikimedia Foundation investigations and in the cases that team handles.

The most common and obvious pain points all revolved around the CheckUser tool's unintuitive information presentation, and the need to open up every single link in a new tab. This caused massive confusion as tab proliferation quickly got out of hand. To make matters worse, the information that CheckUser surfaces is highly technical and not easy to understand at first glance, making the tabs difficult to track. All of our interviewees said that they resorted to separate software or physical pen and paper in order to keep track of information.

We also ran some basic analyses of English Wikipedia's Sockpuppet Investigations page to get some baseline metrics on how many cases they process, how many are rejected, and how many sockpuppets a given report contains.

Patroller use of IP addresses

Previous research on patrolling on our projects has generally focused on the workload or workflow of patrollers. Most recently, the Patrolling on Wikipedia study focuses on the workflows of patrollers and identifying potential threats to current anti-vandal practices. Older studies, such as the New Page Patrol survey and the Patroller work load study, focused on English Wikipedia. They also look solely at the workload of patrollers, and more specifically on how bot patrolling tools have affected patroller workloads.

Our study tried to recruit from five target wikis, which were

  • Japanese Wikipedia
  • Dutch Wikipedia
  • German Wikipedia
  • Chinese Wikipedia
  • English Wikiquote

They were selected for known attitudes towards IP edits, percentage of monthly edits made by IPs, and any other unique or unusual circumstances faced by IP editors (namely, use of the Pending Changes feature and widespread use of proxies). Participants were recruited via open calls on Village Pumps or the local equivalent. Where possible, we also posted on Wiki Embassy pages. Unfortunately, while we had interpretation support for the interviews themselves, we did not extend translation support to the messages, which may have accounted for low response rates. All interviews were conducted via Zoom, with a note-taker in attendance.

Supporting the findings from previous studies, we did not find a systematic or unified use of IP information. Additionally, this information was only sought out after a certain threshold of suspicion. Most further investigation of suspicious user activity begins with publicly available on-wiki information, such as checking previous local edits, Global Contributions, or looking for previous bans.

Precision and accuracy were less important qualities for IP information: upon seeing that one chosen IP information site returned three different results for the geographical location of the same IP address, one of our interviewees mentioned that precision in location was not as important as consistency. That is to say, so long as an IP address was consistently exposed as being from one country, it mattered less if it was correct or precise. This fits with our understanding of how IP address information is used: as a semi-unique piece of information associated with a single device or person, that is relatively hard to spoof for the average person. The accuracy or precision of the information attached to the user is less important than the fact that it is attached and difficult to change.

Our findings highlight a few key design aspects for the IP info tool:

  • Provide at-a-glance conclusions over raw data
  • Cover key aspects of IP information:
    • Geolocation (to a city or district level where possible)
    • Registered organization
    • Connection type (high-traffic, such as data center or mobile network versus low-traffic, such as residential broadband)
    • Proxy status as binary yes or no

As an ethical point, it will be important to be able to explain how any conclusions are reached, and the inaccuracy or imprecisions inherent in pulling IP information. While this was not a major concern for the patrollers we talked to, if we are to create a tool that will be used to provide justifications for administrative action, we should be careful to make it clear what the limitations of our tools are.

FAQ

Q: Will users with advanced permissions (such as CheckUsers, administrators, and stewards) still have access to IP addresses after this project is complete?

A: We do not yet have a definitive answer to this question. Ideally, as few people as possible (including WMF staff) should have access to IP addresses. We want to limit IP address access to users who need it.

Q: How will anti-vandalism tools work without IP addresses?

A: There are several potential ideas for accomplishing this. For example, we could show those tools other information about a user instead of the IP, carrying the same informational value. We could also automatically confirm whether two user accounts share the same IP without exposing the address itself, which could be used in sockpuppet investigations. Or anti-vandalism tools could continue to use IP addresses, but with limits on how they are accessed and how much is shown. We need to work closely with the communities to find the best solution.
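
One idea mentioned above, confirming that two accounts share an IP without exposing it, could for example be done by comparing keyed hashes of the addresses. A minimal sketch, assuming a hypothetical server-side secret that is never shared with users:

```python
import hashlib
import hmac

SECRET = b"server-side secret"  # hypothetical key, held only server-side

def ip_token(ip):
    # Keyed hash of the address: safe to hand to a tool, since the
    # original address cannot be recovered without the secret.
    return hmac.new(SECRET, ip.encode(), hashlib.sha256).hexdigest()

def same_source(ip_a, ip_b):
    # True when two edits came from the same IP, without either
    # address being revealed to the person running the check.
    return hmac.compare_digest(ip_token(ip_a), ip_token(ip_b))
```

This is only an illustration of the general technique; whatever mechanism is actually built would be decided with the communities.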

Q: If we cannot see IP addresses, what will we see for edits by unregistered users?

A: Instead of an IP address, users will see a unique, automatically generated, human-readable username, for example something like "Anonymous user 12345".

Q: Will a new username be generated for every unregistered edit?

A: No. We intend to implement ways to make the generated usernames at least partially persistent, for example by associating them with a cookie, the user's IP address, or both.
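
As a purely hypothetical sketch of such persistence (the real naming scheme is undecided), a stable masked name could be derived by hashing the cookie when present, falling back to the IP:

```python
import hashlib

def masked_username(ip, cookie=None):
    """Hypothetical sketch: derive a stable 'Anonymous user NNNNN'
    name from a browser cookie if present, else from the IP address.
    The actual scheme has not been decided."""
    key = cookie if cookie is not None else ip
    digest = hashlib.sha256(key.encode()).hexdigest()
    return "Anonymous user %d" % (int(digest[:8], 16) % 100000)
```

With this sketch, repeat edits from the same browser (same cookie) keep the same masked name even if the IP changes, which is one way the "at least partially persistent" behavior could work.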

Q: Does this project include removing existing IP addresses from the wikis?

A: We will not hide IP addresses in any existing histories, logs, or signatures as part of this project. It will only affect future edits made after the project launches.

Q: Is this project the result of a specific law being passed?

A: No. Data privacy standards are evolving in many countries and regions around the world, and so are user expectations. We have always worked to protect user privacy; we will keep learning about new standards and expectations and putting them into practice. This project is the next step in our own evolution.

Q: What is the timeline for this project?

A: As stated before, we will not make any firm decisions about this project before getting community input. We hope to define some reasonable early steps so the development team can get to work soon; after that, we can begin what we expect to be a fairly long-running project, but we are not rushing to meet any particular deadline.

Q: How can I get involved?

A: If you have thoughts or feedback about this project, we would love to hear them. We are especially interested in thoughts about workflows that might be affected by it. You can comment on the talk page or fill out this form, and we will contact you. Some of our team members will also be at Wikimania, and we would be happy to meet you there.

Q: Why is this proposal so vague?

A: Because it is not really a proposal yet. We do not have a perfect plan; we are doing our best to consult with the communities. You might think of this as a technical investigation into how IP masking could be implemented.

Q: Why don't you just disable editing by IP users?

A: IP editing affects different wikis differently. After discussion, the Swedish Wikipedia community decided it wants to keep allowing IP editing. IP edits make up a larger share of edits on Japanese Wikipedia than on English Wikipedia, yet their revert rate is only about a third as high (9.5% versus 27.4%), suggesting those edits are of higher quality. We do not believe IP editing should be banned on all wikis. Research on IP editing also suggests it helps attract new editors.

Q: So who will be able to see unregistered users' IPs now?

A: We do not intend to place this burden solely on CheckUsers and stewards. We plan to create a new user right so that users who meet certain requirements can be granted access to view IP addresses, while everyone else sees only partial IP addresses. We are still discussing with the communities how best to implement this.

Q: Has this already been decided?

A: Yes, it has. The Wikimedia Foundation's Legal department has stated that it is necessary. To protect the privacy of Wikimedia users, the Foundation has accepted that advice and is discussing with the communities how best to implement it. Some Wikimedians may be unhappy about this, but this legal decision is not subject to community consensus. The communities can participate in deciding how it is implemented, and we very much need the Wikimedia communities' involvement there.

Q: Will the masking be global, applying to all Wikimedia projects, or local, applying only to a single wiki?

A: Global. A masked IP will appear the same across all Wikimedia projects.

Q: Will all IP users be unblocked when this is implemented?

A: No. That would be disruptive to the wikis. The final solution will necessarily be a compromise: we have to balance protecting privacy with protecting the wikis.

Q: Will users with the right to see unregistered users' IP addresses be able to view more than one IP address in a single action?

A: Yes. Where needed, we do not want this to become a long, tedious task. We will incorporate this into the proposal.

Statement from the Wikimedia Foundation Legal department

July 2021

First of all, we’d like to thank everyone for participating in these discussions. We appreciate the attention to detail, the careful consideration, and the time that has gone into engaging in this conversation, raising questions and concerns, and suggesting ways that the introduction of masked IPs can be successful. Today, we’d like to explain in a bit more detail how this project came about and the risks that inspired this work, answer some of the questions that have been raised so far, and briefly talk about next steps.

Background

To explain how we arrived here, we’d like to briefly look backwards. Wikipedia and its sibling projects were built to last. Sharing the sum of all knowledge isn’t something that can be done in a year, or ten years, or any of our lifetimes. But while the mission of the communities and Foundation was created for the long term, the technical and governance structures that enable that mission were very much of the time they were designed. Many of these features have endured, and thrived, as the context in which they operate has changed. Over the last 20 years, a lot has evolved: the way societies use and relate to the internet, the regulations and policies that impact how online platforms run as well as the expectations that users have for how a website will handle their data.

In the past five years in particular, users and governments have become more and more concerned about online privacy and the collection, storage, handling, and sharing of personal data. In many ways, the projects were ahead of the rest of the internet: privacy and anonymity are key to users’ ability to share and consume free knowledge. The Foundation has long collected little information about users, not required an email address for registration, and recognized that IP addresses are personal data (see, for example, the 2014–2018 version of our Privacy policy). More recently, the conversation about privacy has begun to shift, inspiring new laws and best practices: the European Union’s General Data Protection Regulation, which went into effect in May 2018, has set the tone for a global dialogue about personal data and what rights individuals should have to understand and control its use. In the last few years, data protection laws around the world have been changing—look at the range of conversations, draft bills, and new laws in, for example, Brazil, India, Japan, or the United States.

Legal risks

The Foundation’s Privacy team is consistently monitoring this conversation, assessing our practices, and planning for the future. It is our job to look at the projects of today, and evaluate how we can help prepare them to operate within the legal and societal frameworks of tomorrow. A few years ago, as part of this work, we assessed that the current system of publishing IP addresses of non-logged-in contributors should change. We believe it creates risk to users whose information is published in this way. Many do not expect it—even with the notices explaining how attribution works on the projects, the Privacy team often hears from users who have made an edit and are surprised to see their IP address on the history page. Some of them are in locations where the projects are controversial, and they worry that the exposure of their IP address may allow their government to target them. The legal frameworks that we foresaw are in operation, and the publication of these IP addresses pose real risks to the projects and users today.

We’ve heard from several of you that you want to understand more deeply what the legal risks are that inspired this project, whether the Foundation is currently facing legal action, what consequences we think might result if we do not mask IP addresses, etc. (many of these questions have been collected in the expanded list at the end of this section). We’re sorry that we can’t provide more information, since we need to keep some details of the risks privileged. “Privileged” means that a lawyer must keep something confidential, because revealing it could cause harm to their client. That’s why privilege is rarely waived; it’s a formal concept in the legal systems of multiple countries, and it exists for very practical reasons—to protect the client. Here, waiving the privilege and revealing this information could harm the projects and the Foundation. Generally, the Legal Affairs team works to be as transparent as possible; however, an important part of our legal strategy is to approach each problem on a case by case basis. If we publicly discuss privileged information about what specific arguments might be made, or what risks we think are most likely to result in litigation, that could create a road map by which someone could seek to harm the projects and the communities.

That said, we have examined this risk from several angles, taking into account the legal and policy situation in various countries around the world, as well as concerns and oversight requests from users whose IP addresses have been published, and we concluded that IP addresses of non-logged-in users should no longer be publicly visible, largely because they can be associated with a single user or device, and therefore could be used to identify and locate non-logged-in users and link them with their on-wiki activity.

Despite these concerns, we also understood that IP addresses play a major part in the protection of the projects, allowing users to fight vandalism and abuse. We knew that this was a question we’d need to tackle holistically. That’s why a working group from different parts of the Wikimedia Foundation was assembled to examine this question and make a recommendation to senior leadership. When the decision was taken to proceed with IP masking, we all understood that we needed to do this with the communities—that only by taking your observations and ideas into account would we be able to successfully move through this transition.

I want to emphasize that even when IP addresses are masked and new tools are in place to support your anti-vandalism work, this project will not simply end. It’s going to be an iterative process—we will want feedback from you as to what works and what doesn’t, so that the new tools can be improved and adapted to fit your needs.

Questions

Over the past months, you’ve had questions, and often, we’ve been unable to provide the level of detail you’re hoping for in our answers, particularly around legal issues.

What specific legal risks are you worried about?

We cannot provide details about the individual legal risks that we are evaluating. We realize it’s frustrating to ask why and simply get, “that’s privileged” as an answer. And we’re sorry that we cannot provide more specifics, but as explained above, we do need to keep the details of our risk assessment, and the potential legal issues we see on the horizon, confidential, because providing those details could help someone figure out how to harm the projects, communities, and Foundation.

There are settled answers to some questions.

Is this project proceeding?

Yes, we are moving forward with finding and executing on the best way to hide IP addresses of non-logged-in contributors, while preserving the communities’ ability to protect the projects.

Can this change be rolled out differently by location?

No. We strive to protect the privacy of all users to the same standard; this will change across the Wikimedia projects.

If other information about non-logged-in contributors is revealed (such as location, or ISP), then it doesn’t matter if the IP address is also published, right?

That’s not quite the case. In the new system, the information we make available will be general information that is not linked to an individual person or device—for example, providing a city-level location, or noting that an edit was made by someone at a particular university. While this is still information about the user, it’s less specific and individual than an IP address. So even though we are making some information available in order to assist with abuse prevention, we are doing a better job of protecting the privacy of that specific contributor.

If we tell someone their IP address will be published, isn’t that enough?

No. As mentioned above, many people have been confused to see their IP address published. Additionally, even when someone does see the notice, the Foundation has legal responsibilities to properly handle their personal data. We have concluded that we should not publish the IP addresses of non-logged-in contributors because it falls short of current privacy best practices, and because of the risks it creates, including risks to those users.

How will masking impact CC-BY-SA attribution?

IP masking will not affect CC license attribution on Wikipedia. The 3.0 license for text on the Wikimedia projects already states that attribution should include "the name of the Original Author (or pseudonym, if applicable)" (see the license at section 4c), and use of an IP masking structure rather than an IP address functions equally well as a pseudonym. IP addresses already may vary or be assigned to different people over time, so using them as a proxy for unregistered editors is not different in quality from an IP masking structure, and both satisfy the license's pseudonym structure. In addition, section 7 of our Terms of use specifies that as part of contributing to Wikipedia, editors agree that links to articles (which include article history) are a sufficient method of attribution.

And sometimes, we don’t know the answer to a question yet, because we’d like to work with you to find the solution.

What should the specific qualifications be for someone to apply for this new user right?

There will be an age limit; we have not made a definitive decision about the limit yet, but it’s likely they will need to be at least 16 years old. Additionally, they should be active, established community members in good standing. We’d like to work through what that means with you.

I see that the team preparing these changes is proposing to create a new userright for users to have access to the IP addresses behind a mask. Does Legal have an opinion on whether access to the full IP address associated with a particular username mask constitutes nonpublic personal information as defined by the Confidentiality agreement for nonpublic information, and will users seeking this new userright be required to sign the Access to nonpublic personal data policy or some version of it?
  1. If yes, then will I as a checkuser be able to discuss relationships between registered accounts and their IP addresses with holders of this new userright, as I currently do with other signatories?
  2. If no, then could someone try to explain why we are going to all this trouble for information that we don't consider nonpublic?
  3. In either case, will a checkuser be permitted to disclose connections between registered accounts and unregistered username masks?

This is a great question. The answer is partially yes. First, yes, anyone who has access to the right will need to acknowledge in some way that they are accessing this information for the purposes of fighting vandalism and abuse on the projects. We are working on how this acknowledgement will be made; the process to gain access is likely to be something less complex than signing the access to non-public personal data agreement.

As to how this would impact CUs, right now, the access to non-public personal data policy allows users with access to non-public personal data to share that data with other users who are also able to view it. So a CU can share data with other CUs in order to carry out their work. Here, we are maintaining a distinction between logged-in and logged-out users, so a CU would not be able to share IP addresses of logged-in users with users who have this new right, because users with the new right would not have access to such information.

Presuming that the CU also opts in to see IP addresses of non-logged-in users, under the current scheme, that CU would be able to share IP address information demonstrating connections between logged-in users and non-logged-in users who had been masked with other CUs who had also opted in. They could also indicate to users with the new right that they detected connections between logged-in and non-logged-in users. However, the CU could not directly share the IP addresses of the logged-in users with non-CU users who only have the new right.

Please let us know if this sounds unworkable. As mentioned above, we are figuring out the details, and want to get your feedback to make sure it works.

Next steps

Over the next few months, we will be rolling out more detailed plans and prototypes for the tools we are building or planning to build. We’ll want to get your feedback on these new tools that will help protect the projects. We’ll continue to try to answer your questions when we can, and seek your thoughts when we should arrive at the answer together. With your feedback, we can create a plan that will allow us to better protect non-logged-in editors’ personal data, while not sacrificing the protection of Wikimedia users or sites. We appreciate your ideas, your questions, and your engagement with this project.

October 2020

This statement from the Wikimedia Foundation Legal department was written on request for the talk page and comes from that context. For visibility, we wanted you to be able to read it here too.

Hello All. This is a note from the Legal Affairs team. First, we’d like to thank everyone for their thoughtful comments. Please understand that sometimes, as lawyers, we can’t publicly share all of the details of our thinking; but we read your comments and perspectives, and they’re very helpful for us in advising the Foundation.

On some occasions, we need to keep specifics of our work or our advice to the organization confidential, due to the rules of legal ethics and legal privilege that control how lawyers must handle information about the work they do. We realize that our inability to spell out precisely what we’re thinking and why we might or might not do something can be frustrating in some instances, including this one. Although we can’t always disclose the details, we can confirm that our overall goals are to do the best we can to protect the projects and the communities at the same time as we ensure that the Foundation follows applicable law.

Within the Legal Affairs team, the privacy group focuses on ensuring that the Foundation-hosted sites and our data collection and handling practices are in line with relevant law, with our own privacy-related policies, and with our privacy values. We believe that individual privacy for contributors and readers is necessary to enable the creation, sharing, and consumption of free knowledge worldwide. As part of that work, we look first at applicable law, further informed by a mosaic of user questions, concerns, and requests, public policy concerns, organizational policies, and industry best practices to help steer privacy-related work at the Foundation. We take these inputs, and we design a legal strategy for the Foundation that guides our approach to privacy and related issues. In this particular case, careful consideration of these factors has led us to this effort to mask IPs of non-logged-in editors from exposure to all visitors to the Wikimedia projects. We can’t spell out the precise details of our deliberations, or the internal discussions and analyses that lay behind this decision, for the reasons discussed above regarding legal ethics and privilege.

We want to emphasize that the specifics of how we do this are flexible; we are looking for the best way to achieve this goal in line with supporting community needs. There are several potential options on the table, and we want to make sure that we find the implementation in partnership with you. We realize that you may have more questions, and we want to be clear upfront that in this dialogue we may not be able to answer the ones that have legal aspects. Thank you to everyone who has taken the time to consider this work and provide your opinions, concerns, and ideas.