Talk:IP Editing: Privacy Enhancement and Abuse Mitigation/IP Info feature

What is not available at this stage? And Communication Issues

Can I check whether this is an intermediate step which is "let's make it easier, and provide some consistency, by providing information about IP addresses", but which won't (of itself) hide any information?

That is, a non-admin can still use the IP to look up the address (to the degree currently possible), but won't have the information readily given to them? Or is it a non-intermediate step and the IP address will become hidden?

@PSaxena (WMF): I realise you're not the comms lead for the team, but this desperately needed to be spread further - it only just now got put onto the discussion page of the IP masking project, and that's the most linked page. I've not seen any mention on broader meta pages, let alone the larger community tech pages. Nosebagbear (talk) 00:31, 19 May 2021 (UTC)

@Nosebagbear: Your first interpretation is correct - this is an intermediate step. No IPs will be hidden. Sorry for not spreading this more widely. We are actually talking about this project currently on some other wikis (like wikidata and wikivoyage). I admit fault for not notifying about this on the main IP masking page. I will rectify this in the coming week. This project is very much under development and your opinion is welcome. -- NKohli (WMF) (talk) 20:24, 23 May 2021 (UTC)

Stack of questions from a frequent proxy blocker

Lots of questions and thoughts here.

  • What source or sources are being looked at for the "might be a proxy" determination? Speaking as one of the active folks at enwiki's WikiProject on Open Proxies, the reliability of different proxy-checking tools/websites varies quite widely. We haven't had any official support (that I know of) in the proxy-hunting department, so I'm very interested to hear what your plan is and how it compares to what we're doing day-to-day. You've said "Since this project will be Foundation-maintained, it will probably be much more reliable than some websites our users are dependent on currently." That does not comfort me. To be blunt, history suggests that a Foundation-maintained project will be delivered half-complete and then left unmaintained while the developers move on to the next shiny project.
  • "Source IP is a colo" would be another useful indicator - colos and other hosting arrangements aren't proxies, strictly speaking, but they're still useful to hide one's origins. A bit of scope creep to add a field for each aspect of interest, so might be worth just having a list of "interesting properties" like colo, open proxy, tor, etc.
  • Right now, we take a fairly proactive approach to VPNs and other anonymizing tools, including a couple of bots to hunt them (w:en:User:ST47Bot reports some, and w:en:User:Procseebot blocks some), fairly wide blocks of known webhost/colo ranges (both softblocks and hardblocks, depending on factors like "are there a lot of VPNs/proxies in their range" and "how sketchy does the colo provider seem"), that sort of thing. How will that work under this new model?
  • Being able to see the activity on a range is a very important tool, both for admins and non-admins - lets us see how much collateral damage a block would do, how often people seem to change IPs, that kind of thing. The proposed feature (and the IP privacy enhancement in general) is focused on individual IPs, which doesn't help much on dynamic ranges.
  • An idea of static-ness of an IP would be useful in determining how long to block for. See comments in the bullet above. I just noticed static/dynamic in the list of exposed data...not perfect (dynamic IPs can vary quite a bit in how "dynamic" they really are) but it's being considered at least.
  • We are going to be consolidating IPv6 /64s into single anons...right? Please.
  • Shameless plug: I actually wrote a tool with a few of these features a while ago. w:en:User:GeneralNotability/ip-ext-info. I use it routinely, so adding this kind of information, even without anonymizing IPs, would be quite useful.
GeneralNotability (talk) 00:49, 19 May 2021 (UTC)
More generally: adding this as a normal feature to Mediawiki would be quite useful. It will not, however, solve all of the problems introduced by the IP-hiding proposal. GeneralNotability (talk) 00:54, 19 May 2021 (UTC)
I mostly concur with the General here; some additional notes from my side:
  • It's pretty hard to parse this proposal because it's interwoven with IP masking. If masking wasn't a thing, it would be easier to comment on it (and I wouldn't have many demands) – but since it seems like the output of this tool will be the only thing that 99% of Wikipedia users ever see about an IP, it will have to get a lot of things right.
  • I want to point to my comment about proxy blocks and collateral in task T265845, which may be relevant to implementing this (no details here per w:en:WP:BEANS). A red/yellow/green light system (which I'm largely opposed to -- proxy checking requires lots of interpretation; the more condensed the data becomes, the more useless it is for me) would have to be based on APIs that differentiate between different types of proxies and implement Wikipedia-specific heuristics to prevent people from blocking everything that isn't in "the west" based solely on an API result.
  • Range information is crucial when making IP blocks in an anti-abuse context. For the majority of ISPs in the world, a single-IP block won't stop any determined abuser for long. Information about (ASN-)CIDR ranges would be very handy to have on-hand to track and block abusers on dynamic IPs (when IP masking comes, this would likely have to be restricted to those with access to unmasked IPs). Both the ASN CIDR and the underlying wider ranges (if applicable) should be displayed in this case – in many cases, the ASN CIDR does not accurately reflect the ranges people are actually floating around on.
  • This seems to be mostly looking at things from an antivandalism perspective, but in my experience as someone who used to do lots of antivandalism patrolling and now handles 90+ percent of ACC's proxycheck queue and a good chunk of incoming w:en:WP:WPOP reports, I can tell you that I never really looked at a WHOIS output when doing antivandalism patrolling, nor do I know many patrollers who do. Many people who hit the rollback button dozens of times each day don't know what a /64 is. The people who would probably be relying on this tool the most are checkusers, SPI clerks, admins who make lots of rangeblocks and proxy checkers. Consider actively seeking these groups out and consulting with them – these are the people who will feel the impact of IP masking (and it will be very substantial) the most, and who will benefit the most from these added features. Best, Blablubbs|talk 18:54, 21 May 2021 (UTC)
@Blablubbs I will encourage you to look at this proposal independent of IP Masking. It is not intended as the one solution to all the problems. We will be building more tools to address more of the challenges that come with IP Masking.
  • Our thinking behind condensing the data in the tool was to make it more easily understandable for people who are not really tech-savvy or don't completely understand the nitty-gritties of IPs (Spur seems to do something similar for the public IP check). I see your point about needing more detailed information and it is probably a common use case for most power users. We can possibly make an "Advanced" mode for the tool for users who need more information and expose everything we can about the IP.
  • About ASN CIDR - noted. Will see if we can incorporate that.
  • That's a good point. We have put it through a few rounds of user testing with admins and checkusers from different language communities which has helped us shape this. As we make more progress, we will invite more people to look over this and provide feedback.
Thank you for your replies, both here and on T265845. -- NKohli (WMF) (talk) 14:22, 26 May 2021 (UTC)
@GeneralNotability I am sorry about your experience with Foundation-maintained products. I admit fault for being involved in some of those in the past. Sometimes when teams move on from projects, it is not really up for them to decide what they will be working on. I am hopeful that IP Info won't meet a similar fate. This project is getting the attention it deserves from the WMF. I'll respond to your specific points below:
  • We discovered the same thing about variance in data from different IP information sites. Our long-term plan is to use multiple paid services to show a range of information. For the first iteration of the tool we are going to use the paid version of MaxMind. We picked MaxMind as it seemed to be a reliable service that is also used by other teams in the WMF. I will be happy to hear if you have suggestions about which other sites you find most reliable when you look into this data. So far I've heard about Spur being liked by some community members. We are also trying to find services that reliably cover diverse geographical areas. Translation is probably too good to hope for.
  • About "colo" - I have not seen this pop up in any IP services I have used. Is it covered under a different label?
  • Regarding proactively blocking VPNs and other anonymizing tools -- there is a lot of potential for internalizing that functionality within MediaWiki with the advent of IP Info. When we get to a stage where we can rely on IP Info data - we can use that data to flag or auto-block potential bad actors. This could show up in RecentChanges/Log/History pages. This could even potentially tap into the Notification system to alert admins.
  • Yeah, I am not quite sure yet how we could enhance this feature to work well for IP ranges. We are not planning to take away IPs completely from admins and patrollers who rely heavily on IPs on a day-to-day basis. So most people who require this information will not lose out on the ability to track IP ranges. I will make a new update on the IP Masking project page to post our plan in the near future.
  • For static versus dynamic: We get a variance % from MaxMind. We are planning to expose that better. A new mockup will be coming soon.
  • "We are going to be consolidating IPv6 /64s into single anons...right?" <--- This sounds to me like something we would need a community RfC on. Do you think most people would share this request? We could probably do this technically but I will have to consult with people who know more about code than I do.
  • This is great! We have been searching high and low for scripts people have built. How do I see it in action? I installed it in my global.js but don't see the icon I am supposed to be seeing...
Thanks for all your feedback. It is really valuable. -- NKohli (WMF) (talk) 13:47, 26 May 2021 (UTC)
NKohli (WMF) Thanks for the response, and sorry for the delay in replying - I've been on vacation the past couple weeks.
  • Colos might also be called colocation hosts, datacenters, en:VPS services, or (for some) webhosts. We generally use "colo" as a catchall term. It's a fairly wide range of services, but it boils down to somewhere that you either park your own server(s) or buy a share of someone else's servers, and as part of that the users are using the colocation host's IP block. We usually softblock them (and I believe we have bots that block them automatically), since there are some legitimate uses of those services. However, some are sketchier than others, and some of them are absolutely crawling with VPN endpoints - those get the hardblock treatment.
  • I'm all for working on automatic blocking of proxies, but emphasize Blablubbs's comment above about how the dynamic-ness (and proxy-infested-ness) varies based on ISP and part of the world. If we autoblock everything showing as a possible proxy, Indonesia's going to end up permanently blocked. This is something that will need some thinking.
  • For IPv6: yes, treating a /64 as a single IP is fairly standard practice. See, for example, enwiki's advice at en:WP:/64. There are two main reasons for this. First, /64 is the minimum IPv6 range assignment (I have seen a few cases where it looks like an ISP is assigning smaller ranges within a /64, but that's exceedingly rare, and WHOIS always says the assigned block is a /64). My understanding is that the average (western, at least) ISP delegates a /64 per residential customer, avoiding any need for NAT. Second, blocking individual IPv6 addresses almost never sticks, since computers routinely shuffle the latter half of their IP address (part of IPv6 privacy extensions, see en:IPv6#Stateless_address_autoconfiguration_(SLAAC) and en:IPv6_address#Temporary_addresses for the gory technical details). Basically: IPs are assigned randomly within the /64, they can change every $amount_of_time and/or every time the computer connects to a new network, and you don't even have to reset your router to grab a new IP. If a /64 is not considered as a unit, it is both near-impossible to assess the editor's contributions over time and trivially easy to grab a new IP to get around the block (and there's effectively zero chance that they'll be back on the first blocked IP, given that we're talking about 2^64 possible IPs).
  • First rule of demos: tell someone "go try my new tool" and it breaks. It wasn't checking links to Special:Contributions, which is what shows up in history views; that should now work. If you look at IP_Editing:_Privacy_Enhancement_and_Abuse_Mitigation/IP_Info_feature's history, you should now see a little globe icon next to the IP address that made the random comment last month.
GeneralNotability (talk) 20:35, 6 June 2021 (UTC)
Re /64s: My experience is the same as GN's. I've been told that some ("western") providers apparently do shared /64s in some regions, but they still function as /64s (similar to a shared IPv4). Some providers, often Asian mobile networks, don't seem to do /64s at all (or people hop across them so quickly that it doesn't matter), but even in those cases, bundling will generate zero collateral because the address space is so vast and the /64 is so tiny in comparison to the underlying ranges (think /32 and larger). I would consider this a purely technical matter, not something that needs community consultation. Blablubbs|talk 20:51, 6 June 2021 (UTC)
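For illustration, the /64 bundling discussed above can be sketched with Python's standard ipaddress module. This is a minimal sketch under stated assumptions, not how MediaWiki does or will implement it: the function names anon_key and group_edits are hypothetical, and the addresses come from the reserved documentation ranges.

```python
import ipaddress
from collections import defaultdict


def anon_key(address: str) -> str:
    """Return the key under which edits from this address would be grouped.

    Assumption for this sketch: IPv4 addresses are kept as-is, while
    IPv6 addresses collapse to their enclosing /64.
    """
    ip = ipaddress.ip_address(address)
    if ip.version == 6:
        # strict=False masks off the low 64 bits instead of raising an error
        return str(ipaddress.ip_network(f"{ip}/64", strict=False))
    return str(ip)


def group_edits(addresses):
    """Bucket a list of address strings by their grouping key."""
    groups = defaultdict(list)
    for addr in addresses:
        groups[anon_key(addr)].append(addr)
    return dict(groups)
```

Two addresses that differ only in the shuffled latter half, such as 2001:db8:1:2::1 and 2001:db8:1:2:aaaa:bbbb:cccc:dddd, end up under the same key, so contributions and blocks would follow the /64 rather than the individual address.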
It depends largely on the project whether proxies are blocked or not. Yes, I am aware that LTAs use proxies extensively. I also know that the Polish Wikipedia does autoblock proxies by a bot. This should be a config option.--Snævar (talk) 12:37, 7 June 2021 (UTC)

Good proxies

The IP Info tool should distinguish between different types of proxies, and allow suitably trusted users to classify them.

Wikimedia projects rely on IP blocks to constrain or at least slow down disruption from unregistered and logged-out users. So it's understandable that there are policies and countermeasures against services which make it easier for people to hop to alternate IP addresses.

But not all "proxies" allow IP hopping or have user anonymization as their primary goal. Some will responsibly set the X-Forwarded-For header to show where the user is really coming from. And, if I'm at work and accessing the web through a corporate-sanctioned, 3rd party security service, I can't easily turn that off or route myself through a different one of their datacentres.

It's increasingly common to use cloud-based security providers for protection against malware and phishing (plus they may offer additional security and performance services like CASB and global peering). This is often part of a defense-in-depth approach of browser plugin + endpoint agent + DNS filtering + content filtering on firewall + external security proxy. But for some devices, like iPads, you can't install a plugin or agent, and your only choice is an edge or external filter. Services that provide full TLS inspection include Broadcom (Symantec / BlueCoat) and Netskope. (If you're reading this and know of others, please comment below.)

If I'm behind my employer's NAT with 400 other people, is that different from my employer being behind a provider with 100's of other companies? Either way, I can't easily IP-hop. The difference is scale and the amount of collateral damage from a rangeblock. For more info, please have a look at User talk:Thsdb#Block of Netskope worldwide IP range (request for local unblock on Meta, granted) and Steward requests/Global#Global unblock for 163.116.128.0/17 (request for global unblock, on hold). I would like to see Movement policy and attitudes evolve to recognize how different providers have different characteristics, and not all are block-worthy, but that won't happen if the tools lump everything into a single VPN/proxy/Tor=yes judgement.

You'll never stop playing whack-a-mole with anonymizing services like Nord and Express – their purpose is to move around and avoid blocklists. (Even Mozilla foundation is promoting Mozilla VPN from Mullvad.) But the security services often have well-documented address ranges, e.g. [1].

Mediawiki users, both Wikimedia projects and other stakeholders, need to be able to classify and list known-good ranges in addition to known-bad ones. It's often easier to learn the former than to chase the latter.

// thsdb [formerly ThscDrb] (talk) 02:54, 29 May 2021 (UTC)
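As a rough illustration of classifying ranges by type instead of a single VPN/proxy/Tor=yes flag, the sketch below looks an address up in a hand-maintained list of labelled ranges. Everything here is an assumption made for the example: the list KNOWN_RANGES, the label names, and the function classify are hypothetical. The only range taken from this discussion is 163.116.128.0/17, the one named in the global unblock request mentioned above; the other entry is a reserved documentation range used as a placeholder.

```python
import ipaddress

# Hypothetical, hand-maintained list of (CIDR, label) pairs.
KNOWN_RANGES = [
    ("163.116.128.0/17", "security-proxy"),   # range from the unblock request above
    ("203.0.113.0/24", "anonymizing-vpn"),    # placeholder documentation range
]


def classify(address, ranges=KNOWN_RANGES):
    """Return the label of the most specific listed range containing address.

    Addresses that match no listed range are reported as "unclassified"
    rather than being assumed good or bad.
    """
    ip = ipaddress.ip_address(address)
    best = None
    for cidr, label in ranges:
        net = ipaddress.ip_network(cidr)
        if ip.version == net.version and ip in net:
            # Prefer the longest prefix so a narrow exception can
            # override a wider listed range.
            if best is None or net.prefixlen > best[0].prefixlen:
                best = (net, label)
    return best[1] if best else "unclassified"
```

Returning a label rather than a boolean leaves the decision to each project, which could softblock one label, hardblock another, and leave a third alone.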

User:NKohli (WMF), it's encouraging to see at Phab:T269760 that MaxMind might return an "isLegitimateProxy" value. I wonder about MediaWiki installations that can't or won't use Maxmind. // thsdb [formerly ThscDrb] (talk) 06:15, 2 June 2021 (UTC)
@Thsdb We're writing the code in such a way that MaxMind can be interchanged for a different service, if desired. But that would change the data that is returned, as expected. NKohli (WMF) (talk) 12:05, 7 June 2021 (UTC)
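One common way to make a data source interchangeable is to have the display code depend on a small interface rather than on a specific service. The sketch below shows that general pattern only; it is not the IP Info codebase, and every name in it (IPInfo, IPInfoProvider, StaticTableProvider, describe) is hypothetical. Fields a given provider does not supply stay None, which reflects the point that swapping services changes the data that is returned.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class IPInfo:
    address: str
    organization: Optional[str] = None
    country: Optional[str] = None
    is_proxy: Optional[bool] = None  # None means "provider does not say"


class IPInfoProvider(Protocol):
    """Anything with a matching lookup() method can act as a provider."""

    def lookup(self, address: str) -> IPInfo: ...


class StaticTableProvider:
    """Toy provider backed by an in-memory dict, standing in for a real service."""

    def __init__(self, table):
        self._table = dict(table)

    def lookup(self, address: str) -> IPInfo:
        # Unknown addresses yield an IPInfo with every optional field unset.
        return self._table.get(address, IPInfo(address=address))


def describe(provider: IPInfoProvider, address: str) -> str:
    """Format a one-line summary without knowing which provider is in use."""
    info = provider.lookup(address)
    proxy = {True: "yes", False: "no", None: "unknown"}[info.is_proxy]
    return f"{info.address}: org={info.organization or 'unknown'}, proxy={proxy}"
```

An installation that cannot or will not use a particular paid service could then supply its own class with a lookup() method, and describe() would work unchanged.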

Admin/Checkuser just to see organization/location information is excessive

This information is crucial for public transparency efforts detecting when organizations are editing. Projects such as CongressEdits can only work because they know the rough location/organization that an originating IP edited from. Restricting it to admins/checkusers would mean that public interest journalism on this topic would be dead - a very serious detrimental tradeoff! 69.172.145.94 21:22, 18 June 2021 (UTC)

Hi there. I agree with your point about increasing public transparency, but I am sure you agree with me that such transparency should not come at the cost of loss of privacy. You probably know this better than me, but CongressEdits has been marred by controversies and the bot has been suspended by Twitter due to violation of policies.
We can build tools to do what CongressEdits did but do so in a way that does not expose IP addresses to lots of people. We can look into opening up access to the data to more people as we see interest and need for that information. Thanks for your comment. -- NKohli (WMF) (talk) 00:42, 25 June 2021 (UTC)
Niharika, I'm very interested in the latter and would like to see that. With regard to "such transparency should not come at the cost of loss of privacy", I'm not sure I can agree as it stands. Neither of us, I believe, would say "sacrifice all transparency for all privacy; or sacrifice all privacy for absolute transparency". Democracy dies in darkness on the former, and the latter brings the issues that the later releases of Wikileaks caused.
However, this is neither of those - IP Masking as a whole, and thus the specifics within it, fall into the messy gray ground in the middle. IP information is one of the most broadly shared pieces of information in the modern world - we give it to every site we visit. Wikipedia is unusual in the breadth of its spread, but also in the breadth of transparency we provide...and need. In a way, it is the moral obligation we gladly accept as the price for becoming the one-stop shop for information for so much of the world. The need for transparency rises in line with the cost of its absence - the reason it's so critical for politicians applies to us as we grow.
That Legal didn't deign to participate in this critical debate on the broader topic is itself a blow to the nature of Wikimedia, but that doesn't mean it should be viewed as an either/or settled matter in the execution nuances. Nosebagbear (talk) 16:35, 25 June 2021 (UTC)