IP Editing: Privacy Enhancement and Abuse Mitigation/Improving tools


Background

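This project has two goals:

First, to protect our projects from vandalism, harassment, sockpuppetry, long-term abuse, disinformation campaigns and other disruptive behavior.
Second, to protect unregistered editors from persecution, harassment and abuse by not publishing their IP addresses.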

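Based on our conversations on the project talk page and elsewhere, we have heard about the following ways in which IP addresses are being used on our projects:

IP addresses are helpful for finding "nearby" editors, that is, users editing from the same or a nearby IP range
To look up the contribution history of an unregistered editor
To find cross-wiki contributions by IP address
To figure out whether someone is attempting to edit through a VPN or a Tor node
To look up an editor's geographic location, including inferring an affiliation such as a university, business or government agency
To verify, using the IP address, whether an editor is a known long-term abuser (LTA)
IP addresses are sometimes used in abuse filters that target specific kinds of spam
IP addresses are important for range blocks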

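These workflows are most often used to determine whether two accounts are operated by the same person (also known as sockpuppet detection). Using IP addresses to detect sockpuppets is a flawed process. IP addresses are becoming increasingly dynamic as more people and devices come online. IPv6 addresses are complex, and their ranges are hard to work out. To a newcomer, an IP address looks like a meaningless string of numbers that is hard to remember. As a result, it takes newcomers a lot of time and effort to become comfortable using IP addresses in blocks and filters.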
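To illustrate why IPv6 ranges are hard to judge by eye, here is a minimal sketch using Python's standard `ipaddress` module to test whether two addresses fall in the same network. The /64 prefix length is an assumption chosen for illustration, the function name `same_network` is hypothetical, and the addresses come from the reserved documentation range 2001:db8::/32.

```python
import ipaddress

def same_network(addr_a: str, addr_b: str, prefix_len: int) -> bool:
    """Return True if both addresses fall in the same network of the given prefix length."""
    a = ipaddress.ip_address(addr_a)
    b = ipaddress.ip_address(addr_b)
    if a.version != b.version:
        return False  # an IPv4 and an IPv6 address never share a network
    # strict=False accepts a host address and returns its enclosing network.
    net = ipaddress.ip_network(f"{addr_a}/{prefix_len}", strict=False)
    return b in net

# These two look quite different but share the same /64.
print(same_network("2001:db8:0:1::a1", "2001:db8:0:1:ffff::9", 64))  # True
# These two look almost identical but are in different /64 networks.
print(same_network("2001:db8:0:1::a1", "2001:db8:0:2::a1", 64))      # False
```

Visual similarity between two IPv6 addresses says little about whether they belong to the same range, which is part of what makes manual comparison error-prone.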

Our goal is to reduce our reliance on IP addresses by introducing new tools that use a variety of information sources to find similarities between users. In order to ultimately mask IP addresses without negatively impacting our projects, we have to make visible IP addresses redundant to the process. This is also an opportunity to build more powerful tools that will help identify bad actors.

Ideas for tools

We want to make it easier for users to obtain the information they currently look to IP addresses for in order to do their work. To that end, we are considering three new tools/features.

An example of how the IP info feature might work.

1. IP info feature

This feature is in progress. You can follow it here: IP info feature.

There are some critical pieces of information that IP addresses provide, such as location, organization, possibility of being a Tor/VPN node, rDNS, listed range etc. Currently, if an editor wants to see this information about an IP address, they would use an external tool or search engine to extract that information. We can simplify this process by exposing that information to trusted users on the wiki. In a future where IP addresses are masked, this information will continue to be displayed for masked usernames.
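As a rough illustration of the kind of record such a feature might expose, the sketch below gathers the few properties that can be derived from an address locally with Python's standard `ipaddress` module. Location, organization and VPN/Tor status would need an external data source and are deliberately left out. The `LocalIPInfo` class and `describe` function are hypothetical names, not part of any existing tool.

```python
import ipaddress
from dataclasses import dataclass

@dataclass
class LocalIPInfo:
    address: str
    version: int           # 4 or 6
    is_global: bool        # False for private, reserved or documentation ranges
    reverse_pointer: str   # the DNS name an rDNS (PTR) lookup would query

def describe(address: str) -> LocalIPInfo:
    """Collect the properties of an IP address that need no network access."""
    ip = ipaddress.ip_address(address)
    return LocalIPInfo(str(ip), ip.version, ip.is_global, ip.reverse_pointer)

# 198.51.100.0/24 is a documentation-only range, so it is not globally routable.
info = describe("198.51.100.7")
print(info.reverse_pointer)  # 7.100.51.198.in-addr.arpa
print(info.is_global)        # False
```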

One concern we have heard from the users we have talked to so far is that it is not always easy to tell whether an IP address comes from a VPN or appears on a blacklist. Blacklists are fragile: some are not updated regularly, and others can be misleading. We are interested in hearing in which scenarios it would be helpful for you to know whether an IP address comes from a VPN or appears on a blacklist, and how you look up that information right now.

Benefits:

This would eliminate the need for users to copy-paste IP addresses into external tools to extract the information they need.
We expect this will cut down on the time spent on fetching the data considerably too.
In the long run, it would help reduce our dependence on IP addresses, which are hard to understand.

Risks:

Based on the implementation, we risk exposing information about IPs to a larger group of people than just the limited set of users who are currently aware of how IP addresses operate.
Depending on what underlying service we use for getting the details about an IP, it is possible that we may not be able to have translated information, but show information in English.
There is a risk that users will mistakenly attribute an edit to the organization or school behind the IP address rather than to the individual who made it.

2. Finding similar editors

To detect sockpuppets (and unregistered users), editors have to go to great lengths to figure out if two users are the same. This involves comparing the users’ contributions, their location information, editing patterns and much more. The goal for this feature will be to simplify this process and automate some of these comparisons that can be made without manual labor.

This would be done with the help of a machine learning model that can identify accounts demonstrating a similar behavior. The model will be making predictions on incoming edits that will be surfaced to checkusers (and potentially other trusted groups) who will then be able to verify that information and take appropriate measures.

We could potentially also have a way to compare two or more given unregistered users to find similarities, including seeing if they are editing from nearby IPs or IP ranges. Another opportunity here is to allow the tool to automate some of the blocking mechanisms we use – like automatic range detection and suggesting ranges to block accordingly.
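Automatic range detection could, in its simplest form, mean computing the smallest single CIDR range that covers a set of addresses. The following is a minimal sketch under that assumption, using Python's standard `ipaddress` module. The function name `covering_range` is hypothetical, and the sample addresses come from the documentation range 192.0.2.0/24.

```python
import ipaddress

def covering_range(addresses):
    """Return the smallest single CIDR network that contains every given address."""
    ips = [ipaddress.ip_address(a) for a in addresses]
    if not ips:
        raise ValueError("need at least one address")
    if len({ip.version for ip in ips}) != 1:
        raise ValueError("cannot mix IPv4 and IPv6 addresses")
    lo, hi = min(ips), max(ips)
    # Bits in which the lowest and highest address differ must be host bits.
    host_bits = (int(lo) ^ int(hi)).bit_length()
    prefix_len = lo.max_prefixlen - host_bits
    # strict=False masks off the host bits of `lo` to get the network address.
    return ipaddress.ip_network(f"{lo}/{prefix_len}", strict=False)

print(covering_range(["192.0.2.17", "192.0.2.45", "192.0.2.30"]))  # 192.0.2.0/26
```

A covering range can be much wider than the addresses that produced it, so a suggestion like this would still need human review to estimate how many unrelated users a block would affect.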

A tool like this holds a lot of possibilities, from identifying individual bad actors to uncovering sophisticated sockpuppeting rings. But there is also a risk of exposing legitimate sock accounts whose owners want to keep their identity secret for various reasons. This makes the project a tricky one. We want to hear from you about who should be using this tool and how we can mitigate the risks.

With the help of the community, such a feature can evolve to compare the same characteristics that editors currently look at when comparing accounts by hand. Another possibility is to train a machine learning model to do this (similar to how ORES detects problematic edits).
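One simple signal that could feed such a comparison is the overlap between the sets of pages two accounts have edited. The sketch below computes the Jaccard index for that overlap. It is only an illustration of a single hypothetical signal, not the model described above, and the page titles are invented sample data.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard index: size of the intersection divided by size of the union."""
    if not a and not b:
        return 0.0  # two empty histories carry no evidence of similarity
    return len(a & b) / len(a | b)

# Invented sample data: pages edited by two accounts.
pages_a = {"Tokyo", "Osaka", "Kyoto", "Nagoya"}
pages_b = {"Tokyo", "Osaka", "Sapporo"}

print(jaccard(pages_a, pages_b))  # 0.4 (2 shared pages out of 5 distinct pages)
```

A score like this is best read as a prompt for closer inspection: editors with a shared interest in a topic will also overlap heavily without being the same person.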

Here’s one possibility for how such a feature might look in practice:

Finding similar editors with IPs
Finding similar editors with masked IPs

Benefits:

Such a tool would greatly reduce the time and effort our functionaries spend finding bad-faith actors on our projects.
This tool could also be used to find common ranges between known problem editors to make blocking IP ranges easier.

Risks:

If we use Machine Learning to detect sockpuppets, it should be very carefully monitored and checked for biases in the training data. Over-reliance on the similarity-index score should be cautioned against. It is imperative that human review be part of the process.
Easier access to information such as location can sometimes make it easier, not more difficult, to find identifiable information about someone.

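3. A database for documenting long-term abusers

Long-term abuse vandals, if they are documented at all, are documented manually on wiki pages. This documentation includes their editing behavior, the articles they edit, indicators for identifying their sockpuppets, and lists of all the IP addresses they have used. Because the records of the IP addresses used by these vandals are spread across many pages, finding the relevant information when it is needed becomes increasingly difficult, even when that information is available. Building a database that documents long-term abusers could make this more efficient.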

Such a system would facilitate easy cross-wiki search for documented vandals matching search criteria. Eventually, this could potentially be used to automatically flag users when their IPs or editing behaviors are found to match those of known long-term abusers. After the user has been flagged, an admin could take necessary action if that seems appropriate. There is an open question about whether this should be public or private or something in-between. It is possible to have permissions for different levels of use for read and write access to the database. We want to hear from you about what you think would work best and why.
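To make the idea concrete, here is a minimal sketch of how such a database might store documented ranges and flag an address that falls inside one of them. It uses Python's standard `sqlite3` and `ipaddress` modules with an in-memory database. The schema, table names, the `find_matches` helper and the sample rows are all assumptions made for illustration, and access control is not modelled.

```python
import ipaddress
import sqlite3

# Hypothetical schema: one row per documented abuser, plus the ranges tied to them.
SCHEMA = """
CREATE TABLE abuser (
    id    INTEGER PRIMARY KEY,
    label TEXT NOT NULL,
    notes TEXT
);
CREATE TABLE abuser_range (
    abuser_id INTEGER NOT NULL REFERENCES abuser(id),
    cidr      TEXT NOT NULL,
    wiki      TEXT NOT NULL
);
"""

def find_matches(conn, address):
    """Return (label, cidr, wiki) for every documented range containing the address."""
    ip = ipaddress.ip_address(address)
    rows = conn.execute(
        "SELECT a.label, r.cidr, r.wiki "
        "FROM abuser a JOIN abuser_range r ON r.abuser_id = a.id "
        "ORDER BY r.wiki"
    ).fetchall()
    matches = []
    for label, cidr, wiki in rows:
        net = ipaddress.ip_network(cidr)
        if net.version == ip.version and ip in net:
            matches.append((label, cidr, wiki))
    return matches

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Invented sample data using documentation-only address ranges.
conn.execute(
    "INSERT INTO abuser (id, label, notes) VALUES (1, 'Example LTA', 'edits film articles')"
)
conn.executemany(
    "INSERT INTO abuser_range (abuser_id, cidr, wiki) VALUES (?, ?, ?)",
    [(1, "203.0.113.0/24", "enwiki"), (1, "2001:db8:42::/48", "jawiki")],
)

print(find_matches(conn, "203.0.113.77"))  # [('Example LTA', '203.0.113.0/24', 'enwiki')]
```

Because the ranges from every wiki live in one table, a single query covers the cross-wiki case. A match here would only be a flag for an admin to review, in line with the workflow described above.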

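A mockup of how a search of the long-term abusers database might look.

Costs:

Such a database would need the community's participation to populate it with the long-term abusers that are currently known. On some wikis this could be a considerable amount of work.

Benefits: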

Cross-wiki search for documented long-term abusers would be an enormous benefit over the current system, reducing a lot of work for patrollers.
Automated flagging of potentially problematic actors based on known editing patterns and IPs would come in handy in a lot of workflows. It would allow admins to make judgements and take action based on the suggested flags.

Risks:

As we build such a system, we would have to think hard about who has access to the database data and how we can keep it secured.

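These proposals are at a very early stage. Please help us brainstorm ideas. Are there costs, benefits or risks that we have overlooked? What could be improved? We look forward to hearing your thoughts on the talk page.

Existing tools used by editors

On-wiki tools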

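CheckUser: The CheckUser feature gives users with the checkuser flag access to confidential stored data about a user, an IP address or a CIDR range. This data includes the IP addresses used by a given user, all users who have edited from a given IP address or range, all edits made from a given IP address or range, user agent strings, and X-Forwarded-For headers. It is most commonly used to detect sockpuppets.
Checkusers would be able to access users who hold 50 or more accounts under the same email address. The existence of such people was confirmed in phab:T230436 (although the task itself is unrelated). This does not directly affect IP privacy, but it could slightly soften the impact of abuse management becoming more difficult.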
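One of the lookups described above, finding all users who have edited from a given IP address or range, can be sketched over a toy log as follows. This is not how the CheckUser extension is implemented. The `LogEntry` type, the `users_in_range` function and the sample entries are hypothetical and exist only to show the shape of the query.

```python
import ipaddress
from typing import NamedTuple

class LogEntry(NamedTuple):
    user: str
    ip: str
    user_agent: str

def users_in_range(entries, cidr):
    """Return the sorted names of all users with at least one entry inside the range."""
    net = ipaddress.ip_network(cidr)
    found = set()
    for entry in entries:
        ip = ipaddress.ip_address(entry.ip)
        if ip.version == net.version and ip in net:
            found.add(entry.user)
    return sorted(found)

# Invented sample data using documentation-only addresses.
log = [
    LogEntry("ExampleUserA", "192.0.2.10", "Mozilla/5.0 (X11; Linux x86_64)"),
    LogEntry("ExampleUserB", "192.0.2.200", "Mozilla/5.0 (Windows NT 10.0)"),
    LogEntry("ExampleUserC", "198.51.100.5", "Mozilla/5.0 (X11; Linux x86_64)"),
]

print(users_in_range(log, "192.0.2.0/24"))  # ['ExampleUserA', 'ExampleUserB']
```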

Project-specific tools (including bots and scripts)

Please name the project that uses the tool, describe what it does, and link to it if possible.

External tools

ToolForge tools

Intersect contribs
WHOIS and reverse DNS
Editor interaction analyser – Analyse interactions between two or three users – activity on same pages, during the same time etc.
IPCheck: Allows you to look up information about an IP address, including whether it is a proxy, Tor node or potential VPN.
GUC – Global user contributions for any user.
Reverse DNS for a range

Third-party tools

Large-scale IP address blocks: http://www.nirsoft.net/countryip/cz.html
User agent string detection: http://www.useragentstring.com/
Nmap
Spamhaus lists and XBL (use of blacklists)
Talos – IP reputation (mainly aimed at email spam)