Meta:Requests for limited adminship/Leaderboard (2)

The following discussion is preserved as an archive of a closed Meta-Wiki request. Please do not modify it.

Leaderboard (2)

Not ending before 5 March 2023 17:13 (UTC)

Hi, I want to request indefinite limited adminship on this wiki. This is for a very narrow and specific purpose, and (to my knowledge) cannot be fulfilled in any other way. If people think a different right would suit this use-case better, please let me know.

What I want to do is to create filters solely for log and testing purposes.

  • This needs to be done on meta, since the goal is to be able to log edits from various wikis. The plan is to feed this data into a neural network for testing and validation purposes, with the goal to automatically catch LTAs eventually.
  • This cannot be done on any other wiki, since the filter needs to be global. The need for production data means that the Beta Cluster cannot be used either.
  • I already have abuse filter helper, which grants view-only access to all Meta filters (and abuse filter logs globally). I hence only need to be able to create and edit filters. AFM is not appropriate for my use-case.
  • The expectation is that all filters would be log-only, since the actual work would be done with the program (code for this is private). Existing filters wouldn't be affected in any way, and the filters would not affect users either.
  • The MediaWiki API or wikitech:EventStreams cannot be used due to the lack of an equivalent for added_lines (diff is not a substitute).

--Leaderboard (talk) 17:13, 26 February 2023 (UTC)

@Leaderboard putting aside the appropriateness of adding more global logging filters at all - this seems like it is easily accomplished by you just asking to have a filter added at RFH. — xaosflux Talk 17:36, 26 February 2023 (UTC)
@Xaosflux: Asking at RFH is what I had in mind and would normally have done, but having to ask you multiple times could be annoying (and is something I'd prefer to avoid if possible), as I may need to tweak the filters (i.e., the logging conditions) many times. If there is an issue with adding global logging filters (I'm not aware of any documentation against that, but could be wrong), that's a different issue. Leaderboard (talk) 18:07, 26 February 2023 (UTC)
@Leaderboard the latter is case-by-case; any filter can impact overall performance / condition limits. — xaosflux Talk 21:21, 26 February 2023 (UTC)

For the record: Past requests [1][2][3]. Personally I don't feel comfortable supporting adminship for anyone who still has this [4] on their user page. Mentioning other users on a global user page in a personal conflict is not ok in my opinion. --Johannnes89 (talk) 17:49, 26 February 2023 (UTC)

Comment This is going outside the scope of granting admin rights or rights for abuse filters for the purposes for which we typically assign the right. I think it needs a different sort of sign-off than a review by the Meta community. There are many questions about the data, and about writing filters for the purposes of processing outside of the control of WMF. I would much prefer this to be discussed with the security team through a Phabricator ticket, with the developer community signing off, rather than starting here. So at this stage I am not inclined to support without that review. They may also be able to address the shortcomings highlighted in the alternate approaches that are considered not possible. It seems a first-principles approach is needed, rather than jumping to a solution with other consequences. — billinghurst sDrewth 22:49, 26 February 2023 (UTC)

@Billinghurst: If I understand correctly, you would rather have me do something like contact WMF Trust and Safety? It would be helpful if you could point me to the right place on Phabricator for this type of request then. Leaderboard (talk) 04:27, 27 February 2023 (UTC)
Not Trust and Safety, as it is not a people issue. I see it as a technical issue for discussion, and I am not as fully in the loop as I used to be on which group is the right group. I believe it sits with whatever the security team is currently called. Cite your problem in the Phabricator ticket, along with the sort of output that you are looking to generate, and let the technical bunnies work out the direction. The WMF bugmeister should be able to direct you, or people with dual hats like MusikAnimal, Whatamidoing and Martin Urbanec can often guide us through the WMF mechanics as needed. — billinghurst sDrewth 05:14, 27 February 2023 (UTC)

Hi @Leaderboard:, let me ensure I understand your goal/needs here correctly. You would like to have abuselog-level detailed information about all (most?) edits, and you intend to achieve that goal by entering a new log-only global abuse filter, which would match all/most edits, effectively giving you access to detailed information about all/most edits via the abuse log (where you're most interested in the added_lines abuse filter variable). If my understanding is correct, I'm opposed to that solution, for the following reasons:

  • AbuseFilter stores the detailed information it generates permanently, so that Special:AbuseLog can retrieve it at any time. This is fine for regular abuse filters, as those never match a significant number of edits. A global abuse filter that intentionally matches the majority of edits would increase the amount of stored information by several orders of magnitude (to put this into perspective: in January 2023, 2,037 Wikidata edit attempts hit a global filter; in the same month, Wikidata had nearly 24 million total edits).
  • To me, this approach seems like trying to fit a huge circle into a cube. It is theoretically possible (with a lot of wasted space), but there are better solutions, such as using a circular storage space. In other words: There are a lot of potential changes in AbuseFilter itself which could break the intended usage (for example, changing how filter throttling works, in a way that'd stop the catch-all filter).
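As a rough illustration of the scale mentioned above, the two Wikidata figures for January 2023 can be compared directly. This is only a back-of-the-envelope sketch: it assumes a catch-all filter would log every edit, and it rounds "nearly 24 million" up to 24,000,000.

```python
import math

# Figures quoted in the comment above (Wikidata, January 2023).
global_filter_hits = 2_037      # edit attempts that hit a global filter
total_edits = 24_000_000        # assumption: "nearly 24 million", rounded up

# How many times more log entries a catch-all filter would produce,
# assuming (hypothetically) that it matched every edit.
ratio = total_edits / global_filter_hits
orders_of_magnitude = math.log10(ratio)

print(f"{ratio:,.0f}x more entries, about {orders_of_magnitude:.1f} orders of magnitude")
```

Under these assumptions the increase for Wikidata alone comes out at roughly four orders of magnitude.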

In my opinion, there are other solutions that would provide you with the same information:

  1. Generating the data you need yourself. Especially if added_lines is the only information you need, this sounds reasonably easy to do. To get a stream of added_lines, you could store the wikitext for all articles, subscribe to recent changes via EventStreams and calculate diffs on the fly for each edit; for a small-to-mid-size wiki (like most global abuse filter clients), the storage consumed by the wikitext should be bearable. If you want bulk data access, I'd suggest exploring Wikimedia Enterprise's hourly diffs (freely accessible via Wikimedia Toolforge). Those diffs could work well; they have wikitext/HTML for each edit that happened in that hour; if you download enough of the hourly diffs, you should be able to generate diffs for a sufficient number of edits.
  2. Requesting that a new public schema be added to EventStreams, which would stream AbuseFilter-generated data for all edits. It could be called something like mediawiki.abusefilter-data, providing real-time access to AbuseFilter-generated information (similar to mediawiki.revision-score, which provides ORES scores). We'd need to double-check feasibility here (considering some wikis restrict availability of Special:AbuseLog). However, I think that so long as only AF data is included (without any information about filter hits), it would be fine.
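The first option above (keep the last known wikitext per page and diff each new revision against it) could be sketched roughly as follows. This is a minimal illustration, not AbuseFilter's own algorithm: it uses Python's standard difflib, so the result may differ from the real added_lines variable, and the names added_lines, process_edit and the in-memory cache are hypothetical. Fetching revisions from EventStreams or the API is deliberately left out.

```python
import difflib


def added_lines(old_text: str, new_text: str) -> list[str]:
    """Return the lines that difflib reports as inserted or replaced in new_text."""
    old = old_text.splitlines()
    new = new_text.splitlines()
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    added = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag in ("insert", "replace"):
            added.extend(new[j1:j2])
    return added


def process_edit(cache: dict, page_key: tuple, new_text: str) -> list[str]:
    """Diff new_text against the cached wikitext for page_key, then update the cache.

    cache is a plain dict here (an assumption); a real setup would need
    persistent storage sized for the wiki in question.
    """
    old_text = cache.get(page_key, "")
    cache[page_key] = new_text
    return added_lines(old_text, new_text)


# Usage: two successive edits to the same (hypothetical) page.
cache = {}
key = ("testwiki", "Sandbox")
first = process_edit(cache, key, "Intro\nBody")
second = process_edit(cache, key, "Intro\nNew paragraph\nBody")
```

Keying the cache by wiki and title keeps pages from different wikis apart, which matters if one process follows several wikis at once.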

As Billinghurst said, it's best to file a task here, exploring potential solutions to get you the kind of data you need. Hope this helps!

Sincerely, --Martin Urbanec (talk) 12:56, 27 February 2023 (UTC)

@Leaderboard, can you tell me why you want to do this? For example, are you trying to detect edits that add new paragraphs to articles? Whatamidoing (WMF) (talk) 18:31, 27 February 2023 (UTC)
@Whatamidoing (WMF) and Martin Urbanec:, I would prefer to discuss the specifics off-wiki - would a private Phabricator task be a better choice? Leaderboard (talk) 04:50, 28 February 2023 (UTC)
@Leaderboard Sure. Feel free to create one. Martin Urbanec (talk) 13:56, 28 February 2023 (UTC)
Or send me an e-mail message, and I can add Martin to my reply? Whatamidoing (WMF) (talk) 17:49, 28 February 2023 (UTC)
I can file a task tomorrow; please hold or withdraw this request as appropriate in the meantime. Leaderboard (talk) 17:18, 2 March 2023 (UTC)
@Leaderboard I'm marking this as withdrawn for you. As noted above, the core of this discussion has mostly shifted to whether the action you want to take with this access is appropriate for any admin to do at all. Please add me to your ticket as well. You can always reintroduce this access request in the future. — xaosflux Talk 11:39, 3 March 2023 (UTC)

Request withdrawn per above. — xaosflux Talk 11:39, 3 March 2023 (UTC)

The above request page is preserved as an archive. Please do not modify it. Comments about this page should be made in Meta:Babel or Meta:Requests for help from a sysop or bureaucrat.