Community Wishlist Survey 2022/Larger suggestions/Searching of edit summary

Searching of edit summary

  • Problem: There should be some way to search all edits with a specific edit summary. In Wikidata this can also be used to track a series of edits (such as an edit group).
  • Proposed solution: Create functionality to search edit summaries via SQL (though this may not scale for large wikis like Wikidata), or search them via ElasticSearch.
  • Who would benefit:
  • More comments:
  • Phabricator tickets: phab:T60698
  • Proposer: GZWDer (talk) 20:53, 10 January 2022 (UTC)

Discussion

I think it's already possible to do this with Quarry (e.g. simple request). — putnik 22:16, 10 January 2022 (UTC)
That doesn't really scale well though, especially if you need to do a full-text search (instead of an exact field match) or if you want some fuzzy matching. A query with a LIKE predicate already easily takes over 2 minutes on English Wikipedia for a partial search. —TheDJ (talk • contribs) 13:17, 11 January 2022 (UTC)
Edit summary search is a partial solution but it only works if you know the user name and is not easily discoverable nor well integrated into the wikis. Certes (talk) 18:35, 11 January 2022 (UTC)
This is OK, but we have to rely on things like custom edit summaries and log reasons. Any edit summary that is infrequently used can't be searched this way. We can search tags and the edit filter log, but it's hard to implement. Thingofme (talk) 08:34, 21 January 2022 (UTC)
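To illustrate the scaling concern raised above about LIKE predicates, here is a minimal sketch using Python's built-in sqlite3 module. The table is a toy stand-in, not the production schema or the Quarry replicas, and the sample rows are made up. It only shows the general pattern: an exact match can use an index lookup, while a pattern with a leading wildcard has to scan every row.

```python
import sqlite3

# Toy in-memory table; names and rows are assumptions for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comment (comment_id INTEGER PRIMARY KEY, comment_text TEXT)"
)
conn.execute("CREATE INDEX comment_text_idx ON comment (comment_text)")
conn.executemany(
    "INSERT INTO comment (comment_text) VALUES (?)",
    [("fix typo",), ("revert vandalism",), ("batch import #42: fix typo in label",)],
)


def plan(sql, params):
    """Return SQLite's query plan for a statement as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " ".join(row[-1] for row in rows)


exact_sql = "SELECT comment_id FROM comment WHERE comment_text = ?"
partial_sql = "SELECT comment_id FROM comment WHERE comment_text LIKE ?"

# Exact match: the planner can look the value up in the index.
exact_plan = plan(exact_sql, ("fix typo",))
# Leading wildcard: no index lookup is possible, so every row is visited.
partial_plan = plan(partial_sql, ("%fix typo%",))

exact_ids = sorted(r[0] for r in conn.execute(exact_sql, ("fix typo",)))
partial_ids = sorted(r[0] for r in conn.execute(partial_sql, ("%fix typo%",)))

print(exact_plan)
print(partial_plan)
print(exact_ids, partial_ids)
```

On three rows the difference is invisible, but the scan grows linearly with the number of stored summaries, which is why a partial search becomes slow on a large wiki.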
  • Wikiget (CLI) has this feature, similar to Edit summary search, for specific users and times. The search can be a regex and can either include or exclude matches. For example, to show all edits by Jimbo on 9/11 where the edit comment started with 'A': wikiget -u "Jimbo Wales" -s 20010911 -e 20010911 -i "^A". -- GreenC (talk) 19:40, 12 January 2022 (UTC)
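The kind of per-user, per-date regex filtering described in the comment above can be sketched in a few lines of Python. This is not Wikiget's implementation; the `Edit` record, the `filter_edits` helper, and the sample data are all hypothetical, and a real tool would fetch the user's contributions from the wiki first.

```python
import re
from dataclasses import dataclass


@dataclass
class Edit:
    user: str
    timestamp: str  # assumed format: YYYYMMDDHHMMSS
    comment: str


def filter_edits(edits, user, start, end, include=None, exclude=None):
    """Keep edits by `user` between days `start` and `end` (YYYYMMDD, inclusive)
    whose comment matches `include` and does not match `exclude`."""
    inc = re.compile(include) if include else None
    exc = re.compile(exclude) if exclude else None
    result = []
    for edit in edits:
        day = edit.timestamp[:8]
        if edit.user != user or not (start <= day <= end):
            continue
        if inc and not inc.search(edit.comment):
            continue
        if exc and exc.search(edit.comment):
            continue
        result.append(edit)
    return result


# Made-up sample data.
edits = [
    Edit("Jimbo Wales", "20010911120000", "Added a paragraph"),
    Edit("Jimbo Wales", "20010911130000", "fixed link"),
    Edit("Jimbo Wales", "20010912090000", "Another edit"),
    Edit("Someone else", "20010911140000", "A different user"),
]

matches = filter_edits(edits, "Jimbo Wales", "20010911", "20010911", include="^A")
print([e.comment for e in matches])
```

Because the filter runs over one user's edits within a date range, it stays cheap; the hard part of the proposal is doing the same thing across all users at once.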
  • @GZWDer: I'm going to go ahead and say this is something that we probably can't put into production MediaWiki because of the reasons TheDJ mentions above. This is precisely something that is better served as an external tool, where we are allowed to have queries that run over 30 seconds. Sigma's edit summary search tool exists, but it doesn't have localization and doesn't search log summaries. So I think what Community Tech can offer is to build this feature into XTools, along with support for log entries. Would that satisfy your wish? MusikAnimal (WMF) (talk) 17:48, 17 January 2022 (UTC)
    • @MusikAnimal (WMF): I said that ElasticSearch may help.--GZWDer (talk) 17:50, 17 January 2022 (UTC)
      Pinging @EBernhardson (WMF) from the Search Platform team for input. Do you think this is something we could do? There are many more edits than there are articles, and compared to normal search, searching edit summaries is probably not a feature that would be used very much. I don't know if the storage/infrastructure needs of ElasticSearch would be warranted here, but it would certainly help with speed and make a production deployment of this feature more feasible. Thanks for any insight you can provide, MusikAnimal (WMF) (talk) 18:19, 17 January 2022 (UTC)
      The average number of edits per page is about 1 to 20, but most edit summaries are much shorter than page contents. This holds even when all logs are also indexed (most logs have no summary).--GZWDer (talk) 20:03, 17 January 2022 (UTC)
      As an underlying technology, Elasticsearch is probably a reasonable way to do this kind of search. CirrusSearch, the extension that integrates Elasticsearch with MediaWiki search, on the other hand doesn't really have anything to handle this.
      The difficulty is that wiki search is page-based, not revision-based. The only place within the existing search system to put these would be to generate a property per page with the full history of edit messages. This is problematic as we have to generate these a few hundred times a second (edits, re-renders due to templates, re-renders due to age, etc.). Due to the way CirrusSearch and Elasticsearch work together it is not possible to only provide the new edit summary to append (or at least, not in a reliable way).
      One method of doing this appropriately would mean creating new indices that have a search document for every revision ever created by MediaWiki, and processes to maintain those indices going forward. I worry that this will then run into numerous complications integrating with MediaWiki spam prevention. Today Search has a very easy role in spam prevention: we only index the latest content and we trust editors to correct spam in the live pages. Anything revision-based will need to integrate with the spam prevention tools and ensure content is suppressed.
      It would be a bit of an undertaking, and it would have ongoing maintenance costs, but in summary I think elasticsearch could do this given enough investment.
      EBernhardson (WMF) (talk) 17:35, 18 January 2022 (UTC)
      Moving to "Larger suggestions" per above. This certainly isn't something our team could take on. But I will probably still add an edit/log summary search into XTools at some point, so look forward to that :) MusikAnimal (WMF) (talk) 03:19, 28 January 2022 (UTC)
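The revision-based index discussed above, with one search document per revision and a hook for suppression, can be sketched as a toy in-memory inverted index. This is only an illustration of the idea under simplifying assumptions; it is not how CirrusSearch or Elasticsearch work internally, and the class name, method names, and revision IDs are hypothetical.

```python
from collections import defaultdict


class RevisionSummaryIndex:
    """Toy inverted index with one entry per revision (not per page)."""

    def __init__(self):
        self._postings = defaultdict(set)  # token -> set of revision IDs
        self._summaries = {}  # revision ID -> original summary

    def add_revision(self, rev_id, summary):
        # Each new revision is indexed on its own, so no per-page
        # history has to be rebuilt when a page is edited.
        self._summaries[rev_id] = summary
        for token in summary.lower().split():
            self._postings[token].add(rev_id)

    def suppress(self, rev_id):
        # Suppressed content must also disappear from the index,
        # which is the integration cost mentioned in the discussion.
        summary = self._summaries.pop(rev_id, None)
        if summary is None:
            return
        for token in summary.lower().split():
            self._postings[token].discard(rev_id)

    def search(self, query):
        # Return revision IDs whose summary contains every query token.
        tokens = query.lower().split()
        if not tokens:
            return []
        hits = set.intersection(*(self._postings.get(t, set()) for t in tokens))
        return sorted(hits)


index = RevisionSummaryIndex()
index.add_revision(101, "Fix typo in infobox")
index.add_revision(102, "Revert vandalism")
index.add_revision(103, "fix another typo")

before = index.search("fix typo")
index.suppress(103)
after = index.search("fix typo")
print(before, after)
```

Even in this sketch, every suppression needs a matching index update, which hints at the ongoing maintenance cost of keeping a revision-level index consistent with the wiki.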

Voting