
Extend "Who Wrote That?" tool to more wikis

  • Problem: It is extremely cumbersome to find out who wrote a specific part of an article, to get an overview of how the current content maps to its various authors, and so on. History-search-based tools like WikiBlame reduce this from extremely to merely very cumbersome. Who Wrote That? is a tool that provides a decent experience, but it is only available on a select few large Wikipedias.
  • Proposed solution: Extend Who Wrote That? to more wikis.
  • Who would benefit: Editors who need to track down problematic (or particularly excellent) content, wiki historians, researchers, readers suspicious of the reliability of a page, etc.
  • More comments:
  • Phabricator tickets: T243711, T270490, T296590, T298007
  • Proposer: Tgr (talk) 08:18, 5 February 2023 (UTC)

Discussion

  • Personally I mainly care about huwiki, but the more, the merrier; I assume it makes more sense to do this in bigger batches, so whatever the team feels is achievable. (Eventually it would be nice to extend it to all Wikimedia wikis, except for Wikidata and Commons, which are quite large and non-text-based, so covering them would be a waste of resources. Enwiki, which is already supported, accounts for about half of all wiki content, and I imagine the resource cost for a tool like this scales superlinearly with wiki size, so covering the remaining, much smaller wikis doesn't seem like such a tall order.) --Tgr (talk) 08:18, 5 February 2023 (UTC)
    Thanks for creating this proposal! I believe we're going to address this eventually anyway (at least for a few other popular languages), but a proper proposal that hopefully does well in voting will make it much easier to prioritize, acquire funding if necessary, and so forth. If it means anything to voters, the system that powers Who Wrote That? is WikiWho (a sketch of querying it follows this thread). The algorithm works amazingly well, but it is very costly, as it essentially processes and stores data on every single mainspace revision (i.e. the full history of pages). I think adding many of the popular languages won't be a problem. Doing every single wiki (except Commons/Wikidata) is probably not going to happen anytime soon: we would need to first revise the architecture, do a proper production deployment, and go from there. The storage footprint is currently just too great (for context, the combined size of the currently supported languages is about 3.8 TB). It would probably need a dedicated team working on it for a year or more. MusikAnimal (WMF) (talk) 03:29, 6 February 2023 (UTC)
    It would be very nice to have a Docker image or a script that a user could simply git clone from a repository; it would download a dump of the selected wiki from dumps.wikimedia.org, process it, and then fetch and process new revisions via the API to keep the data in sync with the latest version (see the sync sketch after this thread). This would allow hackers from different language versions to test and develop the tool locally, and to run their own annotation servers if there is wider interest. Zache (talk) 05:34, 19 February 2023 (UTC)
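
A minimal sketch of what querying WikiWho looks like, for context on the per-revision data MusikAnimal describes. The endpoint path and response layout are assumptions based on the public WikiWho API documentation (https://wikiwho.wmflabs.org/), not on the Who Wrote That? source:

    import requests

    def latest_authorship(lang: str, title: str) -> dict:
        """Count surviving tokens per editor in an article's current text.

        Assumes the WikiWho rev_content response shape:
        {"revisions": [{"<rev_id>": {"tokens": [{"str": ..., "editor": ...}, ...]}}]}
        """
        url = f"https://wikiwho.wmflabs.org/{lang}/api/v1.0.0-beta/rev_content/{title}/"
        resp = requests.get(url, timeout=30)
        resp.raise_for_status()
        counts: dict = {}
        for rev in resp.json()["revisions"]:
            for rev_data in rev.values():
                for token in rev_data["tokens"]:
                    # "editor" is a user ID, or "0|<ip address>" for anonymous edits.
                    counts[token["editor"]] = counts.get(token["editor"], 0) + 1
        return counts

    # Example: rough authorship breakdown of the current English "Coffee" article.
    print(latest_authorship("en", "Coffee"))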
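
And a sketch of the incremental-sync half of Zache's idea. Only the MediaWiki recentchanges query is a real API; process_dump and ingest_revision are hypothetical placeholders for whatever ingestion interface a packaged WikiWho image would expose, and the dump filename is illustrative:

    import requests

    WIKI = "hu.wikipedia.org"  # example target wiki
    # Illustrative full-history dump URL; actual filenames vary per wiki and date.
    DUMP = "https://dumps.wikimedia.org/huwiki/latest/huwiki-latest-pages-meta-history.xml.bz2"

    def newer_revisions(last_seen: int) -> list:
        """List revisions newer than last_seen via the MediaWiki recentchanges API."""
        resp = requests.get(f"https://{WIKI}/w/api.php", params={
            "action": "query", "list": "recentchanges",
            "rcprop": "ids|title|timestamp", "rctype": "edit|new",
            "rclimit": "500", "format": "json",
        }, timeout=30)
        resp.raise_for_status()
        changes = resp.json()["query"]["recentchanges"]
        return [rc for rc in changes if rc["revid"] > last_seen]

    # One-time seeding, then a periodic sync loop (ingestion calls are placeholders):
    #   process_dump(DUMP)
    #   for rc in newer_revisions(last_seen):
    #       ingest_revision(rc["revid"])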

Voting