Community Tech/Improved diff comparisons

Tracked in Phabricator: Task T121469 (resolved)
Tracked in Phabricator: Task T128697 (resolved)

The Improved diff compare screen project aims to help contributors understand a diff in situations where the current design makes it impossible to tell what changed.



Definition of problem: Diffs are difficult to process when it's hard to see what changed. Just changing whitespace or moving a paragraph breaks the matching.



This is a slightly less technical overview. For more details, please see our meeting notes and the Phabricator task.

Sept 12


MaxSem (WMF)'s solution for the bad diff from the original proposal is live, fixing the issue.

The problem with that diff was that a small change was made to a very long paragraph, and the diff engine had a limit of 10,000 bytes, beyond which it would give up and mark the whole paragraph as changed. This was set as a performance limit, to keep the process of generating the diff from overloading the system. In the new version, the diff engine estimates the complexity of the diff based on the number of words changed, not on the overall size of the paragraph.
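To make the difference between the two rules concrete, here is a minimal Python sketch. It is not the diff engine's actual code: the function names, the 500-word threshold, and the prefix/suffix trimming used to estimate the number of changed words are all assumptions made for illustration. Only the 10,000-byte figure comes from the description above.

```python
MAX_PARAGRAPH_BYTES = 10_000  # the old size limit described above
MAX_CHANGED_WORDS = 500       # hypothetical threshold, for illustration only


def old_should_word_diff(before: str, after: str) -> bool:
    """Old rule (as assumed here): skip word-level diffing for big paragraphs."""
    size = max(len(before.encode("utf-8")), len(after.encode("utf-8")))
    return size <= MAX_PARAGRAPH_BYTES


def estimate_changed_words(before: str, after: str) -> int:
    """Rough estimate: words left over after trimming the shared start and end."""
    a, b = before.split(), after.split()
    start = 0
    while start < len(a) and start < len(b) and a[start] == b[start]:
        start += 1
    end = 0
    while (end < len(a) - start and end < len(b) - start
           and a[-1 - end] == b[-1 - end]):
        end += 1
    return max(len(a), len(b)) - start - end


def new_should_word_diff(before: str, after: str) -> bool:
    """New rule (as assumed here): decide by how much changed, not by size."""
    return estimate_changed_words(before, after) <= MAX_CHANGED_WORDS


# A very long paragraph with a single word changed in the middle.
words = ["word"] * 5000
before = " ".join(words)
words[2500] = "changed"
after = " ".join(words)

print(old_should_word_diff(before, after))    # False: paragraph is over 10,000 bytes
print(estimate_changed_words(before, after))  # 1
print(new_should_word_diff(before, after))    # True: only one word differs
```

Under the size-based rule the whole paragraph would be marked as changed, while the change-based rule still highlights the single edited word.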

Here's another example of a nightmare diff from Russian Wikisource that's easy to read now, thanks to this new change.

It's not likely that we'll be able to make any other improvements to the diff view this year. Now that the original issue is fixed, we're going to look at the other examples collected on Community Tech/Improved diffs to see if there are any other possibilities. But changing diffs is a high-stress venture -- Max's fix took six months of discussion and testing before it could be released.

One thing that we've learned from this first year of Community Wishlist work is that we need to be more careful about determining the scope of proposals before people vote on them. This wish started with one discrete example that could be fixed, but the concept of "improve the diff compare screen" is much larger, and could include a complete re-imagining of how diffs are displayed. It's hard to tell afterwards exactly what people were voting for, and to figure out whether we've successfully completed that request. In that broader sense, it's pretty much guaranteed that we'll fail, because there will always be more possible improvements. We're going to conduct a second Community Wishlist Survey in November-December 2016, and we're going to focus a lot more on making sure that the proposals are clearly scoped before people vote on them. We're expecting to see some more diff comparison problems as proposals for the next survey.

August 3


The WMDE Technical Wishes team is investigating a way to improve diffs for an edit that moves a paragraph from one place to another and also changes a word in the paragraph. There's interesting (somewhat technical) discussion on Phabricator ticket T138922.

Hackathon, April 3


Jon Robson has spent the Hackathon working on improved diffs, using a "unified" view that shows both drafts in the same block of text, similar to the way that diffs look on mobile. He's also working on a way to represent moving a chunk of text from one part of the page to another. You can follow the development on Phabricator ticket T121469. After the Hackathon, we'll test Jon's new draft out on a test wiki, and invite everyone in the community to try it and discuss.

March 7


Interesting discussion on T128697: Wikisource wikis are hit by the paragraph size limit harder than the Wikipedias, because they can't insert arbitrary lines to make new paragraphs -- they have to preserve the formatting of the source. Languages with non-Latin characters (Russian, CJK) are also hit harder, because individual characters take up more bytes.
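A short Python sketch shows why a byte-based limit is reached sooner in some scripts. It assumes the limit is measured in UTF-8 bytes. The sample words and the helper names are made up for illustration.

```python
LIMIT_BYTES = 10_000  # the paragraph size limit discussed on this page


def utf8_bytes_per_char(sample: str) -> float:
    """Average number of UTF-8 bytes used per character in the sample."""
    return len(sample.encode("utf-8")) / len(sample)


def chars_that_fit(sample: str, limit: int = LIMIT_BYTES) -> int:
    """Roughly how many characters of this kind fit under the byte limit."""
    return int(limit / utf8_bytes_per_char(sample))


for label, sample in [("Latin", "diff"), ("Cyrillic", "разница"), ("CJK", "差分")]:
    print(label, utf8_bytes_per_char(sample), chars_that_fit(sample))

# Latin 1.0 10000
# Cyrillic 2.0 5000
# CJK 3.0 3333
```

In other words, a Russian paragraph would hit the same limit with about half as many characters as an English one, and a CJK paragraph with about a third as many.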

March 4, 2016


MaxSem (WMF) has identified the problem that causes the bad diff from the original proposal. There's a performance-related limit based on paragraph size -- if the paragraph is bigger than 10,000 bytes, then the diff won't show the words that were changed. But that performance limit was set many years ago, and there may be ways to increase the limit without hurting performance. Max is currently investigating this on ticket T128697.

Technical discussion and background


Currently collecting examples and screenshots on Community Tech/Improved diffs.

Internal Community Tech team assessment

Support: Very high. There was unanimous support on the survey, although a couple of people wondered if this was too big for our team to handle. (They might be right, we'll see.) That being said -- the proposal itself was very vague, and there are a lot of different ways to define what needs to be improved.
Impact: Potentially high. This tool is commonly used by editors, and improvements could help many people’s workflows. However, the actual impact won't be clear until there's a concrete proposal.
Feasibility: TBD. We need to analyze what works in current diffs, and what the problems are that people would like us to solve. Some improvements may be easy while others would be major projects.
Risk: High. This wish is not well scoped yet, and there are no firm acceptance criteria. This will need considerable research and design research work, and the CT team does not have these capabilities. Changes to the diff page will also require significant community discussion before they can be approved.
Status: We really want to work on this. It's a huge, important problem, and there's no other current WMF team that would take this on. We'll have to really dive into defining the problem, including lots of input from active contributors. Wikimedia Deutschland's TCB team also has "Show changes to the section text in a move" on their list. (link in German) We can work together. We'll focus on this in the spring.

See also