Diff Tool Knows What You Mean

September 10, 2022

We will admit to not being particularly artistic, but we do remember an art teacher telling us that sometimes it is better to draw what isn’t there instead of what’s there — a concept known as negative space. [Wilfred] makes a similar point when explaining his “fantastic diff” tool called, appropriately, difftastic. He points out that when comparing two programs, the goal isn’t so much to determine what changed, but rather what stayed the same. The more you can identify as the same, the less you have to show as a change.

The tool compares source code in a smart way, assisted by tree-sitter which has many different languages already parsed, at least well enough for this purpose. According to [Wilfred’s] post the tool supports 44 different languages ranging from bash and YAML, Verilog to VHDL, and C++ to Rust, among others.

Of course, the tool by itself is worth taking note of. But the real gems in the write-up are things like tree-sitter and a lucid description of the algorithm (borrowed from autochrome) for working out the minimal set of changes.

The code is still under development and the output is not always as clear as he would like. Still, a pretty good tool and a great write-up on the development challenges.

Although Verilog and VHDL are a start, we really want diff for schematics. Oh, and PCB layouts, don’t forget those,e either.

20 thoughts on “Diff Tool Knows What You Mean”

None says:

September 10, 2022 at 2:31 am

Interesting topic. The logical approach is nice, though the visual presentation (and color choice) of WinDiff seems a lot easier on the eyes. The dark background and colorful text is straining to the eyes. It’s much nicer to have thick colored bars of somewhat pastel colors as background color, with neutral text color.

Good color contrast, yet not overwhelming. I find people often struggle with the choice of easy to distinguish color and good design, also in scientific publication. A real issue.

Visual diff is something I will try to use more, lots of things really fail not based on the tools, but based on the time you have to get used to them and integrate them in your workflow (getting around bugs etc. and getting used to it).

After reading the article:
If you have a parser that understands the semantics of the sourcecode, you can reorganize it as well and pretty print it to make comparison easier. To make a proper structural diff tool that respects the intent as well as the original formatting is quite complex, though.

It’s also a shame Pascal/Delphi is not supported by those libraries.

Reply
1. MrSVCD says:
  
  September 10, 2022 at 3:09 am
  
  Difftastic is command line tool so you can, probably, I am not at my computer to test, change the colours and fonts in your terminal to something you like.
  
  Reply
  1. None says:
    
    September 10, 2022 at 7:59 pm
    
    Good default colors matter for accessibility, since it’s a pain to change it everywhere. This is not a matter of taste, but usability.
    
    Also people will take screenshots for articles or books or tutorials, where the colors cannot be selected.
    
    Reply
    1. Twisty Plastic says:
      
      September 12, 2022 at 9:18 am
      
      “Good default colors matter for accessibility” – In an OS or OS Distribution.
      
      “it’s a pain to change it everywhere” – Indeed! This is why to the fullest extent possible every terminal or gui application should follow the terminal or desktop environment’s color scheme settings respectively. If you are thinking about this much at the application level you are doing it wrong.
      
      Don’t ignore the user’s system-wide preferences to force upon them the theme that someone else finds most pleasant or that someone who is differently sighted finds necessary. They will have their own preferences on their own computers.
      
      “Also people will take screenshots for articles or books or tutorials, where the colors cannot be selected.” – If one wants to post or publish pictures of their screen and they are concerned that their readers might not like or have the ability to see their eclectic color choices.. Well that’s up to that person to decide their priorities and set their theme accordingly. This is NOT the application designer’s problem.
      
      Reply
      1. Twisty Plastic says:
        
        September 12, 2022 at 9:19 am
        
        If you need more information than is in the system-theme, such as additional colors for color coding then consider calculating complimentary colors of the various colors that are part of the theme to automatically create something that goes well with it.
2. Anton says:
  
  September 10, 2022 at 4:23 am
  
  This is a console application, so background choice is up to the user. For that matter so are the colors, assuming the program just uses the basic 16 colors (I’m not sure about this)
  
  Reply
  1. None says:
    
    September 10, 2022 at 8:01 pm
    
    See above. Good usability by default matters, and in a diff tool, colors are really a primary issue, and should offer good default usability.
    
    Reply
doppler says:

September 10, 2022 at 2:56 am

Another good one. For window machines. (likely orphan program now) is: http://aptdiff.findmysoft.com/
Sorry only safe place I could find a link.

Reply
Gravis says:

September 10, 2022 at 4:33 am

I love the idea behind the tool but the implementation of the tool itself is lacking as it goes through some unnecessary steps as a result of using tree-sitter. I’m sure it’s fine for minor comparisons but with project-wide parsing the delay will be quite noticeable.

Reply
Greg A says:

September 10, 2022 at 7:58 am

i guess i prefer braindead tools, this seems way overwrought to me. the problem i have with diff in real life is that sometimes for large changesets it keys on the wrong unchanged parts and winds up presenting an indecipherable mess, showing the changes for a function that changed a lot mixed in with code from a function that didn’t change at all. i could imagine this doing better at that but i also could easily imagine it being much harder to figure out what it did when it screws the pooch. real changes are a wild world!

but i have a story.

a year or two ago, i had a diff that took too long to complete. it was big files but still, i couldn’t believe diff was being janky at me. the thing i love about diff’s simplicity is that it always just works, it doesn’t get into O(n^2) on magic input.

so i did “apt update && apt install diff” and the new version didn’t have that problem because “diff going O(n^2)” really isn’t any more acceptable to the diff maintainers than it is to me. and everyone lived happily ever after.

Reply
1. Bevan says:
  
  September 10, 2022 at 5:32 pm
  
  I’m intrigued as to how a diff operation could be anything less than O(n^2).
  For each character (shortest substring) in src1 you need to eventually compare it against each character in src2, hence n*n comparisons.
  Even if you use vectorisation that doesn’t give you an nx speedup.
  
  Heck, even if you only diff a hash of each line, that’s still 2n for the raw hash generation, and then n*n for the line by line hash compares.
  
  But perhaps you mean it should not become O(n^3)
  
  Reply
  1. sweethack says:
    
    September 11, 2022 at 10:01 am
    
    You can run a rolling checksum on both file. They will automatically synchronize on similar content, so you’re left with the windows where there’s a difference. In these windows, you can run a standard diff algorithm so you don’t have a O(N^2) for the whole document, but only for the changed part (which are usually sparse). The whole algorithm run in O(n log n) time.
    
    Reply
  2. Greg A says:
    
    September 11, 2022 at 1:25 pm
    
    you don’t need to eventually compare every char in src1 with every char in src2. if diff could really understand re-ordering, i think it might devolve to that. but since it assumes that src1 and src2 are in the same order (re-orderings are represented as insertion here and deletion there), once it finds a matching sequence, it doesn’t ever compare things before it with things after it. on top of that, like sweethack said, you can use various hash/checksum approaches to compare larger blocks at once.
    
    i don’t know what the actual final complexity of the algorithm is but i use diff a lot, and sometimes on very large files, and it is not typically O(n^2). and when it was, it really was just for a short moment because of a novel feature that they hadn’t finished refining.
    
    Reply
galah says:

September 10, 2022 at 6:03 pm

When I need a schematic diff, I generate a netlist and diff that. Well, 2 netlists from the 2 schematic revisions.
Sometimes a bit of netlist reformatting helps too.

Reply
yury says:

September 11, 2022 at 2:07 pm

Thanks! Hearing about it first time but I was waiting for it 10 years… such an obvious thing for everyone who resolving merge conflict.

Reply
Jay says:

September 12, 2022 at 12:23 pm

Could resists ;)
return comparator(target, value);

Reply
1. rldkfl says:
  
  September 13, 2022 at 1:51 pm
  
  better resist next time…
  
  Reply
Yushuo Hwang says:

September 14, 2022 at 7:43 am

“from bash and YAML, Verilog to VHDL”. Does it support Verilog? I see no verilog in tree-sitter dir.

Reply
1. Al Williams says:
  
  September 14, 2022 at 7:47 am
  
  https://github.com/tree-sitter/tree-sitter-verilog
  
  Reply
  1. Yushuo Hwang says:
    
    September 14, 2022 at 9:07 am
    
    I have downloaded tree-sitter-linux-x64.gz . I have `npm i tree-sitter-verilog` . What should I do next, follow the “http://tree-sitter.github.io/tree-sitter/creating-parsers” ? I am a little bit lost.
    
    Reply

Hackaday

Diff Tool Knows What You Mean

20 thoughts on “Diff Tool Knows What You Mean”

Leave a ReplyCancel reply

Search

Never miss a hack

If you missed it

Ask Hackaday: How Much Compute Is Enough?

WheatForce: Learning From CPU Architecture Mistakes

Improving FDM Filament Drying With A Spot Of Vacuum

Spy Tech: Conflicts Bring A New Number Station

The Most Secure, Modern Computer Might Be A Mac

Our Columns

Re-Learning How To Run

Hackaday Podcast Episode 364: Clocks, Cameras, And Free Will

This Week In Security: The Supply Chain Has Problems

Sega Meganet: Online Gaming In 1990

Ask Hackaday: Using CoPilot? Are You Entertained?

20 thoughts on “Diff Tool Knows What You Mean”

Leave a ReplyCancel reply

Search

Never miss a hack

Subscribe

If you missed it

Our Columns