Compare PDFs Visually

August 6, 2022

Sometimes a problem seems hard, but the right insight can make it easy. If you were asked to write a program to compare two PDF files and show the differences, how hard do you think that would be? If you are [serhack], you’ll make it much easier than you might guess.

Of course, sometimes making something simple depends on making simplifying assumptions. If you are expecting a “diff-like” utility that shows insertions and deletions, that’s not what’s going on here. Instead, you’ll see an image of the PDF with changes highlighted with a red box. This is easy because the program uses available utilities to render the PDFs as images and then simply compares pixels in the resulting images, drawing red boxes over the parts that don’t match.
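To make the pixel-comparison idea concrete, here is a minimal Python sketch. It is not [serhack]’s code: it assumes the pages have already been rendered to equally sized grids of (R, G, B) tuples, it skips the rendering step entirely, and it draws a single bounding box around all differences rather than one box per changed region. The function names are made up for illustration.

```python
# Sketch only: each "page" is assumed to be an already-rendered image,
# represented as a list of rows of (R, G, B) tuples of identical size.
RED = (255, 0, 0)
WHITE, BLACK = (255, 255, 255), (0, 0, 0)


def diff_bounding_box(page_a, page_b):
    """Return (top, left, bottom, right) covering every differing pixel, or None."""
    # Rows that contain at least one mismatching pixel.
    rows = [y for y, (row_a, row_b) in enumerate(zip(page_a, page_b)) if row_a != row_b]
    if not rows:
        return None
    # Columns of the mismatching pixels within those rows.
    cols = [
        x
        for y in rows
        for x, (pixel_a, pixel_b) in enumerate(zip(page_a[y], page_b[y]))
        if pixel_a != pixel_b
    ]
    return min(rows), min(cols), max(rows), max(cols)


def highlight(page, box):
    """Return a copy of page with a red rectangle outline drawn along box."""
    out = [list(row) for row in page]
    if box is None:
        return out
    top, left, bottom, right = box
    for x in range(left, right + 1):
        out[top][x] = RED
        out[bottom][x] = RED
    for y in range(top, bottom + 1):
        out[y][left] = RED
        out[y][right] = RED
    return out


# Two tiny 6x5 "pages" that differ in two pixels.
old = [[WHITE] * 6 for _ in range(5)]
new = [list(row) for row in old]
new[1][2] = BLACK
new[3][4] = BLACK

box = diff_bounding_box(old, new)
marked = highlight(new, box)
print(box)  # (1, 2, 3, 4)
```

Because the comparison is purely positional, anything that shifts content on the page makes every shifted pixel count as a difference.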

Obviously, this is best for PDFs that have just a few changes. Inserting a paragraph, for example, makes the output pretty useless. For that, you might consider extracting the text from the PDF using something like pdftotext (which uses the same underlying library this uses to generate images) and comparing the text instead.
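As a rough illustration of the text-based alternative, the sketch below assumes the text of each PDF has already been extracted into a string by whatever tool you prefer, and then uses Python’s standard difflib module to produce a conventional diff. The sample strings and the old.pdf/new.pdf labels are placeholders, not output from any real document.

```python
import difflib


def text_diff(old_text, new_text):
    """Return unified-diff lines between two already-extracted page texts."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile="old.pdf",  # labels only; no files are read here
            tofile="new.pdf",
            lineterm="",
        )
    )


# Placeholder text standing in for what a text extractor might return.
old_text = "Title\nFirst paragraph.\nLast paragraph."
new_text = "Title\nFirst paragraph.\nAn inserted paragraph.\nLast paragraph."

diff = text_diff(old_text, new_text)
print("\n".join(diff))
```

Here an inserted paragraph shows up as a single added line, whereas the pixel comparison would flag everything below it.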

The program throws a lot of messages about missing files but seems to do the job anyway. Here is the result of comparing two versions of the Hackaday home page captured to PDF a few minutes apart:

You can see, though, that if a new article was posted and everything slid down by one, you’d have nothing but a giant red block.

It is still a clever idea. There are surprisingly few tools out there for this, although we did find a few others. There are, of course, plenty of Linux tools for manipulating PDFs. Many of them are mashups of other tools like this one is.

12 thoughts on “Compare PDFs Visually”

Greg Chabala says:

August 6, 2022 at 10:08 pm

I’ve been using DiffPDF 2.1.3.1 by Mark Summerfield (open source, and prepackaged for Debian, Ubuntu, and others). Not perfect, but pretty good for tracking down changes in huge, multi-page PDFs.

One of the problems these tools have in general is that as text changes early in a document, it may change where the page breaks, which shows up as a diff, which snowballs into a larger change at the next page break, and so on. I’ve heard one can sidestep some of that by converting PDFs into infinitely long single pages before comparing them, but I wish that were handled internally in the diff tool.

1. rpavlik says:
  
  August 7, 2022 at 5:14 am
  
  I’ve handled this differently for the OpenXR spec, for which I have set up a PDF diff internally. I have a Python script split the PDF into sections before diffing it, which does help reduce the cascading diff problem.
  
Henk says:

August 7, 2022 at 12:21 am

When I have two copies of the same page and want to ensure that there are no changes, I simply put them together and hold them against the light. Any difference will stand out. Printing single-sided helps, of course.

1. Dude says:
  
  August 7, 2022 at 3:17 am
  
  Open the pages side by side and cross your eyes. Much faster.
  
  1. reboots says:
    
    August 7, 2022 at 5:56 am
    
    This is the secret right here. Works instantly for both textual and graphical elements, on any platform, no software required. It’s like having a superpower.
    
  2. Matthias says:
    
    August 7, 2022 at 5:58 am
    
    Open both pdfs and switch repeatedly with Alt+Tab, the differences will blink; go to the next block with the key for “one screen height down” on both pdfs, this keeps the alignment. Much easier :)
    After an insertion or deletion adjust the following text with the scroll bar and proceed. May get tiring for long pdfs with different page breaks due to an insertion at the beginning, and may not work if the line width was changed. It helps to disable any window-switching animations for this.
    
Dude says:

August 7, 2022 at 3:17 am

Okay, so if the page alignment is slightly different, does it just redbox the entire thing?

Scorch says:

August 7, 2022 at 7:55 am

Here is yet another tool for comparing multipage PDFs graphically. It is generally good for making sure there were no unintended changes between PDFs (e.g. drawing revisions). But it is true that if things move to other pages or shift, tools like this are less useful.

https://www.scorchworks.com/Drawingdiff/drawingdiff.html

Phill Rogers says:

August 7, 2022 at 3:42 pm

I wrote a script which launches the specified pair of files (any type – the associated app will open) at full screen. It then toggles between them every second. This makes it easy to spot differences on the viewed page. You then scroll each down as required. It’s not brilliant but it does work with any app which opens each document in a new window.

SerHack says:

August 8, 2022 at 12:29 am

Hey! I’m the author of the tool! Thanks for the article, I suppressed the errors due to “missing file”.

Ravi says:

October 19, 2022 at 2:42 am

Amazing stuff, this is exactly what I am looking for.

Thomas Bohn says:

June 16, 2023 at 12:45 am

Fine tool, but does not detect changes in comments.
