Sometimes a problem seems hard, but the right insight can make it easy. If you were asked to write a program to compare two PDF files and show the differences, how hard do you think that would be? If you are [serhack], you’ll make it much easier than you might guess.
Of course, sometimes making something simple depends on making simplifying assumptions. If you are expecting a “diff-like” utility that shows insertions and deletions, that’s not what’s going on here. Instead, you’ll see an image of the PDF with the changes highlighted by red boxes. This is easy because the program uses existing utilities to render the PDFs as images, then simply compares the pixels in the resulting images and draws red boxes over the parts that don’t match.
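Here is a minimal sketch of the pixel-comparison step in standard-library Python. It assumes each page has already been rendered to an image at the same resolution (for example with poppler’s `pdftoppm`) and loaded as a list of rows of grayscale values. The helpers `diff_boxes` and `draw_red_boxes` and the fixed-size tile grid are illustrative choices, not necessarily how [serhack]’s tool works internally.

```python
from typing import List, Tuple

Image = List[List[int]]          # rows of grayscale values, 0-255
Box = Tuple[int, int, int, int]  # (left, top, right, bottom); right/bottom exclusive


def diff_boxes(a: Image, b: Image, tile: int = 8, threshold: int = 0) -> List[Box]:
    """Return one box for every tile in which the two images differ.

    Assumes both pages were rendered at the same resolution, so the
    images have identical dimensions.
    """
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same dimensions")
    height = len(a)
    width = len(a[0]) if height else 0
    boxes: List[Box] = []
    for top in range(0, height, tile):
        bottom = min(top + tile, height)
        for left in range(0, width, tile):
            right = min(left + tile, width)
            # A tile counts as changed if any pixel differs by more than threshold.
            changed = any(
                abs(a[y][x] - b[y][x]) > threshold
                for y in range(top, bottom)
                for x in range(left, right)
            )
            if changed:
                boxes.append((left, top, right, bottom))
    return boxes


def draw_red_boxes(page: Image, boxes: List[Box]) -> List[List[Tuple[int, int, int]]]:
    """Convert a grayscale page to RGB and outline each box in red."""
    rgb = [[(v, v, v) for v in row] for row in page]
    for left, top, right, bottom in boxes:
        for x in range(left, right):      # top and bottom edges
            rgb[top][x] = rgb[bottom - 1][x] = (255, 0, 0)
        for y in range(top, bottom):      # left and right edges
            rgb[y][left] = rgb[y][right - 1] = (255, 0, 0)
    return rgb
```

The tile grid is what turns scattered pixel changes into boxes; it also explains the weakness discussed below, since a layout shift changes nearly every tile at once.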
Obviously, this works best for PDFs with only a few changes. Inserting a paragraph, for example, makes the output pretty useless. For that case, you might consider extracting the text from the PDFs using something like pdftotext (which uses the same underlying library this program uses to generate the images).
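A text diff is what handles the inserted-paragraph case: the new paragraph shows up as a handful of `+` lines instead of a page of red. The sketch below assumes poppler’s `pdftotext` command-line tool is installed (writing to `-` sends the text to stdout); `extract_text` and `text_diff` are hypothetical helper names, and the comparison itself is plain `difflib` from the Python standard library.

```python
import difflib
import subprocess
from typing import List


def extract_text(pdf_path: str) -> str:
    """Extract text with poppler's pdftotext; '-' means write to stdout."""
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def text_diff(old_text: str, new_text: str,
              old_name: str = "old.txt", new_name: str = "new.txt") -> List[str]:
    """Line-based unified diff of two extracted texts (empty list if identical)."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile=old_name,
            tofile=new_name,
            lineterm="",
        )
    )
```

Usage would look like `print("\n".join(text_diff(extract_text("a.pdf"), extract_text("b.pdf"))))`. You lose all layout and graphics information, so this complements the image comparison rather than replacing it.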
The program throws a lot of messages about missing files, but it seems to do the job anyway. Here is the result of comparing two versions of the Hackaday home page captured to PDF a few minutes apart:
You can see, though, that if a new article were posted and everything slid down by one position, you’d have nothing but a giant red block.
It is still a clever idea. There are surprisingly few tools out there for this, although we did find a few others. There are, of course, plenty of Linux tools for manipulating PDFs, and many of them, like this one, are mashups of other tools.
I’ve been using DiffPDF 2.1.3.1 by Mark Summerfield (open source, and prepackaged for Debian, Ubuntu, and others). It’s not perfect, but it’s pretty good for tracking down changes in huge, multi-page PDFs.
One problem these tools have in general is that when text changes early in a document, it can move the page breaks. That shows up as a diff, which snowballs into a larger change at the next page break, and so on. I’ve heard you can sidestep some of that by converting PDFs into infinitely long single pages before comparing them, but I wish the diff tool handled that internally.
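The “infinitely long single page” idea can be sketched by stacking the rendered pages vertically and then aligning pixel rows, so that an inserted block is reported as an insertion instead of shifting everything below it. The row alignment with `difflib.SequenceMatcher` is an extra assumption on top of the comment, not something any of the tools mentioned here is known to do, and exact row matching only works when unchanged content renders pixel-identically (same resolution, same page width). The helper names are hypothetical.

```python
import difflib
from typing import List, Tuple

Image = List[List[int]]  # rows of grayscale values, 0-255


def stitch_pages(pages: List[Image]) -> Image:
    """Stack rendered pages vertically into one tall 'page'."""
    tall: Image = []
    for page in pages:
        tall.extend(page)
    return tall


def changed_row_ranges(a: Image, b: Image) -> List[Tuple[str, int, int, int, int]]:
    """Align the pixel rows of two tall images and return the non-matching runs.

    Each entry is a difflib opcode: (tag, a_start, a_end, b_start, b_end).
    """
    rows_a = [tuple(row) for row in a]  # tuples are hashable, lists are not
    rows_b = [tuple(row) for row in b]
    # autojunk=False: in long sequences the default heuristic would treat
    # frequently repeated rows (such as blank ones) as junk.
    matcher = difflib.SequenceMatcher(None, rows_a, rows_b, autojunk=False)
    return [op for op in matcher.get_opcodes() if op[0] != "equal"]
```

Because the pages are stitched first, content that merely moved across a page break lines up again. Repeated page headers and footers still appear as rows in the middle of the stitched image and could be cropped before stitching.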
I’ve handled this differently for the OpenXR spec, for which I have set up an internal PDF diff: a Python script splits the PDF into sections before diffing it, which does help reduce the cascading-diff problem.
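The commenter’s script isn’t shown, so here is a hypothetical sketch of the same idea applied to extracted text. It splits the text into sections at numbered headings and diffs each section on its own, so an edit in one section can’t cascade into the next. The `HEADING` regex is a made-up, crude heuristic (a body line starting with a number would also match, and a repeated heading overwrites the earlier one); a real spec would need a pattern matched to its own heading style.

```python
import difflib
import re
from typing import Dict, List

# Hypothetical heading pattern: lines such as "1. Introduction" or "12.3.4 Fundamentals".
HEADING = re.compile(r"^\d+(\.\d+)*\.?\s+\S")


def split_sections(text: str) -> Dict[str, List[str]]:
    """Map each heading line to the lines beneath it, up to the next heading."""
    sections: Dict[str, List[str]] = {"(preamble)": []}
    current = "(preamble)"
    for line in text.splitlines():
        if HEADING.match(line):
            current = line.strip()
            sections[current] = []
        else:
            sections[current].append(line)
    return sections


def diff_by_section(old_text: str, new_text: str) -> Dict[str, List[str]]:
    """Diff each section separately; only sections that changed are returned."""
    old, new = split_sections(old_text), split_sections(new_text)
    result: Dict[str, List[str]] = {}
    # Old sections in order, followed by any sections that exist only in the new text.
    for title in list(old) + [t for t in new if t not in old]:
        diff = list(
            difflib.unified_diff(old.get(title, []), new.get(title, []), lineterm="", n=1)
        )
        if diff:
            result[title] = diff
    return result
```

Keying the sections by heading text means a renamed heading shows up as one removed section and one added section, which is usually easy to spot by eye.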
When I have two copies of the same page and want to make sure there are no changes, I simply put them together and hold them up to the light. Any difference will stand out. Printing single-sided helps, of course.
Open the pages side by side and cross your eyes. Much faster.
This is the secret right here. Works instantly for both textual and graphical elements, on any platform, no software required. It’s like having a superpower.
Open both PDFs and switch between them repeatedly with Alt+Tab; the differences will blink. To move to the next block, press the key for “one screen height down” in both PDFs, which keeps them aligned. Much easier :)
After an insertion or deletion, realign the following text with the scroll bar and proceed. This may get tiring for long PDFs whose page breaks differ because of an insertion near the beginning, and it may not work if the line width has changed. It helps to disable any window-switching animations.
Okay, so if the page alignment is slightly different, does it just red-box the entire thing?
Here is yet another tool for comparing multipage PDFs graphically. It is generally good for making sure there were no unintended changes between PDFs (e.g., drawing revisions). But it is true that if things move to another page or shift, tools like this are less useful.
https://www.scorchworks.com/Drawingdiff/drawingdiff.html
I wrote a script that launches the specified pair of files (any type; the associated app will open) at full screen and then toggles between them every second, which makes it easy to spot differences on the viewed page. You then scroll each one down as required. It’s not brilliant, but it does work with any app that opens each document in a new window.
Hey! I’m the author of the tool! Thanks for the article. I’ve suppressed the “missing file” errors.
Amazing stuff, this is exactly what I was looking for.
Fine tool, but it does not detect changes in comments.