I have been using this in a CI pipeline to maintain a business-critical PDF generation app in healthcare (started circa 2010, I think). Here are the RSpec helpers I'm using:
We've been using this in the Micro:bit Educational Foundation (microbit.org) to fill a gap in hardware design tooling, and get visual diffs of our schematics and gerbers during PCB design iterations. It's kinda wild that's what we ended up doing, but if you want to be sure your radio layout didn't change at all when you're making a minor revision to a different part of the board, visual diffs are perfect.
That said, next project we want to try something more integrated with EDA tools. If anyone else has followed this path, we'd love to know.
We use this tool in our team regularly to compare PDFs we obtain from third-party services that might have changed after code changes on our side. Big thanks to the author <3
I noticed this a while back with a private project of mine. The GitHub languages breakdown seems broken. Mine is a Python project with a handful of Jupyter notebooks but many, many Python files. Python files must account for about 80% of the LOC, but GitHub sees the project as 50% Jupyter.
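One likely explanation: as far as I know, GitHub's Linguist computes that breakdown from file sizes in bytes, not line counts, and notebooks with embedded outputs get large quickly. A minimal sketch of the effect, where the file names and contents are made up for illustration (this is not Linguist's actual code):

```python
from collections import defaultdict
from pathlib import PurePath
from typing import Callable, Dict


def share_by(files: Dict[str, str], measure: Callable[[str], int]) -> Dict[str, float]:
    """Percentage share per file extension under the given size measure."""
    totals: Dict[str, int] = defaultdict(int)
    for name, text in files.items():
        totals[PurePath(name).suffix] += measure(text)
    grand_total = sum(totals.values())
    return {ext: round(100 * n / grand_total, 1) for ext, n in totals.items()}


def by_bytes(text: str) -> int:
    return len(text.encode("utf-8"))


def by_lines(text: str) -> int:
    return len(text.splitlines())


# Hypothetical repository: many short Python lines, and one notebook stand-in
# with few but longer lines (as embedded cell outputs tend to produce).
files = {
    "a.py": "x = 1\n" * 80,                 # 80 lines, 480 bytes
    "nb.ipynb": ("A" * 23 + "\n") * 20,     # 20 lines, 480 bytes
}

lines_share = share_by(files, by_lines)
bytes_share = share_by(files, by_bytes)
```

With this made-up data the same tree is 80% Python by lines but only 50% by bytes, which is the kind of gap described above.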
Crazy. I'd have thought that modern multi-modal LLMs could do this, but when I tried Gemini, ChatGPT-4o and Claude they all pooped out:
- Gemini at first only diff'd the text, and then when pushed it identified the items in the images and then hallucinated the differences between the versions. It could not produce an image output.
- Claude only diff'd the text and refused to believe that there were images in the PDFs.
- ChatGPT attempted to write and execute Python code for this, which errored out.
This is definitely not a strength of multi-modal LLMs. Multi-modal capabilities are still too flaky, especially when looking at a page of a PDF, which can have multiple areas of focus.
https://gist.github.com/thbar/d1ce2afef68bf6089aeae8d9ddc05d...
The code contains git-stored reference PDFs, and the test suite regenerates them and asserts that nothing has changed.
It has helped a lot with auditing visual changes and PDF library upgrades!
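For anyone who can't open the gist, the general idea of a reference-based check can be sketched roughly like this. This is not the code from the gist (which is RSpec); it's a minimal Python sketch that assumes each page has already been rasterized into rows of RGB tuples by some external tool, and `diff_bbox` and `assert_unchanged` are hypothetical names:

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]
Page = List[List[Pixel]]          # rows of RGB pixels for one rasterized page
Box = Tuple[int, int, int, int]   # (left, top, right, bottom), inclusive


def diff_bbox(reference: Page, candidate: Page) -> Optional[Box]:
    """Return the bounding box of all differing pixels, or None if identical."""
    same_shape = len(reference) == len(candidate) and all(
        len(a) == len(b) for a, b in zip(reference, candidate)
    )
    if not same_shape:
        raise ValueError("page dimensions differ")

    changed = [
        (x, y)
        for y, (row_a, row_b) in enumerate(zip(reference, candidate))
        for x, (a, b) in enumerate(zip(row_a, row_b))
        if a != b
    ]
    if not changed:
        return None
    xs = [x for x, _ in changed]
    ys = [y for _, y in changed]
    return (min(xs), min(ys), max(xs), max(ys))


def assert_unchanged(reference: Page, candidate: Page) -> None:
    """Fail the test if the regenerated page differs from the stored reference."""
    box = diff_bbox(reference, candidate)
    assert box is None, f"page changed within {box}"
```

In a test suite you'd call `assert_unchanged` once per page, with the reference coming from the file stored in git and the candidate from the freshly generated PDF. Reporting the bounding box rather than a bare pass/fail makes it easier to see which part of the page moved.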
It shows the differences in the GUI side-by-side instead of overlaid.
The whole article is worth reading, but if you want the relevant bits, search for “I wrote a Dart script that would take a PDF of the book”.