Interesting: at first blush, it looks like it's clustered based on image similarity rather than time?
But that must be wrong; the README.md says it visualizes time.
I'd love to understand a bit more.
The README.md punts to Wikipedia on Hilbert curves (classic Wikipedia: it makes sense if you already understand it :) and to a 20-minute video on Hilbert curves, which I find hard to commit to, assuming it's unlikely to touch on movie visualization via Hilbert curves.
It's definitely hard, and not your responsibility, to explain the scientific concept.
But, I'd love to have your understanding of how the visualization is more interesting.
A Hilbert curve is a mapping between 1D and 2D space that attempts to preserve locality. Two points that are close in 2D space tend to map to two points that are close in 1D space and vice versa.
If you imagine a movie as a line along the time axis, with each frame as a pixel, there are multiple ways to create a 2D image.
A bar graph is a simple approach, but it is still essentially one-dimensional: we are only using the x-axis.
A zig-zag pattern is another approach, where you go top to bottom, left to right. But in this case the relative distances between nearby frames aren't fully preserved: two distant frames might appear together, or close frames might end up far apart, which leads to odd-looking artifacts.
A Hilbert curve is a pattern that maps a 1D line onto 2D space such that the relative distance between any two points (frames) on the line is somewhat preserved. That's why each scene appears as a clump/blob.
Here it is hard to see the movie's progression from start to end, but all frames from a scene always stay close together, which is what I was aiming for. I find it interesting that the visual aspect (color/scene) is easy to see here, but the temporal aspect isn't.
I was excited about the whole 1D-to-2D mapping aspect at the time, which led to this toy.
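The 1D-to-2D mapping described above can be sketched with the standard iterative index-to-coordinate conversion (this follows the pseudocode on the Wikipedia page the README links to; `n` is the grid side length, a power of two, and `d` is a frame's index along the curve):

```python
def d2xy(n, d):
    """Convert a distance d along the Hilbert curve to (x, y) on an n-by-n grid."""
    x = y = 0
    s = 1
    while s < n:
        rx = 1 & (d // 2)
        ry = 1 & (d ^ rx)
        if ry == 0:                      # rotate this quadrant when needed
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x                  # swap x and y
        x += s * rx
        y += s * ry
        d //= 4
        s *= 2
    return x, y

# Consecutive frame indices land on adjacent grid cells:
[d2xy(4, d) for d in range(4)]  # -> [(0, 0), (1, 0), (1, 1), (0, 1)]
```

That adjacency is the locality-preservation property: frames from the same scene, which sit next to each other on the timeline, end up next to each other in the image.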
This is really cool. I'd love to see this done for the movie "Lola rennt" (Run Lola Run), which is studied for its color symbolism throughout the film.
This is using the linked Python code (thanks for sharing, OP!). I didn't look into the details, but it takes an incredibly long time to save .png files for the source frames, despite only selecting 6,400 of them.
Reminds me of the time I applied this technique to basically a webcam stream of the northern night sky. You could immediately see if there were northern lights that night (and when) without having to scrub through the footage. I bet there are other use cases that haven't been explored yet.
This is quite nice. Not sure what the meaning is of a circle versus, say, a linear strip, but it’s very effective for showing the dominant colors over time. I’d love to generally see this for many movies across time; my understanding is most are color graded green/yellow now and it’d be nice to visually see this evolution.
I think it's something people keep rediscovering. It's a pretty fun programming problem that lets you explore lots of different domains at the same time (video processing, color theory, different coordinate systems for visualizing things) and you get a tangible "cool" piece of art at the end of your effort.
I built one of these back in the day. Part of the fun was seeing how fast I could make the pipeline. Once I realized that FFmpeg could read arbitrary byte ranges directly from S3, I went full ham on throwing machines at the problem. I could crunch through a 4-hour movie in a few seconds by distributing the scene extraction over an army of Lambdas (while staying in the free tier!). Ditto for color extraction and presentation. Lots of fun was had.
I have a cli tool I maintain that finds visually similar images.
As a fun experiment several years ago I extracted all the frames of Skyfall and all the frames of the first Harry Potter movie.
I then reconstructed Harry Potter frame by frame using the corresponding frame from Skyfall that was most visually similar.
The end result was far more indecipherable than I'd ever expected. The much darker color palette of Harry Potter led to the final result largely using frames from a single dark scene in Skyfall, with single frames often being used over and over. It was pretty disappointing given it took hours and hours to process.
Thinking about it now, there's probably a way to compensate for this. Some sort of overall palette compensation.
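One cheap version of that compensation, sketched here in pure Python over lists of (r, g, b) tuples standing in for real frame data: subtract each frame's mean color before comparing, so a globally darker film can still match on structure and relative color rather than absolute brightness.

```python
def mean_color(pixels):
    """Average (r, g, b) over a frame, given as a list of 3-tuples."""
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))

def normalize(pixels):
    """Subtract the frame's mean color, removing the overall palette bias."""
    m = mean_color(pixels)
    return [tuple(p[c] - m[c] for c in range(3)) for p in pixels]

def distance(a, b):
    """Sum of squared differences between two equal-size frames."""
    return sum((pa[c] - pb[c]) ** 2 for pa, pb in zip(a, b) for c in range(3))

# Two "frames" identical except for a uniform +100 brightness offset
# match perfectly once their mean colors are removed:
dark = [(10, 20, 30), (50, 60, 70)]
bright = [(110, 120, 130), (150, 160, 170)]
distance(normalize(dark), normalize(bright))  # -> 0.0
```

This wouldn't fix the repeated-frame problem on its own, but it would stop one dark Skyfall scene from dominating every match.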
I imagine the best-case outcome would look something like Jack Gallant's 2011 work on visual mind reading, where they trained a model on brain activity recorded while subjects watched hundreds of hours of YouTube, and then attempted to reconstruct a view of a video not in the training set by correlating the new brain activity with frames from the input... Sorry, I'm not explaining this very clearly, but naturally there's a YouTube video of the result :)
That's an interesting idea. I wonder how well the film iris/barcodes could be used to figure out which movies make the best 'palette' to recreate a given scene.
If you had a much bigger corpus, and used some semantics-aware similarity metrics (think embeddings), you could maybe end up with something actually coherent
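A minimal sketch of that idea, assuming per-frame embedding vectors already exist (the toy 2D vectors below are stand-ins for real model outputs): pick the corpus frame whose embedding has the highest cosine similarity to the query frame's embedding.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def best_match(query, corpus):
    """Index of the corpus embedding most similar to the query embedding."""
    return max(range(len(corpus)), key=lambda i: cosine(query, corpus[i]))

# Toy embeddings standing in for real per-frame vectors:
corpus = [(1.0, 0.0), (0.0, 1.0), (0.7, 0.7)]
best_match((0.9, 0.1), corpus)  # -> 0, the nearest corpus vector
```

With a big enough corpus, semantic embeddings would match on content ("two people talking in a dark room") rather than just raw pixel color, which is where the Skyfall experiment fell down.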
Probably want a `-vf fps=1/n` in there (where n = number of seconds) to trim the film down to a manageable number of frames.
(you'd ideally do some "clever" processing to combine a number of frames into a single colour strip but that's obviously more faff than just a simple `ffmpeg` call...)
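For reference, a full invocation might look like the following (file names hypothetical; the arithmetic just derives `n` from the film length and a target frame count, and the `echo` keeps this a dry run so nothing is actually transcoded):

```shell
DURATION=4800                # film length in seconds (1h20m), e.g. from ffprobe
TARGET=400                   # rough number of frames you want to end up with
N=$(( DURATION / TARGET ))   # sample one frame every N seconds

# Dry run: print the ffmpeg command instead of executing it.
echo ffmpeg -i input.mkv -vf "fps=1/$N" frames/%05d.png
```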
You could also use something like https://github.com/Breakthrough/PySceneDetect to first split the video into camera shots, and then grab a single (or average) frame per shot, leading to a cleaner result.
I have ImageMagick installed; it doesn't have a `magick` command, you run `montage` etc. directly. The problem is globbing that huge number of files, so I'm wondering if you've tested the commands.
I'm not sure the actors are mumbling their words. I think the issue stems from the way sound is decoded on people's devices. Movies used to be mixed down to stereo, and TVs have no problem producing that sound. Now that it's possible to stream, say, 7.2, the TV will tell the service to send something like that, but the TV does a poor job of outputting the center vocal channel through its two rear $3 speakers so you can hear it. The same audio track decoded with a nice cinema amp and two center speakers will generally be really clear. (Also, some TVs have a setting to boost the vocals to try and fix this.)
Again, with the dark movies. They seem to be color graded and encoded for HDR screens, and most HDR screens are really a fake type of HDR and so the movies just come out super dark. They look a lot less horrible on a really great high-end TV in a totally dark room.
https://github.com/akash-akya/hilbert-montage
Not an original idea, it was inspired by someone else at that time.
https://binvis.io/#/view/examples/elf-Linux-ARMv7-ls.bin
https://brendandawes.com/projects/cinemaredux
https://youtu.be/nsjDnYxJ0bo
Learned about this from Mary Lou Jepsen's 2013 TED talk; what a throwback.
https://www.ted.com/talks/mary_lou_jepsen_could_future_devic...
I spent some time earlier this year on creating mosaics of movie posters using other posters as tiles: https://joshmosier.com/posts/movie-posters/full-res.jpg (warning: 20mb file) Using this on each frame of a scene gave some good results with a fine enough grid even with no repeating tiles: https://youtu.be/GVHPi-FrDY4
"Red," "White," and "Blue" is a trilogy of French films made by Polish-born filmmaker Krzysztof Kieslowski. Each movie follows the color pallet.
Modern digital movies are way too sharp, in a bad way.
[0] e.g. https://stackoverflow.com/questions/35675529/using-ffmpeg-ho...
>bash: /usr/bin/montage: Argument list too long
with 114394 files for a 1h20m film.
>montage
>Version: ImageMagick 6.9.11-60 Q16 x86_64 2021-01-25
That and actors that mumble their words...