Here's a stab at some lossy compression of the Bad Apple animation-- clocking in at 25 seconds long without even touching the sprite sheet.
The compression algorithm finds the best possible dictionary of 32 different 8x8 tiles with which to represent all of the frames of the animation. Then for each 8x8 chunk of each frame that is being compressed, I find the closest possible match out of this dictionary and record the index of this chunk. Finally, I use run length encoding to compress the string of dictionary indexes for each frame. (That's the idea at least...)
That gets things down to around 54 bytes per frame while still being recognizable.
This is inspired by gamax92 (https://www.lexaloffle.com/bbs/?tid=3263) and this C64 version (https://www.youtube.com/watch?v=OsDy-4L6-tQ)
-ElectricGryphon
Really cool! That animation looks excellent. How are you choosing tiles -- a straight-up greedy approach, something optimization-based, or another approach entirely?
It seems like your objective only looks at pixel differences. I understand why, but it seems to result in a lot of high-frequency error, in particular at sprite boundaries. Is there a simple error term that would penalize introducing sharp corners not present in the original animation?
Firstly, wow! Congrats on achieving this so far with the PICO-8.
Secondly, HOW THE HECK was this a 4 MIN demo on the C64?!?! It's INCREDIBLE!!
I didn't know this demo existed, so thanks for bringing it to my attention.
All the best! :D
Reminds me of that Amiga demo with the dancers - State of the Art :D
original touhou video: https://www.youtube.com/watch?v=UkgK8eUdpAo
the compression happens outside pico-8 right?
Yep. I have a python script that runs the compression. I can share this if there is interest.
Yeah, I'd love to see your compression script. I also took a shot at compression (using some very heuristic optimization) and think my results ended up looking noticeably worse. In particular, my dictionary sprites are quite a bit more "geometric" -- basically capturing a few different angles and offsets of edges -- across a range of different optimization parameters (quantization of input images, L1 vs L2 error, etc.). Your sprites have more curved shapes.
My optimization technique was, roughly, to cluster tiles into sprites, then iterate optimizing over sprite assignments and sprite contents until convergence. Then I'd take a pass looking for sprites with low utility (basically, sprites that could be deleted without increasing the error much) and replace them, stopping once no sprite could easily be deleted.
A few things I didn't try:
- injecting random elements into the iterative sprite contents/assignment optimization to avoid local minima
- diffusing error over multiple pixels (wouldn't easily work over tile boundaries, but could work within tiles)
- transforming input images into a linear color space before optimizing
Is there a good way to find a truly optimal dictionary without a license to a commercial MILP/MIQP solver?
For example (these are pretty similar):
L1 error: https://i.imgur.com/sN7tPmg.gif
L2 error: https://i.imgur.com/EwDnU4W.gif
Here's a link to the compression script:
https://github.com/electricgryphon/pico_dictionary_compress
The script gathers all of the unique 8x8 blocks of pixels from all of the images being compressed. Then it calculates a score for each block based on how close it is to all of the 8x8 tiles in all of the frames. (For example, all white is close to a lot more tiles in the image than a small circle would be.)
I sort the 8x8 blocks based on this score and throw out all but the 32 highest scoring blocks. This creates the "dictionary" for the compression.
Another way to optimize this would be to alter the dictionary over the course of the animation so that it always best represented the current region of the animation. For example, a dictionary size of 32 can do a spot on compression of the first 10 frames (which share a number of features), but gets less accurate when we try to cover longer time periods.
Thanks for posting that! I'm also basically doing k-means, but with two major differences:
- my code keeps sprites as integers throughout, yours rounds sprites at the end
- my code tries to replace small/low-value clusters with "better" ones
When I eliminate these two things from my code the numerical error gets slightly worse, but the visual quality, to my eye, gets a bit better (and looks qualitatively more like your result).
Why exactly are you reducing the VQ table to 32 entries? Is it just because you're storing exactly 5 bits per tile? Or does it let you get longer runs in areas that are, e.g., almost solid colors?
(I assume it's not because the table itself gets too large, since I also assume it's only 8 bytes per tile.)
32 characters is the largest power of 2 that also can be encoded using only alphanumerics without needing to use punctuation, as there's only 26 + 26 + 10 = 62 characters.
So it's common when having to string-encode things to base them around 5-bit numbers.
Is it possible to use a similar method for color video compression?
Why would you be reluctant to use anything other than alphanumeric? At the very least, @ precedes 'A' in ascii and backtick precedes 'a', so you could just extend your input/output range down by one value. Seems like a pretty small change for the reward of getting double the table size.
(I suppose @ precedes 'a' in PICO-8 ascii, but you get the idea. Just the other way around on PICO.)
Pico8 compresses the code space with some sort of lzip compression. In the past, I didn't find much improvement in the compressed code size by switching from hex to half ascci perhaps because this "fights" the built in compression algorithm. That said, I haven't tried it lately, or run a more careful study, and this data could behave differently. It's definitely worth trying.
For what it's worth, zep actually listed the characters that are good to use because they're most easily represented by the PICO-8 compressor. Lemme see if I can find the post... ah, yes:
Quoth zep: For future reference, you can use the following string for base-64 encoding. The compression favours the first 58 characters here and stores them as a single byte each when they aren't stored as part of a larger repeating sequence.
|
[Please log in to post a comment]