UPDATE!
PX8 has been replaced by PX9: https://www.lexaloffle.com/bbs/?tid=34058
But I'll leave this here for reference, and for existing projects using PX8.
This is a library mostly for compressing graphics and maps, but can also be adapted to compress sfx. It is designed for data-heavy carts and requires around 450 tokens for decompression, although this can be reduced if needed by removing remap(), hard-coding parameters, and/or removing predicted spans at the cost of compression performance. If someone wants a smaller/weaker version, let me know!
To use it, compress a 2D rectangle to an address in memory, and supply a function for fetching the source values (normally SGET or MGET). For map data, you probably want to set p.cbits to something like 4 first.
COMP(0, 0, 128, 64, 0x2000, SGET) |
0x2000 is the address of the top half of the map, so this will compress the top half of the sprite sheet (128 sprites) and write the compresed data over the map data.
DECOMP() takes the memory address of the compressed data, the top left corner of where to decompress to, and functions for getting and setting decompressed values. So to decompress this data at 0x2000 back to the screen, starting 32 pixels down:
DECOMP(0x2000, 0, 32, PGET, PSET) |
Only the decompression section of the code is needed once you have compressed data stored on a cart. A typical workflow would be to make a utility cartridge that grabs data from multiple source cartridges (using RELOAD() with a 4th parameter to indicate where to read from), and then CSTORE them to a single cartridge (again, using CSTORE()'s 4th external cart parameter).
Here's an example that stores 2-byte lengths at the start of each compressed block, to allow seeking out the start of any given gfx. I've commented the cstore and reload lines so that it will work if you paste it at the end of the main PX8 cart example:
The Algorithm
I started working on PX8 while working on PICO-8's specs, as an important question was how large a game could theoretically be for hard-core users who want to go to the trouble of compressing stuff. It became something of a brainworm, and releasing PX8 is a way to get this out of my system. I hope it is also useful to someone, or at least interesting.
PX8 is (AFAIK) a novel algorithm that appears to work well for typical PICO-8 data, and out-performs pngcrush -brute for the few 16-colour images I tested. Alternating spans of predicted and non-predicted values are stored:
-
Predicted values (colours) are calculated by maintaing a table of the last encountered matching neighbours: if a match top & left is found, that is taken to be the prediction. Otherwise, a match for only top, and then only left are considered. Failing those, the value is taken to be non-predicted.
- Non-predicted values are stored as indexes into CLIST; a list of literal values that are stored in the order they were last encountered. This means that recently encountered values have smaller indexes, and the encoding exploits this, along with the fact that each index can not possibly be for the failed prediction (in which case it would be part of a predicted span).
Each span is strictly made of either all predicted or all non-predicted values. This means that for predicted spans, no additional information needs to be stored except the length of the span itself. And conversely for unpredicted values, the index into CLIST is known to not point at the predicted value. This means that no colour index data needs to be stored for 2 colour images at all, and the compressed data is composed entirely of span lengths.
Mindblowing! THX a lot!
Glad you killed that brainworm finally. :D
Svnt
This is way past cool!
By chance, could there be a version of this that loads 4-color sprites (3 + transparency), or even 2-color sprites?
(I'm not really good with coding, especially compression and what-not.)
I was wondering if it could be modified to store maps that only use X number of tiles so that if you're not using the entire sprite space your compression could be much smaller. It would be great for a puzzle game that only uses a few tiles.
Does the prediction method outstrip doing copies from historic data by much?
I sometimes think there must be a good way to do compression with recursive markov chains that are based on previously-decompressed data. I'd need an optimal way of stringing together chains based on prediction and correction-for-misprediction, and also a way of inserting novel data. Yours is very much like this idea, except it uses fixed-length chains (3 pixels) without recursion and only builds with prediction and novel data. I guess novel data could direct the next prediction, so maybe that would stand in for correcting misprediction, hmm.
PX8 should do a decent job of compressing low-colour images or maps with few tiles, at any bit-depth. I get around 70% compression for 4-colour sprites and 80% for sparse maps like Jelpi's. i.e. you could fit around 3x as many 4-colour sprites or 5x as much map data.
@Felice
It doesn't do much better than LZ-style compression at all! Around 5~10% better for images with little repetition, and (not surprisingly) performs worse when there is something like many similar frames of animation in a spritesheet. The LZ code I tried has room for improvement too, and is potentially cheaper tokens wise. I'd like to revisit it later and make something that can be swapped out for PX8 with the same comp/decomp interface. The only real downside is compression speed, but that's not such a big deal if it's only needed when generating the release version of a cart.
I'd love to see what happens with better predictive models like MC / RMC, although it would of course put strain on the token count and memory consumption (I forgot to mention, PX8 uses half the Lua RAM (512k) on decompress in the worst case!). The other challenge as you mentioned is figuring out how to efficiently encode the predictions and mispredictions -- the current PX8 span-based method is a rather clunky way of exploiting structure in the sequence of predictions.
Argh, compression is strangely addictive.
It is, isn't it? :D
Generic compression is kind of a dry subject, actually, but when you start analyzing actual, specific data sets to find certain exploitable, predictable features, that makes it fun.
By the way, I was thinking about this just yesterday and I remembered that a friend of mine did a similar up/left history+prediction to compress maps in a snes game. I was busy with compressing single sprites at the time (a very unrewarding endeavor), so I didn't ever get more than a bird's eye view of the map compression algorithm, and I can't even remember that view of it now. I really wish I had the source code for that game. :/ So much of what we did would be useful again in PICO-8.
This is really astounding. Now I put together a compressor recently, about 25-lines of code, it's great for compressing audio - but it PALES in comparison to your mighty one that CRUNCHES data very hard.
Started reading the source. Hoo boy ! Some real long-hair stuff. Both out of my range of expertise and understanding !
Good job. This works out much better than your RLE compression test, as a more complete solution. I was making my own compressor/decompressor along those lines at some point, but I ... got distracted. I'm a cat, after all.
|
has anyone tested this with maps?
It seemed to work at my first test, but if I try to copress my levels I get only corrupted data...
reload(0,0,0x3100,"level_1_1.p8") ------------- compression --------------- clen = comp(0,0,128,32,0x6000,mget) decomp(0x6000,0,0, pget, mset) ------------------------------------------ cstore( 0,0,0x3100) |
without comp/decomp, the data from "level1.p8" is correctly loaded^^
Here is "level_1_1.p8":
edit: Ok, the API makes absolutely no sense to me... how is this supposed to be used, when not working with images on the DisplayBuffer? Especially with own xset / xset functions, seems that xset will be called with negative values?!
This one destroys the palette :\
function _set(x,y,val) return poke(0x4300 + y*128+x,val) end cls() w=128 h=30 bpp=8 rawlen = (w*h)*(bpp/8) reload(0,0,0x3100,"mario_l3 - Kopie.p8") -- compression -- clen = comp(0,0,w,h,0x6000,mget) decomp(0x6000,0,32, pget, _set) for iy =0, h do for ix = 0, w do mset(ix,iy,peek(0x4300 + iy*128+ix)) end end cstore( 0,0,0x3100) |
why do the functions not work lineary on the memory? Just like the 'compression test' -thing
comp(source_addr, destination_addr, length) decomp(source_addr, destination_addr, length) |
This should do the trick:
comp(0,0,64,32,0x6000,mget) decomp(0x6000,0,0, mget, mset) |
edit: <snip> actually, I'm not entirely sure if comp should use 64 as the width (map values = byte, pixel = nybble) or not - seems to work ok with 128. I suggest you check that yourself, as you'll be able to spot errors in your map data quicker than I can.
You need to use mget in the decomp function instead of pget. This one took me a while to figure out, but in a nutshell, the algorithm looks at previously decompressed values (to determine whether it can predict the current value or not), which is what that parameter is used for.
Observation while debugging this: take care if you decide to replace pget/mget etc with your own implementation that uses peek/poke, the x,y coordinates will sometimes be out of bounds.
To be fair, px8 could really benefit from having a brief explanation of the various parameters - I'm still not entirely sure what having a cbits value larger than 1 is useful for.
Thank you very much. Now it works also with custom xget/xset functions:
clen = comp(0,0,tilemap_width, tilemap_height ,0x6000,src_get) tilemap_width = peek(0x6000) tilemap_height = peek(0x6001) decomp(0x6000,0,0, _mget, _mset) ... function src_get(x,y) if x < 0 or x >= tilemap_width then return 0 elseif y < 0 or y >= tilemap_height then return 0 end return tilemap[y+1][x+1] end function _mget(x,y) if x < 0 then return 0 elseif y < 0 then return 0 end return peek(0x4300 + x + y * tilemap_width) end function _mset(x,y,val) poke(0x4300 + x + y * tilemap_width, val) end |
Results are pretty good, with:
p.cbits = 4 -- max:7 p.remap = false |
Compressed maps are only 358 from 2505 bytes, for example
Hi all,
I'm loving these PX8 routines (I can't believe I never tried using them before!). But I wonder if I could pick your brains for how I'd prefer to use it - as the way I'm currently implementing it seems "inefficient".
GOAL
I want to store "pages" of graphics (e.g. blocks of 128x64, using the add_gfx() process suggested above) into a single cart's Sprite Sheet memory
(I've already got this part working...)
Then I want to be able to pick, say two "pages" of compressed data and extract them DIRECTLY (or as close to) back into the Sprite Sheet - effectively replacing ALL of the compressed data with uncompressed gfx.
(I'm close to getting this working, by using User Memory to decompress a page/block at a time, then MemCpy'ing back - but it's slower, uses more tokens, and I think flawed as I could overwrite chunks of encoded data that I planned to decompress next).
P.S. When I want to switch gfx data again, I can simply call Reload() to get the uncompressed data back and start again - seems pretty instant.
OPTIONS
OPTION 1:
I know I could to this all to the screen buffer (8k), then MemCopy back in one hit - but I don't want the user to *see* the decompression process. It's a shame User Memory isn't 8k in moments like this.
OPTION 2:
I know I could decompress part of the gfx to another area of memory (e.g. SFX) in addition to User Memory, but it could mean losing up to 4k - which leaves little sfx/music :o/
OPTION 3: ???
Can anyone got any ideas of how best to do this, ideally with as little tokens used? (I'm gonna use a diff. cart for the compression/storage, so it's only the decompression part that needs to be optimal)
For info, the custom functions I'm using with load_gfx/decomp to decompress to Memory (instead of PSET/MSET), is as follows:
function memget(x,y) local addr = 0x4300 + flr(y)*64 + x/2 local pbyte = peek(addr) local left = pbyte % 16 if x%2 == 0 then return left else return (pbyte - left) / 16 end end function memset(x,y,col) local addr = 0x4300 + flr(y)*64 + x/2 local left = peek(addr) % 16 local right = (peek(addr) - left) / 16 local val if x%2 == 0 then val = col+16*right else val = left+16*col end poke(addr,val) end |
Many thanks in advance! :D
@Liquidream I don't think there's a risk of anyone seeing the use of the screen memory as a scratch buffer. Screen memory is only copied to the display on flip() or the end of _draw(), and this is all single threaded so that won't happen until you're done.
I never realized that calling reload() on the current cart is virtually instantaneous and doesn't have the artificial delay! Seems like you could use all of addressable memory for this. :)
Thanks Dan,
That's what I originally thought, but I think I was seeing it draw to the screen without having Flips().
I'm wondering if the problem is that, even though I'm not calling one, a Flip() is happening as the Decomp() takes more than a few milliseconds to complete?
Either way, I'd def prefer to use the screen mem for a scratch pad (if I can do it between draws) :D
Yeah, feels kinda sneaky, right?
I was rather chuffed when I found this - let's put it that way! :D
If you take more than a certain amount of time to call flip(), it seems to occur on its own. This is why tweetcarts can be written without using flip().
Personally, I'd reload() the compressed sprite ram data at 0x4300, since you're allowed to set a different destination address in ram than the source address in rom, and decompress it directly from there into sprite ram.
If it's quick enough and you need 0x4300 for other stuff, use the screen as the reload() destination, but as I said, beware taking too many cycles or it'll auto-flip to the front.
If decompress time is too long to use vram without the auto-flip, maybe break your data up into smaller blocks, so that you can decompress a block this frame, then draw a loading screen or your game screen on top of the scratch data, flip, and move to the next block. You lose a little compression by breaking into smaller blocks, but everything has its cost.
It's okay - I'm a bloody idiot
...I wasn't declaring a "_draw()"
#facepalm
It was @Felice's comment about tweetcarts that made me think/remember that the rules are different without the main functions present. (I'd just build on the PX8 sample above, for testing - which didn't have them either).
I'm happy to say that it now works!
And what's even better is, I can now save loads of tokens using the original PSET/PGET method - as seems that's also fast enough! :D
#WinnerWinner
Thanks all ;o)
It sounds like you have the answers already but FWIW, here's a token-optimized* version of decomp that supports writing to (and reading from) any location in memory.
*= at face value the whole thing uses a few more tokens than the original, but if you need to call decomp more than once in your code you'll start saving tokens (as well as having generic pixel read/writes). As an added bonus you can use the single-number-parameter-as-string optimization, e.g. decomp"384" to save another token per call.
It comes with a few caveats and gotchas though.
-
It's been lifted out of a project I'm working on, and in its current state it probably won't work if it's dropped into the px8 cart for testing without a few alterations.
-
It's not optimized for speed, though I doubt it makes too much difference compared to the original. (especially if you do it at 'load time'. readpixel() is ok for light use at runtime as well, but it became a tradeoff between speed and tokens)
-
The x offset parameter is not supported. The images & data I'm compressing are all 128 x whatever -sized so I didn't really have a use for it.
-
The py parameter is a little weird & I keep confusing myself with it, but is essentially a y coordinate starting from the top left of sprite memory and going up (well, down) from there, as if you were using sset/sget to read all of memory. Alternatively it's an address in memory divided by the width of the screen in bytes, i.e. addr/64, with the restriction that the address must be a multiple of 64. For example to write to the start of screen memory (0x6000), you would use decomp(384), writing to the start of audio memory would be decomp(196) and so on. I'm not sure x offsets are supported properly, so fractional values probably won't work as expected.
- I moved the getval function out into readbits() as I found it useful for other things as well. Some of the variables have been renamed to suit my purposes (readhead==src, databit==bit, databyte==byte). It also reads from the top of cart memory to the bottom, so the math is inverted, but it's otherwise equivalent to (and probably should be replaced with) the original getval().
code:
NB 'setpixel()' has been inlined in decomp() as I don't use it anywhere else in my code - you may want to change that or use a faster version.
I use both option 1 & 2 in my project in pretty much the same ways as you described:
For #1, I hide things offscreen by switching to the 64x64 screen mode and then decompressing to & copying from the bottom (hidden) portion of the screen to sprite memory, before switching back to 128x128. You could also use the top left corner of a 128x128 image as a cheap splashscreen :)
For #2, I store a 128x128 image starting at address 0x3f00 (or lower if you need to use save data) and accepted that I'd have fewer sfx slots to work with. You still get ~44 though IIRC so you're unlikely to blow that budget for all but the most complex audio. Though you do have to be careful about which slots are used, or write some tools to pack & shuffle sfx around.
Hope that helps.
edit: If anyone can squeeze it further I'd love to know. I can see one more token up for grabs in there now I've read my own post back, so it's probably not that optimized after all.
@Catatafish: Thanks for this - will keep it in mind for the future
For now, I'm prob gonna try to squish/refactor the Decomp & Load_Gfx functions into one, with as few tokens as poss. But that's for another day! :o)
Hum...I am clearly doing it wrong but I don't get how such a simple test can be failing in such glory!
Top 64x64: sample image
Bottom 64x64: decompressed output :/
Interesting part:
-- sample image circfill(32,32,16,7) -- compress to gfx comp(0,0,64,64,0x0,pget) -- display back from gfx decomp(0x0,0,64,sget,pset) |
Replace sget with pget & it should give you the correct results.
^^^ Wot @Catatafish said! ^^^
(Just tried it with your code snippet)
This seemed counterintuitive to me too, but (I think) the reason you use the paired get/set for the target decompression, is because it uses this to "read" back as it goes - part of the predictive nature of the routine?
I dunno - I'm already outta my depth! :D
(Apologies in advance for unrelated post)
This reminds me of working with SmileBasic; it could only save images so I jiggered some level data into being converted to an image specifically for saving...anyway cool stuff!
UPDATE!
PX8 has been replaced by PX9: https://www.lexaloffle.com/bbs/?tid=34058
If you're starting a new project, you probably just want to use PX9 instead. Although they use the same interface, I've given them separate names and threads to avoid confusion.
[Please log in to post a comment]