(There have been other rants on this topic, but most of the discussion seems to predate the move to token counting.)

So I'm brushing up against the compressed codesize limit with a cart that's barely halfway towards the token limit. It sounds like this is unusual, and from what I can tell it's because I'm commenting the code fairly thoroughly.

And I have to wonder, what is the intent behind the compressed size limit? What is it still meant to accomplish that the token limit doesn't already?

The token count is a rough representation of binary or bytecode size, which was an authentic historical constraint for many systems. It's measurable from within the editor: a coder can see at a glance how many tokens they have left and how they are affecting that number as they type. While it's approximate, it's directly correlated to code complexity, and this makes it fairly intuitive to reason about. The steps for reducing token count are also intuitive: simplify code, improve code sharing, generate data rather than hardcoding it, do more with less.

The token limit defines a scope for pico-8 cartridges; it encourages creative solutions and algorithmic content generation, and it plays off the other cart limits by discouraging tactics like offloading data into code.

Compressed code size, on the other hand, is a representation only of how much entropy your source code exhibits, a limitation no developer historically suffered under. It cannot be easily measured as you make changes, and it has no direct correlation to the complexity of your program or the efficiency of your code; only to the volume of text and the amenability of that text to an arbitrary compression algorithm.

This doesn't foster creativity: all it does is cause unwelcome surprise and punish certain coding styles. Specifically it discourages comments and descriptive naming, which become barriers for anyone else trying to learn from your code. The compressed size limit doesn't even keep you honest; it's trivially easy to bypass with the aid of a minifier, at the cost of making your code unreadable and uneditable.

The compressed size limit does prevent you from packing your code full of string data, which ok fair enough; text storage was a huge constraint for early game developers and led to creative solutions like paragraph books and the z-machine's bespoke text compression format. But characters in text strings seem like they ought to count toward the token limit anyway - and maybe they already do?


IMO, the compressed code size shouldn't count comments at all, and I'm pretty sure you're right in thinking that it does. I realize that carts are meant to decompress with everything intact, including comments and whitespace, so that they can be opened in the editor, and so the comments are all still there in the compressed file. The problem, though, is that counting comments and whitespace against the compressed limit discourages good commenting. Traditionally a compiler strips them out entirely, so I feel the limit should be computed as if they weren't there.

I agree with what you said about the characters in strings - they should totally still count, because they are still data used by the compiled program. But comments and whitespace (including newline chars) shouldn't. You're spot on with the notion of unreadable and unmaintainable code because of the limit. I've seen too many carts that name everything with one or two characters, use little or no comments, and/or don't do much indenting, probably to avoid the character limit thing (although some of that could partially be because of the 33 columns in the editor thing as well, to avoid horizontal scrolling. I've been guilty of some of these myself for primarily that reason).


Scathe: comments and whitespace are indeed counted into the compressed size, which is AFAIK just the byte size of the zlib-compressed source code.
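To make the effect of comments concrete, here is a minimal Python sketch that compares the zlib-compressed size of a small Lua snippet with and without its comments. It assumes, as the post above does, that the limit is simply the byte size of the zlib-compressed source; PICO-8's real measure may differ. The Lua snippet and the helper names `strip_line_comments` and `compressed_size` are made up for illustration.

```python
import re
import zlib

# Hypothetical cart source: one Lua function with explanatory comments.
commented = """\
-- move the player one step to the right,
-- clamping at the screen edge (128 pixels wide)
function move_player(player)
  -- advance by the player's speed
  player.x = min(player.x + player.speed, 127)
end
"""

def strip_line_comments(source: str) -> str:
    """Naively drop `--` comments and blank lines.

    Assumption: no `--` appears inside a string literal and no
    multi-line `--[[ ]]` comments are used.
    """
    kept = []
    for line in source.splitlines():
        code = re.sub(r"--.*$", "", line).rstrip()
        if code:
            kept.append(code)
    return "\n".join(kept) + "\n"

def compressed_size(source: str) -> int:
    """Byte size of the zlib-compressed source (a stand-in for PICO-8's measure)."""
    return len(zlib.compress(source.encode("utf-8"), 9))

stripped = strip_line_comments(commented)
print("with comments:   ", compressed_size(commented), "bytes")
print("without comments:", compressed_size(stripped), "bytes")
```

The token count of the two versions would be the same, since comments are not tokens; only the compressed size changes.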

From testing in 0.1.5 it appears strings of any length are treated as a single token, so the character and compressed size limits are all that are keeping you from releasing an ebook in pico8 form.

I feel like it would be truer to pico-8's design goals to count individual chars as tokens, but then the token limit would have to be raised (or another separate limit implemented) so as not to break existing carts.


@Viggles @Scathe Pico-8 could simply enforce a maximum cartridge size and throw an error when attempting to run a cartridge that exceeds that limit. How do you guys feel about that?


That's essentially what it already does: the maximum cart size is 16kB+16kB (max code size+fixed size of sprite/map/sound data).

The issue is that "max code size" here means the size of a zipped version of your original source code, not the size of the machine code or bytecode it would - theoretically - compile down to. That's what the token count roughly represents, and is what actually constrained the developers of yore (the ones who weren't writing in BASIC, at least.*)

So this feels like the wrong metric to me. To draw an analogy, it's a bit like limiting the amount of sprite data you can fit by basing it on the size of the photoshop PSD file you drew the sprites in - not the size of the actual 16-color sprites it produces.

(* BASIC programmers operated under much the same constraints as pico-8 now: their source code was what they shipped, and every extra character was disk space they could ill afford. But I would make the point that their source code tended to be unreadably terse as a result, and one of pico-8's great theoretical strengths is that you can easily explore and learn from other people's code.)


So bytecode size limit sounds like the way to go. What would be the downside of using that metric?


The compressed code limit is an actual format limit of the PNG cartridge system. It's the maximum amount of data that can be stored in the pixels of the cartridge image in accordance with the format Zep came up with. It's not in any way intended to be a part of Pico-8's standard limitations (as far as I know), just a side effect of how cartridge saving works.

The reason you can't change it to a bytecode limit is because Zep maintains that all Pico-8 cartridges should be open-source. This is partly because Lua bytecode is not platform-independent and not secure -- if bytecode that has been tampered with could be loaded from Pico-8 cartridges, it could crash the Pico-8 or worse. But if it's open-source, that means the source code (including all comments) has to be exported on the cartridge.

So if you want to resolve this yourself, your best bet is probably to get Zep to un-limit the text format (.p8) if it's limited in the same way, and save to that during development, then use a tool like picotool https://www.lexaloffle.com/bbs/?tid=2691 to strip the comments out before you export it to .p8.png to post on the BBS.


@JTE: Thanks for shedding light on that! That does make a lot more sense, and I had gotten hung up on intent rather than considering there might be practical reasons.

In 0.1.5 the compressed size limit is applied to .p8s as well as .p8.pngs: pico-8 will happily load and play an oversized cart, but refuses to save it in either format.

Previous threads all suggest that the compressed size limit ought to be a nonissue: that carts should hit the token limit before maxing out compressed size. (Perhaps that analysis dates from before pico-8 got more picky about what it treated as tokens.)


On the other hand, it would be fairly easy to store the data in separate chunks of the PNG format; that's how I do it with the Nano89 cartridges. They appear as PNG images as well, but can hold up to 1 MB of data, which is the address space limit of the internal MMU that pages the cartridge into the 64 KB system space of my fantasy console.


As this thread covers a topic I was interested in today, "compressed size," I would like to point out that it might be helpful for coders to be able to allocate the space they want, where they want it.

For instance, it is highly unlikely I will ever use MUSIC or the MAPPER. I can do one channel music as a SFX and did so effectively in my Haunted House game. That space could be released to me to allow more coding space.

Alternatively, someone could say I'm going to do it all myself, and free ALL resources except for coding space.

It would not exceed the current memory and still give greater flexibility to the programmer.

On the Apple ][+ computer, you could do this. If you decided not to use HIRES graphics (280x192, 2 pages) and instead chose LORES graphics ($0400-$0800), that was an additional $03C00 bytes you could work with instead of just the basic $4000.

To sum up, PICO could add this command in code:

ALLOCATE(Code space, Sprite space, Mapper Space, SFX Space, Music Space).


Well, you can already allocate music and map space for data/other stuff. You might want to use some tokens on functions for compressing/decompressing data there though. However, you can't use it for "coding space", of course.


tbh, counting comments is bullshizz and shouldn't happen.

But if you want a personal/dirty solution you can always just number your comments and include a text file with the source like footnotes.

I know this isn't elegant nor where anyone wants to be, but it is a solution until/if something else gets fixed.


Hmm ... unlimited coding space for REMARKS. I like that, Cabledragon, and so should other coders.

Tobiasvl, I'm definitely using a compressor and decompressor for pixeled data for my paint program.

It's helping quite a bit and I've saved a lot of space instead of just all-out raw default drawing to the sprite page.


I'd vote for a new cart format where the code gets automatically split into:
a) Minified form, which is what gets executed and is bound by the current limitations.
b) "Unminification instructions" which can be used to turn the minified code into the original code without changing semantics (e.g. can only add comments & whitespace) in order to display it on the site. This would not be bound by as many limitations.
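As a rough illustration of this idea, the Python sketch below splits source text into a minified part and a list of "unminification instructions" that restore the original exactly. It is only a toy under stated assumptions: it handles whole-line `--` comments and blank lines, not trailing comments, multi-line `--[[ ]]` comments, or comment-like lines inside multi-line strings. The function names `split_comments` and `restore_comments` are hypothetical, not part of any PICO-8 tool.

```python
def split_comments(source: str):
    """Split source into (minified, notes).

    Only whole-line `--` comments and blank lines are moved into notes;
    every other line stays in the minified code untouched.
    """
    code_lines, notes = [], []
    for index, line in enumerate(source.split("\n")):
        stripped = line.strip()
        if stripped == "" or stripped.startswith("--"):
            notes.append((index, line))  # remember where the line lived
        else:
            code_lines.append(line)
    return "\n".join(code_lines), notes

def restore_comments(minified: str, notes):
    """Rebuild the original text by re-inserting the recorded lines."""
    lines = minified.split("\n") if minified else []
    for index, line in notes:  # notes are stored in ascending line order
        lines.insert(index, line)
    return "\n".join(lines)

example = "-- draw loop\nfunction _draw()\n\n  -- clear screen\n  cls()\nend\n"
minified, notes = split_comments(example)
print(minified)
print(restore_comments(minified, notes) == example)
```

Because the notes only ever add comment or blank lines back, the restored text runs the same code as the minified form, which is the property the proposal asks for.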


If we're going this route, maybe it would also be nice to allow users to have a RAM copy of the 3x5 font so it could be adjusted.

Instead of the lowercase letters just looking like smaller uppercase, they could truly be lowercase and have descenders as well.


I have a lot of extra code in my game dedicated to debugging and automated testing, that won't make it into release but is useful for development. So I have a debug build separate from release to include all this extra code (as done in other engines). Except the extra code was blowing up both the token and character limit, and was unusable in practice.

I was desperate to still have it run on my machine to make development easier, so I patched my copy of PICO-8 to extend the token limit. I could not find a way to extend the char limit though, so right now I'm trying ways to minimize the source code itself, without necessarily reducing the token count (mainly renaming variables with shorter names, I'm also trying luamin as suggested by the OP).

Patching the app is clearly not a solution to unlock code limitations for actual game releases, only a hack that allows development with more features. End-users should use a vanilla PICO-8 anyway.

I'm not sure if zep appreciates the move even though it's meant for developers only, so I'm not releasing the patch until I get permission. Honestly, I think it wouldn't be useful for most projects, but since mine is open source, I still want people to be able to download the code and build the different configs, including the debug one (there is actually a full debug with profiling and all, and a config for simulation tests which shows the character running around, and I want to show the latter for demonstration purpose). So in the end, I'd like to provide that patch to people who want to test those configs.

Anyway, there are still many things I can do to reduce the character count, but for the token count I'm stuck as my architecture is already streamlined and there is not much to change. Maybe some hacks like inlining functions used only one time... But that should be done in a build step, not directly in the source to keep the code readable.

As I add more features, the release code itself will grow and may eventually reach the token limit as well. At that time, my hack will become useless as it won't work for end-users.


So, in the meantime I made a long post on another thread on how to patch pico8 to skip the splash screen on boot (https://www.lexaloffle.com/bbs/?pid=66912#p), which I consider much more complex than removing the token limit. Now, the token limit patch doesn't sound like a big deal anymore, so I might as well explain it here.

The core idea is to find the token threshold (8192) in the assembly and replace it with a bigger value. Since the address of that threshold depends on the PICO-8 version and platform you use, I'll give you the thought process rather than a specific address/patch.

Patching token threshold

Basically, I searched for a comparison with the token limit 8192 (0x2000) in the assembly:

$ objdump -d pico8 | rg 'cmp\s+\$0x2000,' -C 2

Among the results, I found this (addresses will differ depending on version/platform):

438460:	48 89 df             	mov    %rbx,%rdi
438463:	e8 78 36 00 00       	callq  43bae0 <count_tokens>
438468:	3d 00 20 00 00       	cmp    $0x2000,%eax
43846d:	0f 8f bd 00 00 00    	jg     438530 <run_program+0x560>
438473:	8d 04 40             	lea    (%rax,%rax,2),%eax

Then I replaced 0x2000 used for cmp @ 438468 with 0x8000 (any hex editor is fine):

438468:	3d 00 80 00 00       	cmp    $0x8000,%eax

and saved it as pico8_4x_token.

That's it!

Multiplying the token threshold by 4 is enough in common cases, because it bumps the limit to 32768 tokens, while the smallest common token chain uses 2 chars per token ({1,1,1,1,1,...}), so by the time you reach the token limit, you have already reached 65536 characters.

(uncommon 1-char token chains include "+-+-+-+-..."; but you can always increase the threshold if you need to)
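The arithmetic behind the factor of 4 can be checked in a few lines of Python. The limits are the ones quoted in this post; the 2-chars-per-token figure is the post's estimate for a chain like {1,1,1,...}, and `chars_needed` is a hypothetical helper name.

```python
VANILLA_TOKEN_LIMIT = 0x2000   # 8192, the stock threshold
PATCHED_TOKEN_LIMIT = 0x8000   # value written into the cmp instruction
CHAR_LIMIT = 65536             # character limit quoted in the post

def chars_needed(tokens: int, chars_per_token: int) -> int:
    """Characters consumed by a chain of tokens of a fixed average width."""
    return tokens * chars_per_token

# The patch multiplies the token threshold by 4.
print(PATCHED_TOKEN_LIMIT // VANILLA_TOKEN_LIMIT)

# At 2 chars per token, the patched token limit and the char limit coincide.
print(chars_needed(PATCHED_TOKEN_LIMIT, 2), CHAR_LIMIT)

# A 1-char-per-token chain would only use half the char budget at that point.
print(chars_needed(PATCHED_TOKEN_LIMIT, 1))
```

So for ordinary code the character limit is hit first, and only unusually dense 1-char token chains would call for a larger threshold.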

If you want to save your own patch for later, follow the instructions in the post linked at the top.

Remember, this is for development only. More info below.

Limitation notes

This patch allows you to continue working with the PICO-8 editor even when there are more than 8192 tokens. However, it doesn't change the other existing limits. This means that:

(1) code beyond 65536 chars is still truncated (cannot be run correctly in editor)
(2) code with a compressed size > 100% can be run, but not exported to .bin/.html
(3) code with more than 8192 tokens but fewer than 65536 chars and under 100% compressed size can be exported, but not played by a vanilla PICO-8 runtime binary

And if you want to go further:

  • You can avoid (1) with minification.
  • You can reduce compressed size in (2) with minification, but my experience showed tokens themselves are still a big contributor to compressed size. In other words, you will have to strip all that extra code before you export your cartridge. As we explained above, the extra tokens should only be used for debug features anyway.
  • For (3), there is a small range of cartridges that have some extra tokens but are still below 100% compressed size, that you could decide to still export. In my experience, the margin is low (I could reach ~10,000 tokens at compressed size 100%, so I only gained ~2000 tokens). In addition, you will have to distribute a patched version of the pico8 runtime binary to allow playing the bigger-than-usual cartridge... And that won't work for the HTML version, nor for users who download your cartridge directly from the BBS and run it on their vanilla PICO-8.


I have just tried patching the pico8 runtime executable distributed with game binary exports, and it works! The process is the same, except you must patch the executable file in your binary export.

For a given version of PICO-8, all exports contain the same executable; only data.pod (which contains cartridge data) and the icon change. So you really only have to patch the executable once, and can reuse it for any future binary export.

By the way, I forgot to mention that the example I gave above was for 64-bit Linux. However, I tried to do the same with OSX and Windows and interestingly, the assembly code for the cmp instructions was similar.

I didn't exactly try to patch the PICO-8 editor executable on those platforms, but I did manage to patch the pico-8 runtime of Linux, OSX and Windows binary exports.

A quick way to find the cmp line for the Linux, OSX and Windows executables in a hex view, 8 bits per group, is to search for 3d 00 20 00 00 0f, then replace 20 with 80.
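The same search-and-replace can be scripted instead of done by hand in a hex editor. The Python sketch below looks for the byte sequence quoted above and swaps the 20 for 80. Refusing to patch unless the pattern occurs exactly once is an extra safety assumption of this sketch, not something the post requires, and the names `patch_token_limit` and `patch_file` are hypothetical.

```python
from pathlib import Path

# cmp $0x2000,%eax followed by the 0x0f byte that starts the jg opcode
PATTERN = bytes.fromhex("3d 00 20 00 00 0f")
# same instruction with the immediate changed to 0x8000
PATCHED = bytes.fromhex("3d 00 80 00 00 0f")

def patch_token_limit(data: bytes) -> bytes:
    """Return a copy of data with the token threshold raised to 0x8000."""
    count = data.count(PATTERN)
    if count != 1:
        # Assumption: an ambiguous or missing match is safer to reject.
        raise ValueError(f"expected exactly one match, found {count}")
    return data.replace(PATTERN, PATCHED)

def patch_file(src: str, dst: str) -> None:
    """Read the executable at src and write the patched copy to dst."""
    Path(dst).write_bytes(patch_token_limit(Path(src).read_bytes()))
```

For example, `patch_file("pico8", "pico8_4x_token")` would write a patched copy next to the original, using the file names from the earlier post. Work on a copy and keep the original binary.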

If you have any issues, ask me for more information. I think I'll make a thread to sum up my findings; I hope this will be enough for you for now.


Looks like patching the HTML/WASM runtime was easier than I thought.
The token limit appears in plain text in the minified JS code:

  1. Open your exported web version .js file
  2. Search for var asm = or // EMSCRIPTEN_START_FUNCS. You'll see several minified functions.
  3. Among them, search for "0)>8192". You should find it in the first function. The comparison code is actually "if((o|0)>8192)" but minified var name o may change across PICO-8 versions so don't rely on it too much.
  4. Replace the comparison with ">32768" or anything you'd like.
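The steps above can also be sketched as a small Python function that rewrites the comparison in the exported .js text. It matches on "0)>8192" so that the minified variable name does not matter. Requiring exactly one match is an assumption added here for safety, and `patch_js_token_limit` is a hypothetical name.

```python
import re

def patch_js_token_limit(js: str, new_limit: int = 32768) -> str:
    """Replace the '>8192' token comparison in exported JS with new_limit."""
    # Match "0)>8192" regardless of the minified variable name before it.
    patched, count = re.subn(r"0\)>8192\b", f"0)>{new_limit}", js)
    if count != 1:
        # Assumption: refuse to guess if the pattern is missing or ambiguous.
        raise ValueError(f"expected exactly one match, found {count}")
    return patched

print(patch_js_token_limit("if((o|0)>8192){stop()}"))
```

To apply it to a real export, read the .js file as text, pass it through the function, and write the result back to a copy of the file.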

If you don't want to go overboard and only need extra tokens temporarily, you can just set the limit to 9000, for instance. The same applies to the binary patches above, except you'd write the value in hexadecimal.

Your HTML version now works with tokens over the limit!

This, and all previous patching methods, of course break an intended PICO-8 limitation. But this thread is all about what the true retro console limitations should be, so if you think that the visual, audio and final cartridge size limits (the last of which is already covered by the max characters limitation) are what really matter, and your excess of tokens is mostly due to conventions, OOP, whatever, this may be the way to go for you.

However, remember that other people use a vanilla PICO-8 and may prefer downloading your cartridge from the BBS to play games comfortably without downloading a new binary each time (PICO-8 on micro-computers trying to reduce disk usage may be another reason). Your cartridge won't work on their machines. But if you only distribute binary and HTML versions, that's fine (as the license allows you to patch your own exports).


