Feature Request: True 64k Coding

dw817 • 2022-10-01 17:21

Hello, and especially to @zep.

I have noticed that when you are coding, PICO-8 counts the actual number of characters typed against the 64k limit.

For instance, let us see what INFO gives us when typed in immediate mode with no code:

FILE: UNTITLED_1.P8
TOKENS:         0 /  8192
CHARS:          0 / 65535
COMPRESSED:     0 / 15616

OK, now let's add a single line of code in the source-code editor. PRINT(".")

Now return to immediate mode (ESC) and type INFO and you have:

FILE: UNTITLED_1.P8
TOKENS:         3 /  8192
CHARS:         11 / 65535
COMPRESSED:    11 / 15616

Here is the problem. Now edit that program line so it reads: PRINT("\0") then return to immediate mode and type INFO again. This time you get:

FILE: UNTITLED_1.P8
TOKENS:         3 /  8192
CHARS:         12 / 65535
COMPRESSED:    12 / 15616

NOTICE that CHARS went up by one !

I would like to suggest that you are not penalized by keystrokes but actual characters in your code. In this case \0 could be read as a single character.

This may not seem very useful initially, however in the long run it will help, especially those people who store binary data in their code as all of \0 \10 \13 \34 and \92 must all be written using the backslash format if defined as a string in the source-code. And \128 up to \255 if they are to be normally typed characters.

So \92 would take the same amount of space from the available 65535 as * when defined inside a string.

And it will help on space used for those making use of PCM audio and any other 8-bit storage methods in their code.
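
As a rough illustration of the counting rule proposed above (this is not how PICO-8's INFO actually works), here is a sketch in Python rather than PICO-8 Lua. The names `file_chars` and `virtual_chars` are made up for this example, and as a simplification it treats every backslash escape as one character, whether or not it sits inside a string literal. The absolute numbers may differ from what INFO reports; the difference between the two lines is the point.

```python
import re

# Matches a Lua-style decimal escape (\0 .. \255, up to three digits) or any
# other backslash followed by a single character, such as \n or \\.
ESCAPE = re.compile(r"\\(\d{1,3}|.)", re.DOTALL)

def file_chars(source: str) -> int:
    """Count every character as typed (the existing behaviour described above)."""
    return len(source)

def virtual_chars(source: str) -> int:
    """Count each backslash escape as a single character (the proposed rule)."""
    return len(ESCAPE.sub("x", source))

dot_line = r'print(".")'
nul_line = r'print("\0")'

# Typed characters differ by one, the proposed count does not.
print(file_chars(dot_line), file_chars(nul_line))
print(virtual_chars(dot_line), virtual_chars(nul_line))
```

Under this rule \255 would also count as one character instead of four, while the text stored in the cart stays unchanged.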


GPI • 2022-10-01 19:08

That's not as trivial as you think.
For example:

print("hallo\10world")

will become

print("hallo
world")

Also, a \0 will mark the end of the code...

dw817 • 2022-10-01 19:15

Hi @GPI:

No, no code would be changed from where it is. Only the ALLOTMENT would change.

Since Lua is not really limited to 64k, the count of perceived memory could check for this.

In no way does anything else change except what is SEEN and allowed from the virtual 65535-characters available.

And it would ONLY be looking for "\" characters to take away a single character of virtual memory instead of the actual amount.

Removing true CRs and any line that uses "--" as a remark would also be nice. That is if you have code that is just remarks and nothing else, that would take zero chars of the 65535 and zero tokens besides.

However THAT might be overkill even if desired. :)

For now converting every \ to a single character would be awesome right there.

GPI • 2022-10-01 19:20

Lua is not limited, but the saved format in p8.png or p8.rom is.
OK, and there the compression limit is the more important limit.

dw817 • 2022-10-01 19:34

Yep, @GPI. I don't think compression would be changed at all. It is more of a lifting of the artificial limitations in typing 65535-characters for code and being forced to use 2, 3, and 4-characters for a single 8-bit character.

Also P8SCII would not be affected at all. Whatever characters you type in there do in fact take away from your 65535. Only the \ definitions would convert to a single character of 'storage.'

ultrabrite • 2022-10-01 21:59

@dw817 what about newlines, spaces, and... vowels?
seriously, 65535 is a plain text source code limit, that makes sense and that's it.

but then I think the limit should actually be 65536.

dw817 • 2022-10-01 22:03

Vowels ? LOL ! OK you got me outta my chair on that one, @ultrabrite. :)

No, I just mean the \ definitions. Newlines, Spaces, and... vowels, they would each take whatever it costs in regular code, there would be no change there.

No other changes - just to convert those \ definitions to single 1-char. storage as that is what they are defined in the code as. And remember it is only the virtual limitation to be adjusted. The actual code can and probably always would use 4-characters, for instance for \255

For I have seen code that creeps mighty close to the 65535 limit and they are chock full of \0 for instance devouring 2-bytes each where 1-byte in allotment could be used instead - allowing them to further develop their code.

As for the extra byte in 65535, that +1 could be a hard-coded zero to denote the end of data so it is always required.

merwok • 2022-10-02 00:31

this request is in the same category as the one about defining constants: incompatible with the nature of pico8 carts! the code written is the code saved, there is no compilation or transformation, so two characters don’t count as one and that’s unescapable.

dw817 • 2022-10-02 20:25

Hi @merwok. I am merely referring to the "INFO" command and the artificially set limitations. Nothing more. There is no compilation or transformation.

It is just the cosmetic interpretation of memory usage. Nothing more than this.

aced • 2022-10-02 21:41

@dw817 - that sounds like cheating! Where would the challenge be? ;-)

dw817 • 2022-10-03 00:13

Hi @aced:

It would still be there. Apple ][ Integer Basic is a good example of being able to include 8-bit characters directly in the code without having to resort to "\" to get them.

It would use a single byte at the beginning to represent how many characters (and tokens) are used on that single programming line, and go from there.

It could be done, and the challenge would still remain. If you choose not to use \ anywhere in your code for strings or output, then everything is exactly the same.

And certainly at least I believe - we should not be penalized for documenting our source-codes - where it would take 0-characters if it's a remark.

But that could be extreme and a dealbreaker for hardened P8 coders. They want to be penalized for documenting their code.

dddaaannn • 2022-10-03 05:54

You appear to be suggesting that only the uncompressed character count be modified, and to leave code storage and the compressed count alone. This won't help except in rare cases where code with many escape sequences is hitting the uncompressed limit before hitting the compressed limit. Typically, data in strings hits the compressed limit first, because it doesn't compress as well as the rest of the code, regardless of whether it is represented in escape sequences or literal P8SCII characters. Allowing an extra uncompressed character makes no difference if there isn't a proportional savings in the compressed bytes.

If you are suggesting that the compressed code limit also somehow be modified, note that the compressed code limit is not "artificial" in the way other limits are. .p8.png files have a fixed number of bytes in which to store compressed code. The compressed byte count is the actual output size of the compression algorithm, not a fake count based on an arbitrary interpretation of the code. PICO-8 can't pretend that \123 is a single character during compression and also maintain how it is represented.

Perhaps you mean to suggest that escape sequences in string literals be converted to P8SCII characters when stored. Perhaps also that if certain characters are best represented as escape sequences, a fixed range of characters in string literals are converted to escape sequences automatically in the editor. This accomplishes the stated goal, at the expense of losing control of how characters are represented in string literals.

For what it's worth, the compression algorithm is already doing the best it can to store code in as few bytes as possible. PRINT('\97') is the same size compressed as uncompressed, but make that a PRINT with 90x \97 in it and it stores 280 characters of code in 40 bytes. Data in strings doesn't compress as well as code in general, but it's easy to imagine that large quantities of escape sequences compress nearly as well as equivalent P8SCII bytes.
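
To get a feel for the point about repeated escape sequences, here is a small Python sketch. It uses zlib purely as a stand-in, since PICO-8 has its own compression scheme, so the byte counts will not match what INFO reports; only the general trend is meant to carry over. The helper name `compressed_size` is made up for this example.

```python
import zlib

def compressed_size(text: str) -> int:
    """Size in bytes after zlib compression (a stand-in, not PICO-8's compressor)."""
    return len(zlib.compress(text.encode("latin-1"), 9))

# One print whose string holds ninety \97 escapes, and the same string
# written with ninety literal "a" characters instead.
escaped = 'print("' + "\\97" * 90 + '")'
literal = 'print("' + "a" * 90 + '")'

for name, code in (("escaped", escaped), ("literal", literal)):
    print(name, len(code), "chars ->", compressed_size(code), "bytes")
```

Highly repetitive escapes shrink to a small fraction of their typed length here, which is consistent with the observation above. Less repetitive binary data would not compress nearly as well in either form.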

dw817 • 2022-10-03 16:32

@dddaaannn, you are the voice of reason.

Alright, guys. I didn't consider the p8.png. Yeah, that would suffer if all \ got converted. Hmmm ... Well darnit, it was more the principle of the thing, getting cheated out of 2-3 chars for a single character.

And obviously Apple ][ Integer BASIC saved in only one format, tokenized with a length, which is why it can store 8-bit characters in the code.

There is a very distinct reason I want this. Let's say you store a raw image of a screen that is completely black except for maybe some lines and circles.

The majority of the string in your sourcecode would have '\0' in it, occupying 2-characters instead of just one.
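
To put numbers on that, here is a hedged Python sketch of an escaper for binary data. The function name `escape_bytes` is made up, and the set of bytes that must be escaped (0, 10, 13, 34, 92, plus 128 and up) is taken from the first post as an assumption rather than from PICO-8 itself. It also pads to three digits when the next byte is a digit, because a Lua decimal escape reads up to three digits.

```python
# Bytes that, per the first post, must be written as backslash escapes.
MUST_ESCAPE = {0, 10, 13, 34, 92}

def escape_bytes(data: bytes) -> str:
    """Write binary data as the body of a string literal using decimal escapes."""
    out = []
    for i, b in enumerate(data):
        if b in MUST_ESCAPE or b >= 128:
            # If an ASCII digit follows, pad to three digits so the escape
            # cannot swallow it (\0 followed by "1" must become \0001).
            next_is_digit = i + 1 < len(data) and 48 <= data[i + 1] <= 57
            out.append(f"\\{b:03d}" if next_is_digit else f"\\{b}")
        else:
            out.append(chr(b))
    return "".join(out)

# An all-black screen, assuming 128x128 pixels at 4 bits per pixel.
black_screen = bytes(8192)
print(len(black_screen), "bytes ->", len(escape_bytes(black_screen)), "chars")
```

Under these assumptions 8192 zero bytes cost 16384 typed characters, a quarter of the 65535 available, which is the overhead being described.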

Now I see a way around this yet it breaks the convention and convenience of being able to PRINT a single string to create a picture.

Thanks everyone who interjected. A good pipe dream to be sure.

I will have to think on this ...

merwok • 2022-10-04 23:05

no, we don’t have to use escape codes and spend the characters, they can nearly always be replaced by the character they represent: https://www.lexaloffle.com/bbs/?tid=38692

dw817 • 2022-10-05 02:23

Hi @merwok.

I guess the "meat" of the subject was that I wanted to shrink \0 to one character, as that would primarily appear in an empty binary set. I'm fresh outta luck on this I think. :)

slainte • 2022-10-05 18:17

the counter is not counting "characters in your code", it is counting "characters in your file" which is a totally different (and accurate...) metric as it is now. I would stick to the existing behaviour, I've not seen any code editor counting chars in a different way

dw817 • 2022-10-05 22:35

Hi @slainte:

Yep, @dddaaannn explained that neatly.

thisismypassword • 2022-10-07 19:46

It would be nice if you could write/use literal null and CR characters, though - that way you could use binary [[...]] strings where each byte counted as 1 character. (esp. if [=[...]=] and its ilk were supported too - then even ]] sequences would work)

The main issues with that, I guess, are:

) Null characters are kinda special in good ol' C, which lua and pico8 are written in. So the lua/pico8 tokenizing/parsing code would need to avoid using null-terminating strings (and instead use, e.g. string slices). A cursory glance at the lua source suggests this might already be the case BUT it also seems to be using '\0' for some special purposes that might or might not need to be revised.

) Also, still on Nulls, they indicate an end of uncompressed code in a binary pico8 cart format AND (less importantly) indicate an end of an uncompressed block inside compressed code in a binary pico8 cart (png/rom).
This means that (binary) carts with nulls would always need to be compressed and that the compression of already-compressed binary strings with nulls would not be ideal with the current uncompressed block format.

(Note - this only affects pngs/roms - in p8 files, a special unicode character (⁰) can be used, just like it's used for other control chars - see https://pico-8.fandom.com/wiki/P8SCII)

) As for CRs - the lua spec says that all line-endings are unified into '\n's, and you'd indeed want to keep converting '\r\n's in p8 files into '\n's in the pngs/roms. However, I see pico8 already does this conversion when creating pngs/roms, meaning pngs/roms are not created with '\r's in them.

That means that it shouldn't be a breaking change to allow a new special unicode character (ᵈ) to stand for '\r' and have it be kept as '\r' in the png/rom (even if it precedes a '\n'). All it'd require is:
)) Doing the line-ending conversion before doing the unicode conversion, so the lua lexer will only ever see '\r's if it comes from the new unicode character.
)) Changing the lua lexer (llex.c) to avoid doing its own newline conversion inside strings.
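
The ordering described in the last two points could look roughly like the Python sketch below. It is only an illustration of the idea, not PICO-8's loader: the function name `p8_text_to_cart_code` is made up, "⁰" for NUL is the existing character mentioned above, and "ᵈ" for CR is the proposed character, not an existing feature.

```python
# Special unicode characters in .p8 text and the control characters they
# stand for. The "ᵈ" entry is the proposal from this post (hypothetical).
SPECIAL = {"⁰": "\0", "ᵈ": "\r"}

def p8_text_to_cart_code(text: str) -> str:
    """Convert .p8 text to the code stored in a png/rom, in the proposed order."""
    # Step 1: unify real line endings first, so no typed '\r' survives.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Step 2: only then expand the special characters, so any '\r' left in
    # the result can only have come from "ᵈ".
    return "".join(SPECIAL.get(ch, ch) for ch in text)

source = 's="aᵈ"\r\nprint(s)'
print(repr(p8_text_to_cart_code(source)))
```

Doing the steps in the opposite order would turn a "ᵈ" that precedes a newline back into a plain '\n', which is why the order matters.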
