(recommended: use with PICO-8 0.2.1b or later)
This function converts binary strings into a format that can be pasted into source code. Binary strings can contain any character from chr(0) to chr(255), and so include unprintable / unstorable characters. escape_binary_str() adds the needed escape codes and stores the remaining characters as-is. For example, character 10 becomes \n, and character 0 becomes \0, or \000 when followed by a digit (to avoid ambiguity).
This is useful for storing dense binary data efficiently (e.g. compressed with PX9). If you are storing structured data in code (like a raw image), it will likely be easier and almost as efficient to store it as a string of hexadecimal characters.
function escape_binary_str(s)
 local out=""
 for i=1,#s do
  local c  = sub(s,i,i)
  local nc = ord(s,i+1)
  -- pad \0 to \000 when the next character is a digit
  local pr = (nc and nc>=48 and nc<=57) and "00" or ""
  local v=c
  if(c=="\"") v="\\\""
  if(c=="\\") v="\\\\"
  if(ord(c)==0) v="\\"..pr.."0"
  if(ord(c)==10) v="\\n"
  if(ord(c)==13) v="\\r"
  out..= v
 end
 return out
end
Workflow
Step 1. Generate a Binary String
binstr=""
for i=1,256 do
 binstr..=chr(i%256) -- any data you like
end
?#binstr -- 256
?ord(binstr,256) -- 0
?ord(binstr, 13) -- 13
Step 2. Escape the String and Copy to Clipboard
printh(escape_binary_str(binstr), "@clip")
Step 3. Paste into Source Code
* Turn on Puny Mode (CTRL-P) to make sure uppercase characters are encoded as punyfont.
* Press CTRL-V to paste into source code as a string value: bindat="[paste here]". You should get something like this:
bindat="¹²³⁴⁵⁶⁷⁸ \nᵇᶜ\rᵉᶠ▮■□⁙⁘‖◀▶「」¥•、。゛゜ !\"#$%&'()*+,-./0123456789:;<=>?@𝘢𝘣𝘤𝘥𝘦𝘧𝘨𝘩𝘪𝘫𝘬𝘭𝘮𝘯𝘰𝘱𝘲𝘳𝘴𝘵𝘶𝘷𝘸𝘹𝘺𝘻[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~○█▒🐱⬇️░✽●♥☉웃⌂⬅️😐♪🅾️◆…➡️★⧗⬆️ˇ∧❎▤▥あいうえおかきくけこさしすせそたちつてとなにぬねのはひふへほまみむめもやゆよらりるれろわをんっゃゅょアイウエオカキクケコサシスセソタチツテトナニヌネノハヒフヘホマミムメモヤユヨラリルレロワヲンッャュョ◜◝\0" |
Step 4. Enjoy your Binary Data
The contents of bindat can now be accessed with ord(bindat, index) (note that index is 1-based).
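As a minimal sketch of reading the data back, the loop below copies a binary string into RAM one byte at a time with ord() and poke(). The helper name str_to_mem, the stand-in string dat, and the target address 0x4300 are assumptions chosen for illustration, not part of the original post.

```lua
-- hypothetical helper: copy each byte of s into ram,
-- starting at addr (string indices are 1-based,
-- memory offsets are 0-based)
function str_to_mem(s,addr)
 for i=1,#s do
  poke(addr+i-1,ord(s,i))
 end
end

-- stand-in for a pasted bindat string
dat="\000\001\255"
str_to_mem(dat,0x4300)
```

In a real cart, dat would be replaced by the pasted bindat string.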
Hi zep! (First comment here but not the first time this is discussed on discord :)
Could you confirm that in addition to 'ord', we can also use 'sub' to get more than one byte at once, and use 'poke' to get the data into memory?
sub returns a string - @merwork: are you expecting that to work:
poke4(0x4300,sub(byte_str,1,4))
??
No, I would expect sub to return a substring of the initial byte string!
Edit: tested 'sub', works as intended (doesn't cut any character), so it can be used to split a big encoded string into sub-strings. This makes it possible, for example, to encode many levels in one string and then extract just the one to load, rather than being forced to define separate strings. We can't use 'poke' though, so to load binary data we have to call 'ord' in a loop.
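To illustrate the level-splitting idea, here is a small sketch that assumes every level occupies the same number of bytes. The names levels, level_len and get_level are hypothetical, and the data is a printable stand-in for a pasted binary string.

```lua
-- assumption: each level is exactly level_len bytes
level_len=4
levels="aaaabbbbcccc" -- stand-in for pasted binary data

-- return the n-th level (1-based) as a substring
function get_level(n)
 local first=(n-1)*level_len+1
 return sub(levels,first,first+level_len-1)
end

?get_level(2) -- bbbb
```

Variable-length levels would need an index or length prefix instead of a fixed stride.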
0.2.3 changelog:
> Added: ord(str, pos, num) returns num results starting from character at pos (similar to peek)
One call, no loop \o/
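A sketch of how the multi-result ord() can feed poke(), which accepts several values in one call. The chunk size of 256 is an arbitrary, cautious assumption to keep the number of values per call modest, and str_to_mem is a hypothetical name.

```lua
-- hypothetical helper: copy s into ram at addr,
-- up to 256 bytes per poke call
function str_to_mem(s,addr)
 for i=1,#s,256 do
  -- never ask ord for more bytes than remain
  local n=min(256,#s-i+1)
  poke(addr+i-1,ord(s,i,n))
 end
end

-- stand-in for a pasted bindat string
dat="\000\001\255"
str_to_mem(dat,0x4300)
```

For short strings this reduces to a single poke(addr,ord(s,1,#s)) call.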
I noticed someone referring to this post and also that it hasn't been updated in a while, so I figured I ought to take a stab at refining the function. This comes in at about half the tokens (now 57) and probably performs better, though this sort of function probably doesn't get used much at runtime, so maybe that's not so important.
-- ordinal (0..255) -> escape sequence table
ord_esc=split("¹²³⁴⁵⁶⁷⁸\t?ᵇᶜ?ᵉᶠ▮■□⁙⁘‖◀▶「」¥•、。゛゜ !?#$%&'()*+,-./0123456789:;<=>?@abcdefghijklmnopqrstuvwxyz[?]^_`abcdefghijklmnopqrstuvwxyz{|}~○█▒🐱⬇️░✽●♥☉웃⌂⬅️😐♪🅾️◆…➡️★⧗⬆️ˇ∧❎▤▥あいうえおかきくけこさしすせそたちつてとなにぬねのはひふへほまみむめもやゆよらりるれろわをんっゃゅょアイウエオカキクケコサシスセソタチツテトナニヌネノハヒフヘホマミムメモヤユヨラリルレロワヲンッャュョ◜◝",1,false)
ord_esc[0]="\\0"    -- nul
ord_esc[10]="\\n"   -- newline
ord_esc[13]="\\r"   -- cr
ord_esc[34]="\\\""  -- quote
ord_esc[92]="\\\\"  -- backslash

function str_esc(s)
 local r=""
 for i=1,#s do
  r..=ord_esc[ord(s,i)]
 end
 return r
end
BTW I had to convert a literal tab in my split("...") string into "\t" because the BBS code parser converts tabs to spaces. This seems like a possible problem. I really wish you'd preserve tabs in code previews and just set the CSS "tab-size" value to something appropriate for PICO-8. I suggest 2, as always, but do 4 or 1 or whatever, just as long as you keep the tab character as-is. Code blocks should never be molested in any way other than styling them, really.
Edit: Here's the code inside a cart just so it can run the unit test, which just compares my method's results with yours, and will eventually break when something changes. 😜
That's nice and streamlined, but it doesn't add 2 extra zeroes to \0 glyphs if they're followed by numeric symbols, which could cause errors.
Yep, @JadeLombax and @Felice. For instance in my compressor I must use \48 to \57 for digits. If I don't the data messes up.
Oh, drat, I missed that element of zep's converter. I'll see if I can come up with something that's streamlined but still works well, hmm. Lemme think.
I've been working with a system similar to this, but \0 isn't read, and it also corrupts the next byte if that byte is a 0. Are there any solutions for this?
yes @teddblue that's fixable: "\000" is another way to encode the 0 byte, "\005" for 5, etc. you should use this way to avoid issues if the next byte is an ascii number ("0"-"9", bytes 48-57)
escape_binary_str (above) handles this with the "local pr =" line; take a look at that part of the code
yea i just have my encoder use "\000" instead of "\0" now and it works great.
pancelor shared that some editors might convert a pasted \t to spaces, so the snippet can be edited to add a line to handle this!
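One way the two suggestions above could be combined is sketched here: a variant that always writes the zero byte as \000 (so a following digit can never be misread) and escapes tabs as \t. The function name esc_bin_safe is hypothetical, and this is an illustration rather than the original snippet.

```lua
-- hypothetical variant of escape_binary_str
function esc_bin_safe(s)
 local out=""
 for i=1,#s do
  local b=ord(s,i)
  local v=chr(b)
  if(b==0) v="\\000"  -- always 3 digits
  if(b==9) v="\\t"    -- keep tabs out of the editor
  if(b==10) v="\\n"
  if(b==13) v="\\r"
  if(b==34) v="\\\""
  if(b==92) v="\\\\"
  out..=v
 end
 return out
end
```

Always using \000 costs two extra characters per zero byte compared with the conditional padding in escape_binary_str.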
How would one obtain a non-escaped binary string in the first place? Like, the binary string of a GFX element, for instance. Copying GFX gives you a hex value.
@Steven()#6
Does this suit your purpose?
str="convert a string to hex"
foreach({ord(str,1,#str)},function(v)
 ?sub(tostr(v,1),5,6)
end)
This seems to convert a string to hex? I'm looking for a way to convert data to a binary string; for instance, GFX (spritesheet) data. I'm poor at bitwise math, so I can't tell if I'm getting it right or wrong. But I'm trying to learn how to convert data to a binary string, and interpret data from that string.
@Steven()#6
Ah, I misinterpreted it!
dlen=0x2000 --full
--dlen=0x1000 --half
str=""
foreach({peek(0,dlen)},function(v)
 v=chr(v)
 str..=v
 ?v
 --stop()
end)
This code allows you to check the unescaped binary characters in the sprite sheet one by one.
The str variable also stores the result of concatenating those characters.
(Unescaped binary strings pick up control codes, so they cannot be displayed as a single piece of text or copied and pasted properly.)
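As a hedged sketch of the full round trip, the code below builds a binary string from the sprite sheet and writes it back to memory. mem_to_str and str_to_mem are hypothetical names; the commented printh line assumes escape_binary_str from the top of this thread is defined in the cart.

```lua
-- hypothetical helper: read len bytes from addr
-- into a binary string
function mem_to_str(addr,len)
 local s=""
 for a=addr,addr+len-1 do
  s..=chr(peek(a))
 end
 return s
end

-- hypothetical helper: write a binary string
-- back to memory at addr
function str_to_mem(s,addr)
 for i=1,#s do
  poke(addr+i-1,ord(s,i))
 end
end

gfx=mem_to_str(0,0x2000) -- whole sprite sheet
-- printh(escape_binary_str(gfx),"@clip")
str_to_mem(gfx,0) -- restore it later
```

The escaped clipboard text is what gets pasted into source; at runtime the pasted string is already unescaped, so str_to_mem can consume it directly.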