Here are a few token count discrepancies:
```
print(12) -- 3 tokens
print(-12) -- 3 tokens
print(~12) -- 3 tokens
print(-~12) -- 4 tokens
print(~-12) -- 5 tokens
print(~~12) -- 4 tokens
?12 -- 2 tokens
?-12 -- 3 tokens
?~12 -- 2 tokens
?-~12 -- 3 tokens
?~-12 -- 4 tokens
?~~12 -- 3 tokens
```
Also this inconsistent behaviour with spaces:
```
print(-12) -- 3 tokens
print(- 12) -- 4 tokens
print(~12) -- 3 tokens
print(~ 12) -- 3 tokens
```
I see this piece of a regex somewhere in PICO-8's future:
```
[~-\s]*
```
Fixed (or improved?) for 0.2.1:
I think the cleanest way to make this consistent is to treat a single '~' (with no following whitespace) as part of a number token, the same way '-' is. So the count results become:
```
print(12) -- 3 tokens
print(-12) -- 3 tokens
print(~12) -- 3 tokens
print(-~12) -- 4 tokens
print(~-12) -- 4 tokens * was 5
print(~~12) -- 4 tokens
?12 -- 2 tokens
?-12 -- 2 tokens * was 3
?~12 -- 2 tokens
?-~12 -- 3 tokens
?~-12 -- 3 tokens * was 4
?~~12 -- 3 tokens

print(-12) -- 3 tokens
print(- 12) -- 4 tokens
print(~12) -- 3 tokens
print(~ 12) -- 4 tokens * was 3 -- token count increases!
```
Still a little weird, but consistently weird.
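As a rough check of the rule described above, here is a minimal Python sketch of a toy counter that reproduces the 0.2.1 numbers. It is not PICO-8's tokenizer: `count_tokens`, `ends_operand`, and the regex are hypothetical, the sketch only understands decimal integer literals, and it assumes a `-` or `~` fuses with the number only when it is in unary position and directly touches the digits. Since a pair of brackets counts as one token, it charges `(` and lets `)` go free.

```python
import re

# Toy lexer: whitespace runs, identifiers, decimal integers, or any other
# single character. Hex/fractional literals and strings are not handled.
TOKEN_RE = re.compile(r"\s+|[A-Za-z_]\w*|\d+|.")

FREE = {")", "]", "}"}  # a bracket pair counts once, so closers are free
UNARY = {"-", "~"}      # prefixes that may fuse with a following literal


def ends_operand(piece):
    """True if piece ends an operand, so a following '-' would be binary."""
    return piece[0].isalnum() or piece[0] == "_" or piece in FREE


def count_tokens(src):
    pieces = TOKEN_RE.findall(src)
    count = 0
    prev = None  # last non-whitespace piece seen
    i = 0
    while i < len(pieces):
        piece = pieces[i]
        if piece.isspace():
            i += 1
            continue
        unary = prev is None or not ends_operand(prev)
        nxt = pieces[i + 1] if i + 1 < len(pieces) else ""
        if piece in UNARY and unary and nxt.isdigit():
            # '-' or '~' touching the digits fuses into one number token
            count += 1
            prev = nxt
            i += 2
            continue
        if piece not in FREE:
            count += 1
        prev = piece
        i += 1
    return count


for src in ["print(~-12)", "?-12", "print(~ 12)"]:
    print(src, count_tokens(src))
```

Because only the prefix that touches the digits fuses, `~-12` splits into `~` and `-12`, which is why it lands on 4 tokens inside `print(...)` rather than 3.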
Thanks for giving this some thought! Still a bit puzzled by "-12" being one token and "- 12" being two, but yeah, at least it’s consistent!
I thought "-12" was 1 token because the minus is now counted as part of the number, and "- 12" implies that you're doing math with other numbers. But IDK.
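Due to how operator precedence works, it’s not always part of the number; for instance, in -2^4 the number can’t be -2, otherwise the result would be wrong: ^ binds tighter than unary minus, so the expression must evaluate as -(2^4) = -16, not (-2)^4 = 16.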
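Python's `**` binds tighter than unary minus in the same way that `^` does in Lua (and PICO-8), so it can stand in to show the arithmetic. This is an illustration in another language, not PICO-8 code.

```python
# '**' binds tighter than unary '-', like '^' in Lua and PICO-8.
correct = -2 ** 4   # parsed as -(2 ** 4)
fused = (-2) ** 4   # what you'd get if "-2" were lexed as one literal

print(correct, fused)  # -16 16
```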
I feel like ~-12 should still be just one token. It's a constant value, at least conceptually.
I don't know if the compiler folds it into one final value. It ought to recognize that the leaf node in the AST is a constant 12 (0x000c), and that its parent is a unary minus, so those two nodes can fold into 0xfff4, making a new constant leaf node. And that node's parent is a unary not, so those two can also fold, to 0x000b.ffff.
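The folding described above can be sketched in Python. The node classes (`Const`, `Var`, `Unary`, `Pow`) and `fold` are hypothetical, not PICO-8 internals. The sketch assumes PICO-8's 32-bit 16.16 fixed-point format, in which `~` flips the 16 fractional bits as well, so folding `~-12` lands on 0x000b.ffff (just under 12) rather than on an exact integer. `Pow` is never folded here; it only shows that a non-constant child blocks folding, as in `-2^x`.

```python
from dataclasses import dataclass
from typing import Union

MASK = 0xFFFFFFFF  # assumed: 32-bit two's-complement, 16.16 fixed point


@dataclass
class Const:
    raw: int  # raw bit pattern; 12 is 0x000C0000


@dataclass
class Var:
    name: str  # a non-constant leaf, such as x


@dataclass
class Unary:
    op: str  # "-" (negate) or "~" (bitwise not)
    child: "Node"


@dataclass
class Pow:
    base: "Node"
    exp: "Node"


Node = Union[Const, Var, Unary, Pow]


def num(n: int) -> Const:
    """Build a constant leaf from an integer."""
    return Const((n << 16) & MASK)


def fold(node: Node) -> Node:
    """Fold unary operators whose (already folded) child is a constant."""
    if isinstance(node, Unary):
        child = fold(node.child)
        if isinstance(child, Const):
            if node.op == "-":
                return Const(-child.raw & MASK)
            if node.op == "~":
                return Const(~child.raw & MASK)
        return Unary(node.op, child)
    if isinstance(node, Pow):
        # Not folded in this sketch; only its children are visited.
        return Pow(fold(node.base), fold(node.exp))
    return node


folded = fold(Unary("~", Unary("-", num(12))))
print(hex(folded.raw))  # 0xbffff

blocked = fold(Unary("-", Pow(num(2), Var("x"))))
print(isinstance(blocked, Unary))  # True: '-' stays, its child isn't constant
```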
But even if it does, I dunno how hard it is to recognize that outside of the compiler. The problem with fixing -2^x goes away when you have the AST, because the negative node is the parent of the power node, and the power node should be aware that one of its children isn't a constant. But recognizing it with regexes or the like is a lot trickier.
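A small Python illustration of why the regex route gets tricky. Both patterns and the `prefix` helper are hypothetical and only cover decimal integers. The naive pattern happily glues `-2` together in `-2^x`; the patched one refuses when the literal is the base of `^`, but it needs an extra `(?!\d)` guard so that backtracking can't settle on a shorter literal such as `-1` inside `-12^x`.

```python
import re

# Naive: any run of unary '-'/'~' followed by a decimal literal.
NAIVE = re.compile(r"[-~]*\d+")

# Patched: also refuse when the literal is the base of '^', which binds
# tighter than unary '-' and '~'. The (?!\d) guard stops backtracking
# from settling on a shorter literal, e.g. "-1" inside "-12^x".
PATCHED = re.compile(r"[-~]*\d+(?!\d)(?!\s*\^)")


def prefix(pattern, src):
    """Return the constant-looking prefix of src, or None."""
    m = pattern.match(src)
    return m.group() if m else None


for src in ["~-12", "-2^x", "-12 ^ x", "-0x10"]:
    print(src, prefix(NAIVE, src), prefix(PATCHED, src))
```

Even the patched pattern stops at `-0` in the hex literal `-0x10`, the kind of case that a real lexer feeding an AST would handle naturally.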