This code uses 11% CPU according to Pico-8, but if I delete the v[0]=1 lines in the if false block it uses 7% CPU:
v={}
::_::
if false then
 v[0]=1 v[0]=1 v[0]=1 v[0]=1 v[0]=1 v[0]=1 v[0]=1 v[0]=1 v[0]=1 v[0]=1 v[0]=1
 v[0]=1 v[0]=1 v[0]=1 v[0]=1 v[0]=1 v[0]=1 v[0]=1 v[0]=1 v[0]=1 v[0]=1 v[0]=1
end
for i=0,1023 do
 v[0]=5 v[0]=5 v[0]=5 v[0]=5
end
flip()
goto _
Other observations:
- If I put the v[0]=1 lines in a for loop with a huge iteration count this code uses 7% CPU. (No bug.)
- If I put this code into _update() or _draw() it uses 7% CPU. (No bug.)
- Adding / removing v[0]=5 lines bumps CPU by roughly 3% per line.
- Replacing v[0] with v[1] makes no difference.
- Pico-8 does act on this CPU info, the contents of the if false block will affect whether Pico-8 visibly stutters if I add draw code and the non-executing lines push CPU over 100%.
- If I add a table u and do u[0]=5 in the second loop the bug still happens, no change in behavior.
- The number of top loop v[0]=1 assignments only seems to matter between about 8 and 16 - the CPU usage seems to saturate outside of that range.
- If I make v local, this behavior persists, but it seems to take more v[0]=1 lines to trigger it.
- I can produce this behavior on both 0.2.5e and 0.2.5c, @freds72 reports also being able to repro on 0.2.4c
Since this only happens outside of _update() (I think??) I assume this isn't a big issue for most carts, but for tweetcarts etc. it does seem fairly important.
Oh ! That is a serious finding, @luchak !
When I wrote S2 years ago I clearly told it to go AROUND any statements that were false so no matter how big the code was - if the condition was false, it would go around in all cases.
Star to support fix of this bug.
Edit: Hmm, my original post below the line misunderstood what the code was doing. I'm leaving it there so people know for sure I can be an idiot sometimes. ;)
I think it might be that the lua compiler is smart enough to recognize an empty block (if false then end) and elide the condition test entirely, yet stupid enough to keep a constant test condition as long as the block isn't empty.
Which in turn might be just enough to get you across (or not across) the threshold of the first audio interrupt I described in my otherwise-misguided response below.
You're unaware of some confounding factors here.
You might be using up enough cycles that PICO-8 is doing the equivalent of a VSYNC interrupt during your timing to update the screen. If you want to time something accurately, you have to be aware of these.
IIRC, there is also an interrupt that fires off to update audio, which I think happens at 120Hz, but DO NOT quote me on that because it's been a very long time since I tested it and A) I might be misremembering and B) the virtual hardware might have changed since then.
Ideally you do your timing in a way that fits between possible interrupts. It's hard to force timing but you can at least sync to VSYNC by using flip() before your timed code. Then maybe time some dummy code that you expand linearly until the time jumps unexpectedly. Then you know how much time you can spend between the flip() and the next interrupt with just your own code running.
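A rough sketch of that procedure in PICO-8 Lua, for anyone who wants to try it. The helper names cost and find_jump and the tol threshold are made up for illustration, and it assumes stat(1) (CPU use since the last flip(), as a fraction of a frame) is the timing source. Whether such a jump shows up in stat(1) at all gets questioned further down the thread, so treat this as a way to test the idea, not as confirmation of it.

```lua
-- hypothetical helper: stat(1) delta for n iterations of a dummy loop,
-- measured right after flip() so timing starts at a frame boundary
function cost(n)
 flip()
 local t0=stat(1)
 for i=1,n do end
 return stat(1)-t0
end

-- hypothetical helper: grow n linearly and return the first n whose
-- extra cost is more than tol times the previous step's extra cost
function find_jump(step,max_n,tol)
 local prev=cost(step)
 local prev_delta=nil
 for n=step*2,max_n,step do
  local c=cost(n)
  local delta=c-prev
  if prev_delta and delta>prev_delta*tol then
   return n
  end
  prev_delta=delta
  prev=c
 end
 return nil -- cost stayed roughly linear over the whole range
end
```

In a cart, something like print(find_jump(256,16384,4)) would report the first loop count where the cost jumps, or nil if it stays linear. The step, range and threshold are guesses and would need tuning.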
You definitely don't want to do timing within _update(), because you don't have any idea when the last VSYNC was.
I'm not sure, but holdframe() might disable the VSYNC interrupt. Or it might not. You'd need to test. But you'd still need to deal with the audio interrupts, which I dunno if you can disable. Maybe if volume is set to 0, but I doubt it.
ok, I think I kind of understand what's going on. Unfortunately there is no clear fix for this.
Lua doesn't do dead code elimination, and vm instructions for the dead code are still generated. This is affecting the code generation at the end of the program.
When there are too many table lookups (including variable references) within a function, the code generator runs out of "table registers" (internally similar to locals, I think?) and starts emitting more costly vm instructions to do global table lookups.
This is the reason that swapping the order of live / dead code matters, and why the effect doesn't appear when using a _draw() callback. PICO-8 injects a lot of code at the start of a program, so code generation at the top level becomes more easily starved of registers.
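The "dead code is still generated" part can be checked outside PICO-8 with a small sketch like the one below. It assumes a stock Lua 5.2 or later interpreter, since load on a string and string.dump are not available inside PICO-8, and dump_size is a made-up helper name. It only shows that the dead block adds bytes to the compiled chunk; it says nothing about the register behaviour described above, which depends on the code PICO-8 injects.

```lua
-- hypothetical helper: compile a chunk without running it and
-- return the size of its dumped bytecode
function dump_size(src)
 local f=assert(load(src))
 -- the second argument strips debug info on lua 5.3+; 5.2 ignores it
 return #string.dump(f,true)
end

local live_only=dump_size("v={} v[0]=5")
local with_dead=dump_size("v={} if false then v[0]=1 v[0]=1 v[0]=1 end v[0]=5")

-- with_dead should come out larger if the if false body is compiled
print(live_only,with_dead)
```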
Changing the structure of the function tree can improve this, but pushes the problem somewhere else. For example, wrapping userland code in a function does "solve" the example shown here (and makes some of my carts a bit faster!), but also makes other tweetcarts run a little slower because the resulting extra upvalue lookups are a net loss (I found a couple that were 2~3% slower).
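For the example at the top of the thread, wrapping the userland code in a function might look like the sketch below. The name main is arbitrary and this is only one reading of "wrapping"; it keeps v global, as in the original snippet.

```lua
-- the original snippet, moved from the top level into a function
function main()
 v={}
 ::_::
 if false then
  v[0]=1 v[0]=1 v[0]=1 v[0]=1 v[0]=1 v[0]=1 v[0]=1 v[0]=1 v[0]=1 v[0]=1 v[0]=1
  v[0]=1 v[0]=1 v[0]=1 v[0]=1 v[0]=1 v[0]=1 v[0]=1 v[0]=1 v[0]=1 v[0]=1 v[0]=1
 end
 for i=0,1023 do
  v[0]=5 v[0]=5 v[0]=5 v[0]=5
 end
 flip()
 goto _
end
```

The cart would then end with a single top-level call to main(). Since this moves the cost around rather than removing it, it is worth measuring per cart before relying on it.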
So, I think I'm going to put this in the too hard/dangerous basket.
@Felice there are audio interrupts, but they don't interact with stat(1) at all. They might cause the vm to stall in the real world (it happens on a separate thread and can block the vm's thread until mixing is finished), but that's wall time only, and the number of executed vm instructions is the same.
@zep Thank you for the update! That's an interesting resolution for sure. I was wondering why I wasn't able to see anything notable on https://www.luac.nl/ while playing around with variations of the code I posted above, but the lack of the injected PICO-8 preamble seems like it might explain that.
I only know there are audio interrupts because they DO interact with stat(1), or at least they did in the past. You may have fixed that in the past two years or so, but if you're not sure you did, then you should really test that to be sure you're actually right about the two not interacting.
Never mind, I retract what I said. Either my old testing was severely flawed or my new testing shows you did indeed hide the audio interrupt cycles, which is great since I always thought the audio chip should be running in parallel. Yay? :)