With the release of PICO-8 0.2.3 the code slowed down a bit and exceeded 100% CPU, so I lowered the maximum fps from 420 to 390.
Controls: change color with left and right, change speed with up and down.
You can change the initial board by changing the spritesheet. Use colors 0 and 7.
The board is stored as a bitmap, 1 bit per cell, starting at address 0x4300.
Updating the board:
32 cells are processed in parallel using bitwise operations.
The bits are added together using the following functions:
```
function add2(a, b)
  sum = a ^^ b
  carry = a & b
  return sum, carry
end

function add3(a, b, c)
  tmp = a ^^ b
  sum = tmp ^^ c
  carry = a & b | tmp & c
  return sum, carry
end
```
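To check the adders outside PICO-8, here is a small Python sketch of the same two functions (not the cart's code). Python's `^` stands in for PICO-8's `^^`, and the `MASK` constant is an assumption added to keep values to one 32-cell word.

```python
MASK = 0xFFFFFFFF  # assumption: one word holds 32 cells


def add2(a, b):
    """Half adder applied to every bit position at once."""
    return (a ^ b) & MASK, (a & b) & MASK


def add3(a, b, c):
    """Full adder applied to every bit position at once."""
    tmp = a ^ b
    return (tmp ^ c) & MASK, ((a & b) | (tmp & c)) & MASK


# Three example words; each bit position is an independent column.
a, b, c = 0b1100, 0b1010, 0b0110
s, carry = add3(a, b, c)

# In every position, a + b + c equals sum + 2 * carry.
for i in range(4):
    bits = [(x >> i) & 1 for x in (a, b, c)]
    assert sum(bits) == ((s >> i) & 1) + 2 * ((carry >> i) & 1)
```

Each bit position behaves like its own 1-bit adder, which is what lets one call handle 32 cells.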
Let's use the following map to refer to the neighboring cells:
```
abc -- above
d.e -- current
fgh -- below
```
We get the sum of b+g using add2(above, below).
add3(above, current, below) gives the sum of each vertical column of 3 cells. Seen from the current cell, the column to its left is a+d+f and the column to its right is c+e+h, so we get those two sums by shifting the result left or right by 1.
Now we have the 3 sums as 2 digit binary numbers:
```
sum     bit0  bit1
b+g:    s0    c0
a+d+f:  s1    c1
c+e+h:  s2    c2
```
We add these numbers one column at a time to get bit0 and bit1 of the final sum:
```
bit0, c = add3(s0, s1, s2)
sum, carry = add3(c0, c1, c2)
bit1 = sum ^^ c
```
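Note that `sum ^^ c` drops the carry out of that last addition. That looks safe: when both `sum` and `c` are set, bit1 comes out 0 and the count is at least 4, so the cell is dead either way. The Python sketch below (not the cart's code; `count_bits` is a hypothetical helper name) checks this over all 256 neighbor patterns, using 1-bit values for the cells a to h.

```python
from itertools import product


def add2(a, b):
    return a ^ b, a & b


def add3(a, b, c):
    tmp = a ^ b
    return tmp ^ c, (a & b) | (tmp & c)


def count_bits(a, b, c, d, e, f, g, h):
    """Return (bit0, bit1, carry) for the 8 neighbors, as in the post."""
    s0, c0 = add2(b, g)       # b+g
    s1, c1 = add3(a, d, f)    # a+d+f
    s2, c2 = add3(c, e, h)    # c+e+h
    bit0, c_low = add3(s0, s1, s2)
    total, carry = add3(c0, c1, c2)
    bit1 = total ^ c_low      # carry out of this addition is dropped
    return bit0, bit1, carry


for cells in product((0, 1), repeat=8):
    n = sum(cells)
    bit0, bit1, carry = count_bits(*cells)
    # bit0 is always the low bit of the neighbor count.
    assert bit0 == n & 1
    # bit1 set with carry clear happens exactly for 2 or 3 neighbors.
    assert (bit1 == 1 and carry == 0) == (n in (2, 3))
```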
A cell is alive in the next generation if:
- it has 3 neighbors, or
- it's alive now and has 2 neighbors.
In the 1st case: 3 is 11 in binary, so the sum is 3 if bit0=1, bit1=1 and carry=0. The formula is bit0 & bit1 & ~carry.
Similarly, in the 2nd case (2 is 10 in binary) the formula is current_cell & ~bit0 & bit1 & ~carry.
Combined, the result is bit0 & bit1 & ~carry | current_cell & ~bit0 & bit1 & ~carry.
This can be simplified to (bit0 | current_cell & ~bit0) & bit1 & ~carry, which is equal to (bit0 | current_cell) & bit1 & ~carry.
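Putting the steps together, here is a Python sketch of one generation (not the cart's code). It assumes a small hypothetical board where each row is a single `W`-bit integer and cells outside the board are dead. The cart works on 32-bit words and has to handle word boundaries, which this sketch leaves out. The `naive` function is only there as a reference to compare against.

```python
W, H = 8, 6  # hypothetical board size for the sketch
MASK = (1 << W) - 1


def add2(a, b):
    return a ^ b, a & b


def add3(a, b, c):
    tmp = a ^ b
    return tmp ^ c, (a & b) | (tmp & c)


def step(rows):
    """One generation, processing a whole row of cells per operation."""
    out = []
    for y in range(H):
        above = rows[y - 1] if y > 0 else 0
        cur = rows[y]
        below = rows[y + 1] if y < H - 1 else 0
        s0, c0 = add2(above, below)                  # b+g
        col_s, col_c = add3(above, cur, below)       # per-column sums
        s1, c1 = (col_s << 1) & MASK, (col_c << 1) & MASK  # one side
        s2, c2 = col_s >> 1, col_c >> 1                    # other side
        bit0, c = add3(s0, s1, s2)
        total, carry = add3(c0, c1, c2)
        bit1 = total ^ c
        out.append((bit0 | cur) & bit1 & ~carry & MASK)
    return out


def naive(rows):
    """Cell-by-cell reference using the same dead-edge assumption."""
    def cell(x, y):
        return (rows[y] >> x) & 1 if 0 <= x < W and 0 <= y < H else 0

    out = []
    for y in range(H):
        r = 0
        for x in range(W):
            n = sum(cell(x + dx, y + dy)
                    for dy in (-1, 0, 1) for dx in (-1, 0, 1) if dx or dy)
            if n == 3 or (n == 2 and cell(x, y)):
                r |= 1 << x
        out.append(r)
    return out


# A glider in one corner; both versions should agree every generation.
board = [0b010, 0b100, 0b111, 0, 0, 0]
for _ in range(10):
    assert step(board) == naive(board)
    board = step(board)
```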
To speed up the main loop the add2 and add3 functions are inlined, redundant computations are removed and the loop is unrolled 4 times.
Drawing:
Each byte of the board (8 cells) is expanded to 32 bits (8 pixels at 4 bits per pixel) using a lookup table and poked to screen memory.
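A lookup table like that could be built as in the Python sketch below (not the cart's code). The mapping of bit i to nibble i and the names `expand`, `LUT` and `ALIVE` are assumptions for illustration; the cart may order the bits differently.

```python
ALIVE = 7  # color of a live cell; dead cells stay color 0


def expand(byte):
    """Spread 8 cell bits into 8 four-bit pixels (assumed: bit i -> nibble i)."""
    out = 0
    for i in range(8):
        if (byte >> i) & 1:
            out |= ALIVE << (4 * i)
    return out


# One entry per possible byte, so drawing is a single lookup per 8 cells.
LUT = [expand(b) for b in range(256)]

assert LUT[0b00000001] == 0x00000007
assert LUT[0b10000010] == 0x70000070
```

With a table like this, drawing 8 cells costs one table read and one 32-bit write instead of 8 separate pixel writes.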
excellent binary operation use case 👌
question: why use the 0x4300 region and not a plain table? a table is faster to access than $(address+8).
I tested both table and memory, and using memory is faster.
I guess poke4(address,a,b,c,d) is faster than table[i],table[i+1],table[i+2],table[i+3]=a,b,c,d. Also, when drawing I need to access individual bytes. With a table I would have to shift the 32-bit numbers and mask with 0xff to get at the bytes.
got it - I hadn't seen the individual pokes when glancing over the code!
This is a version that runs at 210 fps, but has smoother animation. Use up or down to change speed.