Cart #rld_conway-2 | 2021-09-16 | License: CC4-BY-NC-SA


With the release of PICO-8 0.2.3 the code slowed down a bit and exceeded 100% CPU, so I lowered the maximum fps from 420 to 390.

Controls: change color with left and right, change speed with up and down.
You can change the initial board by changing the spritesheet. Use colors 0 and 7.

The board is stored as a bitmap, 1 bit per cell at address 0x4300.

Updating the board:
32 cells are processed in parallel using bitwise operations.
The bits are added together using the following functions:

  function add2(a, b)
    sum = a ^^ b
    carry = a & b
    return sum, carry
  end

  function add3(a, b, c)
    tmp = a ^^ b
    sum = tmp ^^ c
    carry = a & b | tmp & c
    return sum, carry
  end

Let's use the following map to refer to the neighboring cells:

  abc -- above
  d.e -- current
  fgh -- below

We get the sum of b+g using add2(above, below).
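add3(above, current, below) gives the vertical sum of every column at once. For a given cell, a+d+f is the sum of the column on one side and c+e+h is the sum of the column on the other side, so we shift the result left or right by 1 to line each one up with the cell.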
Now we have the 3 sums as 2 digit binary numbers:

  sum   bit0 bit1
  b+g:    s0   c0
  a+d+f:  s1   c1
  c+e+h:  s2   c2
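
The shifting step can be sketched in Python like this (a model, not the cart's code). It assumes bit i of a word holds column i and that cells outside the 32-bit word are dead; the cart's real bit order and edge handling may differ. The names `side_column_sums`, `from_lower` and `from_higher` are made up for the sketch.

```python
MASK = 0xFFFFFFFF  # keep results to 32 bits, one bit per cell


def add3(a, b, c):
    """Full adder applied to every bit lane at once."""
    tmp = a ^ b
    return tmp ^ c, (a & b) | (tmp & c)


def side_column_sums(above, current, below):
    """Return the two side-column sums, each as (sum_bit, carry_bit) words.

    Assumption: bit i holds column i, so column i-1 is reached with a
    left shift and column i+1 with a right shift.
    """
    vs, vc = add3(above, current, below)             # vertical sum of every column
    from_lower = (vs << 1) & MASK, (vc << 1) & MASK  # column i-1 lined up with bit i
    from_higher = vs >> 1, vc >> 1                   # column i+1 lined up with bit i
    return from_lower, from_higher
```

With above=0b001, current=0b001, below=0 the only live column is column 0, with sum 2. The call returns ((0, 0b010), (0, 0)): the cell in column 1 sees a side column whose 2-bit sum is binary 10.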

We add these numbers one column at a time to get bit0 and bit1 of the final sum:

  bit0, c = add3(s0, s1, s2)
  sum, carry = add3(c0, c1, c2)
  bit1 = sum ^^ c
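
One thing that is easy to miss: the last step is an XOR only, so the carry out of sum + c is dropped. That carry can only be set when the count is 4 or more, and in that case bit1 comes out as 0, so testing "bit1 set and carry clear" still picks out exactly the counts 2 and 3. A small Python sketch of the three lines above (the function name `count_bits` and the variable `total` are mine, not from the cart):

```python
def add3(a, b, c):
    """Full adder applied to every bit lane at once."""
    tmp = a ^ b
    return tmp ^ c, (a & b) | (tmp & c)


def count_bits(s0, c0, s1, c1, s2, c2):
    """Add three 2-bit numbers lane by lane; return (bit0, bit1, carry)."""
    bit0, c = add3(s0, s1, s2)       # weight-1 column; c carries into weight 2
    total, carry = add3(c0, c1, c2)  # weight-2 column; carry has weight 4
    bit1 = total ^ c                 # the carry of this last add is dropped
    return bit0, bit1, carry
```

For example, count_bits(1, 0, 1, 0, 1, 0) adds 1+1+1 and returns (1, 1, 0), which is 3 in binary with no carry.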

A cell is alive in the next generation if:

  1. it has 3 neighbors, or
  2. it's alive now and has 2 neighbors.

In the 1st case: 3=11 in binary so the sum is 3 if bit0=1 and bit1=1 and carry=0. The formula is bit0 & bit1 & ~carry
Similarly in the 2nd case the formula is: current_cell & ~bit0 & bit1 & ~carry
The result is: bit0 & bit1 & ~carry | current_cell & ~bit0 & bit1 & ~carry
This can be simplified to (bit0 | current_cell & ~bit0) & bit1 & ~carry, which is equal to (bit0 | current_cell) & bit1 & ~carry
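
Putting the pieces together, here is a rough Python model of one generation (again a sketch, not the cart's code). It assumes a board that is one 32-bit word wide with dead cells beyond every edge; the cart's actual width and edge handling are not described here. `step_row`, `step`, `MASK` and `total` are names made up for the sketch.

```python
MASK = 0xFFFFFFFF  # assumption: the board is one 32-bit word (32 cells) wide


def add3(a, b, c):
    """Full adder applied to every bit lane at once."""
    tmp = a ^ b
    return tmp ^ c, (a & b) | (tmp & c)


def step_row(above, current, below):
    """Next generation of one row; cells outside the word count as dead."""
    s0, c0 = above ^ below, above & below        # b+g (add2, inlined)
    vs, vc = add3(above, current, below)         # vertical sum of every column
    s1, c1 = (vs << 1) & MASK, (vc << 1) & MASK  # side column reached by shifting left
    s2, c2 = vs >> 1, vc >> 1                    # side column reached by shifting right
    bit0, c = add3(s0, s1, s2)
    total, carry = add3(c0, c1, c2)
    bit1 = total ^ c
    return (bit0 | current) & bit1 & ~carry & MASK


def step(rows):
    """Advance a list of row words by one generation (dead border)."""
    padded = [0] + rows + [0]
    return [step_row(padded[i - 1], padded[i], padded[i + 1])
            for i in range(1, len(padded) - 1)]
```

A horizontal blinker flips to a vertical one: step([0b000, 0b111, 0b000]) returns [0b010, 0b010, 0b010].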

To speed up the main loop the add2 and add3 functions are inlined, redundant computations are removed and the loop is unrolled 4 times.

Drawing:
The board is expanded from 8 bits to 32 bits using a lookup table and poked to screen memory.
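
A lookup table of this kind could be built roughly as follows. This is a Python sketch, not the cart's code: it assumes 4 bits per screen pixel, that board bit n maps to nibble n of the output word, and a single draw color (7 by default). `build_expand_table` and `EXPAND` are made-up names.

```python
def build_expand_table(color=7):
    """Map each board byte (8 cells) to a 32-bit word of eight 4-bit pixels."""
    table = []
    for byte in range(256):
        word = 0
        for bit in range(8):
            if (byte >> bit) & 1:
                # assumption: cell `bit` lands in nibble `bit` of the word
                word |= color << (4 * bit)
        table.append(word)
    return table


EXPAND = build_expand_table()
```

With this table EXPAND[0b101] is 0x707, so drawing 8 cells costs one table lookup plus one 32-bit write instead of eight separate pixel writes.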



FAST


excellent binary operation use case 👌

question: why use the 0x4300 region and not a plain table?
a table is faster to access than $(address+8).


I tested both table and memory, and using memory is faster.
I guess poke4(address,a,b,c,d) is faster than table[i],table[i+1],table[i+2],table[i+3]=a,b,c,d. Also, when drawing I need to access individual bytes. With a table I would have to shift the 32-bit numbers and mask with 0xff to get at the bytes.


got it, I hadn't seen the individual pokes when glancing over the code!



Cart #rld_conway_blur-0 | 2021-09-16 | No License


This is a version that runs at 210 fps, but has smoother animation. Use up or down to change speed.


