Just a few notes from playing around with userdata. These notes assume 8M VM cycles/sec and large enough arrays to avoid substantial overhead and fully realize economies of scale.

  • Fast ops - add/mul/copy/etc. - cost 1/16 cycle per element.
  • Slow ops - div/convert/etc. - cost 1/4 cycle per element.
  • matmul is charged as a fast op based on the size of the output matrix. I'm a little suspicious that the answer is so simple, so I'm wondering if I missed something. (There's a timing sketch after this list.)
  • copy and memmap/memcpy are approximately the same speed for 64-bit datatypes. For smaller datatypes, memcpy is proportionally faster, though of course you then have to manage strides/spans yourself. memcpy should also enable reinterpret_cast-style shenanigans.
  • There is substantial overhead for small spans. If you use spans of length 1, you pay 1/4 cycle per span, the same as a slow op - e.g. a 10,000-element op split into length-1 spans pays ~2,500 cycles in span overhead alone. It looks like this may be a flat cost per span, but I'm not sure. Using the full/strided forms of the ops does not seem to add noticeable cost beyond the per-span cost.
  • For full-screen rendering, you have about 1 cycle/pixel at 480x270x60Hz (480 x 270 x 60 ≈ 7.78M pixels/sec against 8M cycles/sec, so ≈ 1.03 cycles/pixel). This includes whatever scaling/conversion you need to do at the end of the process, so realistically you'll get in the neighborhood of 10 additions/multiplications per pixel out of the ~16 fast ops a full cycle buys. Exact numbers depend on whether you need a divide at the end, and whether or not you can work in u8.
  • userdata flat access w/ locals seems to cost 1 cycle/element, including the assignment.
  • userdata get is 1/4 cycle/element at scale, but each explicit assignment will cost you 1 cycle on top of that.
  • userdata set is 1 cycle/element at scale. (The access-cost sketch after this list compares all three.)
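
Roughly how I've been probing the matmul cost - a minimal sketch rather than a rigorous benchmark. The 64x64 size and loop count are arbitrary, and it assumes stat(1) reports CPU usage for the current frame the way it does in PICO-8:

    -- probe matmul cost: charged per element of the output matrix?
    local a = userdata("f64", 64, 64)
    local b = userdata("f64", 64, 64)

    function _draw()
        cls()
        local before = stat(1)
        for i = 1, 100 do
            local c = a:matmul(b) -- output is 64x64 here
        end
        print("cpu for 100 matmuls: "..(stat(1) - before), 0, 0, 7)
    end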
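
And for the last three bullets, the access patterns I mean look like this. Again just a sketch: the 2D get/set signatures (x, y, n for get; x, y, values... for set) are how I understand them, and the cycle counts in the comments are my measurements above, not documented numbers.

    local n = 8192
    local src = userdata("f64", n, 1)
    local dst = userdata("f64", n, 1)

    -- flat access w/ locals: ~1 cycle/element, assignment included
    for i = 0, n - 1 do -- flat indexing is 0-based
        local v = src[i]
    end

    -- get: ~1/4 cycle/element at scale, but each explicit
    -- assignment to a local adds ~1 cycle on top
    local x0, x1, x2, x3 = src:get(0, 0, 4)

    -- set: ~1 cycle/element at scale
    dst:set(0, 0, x0, x1, x2, x3)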

There also seems to be some interesting behavior where multivals, even very large ones, do not noticeably increase CPU usage when passed to pack or set. While I'm enjoying taking advantage of this for f64-to-u8 conversion at the same cost as convert, I'm worried this might not last. A sketch of the trick follows.
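
Concretely, the trick looks something like this. Same caveats as above: n = 480 is arbitrary, the get/set signatures are as I understand them, and I haven't verified whether out-of-range values clamp or wrap on the u8 write.

    local n = 480
    local src = userdata("f64", n, 1)
    local dst = userdata("u8", n, 1)

    -- get returns n values as one multival; passing it straight
    -- into set on a u8 target converts on write, and the size of
    -- the multival doesn't seem to show up in the CPU cost
    dst:set(0, 0, src:get(0, 0, n))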
