EDIT3: round 2 -- with updated algorithms, methodology, and results
EDIT2: added Catatafish's method -- we have a new champion!!
EDIT: added solar's method.
EatMoreCheese's thread about triangle rasterizers got me thinking about the different "trifill" methods that have been posted to the BBS—and so, in the spirit of the holiday season, I wrote a small profiler to pit them against each other in a brutal, winner-takes-all competition.
Methodology: I measure the time it takes for each routine to draw the same table of 300 randomly-generated triangles ten times over. Vertex extents are in the range [-50, 178].
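For anyone who wants to reproduce the idea outside the cart, here is a rough sketch of the measurement loop in plain Python rather than PICO-8 Lua. The names `make_triangles` and `benchmark` are made up for illustration, and wall-clock time on a PC says nothing about PICO-8's virtual CPU costs, so treat it as an outline of the methodology only.

```python
import random
import time

def make_triangles(count=300, lo=-50, hi=178, seed=1):
    """Build a fixed table of triangles with vertex coordinates in [lo, hi]."""
    rng = random.Random(seed)
    return [tuple(rng.randint(lo, hi) for _ in range(6)) for _ in range(count)]

def benchmark(trifill, triangles, passes=10):
    """Draw the whole table `passes` times; return (seconds, tris_per_sec)."""
    start = time.perf_counter()
    for _ in range(passes):
        for x0, y0, x1, y1, x2, y2 in triangles:
            trifill(x0, y0, x1, y1, x2, y2)
    elapsed = time.perf_counter() - start
    drawn = passes * len(triangles)
    # Guard against a zero reading on very coarse timers.
    tris_per_sec = drawn / elapsed if elapsed > 0 else float("inf")
    return elapsed, tris_per_sec
```

Every routine gets the same table (fixed seed), so differences in the result come from the routine and not from the luck of the draw.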
CAVEATS: This is not an "apples-to-apples" comparison, or even apples-to-genetically-modified-oranges. For example, scgrn's method draws n-gons (not just triangles) and creamdog's method draws particularly chunky triangles. For personal edification only—no code-shaming intended!
Results:
Let me know if you'd like me to change your entry, or add others!
See round 1 here:
Wow, I just noticed RECTFILL is faster at drawing a line than LINE is. Thanks for posting this.
I currently have my own version as well that seems just a little slower than Gryphon's. I am currently poking through Gryphon's for ideas!
(My INTP function is the same as LERP.)
Oh wow! I got my time down to 2.833 from 4.3 just by changing from memset() to rectfill(), saved a bunch of tokens too!
This is awesome, thanks for making it!
Cool idea Musurca!
In full disclosure, I copied code from Nusan in the first place--I think from his original Space Limit demo. (https://www.lexaloffle.com/bbs/?tid=2734)
Another great learning cart! Cheers to all the contributors!
Muddling over why RECTFILL would be faster than LINE... do you think it is because LINE has an algorithm to draw the best pixels between diagonal points (can't remember what that is called... Bayesian pathfinding?), while RECT or RECTFILL only makes straight lines?
However, I assume that RECTFILL is being used to fill one line at a time in the triangles... I wonder if it has to draw more than one line? Would it make sense to try getting a horizontal line function that could be even faster?
@gcentauri: to put this into perspective—since PICO-8 is a "fantasy console," there's no reason for any built-in function to be slower than another, really, as they all run in a negligible amount of time behind the scenes.
What we're trying to determine are the artificial delays that zep has introduced to simulate the workings of his fantasy hardware. Often these delays make intuitive sense, like rectfill() being faster than line() because it avoids the overhead of Bresenham's algorithm. But sometimes they seem entirely arbitrary, like rectfill() being faster than memset(), which doesn't make any sense at all unless you imagine that PICO-8 is running on some rather eccentric hardware.
There's really no way to know for sure unless you run a profiler.
Also: yes, rectfill() is generally being used to fill in only one line at a time. The standard "trifill" algorithm entails dividing an arbitrary triangle into two smaller triangles, and then drawing both triangles one column at a time. However—you might be onto something. I wonder if, given the odd speed of rectfill(), it might not make sense to approach drawing large triangles by drawing a large rectangle tangent to all three sides, then drawing the smaller resulting triangles by column...
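To make the "split into two smaller triangles" idea concrete, here is a minimal sketch in plain Python (not PICO-8 Lua, and not any of the entries in the benchmark). It assumes integer vertices and walks rows rather than columns; swapping x and y gives the column version. The function name `triangle_spans` is hypothetical. Each returned span corresponds to one `rectfill(x_left, y, x_right, y)` call.

```python
def triangle_spans(x0, y0, x1, y1, x2, y2):
    """Return one (y, x_left, x_right) span per row of the triangle."""
    # Sort vertices top to bottom: (x0, y0) is the top, (x2, y2) the bottom.
    (x0, y0), (x1, y1), (x2, y2) = sorted(
        [(x0, y0), (x1, y1), (x2, y2)], key=lambda p: p[1])

    if y0 == y2:
        # Degenerate case: all three vertices sit on the same row.
        return [(y0, min(x0, x1, x2), max(x0, x1, x2))]

    def edge_x(xa, ya, xb, yb, y):
        # x position on edge a-b at row y (plain linear interpolation).
        return xa + (xb - xa) * (y - ya) / (yb - ya)

    spans = []
    for y in range(y0, y2 + 1):
        # The long edge always runs from the top vertex to the bottom one.
        xa = edge_x(x0, y0, x2, y2, y)
        if y < y1:
            # Upper sub-triangle: short edge from top to middle vertex.
            xb = edge_x(x0, y0, x1, y1, y)
        elif y1 == y2:
            # Flat bottom: the last row ends exactly at the middle vertex.
            xb = x1
        else:
            # Lower sub-triangle: short edge from middle to bottom vertex.
            xb = edge_x(x1, y1, x2, y2, y)
        left, right = sorted((xa, xb))
        spans.append((y, round(left), round(right)))
    return spans
```

The middle vertex is where the triangle is cut in two; above it the spans run between the long edge and the first short edge, below it between the long edge and the second short edge.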
Here's mine. I'd love to believe the result, but I'm not sure the triangles it renders are visually correct compared to the other versions. Still, perhaps this'll be of use to someone.
edit: code bugfix - I'd forgotten I sorted the points in a different place. It renders correctly now.
Based on the routines from 'The Black Art of 3d Game Programming'. Mistakes were made.
Also thanks for the heads-up on rectfill - I tried that a long time ago & switched to lines as I got faster results, not sure why. No longer the case though - it's about 1.3 seconds slower with line.
In the real-world program I'm using this in, just switching over to rectfill saved about 8% cpu(!). I need as much perf as I can get, and that is definitely one of the bigger wins. Cheers!
To add to musurca's post above: I don't know if it's just an oversight, or if part of the design of PICO-8 was to have 'secrets to discover' like this. I've had a lot of fun figuring this out myself, but it'd be really great to have an instruction 'cycle' table in the manual, a bit like a simplified version of the ones in old processors' datasheets. (6502 Programmers Manual - see page 234. I'm not suggesting anything as detailed as this one, but you get the idea.)
@Catatafish—crazy! you halved Gryphon's time. The result was so dramatic that I thought it might be a mistake myself but I dropped your trifill method into a 3D test cart (not based on Gryphon), and it looked fine aside from a few corner-case artifacts. Nicely done!
As an aside, I took another look at the method that I contributed, and changed my rect() calls to rectfill(). Weirdly this drops my time from ~1.7-2 secs to ~0.4333 secs. While this is great for my overall sanity (I couldn't figure out why other methods were so dramatically outperforming my own, when the approach was roughly the same), it does raise a question: why was rectfill() constructed to be so much faster than all other drawing methods, including rect()?
There may be a rational answer—or it may be an oversight that was introduced in 0.1.10. Either way, having a Pico-8 cycle table as you suggest would be really helpful. The community could put it together with some profiling, but of course the numbers may change drastically over new releases.
Thanks :) What I was aiming for is a tradeoff between speed and token count. You can definitely get faster though...
I should probably point out it's extremely limited compared to the others - for example it doesn't support things like electricgryphon's gorgeous scanline shading, or as you note, visual accuracy (also with resolutions higher than 128x128, gaps are visible between adjacent triangles at certain angles, even with integer math). I have no idea how I would even begin to approach n-gons.
I think I know the artifacts you mean, (the 'thin line' fix you use looks interesting & might resolve one of them, thanks - the extra pixels on corners are probably here to stay though) but now you mention it, it'd be interesting to see a visual fidelity comparison for all these methods.
I plugged each of the functions into this 3D project I'm working on. CPU usage is at bottom right.
Some notes on the use in my engine...
Triangles way off to the left or right are still given to the trifill function for rendering. It's up to the function to ignore it.
Triangles that are effectively vertical lines are never given to the trifill function. If the functions don't handle this properly you wouldn't see it here.
I draw the triangles overlapping by 1 pixel to fill gaps in walls. This may not be necessary depending on the function.
As an aside: In my project itself I have a special quadfill type function that renders walls much more quickly. In this example I'm forcing everything through trifill anyway.
Interestingly, musurca's function (although a tiny bit glitchy looking!) works most efficiently in my example here, never reaching 100% CPU.
For reference, here are the triangles being drawn as wireframes, and CPU usage with no triangles drawn at all.
creamdog:
gryphon:
musurca:
nusan:
scgrn:
solar:
catatafish:
Interesting! Thanks for doing that solar.
I'd been wondering about that quadfill trick after I saw another gif of your engine.
For 3d use I clip triangles a little earlier in the pipeline, but it's a fair point if someone was going to use this as-is.
Adding basic horizontal clipping to my code makes it a little slower than musurca's. Damn.
Aaaah... I have to say this is the first 3D engine I've worked on, and maybe it's standard to clip the triangles earlier. I'd say that would make a big difference to the results here for any function that doesn't check for it itself.
Wow, great comparison! Thanks, solar. I was surprised by the result, but I could guess at a couple reasons I'm doing better in this benchmark:
— I draw vertical columns instead of horizontal lines, and throw out any columns with x-values outside of the screen boundaries. (Pico-8 really punishes you performance-wise for drawing outside of the canvas.) solar's scene involves a right-to-left pan with geometry laid out horizontally, so it's an advantageous situation for my method.
— I don't use any helper functions, and err on the side of maximizing speed over token count in general.
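As a small illustration of the column rejection described above, here is a sketch in plain Python (not the actual PICO-8 code). The helper name `visible_columns` and the 128-pixel default width are assumptions for the example.

```python
def visible_columns(x_left, x_right, screen_w=128):
    """Return the range of columns of a triangle that fall on the screen."""
    first = max(x_left, 0)
    last = min(x_right, screen_w - 1)
    # The range is empty when the triangle lies entirely off one side.
    return range(first, last + 1)
```

A triangle spanning x = -50 to x = 3 then costs four column draws instead of 54, and one that is entirely off-screen costs none.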
Also, re: clipping: in my 3D engine, I cull triangles in the following way:
-see if the triangle normal points within 90 degrees of ray from camera to one triangle vertex;
-if so, project the vertices, and only discard the triangle if all three points are behind the camera, or outside of the same horizontal or vertical screen boundary. Otherwise send it to the triangle rasterizer and do any additional clipping there.
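The two culling steps above could be sketched roughly like this in plain Python (not PICO-8 Lua). Everything named here is hypothetical: the sign of the facing test depends on your winding order, and the depth convention (depth <= 0 means behind the camera) is an assumption.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def passes_normal_test(v0, v1, v2, camera):
    """Step 1: is the normal within 90 degrees of the camera-to-vertex ray?"""
    normal = cross(sub(v1, v0), sub(v2, v0))
    # Positive dot product means the angle is under 90 degrees.
    # Flip the comparison if your triangles are wound the other way.
    return dot(normal, sub(v0, camera)) > 0

def keep_after_projection(points, depths, w=128, h=128):
    """Step 2: discard only if all three vertices fail the same test."""
    if all(d <= 0 for d in depths):
        return False  # all three behind the camera (assumed convention)
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    if max(xs) < 0 or min(xs) >= w:
        return False  # all left of, or all right of, the screen
    if max(ys) < 0 or min(ys) >= h:
        return False  # all above, or all below, the screen
    return True
```

Anything that survives both steps goes to the rasterizer, which handles the remaining per-span clipping.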
My rationale (backed up by some profiling a while ago, I think) was that the most significant bottleneck in a Pico-8 3D engine seemed to be fill rate. As long as you don't draw anything outside of the canvas, you shouldn't need to do any complicated triangle clipping/splitting. But curious if others are approaching this in a different way...
Nice result! I did some tests a while ago with drawing quads, using an idea from FRedShift: follow each edge and store the min and max X values in two arrays indexed by Y. You then just draw one rectfill per array cell. With this technique you can draw convex polygons easily. I'm curious how it would compare to the other techniques. If I have some time, I will make a simple test function.
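A rough sketch of that edge-following idea, in plain Python rather than PICO-8 Lua. This is an illustration under assumptions (integer vertices, convex polygon, 128 rows), not the code from the tests mentioned above; `convex_fill_spans` and `make_edge` are names chosen for the example.

```python
def convex_fill_spans(points, height=128):
    """points: (x, y) integer vertices of a convex polygon, in order.
    Returns one (y, x_min, x_max) span per covered screen row."""
    xmin = [None] * height
    xmax = [None] * height

    def mark(y, x):
        # Record x as a candidate left/right extent for row y.
        if 0 <= y < height:
            if xmin[y] is None or x < xmin[y]:
                xmin[y] = x
            if xmax[y] is None or x > xmax[y]:
                xmax[y] = x

    def make_edge(xa, ya, xb, yb):
        if ya > yb:
            xa, ya, xb, yb = xb, yb, xa, ya
        if ya == yb:
            # Horizontal edge: both endpoints bound the same row.
            mark(ya, xa)
            mark(ya, xb)
            return
        # Walk only the rows that are actually on screen.
        for y in range(max(ya, 0), min(yb, height - 1) + 1):
            mark(y, xa + (xb - xa) * (y - ya) / (yb - ya))

    for i in range(len(points)):
        make_edge(*points[i], *points[(i + 1) % len(points)])

    return [(y, round(xmin[y]), round(xmax[y]))
            for y in range(height) if xmin[y] is not None]
```

Each span maps to one `rectfill(x_min, y, x_max, y)`. Adding a side to the polygon only adds one more `make_edge` call, which is why the approach extends to quads and n-gons so easily.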
I think each technique has a bias toward small, big, horizontal, or vertical triangles, so it could be interesting to test with each constraint.
Good point about the different constraints - now I think about it 95% of my clipping occurs on one vertical plane, and my triangle sizes are generally quite small (around 16x8 - 16x32 pixels per quad).
The vertical rasterization trick is a neat idea; I can see that working well for first-person-type scenes.
@solar - oh, no, I wasn't suggesting one way is better or worse. There are many ways to organize the rendering pipeline, and it does depend on what you're drawing. I'd say it's a damn good effort for your first go :)
More importantly though, I'm looking forward to playing the games in these .gifs!
I noticed the function called "NuSan" is from my old Space Limit demo. I did a better version for Alone in Pico. From my tests it's a bit slower than Catatafish's version, but without the artifacts on very big triangles.
I also did a quick test with the edge method, but it's about twice as slow. It should become interesting with polygons though, maybe even quads (just add one "makeedge" call per polygon side).
All my tests were done using integer positions.
With some finagling, I have brought the time for my triangle render down to around .37 seconds on average.
I got the most significant speed increase by replacing the lerp functions with uniform additions on every scan-line. (I feel like I should have realized that one sooner...)
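To show what that change amounts to, here is a minimal sketch in plain Python (not the PICO-8 code from this post). Both hypothetical functions walk one triangle edge from row `ya` to row `yb` (with `ya < yb` assumed); the first recomputes a lerp on every row, the second computes the slope once and just adds it.

```python
def edge_xs_lerp(xa, ya, xb, yb):
    """x on the edge at every row, recomputed from scratch each time."""
    return [xa + (xb - xa) * (y - ya) / (yb - ya) for y in range(ya, yb + 1)]

def edge_xs_stepped(xa, ya, xb, yb):
    """Same edge, but one division up front and one addition per row."""
    step = (xb - xa) / (yb - ya)
    xs = []
    x = xa
    for _ in range(ya, yb + 1):
        xs.append(x)
        x += step
    return xs
```

The per-row work drops from a multiply and a divide to a single add. The two versions agree up to rounding error, which accumulates slightly in the stepped one.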
Removing the dither code also shaved some time, and allowed me to do something else:
--If triangles are tall and skinny, they are rasterized left to right instead of top to bottom.
Finally, I added some bounds checking to throw out triangles that are completely off the screen.
Code:
(it's a shame that leading tabs/spaces don't show up in the code snippets on the BBS)
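Since the snippet itself isn't reproduced here, the following is only a guess at the shape of the last two changes described in this post (choosing the raster direction and rejecting off-screen triangles), written in plain Python rather than PICO-8 Lua. The function names, the 128x128 screen, and the exact "tall and skinny" test are all assumptions.

```python
def off_screen(x0, y0, x1, y1, x2, y2, w=128, h=128):
    """True if the triangle's bounding box misses the screen entirely."""
    return (max(x0, x1, x2) < 0 or min(x0, x1, x2) >= w or
            max(y0, y1, y2) < 0 or min(y0, y1, y2) >= h)

def raster_direction(x0, y0, x1, y1, x2, y2):
    """Pick the direction that needs fewer fill calls."""
    width = max(x0, x1, x2) - min(x0, x1, x2)
    height = max(y0, y1, y2) - min(y0, y1, y2)
    # Tall and skinny: few columns but many rows, so step left to right.
    return "columns" if height > width else "rows"
```

A 4-pixel-wide, 100-pixel-tall sliver takes about five column fills this way instead of about a hundred row fills.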
Nice, ElectricGryphon, your function seems a bit faster. It's about the same code as mine, but your switch of rasterizing direction does a great job. I think there has to be a simple way to factorise the code and avoid having a function twice as long because of the direction switch.
I've updated the benchmark to "round 2" with the new functions posted here. Please note that I've changed the methodology somewhat (now rendering a table of 300 triangles, ten times over), so your times will change a bit. However I've also added a new metric "tris/sec" which will hopefully remain more consistent if I end up tinkering with the table size again.
While the results average out better now, it's still worth running the benchmark a couple times in a row to see how the results vary based on new conditions.
@NuSan— yeah, grabbed the wrong method in round 1. Sorry about that! Your n-gon method is really interesting though—would like to compare it to scgrn's.
These are amazing, thanks guys.
I'm making a vector editor, using a web app to draw vectors so I can push the data into a PICO-8 cartridge.
Here are the first real tests:
https://twitter.com/gabrielcrowe/status/899220992895184896
Can I use some of these algos to try and get some faster speed out of my render?
Be warned that Zep plans to fix, in the next release, some of the rectfill() timing bugs that many of these triangle renderers rely on for their speed.
The cart is CC-licensed, it's fine to use any of it.*
That tech looks awesome by the way :)
*=probably.
He needs to fix the poke/peek timing bugs as well. You can put a ton of math, including a peek, into a poke's second argument and the poke will still only take 1 cycle total.
I know people will be annoyed if their spiffy fast games stop being fast, but exploiting bugs to run faster than you're supposed to seems counter to the purpose of playing with a limited-spec fantasy console. If you want to run fast, without constraints, just use Löve2D or something.
Felice and mole5000: there's no way to be sure until release, but I would guess that you won't see a dramatic speed drop in many of these trifill methods. Zep has suggested via Twitter (while showing a GIF of the Gryphon 3D engine running at full speed) that in 0.1.11 "horizontal fills & circles are now cheaper" and that only carts that exploit the "free backwards-rectangles bug" will be slower. Some of the algorithms collected here may occasionally benefit from this bug due to lack of bounds checking, but they do not methodically or intentionally exploit it.
And since Zep is apparently adding support for fill patterns to rectfill(), we may even be able to run a "shaded trifill thunderdome" in the near future (I hope).
Woo, fill patterns? I need to read his twitter more often.
I actually meant to ask if he could give us that. He must be puh-sye-kik.
I've added alternating lines for this demo:
https://twitter.com/gabrielcrowe/status/904421706659495936
I used a few of the examples here for testing.
Hi folks,
I'm late to the party, but here is an updated Triangles Benchmark showing token counts, plus two new rasterizers in 163 and 335 tokens respectively.
Let me know what you think and if I missed any cool development since the cartridge posted in this thread.
Cheers,
I was also working on my own triangle rasterizer. I started noticing some framerate drops, so I've started measuring performance. But this thread helped me a lot.
I was using memset, assuming that it was the fastest, but it seems like classic optimization knowledge does not apply to PICO-8. I got a 20% speedup from replacing memset with rectfill, and I also don't have to sacrifice resolution, though the patterns I was able to make with memset are pretty rad!!!!!
And it looks like a lot of my other assumptions were wrong, so basically I can rewrite half of my rasterizer and get a lot better performance. For example, for memset I needed some code to align it to the right memory address; I don't need that part anymore, as rectfill works directly on screen.
I made my own tri function : https://www.lexaloffle.com/bbs/?tid=49930