Cart #34502 | 2016-12-30 | License: CC4-BY-NC-SA

EDIT3: round 2 -- with updated algorithms, methodology, and results
EDIT2: added Catatafish's method -- we have a new champion!!
EDIT: added solar's method.

EatMoreCheese's thread about triangle rasterizers got me thinking about the different "trifill" methods that have been posted to the BBS—and so, in the spirit of the holiday season, I wrote a small profiler to pit them against each other in a brutal, winner-takes-all competition.

Methodology: I measure the time it takes for each routine to draw the same table of 300 randomly-generated triangles ten times over. Vertex extents are in the range [-50, 178].
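A minimal sketch of the measurement loop described above, not the cart's actual profiler. The names `make_tritable` and `profile` and the `clock` parameter are assumptions for illustration; on PICO-8 the clock could be `time`, in stock Lua `os.clock`.

```lua
-- Hypothetical helpers; names are illustrative, not taken from the cart.
-- Uses PICO-8's flr/rnd when present, otherwise stock Lua equivalents.
local flr = flr or math.floor
local rnd = rnd or function(n) return math.random() * n end

-- Build n triangles with integer vertex coordinates in [lo, hi].
function make_tritable(n, lo, hi)
  local t = {}
  for i = 1, n do
    local tri = {}
    for k = 1, 6 do
      tri[k] = flr(rnd(hi - lo + 1)) + lo
    end
    t[i] = tri
  end
  return t
end

-- Draw every triangle in `tris` `reps` times with `trifill`.
-- `clock` returns the current time in seconds.
-- Returns the elapsed seconds and the triangles drawn per second.
function profile(trifill, tris, reps, clock)
  local t0 = clock()
  for r = 1, reps do
    for i = 1, #tris do
      local v = tris[i]
      trifill(v[1], v[2], v[3], v[4], v[5], v[6], 7)
    end
  end
  local elapsed = clock() - t0
  local rate = 0
  if elapsed > 0 then rate = (#tris * reps) / elapsed end
  return elapsed, rate
end
```

With the settings above this would be called as `profile(some_trifill, make_tritable(300, -50, 178), 10, time)`, so each routine draws 3000 triangles per run.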

CAVEATS: This is not an "apples-to-apples" comparison, or even apples-to-genetically-modified-oranges. For example, scgrn's method draws n-gons (not just triangles) and creamdog's method draws particularly chunky triangles. For personal edification only—no code-shaming intended!

Results:


Round 2: electricgryphon retakes the crown with a blistering ~5600 tris/sec, followed by Catatafish in a close second, and leaving musurca and NuSan tied for third (on average).

Round 1: Catatafish's method takes first place with an absolutely insane ~0.4 secs, followed by the method from the Gryphon 3D engine in second place at a very stable ~0.8 secs.

A lot of interesting discoveries—among them that rectfill() beats rect(), line(), AND memset().

Let me know if you'd like me to change your entry, or add others!

See round 1 here:


Cart #34326 | 2016-12-28 | License: CC4-BY-NC-SA

P#34266 2016-12-26 21:36 ( Edited 2018-07-05 08:24)

That's super useful information!! Thanks for that! :D

P#34267 2016-12-26 22:36 ( Edited 2016-12-27 03:36)

Wow, I just noticed RECTFILL is faster at drawing a line than LINE is. Thanks for posting this.

I currently have my own version as well that seems just a little slower than Gryphon's. I am currently poking through Gryphon's for ideas!

(My INTP function is the same as LERP.)

function trifill(x1,y1,x2,y2,x3,y3,col)

--[[ @todo works slower for me
 local x1=band(x1,0xffff)
 local y1=band(y1,0xffff)
 local x2=band(x2,0xffff)
 local y2=band(y2,0xffff)
 local x3=band(x3,0xffff)
 local y3=band(y3,0xffff)
--]]

---[[
 local x1=flr(x1)
 local y1=flr(y1)
 local x2=flr(x2)
 local y2=flr(y2)
 local x3=flr(x3)
 local y3=flr(y3)
--]]

 local minx=min(min(x1,x2),x3)
 local maxx=max(max(x1,x2),x3)

 local miny=min(min(y1,y2),y3)
 local maxy=max(max(y1,y2),y3)

 if maxx<0
  or minx>127
  or maxy<0
  or miny>127
 then
  return
 end

 if(col~=nil) color(col)

--[[ in practice, a rare case, a waste of some logic
 if miny==maxy or minx==maxx then
  line(
   minx,
   miny,
   maxx,
   maxy
  )
--circfill(minx,miny,2,8)
  return
 end
--]]

 -- order points such that 1 is top-most, 2 is left-most, 3 is right-most
 local c={}
 if miny==y2 then
  add(c,{x2,y2})
  if x1<x3 then
   add(c,{x1,y1})
   add(c,{x3,y3})
  else
   add(c,{x3,y3})
   add(c,{x1,y1})
  end
 elseif miny==y3 then
  add(c,{x3,y3})
  if x1<x2 then
   add(c,{x1,y1})
   add(c,{x2,y2})
  else
   add(c,{x2,y2})
   add(c,{x1,y1})
  end
 else
  add(c,{x1,y1})
  if x2<x3 then
   add(c,{x2,y2})
   add(c,{x3,y3})
  else
   add(c,{x3,y3})
   add(c,{x2,y2})
  end
 end

 local w=maxx-minx
 local h=maxy-miny

 --range of 0 to 1
 -- @todo putting stuff in range
 -- of 0 to 1 may not even
 -- be necessary (just hurt my
 -- head less while making it)

 local x1n=(c[1][1]-minx)/w

 local y2n=(c[2][2]-miny)/h
 local x2n=(c[2][1]-minx)/w

 local y3n=(c[3][2]-miny)/h
 local x3n=(c[3][1]-minx)/w

 local lyn,lminx,lmaxx -- declared local so the loop does not leak globals
 for ly=max(miny,0),min(maxy,127) do
  lyn=(ly-miny)/h
  if c[2][2]<c[3][2] then
   -- bottom left point is raised
   if lyn<y2n then
    --area above raised point
    lminx=intp(x1n,x2n,lyn*(1/y2n))
   else
    --area below raised point
    lminx=intp(x2n,x3n,(lyn-y2n)*(1/(1-y2n)))
   end
   lmaxx=intp(x1n,x3n,lyn)
  else
   -- bottom right point is raised
   if lyn<y3n then
    --area above raised point
    lmaxx=intp(x1n,x3n,lyn*(1/y3n))
   else
    --area below raised point
    lmaxx=intp(x3n,x2n,(lyn-y3n)*(1/(1-y3n)))
   end
   lminx=intp(x1n,x2n,lyn)
  end
  rectfill(
   lminx*w+minx,
   ly,
   lmaxx*w+minx,
   ly
  )
--[[
  line(
   lminx*w+minx,
   ly,
   lmaxx*w+minx,
   ly
  )
--]]
 end

end

P#34268 2016-12-26 22:54 ( Edited 2016-12-27 03:55)

Oh wow! I got my time down to 2.833 from 4.3 just by changing from memset() to rectfill(), saved a bunch of tokens too!

This is awesome, thanks for making it!

P#34277 2016-12-27 02:18 ( Edited 2016-12-27 07:18)

Cool idea Musurca!

In full disclosure, I copied code from Nusan in the first place--I think from his original Space Limit demo. (https://www.lexaloffle.com/bbs/?tid=2734)

P#34278 2016-12-27 02:23 ( Edited 2016-12-27 07:23)

Another great learning cart! Cheers to all the contributors!

Muddling over why RECTFILL would be faster than LINE... do you think it is because LINE has an algorithm to draw the best pixels between diagonal points (can't remember what that is called... Bayesian pathfinding?) And RECT or RECTFILL only makes straight lines?

However, I assume that RECTFILL is being used to fill one line at a time in the triangles... I wonder if it has to draw more than one line? Would it make sense to try getting a horizontal line function that could be even faster?

P#34303 2016-12-27 09:53 ( Edited 2016-12-27 14:53)

@gcentauri: to put this into perspective—since PICO-8 is a "fantasy console," there's no reason for any built-in function to be slower than another, really, as they all run in a negligible amount of time behind the scenes.

What we're trying to determine are the artificial delays that zep has introduced to simulate the workings of his fantasy hardware. Often these delays make an intuitive sense, like rectfill() being faster than line() because it avoids the overhead of Bresenham's algorithm. But sometimes they seem entirely arbitrary, like rectfill() being faster than memset(), which doesn't make any sense at all unless you imagine that PICO-8 is running on some rather eccentric hardware.

There's really no way to know for sure unless you run a profiler.

Also: yes, rectfill() is generally being used to fill in only one line at a time. The standard "trifill" algorithm entails dividing an arbitrary triangle into two smaller triangles, and then drawing both triangles one column at a time. However—you might be onto something. I wonder if, given the odd speed of rectfill(), it might not make sense to approach drawing large triangles by drawing a large rectangle tangent to all three sides, then drawing the smaller resulting triangles by column...
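A rough sketch of the standard split described above, assuming integer coordinates and filling by rows (columns work the same way with x and y swapped). It is not any poster's routine: `split_trifill` and the `hline(x0, x1, y)` callback are hypothetical names, and on PICO-8 the callback could be `function(x0,x1,y) rectfill(x0,y,x1,y) end`.

```lua
local min = min or math.min
local max = max or math.max

-- Fill a triangle one row at a time. The long edge runs from the top
-- vertex to the bottom vertex; the other side switches from edge 1-2
-- to edge 2-3 at the row of the middle vertex, which is where the two
-- smaller triangles meet.
function split_trifill(x1, y1, x2, y2, x3, y3, hline)
  -- sort the vertices so that y1 <= y2 <= y3
  if y2 < y1 then x1, y1, x2, y2 = x2, y2, x1, y1 end
  if y3 < y1 then x1, y1, x3, y3 = x3, y3, x1, y1 end
  if y3 < y2 then x2, y2, x3, y3 = x3, y3, x2, y2 end

  -- degenerate case: all three vertices sit on one row
  if y1 == y3 then
    hline(min(x1, min(x2, x3)), max(x1, max(x2, x3)), y1)
    return
  end

  for y = y1, y3 do
    -- position on the long edge (1-3)
    local xa = x1 + (x3 - x1) * ((y - y1) / (y3 - y1))
    local xb
    if y < y2 then
      -- upper triangle: edge 1-2 (y2 > y1 holds here)
      xb = x1 + (x2 - x1) * ((y - y1) / (y2 - y1))
    elseif y3 > y2 then
      -- lower triangle: edge 2-3
      xb = x2 + (x3 - x2) * ((y - y2) / (y3 - y2))
    else
      -- flat bottom: the last row ends at the middle vertex
      xb = x2
    end
    if xb < xa then xa, xb = xb, xa end
    hline(xa, xb, y)
  end
end
```

The sketch does no screen clipping, so a real routine would still need to clamp the row range and reject off-screen triangles.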

P#34310 2016-12-27 14:46 ( Edited 2016-12-29 18:15)

Here's mine. I'd love to believe the result but I'm not sure if the triangles it renders are visually correct compared to the other versions, but still, perhaps this'll be of use to someone.
edit: code bugfix - I'd forgotten I sorted the points in a different place. It renders correctly now.
Based on the routines from 'The Black Art of 3d Game Programming'. Mistakes were made.

-- expects an array in the form { x0, y0, x1, y1, x2, y2, color }
function gfx_draw(vbuf)
 --for n=1,tri_count do
 -- local vbuf = vertexbuffer[n]
  local v0x, v0y, v1x, v1y, v2x, v2y, ps = vbuf[1], vbuf[2], vbuf[3], vbuf[4], vbuf[5], vbuf[6]

    if v1y<v0y then
     v0x,v1x = v1x,v0x
     v0y,v1y = v1y,v0y
    end

    if v2y<v0y then
     v0x,v2x = v2x,v0x
     v0y,v2y = v2y,v0y
    end

    if v2y<v1y then
     v1x,v2x = v2x,v1x
     v1y,v2y = v2y,v1y
    end

  color(vbuf[7])
   if v0y == v1y then -- flat top
    rasterizetri_top(v0x,v0y,v1x,v2x,v2y)
   elseif v1y == v2y then -- flat bottom
    rasterizetri_bottom(v0x,v0y,v1x,v2x,v2y)
   else -- general case
    local newx = v0x + ((v1y-v0y)*(v2x-v0x)/(v2y-v0y))
    rasterizetri_bottom(v0x,v0y,newx,v1x,v1y)  
    rasterizetri_top(v1x,v1y,newx,v2x,v2y)
   end -- triangle cases
 --end -- triangle loop

end

function rasterizetri_top(v0x,v0y, v1x, v2x,v2y)
 if (v1x<v0x) v0x, v1x = v1x, v0x
 local height=v2y-v0y
 local dx_left, dx_right = (v2x-v0x)/height, (v2x-v1x)/height
 if v0y<0 then
  v0x-=dx_left*v0y
  v1x-=dx_right*v0y
  v0y=0
 end
 if (v2y>128) v2y=128
 for y=v0y,v2y do
  rectfill(v0x,y,v1x,y)
  v0x+=dx_left
  v1x+=dx_right
 end
end

function rasterizetri_bottom(v0x,v0y, v1x,v2x,v2y)
 if (v2x<v1x) v1x, v2x = v2x, v1x
 local height=v2y-v0y
 local dx_left, dx_right, xend = (v1x-v0x)/height, (v2x-v0x)/height, v0x
 if v0y<0 then
  v0x -=dx_left*v0y
  xend-=dx_right*v0y
  v0y=0
 end
 if (v2y>128) v2y=128
 for y=v0y,v2y do
  rectfill(v0x,y,xend,y)
  v0x+=dx_left
  xend+=dx_right
 end
end

-- for benchmark integration:
ntable={}
tritable={}
flattable={} -- added
for i=1,100 do
 tritable[i]={{flr(rnd(256)-rnd(128)),flr(rnd(256)-rnd(128)),
              flr(rnd(256)-rnd(128)),flr(rnd(256)-rnd(128)),
              flr(rnd(256)-rnd(128)),flr(rnd(256)-rnd(128))},flr(rnd(15))+1}
 local t=tritable[i][1]
 ntable[i]={{t[1],t[2]},{t[3],t[4]},{t[5],t[6]},tritable[i][2]}
 flattable[i] = {t[1], t[2], t[3], t[4], t[5], t[6], tritable[i][2] }  -- added
end

function my_raster(a)
 --local v=a[1]
 gfx_draw(a)
end

a7=profile(my_raster,flattable)

Also thanks for the heads-up on rectfill - I tried that a long time ago & switched to lines as I got faster results, not sure why. No longer the case though - it's about 1.3 seconds slower with line.

In the real-world program I'm using this in, just switching over to rectfill saved about 8% cpu(!). I need as much perf as I can get, and that is definitely one of the bigger wins. Cheers!

To add to musurca's post above: I don't know whether it's just an oversight or whether PICO-8 was designed to have 'secrets to discover' like this. I've had a lot of fun figuring this out myself, but it'd be really great to have an instruction 'cycle' table in the manual, a bit like a simplified version of the ones in old processors' datasheets. (6502 Programmers Manual - see page 234. I'm not suggesting anything as detailed as this one, but you get the idea.)

P#34324 2016-12-27 18:38 ( Edited 2016-12-28 01:22)

@Catatafish—crazy! you halved Gryphon's time. The result was so dramatic that I thought it might be a mistake myself but I dropped your trifill method into a 3D test cart (not based on Gryphon), and it looked fine aside from a few corner-case artifacts. Nicely done!

As an aside, I took another look at the method that I contributed, and changed my rect() calls to rectfill(). Weirdly this drops my time from ~1.7-2 secs to ~0.4333 secs. While this is great for my overall sanity (I couldn't figure out why other methods were so dramatically outperforming my own, when the approach was roughly the same), it does raise a question: why was rectfill() constructed to be so much faster than all other drawing methods, including rect()?

There may be a rational answer—or it may be an oversight that was introduced in 0.1.10. Either way, having a Pico-8 cycle table as you suggest would be really helpful. The community could put it together with some profiling, but of course the numbers may change drastically over new releases.

P#34330 2016-12-27 20:38 ( Edited 2016-12-28 21:49)

Thanks :) What I was aiming for is a tradeoff between speed and token count. You can definitely get faster though...
I should probably point out it's extremely limited compared to the others - for example it doesn't support things like electricgryphon's gorgeous scanline shading, or as you note, visual accuracy (also with resolutions higher than 128x128, gaps are visible between adjacent triangles at certain angles, even with integer math). I have no idea how I would even begin to approach n-gons.

I think I know the artifacts you mean, (the 'thin line' fix you use looks interesting & might resolve one of them, thanks - the extra pixels on corners are probably here to stay though) but now you mention it, it'd be interesting to see a visual fidelity comparison for all these methods.

P#34339 2016-12-27 22:50 ( Edited 2016-12-28 21:35)

I plugged each of the functions into this 3D project I'm working on. CPU usage is at bottom right.

Some notes on the use in my engine...

Triangles way off to the left or right are still given to the trifill function for rendering. It's up to the function to ignore it.

Triangles that are effectively vertical lines are never given to the trifill function. If the functions don't handle this properly you wouldn't see it here.

I draw the triangles overlapping by 1 pixel to fill gaps in walls. This may not be necessary depending on the function.

As an aside: In my project itself I have a special quadfill type function that renders walls much more quickly. In this example I'm forcing everything through trifill anyway.

Interestingly, musurca's function (although a tiny bit glitchy looking!) works most efficiently in my example here, never reaching 100% CPU.

For reference, here are the triangles being drawn as wireframes, and CPU usage with no triangles drawn at all.

[Images not preserved; the post showed one per method, in this order: creamdog, gryphon, musurca, nusan, scgrn, solar, catatafish.]

P#34395 2016-12-28 19:48 ( Edited 2016-12-29 02:20)

Interesting! Thanks for doing that solar.
I'd been wondering about that quadfill trick after I saw another gif of your engine.

For 3d use I clip triangles a little earlier in the pipeline, but it's a fair point if someone was going to use this as-is.
Adding basic horizontal clipping to my code makes it a little slower than musurca's. Damn.

P#34403 2016-12-28 22:33 ( Edited 2016-12-29 03:33)

Aaaah... I have to say this is the first 3D engine I've worked on, and maybe it's standard to clip the triangles earlier. I'd say that would make a big difference to the results here for functions that don't check for it themselves.

P#34406 2016-12-28 22:42 ( Edited 2016-12-29 03:42)

Wow, great comparison! Thanks, solar. I was surprised by the result, but I could guess at a couple reasons I'm doing better in this benchmark:

— I draw vertical columns instead of horizontal lines, and throw out any columns with x-values outside of the screen boundaries. (Pico-8 really punishes you performance-wise for drawing outside of the canvas.) solar's scene involves a right-to-left pan with geometry laid out horizontally, so it's an advantageous situation for my method.
— I don't use any helper functions, and err on the side of maximizing speed over token count in general.

Also, re: clipping: in my 3D engine, I cull triangles in the following way:
-see if the triangle normal points within 90 degrees of ray from camera to one triangle vertex;
-if so, project the vertices, and only discard the triangle if all three points are behind the camera, or outside of the same horizontal or vertical screen boundary. Otherwise send it to the triangle rasterizer and do any additional clipping there.
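The two culling steps above could look roughly like this. This is a sketch under stated assumptions, not the engine's code: the function names are hypothetical, `z <= 0` is assumed to mean "behind the camera", and whether a positive dot product means front-facing depends on the engine's winding convention, which the post does not specify.

```lua
-- Hypothetical names throughout; the post does not show its own code.

-- Step 1: is the normal (nx,ny,nz) within 90 degrees of the ray from
-- the camera (cx,cy,cz) to the vertex (vx,vy,vz)? That is the case
-- when the dot product of the normal and the ray is positive.
function normal_within_90(nx, ny, nz, vx, vy, vz, cx, cy, cz)
  return nx * (vx - cx) + ny * (vy - cy) + nz * (vz - cz) > 0
end

-- Step 2: pts holds three projected vertices {x=, y=, z=}.
-- Assumption: z <= 0 means behind the camera, and the screen spans
-- 0..w-1 by 0..h-1. Returns true only when all three points are
-- behind the camera or all three are beyond the same screen edge.
function reject_projected(pts, w, h)
  local behind, left, right, above, below = 0, 0, 0, 0, 0
  for i = 1, 3 do
    local p = pts[i]
    if p.z <= 0 then behind = behind + 1 end
    if p.x < 0 then left = left + 1 end
    if p.x > w - 1 then right = right + 1 end
    if p.y < 0 then above = above + 1 end
    if p.y > h - 1 then below = below + 1 end
  end
  return behind == 3 or left == 3 or right == 3
      or above == 3 or below == 3
end
```

A triangle with one vertex off the left edge and another off the right edge is kept, since it may still cross the screen; any remaining clipping is left to the rasterizer, as the post describes.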

My rationale (backed up by some profiling a while ago, I think) was that the most significant bottleneck in a Pico-8 3D engine seemed to be fill rate. As long as you don't draw anything outside of the canvas, you shouldn't need to do any complicated triangle clipping/splitting. But curious if others are approaching this in a different way...

P#34409 2016-12-29 00:03 ( Edited 2016-12-29 10:17)

Nice result. I did some tests a while ago with drawing quads, using an idea from FRedShift: follow each edge and store the minimum and maximum X value for each Y in two arrays. You then just draw one rectfill per array cell. With this technique you can draw convex polygons easily. I'm curious how it would compare to the other techniques. If I have some time, I will make a simple test function.
I think each technique has a bias toward small, big, horizontal, or vertical triangles, so it could be interesting to test with each constraint.
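The edge-table idea described above can be sketched as follows. This is an illustrative version only, assuming integer vertex coordinates and a convex polygon; `edge_polyfill` and the `hline(x0, x1, y)` callback are hypothetical names (on PICO-8 the callback could wrap `rectfill`).

```lua
local min = min or math.min
local max = max or math.max

-- pts is a list of {x, y} vertices of a convex polygon.
-- Walk every edge, remember the smallest and largest x seen on each
-- row, then draw one horizontal span per row.
function edge_polyfill(pts, hline)
  local xmin, xmax = {}, {}
  local function note(y, x)
    if xmin[y] == nil or x < xmin[y] then xmin[y] = x end
    if xmax[y] == nil or x > xmax[y] then xmax[y] = x end
  end

  local ytop, ybot = pts[1][2], pts[1][2]
  local n = #pts
  for i = 1, n do
    local a, b = pts[i], pts[i % n + 1]
    local ax, ay, bx, by = a[1], a[2], b[1], b[2]
    -- walk each edge from its upper end to its lower end
    if by < ay then ax, ay, bx, by = bx, by, ax, ay end
    ytop = min(ytop, ay)
    ybot = max(ybot, by)
    if by == ay then
      -- horizontal edge: both endpoints bound the same row
      note(ay, ax)
      note(ay, bx)
    else
      for y = ay, by do
        note(y, ax + (bx - ax) * ((y - ay) / (by - ay)))
      end
    end
  end

  for y = ytop, ybot do
    hline(xmin[y], xmax[y], y)
  end
end
```

Because each edge only updates the per-row minimum and maximum, the same function handles triangles, quads, and larger convex polygons; a real version would also clamp the row range to the screen.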

P#34420 2016-12-29 04:41 ( Edited 2016-12-29 09:41)

Good point about the different constraints - now I think about it 95% of my clipping occurs on one vertical plane, and my triangle sizes are generally quite small (around 16x8 - 16x32 pixels per quad).

The vertical rasterization trick is a neat idea; I can see that working well for first-person-type scenes.

@solar - oh, no, I wasn't suggesting one way is better or worse; There are many ways to organize the rendering pipeline and it does depend on what you're drawing. I'd say it's a damn good effort for your first go :)

More importantly though, I'm looking forward to playing the games in these .gifs!

P#34453 2016-12-29 16:20 ( Edited 2016-12-30 01:02)

I noticed the function called "NuSan" is from my old Space Limit demo. I did a better version for Alone in Pico. From my tests it's a bit slower than Catatafish's version, but without the artifacts with very big triangles.

function steptri(x1,y1,x2,y2,x3,y3,c)

    if(y2<y1) then
        if(y3<y2) then
            y1,y3=y3,y1
            x1,x3=x3,x1
        else
            y1,y2=y2,y1
            x1,x2=x2,x1
        end
    else
        if(y3<y1) then
            y1,y3=y3,y1
            x1,x3=x3,x1
        end
    end

    y1 += 0.001 -- offset to avoid dividing by 0

    local miny = min(y2,y3)
    local maxy = max(y2,y3)

    local fx = x2
    if(y2<y3) then
        fx = x3
    end

    local d12 = (y2-y1)
    if(d12 != 0) then
        d12 = 1.0/d12
    end
    local d13 = (y3-y1)
    if(d13 != 0) then
        d13 = 1.0/d13
    end
    local cl_y1 = clip(y1) -- clip() is a helper that is not included in this post
    local cl_miny = clip(miny)
    local cl_maxy = clip(maxy)

    local steps = (x3-x1) * d13
    local stepe = (x2-x1) * d12

    local sx = steps*(cl_y1-y1)+x1
    local ex = stepe*(cl_y1-y1)+x1

    for y=cl_y1,cl_miny do
        rectfill(sx,y,ex,y,c)
        sx += steps
        ex += stepe
    end

    sx = steps*(miny-y1)+x1
    ex = stepe*(miny-y1)+x1

    local df = (maxy-miny)
    if(df != 0) df = 1.0/df

    local step2s = (fx-sx) * df
    local step2e = (fx-ex) * df

    local sx2 = sx + step2s*(cl_miny-miny)
    local ex2 = ex + step2e*(cl_miny-miny)

    for y=cl_miny,cl_maxy do
        rectfill(sx2,y,ex2,y,c)
        sx2 += step2s
        ex2 += step2e
    end
end


I also did a quick test with the edge method, but it's about twice as slow. It should become interesting with polygons though, maybe even quads (just add one "makeedge" call per polygon side).

function makeedge(edgeleft,edgeright,x1,y1,x2,y2)
    local d1 = 0
    if (x1!=x2) then
        d1 = (x2-x1)/(y2-y1)
    end
    local ny1 = clip(y1)
    local ny2 = clip(y2)
    if (y1<y2) then
        for i=ny1,ny2 do
            edgeright[i] = x1 + (i-y1)*d1
        end
    else
        for i=ny2,ny1 do
            edgeleft[i] = x1 + (i-y1)*d1
        end
    end
end

function edgetri(x1,y1,x2,y2,x3,y3,c)
    local edgeleft = {}
    local edgeright = {}

    makeedge(edgeleft,edgeright,x1,y1,x2,y2)
    makeedge(edgeleft,edgeright,x2,y2,x3,y3)
    makeedge(edgeleft,edgeright,x3,y3,x1,y1)

    local miny = min(min(y1,y2),y3)
    local maxy = max(max(y1,y2),y3)
    miny = max(miny,0)
    maxy = min(maxy,128)

    for i=miny,maxy do
        rectfill(edgeleft[i],i,edgeright[i],i,c)
    end
end


All my tests were done using integer positions.

P#34473 2016-12-29 19:35 ( Edited 2016-12-30 00:35)

With some finagling, I have brought the time for my triangle render down to around .37 seconds on average.

I got the most significant speed increase by replacing the lerp functions with uniform additions on every scan-line. (I feel like I should have realized that one sooner...)
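To illustrate the change described above, here is a small comparison of the two ways to walk one edge. It is a sketch with hypothetical function names, assuming `yb > ya`; it is not taken from the posted routine.

```lua
-- Lerp version: recomputes the interpolation on every scan-line,
-- costing a divide and a multiply per row.
function edge_lerp(xa, ya, xb, yb)
  local xs = {}
  for y = ya, yb do
    xs[#xs + 1] = xa + (xb - xa) * ((y - ya) / (yb - ya))
  end
  return xs
end

-- Incremental version: the slope is computed once, and each
-- scan-line only needs a single addition.
function edge_step(xa, ya, xb, yb)
  local xs = {}
  local dx = (xb - xa) / (yb - ya)
  local x = xa
  for y = ya, yb do
    xs[#xs + 1] = x
    x = x + dx
  end
  return xs
end
```

Both produce the same x positions up to rounding. The incremental form accumulates rounding error as it goes, so with PICO-8's fixed-point numbers a long edge may drift by a fraction of a pixel, which is probably acceptable for a solid fill.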

Removing the dither code also shaved some time, and allowed me to do something else:
--If triangles are tall and skinny, they are rasterized left to right instead of top to bottom.

Finally, I added some bounds checking to throw out triangles that are completely off the screen.

Code:

function solid_trifill_v3( x1,y1,x2,y2,x3,y3, color1)

          local min_x=min(x1,min(x2,x3))
         if(min_x>127)return
          local max_x=max(x1,max(x2,x3))
         if(max_x<0)return
          local min_y=min(y1,min(y2,y3))
         if(min_y>127)return
          local max_y=max(y1,max(y2,y3))
         if(max_y<0)return

          local x1=band(x1,0xffff)
          local x2=band(x2,0xffff)
          local y1=band(y1,0xffff)
          local y2=band(y2,0xffff)
          local x3=band(x3,0xffff)
          local y3=band(y3,0xffff)

          local width=min(127,max_x)-max(0,min_x)
          local height=min(127,max_y)-max(0,min_y)

    if(width>height)then --wide triangle  
          local nsx,nex
          --sort y1,y2,y3
          if(y1>y2)then
            y1,y2=y2,y1
            x1,x2=x2,x1
          end

          if(y1>y3)then
            y1,y3=y3,y1
            x1,x3=x3,x1
          end

          if(y2>y3)then
            y2,y3=y3,y2
            x2,x3=x3,x2          
          end

         if(y1!=y2)then  
            local delta_sx=(x3-x1)/(y3-y1)
            local delta_ex=(x2-x1)/(y2-y1)

            if(y1>0)then
                nsx=x1
                nex=x1
                min_y=y1
            else --top edge clip
                nsx=x1-delta_sx*y1
                nex=x1-delta_ex*y1
                min_y=0
            end

            max_y=min(y2,128)

            for y=min_y,max_y-1 do

            rectfill(nsx,y,nex,y,color1)
            nsx+=delta_sx
            nex+=delta_ex
            end

        else --where top edge is horizontal
            nsx=x1
            nex=x2
        end

        if(y3!=y2)then
            local delta_sx=(x3-x1)/(y3-y1)
            local delta_ex=(x3-x2)/(y3-y2)

            min_y=y2
            max_y=min(y3,128)
            if(y2<0)then
                nex=x2-delta_ex*y2
                nsx=x1-delta_sx*y1
                min_y=0
            end

             for y=min_y,max_y do
                rectfill(nsx,y,nex,y,color1)
                nex+=delta_ex
                nsx+=delta_sx
             end

        else --where bottom edge is horizontal
            rectfill(nsx,y3,nex,y3,color1)
        end
    else --tall triangle -----------------------------------<><>----------------
          local nsy,ney

          --sort x1,x2,x3
          if(x1>x2)then
            x1,x2=x2,x1
            y1,y2=y2,y1
          end

          if(x1>x3)then
            x1,x3=x3,x1
            y1,y3=y3,y1
          end

          if(x2>x3)then
            x2,x3=x3,x2
            y2,y3=y3,y2          
          end

         if(x1!=x2)then 
            local delta_sy=(y3-y1)/(x3-x1)
            local delta_ey=(y2-y1)/(x2-x1)

            if(x1>0)then
                nsy=y1
                ney=y1
                min_x=x1
            else --top edge clip
                nsy=y1-delta_sy*x1
                ney=y1-delta_ey*x1
                min_x=0
            end

            max_x=min(x2,128)

            for x=min_x,max_x-1 do

            rectfill(x,nsy,x,ney,color1)
            nsy+=delta_sy
            ney+=delta_ey
            end

        else --where top edge is horizontal
            nsy=y1
            ney=y2
        end

            if(x3!=x2)then
            local delta_sy=(y3-y1)/(x3-x1)
            local delta_ey=(y3-y2)/(x3-x2)

            min_x=x2
            max_x=min(x3,128)
            if(x2<0)then
                ney=y2-delta_ey*x2
                nsy=y1-delta_sy*x1
                min_x=0
            end

             for x=min_x,max_x do

                rectfill(x,nsy,x,ney,color1)
                ney+=delta_ey
                nsy+=delta_sy
             end

          else --where bottom edge is horizontal

                rectfill(x3,nsy,x3,ney,color1)

          end

    end
end


(it's a shame that leading tabs/spaces don't show up in the code snippets on the BBS)

P#34486 2016-12-30 02:52 ( Edited 2016-12-30 07:52)

Nice, ElectricGryphon, your function seems a bit faster. It's about the same code as mine, but your switch of rasterizing direction does a great job. I think there has to be a simple way to factorise the code and avoid having a function twice as long because of the direction switch.

P#34489 2016-12-30 03:43 ( Edited 2016-12-30 08:43)

I've updated the benchmark to "round 2" with the new functions posted here. Please note that I've changed the methodology somewhat (now rendering a table of 300 triangles, ten times over), so your times will change a bit. However I've also added a new metric "tris/sec" which will hopefully remain more consistent if I end up tinkering with the table size again.

While the results average out better now, it's still worth running the benchmark a couple times in a row to see how the results vary based on new conditions.

@NuSan— yeah, grabbed the wrong method in round 1. Sorry about that! Your n-gon method is really interesting though—would like to compare it to scgrn's.

P#34509 2016-12-30 11:53 ( Edited 2016-12-30 21:55)

These are amazing, thanks guys.

I'm making a vector editor using a web app to draw vectors so I can push the data into a pico8 cartridge.

Here are the first real tests:
https://twitter.com/gabrielcrowe/status/899220992895184896

Can I use some of these algos to try and get some faster speed out of my render?

P#43475 2017-08-20 07:06 ( Edited 2017-08-20 11:06)

Be warned that in the next release Zep plans to fix some of the rectfill() timing bugs that many of these tri renderers rely on for their speed.

P#43476 2017-08-20 07:16 ( Edited 2017-08-20 11:16)

GOOD

P#43477 2017-08-20 08:02 ( Edited 2017-08-20 12:02)

The cart is CC-licensed, it's fine to use any of it.*

That tech looks awesome by the way :)

*=probably.

P#43528 2017-08-22 00:08 ( Edited 2017-08-22 04:14)

@mole5000

He needs to fix the poke/peek timing bugs as well. You can put a ton of math, including a peek, into a poke's second argument and the poke will still only take 1 cycle total.

I know people will be annoyed if their spiffy fast games stop being fast, but exploiting bugs to run faster than you're supposed to seems counter to the purpose of playing with a limited-spec fantasy console. If you want to run fast, without constraints, just use Löve2D or something.

P#43588 2017-08-24 13:26 ( Edited 2017-08-24 17:27)

Felice and mole5000: there's no way to be sure until release, but I would guess that you won't see a dramatic speed drop in many of these trifill methods. Zep has suggested via Twitter (while showing a GIF of the Gryphon 3D engine running at full speed) that in 0.1.11 "horizontal fills & circles are now cheaper" and that only carts that exploit the "free backwards-rectangles bug" will be slower. Some of the algorithms collected here may occasionally benefit from this bug due to lack of bounds checking, but they do not methodically or intentionally exploit it.

And since Zep is apparently adding support for fill patterns to rectfill(), we may even be able to run a "shaded trifill thunderdome" in the near future (I hope).

P#43597 2017-08-25 04:18 ( Edited 2017-08-25 08:55)

Woo, fill patterns? I need to read his twitter more often.

I actually meant to ask if he could give us that. He must be puh-sye-kik.

P#43598 2017-08-25 05:43 ( Edited 2017-08-25 09:43)

I've added alternating lines for this demo:

https://twitter.com/gabrielcrowe/status/904421706659495936

I used a few of the examples here for testing.

P#43912 2017-09-04 11:01 ( Edited 2017-09-04 15:01)

Hi folks,

I'm late to the party, but here is an updated Triangles Benchmark showing token counts and two new rasterizers in 163 and 335 tokens respectively.

Let me know what you think and if I missed any cool development since the cartridge posted in this thread.

Cheers,

P#53942 2018-07-04 17:22 ( Edited 2018-07-04 21:22)

I was also working on my own triangle rasterizer. I started noticing some framerate drops, so I've started measuring performance. But this thread helped me a lot.

I was using memset, assuming that it was the fastest, but it seems like classic optimization knowledge does not apply to PICO-8. I got a 20% speedup from replacing memset with rectfill, and I also don't have to sacrifice resolution, though the patterns I was able to make with memset are pretty rad!!!!!

And it looks like a lot of other assumptions are wrong, so basically I can rewrite half of my rasterizer and get a lot better performance. For example, for memset I needed some code to align it to the right memory address; I don't need that part anymore, as rectfill works directly on the screen.

P#53946 2018-07-05 02:22 ( Edited 2018-07-05 06:23)

You can get nice patterns using fillp(....)

P#53950 2018-07-05 04:24 ( Edited 2018-07-05 08:24)

I made my own tri function : https://www.lexaloffle.com/bbs/?tid=49930

P#119766 2022-10-28 14:01
