Hi there,
I'm looking for a way to optimize my script a bit.
I have a projected shadow on some of my sprites that is pretty dumb: I read every pixel of the sprite and draw a shadow pixel at a given offset.
This means that I read every pixel of every sprite that needs a shadow.
Some optimization is needed because when I try to display shadows for a large set of sprites, the frame rate drops.
I was wondering if using Peek and Poke would be faster than Pget and Pset?
Any idea whether it would speed things up?
Thanks.
Yes, and using memset(...) on entire rows would probably be even faster. But either way you have to make sure that the rows of your projected shadow contain an even number of pixels and start on an even x (x%2==0), since each byte of screen memory covers two pixels: p0 + (p1<<4), with the left pixel p0 in the low 4 bits. If not, you'd have to clean up the ends of each row with a separate pset() call.
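As a rough illustration of the memset() idea, here is a minimal sketch that fills one horizontal span of the screen with a single colour and falls back to pset() for the half-byte ends. The function name `shadow_row` and its parameters are made up for this example, and it assumes the span is already inside the 128x128 screen (memset() does not clip).

```lua
-- fill pixels x0..x1 (inclusive) of screen row y with colour c
-- screen memory starts at 0x6000, 64 bytes per row, two pixels per byte
function shadow_row(x0, x1, y, c)
  if x0 > x1 then return end
  -- an odd left edge or an even right edge only covers half a byte
  if x0 % 2 == 1 then
    pset(x0, y, c)
    x0 = x0 + 1
  end
  if x1 % 2 == 0 then
    pset(x1, y, c)
    x1 = x1 - 1
  end
  -- whatever is left starts on an even x and has an even length
  if x0 < x1 then
    memset(0x6000 + y * 64 + x0 / 2, c + c * 16, (x1 - x0 + 1) / 2)
  end
end
```

This only covers a flat, single-colour shadow. A shadow that depends on the ground colour needs a per-pixel lookup instead.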
But the BEST way to do this is probably to use spr() and pal(): pal() to shift the sprite colors to your shadow color, then spr() to blit an offset copy of the sprite where the shadow should be. Then you draw the original sprite on top of it.
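A minimal sketch of the pal() and spr() approach, assuming colour 0 is the transparent colour and a single flat shadow colour is acceptable. The function name `spr_with_shadow` and its parameters are illustrative only.

```lua
-- draw sprite n at (x, y) with a flat shadow offset by (ox, oy)
-- shadow_col is an assumed colour index; colour 0 stays transparent
function spr_with_shadow(n, x, y, ox, oy, shadow_col)
  -- remap every colour except the transparent one to the shadow colour
  for c = 1, 15 do
    pal(c, shadow_col)
  end
  spr(n, x + ox, y + oy)
  -- restore the default palette (this also resets transparency settings)
  pal()
  -- draw the real sprite on top
  spr(n, x, y)
end
```

For example, `spr_with_shadow(1, 60, 60, 2, 2, 1)` would draw sprite 1 with a dark blue shadow two pixels down and to the right.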
Thanks for the answer. I was thinking about using pal() but... I use a color ramp to fake shadow. Depending on the "ground" color I put a different color for each shadow pixel.
Something like in this example.
I will then try to memcpy the pixel set to user data, modify it, and memcpy it back to the screen.
Ah, I see—yeah, in that case, you'd have to use the memcpy method you propose. The other way you could go—if you're running at 60fps and don't mind a little flicker—is to blit the shadow every other frame so that it appears to blend into the background.
Oh. Smart. Wouldn't it also work at 30fps?
(the memcpy is not even needed, as I can compute everything directly onscreen)
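Computing the shadow directly in screen memory could look roughly like the sketch below: read a byte with peek(), run both of its pixels through a colour ramp, and poke() the result back. The `ramp` table is a made-up example, not the ramp used in the thread, and `shade_bytes` darkens a plain horizontal span without looking at the sprite's shape.

```lua
-- hypothetical colour ramp: ramp[c + 1] is the shadowed version of colour c
ramp = { 0, 0, 1, 1, 2, 1, 5, 6, 2, 4, 9, 3, 13, 1, 4, 14 }

-- darken n_bytes bytes (two pixels each) of screen row y, starting at even x
function shade_bytes(x, y, n_bytes)
  local addr = 0x6000 + y * 64 + flr(x / 2)
  for a = addr, addr + n_bytes - 1 do
    local v = peek(a)
    local lo = v % 16          -- left pixel (even x)
    local hi = flr(v / 16)     -- right pixel (odd x)
    poke(a, ramp[lo + 1] + ramp[hi + 1] * 16)
  end
end
```

Each peek()/poke() pair handles two pixels, which is where any gain over pget()/pset() would come from.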
You could try it! At 30fps the flicker may be more apparent and distracting. OR, if you're locked to 30, you could try using two shadows with an alternating dither (i.e. every other pixel transparent) and just flip between them.
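Both variants come down to a frame-parity check. A minimal sketch, assuming a frame counter that is incremented once per update and two pre-drawn shadow sprites with complementary dither patterns (the function names and sprite indices are hypothetical):

```lua
-- 60fps variant: true on the frames where the shadow should be drawn
function shadow_visible(frame)
  return frame % 2 == 0
end

-- 30fps variant: pick one of two complementary dithered shadow sprites
function dither_shadow_sprite(frame, spr_a, spr_b)
  if frame % 2 == 0 then return spr_a end
  return spr_b
end
```

In `_draw()` this might be used as `if shadow_visible(frame) then spr(n, x, y) end`, or as `spr(dither_shadow_sprite(frame, 16, 17), x, y)`.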
Well, with the poke and peek method the frame rate is better, but not THAT much better :(
30fps alternating scanline technique:
and 60fps alternating frame technique:
Another idea just occurred to me for 30fps shadows (and then I'm going to stop procrastinating) -- precompute 4 or 5 randomly dithered shadow frames and write them to the spritesheet, then cycle between them every frame, like so:
Couple of things:
- Are you calculating the shadow for every pixel in the sprite (including the shadow pixels that end up covered by the sprite), or just the visible shadow edges?
- Are you recalculating these shadow masks every frame, or are you caching them in a table first?
Thanks for your replies guys (and musurca for your examples and ideas).
Here is what I have so far.
The effect is fine, but, at some point with the rest of the gigantic map, the frame rate drops a little bit at the end of the map (I need a way to optimize the map loading / rendering too but it'll be in another post).
To answer your questions catatafish :
- Are you calculating the shadow for every pixel in the sprite (including the shadow pixels that end up covered by the sprite), or just the visible shadow edges?
I still compute the shadow for every pixel in the sprite (even the ones that'll be hidden by the sprite) BUT I compute shadows only for relevant sprites (border ones).
- Are you recalculating these shadow masks every frame, or are you caching them in a table first?
I recalculate them every frame... simply because as the background moves (and the shadow caster moves too) chances are that the color will change every frame too :S no ?
Thanks for your help.
Another suggestion if you are really gung ho about the look up table colors. Since you say this:
"I read every pixel of the sprite and print a pixel of shadow according to a given offset."
Looking at your image above, there are relatively few shadow pixels in it. For instance, you only have to output shadow pixels if the pixel in the original sprite is transparent. You also don't have to loop over the entire sprite. With some hand coded rects, you could cut down on a lot of reads. Easily 75% of them.
Experimenting with another idea to memcpy() the framebuffer into sprite memory and use pal() to perform the LUT. Should allow it to work in far fewer function calls. So far that part works, I just need to figure out how to apply the sprite's mask to the blitted chunk of the framebuffer. I think I should be able to do more palette fiddling and an extra blit though.
The benefit of doing it this way should be many fewer function calls. Most of the cost is per chunk copied, and not per pixel.
Ok! Seems to work well. Probably not going to finish it enough to really test the performance well though...
```lua
-- copy a 32x16 rect
function blitshadow(dst, src)
  for i = 0, 15 do
    local offset = 64*i
    memcpy(dst + offset, src + offset, 16)
  end
end

-- load a lut draws a sprite as black
function masklut()
  pal()
  for i = 0, 15 do pal(i, 0) end
end

-- the lut used by shadowlut()
lut = {
  0, 0, 1, 0,
  2, 0, 5, 6,
  4, 4, 9, 3,
  1, 5, 8, 6,
}

-- load the shadow lut
function shadowlut()
  pal()
  for i = 0, 15 do pal(i, lut[i + 1]) end
end

-- apply a shadow to a 32x16 chunk of the screen
function shadowtile(x, y, maskf)
  local px, py = x*32, y*16
  clip(px, py, 32, 16)

  -- blit a clean copy to the sprite sheet
  local src = 0x6000 + px/2 + py*64
  blitshadow(2096, src)

  -- mask out the shadowed pixels and copy again
  masklut()
  maskf()
  blitshadow(3120, src)

  -- reset the camera, blit in screen space
  camera()

  -- draw the tile back to the screen with the lut applied
  shadowlut()
  spr(76, px, py, 4, 2)

  -- now draw the original pixels back with the shadowed parts masked out
  pal()
  spr(108, px, py, 4, 2)
end

local cx, cy = 0, 0

function _draw()
  cls(12)
  clip()
  pal()
  camera(cx, cy)

  -- draw something that you want to be shadowed...
  mapdraw(0, 0, 0, 0, 64, 64)

  local t = time()
  local sx, sy = 8 + 8*sin(t), 8 + 8*cos(t)

  -- this part could be vastly improved
  -- currently applies the shadowing to all pixels
  -- could either render many sprites in the mask function
  -- or render them one at a time, and only shadow the 32x16 tiles they overlap
  for j = 0, 7 do
    for i = 0, 3 do
      shadowtile(i, j, function() spr(0, sx + 4, sy + 4, 8, 8) end)
      -- need to reset the camera after calling shadowtile()
      camera(cx, cy)
    end
  end

  -- need to reset clip after calling shadowtile()
  clip()

  -- finally draw the sprite as usual
  spr(0, sx, sy, 8, 8)
end
```
Not sure how one submits a cart to an existing thread...
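The constants 2096 and 3120 in the code above appear to be the sprite-sheet addresses of sprites 76 and 108, the two sprites later drawn with spr(). A small sketch of how such addresses can be derived, assuming the standard PICO-8 layout (sprite sheet at address 0, screen at 0x6000, 64 bytes per 128-pixel row); the helper names are made up for this example:

```lua
-- address of the byte holding screen pixel (x, y), for even x
function screen_addr(x, y)
  return 0x6000 + y * 64 + flr(x / 2)
end

-- address of the top-left byte of sprite n on the sprite sheet
-- (16 sprites per row, each 8 pixels = 4 bytes wide and 8 rows tall)
function sprite_addr(n)
  return flr(n / 16) * 8 * 64 + (n % 16) * 4
end
```

With these helpers, `sprite_addr(76)` gives 2096 and `sprite_addr(108)` gives 3120, matching the values passed to blitshadow().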
@lvictorino That approach makes sense.
To elaborate on what I was getting at - if you know which pixels of the sprite are casting visible shadows you can cache them, e.g. as relative coordinates, to avoid having to re-scan the sprite pixels every frame. Checking visibility is obviously more expensive than your current method but as it's a precalculation step it won't matter in-game.
This means the number of peek & poke calls you need to make (or whichever method you choose) is reduced to a minimum, and you're not spending cpu on pixels that are never going to be seen.
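One way the caching idea could be sketched: scan the sprite once, keep only the shadow positions that the sprite itself will not cover, and reuse that list every frame. This assumes colour 0 is transparent, a fixed shadow offset, and a colour ramp table as described earlier in the thread. The function names and the `ramp` argument are illustrative, not the poster's code.

```lua
-- build a list of shadow pixel offsets for a w x h sprite whose top-left
-- pixel is at (sx, sy) on the sprite sheet; (ox, oy) is the shadow offset
function build_shadow_cache(sx, sy, w, h, ox, oy)
  local cache = {}
  for y = 0, h - 1 do
    for x = 0, w - 1 do
      if sget(sx + x, sy + y) ~= 0 then
        -- where this pixel's shadow lands, relative to the sprite
        local tx, ty = x + ox, y + oy
        local inside = tx >= 0 and tx < w and ty >= 0 and ty < h
        -- keep it only if the sprite itself does not cover that spot
        if not inside or sget(sx + tx, sy + ty) == 0 then
          add(cache, { tx, ty })
        end
      end
    end
  end
  return cache
end

-- per frame: darken only the cached pixels, using a colour ramp table
function draw_cached_shadow(cache, px, py, ramp)
  for i = 1, #cache do
    local x, y = px + cache[i][1], py + cache[i][2]
    pset(x, y, ramp[pget(x, y) + 1])
  end
end
```

The cache would be built once (for example in `_init()`), and `draw_cached_shadow()` called each frame before the sprite itself is drawn. The ground colour is still read per frame, so a moving background is handled, but the sprite is no longer rescanned.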
edit: @slembecke - nice work! I think you can just use 'save @clip' and paste it directly into the post.
This is kind of unrelated to your question, but while we're all focusing on shadows, it seems your aircraft has a light source in the top left of the screen, while the ground map has a light source coming from the bottom left.
@slembecke : whoa nice work! I never used clip() (I don't even understand what it's for) but I'll look into it. Thanks.
@Catatafish : Ok, you're right. I don't know exactly why I hadn't thought about caching the visible shadow positions before. But it's an awesome trick, and very easy to implement. Thanks a lot, I'll try that.
@Stompy : You're right, but the effect is really weird when the shadow is set to reflect a bottom left light :S
I'm trying to do something similar to what's being discussed here.
You can see an example of what I'm trying to do here:
press x to toggle the light off/on to see the difference in performance when walking
My example project
So I went with trasevol_dog's approach based on the following example. I had to make a bunch of tweaks to get it working in my project, but it doesn't run very well and I've had a hard time improving the perf and memory usage. [trasevol_dog's example](https://www.lexaloffle.com/bbs/?pid=34140#p34161)
Now I'm hoping to try this approach instead, as it sounds like it'll be a lot better performance-wise. I really have no idea where to start with the bit blit stuff and copying memory around. Does anyone know of any resources that explain how to do this? I tried following the code above but couldn't understand it well enough to adapt it to my own project.
Thanks!
This conversation was so interesting to read! It's so impressive how much you can learn by using pico or, at least, reading about it as I mostly do =P
This might be tricky because you run out of colors, but does the draw bitmask help at all? I'm on mobile right now and the PICO-8 wiki pages regularly crash every several seconds when I try to browse them, but the Hardware State section of the Memory page talks about a memory location you can poke() and I believe lets you set some bits of color to 0, some bits to the color you drew, and some bits to the pre-existing color on the screen.
It'd limit your palette for the background even further, so it might not work for this, but it might be worth playing with - I could imagine setting colors 4-7 to my desired background colors, 0-3 to their corresponding shadows, and changing the hardware mode to something which knocks the 4s bit down to 0 to draw shadows.
(Or, if you want to go a little further, knock the 4s and 8s bits both down to zero, and use some number of colors in the 8-15 range as background colors with shared shadow colors.)