Using vectors for 1000 entity movement

supercurses • 2025-04-08 08:35

Hello,

I have a feeling I don't fully understand vectors. I am trying to find the most efficient way to update a large number of entities for something like a Vampire Survivors game. Right now, in my testing, parallel arrays appear to have the lowest CPU impact: with 1000 mobs moving towards the player, CPU is 0.601, while metatables go up to 0.823.

I tried vectors this morning and my cpu stat is 1.4.

Vectors are a bit new to me and I'm wondering if I am going about this the right way.

function _init()
	mobs={}
	player={}
	player.v = vec(240, 130)
	for i=1, 1000 do
		local spawn_radius = max(470, 270) / 2 + 50
		local player_x, player_y = player.v:get(0, 2)
		local spawn_x = player_x
		local spawn_y = player_y
		local angle = rnd(1)
		local x = spawn_x + spawn_radius * cos(angle)
		local y = spawn_y + spawn_radius * sin(angle)

		local v = vec(x, y) -- local, so it does not leak into the global scope
		add(mobs,{v=v,spd=rnd(0.5)})
	end

end

function _update()

	for mob in all(mobs) do
		-- Calculate direction vector (from mob to player)
		local direction = vec(0, 0)

		-- Get mob and player positions
		local mob_x, mob_y = mob.v:get(0, 2)
		local player_x, player_y = player.v:get(0, 2)

		-- Create direction vector
		direction = vec(player_x - mob_x, player_y - mob_y)

		-- Normalize the direction (make it length 1)
		local mag = direction:magnitude()
		if mag > 0 then
			direction = direction:div(mag)

			mob.v = mob.v:add(direction:mul(mob.spd))
		end
	end
end

function _draw()
    cls()
    for mob in all(mobs) do
        local x, y = mob.v:get(0, 2)
        spr(1, x, y)
    end
    print(stat(1), 0, 0, 8)
end

Isogash • 2025-04-08 13:42

Parallel arrays are actually a performance optimization in many cases outside of Pico-8, but for different reasons than why they appear to be efficient in Pico-8.

The key to understanding CPU performance in Pico-8 is knowing what things cost in terms of cycles; see https://pico-8.fandom.com/wiki/CPU for a full explanation.

The takeaway is that everything you do costs cycles, so the less code you have, the faster it will be. The reason vectors are slow here is simply because you have added a lot of extra steps involved in creating vectors and calling their methods; parallel arrays are fast because there are fewer table creations, and you probably aren't using method calls.

Having said that, the main bottleneck in your code might be the vector normalization, as vector:magnitude() is probably using the sqrt() function, which costs a whopping 48 cycles.

There is a much faster way to get a normalized vector than dividing by magnitude, though. I don't have Pico-8 to hand, so I can't test this, but it should work.

-- assign global trig functions to local variables outside loop to save cycles (6 cycles)
local atan2 = atan2
local cos = cos
local sin = sin

-- within loop, normalize (dx, dy) to (nx, ny) (13 cycles)
local a = atan2(dx, dy) -- (5 cycles)
local nx = cos(a) -- (4 cycles)
local ny = sin(a) -- (4 cycles)

supercurses • 2025-04-08 19:36

Thanks @Isogash - I will give this a go and will also look into CPU cycles

zep • 2025-04-09 03:02

Hi @supercurses

It is true that :magnitude is a little expensive, but it is much cheaper in Picotron than in PICO-8. The main danger of using vectors CPU-wise is getting data in and out of them, and creating new tiny 2x1 userdata objects for every operation. Whenever possible, I'd recommend keeping operations in userdata form (instead of getting and setting components), and using e.g. a:add(b, true) instead of a = a + b -- the true argument means the result is written to a instead of creating a new userdata object and garbage collecting the old one.

Here's another version of _update that removes temporary object creation by reusing a single local vector (direction) so that no new userdata objects are created. I also replaced the all(mobs) for loop to avoid the function call overhead. This one runs at around 35% at the start, and reaches ~62% once all the sprites are visible:

function _update()
	local direction = vec(0,0)
	for i=#mobs,1,-1 do -- backwards in case you want to delete something
		local mob = mobs[i]
		player.v:sub(mob.v,direction) -- direction = player.v - mob.v		
		direction:mul(mob.spd / direction:magnitude(),true)
		mob.v:add(direction,true)
	end
end

There is not a huge advantage in using actor.vec_xy over actor.x,actor.y to reduce CPU; I imagine their main use is nicer syntax. To increase performance for a large number of entities, I think the best bet would be to keep them in one large 2D userdata, but that only works for very simple logic / movement that might not apply here (e.g. like in /system/demos/pixeldust.p64).
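
As a rough illustration of why the backwards index loop in the _update above is safe for deletion, here is a minimal sketch in plain Lua so it can be run outside Picotron. The hp field and the remove_dead function are hypothetical names that do not appear in this thread, and table.remove is used here in place of Picotron's own list helpers.

```lua
-- Plain-Lua sketch: remove entries while iterating by index.
-- `hp` and `remove_dead` are hypothetical names used only for illustration.
function remove_dead(mobs)
	-- Walk from the end so that removing mobs[i] only shifts
	-- elements that have already been visited.
	for i = #mobs, 1, -1 do
		if mobs[i].hp <= 0 then
			table.remove(mobs, i)
		end
	end
	return mobs
end

local mobs = { {hp=3}, {hp=0}, {hp=0}, {hp=5} }
remove_dead(mobs)
print(#mobs) -- 2 mobs remain
```

A forward loop doing the same removal would skip the element that slides into slot i after each removal, which is why the loop counts down.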

zep • 2025-04-09 03:07

Update: 25% CPU at the start if you replace the "for all(mobs)" in the draw function with "for i=1,#mobs ..." -- all() and foreach() are huge CPU hogs for large iterations!

supercurses • 2025-04-09 07:25

Thanks @zep. After switching all my tests to for i=1, #mobs, there is now only a marginal difference in CPU between parallel arrays (mob_x table, mob_y table) and metatables, although there is obviously still a difference in memory.

I might have the terms wrong here

Approach - CPU / memory
Parallel Arrays (SOA?) (mob_x table, mob_y table) - 0.6016 / 1904794
Array of Structures (add(mobs, {x=10, y=10})) - 0.6731 / 2041270
Closure-based OOP (function that returns update and draw functions) - 0.6016 / 2401290
Metatable-OOP - 0.6017 / 2041322
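
For anyone unsure what the first two layouts look like side by side, here is a minimal plain-Lua sketch of the same "move towards the player" step written both ways. All names (step_soa, step_aos, px, py) are hypothetical, and the Euclidean normalization with a zero-distance guard is an assumption made for the sketch, not necessarily what the tests above used.

```lua
-- Plain-Lua sketch of the two layouts compared above.
-- All names (step_soa, step_aos, px, py) are hypothetical.

-- Parallel arrays: one table per field, indexed by mob number.
function step_soa(mob_x, mob_y, px, py, spd)
	for i = 1, #mob_x do
		local dx, dy = px - mob_x[i], py - mob_y[i]
		local d = math.sqrt(dx * dx + dy * dy)
		if d > 0 then -- skip mobs already on the player
			mob_x[i] = mob_x[i] + dx / d * spd
			mob_y[i] = mob_y[i] + dy / d * spd
		end
	end
end

-- Array of structures: one small table per mob.
function step_aos(mobs, px, py, spd)
	for i = 1, #mobs do
		local m = mobs[i]
		local dx, dy = px - m.x, py - m.y
		local d = math.sqrt(dx * dx + dy * dy)
		if d > 0 then -- skip mobs already on the player
			m.x = m.x + dx / d * spd
			m.y = m.y + dy / d * spd
		end
	end
end
```

Both functions compute the same movement; the only difference is whether each mob costs one extra table (array of structures) or the fields are spread across shared tables (parallel arrays).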

Will try user data approach next

The closure version was created by Claude:

-- Closure-based approach
function create_mob(x, y)
    -- These local variables become the object's private state
    local x = x
    local y = y
    local spd = 1

    -- The table we return contains methods that "close over" the local variables
    return {
        update = function()
            local dx = player.x - x
            local dy = player.y - y
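            -- Manhattan distance (|dx| + |dy|): cheap, but not a true unit vector
            local distance = abs(dx) + abs(dy)
            if distance > 0 then -- avoid dividing by zero when on top of the player
                dx = dx / distance
                dy = dy / distance

                -- These functions can access and modify the local variables
                x = x + dx * spd
                y = y + dy * spd
            end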

        end,

        draw = function()
            spr(1, x, y)
        end,

        get_pos = function()
            return x, y
        end
    }
end
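
One thing worth noting about the closure version is that it divides by abs(dx) + abs(dy) rather than by the true length. Here is a minimal plain-Lua sketch of the difference; the function names are hypothetical and the zero-distance guard is an addition made for the sketch.

```lua
-- Plain-Lua sketch; function names are hypothetical.

-- Divide by |dx| + |dy| (as in the closure example above).
function dir_manhattan(dx, dy)
	local d = math.abs(dx) + math.abs(dy)
	if d == 0 then return 0, 0 end
	return dx / d, dy / d
end

-- Divide by the Euclidean length, giving a unit vector.
function dir_euclid(dx, dy)
	local d = math.sqrt(dx * dx + dy * dy)
	if d == 0 then return 0, 0 end
	return dx / d, dy / d
end

-- Length of a 2D vector, used to compare the two results.
function length(x, y)
	return math.sqrt(x * x + y * y)
end

print(length(dir_manhattan(1, 0))) -- axis-aligned: full speed
print(length(dir_manhattan(1, 1))) -- diagonal: about 0.707 of full speed
print(length(dir_euclid(1, 1)))    -- about 1 in every direction
```

So the cheaper version avoids a square root, but mobs approaching diagonally move more slowly than mobs approaching along an axis, which may or may not matter for this kind of game.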

supercurses • 2025-04-09 09:33

Hmm, maybe I don't have the right approach for userdata. CPU for this test is 0.6791, which is the highest so far.

Spawning:

mobs = userdata("f64", 5, mob_count)
for i = 0, mob_count-1 do
	local spawn_radius = max(470, 270) / 2 + 50
	local angle = rnd(1)
	local x = player.x + spawn_radius * cos(angle)
	local y = player.y + spawn_radius * sin(angle)

	mobs:set(0, i, x) -- x position
	mobs:set(1, i, y) -- y position
	mobs:set(2, i, 1) -- speed
	mobs:set(3, i, 0) -- dx
	mobs:set(4, i, 0) -- dy
end

updating:

for i = 0, mob_count-1 do -- userdata indices start at 0, as in the spawn loop
	local mob_x = mobs:get(0, i)
	local mob_y = mobs:get(1, i)
	local mob_speed = mobs:get(2, i)

	local dx = player.x - mob_x
	local dy = player.y - mob_y
	local distance = abs(dx) + abs(dy)

	-- store this frame's movement in columns 3 and 4
	mobs:set(3, i, (dx / distance) * mob_speed)
	mobs:set(4, i, (dy / distance) * mob_speed)
end
-- add dx, dy to x, y
mobs:add(mobs, mobs,  3, 0, 2, 5, 5, mob_count)

drawing:

for i=0, mob_count-1 do
	local x = mobs:get(0, i)
	local y = mobs:get(1, i)
	spr(1, x, y)
end
