

There seems to have been a change in the virtual CPU cost of +=.

Previously, in 0.2.4b, both x=x+y and x+=y cost 1 cycle.
Now, in 0.2.5g, x=x+y still costs 1 cycle while x+=y costs 2 cycles.
(Where x and y are locals.)

The same happens with other operators that cost 1 cycle, e.g. -=/- and &=/&.

This feels like a bug, since I wouldn't expect x+=y to be costlier than x=x+y.

The code below shows the performance difference.

function testme_calib(name, func, calibrate_func, ...)
  -- based on https://www.lexaloffle.com/bbs/?pid=60198#p
  local n = 1024

  -- calibrate
  flip()
  local unused -- i am not sure why this helps give better results, but it does, so.

  local x,t=stat(1),stat(2)
  for i=1,n do
    calibrate_func(...)
  end
  local y,u=stat(1),stat(2)

  -- measure
  for i=1,n do
    func(...)
  end
  local z,v=stat(1),stat(2)

  -- report
  local function c(t0,t1,t2) return(t0+t2-2*t1)*128/n*256/60*256*2 end -- *2 for 0.2.x

  local s=name.." :"
  local lc=c(x-t,y-u,z-v)
  if (lc != 0) s..=" lua="..lc
  local sc=c(t,u,v)
  if (sc != 0) s..=" sys="..sc

  printh(s)
  print(s)
end

function testme(name, func, ...)
  return testme_calib(name, func, function() end, ...)
end

testme("+", function(x,y) x=x+y end, 1, 2)
testme("+=", function(x,y) x+=y end, 1, 2)



Huh! I remember feeling like RP-8 took an unexpected CPU hit at some point, I wonder if this was the cause...

Oh, wow, this makes a huge difference now - like 4% CPU on the synth filter inner loop alone. Would be very nice to get this one fixed.


+1


If anything, it should cost less, because conceptually you don't need to parse a second token to know which two vars are involved, and you only need to obtain one reference for both the input and output vars, i.e. you implicitly know both are x.

I dunno if the interpreter or the bytecode is aware of this, though. It should be, but Lua's code is written to have a very small memory footprint so that it will fit in an embedded system's instruction cache, so it doesn't tend to have a lot of exceptional-case handling.



In addition, it looks like x*=y costs 3 cycles now, whereas x=x*y costs just 2, so this affects all operators, not just ones that cost 1 cycle.
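
For anyone who wants to check the other operators themselves, here is a rough sketch that reuses the testme helper from the first post. It assumes that function is already defined in the same cart; the labels are just illustrative names, and I'm not quoting any numbers here since they depend on the PICO-8 version.

```lua
-- assumes testme() from the first post is defined earlier in the cart
-- each pair runs the plain form and the compound form with local args
testme("*",  function(x,y) x=x*y end, 1, 2)
testme("*=", function(x,y) x*=y end, 1, 2)
testme("-",  function(x,y) x=x-y end, 1, 2)
testme("-=", function(x,y) x-=y end, 1, 2)
testme("&",  function(x,y) x=x&y end, 1, 2)
testme("&=", function(x,y) x&=y end, 1, 2)
```

If both lines of a pair report the same lua= value, that operator is not affected on your version.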


this seems to only affect local vars -- I tested with my load #prof tool and confirmed it with this cart:

function _draw()
 local x,y=2,3
-- x,y=2,3
 for i=1,20000 do
--  x+=y
  x=x+y
 end
end
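
A variant of the same idea that measures both forms in one frame, so there is no need to comment lines in and out. This is only a sketch: it assumes stat(1) (CPU used so far this frame, as a fraction of the frame budget) is precise enough to show the gap over 20000 iterations, and the t0/t1/t2 names are mine.

```lua
-- sketch: compare x+=y against x=x+y on locals within a single frame
function _draw()
 cls()
 local x,y=2,3
 local t0=stat(1) -- cpu used before the first loop
 for i=1,20000 do
  x+=y
 end
 local t1=stat(1) -- cpu used after the compound-assignment loop
 for i=1,20000 do
  x=x+y
 end
 local t2=stat(1) -- cpu used after the plain-assignment loop
 print("x+=y  cpu: "..(t1-t0))
 print("x=x+y cpu: "..(t2-t1))
end
```

To repeat the test on globals, drop the `local` on the x,y line.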

