Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework GPU #58

Open
wants to merge 14 commits into
base: master
Choose a base branch
from
Open

Rework GPU #58

wants to merge 14 commits into from

Conversation

SwissalpS
Copy link
Contributor

Refactor of GPU code to avoid code duplications and generaly cleaner code.
Removes some bugs and only changes behaviour slightly.

  • less aborts as sizes and coordinates are clamped
  • e.g. when copy/pasting sections, instead of aborting because of overflow, the copied area is adjusted to source and destination real estate.

Fixes #45
Hopefully also fixes #44

@TheEt1234

This comment was marked as spam.

This was referenced Apr 28, 2024
@SwissalpS SwissalpS added the needs testing Possible compatibility issues label Apr 28, 2024
@SwissalpS
Copy link
Contributor Author

Output from observer mooncontroller:

10000
01000
00100
00010
00001

1
1
1
1
1

1000000
1100000
0110000
0010000
0001000
0001100
0000110
0000010
0000001

1
1
1
1
1
1
1
1
1
1
1

1000
1100
0110
0010
0001

in observer mooncontroller I have this code:

local sET = event.type

if 'digiline' == sET then

  local sO

  mem.sBM = mem.sBM .. '\n'
  for y, t in ipairs(event.msg) do
    mem.sBM = mem.sBM .. '\n'
    for x, s in ipairs(t) do

      sO = 'f' == s:sub(1, 1) and '0' or '1'
      mem.sBM = mem.sBM .. sO

    end
  end

elseif 'terminal' == sET then

  print(mem.sBM)

elseif 'program' == sET then

  mem.sBM = ''

end

In master lua/moon-controller I have:

if 'program' == event.type then

  local d = digiline_send

  local tDB = {
    { 1, 1, 5, 5 },
    { 1, 1, 1, 5 },
    { 1, 1, 7, 9 },
    { 1, 1, 1, 11 },
    { 1, 1, 4, 5 },
  }

  local sI, iB
  for i, t in ipairs(tDB) do

    sI = tostring(i)
    iB = i - 1
    d('g', {
      {
        command = 'createbuffer',
        buffer = iB,
        xsize = t[3],
        ysize = t[4],
        fill = 'ffffff',
      },
      {
        command = 'drawline', --rect',
        buffer = iB,
        x2 = t[3],
        y2 = t[4],
        x1 = t[1],
        y1 = t[2],
        color = 'aa00aa'
        --edge = "332211",
        --fill = "010101",
        --antialias = true
      },
      {
        command = 'sendregion',
        buffer = iB,
        x2 = t[3],
        y2 = t[4],
        x1 = t[1],
        y1 = t[2],
        channel = sI,
      },
    })
    
  end
end

---- Original message ----
How does the new drawline implementation handle these 5 simple test cases:
1,1 to 5,5
1,1 to 1,5
1,1 to 7,9
1,1 to 1,11
1,1 to 4,5
... when those are also the corners of the buffer?

@TheEt1234
Copy link

From issue #44:

The original case that started the issue was fixed, so has the (65, 1, 1, 1) case when buffer size is 63x63

So the line bug is most likely patched

@SwissalpS SwissalpS marked this pull request as ready for review May 18, 2024 20:20
@SwissalpS
Copy link
Contributor Author

Thanks for testing.

It has been three weeks and nobody has asked for any other fixes, so I've marked this PR as ready for review/merge.

Copy link
Member

@BuckarooBanzay BuckarooBanzay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code changes look good but i haven't tested anything (i haven't used that part of the mod that much anyway :D)

@cheapie
Copy link

cheapie commented Sep 19, 2024

I tested this with a few of my more demanding programs and it seems to be working fine, but I didn't explicitly try to break it so I can't comment on the validation parts... not that the validation was much good in my original.

I think it might also be a little faster, although I've updated MT since the last time I did much performance testing - I was able to hit 56 FPS in my donut ad (32x32, copies about three 16x16 areas per frame) and 25 FPS in my wireframe 3D cube (64x64, 12 lines drawn per frame) when outputting to digiscreen, of course with Luacontroller overheating disabled.

local packeddata = ""
for y=1,buffer.ysize,1 do
for x=1,buffer.xsize,1 do
packeddata = packeddata..packpixel(buffer[y][x])
Copy link

@TheEt1234 TheEt1234 Oct 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think this would be laggy to do this with string concatination (though, remains to be tested)

edit: i put up the comment in the wrong (but still relevant) place, sorry

I suggest (pseudocode):

packeddata = {}
for each pixel:
    table.insert(packeddata, packpixel(pixel))
end

packeddata = table.concat(packeddata)
-- rest of the code can work as normal

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I doubt there would be much difference either way, not that I've observed a sendpacked command taking non-negligible time anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, the loops are so short I doubt a difference can be noticed. That's partly why I hadn't touched it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, I am considering to add the suggested change for "good practice" reasons. It might be faster without string concat but there will probably be more overhead constructing table and then looping it again to table.concat :D

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

packpixel() also uses string.concat. I don't want to go and change that too without actual proof that there is benifit. If we really want to optimize at that level, we'd better not call packpixel() in the loop but add that code directly in the loop.

Copy link

@TheEt1234 TheEt1234 Nov 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I'm assuming you ran that benchmark multiple times, because single runs of this type aren't sufficent)

I did originally test it only a single time, i wasn't aware i needed to test multiple times but i have now tested it multiple times:

(Everything is in seconds)
The repeated concatination (tested 8 times):
lowest - 0.000522
highest - 0.001384

The table concatination (tested 7 times):
highest - 0.000289
lowest - 5.4999999999999e-05

So i think its pretty clear that table.concat is faster than string concatination if you have like a lot of stuff you want to concat

Me explaining the tests

The table was testing it on was a 1D array with 1000 elements
i was basically testing if:

ret = {}
for i=1,1000 do
   ret[i] = t[i]
end
return table.concat(ret)

was faster than

ret = ""
for i=1,1000 do
   ret = ret .. t[i]
end
return ret

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. One slight difference:
your test does:

t[i] = x

while the actual code uses a function:

table.insert(t, x)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, table.concat is clearly faster for 1000 strings, even with the overhead for the table, creating and destroying one table is faster than creating and destroying all those intermediate strings.

It might be worth benchmarking the actual code though, because that should perform at most 255 string concatenations (max 64x64 image size), which might be a lot closer in performance, or even possibly slower.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the longest the old method took was half a millisecond with 4x the maximum data size it would usually handle and it's for a command this infrequently-used... I think you'd be hard-pressed to call any variant here "slow".

(random thought experiment, though: at what point does the discussion here end up using more CPU time than the change would actually ever save?)

Copy link
Contributor Author

@SwissalpS SwissalpS Nov 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some players build big. Arrays upon arrays of RGB-glass etc. ;)
But yes, we are splitting small hairs here.

@SwissalpS
Copy link
Contributor Author

Please re-test. I didn't have the time to do so.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs testing Possible compatibility issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2nd crash w/ gpu crash w/ gpu
5 participants