Optimize composite compute shader #90

Open
Plagman opened this issue Aug 31, 2020 · 4 comments

Comments

@Plagman
Member

Plagman commented Aug 31, 2020

Ideally we won't have to use the composite shader in the embedded case once we get layers going, but optimizing it wouldn't hurt in the meantime, and it would still help the nested case.

We'd need to:

  • Review the thread count and traversal model to make sure we're optimal with respect to waves and memory access.
  • Remove the branches that depend on constant buffer data. Probably make one pipeline per layer count, and bake swapChannels into the pipeline depending on the current output format. In theory, specialization constants can do all that?
  • Make sure we saturate memory bandwidth once we get there; look further if not.
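
A rough sketch of the "one pipeline per layer count" idea and of the dispatch sizing, modeled in plain Python rather than real Vulkan code. Everything named here (`MAX_LAYERS`, `pipeline_variants`, `select_variant`, `dispatch_groups`, the 8x8 workgroup size) is a hypothetical stand-in, not something taken from gamescope:

```python
from itertools import product

MAX_LAYERS = 4  # assumption for illustration; the thread does not state a limit


def pipeline_variants(max_layers=MAX_LAYERS):
    """Enumerate one variant per (layer_count, swap_channels) pair.

    In a real Vulkan backend each entry would hold a pipeline created with
    these values as specialization constants; here it is just a dict.
    """
    return {
        (layers, swap): {"layer_count": layers, "swap_channels": swap}
        for layers, swap in product(range(1, max_layers + 1), (False, True))
    }


def select_variant(variants, layer_count, swap_channels):
    """Pick the pre-built variant so the shader needs no runtime branch."""
    return variants[(layer_count, swap_channels)]


def dispatch_groups(width, height, local_x=8, local_y=8):
    """Workgroup counts that cover the output, rounding up on each axis."""
    return (-(-width // local_x), -(-height // local_y))
```

The point of the lookup is that the layer count and channel swap are fixed when the pipeline is built, so the compiler can drop the branches. The best workgroup shape would still need to be measured per GPU.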
@Plagman
Member Author

Plagman commented Aug 31, 2020

Oh, and given we're often doing 2 or 3 layers where the upper layers have smaller extents than the base, like when a cursor, notification, or both are overlaid: see if there's cute stuff we can do with checking layer extents to short-circuit the border sample in a way that ends up quicker. There are ways to only do that once per wave, I think.
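
One way to picture the once-per-wave check, as a CPU-side model only: treat each wave as covering a rectangular tile of pixels and classify that tile against the overlay's extents. Tiles that are fully outside could skip the overlay sample with a wave-uniform branch. The 8x8 tile shape, the `Rect` type, and both function names are assumptions made for this sketch:

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Half-open pixel rectangle: covers [x0, x1) by [y0, y1)."""
    x0: int
    y0: int
    x1: int
    y1: int


def classify_tile(tile, layer):
    """Return 'outside', 'inside' or 'partial' for a tile versus a layer."""
    if (tile.x1 <= layer.x0 or tile.x0 >= layer.x1
            or tile.y1 <= layer.y0 or tile.y0 >= layer.y1):
        return "outside"  # whole wave can skip the overlay sample
    if (tile.x0 >= layer.x0 and tile.x1 <= layer.x1
            and tile.y0 >= layer.y0 and tile.y1 <= layer.y1):
        return "inside"   # whole wave samples without any bounds check
    return "partial"      # only these tiles need per-pixel handling


def count_tiles(width, height, tile_w, tile_h, layer):
    """Count how many tiles of the output fall into each class."""
    counts = {"outside": 0, "inside": 0, "partial": 0}
    for ty in range(0, height, tile_h):
        for tx in range(0, width, tile_w):
            tile = Rect(tx, ty, min(tx + tile_w, width), min(ty + tile_h, height))
            counts[classify_tile(tile, layer)] += 1
    return counts
```

For a cursor-sized layer almost every tile lands in the "outside" class, which is the case the short-circuit would target. Whether the extra test pays off on real hardware is exactly what would need measuring.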

@Plagman
Member Author

Plagman commented Aug 31, 2020

How to test the small-extent overlay optimization: make sure it helps for a game with a cursor, like Factorio, but doesn't hurt when the Steam Big Picture overlay is fully up in a SteamOS session (gamescope -e -- steam -tenfoot -steamos).

Ideally the first case is almost as quick as compositing a single layer, while the second is no slower than fully compositing two layers without any extra branches.

@Plagman
Member Author

Plagman commented Sep 1, 2020

Other ideas to reduce bandwidth use: avoid sampling alpha for the base layer (it should not be needed for anything), avoid storing alpha for the target composite buffer.
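
A back-of-the-envelope model of the traffic involved, purely arithmetic. It assumes every layer is full-screen, every texel is read exactly once, and nothing is compressed, none of which is guaranteed on real hardware. The function names and the per-texel byte counts are illustrative assumptions:

```python
def composite_bytes_per_frame(width, height, layer_bytes_per_texel, output_bytes_per_pixel):
    """Bytes moved per frame: one read per layer texel plus one output write."""
    pixels = width * height
    return pixels * (sum(layer_bytes_per_texel) + output_bytes_per_pixel)


def bandwidth_gb_per_s(bytes_per_frame, fps):
    """Convert per-frame traffic to GB/s (1 GB = 1e9 bytes)."""
    return bytes_per_frame * fps / 1e9


# Two full-screen 4-byte layers and a 4-byte output at 1080p.
baseline = composite_bytes_per_frame(1920, 1080, [4, 4], 4)

# Hypothetical best case if alpha cost nothing for the base layer and output.
no_alpha = composite_bytes_per_frame(1920, 1080, [3, 4], 3)
```

Whether dropping alpha actually saves bytes depends on the image formats in use: a fetch from a 4-byte-per-texel format still pulls all four bytes, so the saving in `no_alpha` should be read as an upper bound rather than a prediction.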

@Plagman
Member Author

Plagman commented Sep 1, 2020

Also, seems like we might not need any extent optimization for small overlay layers like cursors/notifications. Need to confirm further, but it seems like compositing the Factorio cursor was quicker than a full overlay layer. Most of the sampling is done outside of the texture border, so it makes sense that this would not involve memory bandwidth.
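
To put a number on "most of the sampling is done outside of the texture border", here is a small sketch that computes what fraction of output pixels actually land inside an overlay. It assumes out-of-bounds samples are resolved without a memory fetch, which is the hypothesis above and not a confirmed hardware guarantee. The function name and the 64x64 cursor size are made up for the example:

```python
def in_bounds_fraction(out_w, out_h, layer_x, layer_y, layer_w, layer_h):
    """Fraction of output pixels that fall inside the layer's extents."""
    # Clip the layer rectangle to the output before measuring its area.
    x0 = max(layer_x, 0)
    y0 = max(layer_y, 0)
    x1 = min(layer_x + layer_w, out_w)
    y1 = min(layer_y + layer_h, out_h)
    covered = max(x1 - x0, 0) * max(y1 - y0, 0)
    return covered / (out_w * out_h)


# Hypothetical 64x64 cursor on a 1920x1080 output.
cursor_fraction = in_bounds_fraction(1920, 1080, 100, 100, 64, 64)

# Full-screen overlay for comparison.
overlay_fraction = in_bounds_fraction(1920, 1080, 0, 0, 1920, 1080)
```

Under that assumption the cursor layer touches well under one percent of the output, which would be consistent with it compositing faster than a full overlay.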

@Plagman Plagman closed this as completed Sep 1, 2020
@Plagman Plagman reopened this Sep 1, 2020