Reorder bindings #141

Open
FL33TW00D opened this issue Mar 27, 2024 · 3 comments
Labels
performance brrrrrrrrr

Comments

@FL33TW00D
Collaborator

@group(0) @binding(0)
var<storage, read> X: array<f32>;

@group(0) @binding(1)
var<storage, read> W: array<f32>;

@group(0) @binding(2)
var<storage, read> B: array<f32>;

@group(0) @binding(3)
var<storage, read_write> Y: array<f32>;

Our kernel preamble currently looks like the above. Seems fine, right? WRONG!
In reality, you want your bindings ordered in ascending order of how frequently they change.

What does that mean?
Your read_write binding should be @group(0) @binding(0), because every operation has a single output.
A simple but boring change to make throughout the codebase.
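
For illustration, here is a sketch of how the preamble above might look after the reorder. It assumes the output simply moves to binding 0 and the inputs shift up by one; the names X, W, B and Y are carried over unchanged, and nothing here has been measured.

```wgsl
// Output first: every operation has exactly one read_write binding.
@group(0) @binding(0)
var<storage, read_write> Y: array<f32>;

// Inputs follow; their count varies from operation to operation.
@group(0) @binding(1)
var<storage, read> X: array<f32>;

@group(0) @binding(2)
var<storage, read> W: array<f32>;

@group(0) @binding(3)
var<storage, read> B: array<f32>;
```

The host-side bind group layout and bind group entries would need to be reordered to match, since binding indices are part of the pipeline layout.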

@FL33TW00D FL33TW00D added the performance brrrrrrrrr label Apr 4, 2024
@AmineDiro

Hello! First, I want to thank you for this amazing project. I would love to contribute!

I have been looking for this kind of library for a while to write a GPU-powered wasm semantic search module (docvec).

As I understand it, to ensure the best performance across the board, you would prefer to have @group(0) contain the values that change least frequently, with each subsequent @group containing data that changes at progressively higher frequencies. This logic applies at the group level; does it also apply to BindGroupEntries?

@FL33TW00D
Collaborator Author

FL33TW00D commented Apr 14, 2024

Hi @AmineDiro,
Great to hear that you want to contribute, and docvec already looks very useful!

This issue was quickly written in response to reading the following: https://toji.dev/webgpu-best-practices/bind-groups#grouping-resources-based-on-frequency-of-change

As I understand it, we want to group resources based on frequency of change.
Every(?) operation that runs on the GPU has 1 output, whereas the number of inputs is variadic. Therefore, if we reorder the bindings and have the read_write binding in its own group, then we can avoid changing it!
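
A rough sketch of that idea, with the output in its own group and the variadic inputs in a second group. The group numbering and the `main` entry point (a hypothetical fused multiply-add kernel with an assumed workgroup size of 64) are illustrative assumptions, not the project's actual layout.

```wgsl
// Group 0: the single output, shared by every operation.
@group(0) @binding(0)
var<storage, read_write> Y: array<f32>;

// Group 1: the inputs, whose number differs per operation.
@group(1) @binding(0)
var<storage, read> X: array<f32>;

@group(1) @binding(1)
var<storage, read> W: array<f32>;

@group(1) @binding(2)
var<storage, read> B: array<f32>;

// Hypothetical kernel body, only here to make the module complete.
@compute @workgroup_size(64)
fn main(@builtin(global_invocation_id) gid: vec3<u32>) {
    let i = gid.x;
    // Guard against invocations past the end of the output buffer.
    if (i < arrayLength(&Y)) {
        Y[i] = X[i] * W[i] + B[i];
    }
}
```

With this split, the layout of group 0 is identical for every kernel, so only group 1 varies with the number of inputs. Whether that translates into fewer bind group changes in practice depends on how often the output buffer itself is swapped between dispatches.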

I'm unsure if it will have a meaningful impact on performance; more reading needs to be done. I try to open these issues when ideas arise, so it may not be a fruitful avenue.

Join the discord if you want to contribute/discuss more: https://discord.gg/XFe33KQTG4

@AmineDiro

Thanks for the prompt response! I'll take a look at this and jump on the discord 👍🏼
