Hi, thank you for your excellent work!

I have a question regarding `fused_window_process`. With the integration of the window process in the CUDA files, is the speed improvement significant? Could you provide some quantitative data to illustrate the performance gains?

Additionally, for tasks of a smaller scale, is it necessary to utilize the window process, or would it be better to use a default implementation of `torch.roll`?

Looking forward to your response!
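For context, here is a minimal sketch of the unfused baseline I would compare against: a cyclic shift with `torch.roll` followed by a standard window partition, timed with a small helper. The tensor shape, shift size, window size, and the `time_fn` helper are my own illustrative assumptions, not values from the repository. The fused kernel is not called here, since I am unsure of its exact call signature. Any callable that takes the same arguments could be passed to `time_fn` for a side-by-side number.

```python
import time
import torch


def window_partition(x, window_size):
    """Split (B, H, W, C) into (num_windows * B, window_size, window_size, C)."""
    B, H, W, C = x.shape
    x = x.view(B, H // window_size, window_size, W // window_size, window_size, C)
    return x.permute(0, 1, 3, 2, 4, 5).contiguous().view(-1, window_size, window_size, C)


def roll_then_partition(x, shift_size, window_size):
    """Unfused baseline: cyclic shift with torch.roll, then window partition."""
    shifted = torch.roll(x, shifts=(-shift_size, -shift_size), dims=(1, 2))
    return window_partition(shifted, window_size)


def time_fn(fn, *args, warmup=5, iters=20):
    """Hypothetical helper: average wall-clock seconds per call.

    Synchronizes around the timed loop when the first argument lives on a GPU,
    because CUDA kernels launch asynchronously.
    """
    on_gpu = args[0].is_cuda
    for _ in range(warmup):
        fn(*args)
    if on_gpu:
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    if on_gpu:
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters


device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(8, 56, 56, 96, device=device)  # illustrative shape, an assumption
baseline_s = time_fn(roll_then_partition, x, 3, 7)
print(f"torch.roll + partition: {baseline_s * 1e3:.3f} ms per call on {device}")
```

Running the same measurement at a few batch sizes and resolutions would show whether the fused path still matters for smaller workloads.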