Clarification on Speed Improvement with `fused_window_process` and Its Necessity for Small-Scale Tasks #371

Fanqyu · 2024-10-16T09:56:58Z

Hi, thank you for your excellent work!

I have a question regarding `fused_window_process`. Now that the window process is implemented in the CUDA files, is the speed improvement significant? Could you share some quantitative data on the performance gains?

Additionally, for smaller-scale tasks, is the fused window process necessary, or would the default implementation based on `torch.roll` be sufficient?

Looking forward to your response!
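
For context, here is a minimal sketch of the unfused baseline I have in mind (a cyclic shift with `torch.roll` followed by a window partition), plus a small timing helper. The names `roll_then_partition` and `time_fn` and the tensor sizes are my own illustrative assumptions, not code from the repository, and the fused CUDA kernel is deliberately left out since it has to be built separately. The same helper could in principle time the fused call for a side-by-side comparison.

```python
import time

import torch


def window_partition(x, window_size):
    """Split (B, H, W, C) into (num_windows * B, window_size, window_size, C)."""
    B, H, W, C = x.shape
    x = x.view(B, H // window_size, window_size, W // window_size, window_size, C)
    return x.permute(0, 1, 3, 2, 4, 5).contiguous().view(-1, window_size, window_size, C)


def roll_then_partition(x, shift_size, window_size):
    """Unfused baseline: cyclic shift with torch.roll, then window partition."""
    shifted = torch.roll(x, shifts=(-shift_size, -shift_size), dims=(1, 2))
    return window_partition(shifted, window_size)


def time_fn(fn, *args, warmup=3, iters=10):
    """Average wall-clock seconds per call; synchronizes when CUDA tensors are used."""
    use_cuda = any(torch.is_tensor(a) and a.is_cuda for a in args)
    for _ in range(warmup):
        fn(*args)
    if use_cuda:
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    if use_cuda:
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters


# Illustrative sizes only (assumed, not taken from any particular config).
device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(8, 56, 56, 96, device=device)
windows = roll_then_partition(x, shift_size=3, window_size=7)
elapsed = time_fn(roll_then_partition, x, 3, 7)
print(f"roll + partition on {device}: {elapsed * 1e3:.3f} ms per call")
```

Timings from a sketch like this depend heavily on the GPU, batch size, and resolution, so numbers from the actual training setup would be more meaningful than anything measured in isolation.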
