Performance Optimizations
Moved the final scaling and uint8 quantization to the GPU, reducing CPU load and main memory bandwidth consumption. This gives a 2.5x speed-up.
Instruct FFMPEG to use RGB frames instead of BGR, so there is no need to swap channels.
Batched inference (controlled by the --batch & --batches parameter; the default is 4).
Instruct torch to make tensors contiguous after the BCHW -> BHWC transform on GPU, so the buffer no longer needs to be copied before writing to FFMPEG. This reduced output IO time by 10x.
Use the NVENC pipeline, when available, to decode and encode the images when piping inputs.
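
The GPU-side quantization, the contiguous BHWC layout and the RGB rawvideo pipe described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names, the [0, 1] input range, the frame size and the output path are all assumptions made for the example, and the FFmpeg arguments are only built here, not executed.

```python
import torch


def frames_to_uint8_bhwc(batch: torch.Tensor) -> torch.Tensor:
    """Convert a float BCHW batch to a contiguous uint8 BHWC tensor.

    Hypothetical helper. Assumes pixel values are in [0, 1]. All arithmetic
    runs on the device that holds `batch`, so only uint8 data (a quarter of
    the float32 size) has to be transferred to host memory afterwards.
    """
    # Scale and quantize on the device; clamp guards against out-of-range values.
    quantized = (batch.clamp(0.0, 1.0) * 255.0).round().to(torch.uint8)
    # permute() only changes strides; contiguous() materializes the BHWC
    # layout on the device so the host does not have to re-pack the frames.
    return quantized.permute(0, 2, 3, 1).contiguous()


def ffmpeg_rawvideo_args(width, height, fps, out_path, use_nvenc=False):
    """Build an FFmpeg argument list that reads raw RGB frames from stdin.

    Hypothetical helper. Declaring the input as rgb24 is what removes the
    need for an RGB <-> BGR channel swap in Python.
    """
    args = [
        "ffmpeg", "-y",
        "-f", "rawvideo",
        "-pix_fmt", "rgb24",
        "-s", f"{width}x{height}",
        "-r", str(fps),
        "-i", "-",
    ]
    if use_nvenc:
        # h264_nvenc is FFmpeg's NVENC H.264 encoder; it needs a supported GPU.
        args += ["-c:v", "h264_nvenc"]
    return args + [out_path]


device = "cuda" if torch.cuda.is_available() else "cpu"
batch = torch.rand(4, 3, 8, 16, device=device)  # assumed batch of 4 small RGB frames
out = frames_to_uint8_bhwc(batch)
frames = out.cpu().numpy()  # C-contiguous array in frame order, ready to write
args = ffmpeg_rawvideo_args(16, 8, 30, "out.mp4", use_nvenc=True)
```

In a real pipeline the `frames` buffer would be written to the stdin of a process started with `args`. The batch size of 4 mirrors the stated default, and the code falls back to the CPU when no CUDA device is present.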