Background: I use zstd to decompress "block compressed images" (BCn) which have additionally been compressed with zstd before being written to disk, yielding a 33% size reduction. I have 331 images, all 2048x2048 pixels in size and exactly 4MiB when decompressed, each compressed individually without a dictionary. Some have high variance and some very regular patterns. When I run my application, I first load the entire binary blob containing all the images from file into memory, then decompress each one from a `&[u8]` slice of that buffer into a fixed-size `&mut [u8]` slice allocated beforehand, so neither IO nor allocation should affect the results. For profiling I'm using the `profiling` crate with the `puffin` backend (everything in release mode, of course) and `puffin_http` to send the profiling results to the external `puffin_viewer`. Tested on a 6900HS.
Using `bulk::decompress_to_buffer` on a single thread takes about 52.8s in total, of which 51.2s are taken up by this method:
```rust
#[profiling::function]
fn decode_bcn_zstd_into(&self, src: &[u8], dst: &mut [u8]) -> io::Result<()> {
    let written = zstd::bulk::decompress_to_buffer(src, dst)?;
    assert_eq!(written, dst.len(), "all bytes written");
    Ok(())
}
```
But if I switch to `stream::copy_decode` it only takes 2.9s, of which 1.3s are spent on decompression:
```rust
#[profiling::function]
fn decode_bcn_zstd_into(&self, src: &[u8], mut dst: &mut [u8]) -> io::Result<()> {
    zstd::stream::copy_decode(src, &mut dst)?;
    assert_eq!(0, dst.len(), "all bytes written");
    Ok(())
}
```
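(Aside: the `assert_eq!(0, dst.len())` check above works because the standard library's `Write` impl for `&mut [u8]` re-points the slice past the bytes written, so an empty slice means the buffer was filled exactly. A stdlib-only sketch of that behavior:)

```rust
use std::io::Write;

fn main() {
    let mut buf = [0u8; 4];
    // Re-borrow the buffer as `&mut [u8]`; the `Write` impl for `&mut [u8]`
    // shrinks the slice from the front as bytes are written into it.
    let mut dst: &mut [u8] = &mut buf;
    dst.write_all(&[1, 2, 3]).unwrap();
    assert_eq!(1, dst.len()); // 3 of 4 bytes consumed
    dst.write_all(&[4]).unwrap();
    assert_eq!(0, dst.len()); // fully consumed: the whole buffer was written
    assert_eq!([1, 2, 3, 4], buf);
}
```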
Just looking at total time spent decompressing, that's a 39.3x speedup! I would honestly have expected the bulk API to be faster in this case, as it's specifically made for slices with all the data already present in memory. Any idea what could cause the speed difference?
Firestar99 changed the title from `bulk::decompress is 40 times slower than stream::copy_decode` to `bulk::decompress_to_buffer is 40 times slower than stream::copy_decode` on Jul 14, 2024
This is indeed quite surprising!
Note that the bulk API is intended to re-use a `Compressor` (or `Decompressor`) between calls; the module-level methods create a (De)compressor every time. Though `zstd::stream::copy_decode` also creates a new context on every call, so it shouldn't be that different...