Background: I use zstd to decompress "block compressed images" (BCn) which have additionally been compressed with zstd before being written to disk, yielding a 33% size reduction. I have 331 images, all 2048x2048 pixels in size and exactly 4MiB when decompressed, each compressed individually without a dictionary. Some have high variance and some very regular patterns. When I run my application, I first load the entire binary blob containing all the images from file into memory, then decompress each one from a `&[u8]` slice of that buffer into a fixed-size `&mut [u8]` slice allocated beforehand, so neither IO nor allocation should affect the results. For profiling I'm using the `profiling` crate with the `puffin` backend (everything in release mode, of course) and `puffin_http` to send the profiling results to the external `puffin_viewer`. Tested on a 6900HS.
Using `bulk::decompress_to_buffer` on a single thread takes about 52.8s in total, of which 51.2s are taken up by this method:
```rust
#[profiling::function]
fn decode_bcn_zstd_into(&self, src: &[u8], dst: &mut [u8]) -> io::Result<()> {
    let written = zstd::bulk::decompress_to_buffer(src, dst)?;
    assert_eq!(written, dst.len(), "all bytes written");
    Ok(())
}
```
But if I switch to `stream::copy_decode` it only takes 2.9s, of which 1.3s are spent on decompression:
```rust
#[profiling::function]
fn decode_bcn_zstd_into(&self, src: &[u8], mut dst: &mut [u8]) -> io::Result<()> {
    zstd::stream::copy_decode(src, &mut dst)?;
    assert_eq!(0, dst.len(), "all bytes written");
    Ok(())
}
```
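(Aside: the `assert_eq!(0, dst.len())` check above works because the standard library's `Write` impl for `&mut [u8]` re-points the slice past the bytes written, so an empty slice means the buffer was filled exactly. A stdlib-only sketch of that behavior:)

```rust
use std::io::Write;

fn main() {
    let mut buf = [0u8; 4];
    // Re-borrow the buffer as `&mut [u8]`; the `Write` impl for `&mut [u8]`
    // shrinks the slice from the front as bytes are written into it.
    let mut dst: &mut [u8] = &mut buf;
    dst.write_all(&[1, 2, 3]).unwrap();
    assert_eq!(1, dst.len()); // 3 of 4 bytes consumed
    dst.write_all(&[4]).unwrap();
    assert_eq!(0, dst.len()); // fully consumed: the whole buffer was written
    assert_eq!([1, 2, 3, 4], buf);
}
```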
Just looking at total time spent decompressing, that's a 39.3x speedup! I would honestly have expected the bulk API to be faster in this case, as it's specifically made for slices with all the data already present in memory. Any idea what could cause the speed difference?
Firestar99 changed the title from `bulk::decompress is 40 times slower than stream::copy_decode` to `bulk::decompress_to_buffer is 40 times slower than stream::copy_decode` on Jul 14, 2024
This is indeed quite surprising!
Note that the bulk API is intended to re-use a `Compressor` (or `Decompressor`) between calls; the module-level methods create a (De)compressor every time. Though `zstd::stream::copy_decode` also creates a new context on every call, so it shouldn't be that different...