Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make the buffer size configurable? #300

Open
ksolana opened this issue Aug 30, 2024 · 3 comments · May be fixed by #307
Open

Make the buffer size configurable? #300

ksolana opened this issue Aug 30, 2024 · 3 comments · May be fixed by #307

Comments

@ksolana
Copy link

ksolana commented Aug 30, 2024

The writer::new function has hardcoded buffer value.

    /// Creates a new `Writer`.
    ///
    /// All output from the given operation will be forwarded to `writer`.
    pub fn new(writer: W, operation: D) -> Self {
        Writer {
            writer,
            operation,

            offset: 0,
            // 32KB buffer? That's what flate2 uses
            buffer: Vec::with_capacity(32 * 1024),

            finished: false,
            finished_frame: false,
        }
    }

Can we make the buffer size configurable? I'd like to experiment with different buffer sizes and see if that helps with compression ratio/speed.

@ksolana
Copy link
Author

ksolana commented Aug 30, 2024

Related hail-is/hail#14033

@gyscos
Copy link
Owner

gyscos commented Sep 3, 2024

Hi, and thanks for the request!

Indeed, it would likely make sense to expose a way to set a custom buffer size. I'm surprised by the 12% saving reported in that hail ticket though. I think they changed the internal block size, not just the output buffer size. This buffer is not going to affect compression ratio, only potentially reduce the number of times things need to be copied between zstd's own internal buffer, this output buffer, and the actual writer destination.

@ksolana
Copy link
Author

ksolana commented Sep 3, 2024

This buffer is not going to affect compression ratio, only potentially reduce the number of times things need to be copied between zstd's own internal buffer, this output buffer, and the actual writer destination.

Yeah that sounds reasonable. Because speed is desired for certain workloads e.g., anza-xyz/agave#2729, it'll be great to expose the buffer size option to the user.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants