Call overhead compared to C bindings #126
Comments
Wild guess, but the overhead (or part of it) could be coming from the automatic conversion of the raw arguments into |
Interesting! I don't have an immediate answer but am not surprised that ocaml-rs has additional overhead. I would guess it's probably related to the fact that Edit: Didn't see @tizoc's response, but that seems most likely. Curious to see what you're testing! |
Here are the results that confirm my guess:
Here is the version without conversions:

    #[no_mangle]
    pub extern "C" fn caml_rust_three_int_unit_no_conversion_stubs(x: ocaml::Raw, y: ocaml::Raw, z: ocaml::Raw) -> ocaml::Raw {
        #[inline(always)]
        fn inner(gc: &mut ocaml::Runtime, _x: isize, _y: isize, _z: isize) -> isize {
            {
                let _ = &gc;
            };
            ocaml::sys::UNIT
        }
        {
            let gc = unsafe { ::ocaml::Runtime::recover_handle() };
            #[cfg(not(feature = "no-std"))] ::ocaml::inital_setup();
            {
                {
                    let res = inner(gc, x.0, y.0, z.0);
                    #[allow(unused_unsafe)]
                    let mut _gc_ = unsafe { ocaml::Runtime::recover_handle() };
                    ocaml::Raw(res)
                }
            }
        }
    }

There is still a bit of overhead but it looks like most of it comes from the conversion.
One more test, if I remove the call to
@zshipko for the conversion overhead, I guess the only real option is to let users opt out of that conversion? The initialization overhead is much smaller, but I guess it could be made configurable (so that the user has the option to perform it when launching the program).
Thanks for digging into that! It seems like allowing pre-initialization should be an easy feature to add, I will take a look at that when I get a moment. The conversion could likely be optimized in some places, but when performance is the most important concern I would suggest using the raw types from ocaml-sys and performing conversion as needed. |
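For readers following along, here is a minimal sketch of what "use the raw values and convert as needed" can look like for immediate integers. It relies only on OCaml's standard encoding of immediates (the integer `n` is stored as the word `2n + 1`, so `()` is the word `1`). The names `RawValue`, `is_immediate`, `int_of_raw`, `raw_of_int` and `add3_raw` are hypothetical helpers made up for this sketch; they are not part of the ocaml-rs or ocaml-sys API.

```rust
/// A raw OCaml value as it crosses the FFI boundary: one machine word.
/// Hypothetical local alias for this sketch, not an ocaml-sys type.
type RawValue = isize;

/// OCaml's `()` is the immediate integer 0, i.e. the tagged word 1.
const UNIT: RawValue = 1;

/// Counterpart of the C macro `Is_long`: immediates have the lowest bit set.
#[inline(always)]
fn is_immediate(v: RawValue) -> bool {
    v & 1 == 1
}

/// Counterpart of `Long_val`: drop the tag bit with an arithmetic shift.
#[inline(always)]
fn int_of_raw(v: RawValue) -> isize {
    v >> 1
}

/// Counterpart of `Val_long`: shift left and set the tag bit.
#[inline(always)]
fn raw_of_int(n: isize) -> RawValue {
    (n << 1) | 1
}

/// Hypothetical stub body: adds three tagged integers, decoding each one
/// with a single shift and without going through any conversion trait.
fn add3_raw(x: RawValue, y: RawValue, z: RawValue) -> RawValue {
    let sum = int_of_raw(x)
        .wrapping_add(int_of_raw(y))
        .wrapping_add(int_of_raw(z));
    raw_of_int(sum)
}
```

In a real stub these helpers would be applied to the word carried by each raw argument, so the decoding cost is one shift per argument that is actually used.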
Yes, accessing the OCaml values directly (without conversion) is exactly what I had to do in the past to keep overhead to a minimum (for example, accessing strings, arrays, etc. directly without converting, to avoid allocations), so my guess was informed by that. In |
Thanks to both of you for having a look and answering so quickly! I would like to understand how ocaml-rs works under the hood. I am trying to understand why I get this benchmark and this benchmark when using custom blocks. It should be around 10-20 ns for an addition. We already have the argument overhead we saw above, and there is certainly a better way to represent the value. When writing C, I have been doing something like this, i.e. storing the C value directly in the custom block: no heap allocation and no indirection when accessing the value in C (which saves a few ns). The problem can be summarized as follows: |
I recommend using cargo-expand to expand macros into code and you can go from there.
It shouldn't be more. Floats can be |
I have been experimenting to measure the actual cost of using ocaml-rs.
When writing C bindings, I am used to seeing a 4-6 ns overhead due to root registration, independent of the number of parameters.
With ocaml-rs, it seems different. See https://github.com/dannywillems/ocaml-rust-experimentation.
It seems the number of arguments plays a role in the overhead. Could someone explain why?