Performance of Wasm tail calls is lacking on aarch64 compared to x86 #9690

Closed
Robbepop opened this issue Nov 28, 2024 · 1 comment
Labels
bug Incorrect behavior in the current implementation that needs fixing

Robbepop commented Nov 28, 2024

In Wasmi's benchmark suite I have the following Wasm test case:

cc @alexcrichton

(module
    (func $fib (param $N i64) (param $a i64) (param $b i64) (result i64)
        (if (i64.eqz (local.get $N))
            (then
                (return (local.get $a))
            )
        )
        (if (i64.eq (local.get $N) (i64.const 1))
            (then
                (return (local.get $b))
            )
        )
        (return_call $fib
            (i64.sub (local.get $N) (i64.const 1))
            (local.get $b)
            (i64.add (local.get $a) (local.get $b))
        )
    )

    (func (export "run") (param $N i64) (result i64)
        (return_call $fib (local.get $N) (i64.const 0) (i64.const 1))
    )
)

It is a simple Fibonacci routine based on Wasm's return_call tail calls.
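For readers less familiar with the Wasm text format, here is a minimal Rust sketch of the same computation. Because `return_call` reuses the caller's frame, the recursion behaves like a loop over the three parameters. The names `fib_tailrec` and `run` are illustrative only, and the wrapping arithmetic is chosen to mirror `i64.add` and `i64.sub`; this is not code taken from Wasmi or Wasmtime.

```rust
/// Loop form of the tail-recursive `$fib` function from the Wasm module.
/// Each iteration corresponds to one `return_call $fib`.
fn fib_tailrec(mut n: i64, mut a: i64, mut b: i64) -> i64 {
    loop {
        if n == 0 {
            return a;
        }
        if n == 1 {
            return b;
        }
        // Wasm integer arithmetic wraps on overflow, so wrapping ops are used here.
        let next = a.wrapping_add(b);
        n = n.wrapping_sub(1);
        a = b;
        b = next;
    }
}

/// Counterpart of the exported "run" function: starts with a = 0, b = 1.
fn run(n: i64) -> i64 {
    fib_tailrec(n, 0, 1)
}

fn main() {
    println!("run(10) = {}", run(10));
}
```

With the benchmark input of 1000000 the result overflows `i64` many times over, so the benchmark exercises call overhead rather than producing a meaningful Fibonacci number.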

When I ran those benchmarks on my MacBook M2 Pro, I saw that Wasmi is usually roughly 10-15x slower than Wasmtime on aarch64. For this particular test case, however, it is only ~4x slower. Back then I found this suspicious, which is why I didn't mention it in the article I wrote about Wasmi.

After a short discussion with @alexcrichton, he told me to open an issue, since this kind of performance gap is considered a bug by the Wasmtime maintainers.

Feel free to clone the Wasmi benchmarks and test this on your own hardware. Unfortunately I only have a MacBook M2 Pro, so I cannot rerun the benchmarks on other hardware for this issue.

Benchmarks from my machine:

execute/fib.tailrec/wasmi-old/1000000
                        time:   [22.361 ms 22.367 ms 22.381 ms]
execute/fib.tailrec/wasmi-new.eager.checked/1000000
                        time:   [15.106 ms 15.123 ms 15.144 ms]
execute/fib.tailrec/wasmi-new.lazy.checked/1000000
                        time:   [15.062 ms 15.081 ms 15.102 ms]
execute/fib.tailrec/wasmtime.cranelift/1000000
                        time:   [4.0465 ms 4.0740 ms 4.1016 ms]
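As a quick sanity check of the "~4x" figure, the slowdown factors can be computed from the middle (point estimate) values of the timings above. This is a small sketch; the helper name `slowdown` is made up for illustration.

```rust
/// Ratio of an engine's time to the baseline's time (both in milliseconds).
fn slowdown(engine_ms: f64, baseline_ms: f64) -> f64 {
    engine_ms / baseline_ms
}

fn main() {
    // Point estimates copied from the benchmark output above.
    let wasmtime_ms = 4.0740;
    let results = [
        ("wasmi-old", 22.367),
        ("wasmi-new.eager.checked", 15.123),
        ("wasmi-new.lazy.checked", 15.081),
    ];
    for (name, ms) in results.iter() {
        println!("{}: {:.1}x slower than wasmtime.cranelift", name, slowdown(*ms, wasmtime_ms));
    }
}
```

The new Wasmi variants come out at roughly 3.7x, and the old one at roughly 5.5x.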
@Robbepop Robbepop added the bug Incorrect behavior in the current implementation that needs fixing label Nov 28, 2024
@alexcrichton

Inspecting the disassemblies, nothing looks awry to me. The x64 and aarch64 outputs are basically 1:1 here. My guess is that the differences in timing are probably CPU-specific. I'm going to close this because I think it's as expected from the Wasmtime side at least, but thanks for opening this as it's still good to investigate!
