Performance of Wasm tail calls is lacking on `aarch64` compared to `x86` #9690

Robbepop · 2024-11-28T12:34:39Z

In Wasmi's benchmark suite I have the following Wasm test case:

(module
    (func $fib (param $N i64) (param $a i64) (param $b i64) (result i64)
        (if (i64.eqz (local.get $N))
            (then
                (return (local.get $a))
            )
        )
        (if (i64.eq (local.get $N) (i64.const 1))
            (then
                (return (local.get $b))
            )
        )
        (return_call $fib
            (i64.sub (local.get $N) (i64.const 1))
            (local.get $b)
            (i64.add (local.get $a) (local.get $b))
        )
    )

    (func (export "run") (param $N i64) (result i64)
        (return_call $fib (local.get $N) (i64.const 0) (i64.const 1))
    )
)

It is a simple Fibonacci routine based on Wasm's `return_call` tail calls.
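For readers less familiar with the text format, here is a rough Rust sketch of the loop that the tail-recursive `$fib` should be equivalent to. The names `fib_tailrec` and `run` are only illustrative and are not taken from the Wasmi benchmark suite. Wrapping arithmetic is an assumption made to mirror Wasm's `i64.add` and `i64.sub`, which wrap on overflow; with an input of 1000000 the result overflows `i64` long before the loop ends.

```rust
/// Loop form of the tail-recursive `$fib` from the Wasm module above.
/// Each iteration corresponds to one `return_call $fib`.
fn fib_tailrec(mut n: i64, mut a: i64, mut b: i64) -> i64 {
    loop {
        // (if (i64.eqz $N) (then (return $a)))
        if n == 0 {
            return a;
        }
        // (if (i64.eq $N 1) (then (return $b)))
        if n == 1 {
            return b;
        }
        // return_call $fib (N - 1) b (a + b), with Wasm's wrapping semantics
        let next = a.wrapping_add(b);
        n = n.wrapping_sub(1);
        a = b;
        b = next;
    }
}

/// Counterpart of the exported `run` function.
fn run(n: i64) -> i64 {
    fib_tailrec(n, 0, 1)
}
```

With this sketch, `run(10)` yields 55. An engine that implements tail calls well should get close to this kind of tight loop for the benchmark.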

When I ran those benchmarks on my MacBook M2 Pro, I saw that Wasmi is usually roughly 10-15x slower than Wasmtime on aarch64. However, for this particular test case it is only ~4x slower than Wasmtime. Back then I found this suspicious, which is why I didn't mention it in the article I wrote about Wasmi.

After a short discussion with @alexcrichton, he told me to open an issue, since this kind of performance gap is considered a bug by the Wasmtime maintainers.

Feel free to clone the Wasmi benchmarks and test them on your own hardware. Unfortunately I only have a MacBook M2 Pro and nothing else, so I cannot rerun those benchmarks on different hardware for this issue.

Benchmarks from my machine:

execute/fib.tailrec/wasmi-old/1000000
                        time:   [22.361 ms 22.367 ms 22.381 ms]
execute/fib.tailrec/wasmi-new.eager.checked/1000000
                        time:   [15.106 ms 15.123 ms 15.144 ms]
execute/fib.tailrec/wasmi-new.lazy.checked/1000000
                        time:   [15.062 ms 15.081 ms 15.102 ms]
execute/fib.tailrec/wasmtime.cranelift/1000000
                        time:   [4.0465 ms 4.0740 ms 4.1016 ms]

alexcrichton · 2024-12-03T00:11:33Z

Inspecting the disassemblies, nothing looks awry to me. The x64 and aarch64 outputs are basically 1:1 here. My guess is that the differences in timing are probably CPU-specific. I'm going to close this because I think it's as expected from the Wasmtime side at least, but thanks for opening this, as it's still good to investigate!

Robbepop added the `bug` label (Incorrect behavior in the current implementation that needs fixing) Nov 28, 2024

alexcrichton closed this as completed Dec 3, 2024
