
-p N, N > 1 seems absurdly slow? #14

Open · tommythorn opened this issue Dec 15, 2020 · 9 comments
@tommythorn

Thanks for this brilliant tool which is exactly what I wanted. I followed the helpful guide #2 (comment)
to get a Ubuntu VM which is very fast in single user mode (no -p option), but as soon as I enable more than one core, performance is very very slow.

Is this a known issue?

tommythorn (Author) commented Dec 15, 2020

Very funny, that's not the issue (I'm allocating 6 GB and it reproduces with just -p2). So nobody else sees this? That's very odd. I can reproduce it trivially.

Damenly commented Dec 15, 2020

> Very funny, that's not the issue (I'm allocating 6 GB and it reproduces with just -p2). So nobody else sees this? That's very odd. I can reproduce it trivially.

+1. Running with 8 CPUs and 8 GB RAM it is very slow (disk I/O?). Reducing it to -p1 makes it work smoothly.

cjdell commented Dec 19, 2020

Noticing this too. With 1 CPU it is fine. You can easily see the problem when pinging the NAT gateway (in my case this is 192.168.64.1).

With a single CPU the ping time is < 1ms, but with multiple CPUs it can often be > 100ms. It's not disk bound or CPU bound; it feels more like a synchronisation issue. The problem appears to be fundamental to Apple's Virtualization framework, as this phenomenon also shows up in the "SimpleVM" project.
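
If you want to reproduce the check, just ping the gateway from inside the guest and compare the round-trip times with -p1 against -p2 or more (192.168.64.1 is the gateway in my setup; yours may differ):

# RTT stays well under 1ms with a single VCPU, but often exceeds
# 100ms once more than one VCPU is configured
ping -c 10 192.168.64.1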

evansm7 (Owner) commented Dec 22, 2020

Been discussing this on Twitter – it looks like Virtualization.framework is dropping interrupts that aren't directed at VCPU 0. I can recreate it by manually changing the IRQ affinity via /proc; as an example, IRQ 6 was my virtio-net interrupt and I lose the network when I direct it at VCPU 1 instead of "any":

echo 1 > /proc/irq/6/smp_affinity_list

My basic Debian installation doesn't have irqbalance (or an equivalent), so all IRQs remain steered at VCPU 0 – but other distros appear to install it by default.

Someone said they had problems with the Ubuntu cloud image even without irqbalance, which I have yet to look into. Maybe it has a similar userspace utility, maybe the kernel now has some spicy redirection.

It isn’t a vftool/SimpleVM bug, but a workaround is needed. Feels like a distro-specific tips & tricks discussion?
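
For anyone wanting to check their own guest, the affinity state lives under /proc/irq (IRQ 6 below is just the virtio-net example from my Debian guest; the number will differ elsewhere):

# show which IRQ line each virtio device is using, with per-CPU delivery counts
grep virtio /proc/interrupts

# show which VCPUs IRQ 6 may currently be delivered to
cat /proc/irq/6/smp_affinity_list

# steer it back at VCPU 0, the only one Virtualization.framework appears to deliver to
echo 0 > /proc/irq/6/smp_affinity_list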

tommythorn (Author) commented Dec 22, 2020

That seems like an ... odd choice by Apple. I ran apt remove irqbalance followed by

for f in /proc/irq/*/smp_affinity_list; do echo 0 > $f; done

and it looks like it made things way better. Curiously enough, /proc/irq/1/smp_affinity_list through /proc/irq/4/smp_affinity_list can't be written and stay at 0-5 (I ran with -p6, hence the "5"), but it appears to be significantly better than before. Thanks!

I would close, but it seems worthwhile to mention this in the documentation before closing.
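
For reference, the same loop with the expected write errors silenced (some of the /proc/irq/N/smp_affinity_list files refuse the write, as noted above); run as root:

for f in /proc/irq/*/smp_affinity_list; do
    # pin each IRQ that will accept it at VCPU 0; ignore the few files that reject the write
    echo 0 > "$f" 2>/dev/null || true
done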

@seanjensengrey

Thanks @tommythorn! I think removing irqbalance is enough (edit: it isn't, see below). I am seeing guest compilation timings for Rust that are on par with the M1 host. Previously, using a VM launched with -p 4, cargo took over 7 minutes just to print that it had started compiling the first crate.

irqbalance discussion below.

Rust compilation timings

time cargo install --force ripgrep

Ubuntu guest with smp_affinity_list changes applied

Launched with -p 4

   Finished release [optimized + debuginfo] target(s) in 31.61s
   Replacing /home/test/.cargo/bin/rg
    Replaced package `ripgrep v12.1.1` with `ripgrep v12.1.1` (executable `rg`)

real    0m31.819s
user    1m55.319s
sys     0m2.653s

Apple M1 host

   Finished release [optimized + debuginfo] target(s) in 27.25s
   Replacing /Users/seanj/.cargo/bin/rg
    Replaced package `ripgrep v12.1.1` with `ripgrep v12.1.1` (executable `rg`)

real	0m27.389s
user	2m39.468s
sys	0m8.593s

With irqbalance installed

root@ubuntu:~# cat /proc/irq/*/smp_affinity_list
0-3
0-3
0-3
0-3
3
0-3
1
0-3

Two minutes in, and cargo hasn't even finished updating the crate index.

time cargo install --force ripgrep
    Updating crates.io index
^C

real    1m57.985s
user    0m0.166s
sys     0m0.038s

After allowing all cores to handle all interrupts

root@ubuntu:~# cat reset_affinity.sh 
#!/bin/bash

# set -eux;

for f in /proc/irq/*/smp_affinity_list;
        do echo "0-3" > $f;
done
root@ubuntu:~# ./reset_affinity.sh
./reset_affinity.sh: line 6: echo: write error: Input/output error
./reset_affinity.sh: line 6: echo: write error: Input/output error
./reset_affinity.sh: line 6: echo: write error: Input/output error
./reset_affinity.sh: line 6: echo: write error: Input/output error
root@ubuntu:~# cat /proc/irq/*/smp_affinity_list
0-3
0-3
0-3
0-3
0-3
0-3
0-3
0-3

We see compilation times back to normal.

   Finished release [optimized + debuginfo] target(s) in 31.73s
   Replacing /home/test/.cargo/bin/rg
    Replaced package `ripgrep v12.1.1` with `ripgrep v12.1.1` (executable `rg`)

real    0m31.931s
user    1m55.523s
sys     0m2.975s

But! It isn't the configuration, it's the act of writing to smp_affinity_list. Clearing and resetting the IRQs to the slowest observed settings still results in a compile in the 32-45 second range.

With a reconfigured affinity list of

root@ubuntu:~# cat /proc/irq/*/smp_affinity_list
0-3
0-3
0-3
0-3
3
0
1
0

I was still able to get a 35s compile.

    Finished release [optimized + debuginfo] target(s) in 34.36s
   Replacing /home/test/.cargo/bin/rg
    Replaced package `ripgrep v12.1.1` with `ripgrep v12.1.1` (executable `rg`)

real    0m35.939s
user    2m0.677s
sys     0m2.327s

The worst configuration I could come up with apart from a fresh boot is

root@ubuntu:~# cat /proc/irq/*/smp_affinity_list
0-3
0-3
0-3
0-3
3
3
3
3

And while the console lags a bunch, we still see

    Finished release [optimized + debuginfo] target(s) in 34.81s
   Replacing /home/test/.cargo/bin/rg
    Replaced package `ripgrep v12.1.1` with `ripgrep v12.1.1` (executable `rg`)

real    0m43.805s
user    1m55.938s
sys     0m1.976s

It looks like both things are needed: irqbalance has to be removed, and the smp_affinity_list files have to be written to, preferably with low-numbered CPUs.

root@ubuntu:~# cat reset_affinity.sh 
#!/bin/bash

# show the affinities before, pin every IRQ that accepts it to CPU 0, show them after
cat /proc/irq/*/smp_affinity_list

for f in /proc/irq/*/smp_affinity_list; do
        echo "0" > $f
done

cat /proc/irq/*/smp_affinity_list
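
For completeness, the removal half of the fix, assuming an apt/systemd guest like this Ubuntu one:

# check whether irqbalance is present and running, then remove it so it
# can't re-spread the IRQ affinities on the next boot
systemctl status irqbalance
apt remove irqbalance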

@seanjensengrey

BTW, when running a guest with -p 8 I am seeing nearly identical Rust compilation performance:

../vftool/build/vftool -k vmlinux -i initrd -d a_disk1.img -m 2048 -p 8 -a "console=hvc0 root=/dev/vda"

    Finished release [optimized + debuginfo] target(s) in 25.59s
   Replacing /home/test/.cargo/bin/rg
    Replaced package `ripgrep v12.1.1` with `ripgrep v12.1.1` (executable `rg`)

real    0m25.815s
user    2m59.086s
sys     0m4.982s

M1 host


    Finished release [optimized + debuginfo] target(s) in 26.39s
   Replacing /Users/seanj/.cargo/bin/rg
    Replaced package `ripgrep v12.1.1` with `ripgrep v12.1.1` (executable `rg`)

real	0m26.523s
user	2m38.673s
sys	0m9.794s

gyf304 commented Jan 11, 2021

You can also fix it by adding irqaffinity=0 to the kernel cmdline. irqfixup also seems to work.

https://www.kernel.org/doc/html/v4.14/admin-guide/kernel-parameters.html#:~:text=irqfixup

Before:

ubuntu@ubuntu:~$ sudo hdparm -Tt /dev/vda

/dev/vda:
 Timing cached reads:   17538 MB in  2.00 seconds = 8777.34 MB/sec
 HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device
 Timing buffered disk reads:  12 MB in  3.27 seconds =   3.67 MB/sec

After:

ubuntu@ubuntu:~$ sudo hdparm -Tt /dev/vda

/dev/vda:
 Timing cached reads:   45040 MB in  2.00 seconds = 22574.22 MB/sec
 HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device
 Timing buffered disk reads: 2238 MB in  3.00 seconds = 745.68 MB/sec

Edit: you will still need to remove irqbalance from your system.
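
With vftool that just means appending the parameter to the -a string, e.g. reusing the invocation from earlier in the thread (image names are from that example):

# boot with all IRQs pinned to VCPU 0 from the start, so nothing needs
# to be rewritten inside the guest afterwards
../vftool/build/vftool -k vmlinux -i initrd -d a_disk1.img -m 2048 -p 8 -a "console=hvc0 root=/dev/vda irqaffinity=0"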

jasmas commented Jan 16, 2021

irqaffinity=0 would probably be the preferred method. This should probably just be documented, in the same way as the console option is.
