
-p N, N > 1 seems absurdly slow? #14

Open · tommythorn opened this issue Dec 15, 2020 · 9 comments
@tommythorn

Thanks for this brilliant tool which is exactly what I wanted. I followed the helpful guide #2 (comment)
to get a Ubuntu VM which is very fast in single user mode (no -p option), but as soon as I enable more than one core, performance is very very slow.

Is this a known issue?

tommythorn (Author) commented Dec 15, 2020

Very funny, that's not the issue (I'm allocating 6 GB and it reproduces with just -p2). So nobody else sees this? That's very odd. I can reproduce it trivially.

Damenly commented Dec 15, 2020

> Very funny, that's not the issue (I'm allocating 6 GB and it reproduces with just -p2). So nobody else sees this? That's very odd. I can reproduce it trivially.

+1. Running with 8 CPUs and 8 GB RAM it is very slow (disk I/O?). Reducing it to -p1 makes it work smoothly.

cjdell commented Dec 19, 2020

Noticing this too. With 1 CPU it is fine. You can easily see the problem when pinging the NAT gateway (in my case this is 192.168.64.1).

With a single CPU the ping time is < 1ms, but with multiple CPUs it can often be > 100ms. It's not disk bound or CPU bound; it feels more like a synchronisation issue. The problem appears to be fundamental to Apple's Virtualization framework, as this phenomenon also shows up in the "SimpleVM" project.
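
If you want to reproduce the check, just ping the gateway from inside the guest and compare the round-trip times with -p1 against -p2 or more (192.168.64.1 is the gateway in my setup; yours may differ):

# RTT stays well under 1ms with a single VCPU, but often exceeds
# 100ms once more than one VCPU is configured
ping -c 10 192.168.64.1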

evansm7 (Owner) commented Dec 22, 2020

Been discussing this on Twitter – it looks like Virtualization.framework is dropping interrupts that aren't directed at VCPU 0. I can recreate it by manually changing the IRQ affinity via /proc; as an example, IRQ 6 was my virtio-net interrupt and I lose the network when I direct it at VCPU 1 instead of "any":

echo 1 > /proc/irq/6/smp_affinity_list

My basic Debian installation doesn't have irqbalance (or an equivalent), so all IRQs remain steered at VCPU 0 – but other distros appear to install it by default.

Someone said they had problems with the Ubuntu cloud image even without irqbalance, which I have yet to look into. Maybe it has a similar userspace utility, maybe the kernel now has some spicy redirection.

It isn’t a vftool/SimpleVM bug, but a workaround is needed. Feels like a distro-specific tips & tricks discussion?
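
For anyone wanting to check their own guest, the affinity state lives under /proc/irq (IRQ 6 below is just the virtio-net example from my Debian guest; the number will differ elsewhere):

# show which IRQ line each virtio device is using, with per-CPU delivery counts
grep virtio /proc/interrupts

# show which VCPUs IRQ 6 may currently be delivered to
cat /proc/irq/6/smp_affinity_list

# steer it back at VCPU 0, the only one Virtualization.framework appears to deliver to
echo 0 > /proc/irq/6/smp_affinity_list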

tommythorn (Author) commented Dec 22, 2020

That seems like an ... odd choice by Apple. I ran apt remove irqbalance followed by

for f in /proc/irq/*/smp_affinity_list; do echo 0 > $f; done

and it looks like it made things way better. Curiously enough, /proc/irq/1/smp_affinity_list through /proc/irq/4/smp_affinity_list can't be written and stay at 0-5 (I ran with -p6, hence the "5"), but it appears to be significantly better than before. Thanks!

I would close, but it seems worthwhile to mention this in the documentation before closing.
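
For reference, the same loop with the expected write errors silenced (some of the /proc/irq/N/smp_affinity_list files refuse the write, as noted above); run as root:

for f in /proc/irq/*/smp_affinity_list; do
    # pin each IRQ that will accept it at VCPU 0; ignore the few files that reject the write
    echo 0 > "$f" 2>/dev/null || true
done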

@seanjensengrey

Thanks @tommythorn! I think removing irqbalance is enough (edit: it isn't, see below). I am seeing guest compilation timings for Rust that are on par with the M1 host. Previously, using a VM launched with -p 4, cargo took over 7 minutes just to print that it had started compiling the first crate.

irqbalance discussion below.

Rust compilation timings

time cargo install --force ripgrep

Ubuntu guest with smp_affinity_list changes applied

Launched with -p 4

   Finished release [optimized + debuginfo] target(s) in 31.61s
   Replacing /home/test/.cargo/bin/rg
    Replaced package `ripgrep v12.1.1` with `ripgrep v12.1.1` (executable `rg`)

real    0m31.819s
user    1m55.319s
sys     0m2.653s

Apple M1 host

   Finished release [optimized + debuginfo] target(s) in 27.25s
   Replacing /Users/seanj/.cargo/bin/rg
    Replaced package `ripgrep v12.1.1` with `ripgrep v12.1.1` (executable `rg`)

real	0m27.389s
user	2m39.468s
sys	0m8.593s

With irqbalance installed

root@ubuntu:~# cat /proc/irq/*/smp_affinity_list
0-3
0-3
0-3
0-3
3
0-3
1
0-3

Two minutes in, and cargo hasn't even finished updating the crate index.

time cargo install --force ripgrep
    Updating crates.io index
^C

real    1m57.985s
user    0m0.166s
sys     0m0.038s

After allowing all cores to handle all interrupts

root@ubuntu:~# cat reset_affinity.sh 
#!/bin/bash

# set -eux;

for f in /proc/irq/*/smp_affinity_list;
        do echo "0-3" > $f;
done
root@ubuntu:~# ./reset_affinity.sh
./reset_affinity.sh: line 6: echo: write error: Input/output error
./reset_affinity.sh: line 6: echo: write error: Input/output error
./reset_affinity.sh: line 6: echo: write error: Input/output error
./reset_affinity.sh: line 6: echo: write error: Input/output error
root@ubuntu:~# cat /proc/irq/*/smp_affinity_list
0-3
0-3
0-3
0-3
0-3
0-3
0-3
0-3

We see compilation times back to normal.

   Finished release [optimized + debuginfo] target(s) in 31.73s
   Replacing /home/test/.cargo/bin/rg
    Replaced package `ripgrep v12.1.1` with `ripgrep v12.1.1` (executable `rg`)

real    0m31.931s
user    1m55.523s
sys     0m2.975s

But! It isn't the configuration, it's the act of writing to smp_affinity_list. Clearing and resetting the IRQs to the slowest observed settings still results in a compile in the 32-45 second range.

With a reconfigured affinity list of

root@ubuntu:~# cat /proc/irq/*/smp_affinity_list
0-3
0-3
0-3
0-3
3
0
1
0

I was still able to get a 35s compile.

    Finished release [optimized + debuginfo] target(s) in 34.36s
   Replacing /home/test/.cargo/bin/rg
    Replaced package `ripgrep v12.1.1` with `ripgrep v12.1.1` (executable `rg`)

real    0m35.939s
user    2m0.677s
sys     0m2.327s

The worst configuration I could come up with apart from a fresh boot is

root@ubuntu:~# cat /proc/irq/*/smp_affinity_list
0-3
0-3
0-3
0-3
3
3
3
3

And while the console lags a bunch, we still see

    Finished release [optimized + debuginfo] target(s) in 34.81s
   Replacing /home/test/.cargo/bin/rg
    Replaced package `ripgrep v12.1.1` with `ripgrep v12.1.1` (executable `rg`)

real    0m43.805s
user    1m55.938s
sys     0m1.976s

It looks like both things are needed: irqbalance has to be removed, and the smp_affinity_list files have to be written to, preferably with low-numbered CPUs.

root@ubuntu:~# cat reset_affinity.sh 
#!/bin/bash

# show the affinities before, pin every IRQ that accepts it to CPU 0, show them after
cat /proc/irq/*/smp_affinity_list

for f in /proc/irq/*/smp_affinity_list; do
        echo "0" > $f
done

cat /proc/irq/*/smp_affinity_list
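
For completeness, the removal half of the fix, assuming an apt/systemd guest like this Ubuntu one:

# check whether irqbalance is present and running, then remove it so it
# can't re-spread the IRQ affinities on the next boot
systemctl status irqbalance
apt remove irqbalance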

@seanjensengrey

BTW, when running a guest with -p 8 I am seeing nearly identical Rust compilation performance:

../vftool/build/vftool -k vmlinux -i initrd -d a_disk1.img -m 2048 -p 8 -a "console=hvc0 root=/dev/vda"

    Finished release [optimized + debuginfo] target(s) in 25.59s
   Replacing /home/test/.cargo/bin/rg
    Replaced package `ripgrep v12.1.1` with `ripgrep v12.1.1` (executable `rg`)

real    0m25.815s
user    2m59.086s
sys     0m4.982s

M1 host


    Finished release [optimized + debuginfo] target(s) in 26.39s
   Replacing /Users/seanj/.cargo/bin/rg
    Replaced package `ripgrep v12.1.1` with `ripgrep v12.1.1` (executable `rg`)

real	0m26.523s
user	2m38.673s
sys	0m9.794s

gyf304 commented Jan 11, 2021

You can also fix it by adding irqaffinity=0 to the kernel cmdline. irqfixup also seems to work.

https://www.kernel.org/doc/html/v4.14/admin-guide/kernel-parameters.html#:~:text=irqfixup

Before:

ubuntu@ubuntu:~$ sudo hdparm -Tt /dev/vda

/dev/vda:
 Timing cached reads:   17538 MB in  2.00 seconds = 8777.34 MB/sec
 HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device
 Timing buffered disk reads:  12 MB in  3.27 seconds =   3.67 MB/sec

After:

ubuntu@ubuntu:~$ sudo hdparm -Tt /dev/vda

/dev/vda:
 Timing cached reads:   45040 MB in  2.00 seconds = 22574.22 MB/sec
 HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device
 Timing buffered disk reads: 2238 MB in  3.00 seconds = 745.68 MB/sec

Edit: you will still need to remove irqbalance from your system.
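
With vftool that just means appending the parameter to the -a string, e.g. reusing the invocation from earlier in the thread (image names are from that example):

# boot with all IRQs pinned to VCPU 0 from the start, so nothing needs
# to be rewritten inside the guest afterwards
../vftool/build/vftool -k vmlinux -i initrd -d a_disk1.img -m 2048 -p 8 -a "console=hvc0 root=/dev/vda irqaffinity=0"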

jasmas commented Jan 16, 2021

irqaffinity=0 would probably be the preferred method. This should probably just be documented, in the same way as the console option is.
