Inconsistent container startup time #14
Comments
Hi @SmylerMC. I think we've got two separate issues here. Let's take the Debian custom kernel first. The issue here is that you also need the virtio_console module in your initramfs, alongside virtiofs.
Now looking at the slow-start issue. This symptom is one I haven't seen, and I don't yet know what's going on. I agree it does look like QEMU taking nearly 2 minutes to load the kernel, but let's be sure: there's an option that should help confirm this.

Please run this and share another screen recording. If the delay is reproducible, knowing whether it falls before or after the shell appears will help us isolate the cause. Here is what I get: https://asciinema.org/a/632558

By the way, how much RAM does your computer have?
Thank you for your swift response.

Custom kernel issue

I had missed the change. I added the missing modules to the initramfs.

Computer 1 recording: https://asciinema.org/a/NzdZdtgL4N93oENyN2DkO15yG

New Dockerfile:

FROM debian:bullseye
ARG KERNEL=linux-image-5.10.0-27-amd64

# Install the chosen kernel plus initramfs-tools, add the virtiofs and
# virtio_console modules (both needed by RuncVM guests), then rebuild
# the initramfs so the modules are available at early boot.
RUN apt update && \
    apt install -y "$KERNEL" initramfs-tools && \
    echo "virtiofs" >> /etc/initramfs-tools/modules && \
    echo "virtio_console" >> /etc/initramfs-tools/modules && \
    update-initramfs -u
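For reference, a build-and-run sketch for this image (the image tag is arbitrary, and --runtime=runcvm assumes RuncVM is installed as a Docker runtime under that name; adjust if yours differs):

# Build the custom-kernel image, then boot it as a RuncVM micro-VM.
docker build -t debian-custom-kernel .
docker run --rm -it --runtime=runcvm debian-custom-kernel bash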
Computer 1 has 64GB of RAM. Computer 2 has 16GB.

Slow start issue

It is indeed QEMU being very slow to start: https://asciinema.org/a/PSslboorUjZlNgS8NKeOxDmyq
I notice you've got 20 cores on this computer and are using the default memory allocation of 512M. I wonder if this isn't a happy combination. Let's try a larger memory allocation.
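A sketch of how that could be tested, assuming RuncVM reads the VM memory size from a RUNCVM_MEM_SIZE container environment variable (the variable name is an assumption; check the RuncVM README for the exact option):

# Bump the VM memory well above the 512M default and see whether the
# slow start changes. RUNCVM_MEM_SIZE is an assumed variable name.
docker run --rm -it --runtime=runcvm --env RUNCVM_MEM_SIZE=2G ubuntu bash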
Oh, I believe there may be an issue whereby Debian specifically doesn't like using virtio_console (bizarrely, Ubuntu doesn't exhibit this issue, only Debian).
Good catch; removing the option works. Any idea what's causing this?
No luck; it doesn't change a thing. The behavior was also the same when running in my Fedora VM, with 8 cores and 8GB. What's weird is that I use QEMU on a daily basis on this computer, through libvirt and the CLI, and I have never experienced anything like this except when using RuncVM. One of the options must be causing it. As a last resort, I can try starting the VM manually, removing options until I find the problematic one.
No idea about the Debian/virtio_console issue, though logically I feel it has to be either a kernel build config or an initramfs start-up script, as Ubuntu doesn't exhibit the issue. There's an argument for reverting to serial by default, but I'm reluctant, as virtio_console seems the more modern and appropriate interface...
To be clear, does this slow start affect only vanilla Ubuntu, or does it affect other images too? And if you build a custom Ubuntu image with a different kernel, does it affect that?

I agree that it's likely some qemu command-line option is triggering this, if you normally run qemu without problems. If you're feeling brave, an easy way to test this is to hack the /opt/runcvm/scripts/runcvm-ctr-qemu script. It should be reasonably clear how it composes the final qemu command line, so you can chop it around.
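For anyone bisecting that script, a rough timing harness makes each variant quicker to judge. A minimal sketch (the image choice and the --runtime=runcvm flag are assumptions; adjust to your setup):

# Time a trivial RuncVM container end-to-end, capped so a hung boot
# doesn't block the terminal; rerun after each edit to the script.
time timeout 300 docker run --rm --runtime=runcvm debian:bullseye true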
Another idea: there is another boot option worth trying here. This might print some kernel logs earlier, or it might not. Also, with this mode, please copy and paste the kernel logs directly from the terminal rather than using asciinema, as I'm concerned asciinema may be omitting some output.

For comparison, here's the initial output of the above command, run on a Dell R620 running Debian Bullseye: […]
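One concrete possibility along these lines (an assumption on my part, not necessarily the option meant above): make the kernel log early boot as verbosely as possible on the serial console, which would show whether the time is lost before or after the kernel gets control. These are standard Linux kernel parameters that could be added to the -append string in runcvm-ctr-qemu (console=ttyS0 is already there):

# Log early boot to the serial port and suppress no messages.
earlyprintk=serial,ttyS0 ignore_loglevel loglevel=7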
It appears to be sitting with a single thread eating 100% of a CPU core's time.
All images I have tried so far are affected, including ones with custom kernels.
It does not appear to: early boot messages still appear only after about one to two minutes.

Output
[…]
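Since a single QEMU thread is spinning before boot, it may be worth capturing what that thread is doing. A sketch using standard tools (the pgrep pattern is an assumption; adjust it to match how the qemu process appears on your system):

# Watch per-thread CPU usage to identify the busy thread...
top -H -p "$(pgrep -f qemu-system-x86_64 | head -n1)"

# ...then grab backtraces of all threads for comparison.
sudo gdb -p "$(pgrep -f qemu-system-x86_64 | head -n1)" -batch -ex 'thread apply all bt'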
I have managed to get qemu to start immediately by removing the network interface from the qemu command line (launching it by hand from the preqemu hook).

Complete command line

/.runcvm/guest/usr/bin/qemu-system-x86_64 \
'-no-user-config' '-nodefaults' '-no-reboot' \
'-action' 'panic=none' \
'-action' 'reboot=shutdown' \
'-enable-kvm' \
'-cpu' 'host,pmu=off' \
'-machine' 'q35,accel=kvm,usb=off,sata=off' \
'-device' 'isa-debug-exit' \
'-nographic' '-vga' 'none' \
'-m' '8192M' \
'-smp' '4,cores=1,threads=1,sockets=4,maxcpus=4' \
'-device' 'virtio-serial-pci,id=serial0' \
'-object' 'rng-random,id=rng0,filename=/dev/urandom' \
'-device' 'virtio-rng-pci,rng=rng0' \
'-numa' 'node,memdev=mem' \
'-object' 'memory-backend-file,id=mem,size=8192M,mem-path=/dev/shm,share=on,prealloc=off' \
'-chardev' 'socket,id=virtiofs,path=/run/.virtiofs.sock' \
'-device' 'vhost-user-fs-pci,queue-size=1024,chardev=virtiofs,tag=runcvmfs,ats=off' \
'-chardev' 'stdio,id=char0,mux=on,signal=off' \
'-serial' 'chardev:char0' \
'-mon' 'chardev=char0' \
'-echr' '20' \
'-chardev' 'socket,id=qemuguest0,path=/run/.qemu-guest-agent,server=on,wait=off' \
'-device' 'virtserialport,chardev=qemuguest0,name=org.qemu.guest_agent.0' \
'-monitor' 'unix:/run/.qemu-monitor-socket,server,nowait' \
'-kernel' '/.runcvm/guest/kernels/ubuntu/5.15.0-91-generic/vmlinuz' \
'-initrd' '/.runcvm/guest/kernels/ubuntu/5.15.0-91-generic/initrd' \
'-L' '/.runcvm/guest/usr/share/qemu' \
'-append' 'rootfstype=virtiofs root=runcvmfs noresume nomodeset net.ifnames=1 init=/.runcvm/guest/scripts/runcvm-vm-init rw ipv6.disable=1 panic=-1 scsi_mod.scan=none tsc=reliable no_timer_check rcupdate.rcu_expedited=1 i8042.direct=1 i8042.dumbkbd=1 i8042.nopnp=1 i8042.noaux=1 noreplace-smp reboot=k cryptomgr.notests pci=lastbus=0 selinux=0 systemd.show_status=1 console=ttyS0 '

The options I removed are the following:

'-netdev' 'tap,id=qemu0,ifname=tap-eth0,script=/.runcvm/guest/scripts/runcvm-ctr-qemu-ifup,downscript=/.runcvm/guest/scripts/runcvm-ctr-qemu-ifdown' \
'-device' 'virtio-net-pci,netdev=qemu0,mac=52:54:00:11:00:03,rombar=0' \
@SmylerMC Thanks so much for persisting with this. At least it seems we're getting somewhere!

Please would you try removing, separately, the -device option and the script/downscript properties of the -netdev option? If the latter turn out to be the trigger, we can debug the scripts by adding tracing (e.g. set -x). If the former, then we could experiment with different network device models.

P.S. To make this change, just edit around line 97 of the /opt/runcvm/scripts/runcvm-ctr-qemu script.
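A sketch of the scripts experiment, using qemu's standard way to skip tap setup scripts (script=no/downscript=no); the remaining values are copied from the command line above:

# Keep the tap device and virtio NIC, but skip the ifup/ifdown scripts
# entirely, to test whether the scripts themselves cause the delay.
'-netdev' 'tap,id=qemu0,ifname=tap-eth0,script=no,downscript=no' \
'-device' 'virtio-net-pci,netdev=qemu0,mac=52:54:00:11:00:03,rombar=0' \

If the scripts are implicated, adding set -x near the top of runcvm-ctr-qemu-ifup and runcvm-ctr-qemu-ifdown would show where they spend their time.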
I'm away from the keyboard right now, so I will try again later, but I did try removing the script options earlier and I don't think it changed anything. As for the device type, I did remove the -device line, which should have made it fall back to an E1000 adapter if I'm not mistaken. That did nothing either.

I think I will try reproducing RuncVM's configuration as closely as I can outside of runc, to see whether the container's configuration has anything to do with it.
You could certainly change the device model explicitly rather than relying on a fallback.
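For completeness, a sketch of that swap: keep the -netdev line intact and replace only the device (e1000 is a standard qemu device model; the mac value is copied from the command line above):

# Replace the paravirtual NIC with an emulated Intel e1000, so the
# guest no longer depends on virtio-net at all.
'-device' 'e1000,netdev=qemu0,mac=52:54:00:11:00:03' \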
Hi @SmylerMC, have you had a chance to look at this again? I'm very keen to get to the bottom of it (although, at the same time, I still haven't been able to reproduce the issue on any test platform).
I'm quite busy at the moment, but I'll probably come back to this in a week or so. I would really like to keep using RuncVM, and this issue is really annoying.
I just realized I never came back to you about the network scripts and about switching the network card to a different model.
Original issue

I am using RuncVM to study kernel vulnerabilities by building vulnerable container images. I am noticing different startup times from one computer to another, often very long, and in some cases the container never finishes booting.
Computer 1
Containers take around 2 minutes to start. Most of that time is spent after the QEMU process has started but seemingly before the kernel starts; the QEMU process uses 100% of a CPU core during that time. A similar wait, also maxing out a CPU core, occurs when the container exits. The exact same behavior is reproduced when running RuncVM in a Fedora VM on that host (libvirt, with nested virtualization enabled and tested).
CPU: Intel i7-12700H
Host kernel: 6.6.10-1-MANJARO
Docker version: 24.0.7, build afdd53b4e3
Asciinema recording: https://asciinema.org/a/iixER5qw6fSiLnslM1NW2z9d1
Computer 2
Running a vanilla Ubuntu container image works flawlessly.
Asciinema recording: https://asciinema.org/a/wc0nQL5sFQ4hNUXkQMjApGxsq
Trying to run a Debian image with a custom kernel hangs the shell forever; trying to run a second container with the same image after that returns immediately. This may be a different issue. QEMU is running and doesn't appear to be using much CPU.

Dockerfile: […]
CPU: Intel i7-5600U
Host kernel: 6.5.0-kali3-amd64
Docker version: 20.10.25+dfsg1, build b82b9f3