Problems with Stateful Workloads on Latest Nydus #1174

Closed
Champ-Goblem opened this issue Mar 22, 2023 · 4 comments
Labels
bug Something isn't working

Comments

@Champ-Goblem

We have been experiencing problems with Nydus since at least version 2.1.4 when running some stateful workloads. The main affected workload is MySQL, which fails to start correctly on Nydus 2.1.4+ with Kata 3.0.2. The version of Nydus we ran previously, 2.1.0-rc.3, works as expected with Kata 3.0.2.

The error that MySQL throws during the startup phase:

ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/opt/bitnami/mysql/tmp/mysql.sock' (111)

This is unexpected because the provisioning scripts start MySQL in the background, and enabling debug mode shows no errors during the startup of that background MySQL instance.

You can recreate it by following these steps:

  • Install Kata and Nydus 2.2.0 with the following DaemonSet (or manually if preferred):
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kata-installer
spec:
  selector:
    matchLabels:
      app: installer
  template:
    metadata:
      labels:
        app: installer
    spec:
      nodeSelector:
        # Node selector if applicable
      hostPID: true
      volumes:
      - name: bin
        hostPath:
          # CHANGE ME depending on OS
          path: /usr/local/bin
      - name: kata
        hostPath:
          path: /opt/kata
      - name: containerd
        hostPath:
          # CHANGE ME depending on containerd install
          path: /etc/containerd
      containers:
      - name: installer
        image: ubuntu:latest
        env:
        - name: kataReleaseURL
          value: https://github.com/kata-containers/kata-containers/releases/download/3.0.2/kata-static-3.0.2-x86_64.tar.xz
        - name: nydusReleaseURL
          value: https://github.com/dragonflyoss/image-service/releases/download/v2.2.0/nydus-static-v2.2.0-linux-amd64.tgz
        - name: binPath
          # CHANGE ME depending on OS
          value: /usr/local/bin
        - name: containerdPath
          # CHANGE ME depending on containerd install
          value: /etc/containerd/config.toml
        volumeMounts:
        - name: bin
          mountPath: /usr/local/bin
        - name: kata
          mountPath: /opt/kata
        - name: containerd
          mountPath: /etc/containerd
        securityContext:
          privileged: true
        command:
        - bash
        - -c
        args:
        - |
          #!/usr/bin/env bash
          set -e

          apt update && apt install -y wget xz-utils

          cd /tmp

          echo "Installing kata"
          wget --retry-connrefused -t 20 --waitretry=1 ${kataReleaseURL} -O /tmp/kata.tar.xz
          tar -xf /tmp/kata.tar.xz -C /
          cp /opt/kata/bin/containerd-shim-kata-v2 /usr/local/bin/ --force

          echo "Installing nydus"
          wget --retry-connrefused -t 20 --waitretry=1 ${nydusReleaseURL} -O /tmp/nydus.tar.gz
          tar -xzf /tmp/nydus.tar.gz -C /tmp/
          cp -r /tmp/nydus-static/* /usr/local/bin/ --force

          CONTAINERD_CRI_TAG="cri"
          if grep -E -q "^version = 2$" ${containerdPath}; then
            CONTAINERD_CRI_TAG="\"io.containerd.grpc.v1.cri\""
          fi

          if ! grep -q kata ${containerdPath}; then
            echo "
            [plugins.${CONTAINERD_CRI_TAG}.containerd.runtimes.kata]
              runtime_type = \"io.containerd.kata.v2\"
              privileged_without_host_devices = true
              [plugins.${CONTAINERD_CRI_TAG}.containerd.runtimes.kata.options]
                ConfigPath = \"/opt/kata/share/defaults/kata-containers/configuration.toml\"
            [plugins.${CONTAINERD_CRI_TAG}.containerd.runtimes.kata-fc]
              runtime_type = \"io.containerd.kata.v2\"
              privileged_without_host_devices = true
              [plugins.${CONTAINERD_CRI_TAG}.containerd.runtimes.kata-fc.options]
                ConfigPath = \"/opt/kata/share/defaults/kata-containers/configuration-fc.toml\"
            [plugins.${CONTAINERD_CRI_TAG}.containerd.runtimes.kata-qemu]
              runtime_type = \"io.containerd.kata.v2\"
              privileged_without_host_devices = true
              [plugins.${CONTAINERD_CRI_TAG}.containerd.runtimes.kata-qemu.options]
                ConfigPath = \"/opt/kata/share/defaults/kata-containers/configuration-qemu.toml\"
            [plugins.${CONTAINERD_CRI_TAG}.containerd.runtimes.kata-clh]
              runtime_type = \"io.containerd.kata.v2\"
              privileged_without_host_devices = true
              [plugins.${CONTAINERD_CRI_TAG}.containerd.runtimes.kata-clh.options]
                ConfigPath = \"/opt/kata/share/defaults/kata-containers/configuration-clh.toml\"" >> ${containerdPath}
          fi

          # Workaround for nydus 2.2.0
          echo '#!/bin/bash
          args=`echo $@ | sed '"'"'s#--hybrid-mode##'"'"'`' > ${binPath}/nydus-helper.sh
          echo "${binPath}/nydusd \$args" >> ${binPath}/nydus-helper.sh

          chmod +x ${binPath}/nydus-helper.sh

          nsenter -t 1 -m -p -- systemctl restart containerd

          echo 'Enabling nydus virtiofs'
          sed -i 's#virtio_fs_extra_args.*#virtio_fs_extra_args = []#' /opt/kata/share/defaults/kata-containers/configuration*
          sed -i 's#shared_fs.*#shared_fs = "virtio-fs-nydus"#' /opt/kata/share/defaults/kata-containers/configuration*
          sed -i "s#virtio_fs_daemon.*#virtio_fs_daemon = \"${binPath}/nydus-helper.sh\"#" /opt/kata/share/defaults/kata-containers/configuration*

          echo 'Setting sandbox_cgroup_only to false'
          sed -i 's/sandbox_cgroup_only=.\+$/sandbox_cgroup_only=false/' /opt/kata/share/defaults/kata-containers/configuration*

          echo "Done"

          while :; do sleep 50000; done
  • In the above there are a number of fields to configure:
    • spec.template.spec.containers[0].env contains binPath and containerdPath, which depend on the installation
    • spec.template.spec.volumes contains bin, which should point to a directory on the host's $PATH, and containerd, which should point to the directory containing the config.toml file
  • Install MySQL from the Bitnami repo with helm install mysql bitnami/mysql -f values.yaml, using the following values.yaml:
architecture: replication
auth:
  rootPassword: "password"
  username: "user1"
  password: "password"
  replicationPassword: "password"

image:
  debug: false

primary:
  runtimeClassName: kata-clh
  nodeSelector:
    # set if required
  initContainers:
  - command:
    - /bin/bash
    - -ec
    - |
      chown -R 1001:1001 /bitnami/mysql
    image: docker.io/bitnami/minideb:buster
    imagePullPolicy: Always
    name: volume-permissions
    resources: {}
    securityContext:
      runAsUser: 0
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /bitnami/mysql
      name: data
secondary:
  replicaCount: 0

initdbScripts:
  test.sh: |
    mysql -P 3306 -uroot -ppassword -e "SHOW STATUS;"

Note: if you helm uninstall mysql and then helm install it again, delete the persistent volume claim between the uninstall and install steps.
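
The nydus-helper.sh workaround in the DaemonSet above rebuilds the argument list with `echo $@ | sed`, which flattens the arguments into one string and re-splits it on whitespace. A minimal sketch of an alternative that drops `--hybrid-mode` while leaving every other argument intact is below. The function name `run_without_hybrid_mode` is hypothetical, and the sketch assumes the flag always arrives as its own argument:

```shell
#!/bin/sh
# Hypothetical helper: run a command with any standalone --hybrid-mode
# argument removed, leaving every other argument (including ones that
# contain spaces) untouched.
run_without_hybrid_mode() {
  cmd=$1
  shift
  # The loop's word list is expanded once, so the positional parameters
  # can be rebuilt safely while iterating over the original arguments.
  for arg in "$@"; do
    shift
    [ "$arg" = "--hybrid-mode" ] && continue
    set -- "$@" "$arg"
  done
  "$cmd" "$@"
}

# Assumed usage inside the wrapper script, with binPath as in the DaemonSet:
#   run_without_hybrid_mode "${binPath}/nydusd" "$@"
```

Unlike the sed version, this only removes an argument that is exactly `--hybrid-mode`, so it would not touch a path or value that happens to contain that substring.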

When you run the above MySQL under runc, the command in test.sh under initdbScripts prints the server status. This works because the Bitnami scripts start MySQL in the background during the init phase, which allows the scripts to make changes to the configuration. When run with Kata and the latest Nydus, however, the same command fails to connect to MySQL over the socket at /opt/bitnami/mysql/tmp/mysql.sock, even though MySQL is running correctly. You can view the logs of the background MySQL by setting image.debug=true in the helm values.
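
The (111) in the MySQL error is the Linux errno for ECONNREFUSED, which for a Unix socket normally means that a socket file exists at the path but nothing is accepting connections on it. A minimal sketch for narrowing this down from inside the container is below. `socket_state` and `retry` are hypothetical helper names, and the socket path and credentials in the usage comment are the ones from this report:

```shell
#!/bin/sh
# Hypothetical helper: report what exists at a path that should be a socket.
socket_state() {
  if [ -S "$1" ]; then
    echo "socket"
  elif [ -e "$1" ]; then
    echo "not-a-socket"
  else
    echo "missing"
  fi
}

# Hypothetical helper: run a command up to N times, one second apart,
# and succeed as soon as the command succeeds.
retry() {
  attempts=$1
  shift
  while [ "$attempts" -gt 0 ]; do
    "$@" && return 0
    attempts=$((attempts - 1))
    [ "$attempts" -gt 0 ] && sleep 1
  done
  return 1
}

# Assumed usage in an initdb script:
#   socket_state /opt/bitnami/mysql/tmp/mysql.sock
#   retry 30 mysqladmin --socket=/opt/bitnami/mysql/tmp/mysql.sock -uroot -ppassword ping
```

If `socket_state` reports `socket` but the retried ping never succeeds, the failure is not a startup race, which would point at how the socket file is handled by the filesystem rather than at MySQL itself.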

To get an idea of how stable the newest version of Nydus is, we ran xfstests (https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git/tree/) against it. For Nydus 2.1.0-rc.3 this yielded:

Failures: generic/007 generic/013 generic/088 generic/131 generic/245 generic/247 generic/257 generic/258 generic/263 generic/430 generic/431 generic/432 generic/433 generic/434 generic/478 generic/504 generic/564 generic/571 generic/632 generic/637 generic/639
Failed 21 of 589 tests

and for version 2.2.0 the results were:

Failures: generic/007 generic/013 generic/088 generic/245 generic/257 generic/258 generic/263 generic/430 generic/431 generic/432 generic/433 generic/434 generic/504 generic/564 generic/571 generic/632 generic/637 generic/639
Failed 18 of 589 tests
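
To make the two result sets easier to compare, the failure lists can be diffed directly. A minimal sketch using the lists reported above is below; `only_in_first` is a hypothetical helper name:

```shell
#!/bin/sh
# Failure lists copied from the two xfstests runs above.
FAILED_OLD="generic/007 generic/013 generic/088 generic/131 generic/245 generic/247 generic/257 generic/258 generic/263 generic/430 generic/431 generic/432 generic/433 generic/434 generic/478 generic/504 generic/564 generic/571 generic/632 generic/637 generic/639"
FAILED_NEW="generic/007 generic/013 generic/088 generic/245 generic/257 generic/258 generic/263 generic/430 generic/431 generic/432 generic/433 generic/434 generic/504 generic/564 generic/571 generic/632 generic/637 generic/639"

# Hypothetical helper: print each entry of the first list that is absent
# from the second list, one per line.
only_in_first() {
  first=$1
  second=$2
  for t in $first; do
    case " $second " in
      *" $t "*) ;;
      *) printf '%s\n' "$t" ;;
    esac
  done
}

echo "Fail only on 2.1.0-rc.3:"
only_in_first "$FAILED_OLD" "$FAILED_NEW"
echo "Fail only on 2.2.0:"
only_in_first "$FAILED_NEW" "$FAILED_OLD"
```

With these lists the first call prints generic/131, generic/247 and generic/478, and the second prints nothing, so 2.2.0 shows no xfstests failures that 2.1.0-rc.3 did not already have.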

We have opened a ticket with the fuse-backend-rs crate, cloud-hypervisor/fuse-backend-rs#111, suggesting that xfstests be integrated into its testing regime to catch problems like this early.

@adamqqqplay added the bug label Mar 23, 2023
@imeoer
Collaborator

imeoer commented Mar 23, 2023

Hi @Champ-Goblem, thanks for the detailed report. Could you try nydusify check --source <your_original_mysql_image> --target <your_nydus_mysql_image> with nydus-image & nydusd 2.1.4?

@ccx1024cc self-assigned this Mar 23, 2023
@Champ-Goblem
Author

Hi @imeoer, the image we are using is a standard OCI image rather than a RAFS-formatted one. The image used by the helm chart is docker.io/bitnami/mysql:8.0.32-debian-11-r14. Does this change things?

@hsiangkao
Contributor

It seems this issue is mainly related to FUSE, in other words to fuse-backend-rs?

@ccx1024cc removed their assignment Apr 7, 2023
@Champ-Goblem
Author

Seems to be solved in 2.1.6
