Manual Volume detach case not handled #2537

CoreyCook8 · 2024-09-28T00:00:25Z

What happened:

After a volume was manually detached from a VM, two pods ended up mounting the same volume.

What you expected to happen:

In an AWS cluster, this same issue gives this error message:

  Warning  FailedMount  14s (x6 over 30s)  kubelet            MountVolume.MountDevice failed for volume "pvc-XXXXX" : rpc error: code = Internal desc = Failed to find device path /dev/xvdaa. refusing to mount /dev/nvme3n1 because it claims to be volX but should be volY

I would expect this to be handled in a similar manner.

How to reproduce it:

1. Create a pod that mounts a PVC.
2. After the pod is running, manually detach the disk using the Azure portal. (The pod will still show as running.)
3. Create another pod that mounts a different PVC and assign it to the same node.
4. Both pods should be running at this point.
5. Delete and recreate the first pod.
6. The pod should go into a running state without the volume attaching.
7. At this point, both pods will be using the same volume.
8. To verify, exec into both pods, create a file in the mounted directory in one, and check that it shows up in the other pod.

Anything else we need to know?:

Environment:

CSI Driver version: 1.30.4
Kubernetes version (use kubectl version): 1.28.9
OS (e.g. from /etc/os-release): Ubuntu 20.04.6 LTS
Kernel (e.g. uname -a): Linux 5.4.0-1138-azure # 145-Ubuntu SMP Fri Aug 30 16:04:18 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Install tools:
Others:


andyzhangx · 2024-10-08T13:48:31Z

This is expected. On an Azure VM, the device name is not bound to the disk name. For example, disk1 is mounted as /dev/sdc; when disk1 is manually detached and disk2 is attached to the VM, disk2 is also mounted as /dev/sdc. If you then delete and recreate the first pod with the disk1 volume, disk1 would still use /dev/sdc, because at that time the CSI driver thinks disk1 is still attached to the VM and just reuses the previous device name (/dev/sdc).

BTW, manual volume detach is not a supported CSI driver scenario; it is outside the CSI driver's control.
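The device-name reuse described above can be illustrated with a small simulation. This is only a sketch of the bookkeeping problem, not the driver's code: the `node` type, its two maps, and the method names are all hypothetical, and the disk and device names are the ones used in the example above.

```go
package main

import "fmt"

// node models the two views that diverge in this issue: what is really
// attached to the VM, and what was recorded at attach time.
// Hypothetical type, for illustration only.
type node struct {
	actual   map[string]string // device path -> disk actually attached
	believed map[string]string // disk -> device path recorded at attach time
}

func newNode() *node {
	return &node{actual: map[string]string{}, believed: map[string]string{}}
}

// attach records the disk in both views, as a normal attach would.
func (n *node) attach(disk, dev string) {
	n.actual[dev] = disk
	n.believed[disk] = dev
}

// manualDetach removes the disk from the VM only. The recorded state is
// left untouched, which is the effect of a detach done outside Kubernetes.
func (n *node) manualDetach(dev string) {
	delete(n.actual, dev)
}

// mountFor returns the disk a pod ends up on when the recorded device
// path is reused without checking what is behind it.
func (n *node) mountFor(disk string) string {
	return n.actual[n.believed[disk]]
}

func main() {
	n := newNode()
	n.attach("disk1", "/dev/sdc")
	n.manualDetach("/dev/sdc")
	n.attach("disk2", "/dev/sdc") // the freed device name is handed out again

	fmt.Println("pod1 wanted disk1, got:", n.mountFor("disk1"))
	fmt.Println("pod2 wanted disk2, got:", n.mountFor("disk2"))
}
```

Both lookups resolve to disk2, which matches the symptom in the report: the recreated pod is handed whatever disk currently sits behind the stale device name.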

CoreyCook8 · 2024-10-08T14:10:40Z

I understand that manual detach is out of the control of the CSI driver. But I would expect the CSI driver to ensure that a new pod is using the volume it has requested and not another pod's volume. If the pod is deleted and the new pod is scheduled to the same VM, I would expect the CSI driver to check the drive and make sure the expected volume == the actual volume.

Or, when attaching the second disk at the same device name as the first disk, it could realize that a disk should already be there, or that the first disk is no longer attached.
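The "expected volume == actual volume" check requested here might look roughly like the following. It is a minimal sketch under stated assumptions: `verifyDevice` and the `identify` hook are hypothetical names, how a disk's identity would actually be read from a device on an Azure VM is left open, and the error wording is borrowed from the AWS message quoted in the issue description.

```go
package main

import "fmt"

// identify is a hypothetical hook that reports which disk currently backs
// a device path. How this would be implemented on Azure is not covered here.
type identify func(devicePath string) (string, error)

// verifyDevice refuses a mount when the disk found behind devicePath is not
// the disk the volume is supposed to be.
func verifyDevice(devicePath, expectedDisk string, id identify) error {
	actual, err := id(devicePath)
	if err != nil {
		return fmt.Errorf("cannot identify %s: %w", devicePath, err)
	}
	if actual != expectedDisk {
		return fmt.Errorf("refusing to mount %s because it claims to be %s but should be %s",
			devicePath, actual, expectedDisk)
	}
	return nil
}

func main() {
	// Fake identity source standing in for the real lookup (assumption).
	attached := map[string]string{"/dev/sdc": "disk2"}
	id := func(dev string) (string, error) {
		disk, ok := attached[dev]
		if !ok {
			return "", fmt.Errorf("no disk attached at %s", dev)
		}
		return disk, nil
	}

	fmt.Println(verifyDevice("/dev/sdc", "disk1", id)) // mismatch: returns an error
	fmt.Println(verifyDevice("/dev/sdc", "disk2", id)) // match: returns nil
}
```

Injecting the identity lookup as a function keeps the comparison testable without a real VM; the check itself is just the comparison plus a refusal to proceed.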

andyzhangx · 2024-10-08T14:31:35Z

Because the detach was manual, the kubelet still thinks disk1 is attached to the node, so the CSI driver is never called (there is no NodeStageVolume call) to verify the drive.

When attaching disk2 to the VM, using the same device name (/dev/sdc) is actually OK (this is also outside the CSI driver's control; it is controlled by the Linux kernel disk driver). I think the main problem is that after a manual detach you should reschedule the first pod to another node; that would work. Otherwise we don't have a solution for this, since it is outside the CSI driver's control.
