Force to invoke all CNI plugin's delete at pods' tearing down #86

Open: 1 commit to be merged into base: master
s1061123
Contributor

When a pod with multiple interfaces is deleted, forEachNetwork() is called
with multiple network attachments. If forEachNetwork() hits an error partway
through processing the networks, it returns immediately and the remaining
networks are not processed. From the CNI runtime's point of view, every CNI
plugin should be invoked to delete its interfaces.

This change introduces a 'force' option in forEachNetwork() so that it
continues processing (i.e. deleting networks) even when an error occurs.
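
The intended behaviour can be sketched roughly as below. This is only an illustration under stated assumptions, not the code in this PR: `forEachAttachment` and `tearDown` are hypothetical names, the failure on "net-b" is simulated, and `errors.Join` assumes Go 1.20 or later.

```go
package main

import (
	"errors"
	"fmt"
)

// forEachAttachment is a hypothetical stand-in for forEachNetwork().
// With force == false it stops at the first error. With force == true it
// keeps visiting the remaining attachments and returns all errors joined.
func forEachAttachment(names []string, force bool, fn func(name string) error) error {
	var errs []error
	for _, name := range names {
		if err := fn(name); err != nil {
			if !force {
				return err
			}
			errs = append(errs, fmt.Errorf("network %q: %w", name, err))
		}
	}
	// errors.Join returns nil when there is nothing to join.
	return errors.Join(errs...)
}

// tearDown records which attachments had their (simulated) DEL invoked.
func tearDown(names []string, force bool) (deleted []string, err error) {
	err = forEachAttachment(names, force, func(name string) error {
		if name == "net-b" {
			return errors.New("simulated failure")
		}
		deleted = append(deleted, name)
		return nil
	})
	return deleted, err
}

func main() {
	names := []string{"net-a", "net-b", "net-c"}
	for _, force := range []bool{false, true} {
		deleted, err := tearDown(names, force)
		fmt.Printf("force=%v deleted=%v err=%v\n", force, deleted, err)
	}
}
```

Without force, "net-c" is never reached; with force, it is still torn down and the error for "net-b" is reported afterwards.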

@openshift-ci-robot

@s1061123: Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

@openshift-ci-robot openshift-ci-robot added the dco-signoff: no Indicates the PR's author has not DCO signed all their commits. label Mar 26, 2021
@openshift-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: s1061123
To complete the pull request process, please assign rajatchopra after the PR has been reviewed.
You can assign the PR to them by writing /assign @rajatchopra in a comment when ready.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. label Mar 26, 2021

Signed-off-by: Tomofumi Hayashi <tohayash@redhat.com>
@s1061123 s1061123 force-pushed the fix/force-remove-cni-at-error branch from c20b9df to 5103466 Compare March 26, 2021 20:49
@openshift-ci-robot openshift-ci-robot added dco-signoff: yes Indicates the PR's author has DCO signed all their commits. and removed dco-signoff: no Indicates the PR's author has not DCO signed all their commits. labels Mar 26, 2021
@s1061123
Contributor Author

/assign @rajatchopra

@dcbw
Collaborator

dcbw commented Mar 30, 2021

@s1061123 what's the error we're getting from forEachNetwork()?

@s1061123
Contributor Author

s1061123 commented Mar 30, 2021

@dcbw fmt.Errorf("network %q requested interface name %q already assigned", req.Name, req.Ifname)
https://github.com/cri-o/ocicni/pull/86/files#diff-bc84663df95b74e63e2857796dcbf5c05eb68f3ee39de405f636cf42923b9e86L459-L460

@s1061123
Contributor Author

@dcbw https://github.com/cri-o/ocicni/blob/master/pkg/ocicni/ocicni.go#L455-L464 here, forEachNetwork() walks every network and its interface name and sets allIfNames[<ifname>] = true for each one. If a network's interface name is a duplicate, allIfNames[<ifname>] is already true.

The current code then simply returns fmt.Errorf("network %q requested interface name %q already assigned", req.Name, req.Ifname), even though the remaining interfaces are still waiting to be deleted. That is the issue.
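
A simplified sketch of the check described above, to make the early return concrete. It is not the actual ocicni code: the `attachment` type and `visitAll` function are hypothetical, and only the `allIfNames` map and the error message are taken from this discussion.

```go
package main

import "fmt"

// attachment is a hypothetical, simplified network attachment.
type attachment struct {
	Name   string
	Ifname string
}

// visitAll mimics the early return: it stops at the first duplicate
// interface name and reports which attachments were visited before that.
func visitAll(reqs []attachment) (visited []string, err error) {
	allIfNames := make(map[string]bool)
	for _, req := range reqs {
		if allIfNames[req.Ifname] {
			return visited, fmt.Errorf("network %q requested interface name %q already assigned", req.Name, req.Ifname)
		}
		allIfNames[req.Ifname] = true
		visited = append(visited, req.Name)
	}
	return visited, nil
}

func main() {
	visited, err := visitAll([]attachment{
		{Name: "net-a", Ifname: "eth0"},
		{Name: "net-b", Ifname: "eth0"}, // duplicate interface name
		{Name: "net-c", Ifname: "net1"}, // never reached
	})
	fmt.Println(visited, err)
}
```

In this sketch "net-c" is never visited, which corresponds to an interface left behind at teardown.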

@dcbw
Collaborator

dcbw commented Mar 30, 2021

@s1061123 what I'm trying to think of is why there would be dupes? There should only be a single cache file per network for the pod, each with a different interface name...

@dcbw
Collaborator

dcbw commented Mar 30, 2021

Is it the case that the multus network (that CRIO calls directly) has the same ifname as the default CNI network that is called by multus? And during teardown, ocicni finds both files even though it was only ever called for the Multus network?

If so, that's the detail I wasn't grasping this morning on the call.

@s1061123
Contributor Author

@dcbw right. The case I hit goes through the following steps:

  1. ocicni invokes multus-cni -> a cache is generated (from multus-cni results, multus-net-name + eth0)
  2. multus-cni invokes the delegate plugin (e.g. flannel) -> a cache is generated (from flannel results, flannel-net-name + eth0)
  3. multus-cni invokes the net-attach-def plugin (e.g. macvlan) -> a cache is generated (from macvlan results, net-attach-def-name + net1)

@dcbw
Collaborator

dcbw commented Apr 7, 2021

@s1061123 I do think the "cleanest" solution to this is to have multus give ocicni a different cache dir so that we can cleanly separate the kubelet -> Multus calls from the Multus -> delegate calls.

However, we could just ignore the interface conflicts on DEL since DEL is supposed to be permissive anyway. Would that work for you?
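
One possible reading of "ignore the interface conflicts on DEL" is sketched below, using the three cache entries from the steps listed earlier in this thread. The `cacheEntry` type and `entriesToProcess` function are hypothetical, and treating a conflict as "still process every entry" is an assumption, not something decided in this discussion.

```go
package main

import "fmt"

// cacheEntry is a hypothetical, simplified view of one cached attachment.
type cacheEntry struct {
	Network string
	Ifname  string
}

// entriesToProcess returns the entries that would be handed to the plugins.
// With permissive == false a duplicate interface name is an error, as on ADD.
// With permissive == true (the DEL case assumed here) duplicates are
// tolerated and every entry is still returned.
func entriesToProcess(entries []cacheEntry, permissive bool) ([]cacheEntry, error) {
	seen := make(map[string]bool)
	var out []cacheEntry
	for _, e := range entries {
		if seen[e.Ifname] && !permissive {
			return nil, fmt.Errorf("network %q requested interface name %q already assigned", e.Network, e.Ifname)
		}
		seen[e.Ifname] = true
		out = append(out, e)
	}
	return out, nil
}

func main() {
	entries := []cacheEntry{
		{Network: "multus-net-name", Ifname: "eth0"},
		{Network: "flannel-net-name", Ifname: "eth0"},
		{Network: "net-attach-def-name", Ifname: "net1"},
	}
	for _, permissive := range []bool{false, true} {
		out, err := entriesToProcess(entries, permissive)
		fmt.Printf("permissive=%v entries=%d err=%v\n", permissive, len(out), err)
	}
}
```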

@s1061123
Contributor Author

s1061123 commented Apr 7, 2021

@dcbw For the multus case, I've already taken care of it in k8snetworkplumbingwg/multus-cni#638, so ocicni with multus does not have this issue so far.

This PR is for the non-multus-cni case. As far as I know, ocicni supports multiple CNI invocations (the networks field of type cniNetworkPlugin is a map[string]*cniNetwork), so ocicni may generate one or more CNI caches without multus-cni.

A CNI runtime should invoke the CNI plugin's DEL command even if some error happens, to prevent resource leaks. However, ocicni may skip a plugin's DEL command, and thereby leak resources, if

  • cniNetworkPlugin has multiple CNI networks and
  • forEachNetwork() returns an error.

forEachNetwork() may return an error in one of the following cases:

That's why I filed this PR.

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 14, 2024
@openshift-merge-robot
Contributor

PR needs rebase.

