

feat: Added FailureDomains as Nutanix cluster mutation #684

Open

deepakm-ntnx wants to merge 1 commit into main from jira/NCN-100421
Conversation

deepakm-ntnx (Contributor) commented May 21, 2024

What problem does this PR solve?:
This PR allows us to configure failureDomains and use them for control plane and worker nodes when creating a CAPX cluster on Nutanix infrastructure.

Which issue(s) this PR fixes:
Fixes NCN-100421

How Has This Been Tested?:

Unit tests passed.

make test

Manual test passed.

make dev.run-on-kind
clusterctl generate cluster nutanix-quick-start-helm-addon-cilium \
  --from examples/capi-quick-start/nutanix-cluster-cilium-helm-addon.yaml \
  --kubernetes-version v1.30.0 \
  --worker-machine-count 1 > c2.yaml
# update c2.yaml to add failureDomains and set the corresponding MachineDeployment failureDomain property; more info in the docs
kubectl apply --server-side -f ./c2.yaml

Then observed that the control plane and worker nodes are created on their respective failure domains. (An illustrative c2.yaml update is sketched below.)
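
For reference, a minimal sketch of the kind of edit that comment describes, assuming the failure domains are defined under the nutanix section of the clusterConfig topology variable and referenced via the MachineDeployment's failureDomain field. The variable path and the names fd1, pe-cluster-1, and subnet-1 are illustrative placeholders, not values taken from this PR:

# illustrative excerpt of the edits to c2.yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
spec:
  topology:
    variables:
      - name: clusterConfig
        value:
          nutanix:
            failureDomains:              # assumed location of the new variable
              - name: fd1
                controlPlane: true       # allow control plane machines on this FD
                cluster:
                  type: name
                  name: pe-cluster-1     # placeholder Prism Element cluster
                subnets:
                  - type: name
                    name: subnet-1       # placeholder subnet
    workers:
      machineDeployments:
        - class: default-worker
          name: md-0
          failureDomain: fd1             # the "respective MD property"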

Special notes for your reviewer:

@faiq (Contributor) left a comment

looks good from a high level

type NutanixFailureDomains []NutanixFailureDomain

// NutanixFailureDomain configures failure domain information for Nutanix.
type NutanixFailureDomain struct {
Contributor

Is there some validation we want to do with PE clusters and subnets, like requiring that either this or those values are set there?

deepakm-ntnx (Contributor, Author)

CAPX handles it by first checking whether the failure domain is set: https://github.com/nutanix-cloud-native/cluster-api-provider-nutanix/blob/v1.4.0-alpha.2/controllers/nutanixmachine_controller.go#L834. We need to see whether we have to handle that in CAREN, since CAREN has no notion of the sequence in which patches get called. Do you have any proposals?

@deepakm-ntnx deepakm-ntnx force-pushed the jira/NCN-100421 branch 4 times, most recently from 98becfe to 964389e on May 22, 2024 23:07
@deepakm-ntnx deepakm-ntnx changed the title from "NCN-100421 Added FailureDomains as mutation in caren" to "feature: Added FailureDomains as mutation in caren" on May 22, 2024
@deepakm-ntnx deepakm-ntnx enabled auto-merge (squash) May 22, 2024 23:16
@deepakm-ntnx deepakm-ntnx force-pushed the jira/NCN-100421 branch 3 times, most recently from 3f6e26e to 20ed508 on May 23, 2024 19:47
deepakm-ntnx (Contributor, Author) commented May 23, 2024

Just a note, to be debugged per NCN-100947:

  • case 1: if we create a cluster with failureDomains at creation time, the VMs do get created on the respective FD.
  • case 2: if we create a cluster first without failureDomains and later add an FD with this PR's changes, then the MD's VMs are created on the respective FD. The control plane may not have moved because its PE remained the same.
  • case 3: change the PE in the FD to one other than the existing PEs used by the control plane.

(An illustrative failure-domain definition is sketched after this list; the numbered walkthroughs follow.)
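
For context, the fd1 entry that shows up in the status outputs below would come from a failure-domain definition on the generated NutanixCluster roughly like the following. The field shape follows the CAPX NutanixFailureDomain API; the PE cluster and subnet names are placeholders:

apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: NutanixCluster
spec:
  failureDomains:
    - name: fd1
      controlPlane: true       # reported as {"fd1": {"controlPlane": true}} in the Cluster status
      cluster:
        type: name
        name: pe-cluster-1     # placeholder Prism Element cluster
      subnets:
        - type: name
          name: subnet-1       # placeholder subnet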
1. Set FD directly
kubectl get Cluster nutanix-quick-start-helm-addon-cilium -o json | jq '.status.failureDomains'
{
  "fd1": {
    "controlPlane": true
  }
}

kubectl get Machine -A
NAMESPACE   NAME                                                           CLUSTER                                 NODENAME                                                       PROVIDERID                                       PHASE     AGE     VERSION
default     nutanix-quick-start-helm-addon-cilium-6n9sh-7kfcm              nutanix-quick-start-helm-addon-cilium   nutanix-quick-start-helm-addon-cilium-6n9sh-7kfcm              nutanix://02b5f9ea-9d1f-4346-ae9b-9bffbd6b1d05   Running   3m51s   v1.30.0
default     nutanix-quick-start-helm-addon-cilium-md-0-7xz9w-fdqnp-fh69s   nutanix-quick-start-helm-addon-cilium   nutanix-quick-start-helm-addon-cilium-md-0-7xz9w-fdqnp-fh69s   nutanix://cbb2869b-45b4-415c-91ce-a693d2039df4   Running   3m53s   v1.30.0


kubectl get Machine nutanix-quick-start-helm-addon-cilium-6n9sh-7kfcm -o json | jq '.spec.failureDomain'
"fd1"

kubectl get Machine nutanix-quick-start-helm-addon-cilium-md-0-7xz9w-fdqnp-fh69s -o json | jq '.spec.failureDomain'
"fd1"


2. Create the control plane without an FD, then set the FD later

kubectl get Cluster nutanix-quick-start-helm-addon-cilium -o json | jq '.status.failureDomains'
null

kubectl get Machine -A
NAMESPACE   NAME                                                           CLUSTER                                 NODENAME   PROVIDERID   PHASE          AGE   VERSION
default     nutanix-quick-start-helm-addon-cilium-md-0-wvh7p-zr4zq-8rd64   nutanix-quick-start-helm-addon-cilium                           Pending        30s   v1.30.0
default     nutanix-quick-start-helm-addon-cilium-rdcpp-wsc2v              nutanix-quick-start-helm-addon-cilium                           Provisioning   27s   v1.30.0

kubectl get Machine nutanix-quick-start-helm-addon-cilium-rdcpp-wsc2v -o json |  jq '.spec.failureDomain' 
null

kubectl get Machine nutanix-quick-start-helm-addon-cilium-md-0-wvh7p-zr4zq-8rd64 -o json |  jq '.spec.failureDomain'
null

Now set the FD on the same cluster and apply the YAML:

kubectl get Machine -A
NAMESPACE   NAME                                                           CLUSTER                                 NODENAME                                                       PROVIDERID                                       PHASE     AGE     VERSION
default     nutanix-quick-start-helm-addon-cilium-md-0-wvh7p-jjq5c-l799c   nutanix-quick-start-helm-addon-cilium   nutanix-quick-start-helm-addon-cilium-md-0-wvh7p-jjq5c-l799c   nutanix://298472a7-dfdc-4020-b7ae-4a4825cb9166   Running   119s    v1.30.0
default     nutanix-quick-start-helm-addon-cilium-rdcpp-wsc2v              nutanix-quick-start-helm-addon-cilium   nutanix-quick-start-helm-addon-cilium-rdcpp-wsc2v              nutanix://15a3e683-61b8-47d2-b250-05c7c1b857c5   Running   4m29s   v1.30.0

kubectl get Cluster nutanix-quick-start-helm-addon-cilium -o json | jq '.status.failureDomains'
{
  "fd1": {
    "controlPlane": true
  }
}

# Note: no changes to the control plane machine since it was on the same PE, I think, but no FD got set
kubectl get Machine nutanix-quick-start-helm-addon-cilium-rdcpp-wsc2v -o json |  jq '.spec.failureDomain'
null

kubectl get Machine nutanix-quick-start-helm-addon-cilium-md-0-wvh7p-jjq5c-l799c -o json |  jq '.spec.failureDomain'
"fd1"

Per the above observation, we may have to ask the user to remove cluster and subnets from the control plane NutanixMachineTemplate (currently CAPX supports that, but CAREN does not yet) by making cluster and subnets optional in the control plane machineDetails (currently they are required). A sketch of what that would look like follows.
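
To illustrate that proposal, a rough sketch of control plane machineDetails once cluster and subnets become optional, so that the failure domain supplies them instead. Field names follow CAREN's existing machineDetails convention and the values are placeholders:

controlPlane:
  nutanix:
    machineDetails:
      bootType: legacy
      vcpuSockets: 2
      vcpusPerSocket: 1
      memorySize: 4Gi
      systemDiskSize: 40Gi
      image:
        type: name
        name: ubuntu-2204-kube-v1.30.0   # placeholder image
      # cluster and subnets intentionally omitted; the PE cluster and subnet
      # would come from the selected failure domain (today CAREN requires them)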

@jimmidyson jimmidyson changed the title from "feature: Added FailureDomains as mutation in caren" to "feature: Added FailureDomains as Nutanix cluster mutation" on May 24, 2024
@jimmidyson jimmidyson changed the title from "feature: Added FailureDomains as Nutanix cluster mutation" to "feat: Added FailureDomains as Nutanix cluster mutation" on May 24, 2024
@jimmidyson jimmidyson disabled auto-merge June 24, 2024 16:55
@jimmidyson jimmidyson marked this pull request as draft June 24, 2024 16:55
@deepakm-ntnx deepakm-ntnx force-pushed the jira/NCN-100421 branch 2 times, most recently from 75eee8c to 0e5febe on July 9, 2024 20:38
deepakm-ntnx (Contributor, Author)
I would recommend approving and merging this PR so that users can enable failure domains manually in the field if they need control planes on multiple PEs in 2.12. Since it's optional, we don't have to expose it in the documentation.

@deepakm-ntnx deepakm-ntnx marked this pull request as ready for review September 6, 2024 17:37