CA: refactor utils related to NodeInfos #7479
base: master
Conversation
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: towca. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/assign @MaciekPytel
Force-pushed from f013ec0 to 5a9ff97
Force-pushed from 5a9ff97 to 3a7c44f
/assign @BigDarkClown
Force-pushed from 3a7c44f to e0024b0
Force-pushed from e0024b0 to a7eea5e
@@ -34,7 +34,7 @@ import (
	"k8s.io/autoscaler/cluster-autoscaler/core/scaledown/planner"
	scaledownstatus "k8s.io/autoscaler/cluster-autoscaler/core/scaledown/status"
	"k8s.io/autoscaler/cluster-autoscaler/core/scaleup"
	orchestrator "k8s.io/autoscaler/cluster-autoscaler/core/scaleup/orchestrator"
yikes
simulator.BuildNodeInfoForNode, core_utils.GetNodeInfoFromTemplate, and scheduler_utils.DeepCopyTemplateNode all had very similar logic for sanitizing and copying NodeInfos. They're all consolidated to one file in simulator, sharing common logic. DeepCopyNodeInfo is changed to be a framework.NodeInfo method. MixedTemplateNodeInfoProvider now correctly uses ClusterSnapshot to correlate Nodes to scheduled pods, instead of using a live Pod lister. This means that the snapshot now has to be properly initialized in a bunch of tests.
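To make the shape of the refactor concrete, here is a minimal, self-contained sketch of moving a standalone deep-copy util onto the NodeInfo wrapper as a method. The types below are hypothetical simplifications for illustration, not the actual framework.NodeInfo API (which also carries resource slices and scheduler state):

package nodeinfo

import v1 "k8s.io/api/core/v1"

// NodeInfo is a simplified stand-in for the cluster-autoscaler wrapper type.
type NodeInfo struct {
	node *v1.Node
	pods []*v1.Pod
}

// NewNodeInfo mirrors the constructor pattern discussed in the review below:
// the node is only set when it is non-nil.
func NewNodeInfo(node *v1.Node, pods ...*v1.Pod) *NodeInfo {
	result := &NodeInfo{}
	if node != nil {
		result.node = node
	}
	result.pods = append(result.pods, pods...)
	return result
}

// Node returns the wrapped node, tolerating a nil receiver.
func (n *NodeInfo) Node() *v1.Node {
	if n == nil {
		return nil
	}
	return n.node
}

// DeepCopy is the former standalone util expressed as a method, so callers can
// mutate the copy (e.g. sanitize names or taints) without touching the original.
func (n *NodeInfo) DeepCopy() *NodeInfo {
	var newPods []*v1.Pod
	for _, pod := range n.pods {
		newPods = append(newPods, pod.DeepCopy())
	}
	// The generated v1.Node DeepCopy handles a nil receiver, so this stays
	// safe even when no node was set.
	return NewNodeInfo(n.Node().DeepCopy(), newPods...)
}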
Force-pushed from a7eea5e to 89a5259
}
nodeInfo, err := simulator.BuildNodeInfoForNode(sanitizedNode, podsForNodes[node.Name], daemonsets, p.forceDaemonSets)
templateNodeInfo, caErr := simulator.TemplateNodeInfoFromExampleNodeInfo(nodeInfo, id, daemonsets, p.forceDaemonSets, taintConfig)
if err != nil {
Should this be if caErr != nil { here? (Also, do we need to define a new caErr variable here instead of just re-using err?)
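The concern above is the usual Go pitfall of assigning the second call's error to a new variable (caErr) and then checking the stale err. A tiny standalone illustration, with hypothetical helpers standing in for the two simulator calls in the diff:

package main

import (
	"errors"
	"fmt"
)

// Hypothetical stand-ins for BuildNodeInfoForNode and TemplateNodeInfoFromExampleNodeInfo.
func buildInfo() (string, error)     { return "node-info", nil }
func buildTemplate() (string, error) { return "", errors.New("template failed") }

func main() {
	info, err := buildInfo()
	template, caErr := buildTemplate()

	// Bug shape: the stale err is checked, so the template failure slips through.
	if err != nil {
		fmt.Println("never reached for the template failure")
	}

	// Suggested shape: check the error returned by the second call.
	if caErr != nil {
		fmt.Println("template error caught:", caErr)
	}
	_, _ = info, template
}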
for _, slice := range n.LocalResourceSlices {
	newSlices = append(newSlices, slice.DeepCopy())
}
return NewNodeInfo(n.Node().DeepCopy(), newSlices, newPods...)
Because the NewNodeInfo constructor only sets a node object if the passed-in node is not nil:

if node != nil {
    result.schedNodeInfo.SetNode(node)
}

... invoking n.Node().DeepCopy() inline like this might be (theoretically) subject to a nil pointer exception.
Never mind, you can ignore this comment. Node() already guards against a nil receiver:
// Node returns overall information about this node.
func (n *NodeInfo) Node() *v1.Node {
if n == nil {
return nil
}
return n.node
}
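For readers less familiar with Go, a minimal standalone example of the nil-receiver guard pattern that the snippet above relies on (the type is made up purely for illustration):

package main

import "fmt"

// box is a hypothetical type showing the same guard used by NodeInfo.Node():
// a method with a pointer receiver can be called on a nil pointer and handle
// the nil case itself.
type box struct{ value string }

func (b *box) Value() string {
	if b == nil {
		return "<nil box>"
	}
	return b.value
}

func main() {
	var b *box             // nil pointer
	fmt.Println(b.Value()) // prints "<nil box>" instead of panicking
}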
id := nodeGroup.Id()
baseNodeInfo, err := nodeGroup.TemplateNodeInfo()
if err != nil {
	return nil, errors.ToAutoscalerError(errors.CloudProviderError, err)
is this error response too generic?
// TemplateNodeInfoFromNodeGroupTemplate returns a template NodeInfo object based on NodeGroup.TemplateNodeInfo(). The template is sanitized, and only
// contains the pods that should appear on a new Node from the same node group (e.g. DaemonSet pods).
func TemplateNodeInfoFromNodeGroupTemplate(nodeGroup nodeGroupTemplateNodeInfoGetter, daemonsets []*appsv1.DaemonSet, taintConfig taints.TaintConfig) (*framework.NodeInfo, errors.AutoscalerError) {
	id := nodeGroup.Id()
I don't think we need to assign this to a var
	return TemplateNodeInfoFromExampleNodeInfo(baseNodeInfo, id, daemonsets, true, taintConfig)
}

// TemplateNodeInfoFromExampleNodeInfo returns a template NodeInfo object based on a real example NodeInfo from the cluster. The template is sanitized, and only
Not sure if I love the term "example" here. Would TemplateNodeInfoFromNode, TemplateNodeInfoFromRealNode, or TemplateNodeInfoFromRealNodeInfo work? Then we'd document it like this:

// TemplateNodeInfoFromNode returns a template NodeInfo object based on a NodeInfo from a real node on the cluster. The template is sanitized, and only

// We need to sanitize the node before determining the DS pods, since taints are checked there, and
// we might need to filter some out during sanitization.
sanitizedNode := sanitizeNodeInfo(realNode, newNodeNameBase, randSuffix, &taintConfig)
// No need to sanitize the expected pods again - they either come from sanitizedNode and were sanitized above,

etc.

I think my observation is that the word "example" suggests something non-real, a mock object, something like that.
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
This is a part of Dynamic Resource Allocation (DRA) support in Cluster Autoscaler. There were multiple very similar utils related to copying and sanitizing NodeInfos scattered around the CA codebase. Instead of adding similar DRA handling to all of them separately, they're consolidated into a single location that will be later adapted to handle DRA.
Which issue(s) this PR fixes:
The CA/DRA integration is tracked in kubernetes/kubernetes#118612, this is just part of the implementation.
Special notes for your reviewer:
The first commit in the PR is just a squash of #7466, and it shouldn't be a part of this review. The PR will be rebased on top of master after #7466 is merged.
This is intended to be a no-op refactor. It was extracted from #7350 after #7447, and #7466.
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: