🐛(metrics) Initialize metrics for autoscaler errors, scale events, and pod evictions #7449
base: master
Welcome @thiha-min-thant!
Hi @thiha-min-thant. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: thiha-min-thant The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Hi @JoelSpeed @elmiko, could you please review this PR? Thanks!
5727d50 to 906dadc
thanks for picking this up @thiha-min-thant !
…d pod evictions
- Set initial count to zero for various autoscaler error types (e.g., CloudProviderError, ApiCallError)
- Define failed scale-up reasons and initialize metrics (e.g., CloudProviderError, APIError)
- Initialize pod eviction result counters for success and failure cases
- Initialize skipped scale events for CPU and memory resource limits in both scale-up and scale-down directions
Signed-off-by: Thiha Min Thant <thihaminthant20@gmail.com>
906dadc to ffd57af
Hi @JoelSpeed and @elmiko, I've updated both the code and the PR description to clarify how the metrics are initialized, and I've added a metrics log file as a reference. Thanks for your feedback, and please take a look at these updates when you have a chance!
Do you know if the following also need initialisation?
/lgtm
this is great, will make it easier if we need to update in the future.
/lgtm
Hi @JoelSpeed and @elmiko, if the code looks good to both of you, could we proceed with the merge? We're just one /approve label away. Thanks for your review!
/assign @BigDarkClown
@BigDarkClown the bot of assignment chose you for this one, could we get this approved?
@thiha-min-thant we need one of the maintainers for this area of the code to approve it. That's the only thing we are waiting on currently.
What type of PR is this?
/kind bug
What this PR does / why we need it:
This PR initializes the failedScaleUpCount and other key metrics at startup, setting their values to zero so they appear in Prometheus even if no events have occurred. By pre-defining these metrics, we ensure comprehensive monitoring and avoid gaps in visibility, particularly for scale-up and scale-down events, error types, and pod evictions.
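As a rough illustration of why the zero initialization matters, here is a minimal stdlib-only sketch. It is not the autoscaler's code and not the Prometheus client API: `counterVec`, `initZero`, and the metric name `failed_scale_ups_total` are hypothetical stand-ins, and only the reason values come from the commit message. The model assumes that a labelled series is exposed only after its label value has been touched at least once, so adding zero at startup makes the series visible without changing its value.

```go
package main

import (
	"fmt"
	"sort"
)

// counterVec is a simplified, hypothetical model of a labelled counter
// family. A series exists only once its label value has been touched.
type counterVec struct {
	name   string
	series map[string]float64
}

func newCounterVec(name string) *counterVec {
	return &counterVec{name: name, series: make(map[string]float64)}
}

// Add increments the series for the given reason, creating it if needed.
// Add(reason, 0) therefore creates the series without changing its value.
func (c *counterVec) Add(reason string, delta float64) {
	c.series[reason] += delta
}

// Expose renders every existing series as one text line, sorted by label
// value so the output is stable.
func (c *counterVec) Expose() []string {
	reasons := make([]string, 0, len(c.series))
	for r := range c.series {
		reasons = append(reasons, r)
	}
	sort.Strings(reasons)
	lines := make([]string, 0, len(reasons))
	for _, r := range reasons {
		lines = append(lines, fmt.Sprintf("%s{reason=%q} %g", c.name, r, c.series[r]))
	}
	return lines
}

// initZero pre-creates one series per known reason at startup.
func initZero(c *counterVec, reasons []string) {
	for _, r := range reasons {
		c.Add(r, 0)
	}
}

func main() {
	failedScaleUps := newCounterVec("failed_scale_ups_total") // hypothetical name

	// Before initialization nothing is exposed, so a dashboard would show a gap.
	fmt.Println("series before init:", len(failedScaleUps.Expose()))

	// Reason values taken from the commit message; the full set is an assumption.
	initZero(failedScaleUps, []string{"CloudProviderError", "APIError"})
	for _, line := range failedScaleUps.Expose() {
		fmt.Println(line)
	}
}
```

With the series present from startup, queries and alerts that compute rates over these counters have a zero baseline to work from instead of a missing series.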
Which issue(s) this PR fixes:
Fixes #7448
Special notes for your reviewer:
Certain node metrics have not been initialized in this PR because they require runtime information. These metrics are tied to dynamic node states and cannot be set at startup.
Metrics Log Reference:
metrics.txt
Metrics requiring runtime data include:
Does this PR introduce a user-facing change?
NONE
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: