WIP: alerts: Alert when a node evicts pods for any reason #273

Open
wants to merge 1 commit into
base: master

smarterclayton
Contributor

Evictions are unusual and may be a sign of resource pressure. While
it is not a strong error signal, it is important to know when these
events occur as they may be symptomatic of workload, node, or cluster
problems. In a healthy cluster, eviction should be rare.

The kubelet_evictions metric was added in Kubernetes 1.16.

Still testing this alert
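
For readers following along, a rule of this kind might look roughly like the sketch below. It is not the rule from this PR: the alert name, the `15m` window, the `warning` severity, the grouping by `instance`, and the expression itself are assumptions chosen for illustration. Only the `kubelet_evictions` metric name comes from the description above.

```python
import json


def eviction_alert_rule(window="15m", severity="warning"):
    """Build a Prometheus alerting rule (as a dict) that fires when a
    kubelet reports any evictions within the given window.

    The alert name, window, and severity are hypothetical defaults.
    """
    return {
        "alert": "KubeletPodEvictions",  # hypothetical alert name
        # kubelet_evictions is a counter, so increase() over a window
        # gives the number of evictions that happened in that window.
        "expr": f"sum by (instance) (increase(kubelet_evictions[{window}])) > 0",
        "labels": {"severity": severity},
        "annotations": {
            # {{ $labels.instance }} is Prometheus template syntax, so it is
            # kept out of the f-string to avoid brace escaping.
            "message": "Kubelet on {{ $labels.instance }} evicted pods in the last "
            + window
            + ".",
        },
    }


# A rule group wrapping the single rule, in the shape Prometheus rule files use.
rule_group = {
    "groups": [
        {"name": "kubelet-evictions-sketch", "rules": [eviction_alert_rule()]}
    ]
}

if __name__ == "__main__":
    print(json.dumps(rule_group, indent=2))
```

Because evictions are expected to be rare but are not a strong error signal, a low severity and a `> 0` threshold seem like a reasonable starting point; both would need tuning against real clusters.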

@smarterclayton
Contributor Author

Bumped, but still testing this (surprisingly, it's hard to trigger evictions on nodes).

@csmarchbanks
Member

This might just be our systems, but evictions are actually a pretty common occurrence in many of our clusters. We purposefully have some pods around to provide a buffer that can be quickly evicted and the capacity used while waiting for the cluster autoscaler to add capacity.

To get some evictions, you could create a PriorityClass with value -1 and deploy some pods using it. Then any other deployments can easily evict one of those.
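
A minimal sketch of the low-priority buffer setup described above, generating the manifests as JSON. The object names, the pause image tag, the resource requests, and the pod count are all assumptions for illustration. Only the `PriorityClass` with value `-1` comes from the comment.

```python
import json


def buffer_priority_class(name="buffer-low-priority", value=-1):
    """PriorityClass with a negative value, so pods using it rank below
    pods that have the default priority of 0."""
    return {
        "apiVersion": "scheduling.k8s.io/v1",
        "kind": "PriorityClass",
        "metadata": {"name": name},  # hypothetical name
        "value": value,
        "globalDefault": False,
        "description": "Placeholder pods that higher-priority workloads may displace.",
    }


def buffer_pod(index, priority_class_name="buffer-low-priority"):
    """A placeholder pod that only reserves capacity via resource requests."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": f"buffer-{index}"},
        "spec": {
            "priorityClassName": priority_class_name,
            "containers": [
                {
                    "name": "pause",
                    "image": "registry.k8s.io/pause:3.9",  # assumed image tag
                    # Assumed request sizes; pick values that fill real headroom.
                    "resources": {"requests": {"cpu": "500m", "memory": "256Mi"}},
                }
            ],
        },
    }


def buffer_manifests(count=3):
    """Bundle the PriorityClass and `count` buffer pods into one List object."""
    items = [buffer_priority_class()] + [buffer_pod(i) for i in range(count)]
    return {"apiVersion": "v1", "kind": "List", "items": items}


if __name__ == "__main__":
    print(json.dumps(buffer_manifests(), indent=2))
```

The output can be piped to `kubectl apply -f -`, which accepts JSON as well as YAML.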

@smarterclayton
Contributor Author

That would be preemption (which we should also alert on), but eviction is more like "You use too much local filesystem". Priority class impacts eviction, but less so than preemption.


This PR has been automatically marked as stale because it has not
had any activity in the past 30 days.

The next time this stale check runs, the stale label will be
removed if there is new activity. The issue will be closed in 7
days if there is no new activity.

Thank you for your contributions!

@github-actions github-actions bot added the stale label Nov 21, 2024