
[8.x](backport #41570) Metricbeat: add configurable failure threshold before reporting streams as degraded #41685

Merged: pchila merged 1 commit into 8.x from mergify/bp/8.x/pr-41570 on Nov 20, 2024


mergify[bot] (Contributor) commented on Nov 19, 2024

Proposed commit message

Add configurable failure threshold before reporting streams as degraded

With this change, it is possible to configure a threshold for the number of consecutive errors that may occur while fetching metrics for a given stream before the stream is marked as DEGRADED.
To configure such a threshold, add "failure_threshold": <n> to a module configuration block.
Depending on the value of <n>, the threshold behaves as follows:

  • n == 0: status reporting for the stream is disabled; the stream never becomes DEGRADED, no matter how many errors are encountered while fetching metrics
  • n == 1 or failure_threshold not specified: backward-compatible behavior; the stream becomes DEGRADED at the first error encountered
  • n > 1: the stream becomes DEGRADED after at least n consecutive errors have been encountered

When a fetch operation completes without errors, the consecutive-error counter is reset and the stream is set to HEALTHY.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • [ ] I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • [ ] I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Disruptive User Impact

No disruptive user impact: when the new configuration key is not specified, the previous behavior is preserved.


This is an automatic backport of pull request #41570 done by [Mergify](https://mergify.com).


(cherry picked from commit f84c05b)
mergify bot requested a review from a team as a code owner on November 19, 2024 13:34
mergify bot added the backport label on Nov 19, 2024
mergify bot requested review from belimawr and VihasMakwana and removed request for a team on November 19, 2024 13:34
mergify bot assigned pchila on Nov 19, 2024
botelastic bot added the needs_team label (Indicates that the issue/PR needs a Team:* label) on Nov 19, 2024
pierrehilbert added the Team:Elastic-Agent-Data-Plane label (Label for the Agent Data Plane team) on Nov 19, 2024
botelastic bot removed the needs_team label on Nov 19, 2024
elasticmachine (Collaborator) commented:

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

pierrehilbert added the Team:Elastic-Agent-Control-Plane label (Label for the Agent Control Plane team) and removed the Team:Elastic-Agent-Data-Plane label on Nov 19, 2024
elasticmachine (Collaborator) commented:

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

pierrehilbert requested review from pchila and removed request for belimawr and VihasMakwana on November 19, 2024 17:06
pchila merged commit db727d0 into 8.x on Nov 20, 2024 (31 checks passed)
pchila deleted the mergify/bp/8.x/pr-41570 branch on November 20, 2024 17:17
Labels: backport, Team:Elastic-Agent-Control-Plane (Label for the Agent Control Plane team)
3 participants