
Fix: Refactor ServiceMonitor template to avoid duplicates across releases #742

Open · wants to merge 2 commits into master

Conversation

crackerben99

Description:

This pull request refactors the ServiceMonitor template in the APISIX Helm Chart to ensure that only one ServiceMonitor is created per namespace, even if there are multiple releases of the Chart.

Problem:
Currently, each release of the APISIX Helm Chart creates its own ServiceMonitor. This can lead to duplicate ServiceMonitors and unnecessary overhead in Prometheus when there are multiple releases of the Chart in the same namespace.

Solution:
We've utilized Helm's hook mechanism to ensure that only one ServiceMonitor is created per namespace, regardless of the number of releases.

Changes:

- Removed release-specific labels and selectors from the ServiceMonitor template. The ServiceMonitor now selects all Services labeled app.kubernetes.io/name: {{ include "apisix.name" . }} and app.kubernetes.io/service: apisix-gateway, regardless of the release (a sketch of the resulting template follows this list).
- Added pre-install and post-install hooks to the ServiceMonitor. Before a new ServiceMonitor is created, these hooks check whether one named apisix-service-monitor already exists; if it does, it is deleted first, so there is always exactly one ServiceMonitor.
- Set the hook deletion policy to before-hook-creation and hook-succeeded. This means the hook resource is deleted before a new one is created (if one already exists), and again after the ServiceMonitor has been created successfully.
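For reference, here is a minimal sketch of what the refactored template could look like based on the description above. The fixed name apisix-service-monitor, the hook annotations, and the label selectors come from this PR's text; the endpoint port and interval mirror the generated job shown in the comment below, and the actual committed template may differ:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  # Fixed, release-independent name so every release renders the same object.
  name: apisix-service-monitor
  namespace: {{ .Release.Namespace }}
  annotations:
    # Hook configuration as described in this PR.
    "helm.sh/hook": pre-install,post-install
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
  selector:
    matchLabels:
      # Release-agnostic selectors: no app.kubernetes.io/instance label.
      app.kubernetes.io/name: {{ include "apisix.name" . }}
      app.kubernetes.io/service: apisix-gateway
  endpoints:
  - targetPort: prometheus   # matches the container port name seen in the generated job
    interval: 15s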
Implications:

- All releases of the APISIX Helm Chart in the same namespace share the same ServiceMonitor. This is suitable when all releases can use the same ServiceMonitor configuration.
- Uninstalling a release does not delete the ServiceMonitor, since it may still be used by other releases.
- If the ServiceMonitor is deleted manually, upgrading any release will recreate it, but other releases might need a manual upgrade to reconnect to the new ServiceMonitor.
- This approach works well in many cases, especially when multiple releases can all share the same ServiceMonitor configuration. For more complex scenarios, other strategies such as using relabeling in the Prometheus configuration might be considered (a relabeling sketch follows this list).
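As a hypothetical illustration of that last point (not part of this PR): if releases need to stay distinguishable while sharing one ServiceMonitor, a relabeling rule on the endpoint could copy the per-release instance label onto the scraped series. The target label name release is an arbitrary choice here:

spec:
  endpoints:
  - targetPort: prometheus
    interval: 15s
    relabelings:
    # Copy the Service's app.kubernetes.io/instance label (the Helm release name)
    # into a "release" label on every scraped target.
    - sourceLabels:
      - __meta_kubernetes_service_label_app_kubernetes_io_instance
      targetLabel: release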

@crackerben99
Author

crackerben99 commented Mar 26, 2024

If we deploy multiple Helm releases in one cluster, the Prometheus Operator will create a large number of scrape jobs, which look like this:

- job_name: serviceMonitor/sa/release-yl363akodi-apisix-l7/0
  honor_labels: false
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
      - sa
  scrape_interval: 15s
  metrics_path: /metrics
  scheme: http
  relabel_configs:
  - source_labels:
    - job
    target_label: __tmp_prometheus_job_name
  - action: keep
    source_labels:
    - __meta_kubernetes_service_label_app_kubernetes_io_instance
    - __meta_kubernetes_service_labelpresent_app_kubernetes_io_instance
    regex: (release-yl363akodi);true
  - action: keep
    source_labels:
    - __meta_kubernetes_service_label_app_kubernetes_io_managed_by
    - __meta_kubernetes_service_labelpresent_app_kubernetes_io_managed_by
    regex: (Helm);true
  - action: keep
    source_labels:
    - __meta_kubernetes_service_label_app_kubernetes_io_name
    - __meta_kubernetes_service_labelpresent_app_kubernetes_io_name
    regex: (apisix-l7);true
  - action: keep
    source_labels:
    - __meta_kubernetes_service_label_app_kubernetes_io_service
    - __meta_kubernetes_service_labelpresent_app_kubernetes_io_service
    regex: (apisix-gateway);true
  - action: keep
    source_labels:
    - __meta_kubernetes_service_label_app_kubernetes_io_version
    - __meta_kubernetes_service_labelpresent_app_kubernetes_io_version
    regex: (2.15.3);true
  - action: keep
    source_labels:
    - __meta_kubernetes_service_label_helm_sh_chart
    - __meta_kubernetes_service_labelpresent_helm_sh_chart
    regex: (apisix-l7-1.0.1);true
  - action: keep
    source_labels:
    - __meta_kubernetes_service_label_lb_id
    - __meta_kubernetes_service_labelpresent_lb_id
    regex: (yl363akodi);true
  - action: keep
    source_labels:
    - __meta_kubernetes_pod_container_port_name
    regex: prometheus
  - source_labels:
    - __meta_kubernetes_endpoint_address_target_kind
    - __meta_kubernetes_endpoint_address_target_name
    separator: ;
    regex: Node;(.*)
    replacement: ${1}
    target_label: node
  - source_labels:
    - __meta_kubernetes_endpoint_address_target_kind
    - __meta_kubernetes_endpoint_address_target_name
    separator: ;
    regex: Pod;(.*)
    replacement: ${1}
    target_label: pod
  - source_labels:
    - __meta_kubernetes_namespace
    target_label: namespace
  - source_labels:
    - __meta_kubernetes_service_name
    target_label: service
  - source_labels:
    - __meta_kubernetes_pod_name
    target_label: pod
  - source_labels:
    - __meta_kubernetes_pod_container_name
    target_label: container
  - source_labels:
    - __meta_kubernetes_service_name
    target_label: job
    replacement: ${1}
  - target_label: endpoint
    replacement: prometheus
  - source_labels:
    - __address__
    target_label: __tmp_hash
    modulus: 1
    action: hashmod
  - source_labels:
    - __tmp_hash
    regex: 0
    action: keep
  metric_relabel_configs: []

Each job scrapes all the resources in the same namespace and then relabels to select its own targets. This leads to a significant amount of redundant work for Prometheus: for example, if there are 100 releases in the same namespace, Prometheus scrapes the same set of resources 100 times, and each job then filters out its own targets through relabeling.
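For comparison, with a single shared ServiceMonitor as proposed here, only the release-agnostic keep rules would remain, so each namespace would get one job roughly of this shape (a sketch; the exact output generated by the Prometheus Operator may differ):

- job_name: serviceMonitor/sa/apisix-service-monitor/0
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
      - sa
  scrape_interval: 15s
  relabel_configs:
  # Only the release-agnostic selectors from the shared ServiceMonitor remain.
  - action: keep
    source_labels:
    - __meta_kubernetes_service_label_app_kubernetes_io_name
    - __meta_kubernetes_service_labelpresent_app_kubernetes_io_name
    regex: (apisix-l7);true
  - action: keep
    source_labels:
    - __meta_kubernetes_service_label_app_kubernetes_io_service
    - __meta_kubernetes_service_labelpresent_app_kubernetes_io_service
    regex: (apisix-gateway);true
  - action: keep
    source_labels:
    - __meta_kubernetes_pod_container_port_name
    regex: prometheus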

@Revolyssup
Contributor

@crackerben99 Can you update the Values.yaml and add docs?
