Fix: Refactor ServiceMonitor template to avoid duplicates across releases #742

crackerben99 · 2024-03-26T08:41:01Z

Description:

This pull request refactors the ServiceMonitor template in the APISIX Helm Chart to ensure that only one ServiceMonitor is created per namespace, even if there are multiple releases of the Chart.

Problem:
Currently, each release of the APISIX Helm Chart creates its own ServiceMonitor. This can lead to duplicate ServiceMonitors and unnecessary overhead in Prometheus when there are multiple releases of the Chart in the same namespace.

Solution:
We've utilized Helm's hook mechanism to ensure that only one ServiceMonitor is created per namespace, regardless of the number of releases.

Changes:

Removed release-specific labels and selectors from the ServiceMonitor template. The ServiceMonitor now selects all services with the labels app.kubernetes.io/name: {{ include "apisix.name" . }} and app.kubernetes.io/service: apisix-gateway, regardless of the release.
Annotated the ServiceMonitor as a pre-install and post-install hook. If a ServiceMonitor named apisix-service-monitor already exists when the hook runs, it is deleted before the new one is created. This ensures that there is only ever one ServiceMonitor.
Set the hook deletion policy to before-hook-creation and hook-succeeded. This means that the hook resource is deleted before a new one is created (if one already exists), and it's also deleted after the ServiceMonitor is successfully created.
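
For reference, the changes above could be sketched roughly as follows. This is an illustration rather than the exact template from this PR: the annotations, the name apisix-service-monitor, and the two selector labels come from the description, while the namespaceSelector block and the endpoint settings (port name, path, interval) are assumptions based on the generated scrape job quoted later in this thread.

```yaml
# Sketch of templates/service-monitor.yaml (assumed layout, not the PR diff).
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  # Fixed name shared by every release in the namespace.
  name: apisix-service-monitor
  namespace: {{ .Release.Namespace }}
  annotations:
    # Treat the ServiceMonitor as a hook resource.
    "helm.sh/hook": pre-install,post-install
    # Delete any previous copy before creating it again.
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
  namespaceSelector:
    matchNames:
      - {{ .Release.Namespace }}
  selector:
    # No release-specific labels: every release's gateway Service matches.
    matchLabels:
      app.kubernetes.io/name: {{ include "apisix.name" . }}
      app.kubernetes.io/service: apisix-gateway
  endpoints:
    # Assumed values, mirroring the generated job shown in this thread.
    - port: prometheus
      path: /metrics
      interval: 15s
```

One point worth verifying when testing this: Helm's documentation describes hook-succeeded as deleting the hook resource after the hook has executed successfully, and here the hook resource is the ServiceMonitor itself.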

Implications:

All releases of the APISIX Helm Chart in the same namespace will share the same ServiceMonitor. This is suitable when all releases can share the same ServiceMonitor configuration.
If a release is uninstalled, it won't delete the ServiceMonitor, as it may still be used by other releases.
If the ServiceMonitor is manually deleted, upgrading a release will recreate it, but other releases might need a manual upgrade to reconnect to the new ServiceMonitor.
This approach can work well in many cases, especially when you have multiple releases that can all share the same ServiceMonitor configuration. For more complex scenarios, other strategies like using relabeling in the Prometheus configuration to handle different ServiceMonitors might be considered.

crackerben99 · 2024-03-26T08:44:01Z

If we deploy multiple Helm releases in one cluster, the Prometheus Operator creates a large number of scrape jobs, one per release. Each one looks like this:

- job_name: serviceMonitor/sa/release-yl363akodi-apisix-l7/0
  honor_labels: false
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
      - sa
  scrape_interval: 15s
  metrics_path: /metrics
  scheme: http
  relabel_configs:
  - source_labels:
    - job
    target_label: __tmp_prometheus_job_name
  - action: keep
    source_labels:
    - __meta_kubernetes_service_label_app_kubernetes_io_instance
    - __meta_kubernetes_service_labelpresent_app_kubernetes_io_instance
    regex: (release-yl363akodi);true
  - action: keep
    source_labels:
    - __meta_kubernetes_service_label_app_kubernetes_io_managed_by
    - __meta_kubernetes_service_labelpresent_app_kubernetes_io_managed_by
    regex: (Helm);true
  - action: keep
    source_labels:
    - __meta_kubernetes_service_label_app_kubernetes_io_name
    - __meta_kubernetes_service_labelpresent_app_kubernetes_io_name
    regex: (apisix-l7);true
  - action: keep
    source_labels:
    - __meta_kubernetes_service_label_app_kubernetes_io_service
    - __meta_kubernetes_service_labelpresent_app_kubernetes_io_service
    regex: (apisix-gateway);true
  - action: keep
    source_labels:
    - __meta_kubernetes_service_label_app_kubernetes_io_version
    - __meta_kubernetes_service_labelpresent_app_kubernetes_io_version
    regex: (2.15.3);true
  - action: keep
    source_labels:
    - __meta_kubernetes_service_label_helm_sh_chart
    - __meta_kubernetes_service_labelpresent_helm_sh_chart
    regex: (apisix-l7-1.0.1);true
  - action: keep
    source_labels:
    - __meta_kubernetes_service_label_lb_id
    - __meta_kubernetes_service_labelpresent_lb_id
    regex: (yl363akodi);true
  - action: keep
    source_labels:
    - __meta_kubernetes_pod_container_port_name
    regex: prometheus
  - source_labels:
    - __meta_kubernetes_endpoint_address_target_kind
    - __meta_kubernetes_endpoint_address_target_name
    separator: ;
    regex: Node;(.*)
    replacement: ${1}
    target_label: node
  - source_labels:
    - __meta_kubernetes_endpoint_address_target_kind
    - __meta_kubernetes_endpoint_address_target_name
    separator: ;
    regex: Pod;(.*)
    replacement: ${1}
    target_label: pod
  - source_labels:
    - __meta_kubernetes_namespace
    target_label: namespace
  - source_labels:
    - __meta_kubernetes_service_name
    target_label: service
  - source_labels:
    - __meta_kubernetes_pod_name
    target_label: pod
  - source_labels:
    - __meta_kubernetes_pod_container_name
    target_label: container
  - source_labels:
    - __meta_kubernetes_service_name
    target_label: job
    replacement: ${1}
  - target_label: endpoint
    replacement: prometheus
  - source_labels:
    - __address__
    target_label: __tmp_hash
    modulus: 1
    action: hashmod
  - source_labels:
    - __tmp_hash
    regex: 0
    action: keep
  metric_relabel_configs: []

Each job discovers every endpoint in the namespace and then uses relabeling to keep only its own targets. This creates a significant amount of redundant work for Prometheus. For example, with 100 releases in the same namespace, Prometheus processes the same set of discovered endpoints 100 times, and each job then drops everything except its own release's targets during relabeling.
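
With the shared ServiceMonitor, the expectation is that the operator generates a single job per namespace instead. The fragment below is a hand-written sketch of what that job might look like, not output captured from a cluster. The job name assumes the serviceMonitor/<namespace>/<name>/<index> pattern seen above together with the name apisix-service-monitor from the description, and the remaining relabel rules (node, pod, namespace, service, job, hashmod) are assumed to be unchanged, so they are omitted.

```yaml
# Hypothetical generated job for the shared ServiceMonitor (illustration only).
- job_name: serviceMonitor/sa/apisix-service-monitor/0
  honor_labels: false
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
      - sa
  scrape_interval: 15s
  metrics_path: /metrics
  scheme: http
  relabel_configs:
  # Only the two release-independent labels are still matched.
  - action: keep
    source_labels:
    - __meta_kubernetes_service_label_app_kubernetes_io_name
    - __meta_kubernetes_service_labelpresent_app_kubernetes_io_name
    regex: (apisix-l7);true
  - action: keep
    source_labels:
    - __meta_kubernetes_service_label_app_kubernetes_io_service
    - __meta_kubernetes_service_labelpresent_app_kubernetes_io_service
    regex: (apisix-gateway);true
  - action: keep
    source_labels:
    - __meta_kubernetes_pod_container_port_name
    regex: prometheus
  # ... remaining relabel rules as in the per-release job above ...
```

Because the job and service labels are still derived from __meta_kubernetes_service_name, as in the per-release job, metrics from different releases should remain distinguishable as long as each release's Service has its own name.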

Revolyssup · 2024-07-08T07:55:34Z

@crackerben99 Can you update the Values.yaml and add docs?

Update service-monitor.yaml with annotations for Helm hooks

4facac2

Update service-monitor.yaml configuration

4762603
