commit a725040 (1 parent: b5c9125)
Showing 5 changed files with 109 additions and 165 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,29 +1,10 @@ | ||
# --- Stage 1: Testing --- | ||
# Use the same base image for consistency | ||
FROM python:3.11.6-alpine3.18 as tester | ||
|
||
# Set working directory | ||
WORKDIR /app | ||
|
||
# Install dependencies | ||
COPY requirements.txt /app/ | ||
RUN pip install --upgrade pip && pip install -r requirements.txt && pip install pytest | ||
|
||
# Copy your application code and test files | ||
COPY monitoring_script.py /app/ | ||
COPY test_monitoring_script.py /app/ | ||
|
||
# Run tests | ||
RUN python -u -m pytest -v test_monitoring_script.py | ||
|
||
# --- Stage 2: Final Image --- | ||
FROM python:3.11.6-alpine3.18 as final | ||
|
||
WORKDIR /app | ||
COPY --from=tester /app/requirements.txt /app/ | ||
COPY requirements.txt /app/ | ||
RUN pip install --upgrade pip && pip install -r requirements.txt | ||
|
||
# Copy only the necessary files from the tester stage | ||
COPY --from=tester /app/monitoring_script.py /app/ | ||
COPY monitoring_script.py /app/ | ||
COPY serviceconfig.py /app/ | ||
|
||
CMD [ "python", "-u", "monitoring_script.py" ] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,19 +1,32 @@ | ||
# Cluster Monitoring | ||
# GLUEOPS CLUSTER MONITORING | ||
|
||
This repo contains a script that sends a ping to Opsgenie's Heartbeat every 5 minutes. If the cluster is down, Opsgenie will not get a ping and will send a mail to inform team members of the cluster's failed state. | ||
This application is designed for monitoring a Kubernetes cluster with the Prometheus and Alertmanager components from the Kubernetes Prometheus Stack (KPS). | ||
|
||
The script is deployed into the ArgoCD cluster under monitoring. Once this cluster is down, pings will not be sent to Opsgenie, triggering an alert which is sent to concerned team members. | ||
## Configuration | ||
|
||
## Running the script | ||
Before running the application, make sure to configure the following environment variables: | ||
|
||
- Create a ```.env``` file, with the following contents | ||
```bash | ||
OPSGENIE_API_KEY=<some-value> | ||
HEARTBEAT_NAME=<some-value> | ||
PING_SLEEP=<some-value> | ||
``` | ||
- `OPSGENIE_API_KEY`: Your Opsgenie API key for sending heartbeat notifications. | ||
- `OPSGENIE_HEARTBEAT_NAME`: The name of the Opsgenie heartbeat to ping. | ||
- `OPSGENIE_PING_INTERVAL_MINUTES`: The interval (in minutes) between pinging the Opsgenie heartbeat (default: 2 minutes). | ||
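
As a rough illustration of how the three variables listed above might be read and validated at startup, here is a minimal Python sketch. The `OpsgenieSettings` dataclass and the `load_opsgenie_settings` helper are hypothetical names, not part of this commit. The fallback of 3 minutes mirrors `serviceconfig.py` in this commit, whereas the README text states 2 minutes, so treat the default as an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class OpsgenieSettings:
    """Hypothetical container for the three documented variables."""
    api_key: str
    heartbeat_name: str
    ping_interval_minutes: int


def load_opsgenie_settings(env: Optional[Mapping[str, str]] = None) -> OpsgenieSettings:
    """Read the documented variables and fail fast if one is missing or invalid."""
    env = os.environ if env is None else env

    # The API key and heartbeat name have no sensible default, so require them.
    required = ("OPSGENIE_API_KEY", "OPSGENIE_HEARTBEAT_NAME")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise ValueError(f"missing required environment variables: {', '.join(missing)}")

    # Assumption: fall back to 3 minutes as serviceconfig.py does (the README says 2).
    interval = int(env.get("OPSGENIE_PING_INTERVAL_MINUTES", "3"))
    if interval <= 0:
        raise ValueError("OPSGENIE_PING_INTERVAL_MINUTES must be a positive integer")

    return OpsgenieSettings(
        api_key=env["OPSGENIE_API_KEY"],
        heartbeat_name=env["OPSGENIE_HEARTBEAT_NAME"],
        ping_interval_minutes=interval,
    )
```

Calling `load_opsgenie_settings()` with no argument reads from `os.environ`; passing a plain dict makes the helper easy to exercise in tests without touching the process environment.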
|
||
- Runing the script | ||
```bash | ||
$ docker run --env-file .env ghcr.io/glueops/cluster-monitoring:main | ||
``` | ||
## Running in a Kubernetes Cluster | ||
|
||
To run this application within a Kubernetes cluster, follow these steps: | ||
|
||
1. Ensure your Kubernetes cluster is up and running. | ||
2. Deploy the application with the configured environment variables. | ||
3. The application will automatically detect its environment and use in-cluster URLs for Prometheus and Alertmanager. | ||
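
The environment detection in step 3 can be sketched roughly as follows. The `KUBERNETES_SERVICE_HOST` check, the service hostnames, the ports, and the `/-/healthy` path are taken from `serviceconfig.py` in this commit; the `resolve_base_urls` and `probe` helpers are hypothetical names that use only the Python standard library, and this is not necessarily how `monitoring_script.py` performs its checks.

```python
import os
import urllib.error
import urllib.request
from typing import Dict, Mapping, Optional

# Namespace suffix as it appears in serviceconfig.py.
IN_CLUSTER_SUFFIX = "glueops-core-kube-prometheus-stack.svc.cluster.local"


def resolve_base_urls(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Pick in-cluster service URLs when running in a pod, localhost otherwise."""
    env = os.environ if env is None else env
    # Kubernetes injects KUBERNETES_SERVICE_HOST into every pod's environment.
    if env.get("KUBERNETES_SERVICE_HOST"):
        return {
            "prometheus": f"http://kps-prometheus.{IN_CLUSTER_SUFFIX}:9090",
            "alertmanager": f"http://kps-alertmanager.{IN_CLUSTER_SUFFIX}:9093",
        }
    # Local debugging: assumes port-forwards to 9090 and 9093 are active.
    return {
        "prometheus": "http://localhost:9090",
        "alertmanager": "http://localhost:9093",
    }


def probe(url: str, timeout: float = 5.0) -> bool:
    """Return True on HTTP 200, False on an HTTP error status or a connection error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False
```

A caller could then run `probe(resolve_base_urls()["prometheus"] + "/-/healthy")` and treat a `False` result as a reason to skip the heartbeat ping.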
|
||
## Running Locally for Debugging | ||
|
||
To run this application locally for debugging purposes and access Prometheus and Alertmanager, you can set up port forwarding to your cluster. Follow these steps: | ||
|
||
1. Ensure you have `kubectl` installed and configured to communicate with your Kubernetes cluster. | ||
2. Identify the Prometheus and Alertmanager pods in your cluster: | ||
|
||
```bash | ||
# For Prometheus | ||
kubectl port-forward svc/kps-prometheus 9090:9090 -n glueops-core-kube-prometheus-stack | ||
# For Alertmanager | ||
kubectl port-forward svc/kps-alertmanager 9093:9093 -n glueops-core-kube-prometheus-stack |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,34 @@ | ||
import os | ||
|
||
class ServiceConfig: | ||
def __init__(self): | ||
if os.getenv('KUBERNETES_SERVICE_HOST'): | ||
print("Setting up for Kubernetes environment.") | ||
self._setup_kubernetes_config() | ||
else: | ||
print("Setting up for local environment.") | ||
self._setup_local_config() | ||
|
||
# New environment variable settings | ||
self.OPSGENIE_API_KEY = os.getenv('OPSGENIE_API_KEY') | ||
self.OPSGENIE_HEARTBEAT_NAME = os.getenv('OPSGENIE_HEARTBEAT_NAME') | ||
self.OPSGENIE_PING_INTERVAL_MINUTES = int(os.getenv('OPSGENIE_PING_INTERVAL_MINUTES', 3)) | ||
|
||
|
||
def _setup_kubernetes_config(self): | ||
suffix = "glueops-core-kube-prometheus-stack.svc.cluster.local" | ||
self.prometheus = f"kps-prometheus.{suffix}:9090" | ||
self.alertmanager = f"kps-alertmanager.{suffix}:9093" | ||
self._set_urls() | ||
|
||
def _setup_local_config(self): | ||
self.prometheus = "localhost:9090" | ||
self.alertmanager = "localhost:9093" | ||
self._set_urls() | ||
|
||
def _set_urls(self): | ||
self.PROMETHEUS_URL_HEALTH = f"http://{self.prometheus}/-/healthy" | ||
self.ALERTMANAGER_URL_HEALTH = f"http://{self.alertmanager}/-/healthy" | ||
self.PROMETHEUS_URL_READY = f"http://{self.prometheus}/-/ready" | ||
self.ALERTMANAGER_URL_READY = f"http://{self.alertmanager}/-/ready" | ||
self.PROMETHEUS_QUERY_URL = f"http://{self.prometheus}/api/v1/query" |
This file was deleted.