Feat/updating heart beat check #7

Merged: 6 commits merged on Nov 10, 2023
4 changes: 2 additions & 2 deletions .github/workflows/main.yaml
@@ -1,4 +1,4 @@
-name: GlueOps Action
+name: Build Image
 
 on: [push]
 
@@ -7,4 +7,4 @@ jobs:
 runs-on: ubuntu-latest
 steps:
 - name: Build, Tag and Push Docker Image to GHCR
-uses: GlueOps/github-actions-build-push-containers@v0.1.3
+uses: GlueOps/github-actions-build-push-containers@main
160 changes: 159 additions & 1 deletion .gitignore
@@ -1,2 +1,160 @@
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+
+# C extensions
+*.so
+
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+share/python-wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+
+# PyInstaller
+# Usually these files are written by a python script from a template
+# before PyInstaller builds the exe, so as to inject date/other infos into it.
+*.manifest
+*.spec
+
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.nox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*.cover
+*.py,cover
+.hypothesis/
+.pytest_cache/
+cover/
+
+# Translations
+*.mo
+*.pot
+
+# Django stuff:
+*.log
+local_settings.py
+db.sqlite3
+db.sqlite3-journal
+
+# Flask stuff:
+instance/
+.webassets-cache
+
+# Scrapy stuff:
+.scrapy
+
+# Sphinx documentation
+docs/_build/
+
+# PyBuilder
+.pybuilder/
+target/
+
+# Jupyter Notebook
+.ipynb_checkpoints
+
+# IPython
+profile_default/
+ipython_config.py
+
+# pyenv
+# For a library or package, you might want to ignore these files since the code is
+# intended to run in multiple environments; otherwise, check them in:
+# .python-version
+
+# pipenv
+# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
+# However, in case of collaboration, if having platform-specific dependencies or dependencies
+# having no cross-platform support, pipenv may install dependencies that don't work, or not
+# install all needed dependencies.
+#Pipfile.lock
+
+# poetry
+# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
+# This is especially recommended for binary packages to ensure reproducibility, and is more
+# commonly ignored for libraries.
+# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
+#poetry.lock
+
+# pdm
+# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
+#pdm.lock
+# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
+# in version control.
+# https://pdm.fming.dev/#use-with-ide
+.pdm.toml
+
+# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
+__pypackages__/
+
+# Celery stuff
+celerybeat-schedule
+celerybeat.pid
+
+# SageMath parsed files
+*.sage.py
+
+# Environments
 .env
-__pycache__
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+
+# Spyder project settings
+.spyderproject
+.spyproject
+
+# Rope project settings
+.ropeproject
+
+# mkdocs documentation
+/site
+
+# mypy
+.mypy_cache/
+.dmypy.json
+dmypy.json
+
+# Pyre type checker
+.pyre/
+
+# pytype static type analyzer
+.pytype/
+
+# Cython debug symbols
+cython_debug/
+
+# PyCharm
+# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
+# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
+# and can be added to the global gitignore or merged into this file. For a more nuclear
+# option (not recommended) you can uncomment the following to ignore the entire idea folder.
+#.idea/
16 changes: 6 additions & 10 deletions Dockerfile
@@ -1,14 +1,10 @@
-# Use an official Python runtime as the parent image
-FROM python:3.11.4-bullseye
+FROM python:3.11.6-alpine3.18 as final
 
-# Set the working directory in the container to /app
 WORKDIR /app
+COPY requirements.txt /app/
+RUN pip install --upgrade pip && pip install -r requirements.txt
 
-# Copy the current directory contents into the container at /app
-COPY . /app
+COPY monitoring_script.py /app/
+COPY serviceconfig.py /app/
 
-# Install any needed packages specified in requirements.txt
-RUN pip install --no-cache-dir -r requirements.txt
-
-# Run main.py when the container launches
-CMD ["python", "-u", "main.py"]
+CMD [ "python", "-u", "monitoring_script.py" ]
41 changes: 27 additions & 14 deletions README.md
@@ -1,19 +1,32 @@
-# Cluster Monitoring
+# GLUEOPS CLUSTER MONITORING
 
-This repo contains a script that sends a ping to Opsgenie's Heartbeat every 5 minutes. If the cluster is down, Opsgenie will not get a ping and will send a mail to inform team members of the cluster's failed state.
+This application is designed for monitoring a Kubernetes cluster with the Prometheus and Alertmanager components from the Kubernetes Prometheus Stack (KPS).
 
-The script is deployed into the ArgoCD cluster under monitoring. Once this cluster is down, pings will not be sent to Opsgenie, triggering an alert which is sent to concerned team members.
+## Configuration
 
-## Running the script
+Before running the application, make sure to configure the following environment variables:
 
-- Create a ```.env``` file, with the following contents
-```bash
-OPSGENIE_API_KEY=<some-value>
-HEARTBEAT_NAME=<some-value>
-PING_SLEEP=<some-value>
-```
+- `OPSGENIE_API_KEY`: Your Opsgenie API key for sending heartbeat notifications.
+- `OPSGENIE_HEARTBEAT_NAME`: The name of the Opsgenie heartbeat to ping.
+- `OPSGENIE_PING_INTERVAL_MINUTES`: The interval (in minutes) between pinging the Opsgenie heartbeat (default: 2 minutes).
 
-- Runing the script
-```bash
-$ docker run --env-file .env ghcr.io/glueops/cluster-monitoring:main
-```
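Review note: a minimal sketch of how the three variables from the new README's Configuration section could be read and turned into a heartbeat ping request. `monitoring_script.py` is not shown in this diff, so `HeartbeatConfig`, `load_config`, and `build_ping_request` are hypothetical names. Treating the API key and heartbeat name as required is an assumption, and the endpoint and `GenieKey` header follow Opsgenie's public Heartbeat API rather than anything in this PR.

```python
import os
import urllib.parse
import urllib.request
from dataclasses import dataclass

# Assumption: Opsgenie's public Heartbeat API base URL (not taken from this PR).
OPSGENIE_HEARTBEATS_URL = "https://api.opsgenie.com/v2/heartbeats"


@dataclass(frozen=True)
class HeartbeatConfig:
    api_key: str
    heartbeat_name: str
    ping_interval_minutes: float


def load_config(env=None) -> HeartbeatConfig:
    """Read the README variables from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    required = ("OPSGENIE_API_KEY", "OPSGENIE_HEARTBEAT_NAME")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    # The README documents a default of 2 minutes for the ping interval.
    interval = float(env.get("OPSGENIE_PING_INTERVAL_MINUTES", "2"))
    if interval <= 0:
        raise ValueError("OPSGENIE_PING_INTERVAL_MINUTES must be positive")
    return HeartbeatConfig(
        api_key=env["OPSGENIE_API_KEY"],
        heartbeat_name=env["OPSGENIE_HEARTBEAT_NAME"],
        ping_interval_minutes=interval,
    )


def build_ping_request(cfg: HeartbeatConfig) -> urllib.request.Request:
    """Build (but do not send) the ping request for the configured heartbeat."""
    name = urllib.parse.quote(cfg.heartbeat_name, safe="")
    return urllib.request.Request(
        f"{OPSGENIE_HEARTBEATS_URL}/{name}/ping",
        headers={"Authorization": f"GenieKey {cfg.api_key}"},
    )
```

Because the interval is expressed in minutes, a ping loop built on this would sleep for `cfg.ping_interval_minutes * 60` seconds between calls to `urllib.request.urlopen(build_ping_request(cfg))`.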
+## Running in a Kubernetes Cluster
+
+To run this application within a Kubernetes cluster, follow these steps:
+
+1. Ensure your Kubernetes cluster is up and running.
+2. Deploy the application with the configured environment variables.
+3. The application will automatically detect its environment and use in-cluster URLs for Prometheus and Alertmanager.
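Review note: the diff does not show how the application detects its environment, so the following is only a sketch of one common approach. It assumes detection via the `KUBERNETES_SERVICE_HOST` variable that Kubernetes injects into every pod, and it assumes the default `svc.cluster.local` cluster domain. The service names, namespace, and ports are taken from the port-forward commands in the README; `running_in_cluster` and `service_urls` are hypothetical helpers.

```python
import os

# Taken from the README's port-forward commands.
NAMESPACE = "glueops-core-kube-prometheus-stack"
PROMETHEUS_SERVICE = "kps-prometheus"
ALERTMANAGER_SERVICE = "kps-alertmanager"


def running_in_cluster(env=None) -> bool:
    """Return True when the process appears to run inside a Kubernetes pod."""
    env = os.environ if env is None else env
    # Kubernetes sets KUBERNETES_SERVICE_HOST in every container it starts.
    return bool(env.get("KUBERNETES_SERVICE_HOST"))


def service_urls(env=None) -> dict:
    """Pick in-cluster service DNS names, or localhost when port-forwarding."""
    if running_in_cluster(env):
        # Assumption: the cluster uses the default DNS domain, cluster.local.
        suffix = f"{NAMESPACE}.svc.cluster.local"
        return {
            "prometheus": f"http://{PROMETHEUS_SERVICE}.{suffix}:9090",
            "alertmanager": f"http://{ALERTMANAGER_SERVICE}.{suffix}:9093",
        }
    # Local debugging: ports 9090 and 9093 forwarded with kubectl port-forward.
    return {
        "prometheus": "http://localhost:9090",
        "alertmanager": "http://localhost:9093",
    }
```

Passing the environment as a mapping keeps both branches easy to exercise without a cluster, for example `service_urls({"KUBERNETES_SERVICE_HOST": "10.0.0.1"})`.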
+
+## Running Locally for Debugging
+
+To run this application locally for debugging purposes and access Prometheus and Alertmanager, you can set up port forwarding to your cluster. Follow these steps:
+
+1. Ensure you have `kubectl` installed and configured to communicate with your Kubernetes cluster.
+2. Identify the Prometheus and Alertmanager pods in your cluster:
+
+```bash
+# For Prometheus
+kubectl port-forward svc/kps-prometheus 9090:9090 -n glueops-core-kube-prometheus-stack
+# For Alertmanager
+kubectl port-forward svc/kps-alertmanager 9093:9093 -n glueops-core-kube-prometheus-stack
16 changes: 0 additions & 16 deletions json_log_formatter.py

This file was deleted.

90 changes: 0 additions & 90 deletions main.py

This file was deleted.
