tethys aims to facilitate the integration of a monitoring stack built on the well-known Prometheus, Grafana and Alertmanager, in order to monitor several products (node, Kubernetes, Apache, Elasticsearch, etc.).
We also provide a collection of Prometheus exporters that collect metrics for several products.
Here is an overview:
Because we regularly integrate monitoring stacks for our clients, we have drawn several conclusions:
- There are a ton of Prometheus exporters in the community, even for the same product, so choosing a good one is hard and tedious. We provide a list of Prometheus exporters for each product you want to monitor that just work with our stack.
- There are a ton of Grafana dashboards in the community, even for the same product, so choosing a good one is just as hard and tedious. We provide Grafana dashboards for each product you want to monitor that just work with our stack.
- Gluing together Prometheus exporters, Prometheus servers, the Prometheus federation server, Grafana and Alertmanager is a complex task that requires a certain expertise to maintain. We decided to add an abstraction layer to facilitate this integration.
- Because we had an independent monitoring stack for each customer, consulting and maintaining every customer's stack was becoming complicated on a daily basis. tethys provides a simplified customer-centric view and simplifies the multi-tenant configuration of the stack.
For all these reasons, we decided to create tethys.
A list of clients to federate. Each client in `clients` has the following variables:
- `name`: defines the client name used across the whole stack to identify the client and to filter queries.
- `prometheus_federation`: a list of Prometheus servers to federate for this client, from which metrics are retrieved. Each federated Prometheus in `prometheus_federation` has the following variables:
  - `name`: specifies a name for this federated Prometheus. It also adds the label `clusterID: <name>` to all metrics retrieved from this Prometheus server.
  - `endpoint`: specifies the Prometheus endpoint to federate. The endpoint must be reachable from the tethys-deployed Prometheus that federates it (e.g. `http(s)://<url>:<port>`).
  - `username`: specifies the username for basic_auth when requesting the endpoint.
  - `password`: specifies the password for basic_auth when requesting the endpoint.
  - `kubernetes_hosted`: specifies whether the Prometheus endpoint is hosted on Kubernetes. If `true`, tethys also retrieves metrics and creates dashboards for this Kubernetes cluster.
- `products`: a list of products the client has, meaning that the specified federated Prometheus servers actually hold metrics for these products. For now, only `node`, `kubernetes`, `elasticsearch` and `apache` are allowed.
- `prometheus_rules`: a list of custom Prometheus rules to create for this client, written according to the Prometheus 2.0 documentation. Prefix every string that contains a dollar sign `$` with the `!unsafe` keyword to avoid templating. By default, `prometheus_rules` is empty and standard rules are automatically applied for each product defined in `products`.
Example usage:

    clients:
      - name: c1111
        prometheus_federation:
          - name: preprod
            endpoint: prometheus-preprod-c1111.eu-west-1.elb.amazonaws.com:80
            kubernetes_hosted: true
          - name: prod
            endpoint: https://prometheus-prod-c1111.eu-west-1.elb.amazonaws.com:80
            username: admin
            password: p4ssw0rd
            kubernetes_hosted: true
        products:
          - node
          - kubernetes
      - name: c1337
        prometheus_federation:
          - name: prod
            endpoint: prometheus-preprod.eu-west-1.elb.amazonaws.com:80
            kubernetes_hosted: false
        products:
          - node
        prometheus_rules:
          - name: "c1337-node"
            rules:
              - alert: InstanceDown
                annotations:
                  description: !unsafe '{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes.'
                  summary: !unsafe "Instance {{ $labels.instance }} down"
                expr: 'up{clientID="c1337"} == 0'
                for: 5m
                labels:
                  severity: critical
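For orientation, each `prometheus_federation` entry conceptually maps to a standard Prometheus federation scrape job. The fragment below is a minimal sketch of such a job for the `prod` endpoint of client `c1111`, reusing the values from the example above. It is not necessarily the configuration tethys generates: the job name, the `match[]` selector and the way the `clientID` and `clusterID` labels are attached are assumptions made for illustration.

```yaml
scrape_configs:
  - job_name: federate-c1111-prod      # hypothetical job name
    scheme: https                      # taken from the endpoint URL scheme
    metrics_path: /federate            # standard Prometheus federation path
    honor_labels: true                 # keep the labels set by the federated server
    params:
      'match[]':
        - '{job=~".+"}'                # assumption: pull every series that has a job label
    basic_auth:
      username: admin
      password: p4ssw0rd
    static_configs:
      - targets:
          - prometheus-prod-c1111.eu-west-1.elb.amazonaws.com:80
        labels:
          clientID: c1111              # assumption: client name exposed as a label
          clusterID: prod              # label derived from the federated Prometheus name
```

With labels attached this way, a query or alert expression such as `up{clientID="c1111", clusterID="prod"}` would select only the series coming from that client's production server.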
For now, `alertmanager` only has an `enabled` attribute, which specifies whether Alertmanager must be installed.
Example usage:

    alertmanager:
      enabled: true
You can manage Grafana's user (default: `admin`) and password (default: `tethys`) by setting them under `grafana`.
Example usage:

    grafana:
      username: admin
      password: 4dm!npwd
`reverse_proxy` has several variables:
- `enabled` (default: false): if true, installs Apache as a reverse proxy in front of the Prometheus, Grafana and Alertmanager applications.
- `certbot` (default: false): if true, installs certbot to handle automatic SSL certificates for the vhosts defined in the `vhosts` variable (see `vhosts` below).
- `email`: specifies the email address used as contact for the Apache vhosts and for Let's Encrypt notifications.
- `vhosts`: a list of vhosts to install into the Apache reverse proxy. If `certbot` is true, SSL certificates are also managed for the specified vhost domains. Each vhost in `vhosts` has the following attributes:
  - `name`: the vhost name.
  - `domain`: the vhost domain, used to create the reverse proxy vhost and, depending on the `certbot` value, to manage the SSL certificate for this domain.
  - `backend_endpoint`: where to forward requests for this domain.
- `allow_list` (default: []): defines a list of IP addresses or subnets that are allowed to access all the vhosts. Even with an `allow_list` defined, the `/.well-known/` URL is never restricted, so that Let's Encrypt can validate the SSL certificates.
Example usage:

    reverse_proxy:
      enabled: true
      certbot: true
      email: jdoe@foo.bar
      vhosts:
        - name: prometheus
          domain: prometheus.foo.bar
          backend_endpoint: localhost:9090
        - name: grafana
          domain: grafana.foo.bar
          backend_endpoint: localhost:3000
        - name: alertmanager
          domain: alertmanager.foo.bar
          backend_endpoint: localhost:9093
      allow_list:
        - 10.10.0.0/16
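As a variation on the example above, the following sketch shows a reverse proxy for an internal-only deployment: `certbot` is left at its default so no SSL certificates are requested, and `allow_list` mixes a single IP address with a subnet. The domain, email and addresses are hypothetical values chosen for illustration.

```yaml
reverse_proxy:
  enabled: true
  certbot: false                       # default value: no certificate management
  email: ops@example.internal          # hypothetical contact address for the vhost
  vhosts:
    - name: grafana
      domain: grafana.example.internal # hypothetical internal domain
      backend_endpoint: localhost:3000
  allow_list:
    - 192.168.1.10                     # a single IP address
    - 10.10.0.0/16                     # a whole subnet
```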
For the versions available, see the tags on this repository.
Additionally, you can see what changed in each version in the CHANGELOG.md file.
Please read CONTRIBUTING.md for details on our code of conduct and the process for submitting pull requests to us.