Full stack of tools for monitoring.
This stack is composed of:
- Netdata: real-time monitoring. Visualises real-time system status.
- Prometheus: database. Stores the collected metrics.
- Docker: container. Base container solution.
- cAdvisor: container metrics exporter. Exposes metrics of your running containers.
- Grafana: analytics platform. Lets you query and understand the collected metrics.
- Node_Exporter: OS and hardware metrics exporter. Exposes OS and hardware metrics.
- AlertManager: alerting. Handles alerts sent by Prometheus.
- Slack: communications. Easy integration with your teammates.
- BlackBox Exporter: endpoint probes. Probes multiple endpoints over HTTP, HTTPS, DNS, TCP and ICMP.
To execute the steps below, the following is required:
- Docker in Swarm mode
You can simply run the following to start your standalone swarm cluster:
$ docker swarm init --advertise-addr YOUR_HOST_IP_HERE
Swarm initialized: current node (dxn1zf6l61qsb1josjja83ngz) is now a manager.
To add a worker to this swarm, run the following command:
docker swarm join \
--token SWMTKN-1-49nj1cmql0jkz5s954yi3oex3nedyz0fb0xx14ie39trti4wxv-8vxv8rssmk743ojnwacrr2e7c \
YOUR_HOST_IP_HERE:2377
To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.
Do not forget to replace "YOUR_HOST_IP_HERE" ;)
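Before deploying anything, it can help to confirm that the node really is in Swarm mode. The following is a minimal sketch, assuming the `docker` CLI is on the PATH; `swarm_state_ok` is a hypothetical helper name used only for illustration, not a Docker command.

```shell
# Succeeds only when the reported Swarm state is "active".
# swarm_state_ok is a hypothetical helper, not part of Docker.
swarm_state_ok() {
  [ "$1" = "active" ]
}

# Only query Docker when the CLI is actually available.
if command -v docker >/dev/null 2>&1; then
  state=$(docker info --format '{{.Swarm.LocalNodeState}}' 2>/dev/null)
  if swarm_state_ok "$state"; then
    echo "swarm is active"
  else
    echo "swarm is not active (state: ${state:-unknown})"
  fi
fi
```

If the state is anything other than active, rerun the `docker swarm init` command above before continuing.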
cd
# git clone git@github.com:jeskz0rd/monitoring.git
cd monitoring
Since version 1.1, you can run Netdata as a container within the stack in global mode, exposing metrics from your whole cluster straight to Prometheus.
As described by titpetric in additional-notes:
You will not get detailed application metrics (mysql, ups, etc.) from other containers or from the
host if running netdata in a container.
It may be possible to get some of those metrics, but it might not be easy, and most likely not worth it.
For most detailed metrics, netdata needs to share the same environment as the application server
it monitors. This means it would need to run either in the same container (not even remotely practical),
or in the same virtual machine (no containers).
If you do not intend to collect this kind of metric through Netdata, the containerized version fulfils the requirements and automates the deployment across your cluster. Otherwise, follow the steps below to install Netdata directly on the host.
# bash <(curl -Ss https://my-netdata.io/kickstart.sh) all
# echo 1 >/sys/kernel/mm/ksm/run
# echo 1000 >/sys/kernel/mm/ksm/sleep_millisecs
# vim /conf/prometheus/prometheus.yml
...
  - job_name: 'netdata'
    metrics_path: '/api/v1/allmetrics'
    params:
      format: [prometheus]
    honor_labels: true
    scrape_interval: 20s
    static_configs:
      - targets: ['YOUR_NETDATA_IP:19999']
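The `metrics_path` and `params` of this job translate into a plain HTTP request, so the endpoint can be checked by hand before Prometheus scrapes it. A sketch, assuming Netdata listens on port 19999 as in the target above; `netdata_metrics_url` and the `NETDATA_HOST` variable are hypothetical names introduced only for this example.

```shell
# Build the URL Prometheus would scrape for a given Netdata host.
# netdata_metrics_url is a hypothetical helper name.
netdata_metrics_url() {
  printf 'http://%s:19999/api/v1/allmetrics?format=prometheus\n' "$1"
}

# Set NETDATA_HOST (a hypothetical variable) to fetch the first few metrics.
if [ -n "${NETDATA_HOST:-}" ]; then
  curl -s -m 5 "$(netdata_metrics_url "$NETDATA_HOST")" | head -n 5
fi
```

For example, `NETDATA_HOST=YOUR_NETDATA_IP sh check.sh` (with the snippet saved as `check.sh`) should print a handful of metric lines if Netdata is reachable.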
Add the "prom" label to the Swarm node that will run Prometheus.
docker node update YOUR_PROMETHEUS_SWARM_NODE --label-add "prom=true"
From Grafana 5.1 onwards, the default user ID is 472. As described in the Grafana documentation, the steps below must be carried out for it to run properly.
# docker container run --rm --user root --name grafana_temp -it -v ~/monitoring/volumes/grafana/data:/var/lib/grafana --entrypoint bash grafana/grafana:5.1.3
While still inside the container you just started, run the following:
$ chown -R root:root /etc/grafana && \
chmod -R a+r /etc/grafana && \
chown -R grafana:grafana /var/lib/grafana && \
chown -R grafana:grafana /usr/share/grafana && \
exit
Changing the permissions takes a while.
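Once the container has exited, the result can be checked from the host. This sketch assumes the bind-mounted data directory is the one used in the command above and that no user-namespace remapping is in effect, so the `grafana` user shows up on the host as UID 472; `owned_by_uid` is a hypothetical helper.

```shell
# Succeeds when the path in $1 is owned by the numeric user ID in $2.
# owned_by_uid is a hypothetical helper name.
owned_by_uid() {
  owner=$(ls -nd "$1" | awk '{ print $3 }')
  [ "$owner" = "$2" ]
}

# Directory mounted into the temporary Grafana container above.
data_dir="$HOME/monitoring/volumes/grafana/data"
if [ -d "$data_dir" ]; then
  if owned_by_uid "$data_dir" 472; then
    echo "ownership looks right"
  else
    echo "owner is not 472; rerun the chown step"
  fi
fi
```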
Configure Prometheus to probe HTTP endpoints through the Blackbox exporter.
Add your HTTP targets to prometheus.yml:
  ####### BLACKBOX MONITORING ########
  - job_name: 'blackbox'
    params:
      module:
        - http_2xx
    scrape_interval: 30s
    scrape_timeout: 10s
    metrics_path: /probe
    scheme: http
    static_configs:
      - targets:
          - http://example.com
          - http://www.example.com
          - http://your.web.app
          - http://your.web.app/check
...
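Once the stack is deployed, a single target can be probed by hand with the same `/probe` path and `http_2xx` module that the job above uses. A sketch, assuming the exporter is published on port 9115; `probe_url`, `probe_succeeded` and the `BLACKBOX_HOST` variable are hypothetical names, and targets containing `&` or `?` would need URL-encoding first.

```shell
# Build the probe URL; arguments: exporter host, module, target.
probe_url() {
  printf 'http://%s:9115/probe?module=%s&target=%s\n' "$1" "$2" "$3"
}

# Read probe output on stdin; succeed when the probe reported success.
probe_succeeded() {
  grep -q '^probe_success 1$'
}

# Set BLACKBOX_HOST (a hypothetical variable) to probe example.com by hand.
if [ -n "${BLACKBOX_HOST:-}" ]; then
  url=$(probe_url "$BLACKBOX_HOST" http_2xx http://example.com)
  if curl -s -m 15 "$url" | probe_succeeded; then
    echo "probe succeeded"
  else
    echo "probe failed"
  fi
fi
```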
# docker stack deploy -c docker-compose.yml monitoring
Check deployed services:
# docker service ls
ID            NAME                          MODE        REPLICAS  IMAGE                           PORTS
ypjvzrdzs760  monitoring_alertmanager       replicated  1/1       jesk/alertmanager_alpine:1.0    *:9093->9093/tcp
x54ardi5blgn  monitoring_blackbox-exporter  global      1/1       prom/blackbox-exporter:v0.12.0  *:9115->9115/tcp
nnpqv7k297g4  monitoring_cadvisor           global      1/1       google/cadvisor:v0.30.0         *:8080->8080/tcp
gpn2qklfmra6  monitoring_grafana            replicated  1/1       grafana/grafana:5.1.3           *:3000->3000/tcp
7xgth29zggfb  monitoring_netdata            global      1/1       firehol/netdata:alpine          *:19999->19999/tcp
31q4t856ciua  monitoring_node-exporter      global      1/1       jesk/node-exporter_alpine:1.0   *:9100->9100/tcp
z2jd4eprumd8  monitoring_prometheus         replicated  1/1       jesk/prometheus_alpine:1.0      *:9090->9090/tcp
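Rather than reading the REPLICAS column by eye, the services that have not yet reached their desired replica count can be filtered out. A minimal sketch, assuming it runs on a manager node; `unready_services` is a hypothetical helper that only parses text.

```shell
# Read "NAME RUNNING/DESIRED" lines on stdin and print the name of every
# service whose running replica count differs from the desired count.
unready_services() {
  awk '{ split($2, r, "/"); if (r[1] != r[2]) print $1 }'
}

# Only query Docker when the CLI is actually available.
if command -v docker >/dev/null 2>&1; then
  docker service ls --format '{{.Name}} {{.Replicas}}' 2>/dev/null | unready_services
fi
```

No output means every listed service is running as many replicas as requested.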
Accessing Prometheus interface on browser:
http://YOUR_HOST_IP:9090
Accessing AlertManager interface on browser:
http://YOUR_HOST_IP:9093
Accessing Grafana interface on browser:
http://YOUR_HOST_IP:3000
user: admin
passwd: admin
Accessing Netdata interface on browser:
http://YOUR_HOST_IP:19999
Accessing Node_exporter metrics on browser:
http://YOUR_HOST_IP:9100/metrics
Accessing Blackbox_exporter on browser:
http://YOUR_HOST_IP:9115
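The same endpoints can be checked from a terminal in one pass. This is a sketch under the assumption that the ports are published exactly as listed above; `stack_urls`, `check_stack` and the `MONITORING_HOST` variable are hypothetical names.

```shell
# Print the URLs listed above for a given host, one per line.
stack_urls() {
  for path in 9090 9093 3000 19999 9100/metrics 9115; do
    printf 'http://%s:%s\n' "$1" "$path"
  done
}

# Print the HTTP status code next to each URL (000 means no response).
check_stack() {
  stack_urls "$1" | while read -r url; do
    code=$(curl -s -o /dev/null -m 5 -w '%{http_code}' "$url")
    echo "$code $url"
  done
}

# Set MONITORING_HOST (a hypothetical variable) to run the check.
if [ -n "${MONITORING_HOST:-}" ]; then
  check_stack "$MONITORING_HOST"
fi
```

A 200 or a redirect code generally means the service is answering; some of these interfaces redirect the root path to a login or landing page.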
Create a Slack channel with an incoming webhook, then add your Slack details to the AlertManager configuration:
# vim /conf/alertmanager/config.yml
route:
  receiver: 'slack'
receivers:
  - name: 'slack'
    slack_configs:
      - send_resolved: true
        username: 'YOUR USERNAME'
        channel: '#YOURCHANNEL'
        api_url: 'INCOMING WEBHOOK'
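Before relying on AlertManager, the webhook itself can be tested with a plain POST. A sketch, assuming a standard Slack incoming webhook that accepts a JSON body with a `text` field; `slack_payload` and the `SLACK_WEBHOOK_URL` variable are hypothetical names, and the helper only escapes backslashes and double quotes (not newlines).

```shell
# Build a minimal JSON payload, escaping backslashes and double quotes.
# slack_payload is a hypothetical helper name.
slack_payload() {
  escaped=$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"text":"%s"}\n' "$escaped"
}

# Set SLACK_WEBHOOK_URL (a hypothetical variable) to send a test message.
if [ -n "${SLACK_WEBHOOK_URL:-}" ]; then
  curl -s -m 10 -X POST -H 'Content-type: application/json' \
    --data "$(slack_payload 'Test message from the monitoring stack')" \
    "$SLACK_WEBHOOK_URL"
fi
```

If the message shows up in the channel, the same URL can go into `api_url` above.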
All notable changes to this project will be documented in this file.