Backend to connect to a PostgreSQL database that implements a special deduplication mechanism.
The standard deduplication provided by Alerta deduplicates alerts when environment, resource, event and severity are the same. If environment, resource and event are the same but severity is different, it correlates the alerts instead (apart from using the `correlate` alert field).
With this backend, two extra options are implemented to deduplicate alerts:

- Two alerts with the same environment deduplicate if they have the same value of the `deduplication` attribute (if this attribute is part of the alert information).
- If the alert includes the attribute `inferredCorrelation` with an alert id or a list of alert ids, the alert is deduplicated with that alert id (the first one if the attribute contains a list).
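For example, an alert using the `deduplication` attribute might look like this (expressed as a Python dict following the standard Alerta alert format; values are illustrative):

```python
alert_body = {
    "environment": "Production",
    "resource": "web01",
    "event": "NodeDown",
    "severity": "major",
    "attributes": {
        # Alerts in the same environment with the same value here deduplicate
        "deduplication": "webcluster-nodedown",
    },
}
```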
To keep the history of deduplicated alerts, for alerts having the `deduplication` attribute the `value` field in each history element is formed as `<resource>/<event>/<value>`, so a history element is appended to the alert information if resource, event or value are modified.
A default value for `deduplication` may be provided with a configuration property: `DEFAULT_DEDUPLICATION_TEMPLATE`.
As the value should depend on the alert, it is rendered as a Jinja2 template receiving the parameter `alert` with the alert information. This value is used only if the alert doesn't provide a `deduplication` attribute.
For example:

```python
DEFAULT_DEDUPLICATION_TEMPLATE = '{{ alert.id }}'
```

means no deduplication by attribute by default, as each alert has a different id; only alerts providing a `deduplication` attribute may deduplicate. (The same behavior is obtained if no `DEFAULT_DEDUPLICATION_TEMPLATE` property is configured.)
Whereas:

```python
DEFAULT_DEDUPLICATION_TEMPLATE = '{{ alert.environment }}-{{ alert.resource }}-{{ alert.event }}'
```

will provide a deduplication similar to Alerta's original deduplication.
The backend may be configured to use the original deduplication plus the new deduplication by attribute, or only the attribute deduplication.
This behavior may be configured per alert using the alert attribute `deduplicationType`. The default value for alerts that don't provide this attribute is defined in the configuration property `DEFAULT_DEDUPLICATION_TYPE` (default: `'both'`).
Possible values for the attribute or configuration are:

- `both`: original + new deduplication by attribute will be executed.
- `attribute`: only the new deduplication by attribute will be executed.

For the `attribute` deduplication type, alerts with the same environment, resource, event and severity will not be deduplicated if the `deduplication` attribute has a different value (or is not provided).
The alert state transition flow is modified too: alerts are not reopened after being closed. This means that, if an incoming alert would be duplicated or deduplicated with a current alert in `closed` status, it is not considered a duplicate and a new alert is created.
Housekeeping is modified too, separating the housekeeping of expired and closed alerts. The time to delete these two kinds of alerts is now configured using different configuration properties:

- `DELETE_EXPIRED_AFTER`: this Alerta property now configures the time (in seconds) to delete expired alerts. The default value is 7200. A value of 0 may be used to never delete expired alerts.
- `DELETE_CLOSED_AFTER`: new property to configure the time (in seconds) to delete closed alerts. If not configured, the value of `DELETE_EXPIRED_AFTER` is used. Use a value of 0 to never delete closed alerts.
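For instance, a sketch of these settings in `alertad.conf` (values are illustrative):

```python
# alertad.conf
DELETE_EXPIRED_AFTER = 7200   # delete expired alerts after 2 hours
DELETE_CLOSED_AFTER = 86400   # delete closed alerts after 1 day (0 would disable deletion)
```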
To use this database backend, the scheme of the database connection URL must be `iometrics`:

```python
DATABASE_URL = 'iometrics://<pg_user>@<pg_server>/<pg_db>?connect_timeout=10&application_name=alerta'
```
A mechanism to execute plugins asynchronously is provided. These plugins, named "alerters", may implement operations to execute when a new alert is received and also when the alert is resolved (status moves to `closed`).
Asynchronous execution is handled by the celery python library, using redis as the celery broker and results backend.
Alerters must provide a class implementing `datadope_alerta.plugins.iom_plugin.IOMAlerterPlugin`.
This implementation points to the class with the alerter-specific behaviour, which has to implement `datadope_alerta.plugins.Alerter` and its two main methods, `process_event` and `process_recovery`.
These methods execute the alerter operations when a new alert or a recovery is received.
The recovery operation will not be executed if a successful event operation has not been executed before for that alert.
If a recovery is received before the event operation starts executing (an alert may be configured with a delay between the moment the alert is received and the moment the alerter event operation is launched), the event operation is cancelled, so neither that event operation nor the recovery operation is executed.
If the recovery is received while the event operation is being executed, the recovery operation waits until the event operation finishes. If it finishes successfully, the recovery operation is launched. If it finishes with an error, any pending retries of the event operation are cancelled and the recovery operation is not launched.
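A minimal sketch of an alerter, assuming simplified method signatures and a hypothetical registration hook (the actual signatures, return conventions and abstract methods are defined by the base classes in `datadope_alerta.plugins`):

```python
from datadope_alerta.plugins import Alerter
from datadope_alerta.plugins.iom_plugin import IOMAlerterPlugin


class MyAlerter(Alerter):
    """Alerter-specific behaviour; signatures below are illustrative."""

    def process_event(self, alert, reason):
        # Notify the external system about the new alert.
        ...
        return True, {}  # assumed (success, response data) convention

    def process_recovery(self, alert, reason):
        # Notify the external system that the alert is resolved.
        ...
        return True, {}


class MyAlerterPlugin(IOMAlerterPlugin):
    """Plugin wrapper executed asynchronously through celery."""

    def get_alerter_class(self):  # hypothetical hook returning the Alerter class
        return MyAlerter
```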
| Attribute | Type | Scope | Meaning |
|---|---|---|---|
| `deduplication` | string | Global | See backend info above |
| `deduplicationType` | `'both'` or `'attribute'` | Global | See backend info above |
| `alerters` | list \| json | Global | Alerter plugins to be executed to notify the alert |
| `eventTags` | dict \| json | Global | |
| `autoCloseAt` | datetime | Global | |
| `autoCloseAfter` | float (seconds) | Global | Fills / replaces `autoCloseAt` with `last_received_time` + value |
| `autoResolveAt` | datetime | Global | |
| `autoResolveAfter` | float (seconds) | Global | Fills / replaces `autoResolveAt` with `last_received_time` + value |
| `ignoreRecovery` | bool | Alerter | |
| `actionDelay` | float | Alerter | Seconds to wait before notifying alerters |
| `tasksDefinition` | dict \| json | Alerter | |
| `repeatMinInterval` | dict \| json | Alerter | Min interval from last repetition to send a new repeat event |
| `recoveryActions` | dict \| json | Global | Recovery actions definition |
Scope 'Alerter' means that the attribute value may be defined specifically for every alerter while 'Global' means that the same value will be used independently of the alerter.
A plugin is provided to execute recovery actions before alerting.
This plugin will use one of the configured recovery actions providers to execute some recovery action before alerting.
Available recovery actions are read from python `entry_points` with type `alerta.recovery_actions.providers`.
An AWX provider is provided as part of this python library.
The recovery actions plugin is executed if the attribute `recoveryActions` is present in an alert.
In this case, configured alerters are not launched after receiving the alert; this recovery actions plugin is launched instead, and it is in charge of executing the configured recovery actions using the selected provider.
After the actions are executed, it leaves some time for the alert to be recovered. If the alert is not closed during that time, it launches the configured alerters.
A plugin is provided to manage blackouts using different blackout providers. This plugin's name is 'blackout_manager' and it should substitute the original Alerta blackout plugin to obtain the extra functionality it provides.
A provider named `internal` is included by default with the same functionality as the original Alerta blackout plugin.
Other providers may be configured using the entry-point `alerta.blackout.providers` in a setup file.
The location of the class implementing the provider must be defined as the entry-point:

```python
'alerta.blackout.providers': [
    'internal = datadope_alerta.plugins.blackouts.providers.internal:Provider'
]
```
The implementing class must be a subclass of `datadope_alerta.plugins.blackouts.BlackoutProvider`, which has only one method to implement; it should return a boolean indicating whether the provided alert is in blackout or not.
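A minimal sketch of a custom provider, assuming a hypothetical method name and signature (check the `BlackoutProvider` base class for the actual ones):

```python
from datadope_alerta.plugins.blackouts import BlackoutProvider


class Provider(BlackoutProvider):
    def is_alert_in_blackout(self, alert) -> bool:  # method name is an assumption
        # e.g. decide based on an external calendar, CMDB or alert data
        return alert.attributes.get('maintenance') == 'true'
```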
The blackout manager will handle the end of the blackout period. If the alert is still open when the blackout period ends, its status will be changed to `open`.
This is done using a periodic background task that asks each provider whether an alert currently in blackout is still in blackout.
A new system is provided to enrich and/or fulfill the information from incoming alerts following certain predefined rules.
A set of contextual rules can be created using the contextualizer API. Each of these rules has various fields that help the Notifier plugin check whether an alert matches any of the predefined rules. More information can be found at `datadope_alerta/plugins/notifier/README.md`.
Several API contexts are provided by Datadope Alerta to support the new functionalities:

| Context | Method | Function |
|---|---|---|
| `/alert/<alert_id>/alerters` | GET | Returns alerters information related to an alert |
| `/async/alert` | POST | Receives an alert as in `/alert` but processes it asynchronously. Returns the id of the task that will process the alert |
| `/async/alert/<bg_task_id>` | GET | Returns the status of an async alert creation requested using the previous context |
| `/alert_context/rules` | POST | Adds a new contextual rule to the database |
| `/alert_context/rules/` | GET | Returns a contextual rule given a name |
| `/alert_context/rules` | GET | Returns all the contextual rules |
| `/alert_context/rules/` | PUT | Updates a contextual rule matching the given ID |
| `/alert_context/rules/` | DELETE | Deletes a contextual rule matching the given ID |
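As a sketch, submitting an alert asynchronously through `/async/alert` might look like this (URL and key are placeholders; the body follows the standard Alerta alert format, and the exact response shape should be checked against the API):

```python
import requests

response = requests.post(
    "http://localhost:8001/async/alert",
    headers={"Authorization": "Key <the server key>"},
    json={
        "environment": "Production",
        "resource": "web01",
        "event": "NodeDown",
        "severity": "major",
    },
    timeout=10,
)
task_info = response.json()  # includes the id of the background task processing the alert
# That id can then be polled with GET /async/alert/<bg_task_id>
```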
The project provides a python package. To create the package, go to the project folder, start the python virtual environment and issue the setup commands:

```shell
pipenv shell
python -m setup bdist_wheel
```

Once the package is built, it can be installed in an alerta python environment:

```shell
pipenv install datadope-alerta/dist/datadope_alerta-2.2.1-py3-none-any.whl
```

Or it may be included in the Alerta deployment Pipfile.
The minimum Python version is 3.10.
Following, an example Pipfile is provided to run the Alerta server:

```toml
[[source]]
url = "https://pypi.org/simple"
verify_ssl = true
name = "pypi"

[packages]
alerta-server = {extras=["postgres"], git = "https://github.com/datadope-io/alerta.git"}
"celery[redis]" = "==5.2.7"
alerta = "==8.5.1"  # Client. Needed for periodic background tasks
python-dotenv = "*"
pyopenssl = "*"
# zabbix-alerta = {git = "https://github.com/alerta/zabbix-alerta"}
# bson = "*"  # Not needed with master version but needed for 8.7.0
requests = "==2.31.0"
gunicorn = "*"
flower = "*"
python-dateutil = "*"

[dev-packages]

[requires]
python_version = "3.10"
```
The datadope-alerta package may be included in this Pipfile pointing to the wheel file or to the git repo.
An example configuration file is provided in `config_example/alertad.conf`.
To use the configuration, the environment var `ALERTA_SVR_CONF_FILE` must be set pointing to the configuration file.
See the Alerta configuration documentation to get more information about configuration options.
Logging format strings can use the following fields to enrich log information:

- `alert_id`: id of the alert being handled.
- `alerter_name`: name of the alerter handling the alert.
- `operation`: 'new', 'repeat' or 'recovery' are the possible values for this field.

If any of those fields is not available, '-' is printed instead.
These fields can be used in any logger associated with the alerta server and in the celery task logger. They can't be used in the celery logger.
Apart from the previous fields, the celery task logger can also use the following fields:

- `task_id`: id of the celery task that is running.
- `task_name`: name of the celery task that is running.
Example of a format string configuration variable:

```python
CELERYD_TASK_LOG_FORMAT = \
    "%(asctime)s|%(levelname)s|%(alert_id)s|%(alerter_name)s|%(operation)s|%(message)s" \
    "[[%(name)s|%(processName)s][%(task_name)s|%(task_id)s]]"
```
Use `RA_PROVIDER_<provider>_CONFIG` to provide a dictionary with the provider configuration.
For example, `RA_PROVIDER_AWX_CONFIG` will hold the expected configuration for the AWX provider.
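A sketch of what such a property might look like in `alertad.conf` (all keys below are hypothetical; the actual keys expected by the AWX provider must be taken from its code or documentation):

```python
# alertad.conf (keys are illustrative, not the provider's real schema)
RA_PROVIDER_AWX_CONFIG = {
    "api_url": "https://awx.example.com",
    "api_token": "<awx token>",
}
```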
Configuration may be provided in the alertad.conf config file and/or environment variables. If both are provided, a merge is done, with the configuration from environment variables taking priority.
The Alerta server must be executed within the pip environment defined before, including the configuration file environment var. The following command executes the server listening on port 8001:

```shell
alertad run --port 8001
```
The configuration file should include one or more users in the `ADMIN_USERS` constant. To create these users in alerta, execute the following command:

```shell
alertad user --all
```

Once these users are created (they are created as admin users), you can log in to the user interface (shown below) and create the key to use for connecting to the server.
This key must be configured in the `SECRET_KEY` constant in the configuration file.
A redis server must be available for alerta to run celery tasks in the background. The redis server location must be configured in the `CELERY_BROKER_URL` and `CELERY_RESULT_BACKEND` constants in the configuration file.
To install the Alerta UI, follow this procedure in a location of the server that will run the UI:

```shell
wget https://github.com/alerta/alerta-webui/releases/latest/download/alerta-webui.tar.gz
mkdir alerta-webui
cd alerta-webui
tar xzvf ../alerta-webui.tar.gz
cd dist
cp config.json.example config.json
# edit config.json to point to the alerta server url
cat config.json
{"endpoint": "http://localhost:8001"}
```
You can now run the server for the UI. To run on port 8000:

```shell
python -m http.server 8000
```
To execute background tasks for Datadope alerters plugins, at least one celery worker must be running.
To run a celery worker, the same python environment as for alerta may be used. The same configuration file may also be used (the configuration file in `config_example` is prepared to run a celery worker consuming from all configured queues).
The command to run the worker might be (issued inside the pipenv environment):

```shell
celery -A "datadope_alerta.bgtasks.celery" worker --loglevel=debug
```

This command runs a worker that consumes from all the queues defined in the configuration file.
To run a worker that consumes only specific queues, include the parameter `-Q` with the list of queues to consume from.
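For instance (queue names are illustrative):

```shell
celery -A "datadope_alerta.bgtasks.celery" worker --loglevel=info -Q queue1,queue2
```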
See the celery worker command documentation for other parameters that may be used, for example to define the number of concurrent tasks that the worker will be able to run.
Apart from the workers, a celery beat process must also be started to manage the scheduling of periodic tasks. The following periodic tasks will be executed:

| Task | Interval config var | Operation |
|---|---|---|
| auto close | `AUTO_CLOSE_TASK_INTERVAL` (default 1 min) | Check if any alert is configured for auto close after some time |
| auto resolve | `AUTO_RESOLVE_TASK_INTERVAL` (default 1 min) | Check if any alert is configured for auto resolve after some time |

The command to run the celery beat process might be (issued inside the pipenv environment):

```shell
celery -A "datadope_alerta.bgtasks.celery" beat -s /var/tmp/celerybeat-schedule --loglevel=debug
```
A user interface to manage celery tasks and workers is available by installing the python library `flower` (included in the example Pipfile provided before).
With this library installed, a server can be launched within the alerta/celery pipenv:

```shell
celery -A "datadope_alerta.bgtasks.celery" flower
```

Without parameters, the server will listen on port 5555. Use the argument `--port` to change the port.
See the Flower documentation for more information.
To execute an alerta client, the python environment to use may be much simpler than the one needed for the server:

```toml
[[source]]
url = "https://pypi.org/simple"
verify_ssl = true
name = "pypi"

[packages]
alerta = "*"

[dev-packages]

[requires]
python_version = "3.10"
```
The configuration file location is also defined using an environment var. In this case, the environment var to use is `ALERTA_CONF_FILE`.
The configuration file for the client is also much simpler than the one for the server. In this case, it is an ini file. For example:

```ini
[DEFAULT]
timezone = Europe/Madrid
output = json
key = <the server key>
endpoint = http://localhost:8001
sslverify = off
timeout = 300.0
debug = yes
```
Once the python environment is configured, an alert can be sent to the server:

```shell
alerta send -r web01 -e NodeDown -E Production -S Website -s major -t "Web server is down." -v ERROR
```

This `alerta` command supports a wide range of arguments to customize the alert to be sent to the server.
See the alerta client documentation.
A full working environment can be launched running the docker-compose.yml file provided in the deployment folder.
To use this docker-compose file, an environment file named `.env` must be generated in the same deployment folder.
An example file, `example.env`, is provided. It can be copied as `.env` and modified to use specific secrets.
Once the `.env` file is available, docker-compose can be executed from the deployment folder using:

```shell
VERSION=$(cat ../VERSION) docker-compose up -d
```
This command will create the following containers:

- datadope-alerta-postgres
- datadope-alerta-redis
- datadope-alerta-server (listening on port 8001 of the host computer, port 8000 in the container)
- datadope-alerta-webui (listening on port 8000 of the host computer and in the container)
- datadope-alerta-celery-worker-1
- datadope-alerta-celery-beat (schedules periodic background tasks)
- datadope-alerta-celery-flower (UI to manage celery tasks, listening on port 5555 of the host computer and in the container)
The number of celery workers to run may be modified using:

```shell
VERSION=$(cat ../VERSION) docker-compose up -d --scale celery-worker=2
```

In this case, 2 celery workers will run.
The default configuration for the alerta and celery environments is obtained from `config_example/alertad.conf`. Most of the configuration may be overridden using environment vars. An example environment file is located at `deployment/example.env`.
Three docker files are provided to create images for:

- alerta server: `Dockerfile.alerta`
- celery workers, beat and flower services: `Dockerfile.celery`
- alerta webui: `Dockerfile.webui`

The dockerfiles and the files needed to build the images are available in the deployment folder.
To create the images (these commands must be executed from the repository root folder):

```shell
docker build --build-arg VERSION=$(cat VERSION) -f deployment/alerta.dockerfile -t datadope-alerta-server:$(cat VERSION) .
docker build --build-arg VERSION=$(cat VERSION) -f deployment/celery.dockerfile -t datadope-alerta-celery:$(cat VERSION) .
docker build -f deployment/webui.dockerfile -t datadope-alerta-webui .
```
The alerta and celery docker images include the config files in `config_example`, but environment vars should be provided when running the containers to supply the actual configuration for the installation environment. These env vars can be provided with the `-e` and/or `--env-file` docker run arguments.
To run a full environment:

```shell
docker run -d --rm --name datadope-alerta-server --env-file .env -e "ALERTA_SVR_CONF_FILE=/etc/datadope-alerta/alertad.conf" -p 8001:8000 datadope-alerta-server
docker run -d --rm --name datadope-alerta-celery-worker1 --env-file .env -e "ALERTA_SVR_CONF_FILE=/etc/datadope-alerta/alertad.conf" datadope-alerta-celery
docker run -d --rm --name datadope-alerta-celery-worker2 --env-file .env -e "ALERTA_SVR_CONF_FILE=/etc/datadope-alerta/alertad.conf" datadope-alerta-celery
docker run -d --rm --name datadope-alerta-celery-beat --env-file .env -e "ALERTA_SVR_CONF_FILE=/etc/datadope-alerta/alertad.conf" datadope-alerta-celery entry_point_celery_beat.sh
docker run -d --rm --name datadope-alerta-celery-flower --env-file .env -e "ALERTA_SVR_CONF_FILE=/etc/datadope-alerta/alertad.conf" -p 5555:5555 datadope-alerta-celery entry_point_celery_flower.sh
docker run -d --rm --name datadope-alerta-webui -p 8000:8000 datadope-alerta-webui
```

This configuration expects a running postgres and redis. The connection to them is configured using the env vars `DATABASE_URL` and `CELERY_BROKER_URL`.
The Alerter base class provides the following utility method:

```python
Alerter.get_contextual_configuration(var_definition: VarDefinition, alert: Alert, operation: str)
```

using the following var definition structure:

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class VarDefinition:
    var_name: str                    # name of the variable to look up
    default: Any = None              # value to return if no step provides one
    specific_event_tag: str = None   # event tag checked first, if provided
    var_type: type = None            # expected type of the value
    renderable: bool = True          # set to False to skip Jinja2 rendering of the value
```
With this method, an alerter may get a configuration variable following priority-based steps.
Values are read from the alert and from the application configuration in the order of the steps below.
If the value is a dict, all steps are merged, with priority from top to bottom. If it is not a dict, the value is the return of the first step that provides a non-null value.
Variable names are not case-sensitive. CamelCase and snake_case formats are also considered the same (`the_var`, `THE_VAR`, `THEVAR`, `thevar`, `TheVar`... all correspond to the same variable).
Steps order:

- From event tags. Several tags are checked in order; only the first one with a value is considered:
  - `alert.attributes['eventTags'][<SPECIFIC_EVENT_TAG>]`
  - `alert.attributes['eventTags'][<ALERTER_NAME>_<VAR_NAME>]`
  - `alert.attributes['eventTags'][<VAR_NAME>]`
- From attributes:
  - `alert.attributes[<ALERTER_NAME>][<VAR_NAME>]`
  - `alert.attributes[<ALERTER_NAME>_<VAR_NAME>]`
  - `alert.attributes[<VAR_NAME>]`
- From alerter configuration:
  - `config[<ALERTER_NAME>_CONFIG][<VAR_NAME>]`
- From alerter configuration as KEY:
  - `config[<ALERTER_NAME>_<VAR_NAME>]`
- From default alerters configuration:
  - `environ[ALERTERS_DEFAULT_<VAR_NAME>]`
  - `config[ALERTERS_DEFAULT_<VAR_NAME>]`
- From the default value, if provided.
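As a sketch, an alerter might resolve a variable like this (the import location of `VarDefinition` is an assumption; the method signature is the one shown above):

```python
from datadope_alerta.plugins import VarDefinition  # import path is an assumption

# Inside an Alerter method: returns the merged / first non-null value for 'my_var'
value = self.get_contextual_configuration(
    VarDefinition('my_var', default='fallback'),
    alert,
    operation='process_event',
)
```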
If the value obtained is a dict and an operation is provided, the returned value will be the one related to that operation. The keys of the dictionary should be the values of `ALERTERS_KEY_BY_OPERATION` for each operation:

```python
ALERTERS_KEY_BY_OPERATION = {
    'process_event': 'new',
    'process_recovery': 'recovery',
    'process_repeat': 'repeat',
    'process_action': 'action'
}
```
For example, the var 'my_var' may have two different values for the new event operation and for the recovery operation. In that case, it may be configured as an attribute this way:

```python
from alerta.models.alert import Alert

def create_alert(alert: Alert):
    alert.attributes['my_var'] = {"new": "value for new event", "recovery": "value for recovery"}
```
For compatibility, a prefix 'new' or 'recovery' may be used to indicate a different value of a var for each operation. When looking for a var value in alert attributes or in config, the var with the operation prefix is queried first and, if that var is not in the dict, it is queried without the prefix.
So, we can achieve the same as in the previous example by defining two vars:

```python
from alerta.models.alert import Alert

def create_alert(alert: Alert):
    alert.attributes['new_my_var'] = "value for new event"
    alert.attributes['recovery_my_var'] = "value for recovery"
```
Either way, requesting the value of 'my_var' will return the correct value for the provided operation (`operation` is an argument to provide to the function).
By default, when requesting the value of a variable using the previous method, the obtained value is rendered as a Jinja2 template if the value is a string, a dict (all values of the dict are rendered recursively) or a list (all list elements are rendered recursively).
`VarDefinition` provides a member field (`renderable`) to avoid rendering a specific variable.
To render the Jinja2 template, the following variables are provided to the renderer and can be used for templating:

- `alert`: alert information as an `Alert` object.
- `attributes`: alert attributes. It is similar to `alert.attributes`, but this variable is provided as a case-insensitive dict, so it should be used instead of `alert.attributes`.
- `event_tags`: value of the `eventTags` attribute as a case-insensitive dict.
- `alerter_config`: alerter configuration as a case-insensitive dict.
- `alerter_name`: name of the alerter.
- `operation`: involved operation (`process_event`, `process_recovery`...).
- `operation_key`: involved operation key (`new`, `recovery`, `repeat`, `action`).
- `pretty_alert`: alert data json representation.
Alerters may use the provided method of the parent class `Alerter.render_template(self, template_path, alert)`.
This method will return the result of rendering the template in the provided path with the variables defined above.
Alerters may use the inherited method `Alerter.get_message()` to get a message associated to the alert depending on the current operation, its event tags and the `message` attribute.
For new and repeat operations, the `message` attribute is used as the source of the message.
For action or recovery operations, the `reason` received with the action (or the attribute `reason`, in case of an action received without a reason) is used as the source of the message.
This source message is parsed with the following rules:

- If event tags `<OPERATION_KEY>_MESSAGE_LINE_#` are available, the source message is replaced by the concatenation of those event tags, in order, separated by `\n`.
- Parts of the source message with the form `{TAG_NAME}` are replaced by the value of the event tag `TAG_NAME`, if available.
- If the event tag `<OPERATION_KEY>_EXTRA_FOOTER` or `EXTRA_FOOTER` is available, the content of that tag is appended at the end of the message.
- If the event tag `<OPERATION_KEY>_EXTRA_TITLE` or `EXTRA_TITLE` is available, the content of that tag is appended at the end of the first message line.
- If the event tag `<OPERATION_KEY>_EXTRA_PRE_TITLE` or `EXTRA_PRE_TITLE` is available, the content of that tag is inserted at the beginning of the first message line.
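For instance, a hypothetical set of event tags for the `new` operation key could compose the message like this (tag names follow the rules above; values are illustrative):

```python
alert.attributes['eventTags'] = {
    "NEW_MESSAGE_LINE_1": "Web server {HOST} is down",  # {HOST} replaced by the HOST tag
    "NEW_MESSAGE_LINE_2": "On-call team has been notified",
    "HOST": "web01",
    "EXTRA_FOOTER": "-- sent by datadope-alerta",       # appended at the end of the message
}
```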
The `Alerter.get_message` function will try to obtain a Jinja2 template file. The file location may be configured with the `template` attribute or configuration parameter, which defaults to `{{ alerter_name }}/{{ operation_key }}.j2` inside a templates folder configured with the setting `ALERTERS_TEMPLATES_LOCATION`.
If a template file is available, the parsed `message` and `reason` are provided as variables to the jinja environment, apart from the rest of the variables indicated in Rendering templated strings.
If no template file is available, the parsed message is returned for new and repeat operations, and the reason is returned for the rest.
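As a sketch, a template for the `new` operation key of a hypothetical alerter (a file like `<alerter_name>/new.j2` under `ALERTERS_TEMPLATES_LOCATION`) could combine these variables:

```jinja2
{# new.j2: illustrative template; 'message' is the parsed message described above #}
[{{ alert.severity | upper }}] {{ alert.event }} on {{ alert.resource }}

{{ message }}

Environment: {{ alert.environment }}
```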