This repository contains an implementation example of a managed ELKK stack using the AWS Cloud Development Kit. This example uses Python.
- Context
- Prerequisites
- Amazon Virtual Private Cloud
- Amazon Managed Streaming for Apache Kafka
- Filebeat
- Amazon Elasticsearch Service
- Kibana
- Amazon Athena
- Logstash
- Clean up
The ELKK stack is a pipeline of services supporting real-time reporting and analytics. AWS can provide a managed ELKK stack using Amazon Elasticsearch Service, Logstash running on Amazon EC2 or Amazon Elastic Container Service, and Amazon Managed Streaming for Apache Kafka. Kibana is included as a capability of Amazon Elasticsearch Service. As part of a holistic solution, Logstash outputs logs to Amazon Elasticsearch and also to Amazon S3 for longer-term storage. Amazon Athena can be used to query the files in Amazon S3 directly.
Filebeat agents will be used to collect the logs from the application/host systems, and publish the logs to Amazon MSK. Filebeat agents are deployed on an Amazon EC2 instance to simulate log generation.
Amazon Managed Streaming for Apache Kafka (Amazon MSK) is used as a buffering layer to handle the collection of logs and to manage back-pressure from downstream components in the architecture. The buffering layer provides recoverability and extensibility in the platform.
The Logstash layer performs the dual purpose of reading data from Amazon MSK and indexing the logs to Amazon Elasticsearch in real time, as well as storing the data in Amazon S3.
Users can search for logs in Amazon Elasticsearch Service using the Kibana front-end UI. Amazon Elasticsearch Service is a fully managed service providing a rich set of features, such as dashboards, alerting, and SQL query support, that can be used according to workload-specific requirements.
Logs are stored in Amazon S3 to support cold data log analysis requirements. The AWS Glue Data Catalog stores the metadata associated with the log files, making it available to users for ad-hoc analysis.
Amazon Athena supports SQL queries against log data stored in Amazon S3.
The following tools are required to deploy this Amazon Managed ELKK stack.
If using AWS Cloud9 skip to section "AWS Cloud9 - Create Cloud9 Environment" below.
- AWS CDK - https://docs.aws.amazon.com/cdk/latest/guide/getting_started.html
- AWS CLI - https://aws.amazon.com/cli/
- Git - https://git-scm.com/downloads
- Python (3.6 or later) - https://www.python.org/downloads/
- Docker - https://www.docker.com/
If desired, AWS Cloud9 setup is detailed in the AWS Cloud9 Setup Instructions.
Complete the following steps to set up the Managed ELKK workshop in your environment.
At a bash terminal session:
# clone the repo
$ git clone https://github.com/aws-samples/aws-cdk-managed-elkk
# move to directory
$ cd aws-cdk-managed-elkk
# bootstrap the remaining setup (assumes us-east-1)
$ bash bootstrap.sh
# activate the virtual environment
$ source .env/bin/activate
Create the CDK configuration by bootstrapping the CDK.
# bootstrap the cdk
(.env)$ cdk bootstrap aws://youraccount/yourregion
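For illustration only, with a placeholder account ID and the region assumed by the bootstrap script:
# example only: substitute your own account id and region
(.env)$ cdk bootstrap aws://123456789012/us-east-1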
The first stage in the ELKK deployment is to create an Amazon Virtual Private Cloud with public and private subnets. The Managed ELKK stack will be deployed into this VPC.
Use the AWS CDK to deploy an Amazon VPC across multiple availability zones.
# deploy the vpc stack
(.env)$ cdk deploy elkk-vpc
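An optional check that the stack deployed successfully; this is a generic AWS CLI call, not specific to this repository:
# optional: confirm the vpc stack reached CREATE_COMPLETE
(.env)$ aws cloudformation describe-stacks --stack-name elkk-vpc --output text --query "Stacks[0].StackStatus"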
The second stage in the ELKK deployment is to create the Amazon Managed Streaming for Apache Kafka cluster. An Amazon EC2 instance is created with the Apache Kafka client installed to interact with the Amazon MSK cluster.
Use the AWS CDK to deploy an Amazon MSK Cluster into the VPC.
# deploy the kafka stack
(.env)$ cdk deploy elkk-kafka
The CDK will prompt to apply security changes; input "y" for Yes.
When the client parameter is set to True, an Amazon EC2 instance is deployed to interact with the Amazon MSK cluster. It can take up to 30 minutes for the Amazon MSK cluster and the client EC2 instance to be deployed.
Wait until 2/2 checks are completed on the Kafka client EC2 instance to ensure that the userdata scripts have fully run.
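The Amazon MSK cluster state can also be checked from the terminal; a quick AWS CLI check (the cluster is ready when the state is ACTIVE):
# optional: check the msk cluster state
(.env)$ aws kafka list-clusters --output text --query "ClusterInfoList[*].State"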
On creation the Kafka client EC2 instance will create three Kafka topics: "elkktopic", "apachelog", and "appevent".
Open a terminal window to connect to the Kafka client Amazon EC2 instance and create a Kafka producer session:
# get the ec2 instance public dns
(.env)$ kafka_client_dns=`aws ec2 describe-instances --filter file://kafka/kafka_filter.json --output text --query "Reservations[*].Instances[*].{Instance:PublicDnsName}[0].Instance"` && echo $kafka_client_dns
# use the public dns to connect to the amazon ec2 instance
(.env)$ ssh ec2-user@$kafka_client_dns
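Optionally, while connected, confirm that the userdata script created the Kafka topics. This sketch reuses the broker lookup shown below and assumes the same Kafka installation path:
# optional: list the kafka topics created by the userdata script
$ kafka_arn=`aws kafka list-clusters --output text --query 'ClusterInfoList[*].ClusterArn'` && echo $kafka_arn
$ kafka_brokers=`aws kafka get-bootstrap-brokers --cluster-arn $kafka_arn --output text --query '*'` && echo $kafka_brokers
$ /opt/kafka_2.12-2.4.0/bin/kafka-topics.sh --list --bootstrap-server $kafka_brokers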
While connected to the Kafka client EC2 instance create the Kafka producer session on the elkktopic Kafka topic:
# Get the cluster ARN
$ kafka_arn=`aws kafka list-clusters --output text --query 'ClusterInfoList[*].ClusterArn'` && echo $kafka_arn
# Get the bootstrap brokers
$ kafka_brokers=`aws kafka get-bootstrap-brokers --cluster-arn $kafka_arn --output text --query '*'` && echo $kafka_brokers
# Connect to the cluster as a producer on the Kafka topic "elkktopic"
$ /opt/kafka_2.12-2.4.0/bin/kafka-console-producer.sh --broker-list $kafka_brokers --topic elkktopic
Leave the Kafka producer session window open.
Open a new terminal window and connect to the Kafka client EC2 instance to create a Kafka consumer session:
# get the ec2 instance public dns
(.env)$ kafka_client_dns=`aws ec2 describe-instances --filter file://kafka/kafka_filter.json --output text --query "Reservations[*].Instances[*].{Instance:PublicDnsName}[0].Instance"` && echo $kafka_client_dns
# use the public dns to connect to the ec2 instance
(.env)$ ssh ec2-user@$kafka_client_dns
Note: optional steps may be required if the yourkeypair key pair is not recognized when connecting.
While connected to the Kafka client EC2 instance create the consumer session on the elkktopic Kafka topic.
# Get the cluster ARN
$ kafka_arn=`aws kafka list-clusters --output text --query 'ClusterInfoList[*].ClusterArn'` && echo $kafka_arn
# Get the bootstrap brokers
$ kafka_brokers=`aws kafka get-bootstrap-brokers --cluster-arn $kafka_arn --output text --query '*'` && echo $kafka_brokers
# Connect to the cluster as a consumer
$ /opt/kafka_2.12-2.4.0/bin/kafka-console-consumer.sh --bootstrap-server $kafka_brokers --topic elkktopic --from-beginning
Type messages into the Kafka producer session; they are published to the Amazon MSK cluster.
The messages published to the Amazon MSK cluster by the producer session will appear in the Kafka consumer window as they are read from the cluster.
The Kafka client EC2 instance windows can be closed.
To simulate incoming logs for the ELKK cluster Filebeat will be installed on an Amazon EC2 instance. Filebeat will harvest logs generated by a dummy log generator and push these logs to the Amazon MSK cluster.
Use the AWS CDK to create an Amazon EC2 instance installed with Filebeat and a dummy log generator.
# deploy the Filebeat stack
(.env)$ cdk deploy elkk-filebeat
An Amazon EC2 instance is deployed with Filebeat installed and configured to output to Kafka.
Wait until 2/2 checks are completed on the Filebeat EC2 instance to ensure that the userdata script has run.
Open a new terminal window, connect to the Filebeat EC2 instance, and create dummy logs:
# get the Filebeat ec2 instance public dns
(.env)$ filebeat_dns=`aws ec2 describe-instances --filter file://filebeat/filebeat_filter.json --output text --query "Reservations[*].Instances[*].{Instance:PublicDnsName}"` && echo $filebeat_dns
# use the public dns to connect to the filebeat ec2 instance
(.env)$ ssh ec2-user@$filebeat_dns
While connected to the Filebeat EC2 instance create dummy logs:
# generate dummy apache logs with log generator
$ ./log_generator.py
Dummy logs created by the log generator will be written to the apachelog folder. Filebeat will harvest the logs and publish them to the Amazon MSK cluster.
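To confirm harvesting, an optional check while still connected to the Filebeat instance; this assumes the apachelog folder is in the ec2-user home directory and that Filebeat was installed as a system service by the userdata script:
# list the generated dummy log files (path assumed)
$ ls ~/apachelog/
# check that the filebeat service is running (assumes a service install)
$ sudo service filebeat status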
In the Kafka client EC2 instance terminal window, stop the consumer session with <control+c>.
Create a Kafka consumer session on the apachelog Kafka topic:
# Get the cluster ARN
$ kafka_arn=`aws kafka list-clusters --output text --query 'ClusterInfoList[*].ClusterArn'` && echo $kafka_arn
# Get the bootstrap brokers
$ kafka_brokers=`aws kafka get-bootstrap-brokers --cluster-arn $kafka_arn --output text --query '*'` && echo $kafka_brokers
# Connect to the cluster as a consumer
$ /opt/kafka_2.12-2.4.0/bin/kafka-console-consumer.sh --bootstrap-server $kafka_brokers --topic apachelog --from-beginning
Messages generated by the log generator should appear in the Kafka consumer terminal window.
The Amazon Elasticsearch Service provides an Elasticsearch domain and Kibana dashboards. The elkk-elastic stack also creates an Amazon EC2 instance to interact with the Elasticsearch domain. The EC2 instance can also be used to create an SSH tunnel into the VPC for Kibana dashboard viewing.
# deploy the elastic stack
(.env)$ cdk deploy elkk-elastic
When prompted input "y" for Yes to continue.
An Amazon EC2 instance is deployed to interact with the Amazon Elasticsearch Service domain.
New Amazon Elasticsearch Service domains take about ten minutes to initialize.
Wait until 2/2 checks are completed on the Amazon EC2 instance to ensure that the userdata script has run.
Connect to the EC2 instance using a terminal window:
# get the elastic ec2 instance public dns
(.env)$ elastic_dns=`aws ec2 describe-instances --filter file://elastic/elastic_filter.json --output text --query "Reservations[*].Instances[*].{Instance:PublicDnsName}"` && echo $elastic_dns
# use the public dns to connect to the elastic ec2 instance
(.env)$ ssh ec2-user@$elastic_dns
While connected to the Elastic EC2 instance:
# get the elastic domain
$ elastic_domain=`aws es list-domain-names --output text --query '*'` && echo $elastic_domain
# get the elastic endpoint
$ elastic_endpoint=`aws es describe-elasticsearch-domain --domain-name $elastic_domain --output text --query 'DomainStatus.Endpoints.vpc'` && echo $elastic_endpoint
# curl a doc into elasticsearch
$ curl -XPOST $elastic_endpoint/elkktopic/_doc/ -d '{"message": "Hello - this is a test message"}' -H 'Content-Type: application/json'
# curl to query elasticsearch
$ curl -XPOST $elastic_endpoint/elkktopic/_search -d' { "query": { "match_all": {} } }' -H 'Content-Type: application/json'
# count the records in the index
$ curl -XGET $elastic_endpoint/elkktopic/_count
# exit the Elastic ec2 instance
$ exit
Amazon Elasticsearch Service has been deployed within a VPC in a private subnet. To allow connections to the Kibana dashboard we deploy a public endpoint using Amazon API Gateway, AWS Lambda, Amazon CloudFront, and Amazon S3.
# deploy the kibana endpoint
(.env)$ cdk deploy elkk-kibana
When prompted "Do you wish to deploy these changes?", enter "y" for Yes.
When the deployment is complete the Kibana url is output by the AWS CDK as "elkk-kibana.kibanalink". Click on the link to navigate to Kibana.
Open the link.
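If the CDK output has scrolled away, the link can also be retrieved from the CloudFormation stack outputs; a minimal AWS CLI example:
# optional: retrieve the kibana link from the stack outputs
(.env)$ aws cloudformation describe-stacks --stack-name elkk-kibana --output text --query "Stacks[0].Outputs"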
The Kibana Dashboard is visible.
To view the records on the Kibana dashboard an "index pattern" needs to be created.
Select "Management" on the left of the Kibana Dashboard.
Select "Index Patterns" at the top left of the Management Screen.
Input the index pattern "elkktopic*" into the Index Pattern field.
Click "Next Step".
Click "Create Index Pattern".
The fields from the index can be seen. Click on "Discover".
The data can be seen on the Discovery Dashboard.
Amazon Simple Storage Service (Amazon S3) is used to store logs for longer-term retention. Amazon Athena can be used to query the files on S3.
# deploy the athena stack
(.env)$ cdk deploy elkk-athena
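Once Logstash (deployed in the next section) has pushed logs to S3 and the metadata is in the AWS Glue catalog, the logs can be queried with the Athena CLI. This is a sketch only; the database, table, and results bucket names are placeholders to be replaced with the values created by the stack:
# example only: database, table and results bucket names are placeholders
(.env)$ aws athena start-query-execution --query-string "SELECT * FROM elkk_logs LIMIT 10" --query-execution-context Database=elkk_database --result-configuration OutputLocation=s3://your-athena-results-bucket/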
Logstash is deployed to subscribe to the Kafka topics and output the data into Elasticsearch, with an additional output added to push the data into S3. Logstash also parses the Apache common log format and transforms the log data into JSON.
The Logstash pipeline configuration can be viewed in logstash/logstash.conf.
Check the /app.py file and verify that the elkk-logstash stack is initially set to deploy Logstash on an Amazon EC2 instance with the AWS Fargate deployment disabled:
# logstash stack
logstash_stack = LogstashStack(
    app,
    "elkk-logstash",
    vpc_stack,
    logstash_ec2=True,
    logstash_fargate=False,
    env=core.Environment(
        account=os.environ["CDK_DEFAULT_ACCOUNT"],
        region=os.environ["CDK_DEFAULT_REGION"],
    ),
)
When we deploy the elkk-logstash stack, Logstash will be deployed on an Amazon EC2 instance.
(.env)$ cdk deploy elkk-logstash
An Amazon EC2 instance is deployed with Logstash installed and configured with an input from Kafka and outputs to Elasticsearch and S3.
Wait until 2/2 checks are completed on the Logstash EC2 instance to ensure that the userdata scripts have fully run.
Connect to the Logstash EC2 instance using a terminal window:
# get the logstash instance public dns
$ logstash_dns=`aws ec2 describe-instances --filter file://logstash/logstash_filter.json --output text --query "Reservations[*].Instances[*].{Instance:PublicDnsName}"` && echo $logstash_dns
# use the public dns to connect to the logstash instance
$ ssh ec2-user@$logstash_dns
While connected to the Logstash EC2 instance:
# verify the logstash config, the last line should contain "Config Validation Result: OK. Exiting Logstash"
$ /usr/share/logstash/bin/logstash --config.test_and_exit -f /etc/logstash/conf.d/logstash.conf
# check the logstash status
$ service logstash status -l
Exit the Logstash instance and reconnect to the Filebeat instance.
# exit logstash instance
$ exit
# get the Filebeat ec2 instance public dns
(.env)$ filebeat_dns=`aws ec2 describe-instances --filter file://filebeat/filebeat_filter.json --output text --query "Reservations[*].Instances[*].{Instance:PublicDnsName}"` && echo $filebeat_dns
# use the public dns to connect to the filebeat ec2 instance
(.env)$ ssh ec2-user@$filebeat_dns
In the Filebeat EC2 instance generate new log files.
# generate new logs
$ ./log_generator.py
Navigate to Kibana and view the logs generated.
Create a new Index Pattern for the apache logs using pattern "elkk-apachelog*".
At the Configure Settings dialog there is now an option to select a timestamp. Select "@timestamp".
Apache Logs will now appear on a refreshed Dashboard by their timestamp. Apache Logs are selected by their index at mid-left of the Dashboard.
Navigate to Amazon S3 to view the files pushed to S3.
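The same files can be listed from the terminal; the bucket name below is a placeholder for the ELKK log bucket:
# list buckets to find the elkk log bucket
(.env)$ aws s3 ls
# list the log objects (bucket name is a placeholder)
(.env)$ aws s3 ls s3://your-elkk-log-bucket/ --recursive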
Logstash can be deployed on containers or virtual machines. To deploy Logstash on containers, update the Logstash deployment from Amazon EC2 to AWS Fargate.
Update the /app.py file so that the elkk-logstash stack is set to deploy on AWS Fargate and not on Amazon EC2:
# logstash stack
logstash_stack = LogstashStack(
    app,
    "elkk-logstash",
    vpc_stack,
    logstash_ec2=False,
    logstash_fargate=True,
    env=core.Environment(
        account=os.environ["CDK_DEFAULT_ACCOUNT"],
        region=os.environ["CDK_DEFAULT_REGION"],
    ),
)
Deploy the updated stack, terminating the Logstash EC2 instance and creating a Logstash service on AWS Fargate.
(.env)$ cdk deploy elkk-logstash
The logstash EC2 instance will be terminated and an AWS Fargate cluster will be created. Logstash will be deployed as containerized tasks.
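The Fargate deployment can be verified from the terminal; the cluster name below is a placeholder for the cluster created by the stack:
# list ecs clusters to find the logstash cluster
(.env)$ aws ecs list-clusters --output text --query "clusterArns"
# list running logstash tasks (cluster name is a placeholder)
(.env)$ aws ecs list-tasks --cluster your-elkk-logstash-cluster --desired-status RUNNING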
In the Filebeat EC2 instance generate new logfiles.
# get the Filebeat ec2 instance public dns
(.env)$ filebeat_dns=`aws ec2 describe-instances --filter file://filebeat/filebeat_filter.json --output text --query "Reservations[*].Instances[*].{Instance:PublicDnsName}"` && echo $filebeat_dns
# use the public dns to connect to the filebeat ec2 instance
(.env)$ ssh ec2-user@$filebeat_dns
# generate new logs, the -f 20 will generate 20 files at 30 second intervals
$ ./log_generator.py -f 20
Navigate to Kibana and view the logs generated. They will appear in the Dashboard for Apache Logs as they are generated.
To clean up the stacks, destroy the elkk-vpc stack; all other stacks will be torn down due to dependencies.
CloudWatch logs will need to be separately removed.
(.env)$ cdk destroy elkk-vpc
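CloudWatch log groups created by the stacks can be listed and removed with the AWS CLI; the log group name below is a placeholder:
# list remaining log groups
(.env)$ aws logs describe-log-groups --output text --query "logGroups[*].logGroupName"
# delete a log group (repeat for each elkk log group)
(.env)$ aws logs delete-log-group --log-group-name /your/elkk/log/group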
This library is licensed under the MIT-0 License. See the LICENSE file.