This repository contains an implementation example of a managed ELKK stack using the AWS Cloud Development Kit. This example uses Python.
- Context
- Prerequisites
- Amazon Virtual Private Cloud
- Amazon Managed Streaming for Apache Kafka
- Filebeat
- Amazon Elasticsearch Service
- Kibana
- Amazon Athena
- Logstash
- Clean up
The ELKK stack is a pipeline of services supporting real-time reporting and analytics. AWS can provide a managed ELKK stack using Amazon Elasticsearch Service, Logstash running on Amazon EC2 or Amazon Elastic Container Service, and Amazon Managed Streaming for Apache Kafka. Kibana is included as a capability of Amazon Elasticsearch Service. As part of a holistic solution, Logstash outputs logs to Amazon Elasticsearch and also to Amazon S3 for longer-term storage. Amazon Athena can be used to query the files in Amazon S3 directly.
Filebeat agents will be used to collect the logs from the application/host systems, and publish the logs to Amazon MSK. Filebeat agents are deployed on an Amazon EC2 instance to simulate log generation.
Amazon Managed Streaming for Apache Kafka (Amazon MSK) is used as a buffering layer to handle the collection of logs and to manage back-pressure from downstream components in the architecture. The buffering layer provides recoverability and extensibility in the platform.
The Logstash layer performs the dual purpose of reading data from Amazon MSK and indexing the logs to Amazon Elasticsearch in real time, as well as storing the data in Amazon S3.
Users can search for logs in Amazon Elasticsearch Service using the Kibana front-end UI. Amazon Elasticsearch Service is a fully managed service providing a rich set of features, such as dashboards, alerting, and SQL query support, that can be used according to workload-specific requirements.
Logs are stored in Amazon S3 to support cold data log analysis requirements. The AWS Glue Data Catalog stores the metadata associated with the log files, making it available to users for ad-hoc analysis.
Amazon Athena supports SQL queries against log data stored in Amazon S3.
The following tools are required to deploy this Amazon Managed ELKK stack.
If using AWS Cloud9 skip to section "AWS Cloud9 - Create Cloud9 Environment" below.
- AWS CDK - https://docs.aws.amazon.com/cdk/latest/guide/getting_started.html
- AWS CLI - https://aws.amazon.com/cli/
- Git - https://git-scm.com/downloads
- Python (3.6 or later) - https://www.python.org/downloads/
- Docker - https://www.docker.com/
If desired, AWS Cloud9 setup is detailed in the AWS Cloud9 Setup Instructions.
Complete the following steps to set up the Managed ELKK workshop in your environment.
At a bash terminal session:
# clone the repo
$ git clone https://github.com/aws-samples/aws-cdk-managed-elkk
# move to directory
$ cd aws-cdk-managed-elkk
# bootstrap the remaining setup (assumes us-east-1)
$ bash bootstrap.sh
# activate the virtual environment
$ source .env/bin/activate
Create the CDK configuration by bootstrapping the CDK.
# bootstrap the cdk
(.env)$ cdk bootstrap aws://youraccount/yourregion
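For illustration only, with a placeholder account ID and the region assumed by the bootstrap script:
# example only: substitute your own account id and region
(.env)$ cdk bootstrap aws://123456789012/us-east-1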
The first stage in the ELKK deployment is to create an Amazon Virtual Private Cloud with public and private subnets. The Managed ELKK stack will be deployed into this VPC.
Use the AWS CDK to deploy an Amazon VPC across multiple availability zones.
# deploy the vpc stack
(.env)$ cdk deploy elkk-vpc
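An optional check that the stack deployed successfully; this is a generic AWS CLI call, not specific to this repository:
# optional: confirm the vpc stack reached CREATE_COMPLETE
(.env)$ aws cloudformation describe-stacks --stack-name elkk-vpc --output text --query "Stacks[0].StackStatus"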
The second stage in the ELKK deployment is to create the Amazon Managed Streaming for Apache Kafka cluster. An Amazon EC2 instance is created with the Apache Kafka client installed to interact with the Amazon MSK cluster.
Use the AWS CDK to deploy an Amazon MSK Cluster into the VPC.
# deploy the kafka stack
(.env)$ cdk deploy elkk-kafka
The CDK will prompt to apply security changes; input "y" for Yes.
When the client parameter is set to True, an Amazon EC2 instance is deployed to interact with the Amazon MSK cluster. It can take up to 30 minutes for the Amazon MSK cluster and the client EC2 instance to be deployed.
Wait until 2/2 checks are completed on the Kafka client EC2 instance to ensure that the userdata scripts have fully run.
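The Amazon MSK cluster state can also be checked from the terminal; a quick AWS CLI check (the cluster is ready when the state is ACTIVE):
# optional: check the msk cluster state
(.env)$ aws kafka list-clusters --output text --query "ClusterInfoList[*].State"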
On creation the Kafka client EC2 instance will create three Kafka topics: "elkktopic", "apachelog", and "appevent".
Open a terminal window to connect to the Kafka client Amazon EC2 instance and create a Kafka producer session:
# get the ec2 instance public dns
(.env)$ kafka_client_dns=`aws ec2 describe-instances --filter file://kafka/kafka_filter.json --output text --query "Reservations[*].Instances[*].{Instance:PublicDnsName}[0].Instance"` && echo $kafka_client_dns
# use the public dns to connect to the amazon ec2 instance
(.env)$ ssh ec2-user@$kafka_client_dns
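Optionally, while connected, confirm that the userdata script created the Kafka topics. This sketch reuses the broker lookup shown below and assumes the same Kafka installation path:
# optional: list the kafka topics created by the userdata script
$ kafka_arn=`aws kafka list-clusters --output text --query 'ClusterInfoList[*].ClusterArn'` && echo $kafka_arn
$ kafka_brokers=`aws kafka get-bootstrap-brokers --cluster-arn $kafka_arn --output text --query '*'` && echo $kafka_brokers
$ /opt/kafka_2.12-2.4.0/bin/kafka-topics.sh --list --bootstrap-server $kafka_brokers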
While connected to the Kafka client EC2 instance create the Kafka producer session on the elkktopic Kafka topic:
# Get the cluster ARN
$ kafka_arn=`aws kafka list-clusters --output text --query 'ClusterInfoList[*].ClusterArn'` && echo $kafka_arn
# Get the bootstrap brokers
$ kafka_brokers=`aws kafka get-bootstrap-brokers --cluster-arn $kafka_arn --output text --query '*'` && echo $kafka_brokers
# Connect to the cluster as a producer on the Kafka topic "elkktopic"
$ /opt/kafka_2.12-2.4.0/bin/kafka-console-producer.sh --broker-list $kafka_brokers --topic elkktopic
Leave the Kafka producer session window open.
Open a new terminal window and connect to the Kafka client EC2 instance to create a Kafka consumer session:
# get the ec2 instance public dns
(.env)$ kafka_client_dns=`aws ec2 describe-instances --filter file://kafka/kafka_filter.json --output text --query "Reservations[*].Instances[*].{Instance:PublicDnsName}[0].Instance"` && echo $kafka_client_dns
# use the public dns to connect to the ec2 instance
(.env)$ ssh ec2-user@$kafka_client_dns
Note: optional steps may be required if the yourkeypair key pair is not recognized when connecting.
While connected to the Kafka client EC2 instance create the consumer session on the elkktopic Kafka topic.
# Get the cluster ARN
$ kafka_arn=`aws kafka list-clusters --output text --query 'ClusterInfoList[*].ClusterArn'` && echo $kafka_arn
# Get the bootstrap brokers
$ kafka_brokers=`aws kafka get-bootstrap-brokers --cluster-arn $kafka_arn --output text --query '*'` && echo $kafka_brokers
# Connect to the cluster as a consumer
$ /opt/kafka_2.12-2.4.0/bin/kafka-console-consumer.sh --bootstrap-server $kafka_brokers --topic elkktopic --from-beginning
Type messages into the Kafka producer session; they are published to the Amazon MSK cluster.
The messages published to the Amazon MSK cluster by the producer session will appear in the Kafka consumer window as they are read from the cluster.
The Kafka client EC2 instance windows can be closed.
To simulate incoming logs for the ELKK cluster Filebeat will be installed on an Amazon EC2 instance. Filebeat will harvest logs generated by a dummy log generator and push these logs to the Amazon MSK cluster.
Use the AWS CDK to create an Amazon EC2 instance installed with Filebeat and a dummy log generator.
# deploy the Filebeat stack
(.env)$ cdk deploy elkk-filebeat
An Amazon EC2 instance is deployed with Filebeat installed and configured to output to Kafka.
Wait until 2/2 checks are completed on the Filebeat EC2 instance to ensure that the userdata script has run.
Open a new terminal window, connect to the Filebeat EC2 instance, and create dummy logs:
# get the Filebeat ec2 instance public dns
(.env)$ filebeat_dns=`aws ec2 describe-instances --filter file://filebeat/filebeat_filter.json --output text --query "Reservations[*].Instances[*].{Instance:PublicDnsName}"` && echo $filebeat_dns
# use the public dns to connect to the filebeat ec2 instance
(.env)$ ssh ec2-user@$filebeat_dns
While connected to the Filebeat EC2 instance create dummy logs:
# generate dummy apache logs with log generator
$ ./log_generator.py
Dummy logs created by the log generator will be written to the apachelog folder. Filebeat will harvest the logs and publish them to the Amazon MSK cluster.
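To confirm harvesting, an optional check while still connected to the Filebeat instance; this assumes the apachelog folder is in the ec2-user home directory and that Filebeat was installed as a system service by the userdata script:
# list the generated dummy log files (path assumed)
$ ls ~/apachelog/
# check that the filebeat service is running (assumes a service install)
$ sudo service filebeat status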
In the Kafka client EC2 instance terminal window, stop the consumer session with <control+c>.
Create a Kafka consumer session on the apachelog Kafka topic:
# Get the cluster ARN
$ kafka_arn=`aws kafka list-clusters --output text --query 'ClusterInfoList[*].ClusterArn'` && echo $kafka_arn
# Get the bootstrap brokers
$ kafka_brokers=`aws kafka get-bootstrap-brokers --cluster-arn $kafka_arn --output text --query '*'` && echo $kafka_brokers
# Connect to the cluster as a consumer
$ /opt/kafka_2.12-2.4.0/bin/kafka-console-consumer.sh --bootstrap-server $kafka_brokers --topic apachelog --from-beginning
Messages generated by the log generator should appear in the Kafka consumer terminal window.
The Amazon Elasticsearch Service provides an Elasticsearch domain and Kibana dashboards. The elkk-elastic stack also creates an Amazon EC2 instance to interact with the Elasticsearch domain. The EC2 instance can also be used to create an SSH tunnel into the VPC for Kibana dashboard viewing.
# deploy the elastic stack
(.env)$ cdk deploy elkk-elastic
When prompted input "y" for Yes to continue.
An Amazon EC2 instance is deployed to interact with the Amazon Elasticsearch Service domain.
New Amazon Elasticsearch Service domains take about ten minutes to initialize.
Wait until 2/2 checks are completed on the Amazon EC2 instance to ensure that the userdata script has run.
Connect to the EC2 instance using a terminal window:
# get the elastic ec2 instance public dns
(.env)$ elastic_dns=`aws ec2 describe-instances --filter file://elastic/elastic_filter.json --output text --query "Reservations[*].Instances[*].{Instance:PublicDnsName}"` && echo $elastic_dns
# use the public dns to connect to the elastic ec2 instance
(.env)$ ssh ec2-user@$elastic_dns
While connected to the Elastic EC2 instance:
# get the elastic domain
$ elastic_domain=`aws es list-domain-names --output text --query '*'` && echo $elastic_domain
# get the elastic endpoint
$ elastic_endpoint=`aws es describe-elasticsearch-domain --domain-name $elastic_domain --output text --query 'DomainStatus.Endpoints.vpc'` && echo $elastic_endpoint
# curl a doc into elasticsearch
$ curl -XPOST $elastic_endpoint/elkktopic/_doc/ -d '{"message": "Hello - this is a test message"}' -H 'Content-Type: application/json'
# curl to query elasticsearch
$ curl -XPOST $elastic_endpoint/elkktopic/_search -d' { "query": { "match_all": {} } }' -H 'Content-Type: application/json'
# count the records in the index
$ curl -XGET $elastic_endpoint/elkktopic/_count
# exit the Elastic ec2 instance
$ exit
Amazon Elasticsearch Service has been deployed within a VPC in a private subnet. To allow connections to the Kibana dashboard we deploy a public endpoint using Amazon API Gateway, AWS Lambda, Amazon CloudFront, and Amazon S3.
# deploy the kibana endpoint
(.env)$ cdk deploy elkk-kibana
When prompted "Do you wish to deploy these changes?", enter "y" for Yes.
When the deployment is complete the Kibana url is output by the AWS CDK as "elkk-kibana.kibanalink". Click on the link to navigate to Kibana.
Open the link.
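If the CDK output has scrolled away, the link can also be retrieved from the CloudFormation stack outputs; a minimal AWS CLI example:
# optional: retrieve the kibana link from the stack outputs
(.env)$ aws cloudformation describe-stacks --stack-name elkk-kibana --output text --query "Stacks[0].Outputs"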
The Kibana Dashboard is visible.
To view the records on the Kibana dashboard an "index pattern" needs to be created.
Select "Management" on the left of the Kibana Dashboard.
Select "Index Patterns" at the top left of the Management Screen.
Input the index pattern "elkktopic*" into the Index Pattern field.
Click "Next Step".
Click "Create Index Pattern".
The fields from the index can be seen. Click on "Discover".
The data can be seen on the Discovery Dashboard.
Amazon Simple Storage Service (Amazon S3) is used to store logs for longer-term retention. Amazon Athena can be used to query the files on S3.
# deploy the athena stack
(.env)$ cdk deploy elkk-athena
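Once Logstash (deployed in the next section) has pushed logs to S3 and the metadata is in the AWS Glue catalog, the logs can be queried with the Athena CLI. This is a sketch only; the database, table, and results bucket names are placeholders to be replaced with the values created by the stack:
# example only: database, table and results bucket names are placeholders
(.env)$ aws athena start-query-execution --query-string "SELECT * FROM elkk_logs LIMIT 10" --query-execution-context Database=elkk_database --result-configuration OutputLocation=s3://your-athena-results-bucket/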
Logstash is deployed to subscribe to the Kafka topics and output the data into Elasticsearch, with an additional output added to push the data into S3. Logstash also parses the Apache common log format and transforms the log data into JSON.
The Logstash pipeline configuration can be viewed in logstash/logstash.conf.
Check the /app.py file and verify that the elkk-logstash stack is initially set to deploy Logstash on an Amazon EC2 instance with the AWS Fargate deployment disabled:
# logstash stack
logstash_stack = LogstashStack(
    app,
    "elkk-logstash",
    vpc_stack,
    logstash_ec2=True,
    logstash_fargate=False,
    env=core.Environment(
        account=os.environ["CDK_DEFAULT_ACCOUNT"],
        region=os.environ["CDK_DEFAULT_REGION"],
    ),
)
When we deploy the elkk-logstash stack, Logstash will be deployed on an Amazon EC2 instance.
(.env)$ cdk deploy elkk-logstash
An Amazon EC2 instance is deployed with Logstash installed and configured with an input from Kafka and outputs to Elasticsearch and S3.
Wait until 2/2 checks are completed on the Logstash EC2 instance to ensure that the userdata scripts have fully run.
Connect to the Logstash EC2 instance using a terminal window:
# get the logstash instance public dns
$ logstash_dns=`aws ec2 describe-instances --filter file://logstash/logstash_filter.json --output text --query "Reservations[*].Instances[*].{Instance:PublicDnsName}"` && echo $logstash_dns
# use the public dns to connect to the logstash instance
$ ssh ec2-user@$logstash_dns
While connected to the Logstash EC2 instance:
# verify the logstash config, the last line should contain "Config Validation Result: OK. Exiting Logstash"
$ /usr/share/logstash/bin/logstash --config.test_and_exit -f /etc/logstash/conf.d/logstash.conf
# check the logstash status
$ service logstash status -l
Exit the Logstash instance and reconnect to the Filebeat instance.
# exit logstash instance
$ exit
# get the Filebeat ec2 instance public dns
(.env)$ filebeat_dns=`aws ec2 describe-instances --filter file://filebeat/filebeat_filter.json --output text --query "Reservations[*].Instances[*].{Instance:PublicDnsName}"` && echo $filebeat_dns
# use the public dns to connect to the filebeat ec2 instance
(.env)$ ssh ec2-user@$filebeat_dns
In the Filebeat EC2 instance generate new log files.
# generate new logs
$ ./log_generator.py
Navigate to Kibana and view the logs generated.
Create a new Index Pattern for the apache logs using pattern "elkk-apachelog*".
At the Configure Settings dialog there is now an option to select a timestamp. Select "@timestamp".
Apache Logs will now appear on a refreshed Dashboard by their timestamp. Apache Logs are selected by their index at mid-left of the Dashboard.
Navigate to Amazon S3 to view the files pushed to S3.
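The same files can be listed from the terminal; the bucket name below is a placeholder for the ELKK log bucket:
# list buckets to find the elkk log bucket
(.env)$ aws s3 ls
# list the log objects (bucket name is a placeholder)
(.env)$ aws s3 ls s3://your-elkk-log-bucket/ --recursive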
Logstash can be deployed on containers or virtual machines. To deploy Logstash on containers, update the Logstash deployment from Amazon EC2 to AWS Fargate.
Update the /app.py file so that the elkk-logstash stack is set to deploy on AWS Fargate and not on Amazon EC2:
# logstash stack
logstash_stack = LogstashStack(
    app,
    "elkk-logstash",
    vpc_stack,
    logstash_ec2=False,
    logstash_fargate=True,
    env=core.Environment(
        account=os.environ["CDK_DEFAULT_ACCOUNT"],
        region=os.environ["CDK_DEFAULT_REGION"],
    ),
)
Deploy the updated stack, terminating the Logstash EC2 instance and creating a Logstash service on AWS Fargate.
(.env)$ cdk deploy elkk-logstash
The logstash EC2 instance will be terminated and an AWS Fargate cluster will be created. Logstash will be deployed as containerized tasks.
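The Fargate deployment can be verified from the terminal; the cluster name below is a placeholder for the cluster created by the stack:
# list ecs clusters to find the logstash cluster
(.env)$ aws ecs list-clusters --output text --query "clusterArns"
# list running logstash tasks (cluster name is a placeholder)
(.env)$ aws ecs list-tasks --cluster your-elkk-logstash-cluster --desired-status RUNNING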
In the Filebeat EC2 instance generate new logfiles.
# get the Filebeat ec2 instance public dns
(.env)$ filebeat_dns=`aws ec2 describe-instances --filter file://filebeat/filebeat_filter.json --output text --query "Reservations[*].Instances[*].{Instance:PublicDnsName}"` && echo $filebeat_dns
# use the public dns to connect to the filebeat ec2 instance
(.env)$ ssh ec2-user@$filebeat_dns
# generate new logs, the -f 20 will generate 20 files at 30 second intervals
$ ./log_generator.py -f 20
Navigate to Kibana and view the logs generated. They will appear in the Dashboard for Apache Logs as they are generated.
To clean up the stacks, destroy the elkk-vpc stack; all other stacks will be torn down due to dependencies.
CloudWatch logs will need to be separately removed.
(.env)$ cdk destroy elkk-vpc
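CloudWatch log groups created by the stacks can be listed and removed with the AWS CLI; the log group name below is a placeholder:
# list remaining log groups
(.env)$ aws logs describe-log-groups --output text --query "logGroups[*].logGroupName"
# delete a log group (repeat for each elkk log group)
(.env)$ aws logs delete-log-group --log-group-name /your/elkk/log/group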
This library is licensed under the MIT-0 License. See the LICENSE file.