- CodeCommit requires CloudWatch Events rule to trigger CodePipeline
- Can trigger Lambda functions from CodeCommit events
- AWS provides several managed policies: `AWSCodeCommitFullAccess`, `AWSCodeCommitPowerUser`, `AWSCodeCommitReadOnly`
- Can use Approval Rule templates to e.g. trigger unit tests via CodeBuild
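
The CloudWatch Events (EventBridge) rule that starts a pipeline on a push can be sketched as an event pattern. This is a minimal sketch only: the helper name, the default branch `main`, and the repository ARN are illustrative assumptions, not values from these notes.

```python
import json


def codecommit_push_pattern(repo_arn: str, branch: str = "main") -> dict:
    """Build an event pattern matching branch creation/updates in one repo."""
    return {
        "source": ["aws.codecommit"],
        "detail-type": ["CodeCommit Repository State Change"],
        "resources": [repo_arn],
        "detail": {
            "event": ["referenceCreated", "referenceUpdated"],
            "referenceType": ["branch"],
            "referenceName": [branch],
        },
    }


# Hypothetical repository ARN, for illustration only.
pattern = codecommit_push_pattern(
    "arn:aws:codecommit:us-east-1:111122223333:my-repo"
)
print(json.dumps(pattern, indent=2))
```

The rule's target would then be the pipeline itself; that wiring is not shown here.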
- CodePipeline can execute cross-region actions
- CodePipeline can deploy straight into S3
- CodePipeline can have custom actions that invoke job workers
- CodeBuild can be triggered directly from GitHub via webhook
- CodeBuild supports build badges, which provide an embeddable, dynamically generated image (badge) that displays the status of the latest build for a project
- In EC2/On-Premises deployment, a CodeDeploy deployment group is a set of individual instances targeted for a deployment. A deployment group contains individually tagged instances, Amazon EC2 instances in Amazon EC2 Auto Scaling groups, or both.
- CodeDeploy can terminate the original instances in the deployment group with a waiting period of 1 hour.
- CodeDeploy has a default timeout of 1 hour to wait for scripts to finish
- CodeDeploy failing on `AllowTraffic` can mean that health checks on ELB are misconfigured
- Notifies via CloudWatch Events
- Amazon CodeGuru Profiler helps developers understand the runtime behaviour of their applications, improve performance, and decrease infrastructure costs.
- Amazon CodeGuru Reviewer is an automated code review service that identifies critical defects and deviation from coding best practices for Java and Python code. Works on PRs
- Reviewer can protect secrets and suggest code changes to mitigate
- CFN custom resources -> pre-signed URLs
- In a stackset, global resources (like S3) have to be unique
- CloudFormation drift detection requires manual intervention; use AWS Config to automate detection.
- By using a launch role via launch constraint, you can instead limit the end users’ permissions to the minimum they require for that product
- The template constraint limits the options that are available to end-users when they launch a product. It works by narrowing the allowable values for parameters that are defined in the product’s underlying AWS CloudFormation template
- Apply template constraints to ensure that the end users can use products without breaching the compliance requirements of your organization
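
A template constraint is expressed as a set of rules over the template's parameters. The sketch below builds one such rule in Python and prints it as JSON; the parameter name `InstanceType`, the rule name, and the allowed values are assumptions chosen for illustration.

```python
import json


def instance_type_constraint(allowed_types):
    """Build a template constraint that narrows the InstanceType parameter."""
    return {
        "Rules": {
            "RestrictInstanceType": {  # hypothetical rule name
                "Assertions": [
                    {
                        "Assert": {
                            "Fn::Contains": [
                                list(allowed_types),
                                {"Ref": "InstanceType"},  # assumed parameter
                            ]
                        },
                        "AssertDescription": "Instance type must be one of "
                        + ", ".join(allowed_types),
                    }
                ]
            }
        }
    }


constraint = instance_type_constraint(["t3.micro", "t3.small"])
print(json.dumps(constraint, indent=2))
```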
- OpsWorks can create time-based instances for scaling of predictable workload, or load-based using CPU utilisation or load, or memory utilisation
- EC2 memory metrics are not collected by default and need to have CloudWatch agent installed
- EC2 can use built-in instance recovery
- An instance is scheduled to be retired when AWS detects irreparable failure of the underlying hardware that hosts the instance.
- When an instance reaches its scheduled retirement date, it is stopped or terminated by AWS.
- AWS also sends an AWS Health event, which you can monitor and manage by using Amazon CloudWatch Events.
- ASG lifecycle states: `Pending` (hooks `Pending:Wait`, `Pending:Proceed`), `InService`, `Terminating` (hooks `Terminating:Wait`, `Terminating:Proceed`), `Terminated`
- `Pending:Wait` lifecycle hook can allow AMI upgrades before bringing them into service
- `Terminating:Wait` lifecycle hook to collect instance data (e.g. logs) before final termination
- Tags mentioned in the Auto Scaling group are not propagated to EBS volumes
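
The ASG lifecycle states listed above can be read as a small state machine. The following is a simplified sketch under that assumption: it models only the states named in these notes and ignores others (standby, detaching, warm pool states, abandoned launches).

```python
# Allowed transitions between the lifecycle states named in the notes.
# The Wait/Proceed states only occur when a lifecycle hook is configured.
TRANSITIONS = {
    "Pending": ["Pending:Wait", "InService"],
    "Pending:Wait": ["Pending:Proceed"],
    "Pending:Proceed": ["InService"],
    "InService": ["Terminating"],
    "Terminating": ["Terminating:Wait", "Terminated"],
    "Terminating:Wait": ["Terminating:Proceed"],
    "Terminating:Proceed": ["Terminated"],
    "Terminated": [],
}


def path_is_valid(path):
    """Return True if every consecutive pair of states is an allowed step."""
    return all(nxt in TRANSITIONS[cur] for cur, nxt in zip(path, path[1:]))


with_hooks = [
    "Pending", "Pending:Wait", "Pending:Proceed", "InService",
    "Terminating", "Terminating:Wait", "Terminating:Proceed", "Terminated",
]
print(path_is_valid(with_hooks))
print(path_is_valid(["Pending", "Terminated"]))
```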
- ASG: A warm pool gives you the ability to decrease latency for your applications that have exceptionally long boot times, for example, because instances need to write massive amounts of data to disk.
- Can keep instances in pool running or stopped
- ASG can notify via SNS on failed instance launch
- Can use Amazon EventBridge or Amazon CloudWatch Events to track the Auto Scaling Events
- Can trigger Lambdas from ASG by filtering on EventBridge events
- CloudFormation + ASGs:
  - `AutoScalingReplacingUpdate`: `WillReplace: true` will wait for a complete replacement of the ASG and its instances before deleting the old ASG
  - `AutoScalingRollingUpdate`: replaces existing instances in the ASG; valid options: `MaxBatchSize`, `MinInstancesInService`, `MinSuccessfulInstancesPercent`, `PauseTime`, `SuspendProcesses`, `WaitOnResourceSignals`
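
The two update policies can be sketched as the `UpdatePolicy` fragment that sits on the Auto Scaling group resource in a template. The helper names and default values below are illustrative assumptions; only the attribute names come from the notes.

```python
import json


def rolling_update_policy(max_batch_size=1, min_in_service=1, pause_time="PT5M"):
    """UpdatePolicy fragment that replaces instances in batches."""
    return {
        "UpdatePolicy": {
            "AutoScalingRollingUpdate": {
                "MaxBatchSize": max_batch_size,
                "MinInstancesInService": min_in_service,
                "PauseTime": pause_time,  # ISO 8601 duration
                "WaitOnResourceSignals": True,
            }
        }
    }


def replacing_update_policy():
    """UpdatePolicy fragment that builds a whole new ASG before deleting the old one."""
    return {"UpdatePolicy": {"AutoScalingReplacingUpdate": {"WillReplace": True}}}


rolling = rolling_update_policy(max_batch_size=2)
replacing = replacing_update_policy()
print(json.dumps(rolling, indent=2))
print(json.dumps(replacing, indent=2))
```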
- Storage Gateway does not automatically refresh the cache if the files were added directly to S3. `RefreshCache` can be used to refresh the cache periodically.
- Tape gateway is backed by Glacier, meant for backups etc.
- File gateway gets on-premises data into the cloud
- Volume gateway is cloud-backed iSCSI block storage volumes
- `AWS-AmazonLinuxDefaultPatchBaseline` is a predefined patch baseline, doesn't do custom patches
- `aws:runDocument` plugin runs SSM documents stored in Systems Manager or on a local share
- `aws:downloadContent` plugin downloads an SSM document from a remote location to a local share
- Can use SSM to create AMIs
- ALBs can be configured for 'dual stack' mode that allows IPv4 and IPv6
- ALBs can have weightings between target groups
- `iam:PassRole` lets a principal pass a role to a service, e.g. a developer passing a role to CloudFormation
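
A policy granting `iam:PassRole` is usually scoped to one role and one receiving service. The sketch below assumes a hypothetical role ARN and uses the `iam:PassedToService` condition key to restrict the pass to CloudFormation.

```python
import json


def pass_role_policy(role_arn, service="cloudformation.amazonaws.com"):
    """Allow passing exactly one role, and only to the named service."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "iam:PassRole",
                "Resource": role_arn,
                "Condition": {"StringEquals": {"iam:PassedToService": service}},
            }
        ],
    }


# Hypothetical role ARN, for illustration only.
policy = pass_role_policy("arn:aws:iam::111122223333:role/cfn-deploy-role")
print(json.dumps(policy, indent=2))
```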
- Firewall Manager can be used to configure and apply WAF ACLs to the ALBs in an AWS account. It can help centrally manage as well as apply them to new accounts added to the Organization in the future.
- KMS grants are commonly used by AWS services that integrate with AWS KMS to encrypt your data at rest.
- The service creates a grant on behalf of a user in the account, uses its permissions, and retires the grant as soon as its task is complete.
- Can be used for org-wide compliance
- AWS recommends a separate delegated GuardDuty administrator account
- Can auto-enable GuardDuty for all future Org accounts
- Can configure GuardDuty Trusted IP list and Threat IP list and work with findings based on those
- GuardDuty needs EventBridge for filtering
- AWS Config can ensure all EC2 instances are managed by AWS Systems Manager.
- AWS Config's `ec2-volume-inuse-check` rule can find unattached EBS volumes, but cannot detect how long a volume was unused for
- `cloudformation-stack-drift-detection-check` checks if the actual configuration of a CloudFormation stack differs, or has drifted, from the expected configuration
- `s3-bucket-ssl-requests-only` checks whether S3 buckets have policies that require requests to use SSL
- Can deploy conformance packs into org accounts (from a delegated admin account)
- Config itself is per region, use Config Aggregator for centralised collection of findings across regions & accounts
- Uses aggregator account
- By default, AWS Config will not automatically remediate the accounts that disabled its CloudTrail. You must manually set this up using a CloudWatch Events rule and a custom Lambda function that calls the StartLogging API to enable CloudTrail back again. Furthermore, the `cloudtrail-enabled` AWS Config managed rule is only available for the periodic trigger type and not Configuration changes.
- Use EventBridge to get notifications on Control Tower events like `CreateManagedAccount`
- Customizations for AWS Control Tower (CfCT) helps you customize your AWS Control Tower landing zone and stay aligned with AWS best practices. Customizations are implemented with AWS CloudFormation templates and service control policies (SCPs).
- CfCT capability is integrated with AWS Control Tower lifecycle events so that your resource deployments remain synchronized with your landing zone
- SCPs are inherited down the path: `org root` -> `ou` -> `accounts`
- AWS Trusted Advisor checks identify ways to optimize your AWS infrastructure, improve security and performance, reduce costs, and monitor service quotas
- Cost Optimization, Performance, Security, Fault Tolerance, and Service Limits
- TrustedAdvisor can check for under-utilized EC2
- Trusted Advisor's primary integration point is CloudWatch Events
- With Trusted Advisor’s Service Limit Dashboard, you can view, refresh, and export utilization and limit data on a per-limit basis.
- Metrics are published on Amazon CloudWatch in which you can create custom alarms
- AWS Health is scanning public repos and can send events for compromised keys
- On detection of an exposed IAM access key, AWS Health generates an `AWS_RISK_CREDENTIALS_EXPOSED` CloudWatch Event.
- Also lists AWS scheduled maintenance events on Health Dashboard
- Can use CloudWatch Events/EventBridge to trigger workflows based on events
- Can monitor AWS Health events using Amazon EventBridge or CloudWatch Events by calling the AWS Health API
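
An event-driven workflow for exposed credentials starts with an event pattern that selects the Health event. The sketch below pairs such a pattern with a deliberately simplified matcher (exact-value lists and nested objects only, not the full EventBridge matching semantics); the sample event is hypothetical and trimmed to the fields the pattern inspects.

```python
HEALTH_PATTERN = {
    "source": ["aws.health"],
    "detail-type": ["AWS Health Event"],
    "detail": {
        "service": ["RISK"],
        "eventTypeCategory": ["issue"],
        "eventTypeCode": ["AWS_RISK_CREDENTIALS_EXPOSED"],
    },
}


def matches(pattern, event):
    """Simplified matcher: each pattern key must exist and hold a listed value."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


# Hypothetical event, reduced to the fields used above.
sample_event = {
    "source": "aws.health",
    "detail-type": "AWS Health Event",
    "detail": {
        "service": "RISK",
        "eventTypeCategory": "issue",
        "eventTypeCode": "AWS_RISK_CREDENTIALS_EXPOSED",
    },
}
print(matches(HEALTH_PATTERN, sample_event))
```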
- Can set up trails for
- Data events: These events provide insight into the resource operations performed on or within a resource. These are also known as data plane operations.
- For S3 or Lambda data events
- Management events: Management events provide insight into management operations that are performed on resources in your AWS account. These are also known as control plane operations.
- NAT gateway does not span multiple AZs (instead: one gateway per AZ)
- Can send VPC Flow Logs to CloudWatch Logs
- Read replicas are always asynchronous
- AWS Aurora Global Database uses storage-based replication with typical latency of less than 1 second, using dedicated infrastructure that leaves your database fully available to serve application workloads.
- 1 primary region (read/write), up to 5 secondary regions (read)
- In the event of a regional degradation or outage, one of the secondary regions can be promoted to read and write capabilities in less than 1 minute.
- Aurora endpoints
- single built-in cluster endpoint, connects to the primary instance of the cluster
- reader endpoint for read-only connections for your Aurora cluster
- can have custom cluster endpoints (managed by Aurora) that can be READER, WRITER or ANY
- RDS creates and saves automated backups of your DB instance or Multi-AZ DB cluster during the backup window of your database.
- default: a 30-minute backup window, selected at random from an 8-hour block of time (night-time) for each region
- Amazon RDS uses SNS to provide notification when an Amazon RDS event occurs.
- Can also use CloudWatch Events/Eventbridge
- Failover:
- AZ outages => RDS multi-AZ deployment
- Regional outages => RDS read replica
- Multi-region deployments are like multi-AZ deployments, but other regions can be used for reads. Read replicas can be in the same AZ, same region, or cross-region
- Read replicas have best RTO/RPO, but highest cost
- In DynamoDB, `ThrottledWriteRequests` can indicate that the maximum write capacity units in the table's Auto Scaling policy should be increased.
- `WriteThrottleEvents` are requests to DynamoDB that exceed the provisioned write capacity units for a table or a global secondary index.
- Can use Kinesis Data Streams to capture changes to DynamoDB
- Amazon DynamoDB global tables provide a single-digit millisecond latency and make sure the data is available across regions.
- DynamoDB Global Tables requires
- tables are created in each region already
- DynamoDB Streams is enabled
- Don't have multiple lambdas read from DynamoDB Streams
- Only one process per shard!
- Better to use fan-out pattern
- AWS Glue is an efficient way to store object metadata. Combination: S3 - Glue - Athena - QuickSight
- Can include a pre-calculated checksum as part of your request. Amazon S3 compares the provided checksum to the checksum that it calculates by using your specified algorithm
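
Pre-calculating such a checksum on the client side can be sketched as follows, assuming SHA-256 is the chosen algorithm. S3 expects the digest base64-encoded (for SHA-256 it travels in the `x-amz-checksum-sha256` header); the function name and payload are illustrative.

```python
import base64
import hashlib


def sha256_checksum_b64(data: bytes) -> str:
    """Return the base64-encoded SHA-256 digest of the payload."""
    digest = hashlib.sha256(data).digest()
    return base64.b64encode(digest).decode("ascii")


payload = b"hello world"  # illustrative payload
checksum = sha256_checksum_b64(payload)
# This value would accompany the upload request so S3 can verify the object.
print(checksum)
```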
- Can activate access logs and use Athena for analysis/queries
- S3 cross-region replication is push-based: source bucket gets a replication rule, destination bucket gets a bucket policy, source needs IAM role for S3 service to assume
- Configure a replication rule within the source bucket to activate the replication process.
- Create a bucket policy in the destination bucket that grants the source bucket permission to replicate objects into it.
- In the source AWS account, create an IAM role that Amazon S3 can assume to replicate objects. Enable versioning in both buckets.
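
The two policy documents in the replication steps above can be sketched like this. The bucket name and role ARN are hypothetical, and the destination policy lists only the core replication actions; a real setup may need more (for example versioning or ownership-override permissions).

```python
import json


def replication_trust_policy():
    """Trust policy letting the S3 service assume the replication role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "s3.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def destination_bucket_policy(dest_bucket, source_role_arn):
    """Bucket policy on the destination allowing the source role to replicate."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": source_role_arn},
                "Action": ["s3:ReplicateObject", "s3:ReplicateDelete"],
                "Resource": f"arn:aws:s3:::{dest_bucket}/*",
            }
        ],
    }


trust = replication_trust_policy()
dest_policy = destination_bucket_policy(
    "example-destination-bucket",  # hypothetical bucket
    "arn:aws:iam::111122223333:role/s3-replication-role",  # hypothetical role
)
print(json.dumps(trust, indent=2))
print(json.dumps(dest_policy, indent=2))
```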
- AWS CloudTrail only logs bucket-level actions in your Amazon S3 buckets by default. If you want to record all object-level API activity in your S3 bucket, you can set up data events in CloudTrail
- API Gateway does not have specific metrics for individual HTTP error codes like 403, only a generic `4XXError` metric
- Can enable API caching in Amazon API Gateway to cache your endpoint's responses
- can set ECS tasks as a target of CloudWatch events
- ECS/Fargate logs
  - add the required `logConfiguration` parameters to your task definition to turn on the `awslogs` log driver
- ECS/EC2
  - container instances have an attached IAM role that contains `logs:CreateLogStream` and `logs:PutLogEvents`
  - to turn on the `awslogs` log driver, your Amazon ECS container instances require at least version 1.9.0 of the container agent
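
The `logConfiguration` block for the `awslogs` driver can be sketched as part of a container definition. The container name, image, log group, and region below are hypothetical values for illustration.

```python
import json


def awslogs_container(name, image, log_group, region, stream_prefix):
    """Container definition fragment that ships stdout/stderr to CloudWatch Logs."""
    return {
        "name": name,
        "image": image,
        "essential": True,
        "logConfiguration": {
            "logDriver": "awslogs",
            "options": {
                "awslogs-group": log_group,
                "awslogs-region": region,
                "awslogs-stream-prefix": stream_prefix,
            },
        },
    }


container = awslogs_container(
    name="web",                      # hypothetical container name
    image="public.ecr.aws/nginx/nginx:latest",
    log_group="/ecs/web",            # hypothetical log group
    region="us-east-1",
    stream_prefix="ecs",
)
print(json.dumps(container, indent=2))
```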
- Application Auto Scaling is a web service for automatically scaling scalable resources for individual AWS services beyond Amazon EC2
- Lambda function provisioned concurrency
- DynamoDB tables and global secondary indexes
- Aurora replicas
- Amazon Elastic Container Service (ECS) services
- ...
- Target tracking scaling – Scale a resource based on a target value for a specific CloudWatch metric.
- Step scaling – Scale a resource based on a set of scaling adjustments that vary based on the size of the alarm breach.
- Scheduled scaling – Scale a resource one time only or on a recurring schedule.
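
How step scaling picks an adjustment from the size of the alarm breach can be sketched in a few lines. This is a simplified model for a scale-out alarm: the bounds are offsets from the alarm threshold, lower bounds are inclusive, upper bounds exclusive, and a missing bound means unbounded. The threshold and step values are made up for illustration.

```python
def step_adjustment(metric_value, threshold, steps):
    """Return the capacity change for the step whose interval contains the breach."""
    breach = metric_value - threshold
    for step in steps:
        lower = step.get("MetricIntervalLowerBound")
        upper = step.get("MetricIntervalUpperBound")
        above_lower = lower is None or breach >= lower
        below_upper = upper is None or breach < upper
        if above_lower and below_upper:
            return step["ScalingAdjustment"]
    return 0  # no step matched: the alarm is not breached


# Illustrative steps: the larger the breach, the more capacity is added.
STEPS = [
    {"MetricIntervalLowerBound": 0, "MetricIntervalUpperBound": 10, "ScalingAdjustment": 1},
    {"MetricIntervalLowerBound": 10, "MetricIntervalUpperBound": 20, "ScalingAdjustment": 2},
    {"MetricIntervalLowerBound": 20, "ScalingAdjustment": 4},
]

for cpu in (50, 65, 75, 95):
    print(cpu, step_adjustment(cpu, threshold=60, steps=STEPS))
```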
- OriginGroup: An origin group includes two origins (a primary origin and a second origin to failover to) and a failover criteria that you specify.
- SNS defines a delivery policy for each delivery protocol. The delivery policy defines how Amazon SNS retries the delivery of messages when server-side errors occur (when the system that hosts the subscribed endpoint becomes unavailable).
- When the delivery policy is exhausted, Amazon SNS stops retrying the delivery and discards the message, unless a dead-letter queue is attached to the subscription.
- For ECS notifications on essential task stopped, use EventBridge
- For S3 fanout, use SNS and subscribe consumers to it
- CloudWatch Logs are always encrypted
- CloudWatch metric filters can be used to filter CloudWatch Logs and turn matching log events into metrics
- Can create CloudWatch Alarm for the `StatusCheckFailed_System` metric and select the EC2 action to recover the instance
- CloudWatch Logs Subscription for near realtime feed of log events
- "Getting logs out of CloudWatch for further processing"
- from CloudWatch Logs, to Kinesis, ElasticSearch or Lambda
- CloudWatch has a predefined dashboard for CodeBuild metrics
- You can call the EC2 `CreateSnapshot` API directly as a target from CloudWatch Events.
- KMS publishes metrics to CloudWatch; can define alarms and alerts on them
- Can run X-Ray daemon on AWS Elastic Beanstalk
- X-Ray daemon uses UDP port 2000
- CodeDeploy states + lifecycle hooks
- CodeCommit IAM policies
- CodeCommit needs CloudWatch Events/EventBridge to detect PRs
- (EventBridge is the same service as CloudWatch Events, just with a new interface and more features exposed.)
- GitHub needs a web hook to start a CodePipeline
- CodeDeploy lifecycle hooks (reserved for CodeDeploy in parentheses):
  - `ApplicationStop`
  - (`DownloadBundle`)
  - `BeforeInstall`
  - (`Install`)
  - `AfterInstall`
  - `ApplicationStart`
  - `ValidateService`
  - `BeforeBlockTraffic`
  - (`BlockTraffic`)
  - `AfterBlockTraffic`
  - `BeforeAllowTraffic`
  - (`AllowTraffic`)
  - `AfterAllowTraffic`
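
The split between scriptable and reserved lifecycle events can be sketched as data. The list below simply mirrors the order given in these notes (not necessarily the order of any particular deployment type), and the validation helper is a hypothetical convenience for checking an AppSpec-style hook mapping.

```python
# Lifecycle events in the order listed in the notes.
LIFECYCLE_EVENTS = [
    "ApplicationStop", "DownloadBundle", "BeforeInstall", "Install",
    "AfterInstall", "ApplicationStart", "ValidateService",
    "BeforeBlockTraffic", "BlockTraffic", "AfterBlockTraffic",
    "BeforeAllowTraffic", "AllowTraffic", "AfterAllowTraffic",
]

# Events run by the CodeDeploy agent itself; scripts cannot be attached.
RESERVED = {"DownloadBundle", "Install", "BlockTraffic", "AllowTraffic"}

SCRIPTABLE = [e for e in LIFECYCLE_EVENTS if e not in RESERVED]


def invalid_hooks(hooks):
    """Return hook names that are reserved or unknown, sorted for stable output."""
    return sorted(name for name in hooks if name not in SCRIPTABLE)


# Hypothetical hook mapping with one mistake: Install is reserved.
hooks = {"BeforeInstall": "scripts/stop.sh", "Install": "scripts/copy.sh"}
print(SCRIPTABLE)
print(invalid_hooks(hooks))
```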
- Integrate automated testing into CI/CD pipelines
- CloudWatch Logs + EventBridge to automate based on CodeBuild job results
- CodeDeploy + EventBridge to automate based on CodeDeploy job results
- EventBridge for CodePipeline scheduled events
- CodeDeploy can integrate with CloudWatch Alarms to pause deployments
- Build and manage artifacts
- CodeBuild + CodePipeline + CodeDeploy + S3 for artifacts
- S3 versioning + encryption required for CodePipeline
- Implement deployment strategies for instance, container, and serverless environments
- Elastic Beanstalk policies
- All at once - fastest, but causes downtime; all remaining options have zero downtime
- Rolling - still uses batches
- Rolling with additional batch - to maintain full capacity during deploy
- Immutable for when new & old versions must not be mixed and for fast rollback
- Traffic splitting: for canary deploys
- Blue/Green deployments: swap environment URLs; keep RDS in a separate stack; requires DNS change (all previous ones do not)
- Lambda
- canary deployments via alias weights
- use CodeDeploy default deploy options:
  - Lambda: `LambdaLinear10PercentEvery10Minutes` (10% of traffic shifted at a time), `LambdaCanary10Percent10Minutes` (one 10% and one 90% deploy)
  - EC2: `AllAtOnce`, `OneAtATime`, `HalfAtATime`
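
The difference between the linear and canary options is easiest to see as a traffic schedule. The sketch below simulates both as lists of `(minute, percent of traffic on the new version)`; the function names are illustrative, and the timing model (first shift at minute 0) is an assumption.

```python
def linear_schedule(step_percent, interval_minutes):
    """Shift a fixed percentage at each interval until all traffic has moved."""
    schedule = []
    shifted, minute = 0, 0
    while shifted < 100:
        shifted = min(100, shifted + step_percent)
        schedule.append((minute, shifted))
        minute += interval_minutes
    return schedule


def canary_schedule(first_percent, wait_minutes):
    """Shift a small first slice, wait, then shift everything that remains."""
    return [(0, first_percent), (wait_minutes, 100)]


# Modelled on LambdaLinear10PercentEvery10Minutes
linear = linear_schedule(step_percent=10, interval_minutes=10)
# Modelled on LambdaCanary10Percent10Minutes
canary = canary_schedule(first_percent=10, wait_minutes=10)

print(linear)
print(canary)
```

Under this model the linear option takes ten steps to finish, while the canary option finishes in two, which is why canary is the quicker way to validate a small slice of traffic before committing.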
- ALB + EC2 + Route53 alias record swaps
- OpsWorks Stack cloning + Route53 alias swaps
- OpsWorks lifecycle stages
- Define cloud infrastructure and reusable components to provision and manage systems throughout their lifecycle
- CloudFormation cross-stack references use exports + Fn::ImportValue
- Inline Lambda functions in CFN
- When a custom resource is used to invoke a Lambda function in AWS CloudFormation, the request includes a pre-signed URL. The Lambda function is responsible for sending a response to the pre-signed URL to indicate whether the resource creation was successful or not.
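
The response a custom resource Lambda sends back to the pre-signed URL can be sketched with the standard library alone. The helper names and the sample event are hypothetical; the event is trimmed to the fields the response needs, and the actual upload in `send_response` is defined but not called here.

```python
import json
import urllib.request


def build_response(event, status, physical_id, reason="", data=None):
    """Build the JSON body CloudFormation expects (status SUCCESS or FAILED)."""
    return {
        "Status": status,
        "Reason": reason,
        "PhysicalResourceId": physical_id,
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
        "Data": data or {},
    }


def send_response(event, body):
    """PUT the body to the pre-signed URL carried in the request."""
    payload = json.dumps(body).encode("utf-8")
    request = urllib.request.Request(
        event["ResponseURL"],
        data=payload,
        method="PUT",
        headers={"Content-Type": ""},  # avoid urllib's default form content type
    )
    with urllib.request.urlopen(request) as response:
        return response.status


# Hypothetical request event, for illustration only.
sample_event = {
    "RequestType": "Create",
    "ResponseURL": "https://example.com/presigned",
    "StackId": "arn:aws:cloudformation:us-east-1:111122223333:stack/demo/abc",
    "RequestId": "req-123",
    "LogicalResourceId": "MyCustomResource",
}
body = build_response(sample_event, "SUCCESS", "my-resource-id", data={"Key": "value"})
print(json.dumps(body, indent=2))
```

If the function never responds, the stack operation waits until it times out, so failure paths should send a `FAILED` status with a `Reason`.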
- Deploy automation to create, onboard, and secure AWS accounts in a multi-account/multi-region environment
- Design and build automated solutions for complex tasks and large-scale environments
- CloudFormation + ASGs:
  - `AutoScalingReplacingUpdate`: `WillReplace: true` will wait for a complete replacement of the ASG and its instances before deleting the old ASG
  - `AutoScalingRollingUpdate`: replaces existing instances in the ASG; valid options: `MaxBatchSize`, `MinInstancesInService`, `MinSuccessfulInstancesPercent`, `PauseTime`, `SuspendProcesses`, `WaitOnResourceSignals`
- OpsWorks can create time-based instances for scaling of predictable workload, or load-based using CPU utilisation or load, or memory utilisation
- Collecting on-prem info:
- Application Discovery Agent (install on each VM) or Agentless Discovery Connector (separate VM)
- Implement highly available solutions to meet resilience and business requirements
- RDS:
- AZ outages => RDS multi-AZ deployment
- Regional outages => RDS read replica
- Multi-region deployments are like multi-AZ deployments, but other regions can be used for reads. Read replicas can be in the same AZ, same region, or cross-region
- Read replicas have best RTO/RPO, but highest cost
- Frontend traffic switching => Route53 failover
- AutoScaling with a min & max of 1 is actually sensible - it makes the instance auto-redeploy if it dies
- Route53 policies: `simple`, `failover`, `geolocation`, `geoproximity`, `latency`, `multi-value answer`, `weighted`
- Implement solutions that are scalable to meet business requirements
- ASG lifecycle states: `Pending` (hooks `Pending:Wait`, `Pending:Proceed`), `InService`, `Terminating` (hooks `Terminating:Wait`, `Terminating:Proceed`), `Terminated`
- EC2 autoscaling
  - `Pending:Wait` lifecycle hook can allow AMI upgrades before bringing them into service
  - `Terminating:Wait` lifecycle hook to collect instance data (e.g. logs) before final termination
- EKS: k8s cluster autoscaler or karpenter
- EKS networking:
- VPC CNI plugin
- Load Balancer Controller
- CoreDNS
- kube-proxy
- Calico
- Hybrid environment patching
- Implement automated recovery processes to meet RTO/RPO requirements
- Configure the collection, aggregation, and storage of logs and metrics
- AWS Config Aggregator for centralised collection of findings across regions & accounts
- EC2 custom logging requirements => CloudWatch Logs Agent
- ECS Fargate logs => awslogs driver on task definition
- CloudWatch has a predefined dashboard for CodeBuild metrics
- Audit, monitor, and analyze logs and metrics to detect issues
- near real time dashboards => QuickSight
- near real time processing on CloudWatch logs:
- Lambda subscription filter
- Kinesis stream filter
- ElasticSearch (OpenSearch) subscription filter
- CloudTrail has log integrity checking which must be turned on
- Automate monitoring and event management of complex environments
- Service limit alerting => Trusted Advisor + CloudWatch Alarms + ServiceLimitUsage metric
- Manage event sources to process, notify, and take action in response to events
- S3 event notifications for data notifications like file deletion
- RDS event notifications for multi-AZ failover events
- EventBridge + AWS Health for notification about IAM credentials being exposed on GitHub, and for notifications about instance outages, etc.
- CloudTrail data events for object-level activity on S3
- EC2 Auto Scaling groups => EventBridge
- CodePipeline stage => EventBridge
- CodeDeploy => CloudWatch Alarm + `MinimumHealthyHosts` metric can be used for rollbacks
- OpsWorks self-healing => EventBridge
- Implement configuration changes in response to events
- Troubleshoot system and application failures
- Implement techniques for identity and access management at scale
- Limit CodeCommit permissions via IAM policy which matches repo
- S3 bucket policies for requiring TLS
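
A bucket policy that requires TLS is typically written as a deny on any request where `aws:SecureTransport` is false. The sketch below builds such a policy for a hypothetical bucket name.

```python
import json


def require_tls_policy(bucket):
    """Deny every S3 action on the bucket when the request is not over TLS."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


tls_policy = require_tls_policy("example-bucket")  # hypothetical bucket name
print(json.dumps(tls_policy, indent=2))
```

A policy of this shape is what the `s3-bucket-ssl-requests-only` Config rule looks for.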
- Apply automation for security controls and data protection
- Lifecycle management + auto-rotation of secrets => Secrets Manager
- Cost-effective => SSM Parameter Store SecureStrings
- Patching => SSM Patch Manager
- Implement security monitoring and auditing solutions