Detect operational anomalies in Serverless Applications with Amazon DevOps Guru at AWS Cloud Day Warsaw 2024

Vadym Kazulkin, 80 slides, Sep 21, 2024

About This Presentation

In this talk we’ll use a standard serverless application that uses API Gateway, Lambda, DynamoDB, SQS, SNS, Kinesis, Step Functions, Aurora (Serverless) (and other AWS-managed services). We'll explore how Amazon DevOps Guru recognizes operational issues and anomalies like increased latency and...


Slide Content

Vadym Kazulkin | @VKazulkin | ip.labs GmbH
by Vadym Kazulkin
ip.labs GmbH
17.10.2023
How to Reduce Cold Starts for Java Serverless Applications in AWS
GraalVM, AWS SnapStart and Co

Contact
Vadym Kazulkin
ip.labs GmbH Bonn, Germany
Co-Organizer of the Java User Group Bonn
[email protected]
@VKazulkin
https://dev.to/vkazulkin
https://github.com/Vadym79/
https://de.slideshare.net/VadymKazulkin/
https://www.linkedin.com/in/vadymkazulkin
https://www.iplabs.de/

About ip.labs
Amazon DevOps Guru for the Serverless Applications

DevOps Lifecycle

Amazon DevOps Guru

AIOps
Artificial Intelligence for IT Operations (AIOps) is the process of using machine learning techniques to solve operational problems. The goal of AIOps is to reduce human intervention in IT operations processes. By using advanced machine learning techniques, you can reduce operational incidents and increase service quality. AIOps can help you:
• Increase service quality, for example, by grouping related incidents based on time and language
• Predict incidents before they happen
https://aws.amazon.com/devops-guru

What is Amazon DevOps Guru
Amazon DevOps Guru offers a fully managed AIOps platform powered by machine learning (ML) that is designed to make it easy to improve an application’s operational performance and availability.
DevOps Guru helps detect behaviors that deviate from normal operating patterns so you can identify operational issues long before they impact your customers:
• increased latency
• error rates (timeouts, throttles, CPU, memory, and disk utilization)
• resource constraints (exceeding AWS account limits)
https://aws.amazon.com/devops-guru

Benefits of DevOps Guru
https://aws.amazon.com/devops-guru

How DevOps Guru works
https://aws.amazon.com/devops-guru

DevOps Guru is powered by pre-trained ML models
• AWS built domain-specific, single-purpose models that identify known failure modes instead of modeling normal metric behavior.
• DevOps Guru relies on a large ensemble of detectors: statistical models tuned to detect common adverse scenarios in a variety of operational metrics.
• DevOps Guru detectors don’t need to be trained or configured. They work instantly as long as enough history is available.
• Individual detectors work in preconfigured ensembles to generate anomalies on some of the most important metrics: error rates, availability, latency, incoming request rates, CPU, memory, and disk utilization, among others.
https://aws.amazon.com/blogs/machine-learning/amazon-devops-guru-is-powered-by-pre-trained-ml-models-that-encode-operational-excellence/

DevOps Guru pre-trained ML detectors with periodic behaviors
• Many metrics, such as the number of incoming requests in customer-facing APIs, exhibit periodic behavior.
• The purpose of the causal convolution detector is to analyze temporal data with such patterns and to determine expected periodic behavior.
• When the detector infers that a metric is periodic, it adapts normal metric behavior thresholds to the seasonal pattern.
https://aws.amazon.com/blogs/machine-learning/amazon-devops-guru-is-powered-by-pre-trained-ml-models-that-encode-operational-excellence/
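The idea of adapting thresholds to a seasonal pattern can be illustrated with a much simpler stand-in than the causal convolution detector named on the slide. The sketch below is not DevOps Guru's model: it assumes the period is already known and builds one mean ± k·stddev band per position in the cycle. The function names, `period`, and `k` are hypothetical.

```python
from statistics import mean, pstdev

def seasonal_bands(history, period, k=3.0):
    """Return one (low, high) band per phase of the period.

    Simplified stand-in for a seasonal detector: values observed at the
    same position in each cycle are grouped and summarized.
    """
    bands = []
    for phase in range(period):
        samples = history[phase::period]  # every value seen at this phase
        mu = mean(samples)
        sigma = pstdev(samples)
        bands.append((mu - k * sigma, mu + k * sigma))
    return bands

def is_anomalous(value, index, bands):
    """Check a new value against the band that belongs to its phase."""
    low, high = bands[index % len(bands)]
    return value < low or value > high

# Three cycles of request counts, four samples per cycle (made-up data).
history = [10, 50, 90, 20,
           12, 52, 88, 22,
           11, 48, 92, 18]
bands = seasonal_bands(history, period=4)
```

With this data, a value of 90 falls inside the band at the peak phase but far outside the band at the quiet phase, which is the effect a fixed threshold cannot capture.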

What the future of software developers may look like

Monitoring & Alerting of the Serverless Applications

DevOps Guru Example Application
https://github.com/Vadym79/DevOpsGuruWorkshopDemo, inspired by https://github.com/aws-samples/serverless-java-frameworks-samples

DevOps Guru Set Up

DevOps Guru Set Up with AWS Organizations
https://aws.amazon.com/blogs/mt/how-to-easily-configure-devops-guru-across-your-organization-with-systems-manager-quick-setup/

DevOps Guru Dashboard

DevOps Guru Reactive Insights

DevOps Guru Examples
• Warm up the application (takes between 1 and 24 hours) to create a baseline
• Design a test experiment to provoke errors and increased latency
• Reduce the service quotas of the AWS services (API Gateway, Lambda, DynamoDB)
• Set very low service quotas to keep AWS costs down
• Add latency artificially
• Stress test with the hey tool to run into the operational issues
• See whether DevOps Guru recognizes the operational issues
• Remediate the operational issues by increasing the service quotas, removing the artificial latency, or stopping the stress test
• See whether DevOps Guru closes the incident when it’s resolved
https://github.com/rakyll/hey
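The `hey` invocations used in the examples translate into an upper bound on generated load: `-q` is a per-worker rate limit in requests per second, `-c` is the number of concurrent workers, and `-z` is the run duration. The following is a minimal sketch of that arithmetic, handy when choosing quotas low enough to be exceeded. It only handles simple single-unit durations such as `15m`, and the helper names are hypothetical.

```python
import re

def parse_duration_seconds(text):
    """Convert a simple duration such as '15m', '30s' or '1h' to seconds."""
    match = re.fullmatch(r"(\d+)([smh])", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    factor = {"s": 1, "m": 60, "h": 3600}[match.group(2)]
    return int(match.group(1)) * factor

def max_load(q, c, z):
    """Upper bound on the load produced by `hey -q q -c c -z z`.

    Returns (requests per second, total requests). The real numbers are
    lower whenever the endpoint responds more slowly than the rate limit.
    """
    rate = q * c
    return rate, rate * parse_duration_seconds(z)
```

For instance, `max_load(20, 20, "15m")` gives at most 400 requests per second and 360,000 requests in total.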

DevOps Guru: Recognize Operational Issues in DynamoDB

DevOps Guru Examples: DynamoDB Throttling
hey -q 20 -z 15m -c 20 -H "X-API-Key: XXXa6XXXX" https://XXX.execute-api.eu-central-1.amazonaws.com/prod/products/1

DevOps Guru: Recognize Operational Issues in DynamoDB

DevOps Guru Examples: API Gateway
HTTP 429 "Too Many Requests" Error
Query to exhaust the quota
hey -q 10 -z 1m -c 10 -H "X-API-Key: XXXa6XXXX" https://XXX.execute-api.eu-central-1.amazonaws.com/prod/products/1

DevOps Guru Examples: API Gateway
HTTP 404 "Not Found" Error
Query for a non-existent product id, e.g. 200
hey -q 1 -z 15m -c 1 -H "X-API-Key: XXXa6XXXX" https://XXX.execute-api.eu-central-1.amazonaws.com/prod/products/200

DevOps Guru: Recognize Operational Issues in DynamoDB

DevOps Guru Examples: Lambda Throttling 1
hey -q 5 -z 15m -c 5 -H "X-API-Key: XXXa6XXXX" https://XXX.execute-api.eu-central-1.amazonaws.com/prod/products/1

DevOps Guru Examples: Lambda Timeout Error
Add 31 sec latency in the code of the Lambda function

DevOps Guru Examples: Lambda Error

DevOps Guru Examples: Lambda Increased Latency
Temporarily add 28 sec latency in the code of the Lambda function
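Artificial latency of the kind used in these experiments (31 sec to provoke a timeout, 28 sec to raise duration without one) can be added without touching the business logic. The following is a rough sketch of one way to do it, regardless of the language the demo application uses: a wrapper that pauses before calling the real handler. The environment variable `INJECTED_LATENCY_SECONDS`, the wrapper, and the placeholder handler are all hypothetical and not taken from the demo repository.

```python
import os
import time

def with_injected_latency(handler, sleep=time.sleep):
    """Wrap a Lambda-style handler so it pauses before doing its work.

    The delay is read on every invocation, so it can be switched on and
    off through configuration instead of a code change.
    """
    def wrapper(event, context):
        delay = float(os.environ.get("INJECTED_LATENCY_SECONDS", "0"))
        if delay > 0:
            sleep(delay)
        return handler(event, context)
    return wrapper

def get_product(event, context):
    # Placeholder business logic standing in for the real product lookup.
    return {"statusCode": 200, "body": "product " + event["id"]}

# Entry point that a function configuration would reference.
handler = with_injected_latency(get_product)
```

Removing the variable (or setting it to 0) is then the remediation step, which makes it easy to observe whether the insight closes afterwards.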

DevOps Guru: Recognize Operational Issues in SQS

DevOps Guru: Operational Issues in SQS
Temporarily add 26 sec latency in the code of the Lambda function

DevOps Guru: Recognize Operational Issues in Amazon Kinesis

DevOps Guru Examples: Operational Issues in Amazon Kinesis Data Stream -> Lambda -> (S3)

DevOps Guru: Recognize Operational Issues in AWS Step Functions

DevOps Guru Examples: Operational Issues in AWS Step Functions -> Lambda

DevOps Guru: Recognize Operational Issues in Aurora Serverless v2 PostgreSQL

DevOps Guru Examples: Enabling Performance Insights for Aurora Serverless v2

DevOps Guru Examples: Operational Issues Lambda -> Aurora Serverless v2 w/o RDS Proxy
hey -q 100 -z 15m -c 100 -H "X-API-Key: XXXa6XXXX" https://XXX.execute-api.eu-central-1.amazonaws.com/prod/productsWithoutDataApi/2

DevOps Guru: Recognize Operational Issues in Aurora Serverless v2 PostgreSQL using Data API

DevOps Guru Examples: Operational Issues Lambda -> Aurora Serverless v2 using Data API
hey -q 100 -z 15m -c 100 -H "X-API-Key: XXXa6XXXX" https://XXX.execute-api.eu-central-1.amazonaws.com/prod/productsWithDataApi/2
No Aurora Serverless DB anomalous metrics detected

DevOps Guru Examples: Operational Issues Lambda -> Aurora Serverless v2 using Data API
hey -q 100 -z 15m -c 100 -H "X-API-Key: XXXa6XXXX" https://XXX.execute-api.eu-central-1.amazonaws.com/prod/productsWithDataApi/1
[Figure labels: Data API vs. Non Data API]

DevOps Guru Proactive Insights

DevOps Guru Proactive Examples: DynamoDB table reads/writes are underutilized

DevOps Guru Proactive Examples: DynamoDB table point-in-time recovery not enabled

DevOps Guru Proactive Examples: Lambda Timeout Exceeds Recommended SQS Visibility
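The relationship behind this insight can be checked by hand. AWS Lambda documentation recommends setting the source queue's visibility timeout to at least six times the function timeout. Whether DevOps Guru applies exactly this factor is an assumption here. The sketch below uses a hypothetical function name and takes the factor as a parameter.

```python
def check_visibility_timeout(lambda_timeout_s, visibility_timeout_s, factor=6):
    """Return a finding string, or None when the configuration looks fine.

    factor=6 follows the Lambda documentation's guidance for SQS event
    sources; it is not confirmed to be the threshold DevOps Guru uses.
    """
    recommended = factor * lambda_timeout_s
    if visibility_timeout_s < recommended:
        return (f"visibility timeout {visibility_timeout_s}s is below the "
                f"recommended {recommended}s ({factor} x function timeout)")
    return None
```

For a function timeout of 30 seconds, a queue left at a 30-second visibility timeout is flagged, while a 180-second visibility timeout passes.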

DevOps Guru Proactive Examples: SQS Triggered Lambda Does Not Have a DLQ

DevOps Guru Proactive Examples: Lambda Function Consuming DynamoDB/Kinesis Stream Without Failure Destination

DevOps Guru Proactive Examples: Lambda Function Has Concurrency Spillover
hey -q 1 -z 30m -c 9 -m DELETE -H "X-API-Key: XXXa6XXXX" -H "Content-Type: application/json;charset=utf-8" https://XXX.execute-api.eu-central-1.amazonaws.com/prod/products/11

DevOps Guru Proactive Examples: Lambda Function Does Not Have Enough Subnets

DevOps Guru Integration with Incident Management Tools
• AWS OpsCenter (via AWS Systems Manager)
• PagerDuty
• Atlassian Opsgenie

DevOps Guru Integration Settings

DevOps Guru Integration with PagerDuty
https://www.pagerduty.com/docs/guides/amazon-devops-guru-integration-guide/

DevOps Guru Integration with PagerDuty
Enter the "Integration URL" generated by PagerDuty

DevOps Guru PagerDuty Incidents

DevOps Guru Supported Services and Pricing
https://aws.amazon.com/de/devops-guru/pricing/

DevOps Guru Supported Services and Pricing
$3.024 per resource per month
$2.016 per resource per month
https://aws.amazon.com/de/devops-guru/pricing/
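The two per-resource prices make a back-of-the-envelope estimate straightforward. The sketch below reads the slide's figures as $3.024 and $2.016 per resource per month. This assumes that the comma in "$3,024" is a decimal comma carried over from the German pricing page. The transcript does not say which services belong to which price group, so the tier names are hypothetical placeholders.

```python
from decimal import Decimal

# Monthly per-resource prices in USD, as read from the slide.
# Which AWS services fall into which tier is not stated in the transcript.
MONTHLY_PRICE = {
    "higher_tier": Decimal("3.024"),
    "lower_tier": Decimal("2.016"),
}

def monthly_cost(resource_counts):
    """Estimate the monthly charge for the analyzed resources.

    resource_counts maps a tier name to the number of resources in it.
    """
    return sum(MONTHLY_PRICE[tier] * count
               for tier, count in resource_counts.items())
```

For example, 10 resources in the higher tier and 50 in the lower tier come to 131.04 USD per month under these assumptions. Current prices should be taken from the pricing page.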

DevOps Guru Cost Estimator
https://aws.amazon.com/de/devops-guru/pricing/

DevOps Guru Conclusions, Observations, Suggestions
• Most operational issues have been correctly recognized so far
• It took several (at least 7) minutes to create an incident after the anomaly appeared
• Correctly, no insights were created for temporary incidents, such as short-lived Lambda, DynamoDB and API Gateway throttling
• Lambda duration anomalous insights (Duration p90) took time to be created (sometimes more than 30 minutes), maybe because of their medium severity

DevOps Guru Conclusions, Observations, Suggestions
• Recommendations for the insight reason could be more precise (these are limitations of CloudWatch, though)
• No precise HTTP response code for API Gateway responses, only 4XX and 5XX
• No differentiation between Lambda throttling caused by reaching the individual function concurrency limit and throttling caused by reaching the total AWS account concurrency limit
• No differentiation between Lambda Timeout and Init Error
• DevOps Guru Proactive Insights missed some important cases, such as Lambda Provisioned Concurrency left unused for a long period of time

DevOps Guru Conclusions, Observations, Suggestions
• #AWS #Wishlist for DevOps Guru
• Support for EventBridge (and EventBridge Pipes)
• Support for AppSync
• Support for Aurora (Serverless v2) over Data API
• Better support for tracing, i.e. AWS X-Ray and CloudWatch ServiceLens, and integrations with third-party observability tools, e.g. Lumigo, Datadog
