Amazon DevOps Guru for Serverless Applications at JAWS Pankration 2024

VadymKazulkin 29 views 39 slides Aug 24, 2024

About This Presentation

In this talk we’ll use a standard serverless application that uses API Gateway, Lambda, DynamoDB, SQS, SNS, Kinesis, Step Functions, Aurora (Serverless) (and other AWS-managed services). We'll explore how Amazon DevOps Guru recognizes operational issues and anomalies like increased latency and...


Slide Content

Vadym Kazulkin| @VKazulkin | ip.labsGmbH
Amazon DevOps Guru for the Serverless applications
Vadym Kazulkin, ip.labs, JAWS Pankration, August 24, 2024

Contact
Vadym Kazulkin
ip.labs GmbH Bonn, Germany
Co-Organizer of the Java User Group Bonn
@VKazulkin
https://dev.to/vkazulkin
https://github.com/Vadym79/
https://de.slideshare.net/VadymKazulkin/
https://www.linkedin.com/in/vadymkazulkin
https://www.iplabs.de/

DevOps Lifecycle

AIOps
Artificial Intelligence for IT Operations (AIOps) is the process of using machine
learning techniques to solve operational problems. The goal of AIOps is to reduce
human intervention in the IT operations processes.
By using advanced machine learning techniques, you can reduce operational
incidents and increase service quality. AIOps can help you:
•Predict incidents before they happen
•Classify new incidents and insights
https://aws.amazon.com/devops-guru

What is Amazon DevOps Guru
Amazon DevOps Guru offers a fully managed AIOps platform powered by machine
learning (ML) that is designed to make it easy to improve an application's operational
performance and availability.
DevOps Guru helps detect behaviors that deviate from normal operating patterns so
you can identify operational issues long before they impact your customers:
•increased latency
•error rates (timeouts, throttles, CPU, memory, and disk utilization)
•resource constraints (exceeding AWS account limits)
https://aws.amazon.com/devops-guru

Benefits of DevOps Guru
https://aws.amazon.com/devops-guru

How DevOps Guru works
https://aws.amazon.com/devops-guru

DevOps Guru is powered by pre-trained ML models
•Built domain-specific, single-purpose models to identify known failure modes instead
of normal metric behavior.
•DevOps Guru relies on a large ensemble of detectors: statistical models tuned to
detect common adverse scenarios in a variety of operational metrics.
•DevOps Guru detectors don't need to be trained or configured. They work
instantly as long as enough history is available.
•Individual detectors work in preconfigured ensembles to generate anomalies on
some of the most important metrics: error rates, availability, latency, incoming
request rates, CPU, memory, and disk utilization, among others.
https://aws.amazon.com/blogs/machine-learning/amazon-devops-guru-is-powered-by-pre-trained-ml-models-that-encode-operational-excellence/

DevOps Guru pre-trained ML detectors with periodic behaviors
•Many metrics, such as the number of
incoming requests in customer-facing
APIs, exhibit periodic behavior.
•The purpose of the causal convolution
detector is to analyze temporal data
with such patterns and to determine
expected periodic behavior.
•When the detector infers that a metric
is periodic, it adapts normal metric
behavior thresholds to the seasonal
pattern.
https://aws.amazon.com/blogs/machine-learning/amazon-devops-guru-is-powered-by-pre-trained-ml-models-that-encode-operational-excellence/
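The idea of adapting thresholds to a seasonal pattern can be illustrated with a small sketch. This is not the causal convolution detector itself, whose internals the slide does not describe; it swaps in a much simpler technique (a per-phase mean and standard deviation over a known period) purely to show why a value that is normal at peak time can be anomalous off-peak. The function names, the fixed `period`, and the width factor `k` are assumptions made for this illustration.

```python
from statistics import mean, stdev


def seasonal_bounds(history, period, k=3.0):
    """Return one (lower, upper) bound per phase of the period.

    Assumption: the period is known up front; a real detector would
    have to infer whether the metric is periodic at all.
    """
    bounds = []
    for phase in range(period):
        # All past samples observed at the same position in the cycle.
        samples = history[phase::period]
        mu = mean(samples)
        sigma = stdev(samples) if len(samples) > 1 else 0.0
        bounds.append((mu - k * sigma, mu + k * sigma))
    return bounds


def is_anomalous(value, index, bounds):
    """Check a new sample against the bounds of its phase."""
    lower, upper = bounds[index % len(bounds)]
    return value < lower or value > upper


# Request counts alternating between a quiet and a busy phase.
history = [10, 100, 12, 98, 11, 102, 9, 100]
bounds = seasonal_bounds(history, period=2)
```

With this history, 100 requests in the busy phase stay within bounds, while the same 100 requests in the quiet phase are flagged.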

DevOps Guru Example Application
https://github.com/Vadym79/DevOpsGuruWorkshopDemo inspired by https://github.com/aws-samples/serverless-java-frameworks-samples

What the future of software developers may look like

Monitoring & Alerting of the Serverless Applications

DevOps Guru Set Up
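The slide text does not record how the setup was done, so the following is only a sketch of one programmatic route, assuming coverage is scoped by CloudFormation stack. It uses the `update_resource_collection` operation of the boto3 `devops-guru` client; the helper name and the stack name in the usage note are hypothetical. The client is passed in so the request shape can be inspected without touching an AWS account.

```python
def enable_stack_coverage(client, stack_names):
    """Ask DevOps Guru to analyze the resources of the given stacks.

    `client` is expected to behave like boto3.client("devops-guru").
    Returns the request that was sent, which is handy for logging.
    """
    request = {
        "Action": "ADD",
        "ResourceCollection": {
            "CloudFormation": {"StackNames": list(stack_names)}
        },
    }
    client.update_resource_collection(**request)
    return request
```

A call might look like `enable_stack_coverage(boto3.client("devops-guru"), ["my-serverless-stack"])`, where the stack name is a placeholder.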

DevOps Guru Set Up with AWS Organizations
https://aws.amazon.com/blogs/mt/how-to-easily-configure-devops-guru-across-your-organization-with-systems-manager-quick-setup/

DevOps Guru Dashboard

DevOps Guru Reactive Insights
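Besides the console, reactive insights can also be read through the API. The sketch below assumes the boto3 `devops-guru` client and its `list_insights` operation with an `Ongoing` status filter of type `REACTIVE`; the helper name is made up for this example, and the client is injected so the paging logic can be exercised without AWS credentials.

```python
def ongoing_reactive_insights(client):
    """Yield (name, severity) for every ongoing reactive insight.

    `client` is expected to behave like boto3.client("devops-guru").
    Follows NextToken until the last page has been read.
    """
    kwargs = {"StatusFilter": {"Ongoing": {"Type": "REACTIVE"}}}
    while True:
        page = client.list_insights(**kwargs)
        for insight in page.get("ReactiveInsights", []):
            yield insight["Name"], insight["Severity"]
        token = page.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token
```

Iterating over `ongoing_reactive_insights(boto3.client("devops-guru"))` would then list the insights that are currently open in the account and region of the client.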

DevOps Guru: Recognize Operational Issues in DynamoDB

DevOps Guru Examples: DynamoDB Throttling
hey -q 20 -z 15m -c 20 -H "X-API-Key: XXXa6XXXX" https://XXX.execute-api.eu-central-1.amazonaws.com/prod/products/1
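To get a feel for how much load this command puts on the API, the flags can be turned into a rough upper bound. In hey, `-c` is the number of concurrent workers, `-q` the rate limit in queries per second per worker, and `-z` the duration. The helper below is a hypothetical name for this illustration; the result is a ceiling, since slow or throttled responses keep workers below their rate limit.

```python
def hey_load(qps_per_worker, workers, duration_seconds):
    """Return (max requests per second, max total requests) for a hey run."""
    max_rate = qps_per_worker * workers
    return max_rate, max_rate * duration_seconds


# -q 20 -c 20 -z 15m
max_rate, max_total = hey_load(20, 20, 15 * 60)
print(max_rate, max_total)
```

For the command above this gives at most 400 requests per second and at most 360,000 requests over the 15 minutes.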

Other types of operational insights in the Serverless
applications recognized by DevOps Guru
•API Gateway 5xx and 4xx (throttling and not found) errors
•Lambda errors (throttling, timeouts, init errors, increased latency)
•SQS with Lambda poller (messages in flight stay too long in the
queue, due to errors in Lambda)
•DynamoDB/Kinesis Streams and Lambda (high number of
asynchronous Lambda retries, due to errors in Lambda)
•Step Functions invoking Lambda (high number of error states and
Lambda retries, due to errors in Lambda)
•RDS and Aurora (Serverless v2) (high CPU load and sharply increased
number of database connections)

DevOps Guru Proactive Insights

DevOps Guru integration with Incident Management Tools
•AWS Systems Manager OpsCenter
•PagerDuty
•Atlassian Opsgenie

DevOps Guru Supported Services and Pricing
$3.024 per resource per month
$2.016 per resource per month
https://aws.amazon.com/de/devops-guru/pricing/
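The two monthly figures on this slide can be turned into a quick estimate. The sketch below assumes a 720-hour (30-day) month to derive the implied hourly rate, and reads the slide's German-formatted amounts as 3.024 and 2.016 US dollars; the slide text does not say which services fall under which figure, so the resource counts in the example are placeholders.

```python
HOURS_PER_MONTH = 720  # assumption: 30 days of 24 hours


def hourly_rate(monthly_usd_per_resource):
    """Implied hourly price for one resource."""
    return round(monthly_usd_per_resource / HOURS_PER_MONTH, 6)


def monthly_cost(resource_count, monthly_usd_per_resource):
    """Monthly price for a number of resources at the same rate."""
    return round(resource_count * monthly_usd_per_resource, 3)


# Placeholder counts: 100 resources at the lower rate, 10 at the higher one.
estimate = monthly_cost(100, 2.016) + monthly_cost(10, 3.024)
print(hourly_rate(3.024), hourly_rate(2.016), round(estimate, 2))
```

The pricing page linked above remains the authoritative source for current rates and for the mapping of services to price groups.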

DevOps Guru Cost Estimator
https://aws.amazon.com/de/devops-guru/pricing/

DevOps Guru Conclusions, Observations, Suggestions
•Most operational issues have been correctly recognized so far
•It took several (at least 7) minutes to create an incident after
the anomaly appeared
•Correctly, no insights were created for temporary incidents, such as
short-lived Lambda, DynamoDB and API Gateway throttling

DevOps Guru Conclusions, Observations, Suggestions
•Recommendations for the cause of an insight could be more precise (though
these are limitations of CloudWatch)
•No precise HTTP response code for API Gateway responses, only 4XX
and 5XX
•No differentiation between Lambda throttling caused by reaching the
individual function concurrency limit and throttling caused by reaching
the total AWS account concurrency limit
•No differentiation between Lambda timeout and init error

DevOps Guru Conclusions, Observations, Suggestions
•#AWS #Wishlist for DevOps Guru
•Support for EventBridge (and EventBridge Pipes)
•Support for AppSync
•Support for Aurora Serverless v2 over Data API
•Better support for tracing, e.g. AWS X-Ray and CloudWatch ServiceLens,
and integrations with 3rd-party observability tools, e.g. Lumigo,
Datadog

Thank you