Apidays Singapore 2024 - OpenTelemetry for API Monitoring by Danielle Kayumbi, Capgemini

APIdays_official, 95 slides, May 03, 2024

About This Presentation

OpenTelemetry for API Monitoring
Danielle Kayumbi, Software Engineer - Capgemini

Apidays Singapore 2024: Connecting Customers, Business and Technology (April 17 & 18, 2024)

Slide Content

OpenTelemetry For API Monitoring

Danielle KAYUMBI HODIEB, Software Engineer (danielleKayumbi)

API Security breaches

Impact the business

Historic AWS outage, December 7, 2021

https://aws.amazon.com/fr/about-aws/global-infrastructure/localzones/locations/

. . .

Deliver packages

Netflix, Disney+ ... Stream movies

Play online Video games

Order in fast-food restaurants (https://www.supermarktblog.com/2017/11/13/antippen-statt-anstellen-wie-vapiano-und-mcdonalds-den-bestellprozess-im-schnellrestaurant-umkrempeln/)

Delta, Southwest ... Buy airline tickets

Medical exams were delayed

And so were education exams

Deployment in production: a new automated job to scale up capacity

Huge volume of requests

Huge volume of retry calls

Congestion

9 hours to fix
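
The retry storm can be put into numbers. Assuming every attempt fails independently with probability p and each client retries a failed call up to r times, one logical request costs on average 1 + p + p^2 + ... + p^r attempts. The sketch below (function names are hypothetical) computes that amplification factor:

```python
def expected_attempts(failure_rate: float, max_retries: int) -> float:
    """Average attempts per logical request.

    Attempt k (k = 0 is the first call) only happens when the k previous
    attempts all failed, which has probability failure_rate ** k.
    """
    if not 0.0 <= failure_rate <= 1.0:
        raise ValueError("failure_rate must be between 0 and 1")
    return sum(failure_rate ** k for k in range(max_retries + 1))


def offered_load(base_rps: float, failure_rate: float, max_retries: int) -> float:
    """Requests per second actually hitting the backend, retries included."""
    return base_rps * expected_attempts(failure_rate, max_retries)


if __name__ == "__main__":
    for p in (0.01, 0.5, 0.9):
        factor = expected_attempts(p, max_retries=3)
        print(f"failure rate {p:.0%}: load x{factor:.2f}")
```

With p close to 1, as on an already congested network, three retries push the load towards four times the original traffic, which feeds the congestion instead of relieving it.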

How can we prevent and detect outages?

Identify outages: errors, latency, bottlenecks, third parties

Alert! Send relevant messages

Distributed systems

Amazon: microservices architecture (https://www.divante.com/blog/10-companies-that-implemented-the-microservice-architecture-and-paved-the-way-for-others)

Netflix: more than 700 microservices (https://www.divante.com/blog/10-companies-that-implemented-the-microservice-architecture-and-paved-the-way-for-others)

Observe the system

Monitoring the operations from time to time

Monitoring is too deterministic

Analyse data continuously

Understand the relationships between systems

Observability: the key to success

Monitoring vs. observability

Telemetry data: traces, metrics, logs

Generate, collect and export telemetry data

Helps make a system observable

Accepted by the CNCF in 2019; moved to the Incubating maturity level in 2021

Merger of OpenTracing and OpenCensus

Fundamentals

Distributed tracing

Follow a request's journey across various systems (https://eng.blackbuck.com/distributed-tracing-at-blackbuck/)

Log

Log: a message emitted by services or other components

Log sample (Syslog): [2024-04-18T09:05:08.906198+00:00] app.INFO: Entity has been successfully persisted
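
To see what a plain log line does and does not carry, here is a minimal Python parser for the sample above. It assumes the "[timestamp] channel.LEVEL: message" layout of that single line (a real logging setup may differ) and tolerates extra spaces around the brackets and the colon. Note that none of the extracted fields ties the entry to a request.

```python
import re
from datetime import datetime

# "[<ISO 8601 timestamp>] <channel>.<LEVEL>: <message>"; spaces around
# the brackets and the colon are optional.
LOG_LINE = re.compile(
    r"^\[\s*(?P<timestamp>[^\]]+?)\s*\]\s*"
    r"(?P<channel>[\w-]+)\.(?P<level>[A-Z]+)\s*:\s*"
    r"(?P<message>.*)$"
)


def parse_log_line(line: str) -> dict:
    """Split one log line into timestamp, channel, level and message."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised log line: {line!r}")
    record = match.groupdict()
    record["timestamp"] = datetime.fromisoformat(record["timestamp"])
    return record


if __name__ == "__main__":
    sample = "[ 2024-04-18T09:05:08.906198+00:00 ] app.INFO : Entity has been successfully persisted"
    print(parse_log_line(sample))
```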

Logs lack the contextual information needed to track code execution

Logs can be correlated with traces and spans, or included as part of a span
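
One common way to correlate logs with traces is to stamp every log record with the active trace and span IDs. The sketch below uses only Python's standard logging module; the current_span_ids context variable is a hypothetical stand-in for the context that an OpenTelemetry SDK would manage, and the IDs in the demo are illustrative.

```python
import contextvars
import io
import logging

# Hypothetical stand-in for the active span context an OpenTelemetry SDK would track.
current_span_ids = contextvars.ContextVar("current_span_ids", default=("-", "-"))


class TraceContextFilter(logging.Filter):
    """Copy the active trace_id and span_id onto every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.trace_id, record.span_id = current_span_ids.get()
        return True


def build_logger(stream) -> logging.Logger:
    logger = logging.getLogger("app")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.handlers.clear()  # keep the sketch idempotent if called twice
    handler = logging.StreamHandler(stream)
    handler.addFilter(TraceContextFilter())
    handler.setFormatter(logging.Formatter(
        "[%(asctime)s] %(name)s.%(levelname)s: %(message)s "
        "trace_id=%(trace_id)s span_id=%(span_id)s"
    ))
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    buffer = io.StringIO()
    log = build_logger(buffer)
    token = current_span_ids.set(("7bba9f33312b3dbb8b2c2c62bb7abe2d", "086e83747d0e381e"))
    log.info("Entity has been successfully persisted")
    current_span_ids.reset(token)
    print(buffer.getvalue(), end="")
```

Each line now ends with trace_id and span_id fields, so a log backend can jump from a log entry to the span that produced it.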

Span

Span: a unit of work or operation. It tracks the specific operations that a request makes.

{
  "name": "get-customer-cart",
  "context": {
    "trace_id": "7bba9f33312b3dbb8b2c2c62bb7abe2d",
    "span_id": "086e83747d0e381e"
  },
  "parent_id": "",
  "start_time": "2024-04-18 16:04:01.209458162 +0000 UTC",
  "end_time": "2024-04-18 16:04:01.209458162 +0000 UTC",
  "status_code": "OK",
  "attributes": {
    "http.method": "GET",
    "user_id": "123",
    "cart_id": "456"
  }
}
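
The span sample can be produced by a small data structure. This is a hypothetical Span class, not the OpenTelemetry SDK API: it generates a random 16-byte trace ID and 8-byte span ID (32 and 16 hex characters, the sizes used in the sample), uses an empty parent_id for a root span, and serializes to the same field names. Timestamps are emitted in ISO 8601 rather than in the sample's format.

```python
import json
import secrets
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, Optional


@dataclass
class Span:
    name: str
    trace_id: str = field(default_factory=lambda: secrets.token_hex(16))  # 32 hex chars
    span_id: str = field(default_factory=lambda: secrets.token_hex(8))    # 16 hex chars
    parent_id: str = ""  # empty string marks a root span, as in the sample
    start_time: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    end_time: Optional[datetime] = None
    status_code: str = "UNSET"
    attributes: Dict[str, str] = field(default_factory=dict)

    def end(self, status_code: str = "OK") -> None:
        """Close the span and record its final status."""
        self.end_time = datetime.now(timezone.utc)
        self.status_code = status_code

    def to_dict(self) -> dict:
        return {
            "name": self.name,
            "context": {"trace_id": self.trace_id, "span_id": self.span_id},
            "parent_id": self.parent_id,
            "start_time": self.start_time.isoformat(),
            "end_time": self.end_time.isoformat() if self.end_time else None,
            "status_code": self.status_code,
            "attributes": self.attributes,
        }


if __name__ == "__main__":
    span = Span("get-customer-cart",
                attributes={"http.method": "GET", "user_id": "123", "cart_id": "456"})
    span.end()
    print(json.dumps(span.to_dict(), indent=2))
```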

Sensitive services (command/query handlers…), controllers (HTTP, AMQP actions…), repositories (database, message broker, HTTP client…)

Inject span

Annotation
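
In Python the closest equivalent of an annotation is a decorator. A minimal sketch, assuming an in-memory list in place of a real exporter and hypothetical names (traced, FINISHED_SPANS, find_cart): the decorator opens a span around each call to a repository method, records its duration, and marks the span as an error if the call raises.

```python
import functools
import time
from typing import Any, Callable, Dict, List

# Hypothetical in-memory sink standing in for a real span exporter.
FINISHED_SPANS: List[Dict[str, Any]] = []


def traced(span_name: str) -> Callable:
    """Record a span around every call of the decorated function."""
    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            status = "OK"
            try:
                return func(*args, **kwargs)
            except Exception:
                status = "ERROR"
                raise
            finally:
                # Runs on both paths, so failed calls are traced too.
                FINISHED_SPANS.append({
                    "name": span_name,
                    "duration_ms": (time.perf_counter() - start) * 1000,
                    "status_code": status,
                })
        return wrapper
    return decorator


@traced("cart-repository.find")
def find_cart(cart_id: str) -> dict:
    # Placeholder repository call; a real one would query the database.
    return {"cart_id": cart_id, "items": []}
```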

Distributed trace

Distributed trace: records the paths taken by requests through a multi-service architecture. A trace is a collection of spans.

https://opentelemetry.io/

Example: the Cart service exposes API [GET] cart by id; the Product service exposes API [GET] product by id

Trace of the get cart operation (diagram): spans Get cart operation, Retrieve cart db, Get products, Get product operation, Retrieve product db and Cache product, grouped by context (Cart service, Product service)
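
A trace is just spans linked by parent IDs, so the cart example can be rebuilt as a tree. The parent/child links below are an assumed reading of the diagram, the short span IDs are placeholders, and the trace_id shared by all spans is omitted for brevity:

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (span_id, parent_id, service, name); parent_id "" marks the root span.
# Assumed hierarchy, not the author's exact trace.
SPANS: List[Tuple[str, str, str, str]] = [
    ("a1", "", "Cart service", "Get cart operation"),
    ("a2", "a1", "Cart service", "Retrieve cart db"),
    ("a3", "a1", "Cart service", "Get products"),
    ("b1", "a3", "Product service", "Get product operation"),
    ("b2", "b1", "Product service", "Retrieve product db"),
    ("b3", "b1", "Product service", "Cache product"),
]


def render_trace(spans: List[Tuple[str, str, str, str]]) -> List[str]:
    """Indent each span under its parent, depth-first."""
    children: Dict[str, List[Tuple[str, str, str, str]]] = defaultdict(list)
    for span in spans:
        children[span[1]].append(span)

    lines: List[str] = []

    def walk(parent_id: str, depth: int) -> None:
        for span_id, _, service, name in children[parent_id]:
            lines.append(f"{'  ' * depth}{name} [{service}]")
            walk(span_id, depth + 1)

    walk("", 0)
    return lines


if __name__ == "__main__":
    print("\n".join(render_trace(SPANS)))
```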

Context propagation: keeps the correlation between signals, is handled by instrumentation libraries, and follows the W3C Trace Context Recommendation for interoperability
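
The W3C Trace Context recommendation carries the context between services in a traceparent HTTP header of the form version-trace_id-parent_id-trace_flags, for example 00-<32 hex chars>-<16 hex chars>-01. The sketch below formats and parses that header for version 00 only; the SpanContext type and the child_of helper are hypothetical, and in practice the instrumentation libraries do this for you.

```python
import re
import secrets
from typing import NamedTuple, Optional

# traceparent = version "-" trace-id "-" parent-id "-" trace-flags (lowercase hex)
TRACEPARENT = re.compile(
    r"^(?P<version>[0-9a-f]{2})-(?P<trace_id>[0-9a-f]{32})-"
    r"(?P<parent_id>[0-9a-f]{16})-(?P<flags>[0-9a-f]{2})$"
)


class SpanContext(NamedTuple):
    trace_id: str  # 32 hex chars, shared by every span in the trace
    span_id: str   # 16 hex chars, identifies the caller's span
    sampled: bool  # bit 0x01 of trace-flags


def format_traceparent(ctx: SpanContext) -> str:
    """Header value sent on an outgoing request."""
    return f"00-{ctx.trace_id}-{ctx.span_id}-{'01' if ctx.sampled else '00'}"


def parse_traceparent(header: str) -> Optional[SpanContext]:
    """Context extracted from an incoming request, or None if the header is invalid."""
    match = TRACEPARENT.match(header.strip())
    if match is None or match["version"] != "00":
        return None  # this sketch only understands version 00
    if set(match["trace_id"]) == {"0"} or set(match["parent_id"]) == {"0"}:
        return None  # all-zero IDs are invalid
    return SpanContext(match["trace_id"], match["parent_id"],
                       bool(int(match["flags"], 16) & 0x01))


def child_of(parent: SpanContext) -> SpanContext:
    """Receiving service: keep the trace_id, start a new span_id."""
    return SpanContext(parent.trace_id, secrets.token_hex(8), parent.sampled)
```

Keeping the trace_id unchanged across the hop is what lets a backend stitch the Cart service and Product service spans into one trace.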

https://www.datadoghq.com/

OpenTelemetry collectors

https://docs.lightstep.com/docs/quick-start-collector

https://www.elastic.co/blog/best-practices-instrumenting-opentelemetry

https://www.datadoghq.com/
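
A Collector is configured as pipelines that chain receivers, processors and exporters. The toy model below is not the Collector itself, only an illustration of that shape under simplifying assumptions: spans arrive as plain dicts, a hypothetical BatchProcessor groups them, and any callable can act as the exporter.

```python
from dataclasses import dataclass, field
from typing import Callable, List

Span = dict  # a span as a plain dict, like the JSON sample


@dataclass
class BatchProcessor:
    """Buffer spans and hand them to the exporter in fixed-size batches."""
    export: Callable[[List[Span]], None]
    batch_size: int = 3
    _buffer: List[Span] = field(default_factory=list, init=False)

    def on_span(self, span: Span) -> None:
        self._buffer.append(span)
        if len(self._buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        if self._buffer:
            self.export(self._buffer)
            self._buffer = []  # start a fresh list; the exported one is left intact


@dataclass
class Pipeline:
    """receiver -> processor -> exporter, mirroring a collector pipeline's shape."""
    processor: BatchProcessor
    received: int = 0

    def receive(self, span: Span) -> None:
        # A real receiver would decode OTLP over gRPC or HTTP; here spans arrive as dicts.
        self.received += 1
        self.processor.on_span(span)
```

Batching trades a little delay for far fewer export calls, which matters when thousands of services send telemetry to the same backend.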

Service level management: four golden signals, monitoring dashboards

Four golden signals: latency, traffic, saturation, errors

Latency https://cloud.google.com/blog/products/management-tools/the-right-metrics-to-monitor-cloud-data-pipelines

Traffic https://cloud.google.com/blog/products/management-tools/the-right-metrics-to-monitor-cloud-data-pipelines

Errors https://www.datadoghq.com/

Saturation https://cloud.google.com/blog/products/management-tools/the-right-metrics-to-monitor-cloud-data-pipelines
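
The four signals can be derived from the same request data. A minimal sketch over one time window, assuming: latency as the nearest-rank p99, traffic as requests per second, errors as the share of HTTP 5xx responses, and saturation as in-flight requests divided by a hypothetical capacity figure. Real dashboards usually compute these from metrics rather than from raw request lists.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Request:
    duration_ms: float
    status: int  # HTTP status code


def percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile (pct in 0-100)."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def golden_signals(requests: List[Request], window_s: float,
                   in_flight: int, capacity: int) -> Dict[str, float]:
    """Latency, traffic, errors and saturation for one observation window."""
    if not requests:
        raise ValueError("no requests in the window")
    return {
        "latency_p99_ms": percentile([r.duration_ms for r in requests], 99),
        "traffic_rps": len(requests) / window_s,
        "error_rate": sum(r.status >= 500 for r in requests) / len(requests),
        "saturation": in_flight / capacity,
    }
```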

https://www.datadoghq.com/

Service level

Service Level Objective (SLO): “Less than 1% of users should experience a wait of 5 s or more”

Service Level Indicator (SLI): a defined quantitative measure

Examples: availability, latency (“How many requests could be handled?”)
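
The example SLO can be checked against a latency SLI. This sketch assumes the SLI is measured per request rather than per user, as the fraction of requests completing in under 5 s, with a 99% target; the error-budget helper is an illustrative addition.

```python
from typing import Iterable

SLO_TARGET = 0.99          # at least 99% of requests must be fast...
LATENCY_THRESHOLD_S = 5.0  # ...i.e. finish in under 5 seconds


def latency_sli(durations_s: Iterable[float]) -> float:
    """Fraction of requests that finished under the latency threshold."""
    durations = list(durations_s)
    if not durations:
        raise ValueError("no requests in the window")
    good = sum(1 for d in durations if d < LATENCY_THRESHOLD_S)
    return good / len(durations)


def meets_slo(sli: float, target: float = SLO_TARGET) -> bool:
    return sli >= target


def error_budget_remaining(sli: float, target: float = SLO_TARGET) -> float:
    """Share of the error budget left: 1.0 = untouched, below 0 = SLO breached."""
    allowed_bad = 1.0 - target
    actual_bad = 1.0 - sli
    return (allowed_bad - actual_bad) / allowed_bad
```

Alerting on how fast the remaining budget shrinks, rather than on every slow request, keeps the messages relevant.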

AWS outage resolution

https://www.thousandeyes.com/blog/aws-outage-analysis-dec-7-2021

Comprehensive observability

Thank you! (danielleKayumbi)