Grafana Alloy Best Practice presented in COSCUP 2024

ChenYiHuang5 968 views 32 slides Aug 04, 2024

Grafana Alloy
Best Practice

Eric Huang
LINE Taiwan / SRE
2021: E-SUN Bank
2022: LINE Taiwan
Kubernetes, Rust, eBPF
titanericchen-yi-huang

01
Introduction

Real User Monitoring (RUM)
•Collect metrics from the client side and browser
•Monitor web application performance
•Discover errors
•Track user behavior (sessions)
Source: Web Vitals, User-centric performance metrics, Grafana Faro OSS

(Distributed) Tracing
•Represent the full journey of a request through a distributed environment
•Improve the visibility of the app
•Diagnose the source of errors
Source: Observability primer | OpenTelemetry

(Distributed) Tracing
Source: COSCUP 2024

Grafana Alloy
“Alloy is a flexible, high performance, vendor-neutral distribution of the OpenTelemetry Collector”
Key features:
•Custom components
•Chained components
•Debugging utilities
Adopt the faro.receiver component with the Faro SDK
Source: Grafana Alloy | Grafana Alloy documentation

Grafana Faro Web SDK
“Grafana Faro includes a highly configurable web SDK for real user monitoring that instruments browser frontend applications to capture observability signals.”
Key features:
•Monitor application performance
•Capture errors, logs, and user activity
•Instrument performance and observe the full stack
Source: Grafana Faro OSS | Web SDK for real user monitoring (RUM)

02
Results

End-to-End Tracing
Spans include:
•frontend app (Next.js)
•ingress controller (Traefik)
•web framework (Flask)
•HTTP client library (requests)
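
For one trace to cover the frontend, ingress controller, web framework, and HTTP client, each hop has to forward the same trace ID while minting its own span ID. The sketch below illustrates that idea with the W3C Trace Context `traceparent` header layout (`version-traceid-parentid-flags`). The function names are hypothetical and written for illustration only; in practice the OpenTelemetry SDKs and the Faro SDK handle this, and this is not the speaker's code.

```python
import re
import secrets

# W3C Trace Context layout: 2 hex version, 32 hex trace-id,
# 16 hex parent-id (span id), 2 hex trace-flags.
_TRACEPARENT = re.compile(
    r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})"
)


def new_traceparent(sampled=True):
    """Start a new trace (hypothetical helper, e.g. for the first hop)."""
    flags = "01" if sampled else "00"
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-{flags}"


def parse_traceparent(value):
    """Return the parsed fields, or None if the header is malformed."""
    match = _TRACEPARENT.fullmatch(value.strip().lower())
    if match is None:
        return None
    version, trace_id, span_id, flags = match.groups()
    # Version ff and all-zero IDs are invalid per the specification.
    if version == "ff" or trace_id == "0" * 32 or span_id == "0" * 16:
        return None
    return {
        "trace_id": trace_id,
        "parent_span_id": span_id,
        "sampled": bool(int(flags, 16) & 0x01),
    }


def child_traceparent(incoming):
    """Header to send downstream: same trace ID, fresh span ID."""
    ctx = parse_traceparent(incoming)
    if ctx is None:
        # No usable context arrived, so this hop starts a new trace.
        return new_traceparent()
    flags = "01" if ctx["sampled"] else "00"
    return f"00-{ctx['trace_id']}-{secrets.token_hex(8)}-{flags}"
```

If any hop drops or rewrites the header, the downstream spans end up in a separate trace, which is why every component in the list above has to take part in propagation.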

RUM dashboards
Official RUM dashboard: Loki datasource

RUM dashboards
Improved RUM dashboard: Prometheus datasource

Session/Trace Explore

Session Detail

03
Present Architecture

Technology stack
Infra managed by JP, KR
Serve managed by TW SRE

Architecture (User)

Architecture (SRE)

04
How to design?

Requirements
Must have:
•Adopt the present observability platform
•Easy, automatic deployment of the Alloy service
•Control the traffic load sent from real users
Nice to have:
•Easy application for new tenants
•Slack workflow
•Sample code for SSR and CSR apps
•Next.js-based sample app

Alloy Architecture (User)

Alloy Architecture (SRE)

Why?
•Adopt a gateway instead of an individual ingress for each cluster?
•Unified traffic control by SRE
•Decouple the business logic and telemetry traffic
•Easy deployment for Alloy
•Cut down the security review procedure

Why?
•Choose Contour instead of Traefik or another ingress controller?
•Contour is more performant and consumes less memory
•Envoy Gateway was considered, but the k8s version is not compatible

Challenges
•Handle a large amount of incoming traffic
•Load test and tuning for Contour and Alloy
•3 levels of protection
1.Client-side sampling
2.Contour rate limit
3.Grafana Alloy rate limit
•Increasing load from Loki and Tempo
•Continuous tuning for Loki and Tempo
•Individual rate limit for each tenant
Load Test Report:
Alloy: 1500 RPS (1 core, 1Gi)
Envoy: 10000 connections (3 cores, 1Gi)
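
The protection levels and the per-tenant limit above can be sketched in a few lines. This is a minimal illustration only: `keep_session`, `TokenBucket`, and `TenantLimiter` are hypothetical names, and neither Contour nor Alloy is claimed to implement rate limiting this way. It shows deterministic session sampling on the client side, followed by one token bucket per tenant on the server side.

```python
import time
import zlib


def keep_session(session_id, sample_rate):
    """Client-side sampling: the same session always gets the same decision.

    sample_rate is a fraction between 0.0 (drop all) and 1.0 (keep all).
    """
    bucket = zlib.crc32(session_id.encode("utf-8")) % 10_000
    return bucket < int(sample_rate * 10_000)


class TokenBucket:
    """Allow `rate` requests per second with bursts of up to `burst`."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = float(rate)
        self.burst = float(burst)
        self.tokens = float(burst)
        self.clock = clock
        self.last = clock()

    def allow(self, cost=1.0):
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class TenantLimiter:
    """One independent bucket per tenant, created on first use."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = rate
        self.burst = burst
        self.clock = clock
        self.buckets = {}

    def allow(self, tenant):
        bucket = self.buckets.get(tenant)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.burst, self.clock)
            self.buckets[tenant] = bucket
        return bucket.allow()
```

Keeping a separate bucket per tenant means one noisy application exhausts only its own budget, which is the point of the individual rate limit in the list above.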

Challenges
•Web vitals are stored in Loki instead of Prometheus
•Adopt the Loki Ruler to ingest Loki query results into Prometheus
•Faster loading for the real user monitoring dashboard
•Constrained trace propagation in the present architecture
•Upgrade or update the trace propagation in the intermediate
•Block trace propagation header from API gateway
•Add an allowed list for trace context headers (e.g., TraceParent, Uber-Trace-Id)
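
The allowed list for trace context headers can be pictured as a case-insensitive filter applied before a request is forwarded. The sketch below is an assumption about the general shape of such a filter, not the configuration of any specific API gateway. `forward_headers` and `BASE_ALLOWED_HEADERS` are hypothetical, and only the two header names mentioned above are treated as trace context.

```python
# Header names from the slide, lowercased because HTTP header names
# are case-insensitive.
TRACE_CONTEXT_HEADERS = frozenset({"traceparent", "uber-trace-id"})

# Hypothetical set of headers the gateway already lets through.
BASE_ALLOWED_HEADERS = frozenset({"accept", "authorization", "content-type"})


def forward_headers(incoming, allow_trace_context=True):
    """Return only the headers that may be forwarded upstream."""
    allowed = set(BASE_ALLOWED_HEADERS)
    if allow_trace_context:
        allowed |= TRACE_CONTEXT_HEADERS
    return {
        name: value
        for name, value in incoming.items()
        if name.lower() in allowed
    }
```

With `allow_trace_context=False` the filter behaves like a gateway that blocks propagation: the backend never sees the caller's trace ID, so the trace breaks at that hop.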

Challenges
Source: DevOpsDays Taipei 2024
Source: DevOpsDays Taipei 2023

05
Future work

•Upgrade Traefik to v3.0 to adopt OpenTelemetry
•Resolve the issue of unbalanced requests to the OTel Collector
•Zero-code instrumentation with eBPF (e.g., Grafana Beyla)
•Continuous tuning for Tempo, Loki, and Alloy
