Unified Observability - Alkin Tezuysal - OpenTechDay Summit September 2025 Final .pptx.pdf
askdba
56 slides
Sep 27, 2025
About This Presentation
This talk supports the Kubernetes and observability ecosystem by showcasing how to consolidate metrics, logs, traces, and eBPF data into a unified backend using ClickHouse. It demonstrates how teams can reduce tool sprawl, simplify their observability pipelines, and improve troubleshooting across distributed systems. By integrating with open standards like OpenTelemetry and tools like Prometheus, Jaeger, and Grafana, this approach promotes scalable, open-source observability practices within cloud-native environments.
Size: 7.66 MB
Language: en
Added: Sep 27, 2025
Slides: 56 pages
Slide Content
1www.altinity.com
@Altinity Inc. 2025
Unified Observability
Alkin Tezuysal, Director of Services - Altinity Inc.
Leveraging ClickHouse as a Comprehensive Telemetry Database
Open Tech Day
Sept, 2025
Let’s Get Connected!
@ask_DBA
X
linkedin.com/in/askdba/
LinkedIn
Alkin Tezuysal
Director of Services @AltinityDB
Open Source Database Evangelist
Previously
ChistaDATA, PlanetScale, Percona and Pythian
as Senior Technical Manager, SRE, DBA
Earlier in Life
Enterprise DBA , Informix, Oracle, DB2 , SQL Server
Recent Recognitions:
Most Influential in Database Community 2022 - The Redgate 100
MySQL Cookbook, 4th Edition 2022 - O'Reilly Media, Inc.
MySQL Rockstar 2023 - Oracle (MySQL Community)
Database Design and Modeling with PostgreSQL and MySQL
2024 - <Packt>
Oracle ACE Pro 2025 - Oracle
Agenda
01 Observability Basics
02 Introduction to ClickHouse
03 Coroot vs SigNoz vs hyperDX with ClickHouse + OpenTelemetry (2025 Update)
04 OpenTelemetry (OTel) features
05 Monitoring tools vs DIY observability
Sailing Trivia
What does the Beaufort Scale measure?
The Beaufort Scale measures wind force/speed, ranging from 0 (calm) to 12 (hurricane).
What’s New Since March 2025
●Ecosystem refresh: OpenTelemetry stable for metrics, traces, logs
●ClickHouse-backed stacks maturing rapidly
●Today’s focus
○Architectures & scalability for Coroot, SigNoz, hyperDX
○How ClickHouse is used for metrics/traces/logs
○When to choose which stack
○Migration patterns & ops tips
2025 Landscape: OpenTelemetry + ClickHouse
●OpenTelemetry (OTel) has become the default telemetry standard
●ClickHouse widely adopted as unified store for logs/traces/metrics
●Why ClickHouse?
○High ingestion rates with strong compression
○Fast, columnar analytics at high cardinality
○Horizontal scale and low TCO
Challenges: Disparate Telemetry Systems
Metrics (monitoring): Is there a problem?
Traces (exporters): Where’s the problem?
Logs (scrapers): What’s the problem?
Tools across metrics, traces, and logs: OpenTelemetry, Jaeger, Grafana, Kibana, PMM, Fluentd, Logstash, Vector
The OpenTelemetry Project
Observability = Visibility + Understanding
Visibility: Metrics, Logs, Traces (signals from systems)
Understanding: Correlation, context, root-cause insights
OpenTelemetry provides the standard to collect all signals
ClickHouse enables scale and analysis for full observability
Storage?
Challenge
Solution is ClickHouse
Introducing ClickHouse
Massively Scalable: can scale both horizontally and vertically
Fast Execution: columnar, vectorized execution
SQL Compatible: support for ANSI SQL
ClickHouse for Observability
OpenTelemetry: de facto standard for traces, metrics, logs
ClickHouse: fast, reliable, and scalable column store
Column Store Capabilities include, but are not limited to:
Full-text search
Efficient compression
Real-time analytics
Relational
Petabyte-scale
How Are Tables Written in ClickHouse?
Each insert creates a part with its own index and columns
Parts are merged into a rewritten, bigger part
Updates and deletes also rewrite parts
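The part lifecycle above can be illustrated with a toy model. This is a minimal sketch, not ClickHouse's actual merge code: `merge_parts` and the `Row` shape are hypothetical names, and a part is reduced to a list of rows already sorted by key.

```python
import heapq
from typing import List, Tuple

Row = Tuple[int, str]  # (sort key, payload); a hypothetical row shape


def merge_parts(parts: List[List[Row]]) -> List[Row]:
    """Merge several sorted parts into one bigger sorted part."""
    # heapq.merge streams inputs that are already sorted,
    # so the merged part is produced without a full re-sort.
    return list(heapq.merge(*parts))


# Two small parts, as if created by two separate inserts.
part_a = [(1, "a"), (4, "d")]
part_b = [(2, "b"), (3, "c")]
merged = merge_parts([part_a, part_b])
```

The merged list is a new object and the inputs are left untouched, which loosely mirrors why parts are described as "rewritten" rather than modified in place.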
MergeTree Family
Query efficiency: lowest for an unmerged, freshly inserted part; highest for a fully merged part
How does this help?
•Fast writes
•Time-friendly
•Easy cleanup
•Cost-effective
Data Transformation & Management
•Materialized Views: YES
•TTL: YES
•Tiered storage: YES
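To make the TTL and tiered-storage idea concrete, here is a minimal sketch of the decision a retention policy encodes. The function name `storage_tier` and the 7-day and 30-day thresholds are assumptions for illustration only; in ClickHouse this logic would be expressed declaratively in the table definition rather than in application code.

```python
from datetime import datetime, timedelta


def storage_tier(event_time: datetime, now: datetime,
                 hot_days: int = 7, ttl_days: int = 30) -> str:
    """Classify a row by age: 'hot', 'warm', or 'expired'.

    hot_days and ttl_days are illustrative defaults, not recommendations.
    """
    age = now - event_time
    if age >= timedelta(days=ttl_days):
        return "expired"  # past the TTL: eligible for deletion
    if age >= timedelta(days=hot_days):
        return "warm"     # older data: candidate for cheaper storage
    return "hot"          # recent data: stays on fast storage


now = datetime(2025, 9, 27)
tiers = [storage_tier(now - timedelta(days=d), now) for d in (1, 10, 40)]
```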
ClickHouse: Updates Since March 2025
●A new, native Parquet reader was introduced in version 25.8.
●In version 25.7, ClickHouse introduced a patch-part mechanism for UPDATE/DELETE operations.
●In version 25.8, JSON arrays with mixed element types are inferred as Array(Dynamic) instead of an unnamed Tuple.
Integrations via OpenTelemetry
Prometheus
ClickHouse
OpenTelemetry Collector
FluentD
OpenTelemetry
eBPF
Kafka
More Benefits
•Excellent compression, even with variable schemas
•Practically unlimited cardinality
•Horizontally scalable ingestion & querying
Challenges
•SQL is not PromQL*
•Overly complex for small data volumes*
•Not a turn-key solution
Complete Observability Solution
OpenTelemetry: Updates Since March 2025
●JS SDK 2.0 released (~March–April). Major breaking changes, including dropped support for older Node.js versions
●Collector releases (v0.135.0 / v1.41.0, etc.): latest versions include enhancements
●Semantic Conventions / Naming Best Practices
●AI Agent Observability / GenAI SIG
●Profile Signal / Support for Profiling
●Collector Performance, Batching & Exporter Enhancements
Finally
The OpenTelemetry project does not include any kind of database or backend UI.
Coroot
●Strengths: zero-instrumentation, deep infra visibility
●Trade-off: dual storage (Prom + ClickHouse) to operate
●HA: Prometheus federation + ClickHouse replication
●Scale: store-all traces/logs; CH compression lowers cost
●ClickHouse as single telemetry lake (logs/traces/metrics)
●Built-in OTel Collector (OTLP 4317/4318)
●HyperDX UI: unified search, SQL console, session replay
●JSON-native schemas for fast log search
●Simple deploy (Docker/K8s); modular scale
●Strengths: unified experience, extreme scale via CH
●Trade-off: newer stack; ops focus shifts to CH best practices
●HA: CH replication; stateless collectors/UI
●Scale: proven at very large event rates with clustering
Sometimes, Comparisons Are Good
●eBPF-based Node-Agent
●OTLP ingestion via Collector Gateway
●Uses (mostly) standard OpenTelemetry Exporter schema + new schema for profiles
●Automated root cause analysis with AI companion
coroot
hyperDX
•Fingerprints for unique time series
•Indexed labels (via Materialized Views)
•Allows for efficient updates (ReplacingMergeTree)
•Null Engine for raw ingest
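The fingerprint idea above can be sketched as follows. This is an illustration under stated assumptions, not the hashing scheme of any particular tool: `series_fingerprint`, the separator bytes, and the choice of SHA-256 truncated to 64 bits are all hypothetical.

```python
import hashlib
from typing import Dict


def series_fingerprint(metric_name: str, labels: Dict[str, str]) -> int:
    """Return a stable 64-bit identifier for one metric + label set."""
    # Sort labels so the same label set always yields the same fingerprint,
    # regardless of the order in which labels arrive.
    canonical = metric_name + "".join(
        f"\x00{key}\x01{value}" for key, value in sorted(labels.items())
    )
    digest = hashlib.sha256(canonical.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


fp_1 = series_fingerprint("http_requests_total", {"job": "api", "code": "200"})
fp_2 = series_fingerprint("http_requests_total", {"code": "200", "job": "api"})
fp_3 = series_fingerprint("http_requests_total", {"code": "500", "job": "api"})
```

Storing the fingerprint with each sample lets the full label set live in a separate, much smaller table, which is one reason this pattern pairs well with an engine that deduplicates rows by key.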
Schema Considerations
•ZSTD Compression
•Delta encoding
•Bloom filter indexes for maps (resources) and logs
•MergeTree, partitioned on time
•7-day TTL
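Delta encoding is worth a short illustration, since it explains why timestamps compress so well. The sketch below is a plain-Python model of the idea, not ClickHouse's codec implementation; the function names are hypothetical.

```python
from typing import List


def delta_encode(values: List[int]) -> List[int]:
    """Keep the first value, then store each difference from its predecessor."""
    if not values:
        return []
    encoded = [values[0]]
    for previous, current in zip(values, values[1:]):
        encoded.append(current - previous)
    return encoded


def delta_decode(deltas: List[int]) -> List[int]:
    """Rebuild the original values with a running sum."""
    decoded = []
    total = 0
    for delta in deltas:
        total += delta
        decoded.append(total)
    return decoded


timestamps = [1000, 1001, 1003, 1006]
encoded = delta_encode(timestamps)
```

Monotonic timestamps turn into a run of small integers, which a general-purpose compressor such as ZSTD can then pack far more tightly than the raw values.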
OpenTelemetry Collector Exporter for ClickHouse
•Maps for metadata
•Efficient full-body text-search
•Materialized View for span durations
Scaling for Production
High-availability considerations & architecture
The Altinity Operator
•PVC management
•Rolling upgrades
•Built-in monitoring
Alerting & Other Considerations
Set up alerts in Grafana based on query thresholds.
Integrate with notification channels like email, Slack, or PagerDuty / Opsgenie.
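The logic behind a query-threshold alert is simple enough to sketch. The example below does not use Grafana's API; `AlertRule` and `evaluate` are hypothetical names that only model the comparison an alerting tool performs on a query result before routing to a channel.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertRule:
    name: str         # what is being measured, e.g. an error rate
    threshold: float  # fire when the observed value exceeds this
    channel: str      # where the notification would be routed


def evaluate(rule: AlertRule, value: float) -> Optional[str]:
    """Return a notification message if the rule fires, else None."""
    if value > rule.threshold:
        return f"[{rule.channel}] {rule.name}: {value:.2f} > {rule.threshold:.2f}"
    return None


rule = AlertRule(name="error_rate", threshold=0.05, channel="slack")
firing = evaluate(rule, 0.10)
quiet = evaluate(rule, 0.01)
```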
OpenTelemetry Ingestion Patterns
●OTLP everywhere: gRPC/HTTP endpoints for metrics, traces, logs
●Sidecars/DaemonSets: per-node collectors for buffering and batching
●Patterns by stack
○Coroot: OTel-compatible alongside eBPF agents
○SigNoz: OTel native collectors (with optional Kafka for spikes)
○ClickStack: bundled OTel collector tuned for CH ingestion
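The buffering and batching role of a per-node collector can be sketched as a small size-triggered buffer. This is a simplified model, not the OpenTelemetry Collector's batch processor: the `Batcher` class is hypothetical and omits the timeout-based flush a real collector would also need.

```python
from typing import Callable, List


class Batcher:
    """Buffer records and hand them downstream in batches of max_size."""

    def __init__(self, max_size: int, sink: Callable[[List[dict]], None]) -> None:
        self.max_size = max_size
        self.sink = sink
        self.buffer: List[dict] = []

    def add(self, record: dict) -> None:
        self.buffer.append(record)
        if len(self.buffer) >= self.max_size:
            self.flush()

    def flush(self) -> None:
        # Send whatever is buffered, then start a fresh buffer.
        if self.buffer:
            self.sink(self.buffer)
            self.buffer = []


sent_batches: List[List[dict]] = []
batcher = Batcher(max_size=3, sink=sent_batches.append)
for i in range(7):
    batcher.add({"id": i})
batcher.flush()  # drain the final partial batch
batch_sizes = [len(batch) for batch in sent_batches]
```

Fewer, larger inserts matter for ClickHouse because every insert creates at least one new part that later has to be merged.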
ClickHouse: Schema & Optimization for Telemetry
●Time-partitioned MergeTree tables with TTL policies
●Materialized Views for rollups and derived indexes
●Sparse indexes and bloom filters on high-selectivity fields
●JSON column for semi-structured logs (fast key lookup)
●Compression (ZSTD/LZ4) to keep raw signals affordable
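The points above can be combined into a single table definition. The sketch below generates ClickHouse DDL from Python so that retention is a parameter; the table name `otel_logs_sketch`, the column names, and the helper `build_logs_ddl` are assumptions for illustration and are not the schema of any specific exporter. Treat the DDL as a starting point to validate against your ClickHouse version.

```python
def build_logs_ddl(table: str = "otel_logs_sketch", ttl_days: int = 7) -> str:
    """Return DDL for a time-partitioned MergeTree logs table (illustrative)."""
    return f"""CREATE TABLE IF NOT EXISTS {table}
(
    Timestamp DateTime64(9) CODEC(Delta(8), ZSTD(1)),
    TraceId String CODEC(ZSTD(1)),
    ServiceName LowCardinality(String),
    Body String CODEC(ZSTD(1)),
    LogAttributes Map(LowCardinality(String), String) CODEC(ZSTD(1)),
    -- Bloom filters let queries skip granules that cannot match.
    INDEX idx_trace_id TraceId TYPE bloom_filter(0.01) GRANULARITY 1,
    INDEX idx_attr_keys mapKeys(LogAttributes) TYPE bloom_filter(0.01) GRANULARITY 1
)
ENGINE = MergeTree
PARTITION BY toDate(Timestamp)
ORDER BY (ServiceName, Timestamp)
TTL toDateTime(Timestamp) + INTERVAL {ttl_days} DAY"""


ddl = build_logs_ddl()
ddl_30 = build_logs_ddl(table="otel_logs_long", ttl_days=30)
```

Partitioning by day keeps TTL cleanup cheap, because expired data can be dropped a whole partition at a time.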
Scalability & HA Patterns
●Collectors: scale horizontally; batch + backpressure
●ClickHouse: shard by time and/or tenant; replicate for HA
●Prometheus (Coroot): federation + remote write for long-term
●Stateless UI/APIs: run multiple replicas behind LB
●Storage planning: hot vs warm tiers; object storage for archives
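Sharding by tenant needs a mapping that stays stable across processes and restarts. A minimal sketch, assuming a fixed shard count and a hypothetical helper `shard_for`; ClickHouse's Distributed engine would normally apply its own sharding expression, so this only illustrates the principle.

```python
import hashlib


def shard_for(tenant_id: str, num_shards: int) -> int:
    """Map a tenant to a shard index in [0, num_shards)."""
    # A cryptographic digest is used instead of Python's built-in hash(),
    # which is randomized per process for strings and so is not stable.
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


shard_a_first = shard_for("tenant-a", 4)
shard_a_again = shard_for("tenant-a", 4)
```

Note that changing `num_shards` remaps most tenants, so resharding needs its own plan.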
Quick Comparison (2025)
Category | Coroot | SigNoz | hyperDX
Collection | eBPF + OTel (OTLP) support | OpenTelemetry-native (OTLP) | OpenTelemetry-native (OTLP)
Storage | Prometheus (metrics) + ClickHouse (logs/traces) | ClickHouse (metrics/logs/traces) | ClickHouse (metrics/logs/traces)
UX / UI | Coroot UI w/ RCA, service maps | Unified UI: dashboards, traces, logs, alerts | HyperDX UI: unified search, SQL, session replay
Deploy | Self-hosted, on-prem focus | Self-hosted or Cloud (managed) | Self-hosted bundle (Docker/K8s), Cloud as ClickStack, Altinity.Cloud
HA & Scale | Prometheus federation + CH replication | CH cluster + multi-collector + stateless API/UI | CH cluster + multi-collector + stateless UI
Alerting | SLO monitoring and alerting | Alerts and Alerts-as-Code | Search-based thresholds
Best For | Infra-heavy teams; fast zero-instrumentation | App teams standardizing on OTel | Teams wanting turn-key unified stack at high scale
When to Choose What
●Coroot
○Need instant visibility via eBPF without code changes
○Hybrid metrics (Prometheus) + events (CH) is acceptable
●SigNoz
○You are (or will be) OTel-instrumented across services
○Prefer single-store CH + cohesive OSS UX; optional cloud
●hyperDX
○Desire a turn-key, ClickHouse-first bundle with unified UX as ClickStack
○Aiming for extreme scale and SQL/Lucene power
Migration & Rollout Patterns
●Start with traces: instrument a few critical services (OTel)
●Add logs next; ship to CH and enable correlation by trace_id
●Gradually bring metrics; align naming/labels to OTel semantics
●Plan retention tiers (hot, warm, archive) early
●Automate schema/TTL via IaC; add SLO/alerting as last mile
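Correlation by trace_id, mentioned in the rollout steps above, amounts to a join between spans and logs. The sketch below models it in memory with hypothetical record shapes; in practice the same join would run as a query in ClickHouse.

```python
from collections import defaultdict
from typing import Dict, List


def logs_by_trace(logs: List[dict]) -> Dict[str, List[dict]]:
    """Index log records by trace_id, skipping logs without trace context."""
    index: Dict[str, List[dict]] = defaultdict(list)
    for log in logs:
        trace_id = log.get("trace_id")
        if trace_id:
            index[trace_id].append(log)
    return index


def correlate(spans: List[dict], logs: List[dict]) -> List[dict]:
    """Attach the matching logs to each span."""
    index = logs_by_trace(logs)
    return [{**span, "logs": index.get(span["trace_id"], [])} for span in spans]


spans = [
    {"trace_id": "t1", "name": "GET /cart"},
    {"trace_id": "t2", "name": "GET /health"},
]
logs = [
    {"trace_id": "t1", "body": "db timeout"},
    {"body": "no trace context"},
]
correlated = correlate(spans, logs)
```

The log without a trace_id cannot be attached to any span, which is why instrumenting traces first makes later log correlation much more useful.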
Operations Tips (ClickHouse + OTel)
●Right-size batch sizes and insert concurrency at collectors
●Partition by time; avoid tiny parts; watch merges
●Use materialized views for frequent queries (e.g., error rates)
●Tune memory/disk for large scans; prefer ZSTD for logs
●Create guardrails: TTLs, quotas, and backpressure testing
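As an illustration of the kind of rollup a materialized view could maintain for a frequent query, here is a per-minute error rate computed in plain Python. The input shape (Unix timestamp in seconds, error flag) and the function name are assumptions; a real materialized view would do this incrementally at insert time.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def error_rate_per_minute(spans: List[Tuple[int, bool]]) -> Dict[int, float]:
    """Aggregate (unix_seconds, is_error) pairs into error rate per minute."""
    totals: Dict[int, int] = defaultdict(int)
    errors: Dict[int, int] = defaultdict(int)
    for timestamp, is_error in spans:
        minute = timestamp - timestamp % 60  # start of the minute bucket
        totals[minute] += 1
        if is_error:
            errors[minute] += 1
    return {minute: errors[minute] / totals[minute] for minute in sorted(totals)}


sample_spans = [(0, False), (30, True), (61, False), (119, False)]
rates = error_rate_per_minute(sample_spans)
```

Dashboards and alerts can then read the small rollup instead of rescanning raw spans on every refresh.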
Example Outcomes
●Unify logs+traces+metrics → faster MTTR via cross-signal pivot
●Store-all traces/logs with compression → longer retention at lower cost
●Standardize on OTel → easier vendor changes and team onboarding
●Single-store CH → simpler backups and infra footprint
Conclusion
Why Unified Observability Storage?
•Simplified management
•Simplified scaling
•Cost management
•Standardization and normalization of metadata
•Post-hoc dependency mapping
•Cross-signal correlation
Thank You
Q&A