Unified Observability - Alkin Tezuysal - OpenTechDay Summit September 2025

askdba, 56 slides, Sep 27, 2025

About This Presentation

This talk supports the Kubernetes and observability ecosystem by showcasing how to consolidate metrics, logs, traces, and eBPF data into a unified backend using ClickHouse. It demonstrates how teams can reduce tool sprawl, simplify their observability pipelines, and improve troubleshooting across di...


Slide Content

1www.altinity.com
@Altinity Inc. 2025
Unified Observability
Leveraging ClickHouse as a Comprehensive Telemetry Database
Alkin Tezuysal, Director of Services - Altinity Inc.
Open Tech Day, Sept 2025

Let’s Get Connected!
X: @ask_DBA
LinkedIn: linkedin.com/in/askdba/
Alkin Tezuysal
Director of Services @AltinityDB
Open Source Database Evangelist
Previously: ChistaDATA, PlanetScale, Percona, and Pythian as Senior Technical Manager, SRE, DBA

Earlier in Life: Enterprise DBA (Informix, Oracle, DB2, SQL Server)

Recent Recognitions:
Most Influential in Database Community 2022 - The Redgate 100
MySQL Cookbook, 4th Edition 2022 - O'Reilly Media, Inc.
MySQL Rockstar 2023 - Oracle (MySQL Community)
Database Design and Modeling with PostgreSQL and MySQL 2024 - Packt
Oracle ACE Pro 2025 - Oracle

Agenda
01 Observability Basics
02 Introduction to ClickHouse
03 Coroot vs SigNoz vs hyperDX with ClickHouse + OpenTelemetry (2025 Update)
04 OpenTelemetry (OTel) features
05 Monitoring tools vs DIY observability

Sailing Trivia


What does the Beaufort Scale measure?

The Beaufort Scale measures wind force/speed, ranging from 0 (calm) to 12 (hurricane).

What’s New Since March 2025
●Ecosystem refresh: OpenTelemetry stable for metrics, traces, logs
●ClickHouse-backed stacks maturing rapidly
●Today’s focus
○Architectures & scalability for Coroot, SigNoz, hyperDX
○How ClickHouse is used for metrics/traces/logs
○When to choose which stack
○Migration patterns & ops tips

2025 Landscape: OpenTelemetry + ClickHouse
●OpenTelemetry (OTel) has become the default telemetry standard
●ClickHouse widely adopted as unified store for logs/traces/metrics
●Why ClickHouse?
○High ingestion rates with strong compression
○Fast, columnar analytics at high cardinality
○Horizontal scale and low TCO

Challenges: Disparate Telemetry Systems
Logs (Scrapers): What’s the problem?
Metrics (Monitoring): Is there a problem?
Traces (Exporters): Where’s the problem?

Tools shown across the three signals (Metrics, Traces, Logs):
• OpenTelemetry • Jaeger
• Grafana • Kibana • PMM
• Fluentd • Logstash • Vector

The OpenTelemetry Project
Observability = Visibility + Understanding

Visibility: Metrics, Logs, Traces (signals from systems)
Understanding: Correlation, context, root-cause insights
OpenTelemetry provides the standard to collect all signals
ClickHouse enables scale and analysis for full observability

The OpenTelemetry Project
Services
●Applications: Web, gRPC, Databases, Msg Queues
●Infrastructure: K8s, Linux, Containers, Cloud runtimes
●Cloud & SaaS integrations
Collector
●Ingest → Process → Export
Backend
●Metrics
●Traces
●Logs
Visualize
●Grafana

What’s stored?

●Metrics, traces, logs, profiles, events
●Resource metadata
●Graphs & topologies
●Snapshots & deltas
●Configuration

Observability Pipeline
Prometheus
Storage
Pipeline
FluentD
OpenTelemetry
eBPF
Kafka

Challenge: Storage?
Solution: ClickHouse

Introducing ClickHouse
Massively Scalable: Can scale both horizontally and vertically
Fast Execution: Columnar vectorized execution
SQL Compatible: Support for ANSI SQL

ClickHouse for Observability
OpenTelemetry: De-facto standard for traces, metrics, logs
ClickHouse: Fast, reliable and scalable column store.

Column Store Capabilities are not limited to:
Full-text search
Efficient compression
Real-time analytics
Relational
Petabyte-scale

How Are Tables Written in ClickHouse?
Part + Part → Rewritten, Bigger Part (each part has its own index columns)
Update and delete also rewrite parts
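
As a rough illustration of the part lifecycle on this slide (not ClickHouse's actual implementation), the Python sketch below models each insert as a small sorted part and a background merge as a rewrite into one bigger sorted part. The function names and the (timestamp, message) row shape are assumptions made for the example.

```python
import heapq

def insert_part(rows, key=lambda row: row[0]):
    """Model one INSERT: the rows become a new immutable part, sorted by the ordering key."""
    return sorted(rows, key=key)

def merge_parts(parts, key=lambda row: row[0]):
    """Model a background merge: several sorted parts are rewritten as one bigger sorted part."""
    return list(heapq.merge(*parts, key=key))

# Two small inserts of hypothetical (timestamp, message) rows.
part_a = insert_part([(3, "c"), (1, "a")])
part_b = insert_part([(4, "d"), (2, "b")])

# The merge never edits part_a or part_b in place; it produces a new part.
merged = merge_parts([part_a, part_b])
```

Because the inputs are already sorted, the merge is a single streaming pass, which is one reason small writes stay cheap while reads benefit from fewer, larger parts.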

MergeTree Family
Query efficiency increases from an unmerged, freshly inserted part to a fully merged part

How does this help?
•Fast writes
•Time-friendly
•Easy cleanup
•Cost-effective

Data Transformation & Management
•Materialized Views: YES
•TTL: YES
•Tiered storage: YES

Integrations
•Grafana Datasource Plugin
•Jaeger w/ ClickHouse backend
•Kafka table engine

ClickHouse: Updates Since March 2025
●Version 25.8 introduced a new, native Parquet reader.
●Version 25.7 introduced a patch-part mechanism for UPDATE/DELETE operations.
●In version 25.8, JSON arrays with mixed element types are inferred as Array(Dynamic) instead of an unnamed Tuple.

Integrations via OpenTelemetry
Prometheus
ClickHouse
OpenTelemetry Collector
FluentD
OpenTelemetry
eBPF
Kafka

More Benefits
•Excellent compression, even with variable schemas
•Practically unlimited cardinality
•Horizontally scalable ingestion & querying

Challenges
•SQL is not PromQL*
•Overly complex for small data volumes*
•Not a turn-key solution

Complete Observability Solution

Application SDK
Host / Node Agents
Gateway: Collection, Sampling, Processing
Storage: ClickHouse
Analysis UI

OpenTelemetry: Updates Since March 2025
●JS SDK 2.0 released around March–April, with major breaking changes including dropped support for older Node.js versions
●Collector releases (v0.135.0 / v1.41.0 etc.): latest versions include enhancements
●Semantic Conventions / Naming Best Practices
●AI Agent Observability / GenAI SIG
●Profile Signal / Support for Profiling
●Collector Performance, Batching & Exporter Enhancements

Finally
The OpenTelemetry project does not include any kind of database or backend UI.

Options

SigNoz: OTel-native, single-store ClickHouse backend
Key Points
●OTLP ingestion via OTel Collector (metrics/traces/logs)
●All telemetry in ClickHouse (unified storage)
●Dashboards, trace explorer, logs explorer, alerts
●Cloud or self-hosted; multi-tenant ready
●2025: richer logs pipeline, advanced query builder
Details
●Strengths: standardized instrumentation, cohesive UX
●Trade-off: requires app instrumentation (or auto-instrumentation)
●HA: CH cluster + multiple collectors + stateless API/UI
●Scale: 10TB+/day patterns with proper CH sizing

Batteries-included, no-code observability, AI agent

Coroot: eBPF + OTel, PromQL for metrics, ClickHouse for events
Key Points
●Node Agent (eBPF) auto-discovers telemetry
●Prometheus-compatible metrics pipeline
●ClickHouse for logs, traces, profiles
●AI-enabled root-cause insights (RCA)
●Size-based retention controls for CH
Details
●Strengths: zero-instrumentation, deep infra visibility
●Trade-off: dual storage (Prom + ClickHouse) to operate
●HA: Prometheus federation + ClickHouse replication
●Scale: store-all traces/logs; CH compression lowers cost

hyperDX:
●ClickHouse as single telemetry lake (logs/traces/metrics)
●Built-in OTel Collector (OTLP 4317/4318)
●HyperDX UI: unified search, SQL console, session replay
●JSON-native schemas for fast log search
●Simple deploy (Docker/K8s); modular scale
●Strengths: unified experience, extreme scale via CH
●Trade-off: newer stack; ops focus shifts to CH best practices
●HA: CH replication; stateless collectors/UI
●Scale: proven at very large event rates with clustering

Sometimes, Comparisons Are Good

hyperDX
●ClickHouse (metrics/logs/traces)
●HyperDX UI: unified search, SQL, session replay
●Self-hosted bundle (Docker/K8s), Cloud
●CH cluster + multi-collector + stateless UI
●Exploratory analysis at any scale

coroot
●eBPF-based Node-Agent
●OTLP ingestion via Collector Gateway
●Uses (mostly) standard OpenTelemetry Exporter schema + new schema for profiles
●Automated root cause analysis with AI companion


•Fingerprints for unique time series
•Indexed labels (via Materialized Views)
•Allows for efficient updates (ReplacingMergeTree)
•Null Engine for raw ingest
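
One way to picture a time-series fingerprint is a stable hash over the metric name and its sorted labels, so the same series always maps to the same ID regardless of label order. The sketch below is an assumption for illustration (SHA-256 truncated to 64 bits, hypothetical function name) and not the hashing scheme of any particular product.

```python
import hashlib

def series_fingerprint(metric_name: str, labels: dict) -> int:
    """Return a 64-bit ID that is identical for identical (name, label set) pairs."""
    # Sort labels so that dict insertion order does not change the result.
    # The \x00 and \x01 separators keep "a=bc" distinct from "ab=c".
    canonical = metric_name + "".join(
        f"\x00{key}\x01{value}" for key, value in sorted(labels.items())
    )
    digest = hashlib.sha256(canonical.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

# Same series, labels given in a different order.
fp_one = series_fingerprint("http_requests_total", {"service": "cart", "code": "200"})
fp_two = series_fingerprint("http_requests_total", {"code": "200", "service": "cart"})
# A different label value yields a different series.
fp_other = series_fingerprint("http_requests_total", {"service": "cart", "code": "500"})
```

Storing the fingerprint next to each sample lets the label set live in a separate, much smaller table keyed by that ID.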

Schema Considerations
•ZSTD Compression
•Delta encoding
•Bloom filter indexes for maps (resources) and logs
•MergeTree, partitioned on time
•7-day TTL
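
The points above could translate into a table definition along these lines. This is a minimal sketch: the table name, column names, index parameters, and the helper function are assumptions for illustration, not the schema of any specific exporter or product. The helper only renders the DDL string; executing it is left to whichever ClickHouse client is in use.

```python
def logs_table_ddl(table: str = "otel_logs_example", ttl_days: int = 7) -> str:
    """Render CREATE TABLE DDL for a hypothetical logs table."""
    if not table.isidentifier():
        raise ValueError("table must be a plain identifier")
    if ttl_days <= 0:
        raise ValueError("ttl_days must be positive")
    # Delta + ZSTD on the timestamp, ZSTD elsewhere, bloom-filter style
    # skip indexes on map keys and log body, daily partitions, row TTL.
    return f"""CREATE TABLE IF NOT EXISTS {table}
(
    Timestamp          DateTime64(9) CODEC(Delta, ZSTD(1)),
    TraceId            String CODEC(ZSTD(1)),
    ServiceName        LowCardinality(String) CODEC(ZSTD(1)),
    Body               String CODEC(ZSTD(1)),
    ResourceAttributes Map(LowCardinality(String), String) CODEC(ZSTD(1)),
    INDEX idx_res_keys mapKeys(ResourceAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
    INDEX idx_body     Body TYPE tokenbf_v1(32768, 3, 0) GRANULARITY 1
)
ENGINE = MergeTree
PARTITION BY toDate(Timestamp)
ORDER BY (ServiceName, Timestamp)
TTL toDateTime(Timestamp) + INTERVAL {ttl_days} DAY"""

ddl = logs_table_ddl()
```

Partitioning by day and expiring by TTL work together: once every row in a daily partition is past its TTL, the data can be dropped in bulk instead of being rewritten row by row.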

OpenTelemetry Collector Exporter for ClickHouse
•Maps for metadata
•Efficient full-body text-search
•Materialized View for span durations

Scaling for Production
High-availability considerations & architecture

Managing Multiple Collectors
OpenTelemetry Collector (DaemonSet) x3
OpenTelemetry Collectors (Deployment)
ClickHouse® Cluster (w/ Replicas)
Keeper (w/ Replicas)
Altinity® Operator

The Altinity Operator
•PVC management
•Rolling upgrades
•Built-in monitoring

Alerting & Other Considerations

Set up alerts in Grafana based on query thresholds.

Integrate with notification channels like email, Slack, or PagerDuty / Opsgenie.

OpenTelemetry Ingestion Patterns
●OTLP everywhere: gRPC/HTTP endpoints for metrics, traces, logs
●Sidecars/DaemonSets: per-node collectors for buffering and batching
●Patterns by stack
○Coroot: OTel-compatible alongside eBPF agents
○SigNoz: OTel native collectors (with optional Kafka for spikes)
○ClickStack: bundled OTel collector tuned for CH ingestion

ClickHouse: Schema & Optimization for Telemetry
●Time-partitioned MergeTree tables with TTL policies
●Materialized Views for rollups and derived indexes
●Sparse indexes and bloom filters on high-selectivity fields
●JSON column for semi-structured logs (fast key lookup)
●Compression (ZSTD/LZ4) to keep raw signals affordable
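
To see why delta encoding ahead of a general-purpose compressor helps with telemetry timestamps, here is a small standard-library sketch. It uses zlib purely as a stand-in for ZSTD/LZ4 (an assumption made so the example needs no third-party packages), and the 15-second scrape interval and helper names are illustrative.

```python
import struct
import zlib

def delta_encode(values):
    """Keep the first value, then store each value as its difference from the previous one."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]

def delta_decode(deltas):
    """Invert delta_encode with a running sum."""
    out, total = [], 0
    for delta in deltas:
        total += delta
        out.append(total)
    return out

def compressed_size(values):
    """Pack as little-endian 64-bit integers and return the zlib-compressed size in bytes."""
    raw = struct.pack(f"<{len(values)}q", *values)
    return len(zlib.compress(raw))

# 1000 hypothetical timestamps, one every 15 seconds.
timestamps = [1_758_000_000 + 15 * i for i in range(1000)]
deltas = delta_encode(timestamps)

size_raw = compressed_size(timestamps)
size_delta = compressed_size(deltas)
```

After delta encoding, a regular series turns into a long run of identical small numbers, which any compressor handles far better than the raw, ever-changing values.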

Scalability & HA Patterns
●Collectors: scale horizontally; batch + backpressure
●ClickHouse: shard by time and/or tenant; replicate for HA
●Prometheus (Coroot): federation + remote write for long-term storage
●Stateless UI/APIs: run multiple replicas behind LB
●Storage planning: hot vs warm tiers; object storage for archives
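
Sharding by tenant amounts to routing each row through a stable hash of the tenant ID. In ClickHouse this is usually expressed as a sharding key on a distributed table rather than in application code; the Python sketch below only illustrates the idea, and the function name and the choice of CRC32 are assumptions.

```python
import zlib

def shard_for_tenant(tenant_id: str, num_shards: int) -> int:
    """Map a tenant to a shard index in [0, num_shards) using a stable hash."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    # zlib.crc32 is deterministic across processes, unlike Python's built-in hash().
    return zlib.crc32(tenant_id.encode("utf-8")) % num_shards

# Every row for the same tenant lands on the same shard.
first = shard_for_tenant("tenant-a", 4)
second = shard_for_tenant("tenant-a", 4)
```

Keeping one tenant on one shard makes per-tenant queries local to that shard, at the cost of uneven load when tenants differ greatly in volume.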

Quick Comparison (2025)
Collection
●Coroot: eBPF + OTel (OTLP) support
●SigNoz: OpenTelemetry-native (OTLP)
●hyperDX: OpenTelemetry-native (OTLP)
Storage
●Coroot: Prometheus (metrics) + ClickHouse (logs/traces)
●SigNoz: ClickHouse (metrics/logs/traces)
●hyperDX: ClickHouse (metrics/logs/traces)
UX / UI
●Coroot: Coroot UI w/ RCA, service maps
●SigNoz: unified UI with dashboards, traces, logs, alerts
●hyperDX: HyperDX UI with unified search, SQL, session replay
Deploy
●Coroot: Self-hosted, on-prem focus
●SigNoz: Self-hosted or Cloud (managed)
●hyperDX: Self-hosted bundle (Docker/K8s), Cloud as ClickStack, Altinity.Cloud
HA & Scale
●Coroot: Prometheus federation + CH replication
●SigNoz: CH cluster + multi-collector + stateless API/UI
●hyperDX: CH cluster + multi-collector + stateless UI
Alerting
●Coroot: SLO monitoring and alerting
●SigNoz: Alerts and Alerts-as-Code
●hyperDX: Search-based thresholds
Best For
●Coroot: Infra-heavy teams; fast zero-instrumentation
●SigNoz: App teams standardizing on OTel
●hyperDX: Teams wanting turn-key unified stack at high scale

When to Choose What
●Coroot
○Need instant visibility via eBPF without code changes
○Hybrid metrics (Prometheus) + events (CH) is acceptable
●SigNoz
○You are (or will be) OTel-instrumented across services
○Prefer single-store CH + cohesive OSS UX; optional cloud
●hyperDX
○Desire a turn-key, ClickHouse-first bundle with unified UX as ClickStack
○Aiming for extreme scale and SQL/Lucene power

Migration & Rollout Patterns
●Start with traces: instrument a few critical services (OTel)
●Add logs next; ship to CH and enable correlation by trace_id
●Gradually bring metrics; align naming/labels to OTel semantics
●Plan retention tiers (hot, warm, archive) early
●Automate schema/TTL via IaC; add SLO/alerting as last mile
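
Correlation by trace_id is essentially a join between spans and log records on a shared key. The sketch below shows that join in plain Python over in-memory dictionaries; the field names (`trace_id`, `name`, `body`) and the sample records are assumptions for illustration, and in practice the same join would run as a query in the storage backend.

```python
from collections import defaultdict

def correlate_by_trace_id(spans, logs):
    """Group spans by trace_id and attach the log records that carry the same trace_id."""
    grouped = defaultdict(lambda: {"spans": [], "logs": []})
    for span in spans:
        grouped[span["trace_id"]]["spans"].append(span)
    for log in logs:
        trace_id = log.get("trace_id")
        # Logs without a trace_id, or from traces we have no spans for, are skipped.
        if trace_id in grouped:
            grouped[trace_id]["logs"].append(log)
    return dict(grouped)

# Hypothetical sample records.
spans = [
    {"trace_id": "t1", "name": "POST /checkout"},
    {"trace_id": "t2", "name": "GET /cart"},
]
logs = [
    {"trace_id": "t1", "body": "payment declined"},
    {"body": "startup complete"},
    {"trace_id": "t9", "body": "unrelated trace"},
]
correlated = correlate_by_trace_id(spans, logs)
```

This only works if the logging library injects the active trace ID into every record, which is why instrumenting traces first gives the later log rollout something to attach to.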

Operations Tips (ClickHouse + OTel)
●Right-size batch sizes and insert concurrency at collectors
●Partition by time; avoid tiny parts; watch merges
●Use materialized views for frequent queries (e.g., error rates)
●Tune memory/disk for large scans; prefer ZSTD for logs
●Create guardrails: TTLs, quotas, and backpressure testing
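
The advice on batch sizes and tiny parts comes down to buffering rows and flushing them as one insert when either a row limit or a time limit is reached. The class below is a minimal single-threaded sketch of that idea; the class name, defaults, and `flush` callback are assumptions, and a real collector would typically also flush on a background timer and apply backpressure when the sink is slow.

```python
import time

class Batcher:
    """Buffer rows and hand them to `flush` in batches (size- or age-triggered)."""

    def __init__(self, flush, max_rows=1000, max_wait_s=5.0, clock=time.monotonic):
        self._flush = flush
        self._max_rows = max_rows
        self._max_wait_s = max_wait_s
        self._clock = clock
        self._rows = []
        self._first_at = None

    def add(self, row):
        if not self._rows:
            self._first_at = self._clock()
        self._rows.append(row)
        # Age is only checked when a row arrives; there is no background timer here.
        too_big = len(self._rows) >= self._max_rows
        too_old = self._clock() - self._first_at >= self._max_wait_s
        if too_big or too_old:
            self.flush()

    def flush(self):
        if self._rows:
            batch, self._rows = self._rows, []
            self._flush(batch)

# Collect flushed batches in a list instead of inserting into a database.
flushed = []
batcher = Batcher(flushed.append, max_rows=3, max_wait_s=60.0)
for value in range(7):
    batcher.add(value)
batcher.flush()  # drain the remainder on shutdown
```

Each flushed batch would become one insert, so larger batches mean fewer parts for the merge process to clean up afterwards.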

Example Outcomes
●Unify logs+traces+metrics → faster MTTR via cross-signal pivot
●Store-all traces/logs with compression → longer retention at lower cost
●Standardize on OTel → easier vendor changes and team onboarding
●Single-store CH → simpler backups and infra footprint

Conclusion
Why Unified Observability Storage?
•Simplified management
•Simplified scaling
•Cost management
•Standardization and normalization of metadata
•Post-hoc dependency mapping
•Cross-signal correlation

Thank You
Q&A

Resources & Next Steps
●OpenTelemetry: opentelemetry.io • Collector, SDKs, semantic conventions
●ClickHouse docs: clickhouse.com/docs • JSON, TTL, MV, sharding
●Coroot docs: coroot.com/docs
●SigNoz docs: signoz.io/docs
●hyperDX: https://github.com/hyperdxio/hyperdx
●ClickStack: clickhouse.com/clickstack