OpenObserve as a replacement for Elasticsearch

AlirezaKamrani719, Oct 31, 2025


Introduction to OpenObserve with a comparison to Elasticsearch

OpenObserve: Technical Overview

1. Introduction

OpenObserve is an open-source, cloud-native
observability platform that unifies logs, metrics,
traces, and RUM (including session replay and
analytics) in a single system. It is built for
high-volume data (potentially petabyte-scale),
offering high ingestion and query performance
while placing a strong emphasis on storage cost
efficiency and a scalable architecture.

Some of its core design goals:

+ Support unified observability workflows
(logs + metrics + traces).

+ Use open, columnar file formats (Parquet)
and object-storage for long term retention.

+ Simplify operations via Kubernetes/Helm
deployments, horizontal scalability and HA
modes.

+ Provide a native query engine (SQL / PromQL)
plus built-in dashboarding and alerting.

2. Architecture & High Availability

Deployment Modes

+ Single-node mode: supports lighter usage or
testing. Uses SQLite (for metadata) and local
disk storage.

+ According to documentation, a single-node
setup with default configuration can ingest
up to ~2.6 TB/day on a Mac M2 in tests.

+ HA / cluster mode: uses distributed
components.

+ In HA mode, local-disk storage is not
supported; you must use object storage (S3 or
S3-compatible).

+ Components such as the router, querier,
ingester, compactor, and alert manager can be
scaled horizontally.

+ A cluster coordinator (NATS) is used for
orchestration and messaging among nodes.

Scalability & Redundancy
+ Because core components are separate
(ingesters, queriers, routers), it's possible to

deploy redundant instances of each. If one
fails or is overloaded, traffic can be re-routed
via router / coordinator logic.

+ Use of object storage (S3 / Azure Blob / GCS)
means stored data is shared and accessible by
any querier or compactor node; no data is tied
to a single node's disk.

+ Kubernetes / Helm support simplifies the
deployment of multiple replicas, pod-level
scaling, monitoring, etc.

Thus OpenObserve provides horizontal scaling
and fault tolerance (e.g. ingesters can fail over,
queriers can scale out, routers can balance load).

It is not as fully “enterprise-grade” (yet) as long-
established systems, but its HA mode supports
the essential distributed-system primitives
required for production observability.
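
As an illustration of the horizontal scaling described above, the sketch below adds a Kubernetes HorizontalPodAutoscaler for the querier tier. It is a minimal, hypothetical example: it assumes the Helm release created a Deployment named openobserve-querier (the real workload name, and whether it is a Deployment or a StatefulSet, depends on the chart version).

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: openobserve-querier
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: openobserve-querier   # assumed workload name; check your release
  minReplicas: 2                # keep at least two queriers for redundancy
  maxReplicas: 6
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out when average CPU exceeds 70%

The same pattern applies to ingesters and routers, since each component scales independently.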

3. Compression & Storage Savings

One of OpenObserve's strong technical
differentiators is its storage model, compression
techniques, and the resulting cost / size savings
compared to traditional log / observability
stacks (e.g. Elasticsearch).

Key points:

Storage Format

+ OpenObserve stores log / metrics / trace
data in Apache Parquet format (a columnar
file format widely used in data analytics).

+ It applies zstd (Zstandard) compression, a
modern algorithm that achieves good
compression ratios while keeping
decompression fast.

Why Compression & Format Choices Help

+ Columnar storage means that only columns
needed by a query are read (column
pruning). This avoids scanning entire rows
when not needed.

+ Predicate pushdown (e.g. time-range
partitioning, field-level partitioning / filters)
allows much less data to be read on query.

+ Additional techniques include bloom filters
(for high-cardinality fields), an optional inverted
index (on specific fields), and partitioning
strategies (time-based, or hash partitioning on
fields) to reduce file sizes and the amount of
data scanned.
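
To make the partition-pruning idea concrete: each stream's data lands in the bucket as time-partitioned Parquet objects, conceptually laid out something like the listing below (purely illustrative; the exact internal key layout is an implementation detail and may differ). A query restricted to a single hour then only needs to open the objects under that hour's prefix.

files/default/logs/app/2025/10/31/10/<file-id>.parquet
files/default/logs/app/2025/10/31/11/<file-id>.parquet
files/default/logs/app/2025/10/31/12/<file-id>.parquet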

Claimed Storage Savings

+ The OpenObserve team cites cases of
~140x lower storage cost versus
Elasticsearch in some workloads.

+ In their blog, they mention that by using
object storage + Parquet + compression,
they can retain more data longer at a
fraction of typical log system cost.

+ They also claim much faster ingestion (e.g.
ingestion speed per CPU core is high because
heavy indexing / inverted-index overhead is
avoided at ingest time).

Performance / Throughput
+ On the ingestion side: OpenObserve does not
do full-text indexing of all data (unless you
enable the inverted index on specific fields),
which reduces CPU work at ingest time.
+ Because of that, the ingestion rate is
significantly higher than in comparable index-
heavy systems. For example, the documentation
reports ingestion speeds of “7-30 MB/sec per
vCPU core” for typical workloads, and with
tuning perhaps double that.

+ They also offer a setting
(ZO_FEATURE_PER_THREAD_LOCK) to
reduce ingestion bottlenecks (by having a
WAL per CPU core to remove lock
contention) — useful for very high ingestion
rates.

So the compression and storage savings come
from a combination of file format, partitioning
strategy, light indexing, and an efficient
ingestion pipeline.

4. Technical Setup / Deployment (High-Level)

Below is a refined outline of how you might set
up OpenObserve for production-style usage with
HA and object storage, drawn from the
documentation:

Components
A minimal HA / production deployment will include:

+ Router(s)

+ Ingester(s)

+ Querier(s)

+ Compactor(s)

+ AlertManager

+ Coordinator (via NATS)

+ Metadata database (SQLite / Postgres /
MySQL)

+ Object storage bucket (e.g. AWS S3, GCS,
Azure Blob, or S3-compatible MinIO)

Environment / Configuration

+ Use the Helm chart for Kubernetes deployment.
You can configure values such as the storage
type (object storage), the number of replicas for
each component, and resource limits (CPU /
memory).

+ In the Helm values.yaml, set storage.s3 (or
another backend) with bucket, region, and
credentials.

+ Configure partitioning settings for streams:
default time-based partitioning plus
optionally field-based partitions or hash
buckets.

+ Configure cache / memory settings (RAM
cache, disk cache) for query performance, e.g.
ZO_MEMORY_CACHE_MAX_SIZE,
ZO_DISK_CACHE_ENABLED, etc.

+ Configure result-caching if needed
(ZO_RESULT_CACHE_ENABLED, etc.).

Scaling / HA Considerations

+ Deploy multiple replicas of ingesters to
spread ingestion load.

+ Deploy multiple queriers to distribute query
load.

+ Ensure NATS (cluster messaging /
coordination) is itself deployed with persistence
and HA.

+ Ensure object storage versioning or lifecycle
policies are configured (for retention).

+ Monitor resource usage and adjust CPU /
memory following the sizing guidance (e.g. for
ingester sizing, more ingestion cores benefit
from more memory; a CPU-to-RAM ratio of about
1:2, roughly 2 GB of RAM per vCPU, is suggested
for ingestion nodes).

Example (Helm / Kubernetes)

While I don't include a full single-page YAML in
this summary, you'd follow these steps:

+ Add the Helm repo:

helm repo add openobserve https://openobserve.github.io/helm-charts

+ Prepare a values-production.yaml with
sections such as:

storage:
  type: s3
  s3:
    bucket: my-openobserve-logs
    access_key: <KEY>
    secret_key: <SECRET>
    region: <REGION>

replicaCounts:
  ingester: 2
  querier: 2
  router: 2

resources:
  ingester:
    limits:
      cpu: "4"
      memory: "8Gi"
    requests:
      cpu: "2"
      memory: "4Gi"

+ Set environment settings (via the Helm chart's
extraEnvs) for cache sizes and special flags like
ZO_FEATURE_PER_THREAD_LOCK,
ZO_RESULT_CACHE_ENABLED, etc.
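
A hedged sketch of what that could look like in values-production.yaml (the extraEnvs key name, the boolean formats, and the cache-size units should all be verified against the chart and docs for your OpenObserve version):

extraEnvs:
  - name: ZO_FEATURE_PER_THREAD_LOCK   # WAL per CPU core to reduce ingest lock contention
    value: "true"
  - name: ZO_RESULT_CACHE_ENABLED      # cache query results for repeated dashboard queries
    value: "true"
  - name: ZO_DISK_CACHE_ENABLED        # local disk cache in front of object storage
    value: "true"
  - name: ZO_MEMORY_CACHE_MAX_SIZE     # RAM cache size; assumed to be in MB, verify units
    value: "4096"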

+ Install / upgrade:

helm install openobserve openobserve/openobserve -f values-production.yaml

+ Port-forward or set up an ingress / load
balancer to expose the UI (port 5080 by
default).
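
If you go the ingress route, a minimal sketch could look like the following. It assumes an NGINX ingress controller, a hypothetical hostname, and a Service named openobserve-router exposing port 5080; the actual Service name depends on your Helm release (check kubectl get svc).

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: openobserve
  annotations:
    nginx.ingress.kubernetes.io/proxy-body-size: "32m"   # allow larger ingest payloads through the proxy
spec:
  ingressClassName: nginx
  rules:
    - host: o2.example.com               # hypothetical hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: openobserve-router # assumed Service name; adjust to your release
                port:
                  number: 5080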

+ Configure your ingestion pipeline
(OpenTelemetry Collector, Prometheus
remote_write, log forwarding agents) to send
data to the ingestion endpoints.
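
As one concrete example of that last step, a minimal OpenTelemetry Collector configuration that forwards OTLP logs, metrics, and traces to OpenObserve might look like the sketch below. The endpoint path (/api/default for the default organization) and the Basic-auth header are assumptions to verify against the ingestion settings shown in your OpenObserve UI.

receivers:
  otlp:
    protocols:
      grpc:
      http:

exporters:
  otlphttp:
    endpoint: http://openobserve-router:5080/api/default   # assumed org endpoint; copy the real one from the UI
    headers:
      Authorization: "Basic <base64 of user:token>"         # placeholder credentials

service:
  pipelines:
    logs:
      receivers: [otlp]
      exporters: [otlphttp]
    metrics:
      receivers: [otlp]
      exporters: [otlphttp]
    traces:
      receivers: [otlp]
      exporters: [otlphttp]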

Monitoring & Backup

+ Monitor internal usage / metrics
(OpenObserve may expose its internal
metrics via usage streams / meta-org for
ingestion volumes, node performance, error
rates).

+ Back up your metadata database (Postgres /
MySQL / etc) if used.

+ For object storage, ensure you have lifecycle
/ retention / versioning policies (e.g.
transition older logs to cold-storage, delete
after retention period).

5. Limitations & Open Questions

While OpenObserve offers advanced features
and promising performance / cost advantages,
there are some caveats and areas to verify:
+ Many figures (e.g. “140x lower storage cost”)
are quoted by the project based on internal
benchmarks; real-world savings will depend on
your data's schema, cardinality, field counts,
and query patterns.

+ Full-text search is not as optimized by default
(the inverted index is opt-in), so for certain
queries (e.g. “find a substring in log messages
across millions of records”), Elasticsearch's
inverted index may still outperform unless you
carefully configure inverted-index fields.

+ Operational maturity (monitoring of cluster
health, backup & restore of object storage,
NATS cluster scaling) needs to be evaluated in
your environment.

+ Query latency under heavy load with many
users / dashboards should be tested in a pilot
phase.