Introduction to OpenObserve with a Comparison to Elasticsearch
OpenObserve: Technical Overview
1. Introduction
OpenObserve is an open-source, cloud-native
observability platform that unifies logs, metrics,
traces, and RUM (session replay and analytics) in a
single system. It is built for high-volume data
(potentially petabyte-scale), offering high-performance
ingestion and fast queries together with a strong
emphasis on storage cost efficiency and a scalable
architecture.
Some of its core design goals:
• Support unified observability workflows
(logs + metrics + traces).
• Use open, columnar file formats (Parquet)
and object storage for long-term retention.
• Simplify operations via Kubernetes/Helm
deployments, horizontal scalability, and HA
modes.
• Provide a native query engine (SQL / PromQL)
plus dashboarding and alerting.
2. Architecture & High Availability
Deployment Modes
• Single-node mode: suited to lighter usage or
testing. Uses SQLite for metadata and local disk
for data storage.
  • According to the documentation, a single-node
  setup with default configuration can ingest
  up to ~2.6 TB/day on a Mac M2 in tests.
• HA / cluster mode: uses distributed
components.
  • In HA mode, local-disk storage is not
  supported; you must use object storage (S3
  or a compatible service).
  • Components such as the router, querier,
  ingester, compactor, and alert manager can be
  scaled horizontally.
  • A cluster coordinator (NATS) is used for
  orchestration and messaging among nodes.
Scalability & Redundancy
• Because core components are separate
(ingesters, queriers, routers), it is possible to
deploy redundant instances of each. If one
fails or is overloaded, traffic can be re-routed
via router / coordinator logic.
• Use of object storage (S3 / Azure Blob /
GCS) means stored data is shared and
accessible by any querier or compactor
node; no data is tied to a single node's disk.
• Kubernetes / Helm support simplifies the
deployment of multiple replicas, pod-level
scaling, monitoring, etc.
Thus OpenObserve provides horizontal scaling
and fault tolerance (e.g. ingesters can fail over,
queriers can scale, routers can balance load).
It is not yet as fully “enterprise-grade” as long-
established systems, but its HA mode supports
the essential distributed-system primitives
required for production observability.
3. Compression & Storage Savings
One of OpenObserve's strong technical
differentiators is its storage model, compression
techniques, and the resulting cost / size savings
compared to traditional log / observability
stacks (e.g. Elasticsearch).
Key points:
Storage Format
• OpenObserve stores log, metric, and trace
data in the Apache Parquet format (a columnar
file format widely used in data analytics).
• It compresses that data with zstd
(Zstandard), a modern compression algorithm
that achieves good compression ratios while
keeping decompression fast.
Why Compression & Format Choices Help
• Columnar storage means that only the columns
needed by a query are read (column pruning),
which avoids scanning entire rows unnecessarily.
• Predicate pushdown (e.g. time-range
partitioning, field-level partitioning / filters)
allows much less data to be read per query;
both effects are illustrated in the sketch after
this list.
• Additional techniques include bloom filters
(for high-cardinality fields), an opt-in inverted
index (on specific fields), and partitioning
strategies (time-based, hash partitioning on
fields) to reduce file sizes and scanned data.
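To make the format argument concrete, here is a minimal, illustrative sketch using the pyarrow library, not OpenObserve's internal code: it writes a small log-like table to Parquet with zstd compression, then reads back only the needed columns with a time-range filter, which is the same column-pruning and predicate-pushdown behavior described above. Field names and values are invented for illustration.

```python
# Illustrative only: columnar storage, zstd compression, column pruning,
# and predicate pushdown with Parquet. Not OpenObserve's implementation.
import pyarrow as pa
import pyarrow.parquet as pq

# A small "log stream": timestamps (epoch microseconds), level, and message.
table = pa.table({
    "timestamp": [1_700_000_000_000_000 + i * 1_000_000 for i in range(10_000)],
    "level":     ["INFO" if i % 10 else "ERROR" for i in range(10_000)],
    "message":   [f"request {i} handled" for i in range(10_000)],
})

# Columnar file + zstd codec (the format/compression combination described above).
pq.write_table(table, "logs.parquet", compression="zstd")

# Read back only the columns the query needs (column pruning) and push the
# time-range predicate down so non-matching data can be skipped.
start = 1_700_000_000_000_000 + 5_000 * 1_000_000
subset = pq.read_table(
    "logs.parquet",
    columns=["timestamp", "level"],
    filters=[("timestamp", ">=", start)],
)
print(subset.num_rows, "rows read instead of", table.num_rows)
```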
Claimed Storage Savings
• The OpenObserve team cites cases of
~140x lower storage cost versus
Elasticsearch in some workloads.
• In their blog, they note that by combining
object storage, Parquet, and compression,
they can retain more data for longer at a
fraction of the cost of a typical log system.
• They also claim much faster ingestion (e.g.
ingestion speed per CPU core is high
because they avoid heavy indexing overhead,
such as full-text inverted indexes, at ingest time).
Performance / Throughput
• On the ingestion side, OpenObserve does not do
full-text indexing for all data (unless you
enable the inverted index on specific fields),
which reduces CPU work at ingest time.
• Because of that, the ingestion rate is
significantly higher than in comparable index-
heavy systems. For example, the documentation
reports ingestion speeds of “7-30 MB/sec per vCPU
core” for typical workloads, and with tuning
perhaps double that (a rough sizing calculation
is sketched after this list).
• They also offer a setting
(ZO_FEATURE_PER_THREAD_LOCK) to
reduce ingestion bottlenecks by keeping a
WAL per CPU core to remove lock contention,
which is useful for very high ingestion rates.
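As a back-of-the-envelope illustration of what the quoted per-core figures imply (pure arithmetic on the numbers above, not a measured benchmark):

```python
# Back-of-the-envelope sizing from the quoted "7-30 MB/sec per vCPU core" range.
# Purely arithmetic; real throughput depends on schema, payload size, and tuning.
SECONDS_PER_DAY = 86_400

def daily_ingest_gb(mb_per_sec_per_core: float, cores: int) -> float:
    """Approximate GB/day a given number of ingestion cores could absorb."""
    return mb_per_sec_per_core * cores * SECONDS_PER_DAY / 1024

for rate in (7, 15, 30):  # low / mid / high end of the quoted range
    print(f"{rate} MB/s per core, 4 cores -> ~{daily_ingest_gb(rate, 4):,.0f} GB/day")

# Approximate output:
# 7 MB/s per core, 4 cores -> ~2,363 GB/day
# 15 MB/s per core, 4 cores -> ~5,063 GB/day
# 30 MB/s per core, 4 cores -> ~10,125 GB/day
```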
So the compression and storage savings come from
a combination of file format, partitioning strategy,
light indexing, and an efficient ingestion pipeline.
4. Technical Setup / Deployment (High-Level)
Below is a high-level outline of how you might set
up OpenObserve for production-style usage with
HA and object storage, drawn from the
documentation:
Components
A minimal HA / production deployment will:
• Use the Helm chart for Kubernetes deployment.
You can configure values such as the storage type
(object storage), the number of replicas for each
component, and resource limits (CPU / memory).
• In values.yaml for Helm, set storage.s3 (or
another provider) with the bucket, region, and credentials.
• Configure partitioning settings for streams:
default time-based partitioning, plus
optionally field-based partitions or hash
buckets.
• Configure cache / memory settings (RAM
cache, disk cache) for query performance,
e.g. ZO_MEMORY_CACHE_MAX_SIZE,
ZO_DISK_CACHE_ENABLED, etc.
• Configure result caching if needed
(ZO_RESULT_CACHE_ENABLED, etc.). A hedged
configuration sketch follows this list.
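As a sketch only, the kind of values override you might feed to the Helm chart could be assembled like this. The key layout below (replica counts, resources, S3 settings) is a guess at a typical chart structure rather than the chart's actual schema, so check it against the official values.yaml before use; the environment variable names are the ones mentioned above.

```python
# Sketch: emit a hypothetical Helm values override for an HA deployment.
# Key names/structure are illustrative assumptions; verify against the
# official OpenObserve Helm chart. Env var names are from the docs above.
import yaml  # pip install pyyaml

values = {
    "ingester": {"replicaCount": 3,
                 "resources": {"requests": {"cpu": "4", "memory": "8Gi"}}},
    "querier":   {"replicaCount": 2},
    "router":    {"replicaCount": 2},
    "compactor": {"replicaCount": 1},
    "storage": {  # object storage is required in HA mode
        "type": "s3",
        "s3": {"bucket": "my-o2-bucket", "region": "us-east-1"},
    },
    "env": {  # cache / result-cache tuning knobs mentioned above
        "ZO_MEMORY_CACHE_MAX_SIZE": "4096",  # check the unit in the docs
        "ZO_DISK_CACHE_ENABLED": "true",
        "ZO_RESULT_CACHE_ENABLED": "true",
    },
}

with open("o2-values-override.yaml", "w") as f:
    yaml.safe_dump(values, f, sort_keys=False)
```

You would then pass the generated file to helm install/upgrade with -f; the exact chart name, repository, and supported keys are in the OpenObserve documentation.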
Scaling / HA Considerations
• Deploy multiple replicas of ingesters to
spread the ingestion load.
• Deploy multiple queriers to distribute query
load.
• Ensure object-storage versioning or lifecycle
policies are configured (for retention).
• Monitor resource usage and adjust CPU /
memory following the sizing guidance (e.g.
ingestion nodes benefit from extra memory;
a RAM-to-vCPU ratio of roughly 2x is suggested
for ingestion nodes).
Example (Helm / Kubernetes)
While I don't include a full example values.yaml
in this summary, you'd follow these steps:
• Port-forward or set up an ingress / load
balancer to expose the UI (port 5080 by
default).
• Configure your ingestion pipeline
(OpenTelemetry Collector, Prometheus
remote_write, log forwarding agents) to send
data to the ingestion endpoints; a minimal HTTP
ingestion sketch follows this list.
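For a quick smoke test before wiring up collectors, you can also push a few JSON records directly over HTTP. The sketch below assumes OpenObserve's JSON ingestion endpoint at /api/{org}/{stream}/_json with basic auth and a port-forwarded API on localhost:5080; the org, stream, credentials, and URL are placeholders to adjust for your deployment.

```python
# Minimal smoke test: POST a couple of JSON log records to OpenObserve.
# Endpoint path, org/stream names, and credentials are assumptions; adjust
# them to match your deployment (the UI's ingestion page shows the real ones).
import requests

OPENOBSERVE_URL = "http://localhost:5080"      # port-forwarded service
ORG, STREAM = "default", "app_logs"            # example org and stream names
AUTH = ("root@example.com", "ChangeMe123")     # example basic-auth credentials

records = [
    {"level": "info",  "service": "checkout", "message": "order placed"},
    {"level": "error", "service": "checkout", "message": "payment timeout"},
]

resp = requests.post(
    f"{OPENOBSERVE_URL}/api/{ORG}/{STREAM}/_json",
    json=records,
    auth=AUTH,
    timeout=10,
)
resp.raise_for_status()
print(resp.json())
```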
Monitoring & Backup
• Monitor internal usage / metrics
(OpenObserve may expose its internal
metrics via usage streams / the meta org,
covering ingestion volumes, node performance,
and error rates).
• Back up your metadata database (Postgres /
MySQL / etc.) if one is used.
• For object storage, ensure you have lifecycle
/ retention / versioning policies (e.g.
transition older logs to cold storage, delete
after the retention period).
5. Limitations & Open Questions
While OpenObserve offers advanced features
and promising performance / cost advantages,
there are some caveats and areas to verify:
• Many figures (e.g. “140x lower storage
cost”) are quoted by the project based on
internal benchmarks; real-world savings will
depend on your data's schema, cardinality,
field counts, and query patterns.
• Full-text search is not as optimized by
default (since the inverted index is opt-in), so
for certain queries (e.g. “find a substring in the
log message across millions of records”)
Elasticsearch's inverted index may still
outperform it unless you carefully configure
inverted-index fields.
• Operational maturity (monitoring of cluster
health, backup and restore of object storage,
NATS cluster scaling) needs to be evaluated
in your environment.
• Query latency under heavy load with many
users / dashboards should be tested in a pilot
phase.