QuestDB: The building blocks of a fast open-source time-series database

supercoco9 · 40 slides · Dec 14, 2023

About This Presentation

(talk delivered at OSA CON 23)

Traditionally, databases have treated timestamps as just another data type. However, when performing real-time analytics, timestamps should be first-class citizens, and we need rich time semantics to get the most out of our data. We also need to deal with ever growing ...


Slide Content

December 12-14, 2023
The Building Blocks of a
Time-Series Database
Javier Ramirez
Database advocate
@supercoco9

Timestamp problems are hard

https://twitter.com/rshidashrf/status/975566803052134400

https://stackoverflow.com/questions/6841333/why-is-subtracting-these-two-epoch-milli-times-in-year-1927-giving-a-strange-r

1553983200
* This is a date. Which one?

When is this?

1/4/19
* Is this January 4th, 2019? April 1st, 2019? Or even April 19th, 2001?
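The ambiguity is easy to demonstrate: the same epoch value lands on different calendar dates depending on the zone offset. A quick Python check (the UTC+2 offset is just an illustrative choice):

```python
from datetime import datetime, timezone, timedelta

ts = 1553983200  # seconds since the Unix epoch -- but which calendar date?

utc = datetime.fromtimestamp(ts, tz=timezone.utc)
plus2 = datetime.fromtimestamp(ts, tz=timezone(timedelta(hours=2)))

print(utc.date())    # 2019-03-30
print(plus2.date())  # 2019-03-31
```

Same instant, two different dates, before time zone rules or DST even enter the picture.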

Working with timestamped data in a
database is tricky*
* especially when analysing data that changes over time or arrives at a high rate

If you can use only one
database for everything, go
with PostgreSQL*
* Or any other major and well supported RDBMS

Some things RDBMSs are not designed for
●Writing data faster than it is read (several million inserts per day and beyond)
●Aggregations scoped to different time units (per year/minute/microsecond)
●Identifying gaps or missing data for a given interval
●Joining tables by approximate timestamp
●Sparse data (tables with hundreds or thousands of columns)
●Aggregates over billions of records

●a factory floor with 500 machines, or
●a fleet with 500 vehicles, or
●50 trains, with 10 cars each, or
●500 users with a mobile phone

Sending data every second
How I made my first billion

86,400
* Seconds in one day

604,800
* Seconds in one week

2,628,288
* Seconds in one month. Well, in the average month of 30.42 days anyway

43,200,000 rows a day…….
302,400,000 rows a week….
1,314,144,000 rows a month

How I made my first billion
* See? On streaming data, it is kind of easy to get your first billion data points
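The slide's arithmetic in a few lines of Python, using the hypothetical 500 devices from above:

```python
DEVICES = 500                       # machines, vehicles, trains, phones...
SECONDS_PER_DAY = 24 * 60 * 60      # 86,400
AVG_DAYS_PER_MONTH = 30.42          # the "average month" from the slide

rows_per_day = DEVICES * SECONDS_PER_DAY                  # 43,200,000
rows_per_week = rows_per_day * 7                          # 302,400,000
rows_per_month = round(rows_per_day * AVG_DAYS_PER_MONTH) # 1,314,144,000

print(rows_per_day, rows_per_week, rows_per_month)
```

One row per device per second is all it takes to cross a billion rows in a month.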

Not all data
problems are
the same

Time-series database basics

●Optimised for fast append-only ingestion
●Data lifecycle policies
●Analytics over chunks of time
●Time-based aggregations
●Often power real-time dashboards

QuestDB would like to be known for:
●Performance
○Also with smaller machines

●Developer Experience

●Proudly Open Source (Apache 2.0)

Fast streaming
ingestion
* You can try ingesting streaming data using https://github.com/javier/questdb-quickstart

QuestDB ingestion and storage layer
●Data always stored by incremental timestamp.

●Data partitioned by time units and stored in columnar format.

●No indexes needed. Data is immediately available after writing.

●Predictable ingestion rate, even under demanding workloads (millions/second).

●Built-in event deduplication.

●Optimized data types (symbol, geohash, IPv4, UUID).

●Row updates and upserts supported.

Lifecycle policies
ALTER TABLE my_table DROP PARTITION LIST '2021-01-01', '2021-01-02';
-- Delete partitions before 2021-01-03
ALTER TABLE my_table DROP PARTITION WHERE timestamp < to_timestamp('2021-01-03', 'yyyy-MM-dd');
ALTER TABLE x DETACH PARTITION LIST '2019-02-01', '2019-02-02';


-- It is also possible to use WHERE clause to define the partition list
ALTER TABLE sensors DETACH PARTITION WHERE timestamp < '2019-02-03T00';
CREATE TABLE my_table (i symbol, ts timestamp) IN VOLUME SECONDARY_VOLUME;

Connectivity, protocols, and interfaces
●REST API and web console: Query execution, CSV imports/exports. Basic charts.

●PGWire: perfect for querying, DDL, and DML. Ingestion supported, up to moderate
throughput. Compatible with any low-level PostgreSQL client or library.

●InfluxDB Line Protocol (ILP): socket-based, ingestion only, very high throughput. Official clients
available for C/C++, Java, Python, Rust, Go, Node.js, and .NET.

●Health/Metrics: HTTP endpoint with Prometheus format

●Integrations with: Apache Kafka, Apache Flink, Apache Spark, Python Pandas, Grafana,
Superset, Telegraf, Redpanda, qStudio, SQLAlchemy, Cube…
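The ILP wire format mentioned above is plain text. A toy formatter (escaping, string quoting, and buffering omitted; in practice you would use one of the official clients) shows its shape:

```python
def ilp_line(table, symbols, columns, ts_nanos):
    """Build one InfluxDB Line Protocol line: table,sym=val col=val ts\\n"""
    syms = ",".join(f"{k}={v}" for k, v in symbols.items())
    cols = ",".join(f"{k}={v}" for k, v in columns.items())
    return f"{table},{syms} {cols} {ts_nanos}\n"

# Hypothetical trade event; timestamp is in nanoseconds since the epoch.
line = ilp_line("trades", {"symbol": "BTC-USD"}, {"price": 39269.98},
                1638202821000000000)
print(line)  # trades,symbol=BTC-USD price=39269.98 1638202821000000000
```

Each line is one row; the server appends it to the table's partitions as it arrives.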

The query engine

QuestDB Query engine internals
●Our Java codebase has zero dependencies. No garbage collection on
the hot path. As close to the hardware as possible.

●We research the latest trends. Our code takes advantage of the
state-of-the-art in CPU, storage design, and data structures.

●We implement our own just-in-time (JIT) compiler to make query execution
as parallel and fast as possible.

●We spend weeks of development to save microseconds or
nanoseconds in many operations.

The query language: SQL with
time-series extensions

LATEST ON … PARTITION BY …
Retrieves the latest entry by timestamp for a given key or combination of keys, for scenarios where multiple
time series are stored in the same table.


SELECT * FROM trades
LATEST ON timestamp PARTITION BY symbol;
Try it live on
https://demo.questdb.io

LATEST ON … PARTITION BY …
Retrieves the latest entry by timestamp for a given key or combination of keys, for scenarios where multiple
time series are stored in the same table.


SELECT * FROM trades
WHERE symbol in ('BTC-USD', 'ETH-USD')
LATEST ON timestamp PARTITION BY symbol, side;
Try it live on
https://demo.questdb.io

SAMPLE BY
Aggregates data in homogeneous time chunks


SELECT
timestamp,
sum(price * amount) / sum(amount) AS vwap_price,
sum(amount) AS volume
FROM trades
WHERE symbol = 'BTC-USD' AND timestamp > dateadd('d', -1, now())
SAMPLE BY 15m ALIGN TO CALENDAR;
SELECT timestamp, min(tempF),
max(tempF), avg(tempF)
FROM weather SAMPLE BY 1M;
Try it live on
https://demo.questdb.io
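What SAMPLE BY computes can be mimicked in a few lines of plain Python; the trades and bucket numbers below are made up for illustration:

```python
from collections import defaultdict

# Hypothetical trades: (epoch_seconds, price, amount)
trades = [(0, 100.0, 2.0), (10, 110.0, 1.0), (900, 120.0, 4.0)]

BUCKET = 900  # 15 minutes; truncating to bucket starts mirrors ALIGN TO CALENDAR
acc = defaultdict(lambda: [0.0, 0.0])  # bucket start -> [sum(price*amount), sum(amount)]
for ts, price, amount in trades:
    b = acc[ts // BUCKET * BUCKET]
    b[0] += price * amount
    b[1] += amount

vwap = {start: pa / a for start, (pa, a) in acc.items()}
print({k: round(v, 2) for k, v in vwap.items()})  # {0: 103.33, 900: 120.0}
```

The database does this per time chunk, in parallel, over the columnar partitions.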

How do you ask your database to
return which data is not stored?

I am sending data every second or
so. Tell me which devices didn’t
send any data with more than 1.5
seconds gap
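In plain Python, over a single device's timestamps, this question amounts to a pairwise scan for oversized gaps (toy data, in seconds):

```python
# Hypothetical arrival times for one device, in seconds.
readings = [0.0, 1.0, 2.1, 5.0, 6.0, 9.2]

# Flag consecutive readings more than 1.5 s apart.
gaps = [(a, b) for a, b in zip(readings, readings[1:]) if b - a > 1.5]
print(gaps)  # [(2.1, 5.0), (6.0, 9.2)]
```

Doing this per device, at scale, inside the database is what the SAMPLE BY extension below is for.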

SAMPLE BY … FILL
Can fill missing time chunks using different strategies (NULL, a constant, LINEAR, or PREV for the previous value)


SELECT
timestamp,
sum(price * amount) / sum(amount) AS vwap_price,
sum(amount) AS volume
FROM trades
WHERE symbol = 'BTC-USD' AND timestamp > dateadd('d', -1, now())
SAMPLE BY 1s FILL(NULL) ALIGN TO CALENDAR;
Try it live on
https://demo.questdb.io

WHERE … TIME RANGE
SELECT * from trips WHERE pickup_datetime in '2018';
SELECT * from trips WHERE pickup_datetime in '2018-06';
SELECT * from trips WHERE pickup_datetime in '2018-06-21T23:59';






Try it live on
https://demo.questdb.io

WHERE … TIME RANGE
SELECT * from trips WHERE pickup_datetime in '2018';
SELECT * from trips WHERE pickup_datetime in '2018-06';
SELECT * from trips WHERE pickup_datetime in '2018-06-21T23:59';



SELECT * from trips WHERE pickup_datetime in '2018;2M' LIMIT -10;
SELECT * from trips WHERE pickup_datetime in '2018;10s' LIMIT -10;
SELECT * from trips WHERE pickup_datetime in '2018;-3d' LIMIT -10;








Try it live on
https://demo.questdb.io

WHERE … TIME RANGE
SELECT * from trips WHERE pickup_datetime in '2018';
SELECT * from trips WHERE pickup_datetime in '2018-06';
SELECT * from trips WHERE pickup_datetime in '2018-06-21T23:59';



SELECT * from trips WHERE pickup_datetime in '2018;2M' LIMIT -10;
SELECT * from trips WHERE pickup_datetime in '2018;10s' LIMIT -10;
SELECT * from trips WHERE pickup_datetime in '2018;-3d' LIMIT -10;


SELECT * from trips WHERE pickup_datetime in '2018-06-21T23:59:58;4s;1d;7';
SELECT * from trips WHERE pickup_datetime in '2018-06-21T23:59:58;4s;-1d;7';







Try it live on
https://demo.questdb.io
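The semantics of the partial-timestamp shorthand can be sketched with a hypothetical helper that expands a 'YYYY-MM' prefix into a half-open interval; the real engine of course compares full timestamps against partition data, not (year, month) tuples:

```python
def month_interval(prefix: str):
    """Expand a 'YYYY-MM' prefix into a half-open [(y, m), next month) interval.

    Toy illustration of what `WHERE ts IN '2018-06'` conceptually matches.
    """
    y, m = map(int, prefix.split("-"))
    end = (y + 1, 1) if m == 12 else (y, m + 1)
    return (y, m), end

print(month_interval("2018-06"))  # ((2018, 6), (2018, 7))
print(month_interval("2018-12"))  # ((2018, 12), (2019, 1))
```

Because data is partitioned by time, such an interval maps directly onto a contiguous range of partitions.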

What if I have two tables, where
data is (obviously) not sent at the
same exact timestamps and I want
to join by closest matching
timestamp?

ASOF JOIN (LT JOIN and SPLICE JOIN variations)
ASOF JOIN joins two different time-series. For each row in the first time-series, ASOF JOIN takes from
the second time-series the row whose timestamp meets both of the following criteria:
●The timestamp is the closest to the first timestamp.
●The timestamp is prior to or equal to the first timestamp.



WITH trips2016 AS (
SELECT * from trips WHERE pickup_datetime in '2016'
)
SELECT pickup_datetime, timestamp, fare_amount, tempF, windDir
FROM trips2016
ASOF JOIN weather;
Try it live on
https://demo.questdb.io
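Those two criteria can be sketched in Python as a binary search over sorted timestamps (toy integers, not the demo data):

```python
import bisect

def asof_pick(left_ts, right_ts):
    """For each left timestamp, pick the closest right timestamp <= it (else None).

    right_ts must be sorted ascending, which time-series storage guarantees.
    """
    out = []
    for t in left_ts:
        i = bisect.bisect_right(right_ts, t) - 1
        out.append(right_ts[i] if i >= 0 else None)
    return out

print(asof_pick([5, 12, 20], [1, 10, 15, 30]))  # [1, 10, 15]
```

With both tables already ordered by timestamp, the engine can do this with a single merge-style pass instead of per-row searches.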

Some things we are trying out next for performance
●Compression, and exploring data formats like Arrow/Parquet

●Own ingestion protocol

●Second-level partitioning

●Improved vectorization of some operations (group by multiple columns or by expressions)

●Specific join optimizations (index nested-loop joins, for example)

QuestDB OSS
Open Source. Self-managed. Suitable for
production workloads.
https://github.com/questdb/questdb


QuestDB Enterprise
Licensed. Self-managed. Enterprise features like
RBAC, compression, replication, TLS on all
protocols, cold storage, K8s operator…
https://questdb.io/enterprise/

QuestDB Cloud
Fully managed, pay-per-usage environment,
with enterprise-grade features.
https://questdb.io/cloud/

OSA CON | December 12-14 2023
Q&A
●github.com/questdb/questdb
●https://questdb.io
●https://demo.questdb.io
●https://github.com/javier/questdb-quickstart
●https://slack.questdb.io/
Javier Ramirez
@supercoco9
We love contributions
and GitHub ⭐ stars