Where Does Your Data Go When You Put In Into A Database_.pdf

davidmstokes 0 views 47 slides Oct 14, 2025
Slide 1
Slide 1 of 47
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9
Slide 10
10
Slide 11
11
Slide 12
12
Slide 13
13
Slide 14
14
Slide 15
15
Slide 16
16
Slide 17
17
Slide 18
18
Slide 19
19
Slide 20
20
Slide 21
21
Slide 22
22
Slide 23
23
Slide 24
24
Slide 25
25
Slide 26
26
Slide 27
27
Slide 28
28
Slide 29
29
Slide 30
30
Slide 31
31
Slide 32
32
Slide 33
33
Slide 34
34
Slide 35
35
Slide 36
36
Slide 37
37
Slide 38
38
Slide 39
39
Slide 40
40
Slide 41
41
Slide 42
42
Slide 43
43
Slide 44
44
Slide 45
45
Slide 46
46
Slide 47
47

About This Presentation

Do you ever think about what a PostgreSQL database server does with your data when you enter it?
This presentation takes you through the steps that happen to your data when you insert it.
You will see how PostgreSQL decides where to place your table, how it organizes a block of data, and how to tr...


Slide Content

1PAGE
Where Does
Your Data Go
When You Put
In Into A
Database?
David Stokes
DBeaver

Slide deck @ https://www.slideshare.net/davidmstokes
1

2PAGE
Description
Do you ever think about what a PostgreSQL database server does with your data when you
enter it?
This presentation takes you through the steps that happen to your data when you insert it.
You will see how PostgreSQL decides where to place your table, how it organizes a block of
data, and how to trace everything as the server does it’s job.
The mechanisms of how PostgreSQL stores information are explains on a high-level and
medium level, not mind numbing pages of C code.
And you will see how PostgreSQL tracks data as rows are updated. This is a general
introduction to PostgreSQL internals and you will actually learn some party tricks that you
can use to show off to impress others!
2

3PAGE
3
A lot of us are spoiled by modern technology
GUIs are wonderful, but they obscure many things from your view.

BTW the original query here is in English ‘What are the top five cities for revenue?’

4PAGE
4
But do you wonder what happens behind the scenes?

5PAGE
What this session is about
What happens when you create a database, a table, and enter data?
This is a gentle introduction to some of the PostgreSQL internals.
No previous experience needed
What the underlying ‘scaffolding’ looks likes
And, maybe, a brief introduction to transactions
5

6PAGE
6
Create a database, connect to it
$ createdb ato
Password:
$ psql ato
Password for user stoker:
psql (17.6 (Ubuntu 17.6-1.pgdg24.04+1))
Type "help" for help.

ato=#



Here we create a PostgreSQL database named ‘ato’

Then we connect to that database

7PAGE
7
Where does the server store the data?
ato=# show data_directory;
data_directory
-----------------------------
/var/lib/postgresql/17/main
(1 row)

ato=#



The data directory is where the database server ‘holds’
what it needs to store you data and its data.

This example is for a PostgreSQL Version 17 server.

8PAGE Let’s ask the catalog of databases where the newly created ‘ato’ is located
8
ato=# select oid, datname from pg_database;
oid | datname
-------+-----------
5 | postgres
16389 | demo
1 | template1
4 | template0
16511 | imdb
17092 | sakila
17178 | dvdrental
24689 | test
28000 | ato (remember this number!)
(9 rows)


Object Identifier, or oid

Every new object gets an oid

9PAGE
9
Where does the server store the data?
ato=# create table ex01 (a int);
CREATE TABLE

ato=# select attrelid, attname, atttypid
from pg_attribute where attrelid = 'ex01'::regclass;
attrelid | attname | atttypid
----------+----------+----------
28004 | tableoid | 26
28004 | cmax | 29
28004 | xmax | 28
28004 | cmin | 29
28004 | xmin | 28
28004 | ctid | 27
28004 | a | 23
(7 rows)

ato=#




‘Behind the scenes’ a lot of work
goes into setting up the metadata
for a table, even before we try to
enter data.

This ‘alphabet soup’ can be
confusing.
Remember 28004 too!

10PAGE
10
Where does the server store the data?
ato=# select pg_relation_filepath('ex01');
pg_relation_filepath
----------------------
base/28000/28004
(1 row)

ato=#




root@ThinkPad:/var/lib/postgresql/17/main# ls -la base/28000/28004
-rw------- 1 postgres postgres 0 Sep 3 15:13 base/28000/28004
root@ThinkPad:/var/lib/postgresql/17/main#


Size = 0 (empty)
ato=# select pg_relation_size('ex01');
pg_relation_size
------------------
0


Our new table is a file at
/var/lib/postgresql/17/main/base/28000/28004

11PAGE
11
Store data!
ato=# insert into ex01 (a) values (1),(3),(5),(7),(9),(11);
INSERT 0 6
ato=#






root@ThinkPad:/var/lib/postgresql/17/main# ls -la base/28000/28004
-rw------- 1 postgres postgres 8192 Sep 3 15:32 base/28000/28004
root@ThinkPad:/var/lib/postgresql/17/main#




Size - 8K
ato=# select pg_relation_size('ex01');
pg_relation_size
------------------
8192
(1 row)


Default is 8K blocks

12PAGE
Now you know
the where …
12
On to the ‘how’ the data is stored

13PAGE Maybe an low level dump will help visualize?
13
# od -x 28004
0000000 0006 0000 1390 2076 0000 0000 0030 1f40
0000020 2000 2004 0000 0000 9fe0 0038 9fc0 0038
0000040 9fa0 0038 9f80 0038 9f60 0038 9f40 0038
0000060 0000 0000 0000 0000 0000 0000 0000 0000
*
0017500 0516 0000 0000 0000 0000 0000 0000 0000
0017520 0006 0001 0800 0018 000 b 0000 0000 0000
0017540 0516 0000 0000 0000 0000 0000 0000 0000
0017560 0005 0001 0800 0018 000 9 0000 0000 0000
0017600 0516 0000 0000 0000 0000 0000 0000 0000
0017620 0004 0001 0800 0018 000 7 0000 0000 0000
0017640 0516 0000 0000 0000 0000 0000 0000 0000
0017660 0003 0001 0800 0018 000 5 0000 0000 0000
0017700 0516 0000 0000 0000 0000 0000 0000 0000
0017720 0002 0001 0800 0018 000 3 0000 0000 0000
0017740 0516 0000 0000 0000 0000 0000 0000 0000
0017760 0001 0001 0800 0018 000 1 0000 0000 0000
0020000


A hex dump shows that
the rows are layered from
the bottom up.

The top four lines are
header information.

14PAGE
14
We need the pageinspect extension to proceed further
ato=# CREATE EXTENSION pageinspect;
CREATE EXTENSION






The pageinspect extension provides low-level functions to inspect the
raw contents of database pages, such as tables and indexes.

15PAGE An Intro to CTID and rows
15
ato=# SELECT t_ctid,
tuple_data_split('ex01'::regclass, t_data, t_infomask, t_infomask2, t_bits) AS
raw_row_data
FROM heap_page_items(get_raw_page('ex01',0));
t_ctid | raw_row_data
--------+-----------------
(0,1) | {"\\x01000000"}
(0,2) | {"\\x03000000"}
(0,3) | {"\\x05000000"}
(0,4) | {"\\x07000000"}
(0,5) | {"\\x09000000"}
(0,6) | {"\\x0b000000"}
(6 rows)



Input data: Insert into ex01 (a)
values (1),(3),(5),(7),(9),(11);

1\\x01
3\\x03
5\\x05
7\\x07
9\\x09
11\\x0b 11 in hexidecimal (base 16) is ‘b’

16PAGE Let me introduce to you the ‘Lego Model’ of data storage
16
ato=# SELECT t_ctid,
tuple_data_split('ex01'::regclass, t_data, t_infomask, t_infomask2, t_bits)
AS raw_row_data
FROM heap_page_items(get_raw_page('ex01',0));
t_ctid | raw_row_data
--------+-----------------
(0,1) | {"\\x01000000"}
(0,2) | {"\\x03000000"}
(0,3) | {"\\x05000000"}
(0,4) | {"\\x07000000"}
(0,5) | {"\\x09000000"}
(0,6) | {"\\x0b000000"}
(6 rows)

ato=#


Consider the CTID as ‘Lego blocks’ (more on that later)

17PAGE A Quick Look At One Row
17
0516 0000 0000 0000 0000 0000 0000 0000
0001 0001 0800 0018 0001 0000 0000 0000

ato=# SELECT * FROM heap_page_items(get_raw_page('ex01',0)) where lp = 1;
lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid | t_data
----+--------+----------+--------+--------+--------+----------+--------+-------------+------------+--------+--------+-------+------------
1 | 8160 | 1 | 28 | 1302 | 0 | 0 | (0,1) | 1 | 2304 | 24 | | | \x01000000
(1 row)


Not exactly obvious what all this stuff is!

18PAGE
The Heap
18
In PostgreSQL, the "heap"
refers to the primary
storage area for table
data. It is where the actual
rows (tuples) of a table are
stored on disk.

19PAGE At table creation the Heap is pretty empty, just before first data arrives!
19
Page Header 24 bits
FREE SPACE
FREE SPACE
FREE SPACE
FREE SPACE
FREE SPACE
FREE SPACE
FREE SPACE
Special

20PAGE The Header has information on the block of data
20
root@ThinkPad:/var/lib/postgresql/17/main/base/28000# od -N 24 -c 28004
0000000 006 \0 \0 \0 p 274 200 \0 \0 \0 \0 4 \0 037
0000020 \0 004 035 005 \0 \0
0000030


ato=# select * from page_header(get_raw_page('ex01',0));
lsn | checksum | flags | lower | upper | special | pagesize | version | prune_xid
------------+----------+-------+-------+-------+---------+----------+---------+-----------
6/2080BC70 | 0 | 0 | 52 | 8000 | 8192 | 8192 | 4 | 1302
(1 row)


We can ignore most of now except for where
it tells us where that data begins and ends!

21PAGE At table creation the Heap is pretty empty, just before first data arrives!
21
Page Header 24 bits
FREE SPACE
FREE SPACE
FREE SPACE
FREE SPACE
FREE SPACE
FREE SPACE
FREE SPACE
Special

22PAGE First record needs pointer to data
22
Page Header 24 bits

FREE SPACE
FREE SPACE
FREE SPACE
FREE SPACE
FREE SPACE
x01
Special
(0,1)

t_ctid | raw_row_data
--------+-----------------
(0,1) | {"\\x01000000"}

23PAGE Second record stacked on the first
23
Page Header 24 bits

FREE SPACE
FREE SPACE
FREE SPACE
FREE SPACE
x03
x01
Special
(0,1) (0,2)

t_ctid | raw_row_data
--------+-----------------
(0,1) | {"\\x01000000"}
(0,2) | {"\\x03000000"}

24PAGE Third record stacked on the second
24
Page Header 24 bits

FREE SPACE
FREE SPACE
FREE SPACE
FREE SPACE
x03
x01
Special
(0,1) (0,2) (0,3)
t_ctid | raw_row_data
--------+-----------------
(0,1) | {"\\x01000000"}
(0,2) | {"\\x03000000"}
(0,3) | {"\\x05000000"}


And so on and so forth, until free space within
the block runs out.

And then a new 8K block is created

25PAGE Tuple Identifiers
25
Page Header 24 bits

REMAINING FREE SPACE
x0b
x09
x07
x05
x03
x01
Special
(0,1) (0,2) (0,3) (0,4) (0,5) (0,6)

t_ctid | raw_row_data
--------+-----------------
(0,1) | {"\\x01000000"}
(0,2) | {"\\x03000000"}
(0,3) | {"\\x05000000"}
(0,4) | {"\\x07000000"}
(0,5) | {"\\x09000000"}
(0,6) | {"\\x0b000000"}






Tuple index
Page number

26PAGE The page header describes the heap and holds metadata
26
Page Header 24 bits

REMAINING FREE SPACE
x0b
x09
x07
x05
x03
x01
Special
(0,1) (0,2) (0,3) (0,4) (0,5) (0,6)
The Page Header is fixed 24-byte section at the beginning of the page
containing metadata, including:
○The latest Write-Ahead Log (WAL) entry.
○Offsets to the start and end of free space (pd_lower and
pd_upper) plus the start of special space (pd_special).
○Flag bits and a checksum.

27PAGE The header also hold line pointers that link to the CTIDs
27
Page Header 24 bits

REMAINING FREE SPACE
x0b
x09
x07
x05
x03
x01
Special
(0,1) (0,2) (0,3) (0,4) (0,5) (0,6)
(Line Pointers): An array of 4-byte entries that grows from the top of the
page towards the bottom.

Each item ID points to the location and length of a specific tuple (row
version) within the page.

Because these pointers never move, the ctid (a tuple's physical
location) can reliably reference a specific row version, even if the tuple
itself is moved within the page to compact free space.

t_ctid | raw_row_data
--------+-----------------
(0,1) | {"\\x01000000"}
(0,2) | {"\\x03000000"}
(0,3) | {"\\x05000000"}
(0,4) | {"\\x07000000"}
(0,5) | {"\\x09000000"}
(0,6) | {"\\x0b000000"}

28PAGE What does the row look like?
28
ato=# SELECT lp, lp_off, lp_len, t_ctid, t_data FROM
heap_page_items(get_raw_page('ex01', 0));
lp | lp_off | lp_len | t_ctid | t_data
----+--------+--------+--------+------------
1 | 8160 | 28 | (0,1) | \x01000000
2 | 8128 | 28 | (0,2) | \x03000000
3 | 8096 | 28 | (0,3) | \x05000000
4 | 8064 | 28 | (0,4) | \x07000000
5 | 8032 | 28 | (0,5) | \x09000000
6 | 8000 | 28 | (0,6) | \x0b000000
(6 rows)


●Lp_len is the length of the tuple (tuple size in bytes, including header and attributes)
●Lp_off is the byte offset of a row from beginning of page, count backwards from 8K

29PAGE
29
The ‘Lego’ blocks again
ato=# SELECT lp, lp_off, lp_len, t_ctid, t_data from
heap_page_items(get_raw_page('ex01', 0));
lp | lp_off | lp_len | t_ctid | t_data
----+--------+--------+--------+------------
1 | 8160 | 28 | (0,1) | \x01000000
2 | 8128 | 28 | (0,2) | \x03000000
3 | 8096 | 28 | (0,3) | \x05000000
4 | 8064 | 28 | (0,4) | \x07000000
5 | 8032 | 28 | (0,5) | \x09000000
6 | 8000 | 28 | (0,6) | \x0b 000000
(6 rows)

ato=#








The t_data is the integer entered earlier: 1,3,5,7,9,11

An INTEGER is 32 bits of data

8192 (8x block size) - 32 is 8160
8160 - 32 is 8128
8128 - 32 is 8096

So the first lego is 32 bytes long and starts at position
8160

The second lego starts at 8128, etc.

30PAGE
Now you know the
where & how ..
30
On to the ‘complicated stuff’

31PAGE How full is our heap?
31
ato=# create extension pgstattuple;
CREATE EXTENSION

ato=# \x
Expanded display is on.
ato=# select * from pgstattuple('ex01');
-[ RECORD 1 ]------+------
table_len | 8192
tuple_count | 6 – this is the number of rows entered
tuple_len | 168 - length of all the data
tuple_percent | 2.05 - Percentage of live tuples
dead_tuple_count | 0 - Explained later
dead_tuple_len | 0
dead_tuple_percent | 0
free_space | 7948 - How much space we have left
free_percent | 97.02

ato=#

32PAGE
60 seconds on
transaction
32

33PAGE MVCC is a Multi Version Concurrency Control . . .
33
Basically "A single rows can have multiple versions."
ato=# select lp, lp_off, t_ctid, t_data from
heap_page_items(get_raw_page('ex01',0));
lp | lp_off | t_ctid | t_data
----+--------+--------+------------
1 | 8160 | (0,1) | \x01000000
2 | 8128 | (0,2) | \x03000000
3 | 8096 | (0,3) | \x05000000
4 | 8064 | (0,4) | \x07000000
5 | 8032 | (0,5) | \x09000000
6 | 8000 | (0,6) | \x0b000000
(6 rows)


ato=# start transaction;
START TRANSACTION
ato=*# update ex01 set a = 2 where a = 1;
UPDATE 1
ato=*# select lp, lp_off, t_ctid, t_data from
heap_page_items(get_raw_page('ex01',0));
lp | lp_off | t_ctid | t_data
----+--------+--------+------------
1 | 8160 | (0,7) | \x01000000
2 | 8128 | (0,2) | \x03000000
3 | 8096 | (0,3) | \x05000000
4 | 8064 | (0,4) | \x07000000
5 | 8032 | (0,5) | \x09000000
6 | 8000 | (0,6) | \x0b000000
7 | 7968 | (0,7) | \x02000000
(7 rows)


Updating a row inside a transaction
does not REPLACE it!

34PAGE Let’s start a transaction and update the first row
34
ato=# start transaction;
START TRANSACTION
ato=*# update ex01 set a = 2 where a = 1;
UPDATE 1
ato=*# select lp, lp_off, t_ctid, t_data from
heap_page_items(get_raw_page('ex01',0));
lp | lp_off | t_ctid | t_data
----+--------+--------+------------
1 | 8160 | (0,7) | \x01000000
2 | 8128 | (0,2) | \x03000000
3 | 8096 | (0,3) | \x05000000
4 | 8064 | (0,4) | \x07000000
5 | 8032 | (0,5) | \x09000000
6 | 8000 | (0,6) | \x0b000000
7 | 7968 | (0,7) | \x02000000
(7 rows)


ato=*# select a from ex01;
a
----
3
5
7
9
11
2
(6 rows)


We do not see ‘1’ any more
from the command line.

But it is still in the heap
and marked as ‘dead’*.

35PAGE Now we get a whole bunch of new ‘system’ columns
35
ato=*# select a, ctid, xmin, xmax, cmin, cmax from ex01;
a | ctid | xmin | xmax | cmin | cmax
----+-------+------+------+------+------
3 | (0,2) | 1302 | 0 | 0 | 0
5 | (0,3) | 1302 | 0 | 0 | 0
7 | (0,4) | 1302 | 0 | 0 | 0
9 | (0,5) | 1302 | 0 | 0 | 0
11 | (0,6) | 1302 | 0 | 0 | 0
2 | (0,7) | 1309 | 0 | 0 | 0
(6 rows)


ato=*# select txid_current();
txid_current
--------------
1309
(1 row)


Xmin - the transaction id of the inserting transaction.
Note: the other rows were inserted on xmin 1302

36PAGE ROLLBACK - do not complete the transaction
36
ato=*# rollback;
ROLLBACK
ato=# select * from ex01;
a
----
1
3
5
7
9
11
(6 rows)




ato=# select lp, lp_off, t_ctid, t_data from
heap_page_items(get_raw_page('ex01',0));
lp | lp_off | t_ctid | t_data
----+--------+--------+------------
1 | 8160 | (0,7) | \x01000000
2 | 8128 | (0,2) | \x03000000
3 | 8096 | (0,3) | \x05000000
4 | 8064 | (0,4) | \x07000000
5 | 8032 | (0,5) | \x09000000
6 | 8000 | (0,6) | \x0b000000
7 | 7968 | (0,7) | \x02000000
(7 rows)
ato=# select txid_current();
txid_current
--------------
1310
(1 row)

37PAGE Transactions are great
37
Transactions are amazingly useful

But I can spend another hour on them

And I do not have that time

The good news is that transactions are very well documented

Look for xmin and xmax

And be warned, it gets DEEP quickly :-)

38PAGE
38
Show of hands? Do you feel like this?

39PAGE
Other Items
39
That you need to think about

40PAGE What happens when the 8K is filled?
40

When a PostgreSQL heap table needs more space, it allocates a new data page rather than
expanding the existing one. This process is managed automatically and is a fundamental aspect of
PostgreSQL's Multi-Version Concurrency Control (MVCC) architecture.
Here is a breakdown of what happens when a new data page is required:
●Page-based storage: PostgreSQL stores table data (the heap) in fixed-size blocks called
pages, which are typically 8 KB. New rows (tuples) are added to a page with available free
space.
●New page allocation: If an INSERT command tries to add a row to a table and all existing data
pages are full, PostgreSQL allocates and appends a new, empty 8 KB page to the end of the
table's data file. The new tuple is then written to this page.

41PAGE What happens if the data is bigger than 8k
41

PostgreSQL determines if TOAST (The Oversized-Attribute Storage Technique)
is needed when the total size of a row exceeds a certain threshold, typically 2 KB.
Since PostgreSQL has a fixed page size, usually 8 KB, it cannot store a single row
that is too large. TOAST handles this by compressing and/or moving large field
values to a separate storage area.
The decision-making process is a multi-step, automatic, and transparent process
that follows a specific logic.

42PAGE What happens if the data is bigger than 8k
42
The TOAST process
1.Triggering the process: The TOAST mechanism is triggered when a tuple (row) exceeds the
TOAST_TUPLE_THRESHOLD , which defaults to around 2 KB. This threshold ensures that multiple rows
can fit on a single 8 KB page, which is more efficient for the database.
2.Compression: Once the threshold is crossed, PostgreSQL first attempts to compress the largest
"TOAST-able" columns, such as text, jsonb, or bytea.
●If the compression is successful and reduces the row size below the threshold, the compressed
data is stored directly in the main table.
●Compression is only attempted if it is expected to provide a significant space saving.
3.Out-of-line storage: If compression is not enough or doesn't provide enough size reduction, PostgreSQL
moves the data for the large column(s) to a separate, associated TOAST table.
●A small 18-byte pointer is then stored in the main table in place of the original large data.
●The data is broken into chunks and stored as separate rows in the TOAST table, which has its own
unique index for fast retrieval.

43PAGE When Does Your Data Get ‘Written’?
43
When you commit a transaction, the sequence to write it to disk is

1.Write to the Write Ahead Log
2.Force WAL to disk using fsync
3.Return a successful commit to you application
4.Write the data pages to disk in the background

The WAL will have enough information to reconstruct any committed
transaction even if the data pages have not been written.

●Several transactions can be group together in the WAL
●Sequential WAL writes, not random data page writes
●You are dependent on the WAL but this provides predictable write times
based on WAL write speed, nit more complex page writes

44PAGE
Questions and

Hopefully answers!

Slide deck @ https://www.slideshare.net/davidmstokes
44

45PAGE
DBeaver PRO AI Capabilities Out of the Box
Supported AI providers:
oOpenAI
oAzure OpenAI
oGoogle Gemini
oOllama
oGitHub Copilot
DBeaver Lite
DBeaver Enterprise DBeaver Ultimate
● AI chat panel
● AI Query Explanation
● Explain and Fix SQL
Code
● Smart Metadata
Descriptions
● AI Query Suggestions
● AI Command
● AI chat panel
● AI Query Explanation
● Explain and Fix SQL
Code
● Smart Metadata
Descriptions
● AI Query Suggestions
● AI Command

● AI chat panel
● AI Query Explanation
● Explain and Fix SQL Code
● Smart Metadata
Descriptions
● AI Query Suggestions
● AI Command
CloudBeaver Enterprise
● AI Command
● AI Assistant

Team Edition
● AI chat panel
● AI Query Explanation
● Explain and Fix SQL Code
● Smart Metadata Descriptions
● AI Query Suggestions
● AI Command
● AI Assistant
DBeaver Lite
Two week free evals
45

46PAGE
46
Resources
https://duongludien.wordpress.com/2024/04/21/structure-of-heap-table-in-postgres/

https://dev.to/mongodb/embedding-into-jsonb-still-feels-like-a-join-for-large-documents-3nd0

https://postgrespro.com/community/books/internals

47PAGE


[email protected]
DBeaver.com
47
Thank you for
attending!!