Where Does Your Data Go When You Put In Into A Database_.pdf
davidmstokes
0 views
47 slides
Oct 14, 2025
Slide 1 of 47
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
About This Presentation
Do you ever think about what a PostgreSQL database server does with your data when you enter it?
This presentation takes you through the steps that happen to your data when you insert it.
You will see how PostgreSQL decides where to place your table, how it organizes a block of data, and how to tr...
Do you ever think about what a PostgreSQL database server does with your data when you enter it?
This presentation takes you through the steps that happen to your data when you insert it.
You will see how PostgreSQL decides where to place your table, how it organizes a block of data, and how to trace everything as the server does it’s job.
The mechanisms of how PostgreSQL stores information are explains on a high-level and medium level, not mind numbing pages of C code.
And you will see how PostgreSQL tracks data as rows are updated. This is a general introduction to PostgreSQL internals and you will actually learn some party tricks that you can use to show off to impress others!
Size: 1.75 MB
Language: en
Added: Oct 14, 2025
Slides: 47 pages
Slide Content
1PAGE
Where Does
Your Data Go
When You Put
In Into A
Database?
David Stokes
DBeaver
2PAGE
Description
Do you ever think about what a PostgreSQL database server does with your data when you
enter it?
This presentation takes you through the steps that happen to your data when you insert it.
You will see how PostgreSQL decides where to place your table, how it organizes a block of
data, and how to trace everything as the server does it’s job.
The mechanisms of how PostgreSQL stores information are explains on a high-level and
medium level, not mind numbing pages of C code.
And you will see how PostgreSQL tracks data as rows are updated. This is a general
introduction to PostgreSQL internals and you will actually learn some party tricks that you
can use to show off to impress others!
2
3PAGE
3
A lot of us are spoiled by modern technology
GUIs are wonderful, but they obscure many things from your view.
BTW the original query here is in English ‘What are the top five cities for revenue?’
4PAGE
4
But do you wonder what happens behind the scenes?
5PAGE
What this session is about
What happens when you create a database, a table, and enter data?
This is a gentle introduction to some of the PostgreSQL internals.
No previous experience needed
What the underlying ‘scaffolding’ looks likes
And, maybe, a brief introduction to transactions
5
6PAGE
6
Create a database, connect to it
$ createdb ato
Password:
$ psql ato
Password for user stoker:
psql (17.6 (Ubuntu 17.6-1.pgdg24.04+1))
Type "help" for help.
ato=#
Here we create a PostgreSQL database named ‘ato’
Then we connect to that database
7PAGE
7
Where does the server store the data?
ato=# show data_directory;
data_directory
-----------------------------
/var/lib/postgresql/17/main
(1 row)
ato=#
The data directory is where the database server ‘holds’
what it needs to store you data and its data.
This example is for a PostgreSQL Version 17 server.
8PAGE Let’s ask the catalog of databases where the newly created ‘ato’ is located
8
ato=# select oid, datname from pg_database;
oid | datname
-------+-----------
5 | postgres
16389 | demo
1 | template1
4 | template0
16511 | imdb
17092 | sakila
17178 | dvdrental
24689 | test
28000 | ato (remember this number!)
(9 rows)
Object Identifier, or oid
Every new object gets an oid
9PAGE
9
Where does the server store the data?
ato=# create table ex01 (a int);
CREATE TABLE
‘Behind the scenes’ a lot of work
goes into setting up the metadata
for a table, even before we try to
enter data.
This ‘alphabet soup’ can be
confusing.
Remember 28004 too!
10PAGE
10
Where does the server store the data?
ato=# select pg_relation_filepath('ex01');
pg_relation_filepath
----------------------
base/28000/28004
(1 row)
18PAGE
The Heap
18
In PostgreSQL, the "heap"
refers to the primary
storage area for table
data. It is where the actual
rows (tuples) of a table are
stored on disk.
19PAGE At table creation the Heap is pretty empty, just before first data arrives!
19
Page Header 24 bits
FREE SPACE
FREE SPACE
FREE SPACE
FREE SPACE
FREE SPACE
FREE SPACE
FREE SPACE
Special
20PAGE The Header has information on the block of data
20
root@ThinkPad:/var/lib/postgresql/17/main/base/28000# od -N 24 -c 28004
0000000 006 \0 \0 \0 p 274 200 \0 \0 \0 \0 4 \0 037
0000020 \0 004 035 005 \0 \0
0000030
We can ignore most of now except for where
it tells us where that data begins and ends!
21PAGE At table creation the Heap is pretty empty, just before first data arrives!
21
Page Header 24 bits
FREE SPACE
FREE SPACE
FREE SPACE
FREE SPACE
FREE SPACE
FREE SPACE
FREE SPACE
Special
22PAGE First record needs pointer to data
22
Page Header 24 bits
FREE SPACE
FREE SPACE
FREE SPACE
FREE SPACE
FREE SPACE
x01
Special
(0,1)
26PAGE The page header describes the heap and holds metadata
26
Page Header 24 bits
REMAINING FREE SPACE
x0b
x09
x07
x05
x03
x01
Special
(0,1) (0,2) (0,3) (0,4) (0,5) (0,6)
The Page Header is fixed 24-byte section at the beginning of the page
containing metadata, including:
○The latest Write-Ahead Log (WAL) entry.
○Offsets to the start and end of free space (pd_lower and
pd_upper) plus the start of special space (pd_special).
○Flag bits and a checksum.
27PAGE The header also hold line pointers that link to the CTIDs
27
Page Header 24 bits
REMAINING FREE SPACE
x0b
x09
x07
x05
x03
x01
Special
(0,1) (0,2) (0,3) (0,4) (0,5) (0,6)
(Line Pointers): An array of 4-byte entries that grows from the top of the
page towards the bottom.
Each item ID points to the location and length of a specific tuple (row
version) within the page.
Because these pointers never move, the ctid (a tuple's physical
location) can reliably reference a specific row version, even if the tuple
itself is moved within the page to compact free space.
●Lp_len is the length of the tuple (tuple size in bytes, including header and attributes)
●Lp_off is the byte offset of a row from beginning of page, count backwards from 8K
The t_data is the integer entered earlier: 1,3,5,7,9,11
An INTEGER is 32 bits of data
8192 (8x block size) - 32 is 8160
8160 - 32 is 8128
8128 - 32 is 8096
So the first lego is 32 bytes long and starts at position
8160
The second lego starts at 8128, etc.
30PAGE
Now you know the
where & how ..
30
On to the ‘complicated stuff’
31PAGE How full is our heap?
31
ato=# create extension pgstattuple;
CREATE EXTENSION
ato=# \x
Expanded display is on.
ato=# select * from pgstattuple('ex01');
-[ RECORD 1 ]------+------
table_len | 8192
tuple_count | 6 – this is the number of rows entered
tuple_len | 168 - length of all the data
tuple_percent | 2.05 - Percentage of live tuples
dead_tuple_count | 0 - Explained later
dead_tuple_len | 0
dead_tuple_percent | 0
free_space | 7948 - How much space we have left
free_percent | 97.02
ato=#
32PAGE
60 seconds on
transaction
32
33PAGE MVCC is a Multi Version Concurrency Control . . .
33
Basically "A single rows can have multiple versions."
ato=# select lp, lp_off, t_ctid, t_data from
heap_page_items(get_raw_page('ex01',0));
lp | lp_off | t_ctid | t_data
----+--------+--------+------------
1 | 8160 | (0,1) | \x01000000
2 | 8128 | (0,2) | \x03000000
3 | 8096 | (0,3) | \x05000000
4 | 8064 | (0,4) | \x07000000
5 | 8032 | (0,5) | \x09000000
6 | 8000 | (0,6) | \x0b000000
(6 rows)
37PAGE Transactions are great
37
Transactions are amazingly useful
But I can spend another hour on them
And I do not have that time
The good news is that transactions are very well documented
Look for xmin and xmax
And be warned, it gets DEEP quickly :-)
38PAGE
38
Show of hands? Do you feel like this?
39PAGE
Other Items
39
That you need to think about
40PAGE What happens when the 8K is filled?
40
When a PostgreSQL heap table needs more space, it allocates a new data page rather than
expanding the existing one. This process is managed automatically and is a fundamental aspect of
PostgreSQL's Multi-Version Concurrency Control (MVCC) architecture.
Here is a breakdown of what happens when a new data page is required:
●Page-based storage: PostgreSQL stores table data (the heap) in fixed-size blocks called
pages, which are typically 8 KB. New rows (tuples) are added to a page with available free
space.
●New page allocation: If an INSERT command tries to add a row to a table and all existing data
pages are full, PostgreSQL allocates and appends a new, empty 8 KB page to the end of the
table's data file. The new tuple is then written to this page.
41PAGE What happens if the data is bigger than 8k
41
PostgreSQL determines if TOAST (The Oversized-Attribute Storage Technique)
is needed when the total size of a row exceeds a certain threshold, typically 2 KB.
Since PostgreSQL has a fixed page size, usually 8 KB, it cannot store a single row
that is too large. TOAST handles this by compressing and/or moving large field
values to a separate storage area.
The decision-making process is a multi-step, automatic, and transparent process
that follows a specific logic.
42PAGE What happens if the data is bigger than 8k
42
The TOAST process
1.Triggering the process: The TOAST mechanism is triggered when a tuple (row) exceeds the
TOAST_TUPLE_THRESHOLD , which defaults to around 2 KB. This threshold ensures that multiple rows
can fit on a single 8 KB page, which is more efficient for the database.
2.Compression: Once the threshold is crossed, PostgreSQL first attempts to compress the largest
"TOAST-able" columns, such as text, jsonb, or bytea.
●If the compression is successful and reduces the row size below the threshold, the compressed
data is stored directly in the main table.
●Compression is only attempted if it is expected to provide a significant space saving.
3.Out-of-line storage: If compression is not enough or doesn't provide enough size reduction, PostgreSQL
moves the data for the large column(s) to a separate, associated TOAST table.
●A small 18-byte pointer is then stored in the main table in place of the original large data.
●The data is broken into chunks and stored as separate rows in the TOAST table, which has its own
unique index for fast retrieval.
43PAGE When Does Your Data Get ‘Written’?
43
When you commit a transaction, the sequence to write it to disk is
1.Write to the Write Ahead Log
2.Force WAL to disk using fsync
3.Return a successful commit to you application
4.Write the data pages to disk in the background
The WAL will have enough information to reconstruct any committed
transaction even if the data pages have not been written.
●Several transactions can be group together in the WAL
●Sequential WAL writes, not random data page writes
●You are dependent on the WAL but this provides predictable write times
based on WAL write speed, nit more complex page writes
45PAGE
DBeaver PRO AI Capabilities Out of the Box
Supported AI providers:
oOpenAI
oAzure OpenAI
oGoogle Gemini
oOllama
oGitHub Copilot
DBeaver Lite
DBeaver Enterprise DBeaver Ultimate
● AI chat panel
● AI Query Explanation
● Explain and Fix SQL
Code
● Smart Metadata
Descriptions
● AI Query Suggestions
● AI Command
● AI chat panel
● AI Query Explanation
● Explain and Fix SQL
Code
● Smart Metadata
Descriptions
● AI Query Suggestions
● AI Command
● AI chat panel
● AI Query Explanation
● Explain and Fix SQL Code
● Smart Metadata
Descriptions
● AI Query Suggestions
● AI Command
CloudBeaver Enterprise
● AI Command
● AI Assistant
Team Edition
● AI chat panel
● AI Query Explanation
● Explain and Fix SQL Code
● Smart Metadata Descriptions
● AI Query Suggestions
● AI Command
● AI Assistant
DBeaver Lite
Two week free evals
45