HOT Understanding this important update optimization

GrantMcAlister 1,040 views 86 slides Oct 24, 2018
Slide 1
Slide 1 of 86
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9
Slide 10
10
Slide 11
11
Slide 12
12
Slide 13
13
Slide 14
14
Slide 15
15
Slide 16
16
Slide 17
17
Slide 18
18
Slide 19
19
Slide 20
20
Slide 21
21
Slide 22
22
Slide 23
23
Slide 24
24
Slide 25
25
Slide 26
26
Slide 27
27
Slide 28
28
Slide 29
29
Slide 30
30
Slide 31
31
Slide 32
32
Slide 33
33
Slide 34
34
Slide 35
35
Slide 36
36
Slide 37
37
Slide 38
38
Slide 39
39
Slide 40
40
Slide 41
41
Slide 42
42
Slide 43
43
Slide 44
44
Slide 45
45
Slide 46
46
Slide 47
47
Slide 48
48
Slide 49
49
Slide 50
50
Slide 51
51
Slide 52
52
Slide 53
53
Slide 54
54
Slide 55
55
Slide 56
56
Slide 57
57
Slide 58
58
Slide 59
59
Slide 60
60
Slide 61
61
Slide 62
62
Slide 63
63
Slide 64
64
Slide 65
65
Slide 66
66
Slide 67
67
Slide 68
68
Slide 69
69
Slide 70
70
Slide 71
71
Slide 72
72
Slide 73
73
Slide 74
74
Slide 75
75
Slide 76
76
Slide 77
77
Slide 78
78
Slide 79
79
Slide 80
80
Slide 81
81
Slide 82
82
Slide 83
83
Slide 84
84
Slide 85
85
Slide 86
86

About This Presentation

In this session we dive deep into HOT (Heap Only Tuple) update optimization. Utilizing this optimization can result in improved writes rates, less index bloat and reduced vacuum effort but to enable PostgreSQL to use this optimization may require changing your application design and database settin...


Slide Content

Grant McAlister – Senior Principal Engineer – Amazon RDS Oct 2018 HOT UNDERSTANDING THIS IMPORTANT UPDATE OPTIMIZATION

What is HOT - Heap Only Tuples The Heap Only Tuple (HOT) feature eliminates redundant index entries and allows the re-use of space taken by DELETEd or obsoleted UPDATEd tuples without performing a table-wide vacuum. It does this by allowing single-page vacuuming, also called "defragmentation". Full description - src /backend/access/heap/README.HOT

Regular Update heap 1 2 lp index leaf index A index B index C tuple v1 block0 block1

Regular Update heap 1 2 lp index leaf index A index B index C tuple v1 block0 block1 tuple v2

pageinspect – one row inserted postgres =# SELECT * FROM page_header ( get_raw_page ('benchmark_uuid2',0)); lsn | checksum | flags | lower | upper | special | pagesize | version | prune_xid -----------+----------+-------+-------+-------+---------+----------+---------+----------- 0/15E6380 | 0 | 0 | 28 | 8008 | 8192 | 8192 | 4 | 0 (1 row) postgres=# select lp, lp_off, case when lp_flags = 0 THEN 'LP_UNUSED' when lp_flags=1 THEN 'LP_NORMAL' when lp_flags=2 THEN 'LP_REDIRECT' when lp_flags=3 THEN 'LP_DEAD' END as lp_flags, lp_len, t_xmin, t_xmax, t_ctid,infomask(t_infomask, 1) as infomask,infomask(t_infomask2, 2) as infomask2 from heap_page_items(get_raw_page('benchmark_uuid2', 0)); lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_ctid | infomask | infomask2 ----+--------+-----------+--------+--------+--------+--------+--------------------------+----------- 1 | 8008 | LP_NORMAL | 184 | 1861 | 0 | (0,1) | XMAX_INVALID|HASVARWIDTH | (1 row) postgres=# SELECT * FROM bt_page_items('i_benchmark_uuid2_id', 1); itemoffset | ctid | itemlen | nulls | vars | data ------------+-------+---------+-------+------+------------------------- 1 | (0,1) | 16 | f | f | 04 00 00 00 00 00 00 00 (1 row)

pageinspect – regular update postgres =# update benchmark_uuid2 set last_updated = now() where id=4; UPDATE 1 postgres =# SELECT * FROM page_header ( get_raw_page ('benchmark_uuid2',0)); lsn | checksum | flags | lower | upper | special | pagesize | version | prune_xid -----------+----------+-------+-------+-------+---------+----------+---------+----------- 0/15E67A8 | 0 | 0 | 32 | 7824 | 8192 | 8192 | 4 | 1862 (1 row) postgres=# select lp, lp_off, case when lp_flags = 0 THEN 'LP_UNUSED' when lp_flags=1 THEN 'LP_NORMAL' when lp_flags=2 THEN 'LP_REDIRECT' when lp_flags=3 THEN 'LP_DEAD' END as lp_flags, lp_len, t_xmin, t_xmax, t_ctid,infomask(t_infomask, 1) as infomask,infomask(t_infomask2, 2) as infomask2 from heap_page_items(get_raw_page('benchmark_uuid2', 0)); lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_ctid | infomask | infomask2 ----+--------+-----------+--------+--------+--------+--------+----------------------------------+----------- 1 | 8008 | LP_NORMAL | 184 | 1861 | 1862 | (0,2) | XMIN_COMMITTED|HASVARWIDTH | 2 | 7824 | LP_NORMAL | 184 | 1862 | 0 | (0,2) | UPDATED|XMAX_INVALID|HASVARWIDTH | (2 rows) postgres=# SELECT * FROM bt_page_items('i_benchmark_uuid2_id', 1); itemoffset | ctid | itemlen | nulls | vars | data ------------+-------+---------+-------+------+------------------------- 1 | (0,2) | 16 | f | f | 04 00 00 00 00 00 00 00 2 | (0,1) | 16 | f | f | 04 00 00 00 00 00 00 00 (2 rows)

HOT Update heap 1 lp index leaf index A index B index C tuple v1 block0 block1

HOT Update heap 1 lp index leaf index A index B index C tuple v1 block0 block1 tuple tuple tuple tuple tuple tuple tuple tuple tuple v2

HOT Update heap 1 lp index leaf index A index B index C tuple v1 block0 block1

HOT Update heap 1 lp index leaf index A index B index C tuple v1 block0 block1

HOT Update heap 1 2 lp index leaf index A index B index C tuple v1 block0 block1 tuple v2

HOT Update heap 1 2 lp index leaf index A index B index C tuple v1 block0 block1 tuple v2

HOT Update heap 3 1 2 lp index leaf index A index B index C tuple v1 block0 block1 tuple v2 tuple v3

HOT Update heap 3 1 2 lp index leaf index A index B index C tuple v1 block0 block1 tuple v2 tuple v3

HOT Update heap 3 1 2 lp index leaf index A index B index C tuple v1 block0 block1 tuple v2 tuple v3

HOT Update heap 3 1 2 lp index leaf index A index B index C tuple v1 block0 block1 tuple v2 tuple v3

HOT Update heap 3 1 2 lp index leaf index A index B index C tuple v1 block0 block1 tuple v2 tuple v3

pageinspect – first hot update postgres =# update benchmark_uuid2 set e=cast(0 as boolean ) where id = 5; UPDATE 1 postgres =# SELECT * FROM page_header ( get_raw_page ('benchmark_uuid2',0)); lsn | checksum | flags | lower | upper | special | pagesize | version | prune_xid -----------+----------+-------+-------+-------+---------+----------+---------+----------- 0/15EC1A0 | 0 | 0 | 32 | 7824 | 8192 | 8192 | 4 | 1865 (1 row) postgres=# select lp, lp_off, case when lp_flags = 0 THEN 'LP_UNUSED' when lp_flags=1 THEN 'LP_NORMAL' when lp_flags=2 THEN 'LP_REDIRECT' when lp_flags=3 THEN 'LP_DEAD' END as lp_flags, lp_len, t_xmin, t_xmax, t_ctid,infomask(t_infomask, 1) as infomask,infomask(t_infomask2, 2) as infomask2 from heap_page_items(get_raw_page('benchmark_uuid2', 0)); lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_ctid | infomask | infomask2 ----+--------+-----------+--------+--------+--------+--------+----------------------------------+----------------- 1 | 8008 | LP_NORMAL | 184 | 1864 | 1865 | (0,2) | XMIN_COMMITTED|HASVARWIDTH | HOT_UPDATED 2 | 7824 | LP_NORMAL | 184 | 1865 | 0 | (0,2) | UPDATED|XMAX_INVALID|HASVARWIDTH | HEAP_ONLY_TUPLE (2 rows) postgres=# SELECT * FROM bt_page_items('i_benchmark_uuid2_id', 1); itemoffset | ctid | itemlen | nulls | vars | data ------------+-------+---------+-------+------+------------------------- 1 | (0,1) | 16 | f | f | 05 00 00 00 00 00 00 00 (1 row)

pageinspect – second hot update postgres =# update benchmark_uuid2 set e=cast(1 as boolean ) where id = 5; UPDATE 1 postgres =# SELECT * FROM page_header ( get_raw_page ('benchmark_uuid2',0)); lsn | checksum | flags | lower | upper | special | pagesize | version | prune_xid -----------+----------+-------+-------+-------+---------+----------+---------+----------- 0/15EC4C8 | 0 | 0 | 36 | 7640 | 8192 | 8192 | 4 | 1865 (1 row) postgres =# select lp , lp_off , case when lp_flags = 0 THEN 'LP_UNUSED' when lp_flags =1 THEN 'LP_NORMAL' when lp_flags =2 THEN 'LP_REDIRECT' when lp_flags =3 THEN 'LP_DEAD' END as lp_flags , lp_len , t_xmin , t_xmax , t_ctid,infomask ( t_infomask , 1) as infomask,infomask (t_infomask2, 2) as infomask2 from heap_page_items ( get_raw_page ('benchmark_uuid2', 0)); lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_ctid | infomask | infomask2 ----+--------+-----------+--------+--------+--------+--------+-------------------------------------------+----------------------------- 1 | 8008 | LP_NORMAL | 184 | 1864 | 1865 | (0,2) | XMAX_COMMITTED|XMIN_COMMITTED|HASVARWIDTH | HOT_UPDATED 2 | 7824 | LP_NORMAL | 184 | 1865 | 1866 | (0,3) | UPDATED|XMIN_COMMITTED|HASVARWIDTH | HOT_UPDATED|HEAP_ONLY_TUPLE 3 | 7640 | LP_NORMAL | 184 | 1866 | 0 | (0,3) | UPDATED|XMAX_INVALID|HASVARWIDTH | HEAP_ONLY_TUPLE (3 rows) postgres =# SELECT * FROM bt_page_items ('i_benchmark_uuid2_id', 1); itemoffset | ctid | itemlen | nulls | vars | data ------------+-------+---------+-------+------+------------------------- 1 | (0,1) | 16 | f | f | 05 00 00 00 00 00 00 00 (1 row)

HOT Update - Pruning heap 3 1 2 lp index leaf index A index B index C tuple v1 block0 block1 tuple v2 tuple v3

HOT Update - Pruning heap 3 1 2 lp index leaf index A index B index C block0 block1 tuple v2 tuple v3 X

HOT Update - Pruning heap 3 1 2 lp index leaf index A index B index C block0 block1 tuple v2 tuple v3

HOT Update - Pruning heap 3 1 2 lp index leaf index A index B index C block0 block1 tuple v3

pageinspect – pruning (almost full page) postgres =# select lp , lp_off , case when lp_flags = 0 THEN 'LP_UNUSED' when lp_flags =1 THEN 'LP_NORMAL' when lp_flags =2 THEN 'LP_REDIRECT' when lp_flags =3 THEN 'LP_DEAD' END as lp_flags , lp_len , t_xmin , t_xmax , t_ctid,infomask ( t_infomask , 1) as infomask,infomask (t_infomask2, 2) as infomask2 from heap_page_items ( get_raw_page ('benchmark_uuid2', 0)); lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_ctid | infomask | infomask2 ----+--------+-----------+--------+--------+--------+--------+---------------------------------------------------+----------------------------- 1 | 8008 | LP_NORMAL | 184 | 1864 | 1865 | (0,2) | XMAX_COMMITTED|XMIN_COMMITTED|HASVARWIDTH | HOT_UPDATED 2 | 7824 | LP_NORMAL | 184 | 1865 | 1866 | (0,3) | UPDATED|XMAX_COMMITTED|XMIN_COMMITTED|HASVARWIDTH | HOT_UPDATED|HEAP_ONLY_TUPLE 3 | 7640 | LP_NORMAL | 184 | 1866 | 1867 | (0,4) | UPDATED|XMAX_COMMITTED|XMIN_COMMITTED|HASVARWIDTH | HOT_UPDATED|HEAP_ONLY_TUPLE 4 | 7456 | LP_NORMAL | 184 | 1867 | 1868 | (0,5) | UPDATED|XMAX_COMMITTED|XMIN_COMMITTED|HASVARWIDTH | HOT_UPDATED|HEAP_ONLY_TUPLE 5 | 7272 | LP_NORMAL | 184 | 1868 | 1869 | (0,6) | UPDATED|XMAX_COMMITTED|XMIN_COMMITTED|HASVARWIDTH | HOT_UPDATED|HEAP_ONLY_TUPLE 6 | 7088 | LP_NORMAL | 184 | 1869 | 1870 | (0,7) | UPDATED|XMAX_COMMITTED|XMIN_COMMITTED|HASVARWIDTH | HOT_UPDATED|HEAP_ONLY_TUPLE 7 | 6904 | LP_NORMAL | 184 | 1870 | 1871 | (0,8) | UPDATED|XMAX_COMMITTED|XMIN_COMMITTED|HASVARWIDTH | HOT_UPDATED|HEAP_ONLY_TUPLE 8 | 6720 | LP_NORMAL | 184 | 1871 | 1872 | (0,9) | UPDATED|XMAX_COMMITTED|XMIN_COMMITTED|HASVARWIDTH | HOT_UPDATED|HEAP_ONLY_TUPLE 9 | 6536 | LP_NORMAL | 184 | 1872 | 1873 | (0,10) | UPDATED|XMAX_COMMITTED|XMIN_COMMITTED|HASVARWIDTH | HOT_UPDATED|HEAP_ONLY_TUPLE 10 | 6352 | LP_NORMAL | 184 | 1873 | 1874 | (0,11) | UPDATED|XMAX_COMMITTED|XMIN_COMMITTED|HASVARWIDTH | HOT_UPDATED|HEAP_ONLY_TUPLE 11 | 6168 | LP_NORMAL | 184 | 1874 | 1875 | (0,12) | UPDATED|XMAX_COMMITTED|XMIN_COMMITTED|HASVARWIDTH | HOT_UPDATED|HEAP_ONLY_TUPLE 12 | 5984 | LP_NORMAL | 184 | 1875 | 1876 | (0,13) | UPDATED|XMAX_COMMITTED|XMIN_COMMITTED|HASVARWIDTH | HOT_UPDATED|HEAP_ONLY_TUPLE 13 | 5800 | LP_NORMAL | 184 | 1876 | 1877 | (0,14) | UPDATED|XMAX_COMMITTED|XMIN_COMMITTED|HASVARWIDTH | HOT_UPDATED|HEAP_ONLY_TUPLE ... 28 | 3040 | LP_NORMAL | 184 | 1891 | 1892 | (0,29) | UPDATED|XMAX_COMMITTED|XMIN_COMMITTED|HASVARWIDTH | HOT_UPDATED|HEAP_ONLY_TUPLE 29 | 2856 | LP_NORMAL | 184 | 1892 | 1893 | (0,30) | UPDATED|XMAX_COMMITTED|XMIN_COMMITTED|HASVARWIDTH | HOT_UPDATED|HEAP_ONLY_TUPLE 30 | 2672 | LP_NORMAL | 184 | 1893 | 1894 | (0,31) | UPDATED|XMAX_COMMITTED|XMIN_COMMITTED|HASVARWIDTH | HOT_UPDATED|HEAP_ONLY_TUPLE 31 | 2488 | LP_NORMAL | 184 | 1894 | 1895 | (0,32) | UPDATED|XMAX_COMMITTED|XMIN_COMMITTED|HASVARWIDTH | HOT_UPDATED|HEAP_ONLY_TUPLE 32 | 2304 | LP_NORMAL | 184 | 1895 | 1896 | (0,33) | UPDATED|XMAX_COMMITTED|XMIN_COMMITTED|HASVARWIDTH | HOT_UPDATED|HEAP_ONLY_TUPLE 33 | 2120 | LP_NORMAL | 184 | 1896 | 1897 | (0,34) | UPDATED|XMIN_COMMITTED|HASVARWIDTH | HOT_UPDATED|HEAP_ONLY_TUPLE 34 | 1936 | LP_NORMAL | 184 | 1897 | 0 | (0,34) | UPDATED|XMAX_INVALID|HASVARWIDTH | HEAP_ONLY_TUPLE (34 rows)

pageinspect – pruned lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_ctid | infomask | infomask2 ----+--------+-------------+--------+--------+--------+--------+---------------------------------------------------+----------------------------- 1 | 40 | LP_REDIRECT | 0 | | | | | 2 | 7824 | LP_NORMAL | 184 | 1904 | 1905 | (0,3) | UPDATED|XMIN_COMMITTED|HASVARWIDTH | HOT_UPDATED|HEAP_ONLY_TUPLE 3 | 7640 | LP_NORMAL | 184 | 1905 | 0 | (0,3) | UPDATED|XMAX_INVALID|HASVARWIDTH | HEAP_ONLY_TUPLE 4 | 0 | LP_UNUSED | 0 | | | | | 5 | 0 | LP_UNUSED | 0 | | | | | 6 | 0 | LP_UNUSED | 0 | | | | | 7 | 0 | LP_UNUSED | 0 | | | | | 8 | 0 | LP_UNUSED | 0 | | | | | 9 | 0 | LP_UNUSED | 0 | | | | | 10 | 0 | LP_UNUSED | 0 | | | | | 11 | 0 | LP_UNUSED | 0 | | | | | 12 | 0 | LP_UNUSED | 0 | | | | | 13 | 0 | LP_UNUSED | 0 | | | | | ... 28 | 0 | LP_UNUSED | 0 | | | | | 29 | 0 | LP_UNUSED | 0 | | | | | 30 | 0 | LP_UNUSED | 0 | | | | | 31 | 0 | LP_UNUSED | 0 | | | | | 32 | 0 | LP_UNUSED | 0 | | | | | 33 | 0 | LP_UNUSED | 0 | | | | | 34 | 0 | LP_UNUSED | 0 | | | | | 35 | 0 | LP_UNUSED | 0 | | | | | 36 | 0 | LP_UNUSED | 0 | | | | | 37 | 0 | LP_UNUSED | 0 | | | | | 38 | 0 | LP_UNUSED | 0 | | | | | 39 | 0 | LP_UNUSED | 0 | | | | | 40 | 8008 | LP_NORMAL | 184 | 1903 | 1904 | (0,2) | UPDATED|XMAX_COMMITTED|XMIN_COMMITTED|HASVARWIDTH | HOT_UPDATED|HEAP_ONLY_TUPLE (40 rows)

Fillfactor heap tuple v1 block0 block1 tuple tuple tuple tuple tuple tuple tuple tuple block2 fillfactor =100 (default) Continuous inserts and update the same single tuple 100 times

Fillfactor heap tuple v1 block0 block1 tuple tuple tuple tuple tuple tuple tuple tuple tuple v2 block2 tuple tuple tuple tuple tuple tuple tuple tuple fillfactor =100 (default) Continuous inserts and update the same single tuple 100 times

Fillfactor heap tuple v1 block0 block1 tuple tuple tuple tuple tuple tuple tuple tuple tuple v2 tuple v3 block2 tuple tuple tuple tuple tuple tuple tuple tuple fillfactor =100 (default) Continuous inserts and update the same single tuple 100 times

Fillfactor heap tuple v1 block0 block1 tuple tuple tuple tuple tuple tuple tuple block2 fillfactor =90 Continuous inserts and update the same single tuple 100 times

Fillfactor heap tuple v1 block0 block1 tuple tuple tuple tuple tuple tuple tuple block2 fillfactor =90 tuple v2 Continuous inserts and update the same single tuple 100 times

Fillfactor heap tuple v1 block0 block1 tuple tuple tuple tuple tuple tuple tuple block2 tuple tuple tuple tuple tuple tuple tuple fillfactor =90 tuple v2 tuple v3 tuple Continuous inserts and update the same single tuple 100 times

Fillfactor heap tuple v1 block0 block1 tuple block2 tuple fillfactor =10 tuple tuple tuple Continuous inserts and update the same single tuple 100 times

Fillfactor heap tuple v1 block0 block1 tuple block2 tuple fillfactor =10 tuple tuple v2 tuple tuple Continuous inserts and update the same single tuple 100 times

Fillfactor heap tuple v1 block0 block1 tuple block2 tuple fillfactor =10 tuple tuple v2 tuple v3 tuple tuple Continuous inserts and update the same single tuple 100 times

Fillfactor on Insert & Single Update Workload Insert at 2K TPS and update one row 100 times while having a long running transaction Fillfactor Single Key Fetch Table Scan 100 90 5 10 101 blocks 18 blocks 5 blocks 3 blocks 5.5K blocks 6K blocks 11K blocks 60K blocks Bloat comes in many disguises

Fillfactor on Insert & Single Update Workload Insert at 2K TPS and update one row 100 times while having a long running transaction Fillfactor Single Key Fetch Table Scan 100 90 5 10 5.5K blocks 6K blocks 11K blocks 60K blocks 2 blocks 1 block 1 block Bloat comes in many disguises

Fillfactor on Insert & Single Update Workload Insert at 2K TPS and update one row 100 times while having a long running transaction Fillfactor Single Key Fetch Table Scan 100 90 5 10 5.5K blocks 6K blocks 11K blocks 60K blocks 2 blocks 1 block 1 block 1 block Bloat comes in many disguises

Measuring Longest Running Transaction postgres =# select max(now() - xact_start ) from pg_stat_activity ; max ----------------- 02:02:48.021408 2+ hours

One the fly cleanup - HEAP heap 1 2 lp index leaf index A index B index C tuple v1 block0 block1 tuple v2

One the fly cleanup - HEAP heap 1 2 lp index leaf index A index B index C block0 block1 tuple v2

One the fly cleanup - HEAP heap 1 2 lp index leaf index A index B index C block0 block1 tuple v2

On the fly Heap Tuple Pruning – Full Table lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_ctid | infomask | infomask2 ----+--------+-----------+--------+---------+---------+--------+---------------------------------------------------+----------- 1 | 7976 | LP_NORMAL | 216 | 3265456 | 3265457 | (0,2) | XMAX_COMMITTED|XMIN_COMMITTED|HASVARWIDTH | 2 | 7760 | LP_NORMAL | 216 | 3265457 | 3265458 | (0,3) | UPDATED|XMAX_COMMITTED|XMIN_COMMITTED|HASVARWIDTH | 3 | 7544 | LP_NORMAL | 216 | 3265458 | 3265459 | (0,4) | UPDATED|XMAX_COMMITTED|XMIN_COMMITTED|HASVARWIDTH | 4 | 7328 | LP_NORMAL | 216 | 3265459 | 3265460 | (0,5) | UPDATED|XMAX_COMMITTED|XMIN_COMMITTED|HASVARWIDTH | 5 | 7112 | LP_NORMAL | 216 | 3265460 | 3265461 | (0,6) | UPDATED|XMAX_COMMITTED|XMIN_COMMITTED|HASVARWIDTH | 6 | 6896 | LP_NORMAL | 216 | 3265461 | 3265462 | (0,7) | UPDATED|XMAX_COMMITTED|XMIN_COMMITTED|HASVARWIDTH | 7 | 6680 | LP_NORMAL | 216 | 3265462 | 3265463 | (0,8) | UPDATED|XMAX_COMMITTED|XMIN_COMMITTED|HASVARWIDTH | 8 | 6464 | LP_NORMAL | 216 | 3265463 | 3265464 | (0,9) | UPDATED|XMAX_COMMITTED|XMIN_COMMITTED|HASVARWIDTH | 9 | 6248 | LP_NORMAL | 216 | 3265464 | 3265465 | (0,10) | UPDATED|XMAX_COMMITTED|XMIN_COMMITTED|HASVARWIDTH | 10 | 6032 | LP_NORMAL | 216 | 3265465 | 3265466 | (0,11) | UPDATED|XMAX_COMMITTED|XMIN_COMMITTED|HASVARWIDTH | 11 | 5816 | LP_NORMAL | 216 | 3265466 | 3265467 | (0,12) | UPDATED|XMAX_COMMITTED|XMIN_COMMITTED|HASVARWIDTH | 12 | 5600 | LP_NORMAL | 216 | 3265467 | 3265468 | (0,13) | UPDATED|XMAX_COMMITTED|XMIN_COMMITTED|HASVARWIDTH | 13 | 5384 | LP_NORMAL | 216 | 3265468 | 3265469 | (0,14) | UPDATED|XMAX_COMMITTED|XMIN_COMMITTED|HASVARWIDTH | 14 | 5168 | LP_NORMAL | 216 | 3265469 | 3265470 | (0,15) | UPDATED|XMAX_COMMITTED|XMIN_COMMITTED|HASVARWIDTH | 15 | 4952 | LP_NORMAL | 216 | 3265470 | 3265471 | (0,16) | UPDATED|XMAX_COMMITTED|XMIN_COMMITTED|HASVARWIDTH | 16 | 4736 | LP_NORMAL | 216 | 3265471 | 3265472 | (0,17) | UPDATED|XMAX_COMMITTED|XMIN_COMMITTED|HASVARWIDTH | 17 | 4520 | LP_NORMAL | 216 | 3265472 | 3265473 | (0,18) | UPDATED|XMAX_COMMITTED|XMIN_COMMITTED|HASVARWIDTH | 18 | 4304 | LP_NORMAL | 216 | 3265473 | 3265474 | (0,19) | UPDATED|XMIN_COMMITTED|HASVARWIDTH | 19 | 4088 | LP_NORMAL | 216 | 3265474 | 0 | (0,19) | UPDATED|XMAX_INVALID|HASVARWIDTH | (19 rows)

On the fly Heap Tuple Pruning – Pruned Table lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_ctid | infomask | infomask2 ----+--------+-----------+--------+---------+---------+--------+------------------------------------+----------- 1 | 0 | LP_DEAD | 0 | | | | | 2 | 0 | LP_DEAD | 0 | | | | | 3 | 0 | LP_DEAD | 0 | | | | | 4 | 0 | LP_DEAD | 0 | | | | | 5 | 0 | LP_DEAD | 0 | | | | | 6 | 0 | LP_DEAD | 0 | | | | | 7 | 0 | LP_DEAD | 0 | | | | | 8 | 0 | LP_DEAD | 0 | | | | | 9 | 0 | LP_DEAD | 0 | | | | | 10 | 0 | LP_DEAD | 0 | | | | | 11 | 0 | LP_DEAD | 0 | | | | | 12 | 0 | LP_DEAD | 0 | | | | | 13 | 0 | LP_DEAD | 0 | | | | | 14 | 0 | LP_DEAD | 0 | | | | | 15 | 0 | LP_DEAD | 0 | | | | | 16 | 0 | LP_DEAD | 0 | | | | | 17 | 0 | LP_DEAD | 0 | | | | | 18 | 0 | LP_DEAD | 0 | | | | | 19 | 7976 | LP_NORMAL | 216 | 3265474 | 3265475 | (0,20) | UPDATED|XMIN_COMMITTED|HASVARWIDTH | 20 | 7760 | LP_NORMAL | 216 | 3265475 | 0 | (0,20) | UPDATED|XMAX_INVALID|HASVARWIDTH | (20 rows)

On the fly Heap Tuple Pruning – Index Not! itemoffset | ctid | itemlen | nulls | vars | data ------------+--------+---------+-------+------+------------------------- 1 | (0,20) | 16 | f | f | b7 ab 03 00 00 00 00 00 2 | (0,19) | 16 | f | f | b7 ab 03 00 00 00 00 00 3 | (0,18) | 16 | f | f | b7 ab 03 00 00 00 00 00 4 | (0,17) | 16 | f | f | b7 ab 03 00 00 00 00 00 5 | (0,16) | 16 | f | f | b7 ab 03 00 00 00 00 00 6 | (0,15) | 16 | f | f | b7 ab 03 00 00 00 00 00 7 | (0,14) | 16 | f | f | b7 ab 03 00 00 00 00 00 8 | (0,13) | 16 | f | f | b7 ab 03 00 00 00 00 00 9 | (0,12) | 16 | f | f | b7 ab 03 00 00 00 00 00 10 | (0,11) | 16 | f | f | b7 ab 03 00 00 00 00 00 11 | (0,10) | 16 | f | f | b7 ab 03 00 00 00 00 00 12 | (0,9) | 16 | f | f | b7 ab 03 00 00 00 00 00 13 | (0,8) | 16 | f | f | b7 ab 03 00 00 00 00 00 14 | (0,7) | 16 | f | f | b7 ab 03 00 00 00 00 00 15 | (0,6) | 16 | f | f | b7 ab 03 00 00 00 00 00 16 | (0,5) | 16 | f | f | b7 ab 03 00 00 00 00 00 17 | (0,4) | 16 | f | f | b7 ab 03 00 00 00 00 00 18 | (0,3) | 16 | f | f | b7 ab 03 00 00 00 00 00 19 | (0,2) | 16 | f | f | b7 ab 03 00 00 00 00 00 20 | (0,1) | 16 | f | f | b7 ab 03 00 00 00 00 00 (20 rows)

One the fly cleanup - Index heap 1 2 lp index leaf index A index B index C block0 block1 tuple v2

One the fly cleanup - Index heap 1 2 lp index leaf index A index B index C block0 block1 tuple v2

Index on fly cleanup – almost full page itemoffset | ctid | itemlen | nulls | vars | data ------------+---------+---------+-------+------+------------------------- 1 | (1,116) | 16 | f | f | b5 ab 03 00 00 00 00 00 2 | (1,115) | 16 | f | f | b5 ab 03 00 00 00 00 00 3 | (1,114) | 16 | f | f | b5 ab 03 00 00 00 00 00 4 | (1,113) | 16 | f | f | b5 ab 03 00 00 00 00 00 5 | (1,112) | 16 | f | f | b5 ab 03 00 00 00 00 00 6 | (1,111) | 16 | f | f | b5 ab 03 00 00 00 00 00 7 | (1,110) | 16 | f | f | b5 ab 03 00 00 00 00 00 8 | (1,109) | 16 | f | f | b5 ab 03 00 00 00 00 00 9 | (1,108) | 16 | f | f | b5 ab 03 00 00 00 00 00 10 | (1,107) | 16 | f | f | b5 ab 03 00 00 00 00 00 11 | (1,106) | 16 | f | f | b5 ab 03 00 00 00 00 00 12 | (1,105) | 16 | f | f | b5 ab 03 00 00 00 00 00 … 397 | (0,11) | 16 | f | f | b5 ab 03 00 00 00 00 00 398 | (0,10) | 16 | f | f | b5 ab 03 00 00 00 00 00 399 | (0,9) | 16 | f | f | b5 ab 03 00 00 00 00 00 400 | (0,8) | 16 | f | f | b5 ab 03 00 00 00 00 00 401 | (0,7) | 16 | f | f | b5 ab 03 00 00 00 00 00 402 | (0,6) | 16 | f | f | b5 ab 03 00 00 00 00 00 403 | (0,5) | 16 | f | f | b5 ab 03 00 00 00 00 00 404 | (0,4) | 16 | f | f | b5 ab 03 00 00 00 00 00 405 | (0,3) | 16 | f | f | b5 ab 03 00 00 00 00 00 406 | (0,2) | 16 | f | f | b5 ab 03 00 00 00 00 00 407 | (0,1) | 16 | f | f | b5 ab 03 00 00 00 00 00 (407 rows)

Index on fly cleanup – cleaned up itemoffset | ctid | itemlen | nulls | vars | data ------------+---------+---------+-------+------+------------------------- 1 | (1,117) | 16 | f | f | b5 ab 03 00 00 00 00 00 2 | (1,116) | 16 | f | f | b5 ab 03 00 00 00 00 00 (2 rows)

Vacuum heap 1 2 lp index leaf index A index B index C block0 block1

Vacuum heap 1 2 lp index leaf index A index B index C block0 block1 Scan to find id’s

Vacuum heap 1 2 lp index leaf index A index B index C block0 block1

Vacuum heap 1 2 lp index leaf index A index B index C block0 block1

Vacuum heap 1 2 lp index leaf index A index B index C block0 block1

Vacuum heap 1 2 lp index leaf index A index B index C block0 block1

Vacuum heap 1 2 lp index leaf index A index B index C block0 block1

Vacuum heap 1 2 lp index leaf index A index B index C block0 block1

Vacuum heap 1 2 lp index leaf index A index B index C block0 block1

Vacuum Block Cleanup – Lots of DEAD LP’s lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_ctid | infomask | infomask2 -----+--------+----------+--------+--------+--------+--------+----------+----------- 1 | 0 | LP_DEAD | 0 | | | | | 2 | 0 | LP_DEAD | 0 | | | | | 3 | 0 | LP_DEAD | 0 | | | | | 4 | 0 | LP_DEAD | 0 | | | | | 5 | 0 | LP_DEAD | 0 | | | | | 6 | 0 | LP_DEAD | 0 | | | | | 7 | 0 | LP_DEAD | 0 | | | | | 8 | 0 | LP_DEAD | 0 | | | | | 9 | 0 | LP_DEAD | 0 | | | | | 10 | 0 | LP_DEAD | 0 | | | | | ... 281 | 0 | LP_DEAD | 0 | | | | | 282 | 0 | LP_DEAD | 0 | | | | | 283 | 0 | LP_DEAD | 0 | | | | | 284 | 0 | LP_DEAD | 0 | | | | | 285 | 0 | LP_DEAD | 0 | | | | | 286 | 0 | LP_DEAD | 0 | | | | | 287 | 0 | LP_DEAD | 0 | | | | | 288 | 0 | LP_DEAD | 0 | | | | | 289 | 0 | LP_DEAD | 0 | | | | | 290 | 0 | LP_DEAD | 0 | | | | | 291 | 0 | LP_DEAD | 0 | | | | | (291 rows)

Vacuum Block Cleanup – Unused LP’s lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_ctid | infomask | infomask2 -----+--------+-----------+--------+--------+--------+--------+----------+----------- 1 | 0 | LP_UNUSED | 0 | | | | | 2 | 0 | LP_UNUSED | 0 | | | | | 3 | 0 | LP_UNUSED | 0 | | | | | 4 | 0 | LP_UNUSED | 0 | | | | | 5 | 0 | LP_UNUSED | 0 | | | | | 6 | 0 | LP_UNUSED | 0 | | | | | 7 | 0 | LP_UNUSED | 0 | | | | | 8 | 0 | LP_UNUSED | 0 | | | | | 9 | 0 | LP_UNUSED | 0 | | | | | 10 | 0 | LP_UNUSED | 0 | | | | | ... 281 | 0 | LP_UNUSED | 0 | | | | | 282 | 0 | LP_UNUSED | 0 | | | | | 283 | 0 | LP_UNUSED | 0 | | | | | 284 | 0 | LP_UNUSED | 0 | | | | | 285 | 0 | LP_UNUSED | 0 | | | | | 286 | 0 | LP_UNUSED | 0 | | | | | 287 | 0 | LP_UNUSED | 0 | | | | | 288 | 0 | LP_UNUSED | 0 | | | | | 289 | 0 | LP_UNUSED | 0 | | | | | 290 | 0 | LP_UNUSED | 0 | | | | | 291 | 0 | LP_UNUSED | 0 | | | | | (291 rows)

Is that an update? ORM cases ORM and other software sometimes update all columns in table including all indexed columns UPDATE to NEW VALUE postgres=# update benchmark_uuid2 set last_updated = now() where id=2; UPDATE 1 postgres=# select n_tup_upd, n_tup_hot_upd from pg_stat_all_tables where relname = 'benchmark_uuid2'; n_tup_upd | n_tup_hot_upd -----------+--------------- 1 | 0 UPDATE to SAME VALUE (i.e. ORM case) postgres=# update benchmark_uuid2 set last_updated = ( select last_updated benchmark_uuid2 where id=2) where id=2; UPDATE 1 postgres=# select n_tup_upd, n_tup_hot_upd from pg_stat_all_tables where relname = 'benchmark_uuid2'; n_tup_upd | n_tup_hot_upd -----------+--------------- 2 | 1 (1 row) !=

Index Tests – A table with 2 to 64 indexes Create the table with a PK, 64 Random int Columns and 1 last updated timestamp create table benchmark_serial ( pk serial constraint pk_benchmark_serial_pk PRIMARY KEY, a1 int not null , ….. a64 int not null, last_updated timestamp ); Create between 2 and 64 indexes on the random int columns create index i_benchmark_serial_a1 on benchmark_serial (a1 ); …. create index i_benchmark_serial_a64 on benchmark_serial (a64); For the Regular Test create index i_benchmark_serial_lu on benchmark_serial ( last_updated );

Full Page Writes Block in Memory PostgreSQL update t set y = 6; Full Block WAL

Full Page Writes Block in Memory PostgreSQL update t set y = 6; Full Block WAL

Full Page Writes Block in Memory PostgreSQL update t set y = 6; Checkpoint Datafile Full Block WAL Archive

Full Page Writes Block in Memory PostgreSQL update t set y = 6; Checkpoint Datafile Full Block WAL Archive 4K 4K 8K

Full Page Writes Block in Memory PostgreSQL update t set y = 6; Checkpoint Datafile Full Block WAL Archive 4K 4K 8K During crash recovery PostgreSQL uses the FPW block in the WAL to replace the bad checkpointed block

1-200 1-100 101-200 1-50 51-100 1-25 26-50 51-75 76-100 101-150 151-200 101-125 126-150 151-175 176-200 Insert a Sequence Number into a B-tree

1-200 1-100 101-200 Insert 201 1-50 51-100 1-25 26-50 51-75 76-100 101-150 151-200 101-125 126-150 151-175 176-200 1-201 101-201 151-201 176-201 4 blocks loaded Insert a Sequence Number into a B-tree

1-200 1-100 101-200 Insert 201 1-50 51-100 1-25 26-50 51-75 76-100 101-150 151-200 101-125 126-150 151-175 176-200 Insert 202 1-201 101-201 151-201 176-201 1-202 101-202 151-202 176-202 4 blocks loaded Insert a Sequence Number into a B-tree 0 blocks loaded

1-200 1-100 101-200 Insert 201 1-50 51-100 1-25 26-50 51-75 76-100 101-150 151-200 101-125 126-150 151-175 176-200 Insert 202 1-201 101-201 151-201 176-201 1-202 101-202 151-202 176-202 4 blocks loaded Insert a Sequence Number into a B-tree 0 blocks loaded At least 1 FPW

1-200 1-100 101-200 Insert 124 1-50 51-100 1-25 26-50 51-75 76-100 101-150 151-200 101-125 126-150 151-175 176-200 1-200 101-200 101-150 101-125 4 blocks loaded Insert a Random value into a B-tree

1-200 1-100 101-200 Insert 124 1-50 51-100 1-25 26-50 51-75 76-100 101-150 151-200 101-125 126-150 151-175 176-200 Insert 99 1-200 101-200 101-150 101-125 4 blocks loaded 1-100 51-100 76-100 3 blocks loaded Insert a Random value into a B-tree

1-200 1-100 101-200 Insert 124 1-50 51-100 1-25 26-50 51-75 76-100 101-150 151-200 101-125 126-150 151-175 176-200 Insert 99 1-200 101-200 101-150 101-125 4 blocks loaded 1-100 51-100 76-100 3 blocks loaded Insert a Random value into a B-tree 151-200 151-200 2 blocks loaded Insert 161 At least 3 FPW

HOT Updates – Looking at FPW in the logs HOT Updated Heap 14/ 68, , d: HOT_UPDATE off 19 xmax 2327993188 ; new off 3 xmax 0, blkref #0: rel 1663/41083/41086 blk 28 XLOG 0/ 3368 , , d: FPI_FOR_HINT , blkref #0: rel 1663/41083/41092 blk 1492899 FPW Transaction 8/ 34, , d: COMMIT 2017-09-07 00:07:17.532647 UTC Non HOT Update Heap 14/ 75, , d: UPDATE off 67 xmax 2327993195 ; new off 7 xmax 0, blkref #0: rel 1663/41083/41086 blk 285 XLOG 0/ 2774 , , d: FPI_FOR_HINT , blkref #0: rel 1663/41083/41090 blk 7039952 FPW Btree 2/ 120, , d: INSERT_LEAF off 17, blkref #0: rel 1663/41083/41090 blk 7039952 XLOG 0/ 3150 , , d: FPI_FOR_HINT , blkref #0: rel 1663/41083/41092 blk 29 FPW Btree 2/ 64, , d: INSERT_LEAF off 205, blkref #0: rel 1663/41083/41092 blk 29 Btree 2/ 2639 , , d: INSERT_LEAF off 73, blkref #0: rel 1663/41083/41093 blk 4 FPW Btree 2/ 3148 , , d: INSERT_LEAF off 2, blkref #0: rel 1663/41083/41094 blk 1 FPW Btree 2/ 5099 , , d: INSERT_LEAF off 364, blkref #0: rel 1663/41083/41095 blk 4237904 FPW Transaction 8/ 34, , d: COMMIT 2017-09-07 00:24:29.427017 UTC 3.4K VS 16.7K

35X

Index Tests – Vacuuming benchdb => vacuum verbose benchmark_serial ; INFO: vacuuming " public.benchmark_serial " INFO: scanned index " pk_benchmark_serial_pk " to remove 1939503 row versions DETAIL: CPU 0.05s/4.93u sec elapsed 5.52 sec INFO: scanned index "i_benchmark_serial_a1" to remove 1939503 row versions DETAIL: CPU 0.00s/9.52u sec elapsed 9.67 sec INFO: scanned index "i_benchmark_serial_a2" to remove 1939503 row versions DETAIL: CPU 0.02s/9.26u sec elapsed 9.43 sec INFO: scanned index "i_benchmark_serial_a3" to remove 1939503 row versions DETAIL: CPU 0.00s/9.22u sec elapsed 9.49 sec INFO: scanned index "i_benchmark_serial_a4" to remove 1939503 row versions DETAIL: CPU 0.00s/9.05u sec elapsed 9.23 sec INFO: scanned index "i_benchmark_serial_a5" to remove 1939503 row versions DETAIL: CPU 0.00s/9.13u sec elapsed 9.44 sec INFO: scanned index "i_benchmark_serial_a6" to remove 1939503 row versions DETAIL: CPU 0.00s/9.21u sec elapsed 9.37 sec INFO: scanned index "i_benchmark_serial_a7" to remove 1939503 row versions DETAIL: CPU 0.01s/9.18u sec elapsed 9.57 sec INFO: scanned index "i_benchmark_serial_a8" to remove 1939503 row versions DETAIL: CPU 0.00s/9.22u sec elapsed 9.49 sec INFO: scanned index "i_benchmark_serial_a9" to remove 1939503 row versions ….. DETAIL : CPU 0.02s/9.03u sec elapsed 9.56 sec INFO: scanned index "i_benchmark_serial_a16" to remove 1939503 row versions ….. Serial

101X

Example – Keep track of table changes PK C1 C2 C3 C4 C4 Last Updated 1 X A1 FOO HOT 9 01-Jun-1999 2 Y A2 BAR IS 9 01-Jul-2001 3 Z A2 RLL REALLY 7 19-OCT-2009 4 A A1 MFM USEFUL 2 21-OCT-1972

Example – Keep track of table changes PK C1 C2 C3 C4 C4 Last Updated 1 X A1 FOO HOT 9 01-Jun-1999 2 Y A2 BAR IS 9 01-Jul-2001 3 Z A2 RLL REALLY 7 19-OCT-2009 4 A A1 MFM USEFUL 2 21-OCT-1972 Data Lake

Example – Keep track of table changes PK C1 C2 C3 C4 C4 Last Updated 1 X A1 FOO HOT 9 01-Jun-1999 2 Y A2 BAR IS 9 01-Jul-2001 3 Z A2 RLL REALLY 7 19-OCT-2009 4 A A1 MFM USEFUL 2 21-OCT-1972 Data Lake Full Table Scan

Example – Build an index on Last Updated PK C1 C2 C3 C4 C4 Last Updated 1 X A1 FOO HOT 9 01-Jun-1999 2 Y A2 BAR IS 9 01-Jul-2001 3 Z A2 RLL REALLY 8 15-OCT-2018 4 A A1 MFM USEFUL 2 21-OCT-1972 Data Lake Index Scan

Example – Build an index on Last Updated PK C1 C2 C3 C4 C4 Last Updated 1 X A1 FOO HOT 9 01-Jun-1999 2 Y A2 BAR IS 9 01-Jul-2001 3 Z A2 RLL REALLY 8 15-OCT-2018 4 A A1 MFM USEFUL 2 21-OCT-1972 Data Lake Index Scan Before: Updates to C1,C3,C4, Last Updated are HOT After: Every update is a regular update

Example – Logical Replication PK C1 C2 C3 C4 C4 Last Updated 1 X A1 FOO HOT 9 01-Jun-1999 2 Y A2 BAR IS 9 01-Jul-2001 3 Z A2 RLL REALLY 7 19-OCT-2009 4 A A1 MFM USEFUL 2 21-OCT-1972 All changes WAL logged WAL Logical Decoding a nd Replication Data Lake

Questions?