HOT Understanding this important update optimization
GrantMcAlister
1,040 views
86 slides
Oct 24, 2018
Slide 1 of 86
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
About This Presentation
In this session we dive deep into HOT (Heap Only Tuple) update optimization. Utilizing this optimization can result in improved writes rates, less index bloat and reduced vacuum effort but to enable PostgreSQL to use this optimization may require changing your application design and database settin...
In this session we dive deep into HOT (Heap Only Tuple) update optimization. Utilizing this optimization can result in improved writes rates, less index bloat and reduced vacuum effort but to enable PostgreSQL to use this optimization may require changing your application design and database settings. We will examine how the number of indexes, frequency of updates, fillfactor and vacuum settings can influence when HOT will be utilized and what benefits you may be able to gain.
Size: 2.7 MB
Language: en
Added: Oct 24, 2018
Slides: 86 pages
Slide Content
Grant McAlister – Senior Principal Engineer – Amazon RDS Oct 2018 HOT UNDERSTANDING THIS IMPORTANT UPDATE OPTIMIZATION
What is HOT - Heap Only Tuples The Heap Only Tuple (HOT) feature eliminates redundant index entries and allows the re-use of space taken by DELETEd or obsoleted UPDATEd tuples without performing a table-wide vacuum. It does this by allowing single-page vacuuming, also called "defragmentation". Full description - src /backend/access/heap/README.HOT
Regular Update heap 1 2 lp index leaf index A index B index C tuple v1 block0 block1
Regular Update heap 1 2 lp index leaf index A index B index C tuple v1 block0 block1 tuple v2
pageinspect – one row inserted postgres =# SELECT * FROM page_header ( get_raw_page ('benchmark_uuid2',0)); lsn | checksum | flags | lower | upper | special | pagesize | version | prune_xid -----------+----------+-------+-------+-------+---------+----------+---------+----------- 0/15E6380 | 0 | 0 | 28 | 8008 | 8192 | 8192 | 4 | 0 (1 row) postgres=# select lp, lp_off, case when lp_flags = 0 THEN 'LP_UNUSED' when lp_flags=1 THEN 'LP_NORMAL' when lp_flags=2 THEN 'LP_REDIRECT' when lp_flags=3 THEN 'LP_DEAD' END as lp_flags, lp_len, t_xmin, t_xmax, t_ctid,infomask(t_infomask, 1) as infomask,infomask(t_infomask2, 2) as infomask2 from heap_page_items(get_raw_page('benchmark_uuid2', 0)); lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_ctid | infomask | infomask2 ----+--------+-----------+--------+--------+--------+--------+--------------------------+----------- 1 | 8008 | LP_NORMAL | 184 | 1861 | 0 | (0,1) | XMAX_INVALID|HASVARWIDTH | (1 row) postgres=# SELECT * FROM bt_page_items('i_benchmark_uuid2_id', 1); itemoffset | ctid | itemlen | nulls | vars | data ------------+-------+---------+-------+------+------------------------- 1 | (0,1) | 16 | f | f | 04 00 00 00 00 00 00 00 (1 row)
pageinspect – regular update postgres =# update benchmark_uuid2 set last_updated = now() where id=4; UPDATE 1 postgres =# SELECT * FROM page_header ( get_raw_page ('benchmark_uuid2',0)); lsn | checksum | flags | lower | upper | special | pagesize | version | prune_xid -----------+----------+-------+-------+-------+---------+----------+---------+----------- 0/15E67A8 | 0 | 0 | 32 | 7824 | 8192 | 8192 | 4 | 1862 (1 row) postgres=# select lp, lp_off, case when lp_flags = 0 THEN 'LP_UNUSED' when lp_flags=1 THEN 'LP_NORMAL' when lp_flags=2 THEN 'LP_REDIRECT' when lp_flags=3 THEN 'LP_DEAD' END as lp_flags, lp_len, t_xmin, t_xmax, t_ctid,infomask(t_infomask, 1) as infomask,infomask(t_infomask2, 2) as infomask2 from heap_page_items(get_raw_page('benchmark_uuid2', 0)); lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_ctid | infomask | infomask2 ----+--------+-----------+--------+--------+--------+--------+----------------------------------+----------- 1 | 8008 | LP_NORMAL | 184 | 1861 | 1862 | (0,2) | XMIN_COMMITTED|HASVARWIDTH | 2 | 7824 | LP_NORMAL | 184 | 1862 | 0 | (0,2) | UPDATED|XMAX_INVALID|HASVARWIDTH | (2 rows) postgres=# SELECT * FROM bt_page_items('i_benchmark_uuid2_id', 1); itemoffset | ctid | itemlen | nulls | vars | data ------------+-------+---------+-------+------+------------------------- 1 | (0,2) | 16 | f | f | 04 00 00 00 00 00 00 00 2 | (0,1) | 16 | f | f | 04 00 00 00 00 00 00 00 (2 rows)
HOT Update heap 1 lp index leaf index A index B index C tuple v1 block0 block1
HOT Update heap 1 lp index leaf index A index B index C tuple v1 block0 block1 tuple tuple tuple tuple tuple tuple tuple tuple tuple v2
HOT Update heap 1 lp index leaf index A index B index C tuple v1 block0 block1
HOT Update heap 1 lp index leaf index A index B index C tuple v1 block0 block1
HOT Update heap 1 2 lp index leaf index A index B index C tuple v1 block0 block1 tuple v2
HOT Update heap 1 2 lp index leaf index A index B index C tuple v1 block0 block1 tuple v2
HOT Update heap 3 1 2 lp index leaf index A index B index C tuple v1 block0 block1 tuple v2 tuple v3
HOT Update heap 3 1 2 lp index leaf index A index B index C tuple v1 block0 block1 tuple v2 tuple v3
HOT Update heap 3 1 2 lp index leaf index A index B index C tuple v1 block0 block1 tuple v2 tuple v3
HOT Update heap 3 1 2 lp index leaf index A index B index C tuple v1 block0 block1 tuple v2 tuple v3
HOT Update heap 3 1 2 lp index leaf index A index B index C tuple v1 block0 block1 tuple v2 tuple v3
pageinspect – first hot update postgres =# update benchmark_uuid2 set e=cast(0 as boolean ) where id = 5; UPDATE 1 postgres =# SELECT * FROM page_header ( get_raw_page ('benchmark_uuid2',0)); lsn | checksum | flags | lower | upper | special | pagesize | version | prune_xid -----------+----------+-------+-------+-------+---------+----------+---------+----------- 0/15EC1A0 | 0 | 0 | 32 | 7824 | 8192 | 8192 | 4 | 1865 (1 row) postgres=# select lp, lp_off, case when lp_flags = 0 THEN 'LP_UNUSED' when lp_flags=1 THEN 'LP_NORMAL' when lp_flags=2 THEN 'LP_REDIRECT' when lp_flags=3 THEN 'LP_DEAD' END as lp_flags, lp_len, t_xmin, t_xmax, t_ctid,infomask(t_infomask, 1) as infomask,infomask(t_infomask2, 2) as infomask2 from heap_page_items(get_raw_page('benchmark_uuid2', 0)); lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_ctid | infomask | infomask2 ----+--------+-----------+--------+--------+--------+--------+----------------------------------+----------------- 1 | 8008 | LP_NORMAL | 184 | 1864 | 1865 | (0,2) | XMIN_COMMITTED|HASVARWIDTH | HOT_UPDATED 2 | 7824 | LP_NORMAL | 184 | 1865 | 0 | (0,2) | UPDATED|XMAX_INVALID|HASVARWIDTH | HEAP_ONLY_TUPLE (2 rows) postgres=# SELECT * FROM bt_page_items('i_benchmark_uuid2_id', 1); itemoffset | ctid | itemlen | nulls | vars | data ------------+-------+---------+-------+------+------------------------- 1 | (0,1) | 16 | f | f | 05 00 00 00 00 00 00 00 (1 row)
pageinspect – second hot update postgres =# update benchmark_uuid2 set e=cast(1 as boolean ) where id = 5; UPDATE 1 postgres =# SELECT * FROM page_header ( get_raw_page ('benchmark_uuid2',0)); lsn | checksum | flags | lower | upper | special | pagesize | version | prune_xid -----------+----------+-------+-------+-------+---------+----------+---------+----------- 0/15EC4C8 | 0 | 0 | 36 | 7640 | 8192 | 8192 | 4 | 1865 (1 row) postgres =# select lp , lp_off , case when lp_flags = 0 THEN 'LP_UNUSED' when lp_flags =1 THEN 'LP_NORMAL' when lp_flags =2 THEN 'LP_REDIRECT' when lp_flags =3 THEN 'LP_DEAD' END as lp_flags , lp_len , t_xmin , t_xmax , t_ctid,infomask ( t_infomask , 1) as infomask,infomask (t_infomask2, 2) as infomask2 from heap_page_items ( get_raw_page ('benchmark_uuid2', 0)); lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_ctid | infomask | infomask2 ----+--------+-----------+--------+--------+--------+--------+-------------------------------------------+----------------------------- 1 | 8008 | LP_NORMAL | 184 | 1864 | 1865 | (0,2) | XMAX_COMMITTED|XMIN_COMMITTED|HASVARWIDTH | HOT_UPDATED 2 | 7824 | LP_NORMAL | 184 | 1865 | 1866 | (0,3) | UPDATED|XMIN_COMMITTED|HASVARWIDTH | HOT_UPDATED|HEAP_ONLY_TUPLE 3 | 7640 | LP_NORMAL | 184 | 1866 | 0 | (0,3) | UPDATED|XMAX_INVALID|HASVARWIDTH | HEAP_ONLY_TUPLE (3 rows) postgres =# SELECT * FROM bt_page_items ('i_benchmark_uuid2_id', 1); itemoffset | ctid | itemlen | nulls | vars | data ------------+-------+---------+-------+------+------------------------- 1 | (0,1) | 16 | f | f | 05 00 00 00 00 00 00 00 (1 row)
HOT Update - Pruning heap 3 1 2 lp index leaf index A index B index C tuple v1 block0 block1 tuple v2 tuple v3
HOT Update - Pruning heap 3 1 2 lp index leaf index A index B index C block0 block1 tuple v2 tuple v3 X
HOT Update - Pruning heap 3 1 2 lp index leaf index A index B index C block0 block1 tuple v2 tuple v3
HOT Update - Pruning heap 3 1 2 lp index leaf index A index B index C block0 block1 tuple v3
Fillfactor heap tuple v1 block0 block1 tuple tuple tuple tuple tuple tuple tuple tuple block2 fillfactor =100 (default) Continuous inserts and update the same single tuple 100 times
Fillfactor heap tuple v1 block0 block1 tuple tuple tuple tuple tuple tuple tuple tuple tuple v2 block2 tuple tuple tuple tuple tuple tuple tuple tuple fillfactor =100 (default) Continuous inserts and update the same single tuple 100 times
Fillfactor heap tuple v1 block0 block1 tuple tuple tuple tuple tuple tuple tuple tuple tuple v2 tuple v3 block2 tuple tuple tuple tuple tuple tuple tuple tuple fillfactor =100 (default) Continuous inserts and update the same single tuple 100 times
Fillfactor heap tuple v1 block0 block1 tuple tuple tuple tuple tuple tuple tuple block2 fillfactor =90 Continuous inserts and update the same single tuple 100 times
Fillfactor heap tuple v1 block0 block1 tuple tuple tuple tuple tuple tuple tuple block2 fillfactor =90 tuple v2 Continuous inserts and update the same single tuple 100 times
Fillfactor heap tuple v1 block0 block1 tuple tuple tuple tuple tuple tuple tuple block2 tuple tuple tuple tuple tuple tuple tuple fillfactor =90 tuple v2 tuple v3 tuple Continuous inserts and update the same single tuple 100 times
Fillfactor heap tuple v1 block0 block1 tuple block2 tuple fillfactor =10 tuple tuple tuple Continuous inserts and update the same single tuple 100 times
Fillfactor heap tuple v1 block0 block1 tuple block2 tuple fillfactor =10 tuple tuple v2 tuple tuple Continuous inserts and update the same single tuple 100 times
Fillfactor heap tuple v1 block0 block1 tuple block2 tuple fillfactor =10 tuple tuple v2 tuple v3 tuple tuple Continuous inserts and update the same single tuple 100 times
Fillfactor on Insert & Single Update Workload Insert at 2K TPS and update one row 100 times while having a long running transaction Fillfactor Single Key Fetch Table Scan 100 90 5 10 101 blocks 18 blocks 5 blocks 3 blocks 5.5K blocks 6K blocks 11K blocks 60K blocks Bloat comes in many disguises
Fillfactor on Insert & Single Update Workload Insert at 2K TPS and update one row 100 times while having a long running transaction Fillfactor Single Key Fetch Table Scan 100 90 5 10 5.5K blocks 6K blocks 11K blocks 60K blocks 2 blocks 1 block 1 block Bloat comes in many disguises
Fillfactor on Insert & Single Update Workload Insert at 2K TPS and update one row 100 times while having a long running transaction Fillfactor Single Key Fetch Table Scan 100 90 5 10 5.5K blocks 6K blocks 11K blocks 60K blocks 2 blocks 1 block 1 block 1 block Bloat comes in many disguises
Measuring Longest Running Transaction postgres =# select max(now() - xact_start ) from pg_stat_activity ; max ----------------- 02:02:48.021408 2+ hours
One the fly cleanup - HEAP heap 1 2 lp index leaf index A index B index C tuple v1 block0 block1 tuple v2
One the fly cleanup - HEAP heap 1 2 lp index leaf index A index B index C block0 block1 tuple v2
One the fly cleanup - HEAP heap 1 2 lp index leaf index A index B index C block0 block1 tuple v2
Is that an update? ORM cases ORM and other software sometimes update all columns in table including all indexed columns UPDATE to NEW VALUE postgres=# update benchmark_uuid2 set last_updated = now() where id=2; UPDATE 1 postgres=# select n_tup_upd, n_tup_hot_upd from pg_stat_all_tables where relname = 'benchmark_uuid2'; n_tup_upd | n_tup_hot_upd -----------+--------------- 1 | 0 UPDATE to SAME VALUE (i.e. ORM case) postgres=# update benchmark_uuid2 set last_updated = ( select last_updated benchmark_uuid2 where id=2) where id=2; UPDATE 1 postgres=# select n_tup_upd, n_tup_hot_upd from pg_stat_all_tables where relname = 'benchmark_uuid2'; n_tup_upd | n_tup_hot_upd -----------+--------------- 2 | 1 (1 row) !=
Index Tests – A table with 2 to 64 indexes Create the table with a PK, 64 Random int Columns and 1 last updated timestamp create table benchmark_serial ( pk serial constraint pk_benchmark_serial_pk PRIMARY KEY, a1 int not null , ….. a64 int not null, last_updated timestamp ); Create between 2 and 64 indexes on the random int columns create index i_benchmark_serial_a1 on benchmark_serial (a1 ); …. create index i_benchmark_serial_a64 on benchmark_serial (a64); For the Regular Test create index i_benchmark_serial_lu on benchmark_serial ( last_updated );
Full Page Writes Block in Memory PostgreSQL update t set y = 6; Full Block WAL
Full Page Writes Block in Memory PostgreSQL update t set y = 6; Full Block WAL
Full Page Writes Block in Memory PostgreSQL update t set y = 6; Checkpoint Datafile Full Block WAL Archive
Full Page Writes Block in Memory PostgreSQL update t set y = 6; Checkpoint Datafile Full Block WAL Archive 4K 4K 8K
Full Page Writes Block in Memory PostgreSQL update t set y = 6; Checkpoint Datafile Full Block WAL Archive 4K 4K 8K During crash recovery PostgreSQL uses the FPW block in the WAL to replace the bad checkpointed block
1-200 1-100 101-200 1-50 51-100 1-25 26-50 51-75 76-100 101-150 151-200 101-125 126-150 151-175 176-200 Insert a Sequence Number into a B-tree
1-200 1-100 101-200 Insert 201 1-50 51-100 1-25 26-50 51-75 76-100 101-150 151-200 101-125 126-150 151-175 176-200 1-201 101-201 151-201 176-201 4 blocks loaded Insert a Sequence Number into a B-tree
1-200 1-100 101-200 Insert 201 1-50 51-100 1-25 26-50 51-75 76-100 101-150 151-200 101-125 126-150 151-175 176-200 Insert 202 1-201 101-201 151-201 176-201 1-202 101-202 151-202 176-202 4 blocks loaded Insert a Sequence Number into a B-tree 0 blocks loaded
1-200 1-100 101-200 Insert 201 1-50 51-100 1-25 26-50 51-75 76-100 101-150 151-200 101-125 126-150 151-175 176-200 Insert 202 1-201 101-201 151-201 176-201 1-202 101-202 151-202 176-202 4 blocks loaded Insert a Sequence Number into a B-tree 0 blocks loaded At least 1 FPW
1-200 1-100 101-200 Insert 124 1-50 51-100 1-25 26-50 51-75 76-100 101-150 151-200 101-125 126-150 151-175 176-200 1-200 101-200 101-150 101-125 4 blocks loaded Insert a Random value into a B-tree
1-200 1-100 101-200 Insert 124 1-50 51-100 1-25 26-50 51-75 76-100 101-150 151-200 101-125 126-150 151-175 176-200 Insert 99 1-200 101-200 101-150 101-125 4 blocks loaded 1-100 51-100 76-100 3 blocks loaded Insert a Random value into a B-tree
1-200 1-100 101-200 Insert 124 1-50 51-100 1-25 26-50 51-75 76-100 101-150 151-200 101-125 126-150 151-175 176-200 Insert 99 1-200 101-200 101-150 101-125 4 blocks loaded 1-100 51-100 76-100 3 blocks loaded Insert a Random value into a B-tree 151-200 151-200 2 blocks loaded Insert 161 At least 3 FPW
HOT Updates – Looking at FPW in the logs HOT Updated Heap 14/ 68, , d: HOT_UPDATE off 19 xmax 2327993188 ; new off 3 xmax 0, blkref #0: rel 1663/41083/41086 blk 28 XLOG 0/ 3368 , , d: FPI_FOR_HINT , blkref #0: rel 1663/41083/41092 blk 1492899 FPW Transaction 8/ 34, , d: COMMIT 2017-09-07 00:07:17.532647 UTC Non HOT Update Heap 14/ 75, , d: UPDATE off 67 xmax 2327993195 ; new off 7 xmax 0, blkref #0: rel 1663/41083/41086 blk 285 XLOG 0/ 2774 , , d: FPI_FOR_HINT , blkref #0: rel 1663/41083/41090 blk 7039952 FPW Btree 2/ 120, , d: INSERT_LEAF off 17, blkref #0: rel 1663/41083/41090 blk 7039952 XLOG 0/ 3150 , , d: FPI_FOR_HINT , blkref #0: rel 1663/41083/41092 blk 29 FPW Btree 2/ 64, , d: INSERT_LEAF off 205, blkref #0: rel 1663/41083/41092 blk 29 Btree 2/ 2639 , , d: INSERT_LEAF off 73, blkref #0: rel 1663/41083/41093 blk 4 FPW Btree 2/ 3148 , , d: INSERT_LEAF off 2, blkref #0: rel 1663/41083/41094 blk 1 FPW Btree 2/ 5099 , , d: INSERT_LEAF off 364, blkref #0: rel 1663/41083/41095 blk 4237904 FPW Transaction 8/ 34, , d: COMMIT 2017-09-07 00:24:29.427017 UTC 3.4K VS 16.7K
35X
Index Tests – Vacuuming benchdb => vacuum verbose benchmark_serial ; INFO: vacuuming " public.benchmark_serial " INFO: scanned index " pk_benchmark_serial_pk " to remove 1939503 row versions DETAIL: CPU 0.05s/4.93u sec elapsed 5.52 sec INFO: scanned index "i_benchmark_serial_a1" to remove 1939503 row versions DETAIL: CPU 0.00s/9.52u sec elapsed 9.67 sec INFO: scanned index "i_benchmark_serial_a2" to remove 1939503 row versions DETAIL: CPU 0.02s/9.26u sec elapsed 9.43 sec INFO: scanned index "i_benchmark_serial_a3" to remove 1939503 row versions DETAIL: CPU 0.00s/9.22u sec elapsed 9.49 sec INFO: scanned index "i_benchmark_serial_a4" to remove 1939503 row versions DETAIL: CPU 0.00s/9.05u sec elapsed 9.23 sec INFO: scanned index "i_benchmark_serial_a5" to remove 1939503 row versions DETAIL: CPU 0.00s/9.13u sec elapsed 9.44 sec INFO: scanned index "i_benchmark_serial_a6" to remove 1939503 row versions DETAIL: CPU 0.00s/9.21u sec elapsed 9.37 sec INFO: scanned index "i_benchmark_serial_a7" to remove 1939503 row versions DETAIL: CPU 0.01s/9.18u sec elapsed 9.57 sec INFO: scanned index "i_benchmark_serial_a8" to remove 1939503 row versions DETAIL: CPU 0.00s/9.22u sec elapsed 9.49 sec INFO: scanned index "i_benchmark_serial_a9" to remove 1939503 row versions ….. DETAIL : CPU 0.02s/9.03u sec elapsed 9.56 sec INFO: scanned index "i_benchmark_serial_a16" to remove 1939503 row versions ….. Serial
101X
Example – Keep track of table changes PK C1 C2 C3 C4 C4 Last Updated 1 X A1 FOO HOT 9 01-Jun-1999 2 Y A2 BAR IS 9 01-Jul-2001 3 Z A2 RLL REALLY 7 19-OCT-2009 4 A A1 MFM USEFUL 2 21-OCT-1972
Example – Keep track of table changes PK C1 C2 C3 C4 C4 Last Updated 1 X A1 FOO HOT 9 01-Jun-1999 2 Y A2 BAR IS 9 01-Jul-2001 3 Z A2 RLL REALLY 7 19-OCT-2009 4 A A1 MFM USEFUL 2 21-OCT-1972 Data Lake
Example – Keep track of table changes PK C1 C2 C3 C4 C4 Last Updated 1 X A1 FOO HOT 9 01-Jun-1999 2 Y A2 BAR IS 9 01-Jul-2001 3 Z A2 RLL REALLY 7 19-OCT-2009 4 A A1 MFM USEFUL 2 21-OCT-1972 Data Lake Full Table Scan
Example – Build an index on Last Updated PK C1 C2 C3 C4 C4 Last Updated 1 X A1 FOO HOT 9 01-Jun-1999 2 Y A2 BAR IS 9 01-Jul-2001 3 Z A2 RLL REALLY 8 15-OCT-2018 4 A A1 MFM USEFUL 2 21-OCT-1972 Data Lake Index Scan
Example – Build an index on Last Updated PK C1 C2 C3 C4 C4 Last Updated 1 X A1 FOO HOT 9 01-Jun-1999 2 Y A2 BAR IS 9 01-Jul-2001 3 Z A2 RLL REALLY 8 15-OCT-2018 4 A A1 MFM USEFUL 2 21-OCT-1972 Data Lake Index Scan Before: Updates to C1,C3,C4, Last Updated are HOT After: Every update is a regular update
Example – Logical Replication PK C1 C2 C3 C4 C4 Last Updated 1 X A1 FOO HOT 9 01-Jun-1999 2 Y A2 BAR IS 9 01-Jul-2001 3 Z A2 RLL REALLY 7 19-OCT-2009 4 A A1 MFM USEFUL 2 21-OCT-1972 All changes WAL logged WAL Logical Decoding a nd Replication Data Lake