MySQL-InnoDB

mayankmysql 1,471 views 53 slides Nov 19, 2019
Slide 1
Slide 1 of 53
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9
Slide 10
10
Slide 11
11
Slide 12
12
Slide 13
13
Slide 14
14
Slide 15
15
Slide 16
16
Slide 17
17
Slide 18
18
Slide 19
19
Slide 20
20
Slide 21
21
Slide 22
22
Slide 23
23
Slide 24
24
Slide 25
25
Slide 26
26
Slide 27
27
Slide 28
28
Slide 29
29
Slide 30
30
Slide 31
31
Slide 32
32
Slide 33
33
Slide 34
34
Slide 35
35
Slide 36
36
Slide 37
37
Slide 38
38
Slide 39
39
Slide 40
40
Slide 41
41
Slide 42
42
Slide 43
43
Slide 44
44
Slide 45
45
Slide 46
46
Slide 47
47
Slide 48
48
Slide 49
49
Slide 50
50
Slide 51
51
Slide 52
52
Slide 53
53

About This Presentation

An overview of new enhancements/features done in MySQL-InnoDB in MySQL-8.0


Slide Content

InnoDB–MySQL Storage Engine
Mayank Prasad
InnoDB Developer

November 12, 2019
Copyright © 2019, Oracle and/or its affiliates. All rights reserved.

Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended
for information purposes only, and may not be incorporated into any contract. It
is not a commitment to deliver any material, code, or functionality, and should
not be relied upon in making purchasing decisions. The development, release,
and timing of any features or functionality described for Oracle’s products
remains at the sole discretion of Oracle.
2

Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
About Me
Mayank Prasad
•M.tech Computer Science.
•Consulting Member Technical Staff at Oracle-MySQL.
•Part of MySQL Server Development team since 2011.
•Worked in MySQL-Performance Schema Development for 6 years.
•Joined InnoDB development team in 2017.
•Reach me :
•@manku (Twitter)

Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
A brief intro
Latest Features
Performance & Scalability
Q&A
4
Agenda

Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
A brief intro
Latest Features
Performance & Scalability
Q&A
5
Agenda

Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |

 
[JDBC, ODBC, .NET, PHP, Python, C API, Perl]
 
 
Logs (bootstrap/error/slow query/binary/redo/undo), files (.ibd
files etc.)
[Java, .NET, PHP, Python, C, Perl]
Pluggable Storage Engine APIs Interface
Achieve Memory PFSMyISAM
Connection PoolAuthentication/Authorization
SQL Interface
(DDL, DML, SP, Triggers,
views etc.)
Tools/Utilities
(Backup/restore,
Upgrade, Import,
workbench, replication
etc.)
Parser
(Lexical analysis, Syntactic/
Semantic analysis, parse tree
generation etc.)
Optimizer
(Rewriting queries, order of scan,
plan generation etc.)
Others
MySQL Server
MySQL Connectors
Applications
Plugin Interface
(Daemon, I_S Table, Audit
Log, Query rewrite,
Password validation etc.)
Query Execution
(Sending/retrieving data to/from
SE)
Cache/Buffers
(Query cache, table
cache, etc.)
Disk
Custom
Plugins
MySQL Client
InnoDB
Storage Engines:
CREATE

FETCH
DELETE
UPDATE
MySQL
Server

Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
MySQL Storage Engines
•Pluggable infrastructure. Write a new SE and plug it in.
•InnoDB : Default SE since MySQL 5.5
•Transactional SE. ACID compliant.
•Row-level-locking, Crash recovery, MVCC.
•Foreign-key referential integrity constraint.
•Different SEs are for different purpose.
7
Introduction

Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Why InnoDB ?
•ACID compliant.
•High performance/reliability
•Row-level-locking. High concurrency.
•Foreign-key referential integrity
•Flexible with configuration variables to suite individual need.
•Customer/community feedback driven development
•…
8
Benefits

Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
A brief intro
Latest Features
Performance & Scalability
Q&A
9
Agenda

Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Data Dictionary
•Metadata of tables.
•Earlier :
•InnoDB maintains its own data dictionary for tables in it.
•Meta data information was kept on disk as well in different files (.frm etc).
•Mismatch possible. (b/w InnoDB Dictionary and .frm files on disk)
•Not crash proof.
10
Introduction
SQLs
I_S Tables
InnoDB
Dictionary
Disc
.frm files
.frm files
.frm files

Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Single New Data Dictionary
•Now :
•One source of truth – System Tables
•Persistent metadata stored in System Tables which are stored in InnoDB.
•Atomic DDL
•No more .frm and InnoDB data dictionary mismatch issues
•Control meta-data access using a single locking mechanism (MDL)
•Server supports the concept of Table-spaces
•.frm files were per table, made tablespace support messy
•All DD tables are in DD table space now.
11
Benefits
SQLs
I_S
Data
Dictionary
DD Tables
System Tables
Views
InnoDB
DD
Tablespace
DD Tables
System Tables

Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
●Upgrade from 5.7 only
●Make sure no crash and previous innodb_fast_shutdown is not set to 2
●Upgrade automatically
●Create new DD tables in DD tablespace
●Update all tables to new DD tables
●Upgrade to the new Undo tablespaces seamlessly
●Create SDI in all the .ibd files.
●Finally, legacy InnoDB system tables are dropped
New Data Dictionary : Upgrade

Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Serialized Dictionary Information (SDI)
●Purpose
●Redundancy. Metadata stored in the tablespace in addition to the DD
●To make the tablespace self descriptive
●Updated on DDL

Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
SDI tools
•A tool for extracting Serialized Dictionary Information (SDI)
•ibd2sdi
•Works offline and online
•Extracts the SDI data in JSON format
14

Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Serialized Dictionary Information (SDI)
["ibd2sdi"
,
{
"type": 1,
"id": 340,
"object":
{

"dd_object_type": "Table",
"dd_object": {
"name": "t1",
"mysql_version_id": 80019,
"created": 20191104101218,
"last_altered": 20191104101218,

"columns": [
{
"name": "id",
"type": 4,
"is_nullable": false,
"is_zerofill": false,
"is_unsigned": false,
"is_auto_increment": false,

"schema_ref": “test",

"indexes": [
{
"name": "PRIMARY",
"hidden": false,

"foreign_keys": [],
"check_constraints": [],
mysql> CREATE TABLE test.t1 (id INT KEY, name CHAR(100));
mayankprasad:~/mysql-bin$ ./bin/ibd2sdi data/test/t1.ibd


Offline

Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
SDI tools (Example)
•Useful during disaster recovery
•DD info is lost. Only tablespace (ibd) files are available.
•Extract SDI from tablespace (.ibd) file.
•Use SDI to restore table metadata and .ibd to restored data (IMPORT).
•Shared tablespace. One table t1 got corrupted.
•Extract the meta-data from the .ibd file into a separate .SDI file
•Manually edit .SDI file to remove t1 meta-data.
•Use edited .SDI file to import the resining table in tablespace.
16

Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
DDL_log table – For Atomic DDL
•Internal table. Resides in DD tablespace.
•No user DDL/DML allowed in this.
•To support atomic DDL – records physical file actions during DDL.
•A single DDL may generate several entries in this table.
•Changes are persisted immediately, exempted from
innodb_flush_log_at_trx_commit setting.
•Entries are deleted when DDL finishes.
17

Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
DDL_log table – For Atomic DDL
•Records physical changes on disk. Eg : tablespace file creation/drop, index trees
creation/drop, file rename etc.
•These physical operations in a DDL cannot be rolled back by a transaction.
•DDL transaction and this table together, makes Atomic (Crash safe DDL) possible
18

Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Example – CREATE TABLE with Atomic DDL
MySQL 5.7
•CREATE TABLE T1 …
•InnoDB creates the physical tablespaces/files(file-per-table), Cluster index tree
and other index trees
•InnoDB starts its own transactions to insert new table/indexes metadata to
InnoDB own System tables (note: separate dictionary in InnoDB)
•If server crashes after physical tablespace/files created, before metadata
updated, these files will be orphaned and left behind.

Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Example – CREATE TABLE with Atomic DDL
MySQL 8.0
•CREATE TABLE T1 …
•InnoDB creates the physical tablespaces/files(file-per-table), Cluster index tree
and other index trees
•InnoDB logs the operations in the DDL_LOG.
•SQL layer commits or rollbacks.
•Call to SE post_ddl(). If rollback, the post_ddl() physically deletes the tablespace/
ibd (file-per-table) and drops the index trees for the table.
•If server crashes, then during DDL recovery ddl_log is consulted to get rid of
orphaned and left behind files.

Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
ALTER TABLE – ADD COLUMN
●Biggest pain point for users
●New columns have to be added (to a big table) from time to time
●Table rebuild was needed. Because each record header keeps columns
specific info.
●Costly - time, disk, resource schedule
●Table lock
●Replication - Master does it. Then slave does the same. Lots of time.
●…

Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
ALTER TABLE - INSTANT ADD COLUMN
●ALTER TABLE t1 ADD COLUMN … ALGORITHM = INSTANT;
●New (default) algorithm
●Internals
●This will result in a metadata change only
●Existing records not touched. No table rebuild required
●Keep new Column’s default in Metadata for existing records.
●No table lock required
●Saves time/space/resources

Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Table
Old format rows
Old format rowsOld format rows
DML
INSERT
DML
UPDATE
ALTER TABLE ..
ADD COLUMN c, [],
ALGORITHM=INSTANT
Data Dictionary
Core DD tablesSystem tables
No update to table!
Number
of fields
Default
values
ALTER TABLE - INSTANT ADD COLUMN

Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Instant Add Column - Limitations
●Only support adding columns as the last column in the row
●No support for COMPRESSED tables, which is seldom used
●No support for tables with a full-text index
●No support for table residing in DD tablespace
●No support for temporary tables (it goes with COPY)

Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
UNDO Log
●No longer stored in the system table space
●Manage undo logs using SQL
●File names must end in ‘.ibu’ suffix.
●CREATE UNDO TABLESPACE undo001 ADD DATAFILE ‘undo001. ibu’;
●ALTER UNDO TABLESPACE undo001 SET INACTIVE;
●ALTER UNDO TABLESPACE undo001 SET ACTIVE;
●DROP UNDO TABLESPACE undo001;
●Default is two UNDO table spaces

Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Observability (Information Schema)
• New table, INNODB_CACHED_INDEXES
• Report pages cached in the InnoDB buffer pool for each index.
mysql> select * from INFORMATION_SCHEMA.INNODB_CACHED_INDEXES where INDEX_ID=144;
+----------+----------+----------------+
| SPACE_ID | INDEX_ID | N_CACHED_PAGES |
+----------+----------+----------------+
| 3 | 144 | 1 |
+----------+----------+----------------+
•Undo log status in INNODB_TABLESPACES
mysql> SELECT NAME, SPACE_TYPE, STATE FROM INFORMATION_SCHEMA.INNODB_TABLESPACES WHERE SPACE_TYPE="Undo" ORDER BY
NAME;

+-----------------+------------+--------+
| NAME | SPACE_TYPE | STATE |
+-----------------+------------+--------+
| innodb_undo_001 | Undo | active |
| innodb_undo_002 | Undo | active |
| undo003 | Undo | empty |
+-----------------+------------+--------+
26

Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Dedicated Server
•To utilise server resources to max.
•--innodb-dedicated-server := boolean (default OFF)
•Sets values based on physical memory available
•—innodb-log-file-size based on physical memory size
•—innodb-buffer-pool-size based on physical memory size
•—innodb-flush-method = O_DIRECT_NO_FSYNC
•Explicitly set values get preference.

Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
If phy_mem_size < 1G
Use InnoDB default value
Else If phy_mem_size <= 4GB
Use 50% of phy_mem_size
Else
Use 75% of phy_mem_size
End
--innodb-buffer-pool-size
Dedicated Server (contd.) :
If phy_mem_size < 1G
Use InnoDB default value
Else If phy_mem_size <= 4GB
Set to 128 MB
Else if phy_mem_size <= 8 GB
Set to 512 MB
Else if phy_mem_size <= 16 GB
Set to 1GB
Else
Set to 2GB
End
--innodb-log-file-size

Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Parallel Select Count(*) And Check Table
•Scans the sub-trees below the root node in parallel
•For very large tables it will dynamically break the scan to sub-ranges
•—innodb-parallel-thread-count = [1-256]
•Session variable. Default 4.
•Maximum threads 256.
•Scans partitions in parallel too
•Only works on the clustered index (yet)
•Foundation for future work for all non-locking range selects
•Stay tuned
29

Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Lock Manager
•Contribution by University of Michigan DB researchers
•Key idea is :
•transaction weight : How it blocks other transactions.
•Allow transaction with more weight to proceed
•Cab driver vs. Bus driver
•No configuration required. Switches b/w FIFO and CATS automatically.
•Threshold is >= 32 waiting Transactions. To make lower contention
work as it is.
30
CATS (Contention Aware Transaction Scheduling)

Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Lock Manager
•Performance Improvements
•Working on improving CATS

further.
•4K -> 15K TPS (512 clients)
•Blog link here
31
CATS (Contention Aware Transaction Scheduling)

Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Redesigned LOB infrastructure (for DocumentStore)
•For partial fetch and update
•BLOB pages are stored in singly list
•Finding a given offset is time

and I/O consuming
•Some byte in last page

of BLOB is to be read in

very large BLOB.
32
Uncompressed LOBs

Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Redesigned LOB infrastructure (for DocumentStore)
•New format
•LOB index pages
•Calculate which page to read

based on LOB index entries
•Then read that page only.
33
Uncompressed LOBs (contd.)

Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Redesigned LOB infrastructure (for DocumentStore)
•Performance Improvement
•500 Threads (10K->110K)
•Blog link here
34
Uncompressed LOBs (contd.)

Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Redesigned LOB infrastructure (for DocumentStore)
•Compressed pages

stored in singly linked

list.
•For small update, full

BLOB is to be 

uncompressed
35
Compressed LOBs

Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Redesigned LOB infrastructure (for DocumentStore)
•BLOB divided into chunks
•Chunks are compressed 

separately with index on it.
•For small update, a chunk

can be found and updated.
36
Compressed LOBs (contd.)

Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Redesigned LOB infrastructure (for DocumentStore)
•Improved performance 

results
•Blog link here
37
Compressed LOBs (contd.)

Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Encryption : Redo/Undo, General Tablespace
• Encrypt redo and undo log
•--innodb-redo-log-encrypt := bool
•--innodb-undo-log-encrypt := bool
• Shared tablespaces
•Support Encryption
•CREATE/ALTER TABLESPACE … ENCRYPTION=‘Y’/’N’
•Support Import/Export using SDI (WIP)

Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Removed Major I/O bottleneck
•All I/O was covered by a single mutex (fil_sys_t::mutex)
•Sharded the fil_sys_t::mutex
•64 shards
•Dedicated shard for redo log files and undo tablespaces
39

Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Performance Results
40
For 512 connections, 600K ->1M TPS.

Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Lock Free Redo log

New design (lock free)Old design

Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Performance Results

For 256 connections, 60K ->240K TPS.

Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Clone
•Simple and efficient way to create a MySQL replica
•Cloned replica is a physical snapshot with exact data.
•Can be a cloned locally or remotely
43

Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Clone
44

Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Clone
•Configuration parameters are not cloned
•Does not clone binary logs
•Only InnoDB support
•Tables in other storage engines will be cloned as empty tables
•Blocks all concurrent DDLs in Donor node
45

Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Clone : Observability
•performance_schema.clone_status
•performance_schema.clone_progress
STAGE STATE START TIME DURATION Estimate Done(%)
DROP DATA Completed 17:23:26 790.86 ms 0 MB 100%
FILE COPY Completed 17:23:27 10.33 ms 94,729 MB 100%
PAGE COPY Completed 17:33:47 15.91 s 11,885 MB 100%
REDO COPY Completed 17:34:03 1.07 s 293 MB 100%
FILE SYNC In Progress 17:34:04 51.68 s 0 MB 0%
RESTART Not Started NULL NULL 0 MB 0%
RECOVERY Not Started NULL NULL 0 MB 0%
46

Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Clone : Resource Control
•Limit network usage
•—clone-max-network-bandwidth (dynamic)
•Affects currently running Clone too
•—clone-enable-compression
•CPU consumption at Donor is going to increase
•Limit data usage
•—clone-max-data-bandwidth (dynamic)
•Set to reduce clone transfer rate to reduce impact
•To control Donor’s performance impact (less I/O).

47

Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Clone : Security
•Secure connection
•CLONE … REQUIRE SSL; — to force secure connection
•For encrypted tablespaces SSL is always enforced
•—clone-ssl* configurations for using certificates
•Clone user at donor instance must have
•Privilege to connect from recipient host user@host
•BACKUP_ADMIN privilege for data copy
•Clone user at recipient must have
•CLONE_ADMIN privilege to replace instance data
•Donor host name and port must be in the recipient configured list
•—clone-valid-donor-list [SYSTEM_VARIABLES_ADMIN privilege is needed]
48

Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Clone : Performance Control
•Clone plugin uses multiple connections from recipient to donor
server
•—clone-max-concurrency Limits maximum connections
•Helps in preventing CPU/IO/Network saturation

•Clone starts with 1 thread and threads are increased in steps
•—clone-autotune-concurrency [auto tune spawning threads]
49

Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Clone : Handles Network Failure
50

Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Clone : Data Directory
51
•Clone local instance data to a local data directory
mysql> CLONE LOCAL DATA DIRECTORY = “/absolute/path/data”;
•Clone remote instance data
•Instead of replacing current data directory
mysql> CLONE INSTANCE
FROM [email protected]:3306
IDENTIFIED BY “clone_password”
DATA DIRECTORY = “/absolute/path/data”;

Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Stay tuned to MySQL Server Blogs
http://mysqlserverteam.com
Thank you for using MySQL!