MySQL Indexing

badraccount 135 views 43 slides Dec 29, 2016
Slide 1
Slide 1 of 43
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9
Slide 10
10
Slide 11
11
Slide 12
12
Slide 13
13
Slide 14
14
Slide 15
15
Slide 16
16
Slide 17
17
Slide 18
18
Slide 19
19
Slide 20
20
Slide 21
21
Slide 22
22
Slide 23
23
Slide 24
24
Slide 25
25
Slide 26
26
Slide 27
27
Slide 28
28
Slide 29
29
Slide 30
30
Slide 31
31
Slide 32
32
Slide 33
33
Slide 34
34
Slide 35
35
Slide 36
36
Slide 37
37
Slide 38
38
Slide 39
39
Slide 40
40
Slide 41
41
Slide 42
42
Slide 43
43

About This Presentation

The idea behind database indexing and the implementation of indexes in MySQL, one of the most famous opensource databases out there.


Slide Content

MySQL indexing

Agenda
●Indexes
-What is index
-B-Tree
-More about indexes

●Queries
-Temporary Tables and filesort in MySQL
-GROUP BY optimization
-Order By Optimization

Indexing in the Nutshell
●Indexes is a data structure which is created and
targeted to speed access to database to make your
query run faster.
●Queries can be ran without any indexes but it can take
really long time.

What Does That mean?

Overhead of The Indexing
Writes:
Updating data means updating index

Reads:
Additional Indexes lead to wasting space and memory, and
also additional overhead during query optimization.

Costly, don’t add more indexes than you need.

Types of Indexes
●BTREE => Majority of indexes in MySQL
●RTREE => MyISAM only, for GIS
●HASH => MEMORY, NDB
●FULLTEXT => MyISAM, Innodb starting from 5.6

BTREE is default index except for MEMORY engine.
RTREE is for queries like show me all cities within 100 mile of Alex.
NDB is a MySQL Cluster Storage Engine.
FULLTEXT InnoDB, have an inverted index design.

B-Tree
B-Trees were described originally as
generalizations of binary search trees BST.
The generalization is that instead of one value,
the node has a list of values,
and the list is of size n ( n > 2 ).


BST

B+Tree
Branch/Root Node

less than 3



Leaf Node


Data Pointers

●SELECT * FROM table where id=2
●range scan SELECT * from table where id in (2,4,6)
●innodb, pointers in two directions optimized for range scan
●have free space, data can be added later.

B+Tree
InnoDB uses a B+Tree structure for its indexes. A B+Tree is
particularly efficient when data doesn’t fit in memory and
must be read from the disk, as it ensures that a fixed
maximum number of reads would be required to access
any data requested, based only on the depth of the tree,
which scales nicely.

B-Tree VS B+Tree
●B+ trees don't store data pointer in interior nodes, they are ONLY
stored in leaf nodes. This is not optional as in B-Tree. This means
that interior nodes can fit more keys on block of memory.
●The leaf nodes of B+ trees are linked, so doing a linear scan of all
keys will requires just one pass through all the leaf nodes. A B
tree, on the other hand, would require a traversal of every level
in the tree. This property can be utilized for efficient search as
well since data is stored only in leafs.

What Operations can B+Tree do?
●Find all rows with KEY=5 (point lookup)
●Find all rows with KEY>5 (open range)
●Find all rows with 5<KEY>10 (closed range)
●Can not find rows where last digit of the KEY is Zero

Summary
●Linear search is very slow, complexity is O(n)
●Indexes improve search performance.
●Many different type of indexes.
●B-Tree Indexes and derivatives (MyISAM, InnoDB)
●But add extra cost to INSERT/UPDATE/ DELETE

Indexes in MyISAM vs Innodb
MyISAM:
data pointers points to physical offset in the data file.

Innodb:
Primary Keys: stores data in the leaf pages of the
index, not pointer.
Secondary Keys: stores Primary Keys as data pointer.

Indexing Innodb table
1.Every table has a primary key; if the CREATE TABLE does not specify one,
the first non-NULL unique key is used, and failing that, a 48-bit hidden
“Row ID” field is automatically added to the table structure and used as
the primary key. Always add a primary key yourself. The hidden one is
useless to you but still costs 6 bytes per row.
2.The “row data” (non-PRIMARY KEY fields) are stored in the PRIMARY KEY
index structure, which is also called the “clustered key”. This index
structure is keyed on the PRIMARY KEY fields, and the row data is the
value attached to that key.
3.Secondary keys are stored in an identical index structure, but they are
keyed on the KEY fields, and the primary key value (PKV) is attached to
that key.

Indexing Innodb table
Data is clustered by Primary Key
-For comments (POST_ID, COMMENT_ID) can be good
PRIMARY KEY, storing all comments for a single post
close together.
-Primary Key is implicitly appended to all indexes

KEY(A, B, C) -- ORDER of columns matters
Index is USED
●A>5
●A=5 AND B>6
●A=5 AND B=6 AND C=7
●A=5 AND B IN (2,3) AND C>5
Index is NOT USED
●B<5
●B=6 AND C=7
●A>5 AND B=2 -> range on first column, only use this key part
●A=5 AND B>6 AND C=2 -> range on second column, use 2 parts



Multiple Column Index

First Rule of MySQL Optimizer
MySQL will stop using key parts in multi parts index as
soon as it met real range (>, <, BETWEEN), it however is
able to continue key parts further to the right if IN (..)
range is used.

Use Index for Sorting
SELECT * FROM players ORDER BY score DESC LIMIT 10
●Use index on score column
●Without the index, MySQL will do “filesort” (very
expensive)
SELECT * FROM players WHERE country=”US” ORDER BY
score DESC LIMIT 10
●Best served by index on KEY(country, score)

Multi Column Indexes for efficient sorting
KEY(A, B)
Index is USED
●ORDER BY A
●A=5 ORDER BY B
●ORDER BY A DESC, B DESC
●A>5 ORDER BY A
Index is NOT USED
●ORDER BY B
●A>5 ORDER BY B
●A IN (1,2) ORDER BY B
●ORDER BY A ASC, B DESC

MySQL Index for Sorting Rules
●You can’t sort in different order by 2 columns
●You can only have equality comparison for columns
which are not part of the ORDER BY

Avoiding Reading The data
Covering Index
●Reading Index only, not accessing the data.
SELECT status FROM orders where csutomer_id=123
KEY(customer_id, status)
●Help Min/Max aggregate functions (Only)
SELECT MAX(salary) FROM employees GROUP BY dept_id
KEY(dept_id, salary)

Indexes and Joins
SELECT * FROM posts, comments WHERE author=”Peter”
AND comments.post_id = posts.id
-Scan posts table for posts with author=”Peter”
-For each row, go to the comments table and fetch
related comments.

●Index comments.post_id
●Index posts.id is not used in this case.

Using multiple indexes for The Table
MySQL can use more than one Index.
SELECT * FROM table WHERE A=5 AND B=6
Can use KEY(A) & KEY(B)
KEY(A, B) is much better.

SELECT * FROM table WHERE A=5 OR B=6
2 separate indexes is as good as it gets.
KEY(A, B) can’t be used.

Let’s Practice
●Combined Index
●Covered Index

Indexes
CREATE TABLE City (
ID int(11) NOT NULL AUTO_INCREMENT,
Name char(35) NOT NULL DEFAULT ‘’,
CountryCode char(3) NOT NULL DEFAULT ‘’,
District char(20) NOT NULL DEFAULT ‘’,
Population int(11) NOT NULL DEFAULT ‘0’,
PRIMARY KEY (ID),
KEY CountryCode (CountryCode)
) Engine=InnoDB;

MySQL Will Use Best Index

Combined Index Example
Combined index is an index that contains multiple keys.

MySQL can use leftmost part of any index.

Leftmost Part of Combined Index
key_len = total size (in bytes) of index parts used.

Combined Indexes: Example
Index: comb(CountryCode, District, Population)
2 leftmost Fields

Combined Indexes: Example
Can not use combined index if not left most

Covered Index: Example
Covered index = cover all fields in the query

ALTER TABLE City ADD KEY cov1(CountryCode, District, population,
name);

Fields order:
1.Where clause
2.Group By/Order By (Not Used Now)
3.Select part (name)

Covered Index: Example

Covered Index: Example
Range & Const condition.

SELECT name FROM City where district=”California” AND population >
30000

Index (district, population, name) in this order.

Rule of thumb:
Const first the Range comes second. (depends on the Query)

Complex Slow Queries
●GROUP BY
●ORDER BY

GROUP BY Queries
How many cities in each country?

What is Filesort?
The truth is, filesort is badly named. Anytime a sort can’t
be performed from an index, it’s a filesort. It has nothing
to do with files.

Temporary tables I
MySQL creates temporary tables when query uses:
⚪GROUP BY
⚪Range + ORDER BY
⚪Some other expressions

2 types of temporary tables:
⚪MEMORY
⚪On-disk

Temporary tables II
First MySQL create temporary table in memory

MySQL configuration variables:
tmp_table_size
⚪maximum size for in Memory temporary tables
max_heap_table_size
⚪Sets the maximum size for MEMORY tables

Temporary tables III

Indexes: Theory
●MySQL choose one best index per table.
●Supports combined index.
●Order of Fields in combined index matter.
●MySQL can use leftmost part of any index.
●MySQL can use index to satisfy GROUP BY/ORDER BY

Questions

References
https://www.youtube.com/watch?v=TPFibi2G_oo
https://www.youtube.com/watch?v=zeRqU0SlJa4
https://www.youtube.com/watch?v=83YSEiwje1A (sound/slides syncing problem, but good video.)
https://www.percona.com/live/mysql-conference-2013/sites/default/files/slides/Percona_Conference_2013_Query_Optimization.pdf
https://www.percona.com/live/mysql-conference-2015/sites/default/files/slides/PL_2015_Query_Optimization.pdf
https://www.percona.com/files/presentations/percona-live/london-2011/PLUK2011-b-tree-indexes-and-innodb.pdf
https://web.stanford.edu/class/cs245/homeworks/b+tree/readings/MySQL_InnoDB.B+Tree.Reading1.pdf
http://blog.jcole.us/2013/01/10/btree-index-structures-in-innodb/
http://www.quora.com/What-are-the-differences-between-B+Tree-and-B-Tree
https://www.percona.com/blog/2009/09/19/multi-column-indexes-vs-index-merge/
https://guptavikas.wordpress.com/2012/12/17/b-tree-index-in-mysql/
http://stackoverflow.com/questions/870218/b-trees-b-trees-difference
https://www.percona.com/blog/2009/03/05/what-does-using-filesort-mean-in-mysql/
http://loveforprogramming.quora.com/Memory-locality-the-magic-of-B-Trees
https://en.wikibooks.org/wiki/Data_Structures/Trees#Binary_Search_Trees
Tags