MySQL NDB Cluster 101

ocklin 327 views 42 slides Feb 04, 2020
Slide 1
Slide 1 of 42
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9
Slide 10
10
Slide 11
11
Slide 12
12
Slide 13
13
Slide 14
14
Slide 15
15
Slide 16
16
Slide 17
17
Slide 18
18
Slide 19
19
Slide 20
20
Slide 21
21
Slide 22
22
Slide 23
23
Slide 24
24
Slide 25
25
Slide 26
26
Slide 27
27
Slide 28
28
Slide 29
29
Slide 30
30
Slide 31
31
Slide 32
32
Slide 33
33
Slide 34
34
Slide 35
35
Slide 36
36
Slide 37
37
Slide 38
38
Slide 39
39
Slide 40
40
Slide 41
41
Slide 42
42

About This Presentation

Introduction into MySQL NDB Cluster 8.0 at MySQL Day before FOSDEM 2020


Slide Content

MySQL Cluster
Proven in serving billions of people
every day when making phone calls,
playing online games or handling
financial transactions.

Massively linear scale
Always-On 99.9999% Availability
Sharded In-Memory Datasets
Always Consistent
Parallel Real-Time Performance.
Auto-partitioning, data distribution
and replication built-in.
Read- and Write Scale-Out
to many TB on commodity hardware.
Designed for mission critical
systems. Masterless, shared-nothing
with no single point of failure.
Transactional consistency across
distributed and partitioned dataset.
Out of the box straightforward
application programming.
Ease of use
Open Source
Written in C++. Can be used standalone
or with MySQL as a SQL front-end.

NDB is the highest throughput
distributed, in-memory, transactional
database in the world.
It’s open source!

Horizontal and vertical scale with MySQL Cluster
15TB in-memory per node

100+TB with on-disk
100+TB to 5 PB

in a single cluster
1 - 2GB/sec read/write 

transactions per shard
400k TPM DBT2

per 1 CPU socket
100+ data nodes
1-4 replicas
10k threads for parallel
query processing

Cluster drives systems with massive scale and real-time answering times
100M trades
analyzed 

every year
20B $ volume
every day
2+B mobile phone
users 

world wide

100+M in a
single installation
30+M active
gamers

every month
12+M in one
multi-player
> > >

MySQL Cluster linear scale
NoSQL Performance
Confidential – Oracle Internal/Restricted/Highly Restricted
Memory optimized tables
Durable
Mix with disk-based tables
Parallel table scans for non-indexed
searches
MySQL Cluster FlexAsych
200M NoSQL Reads/Second
!-!!!!
!50,000,000!!
!100,000,000!!
!150,000,000!!
!200,000,000!!
!250,000,000!!
2!4!6!8!10!12!14!16!18!20!22!24!26!28!30!32!
Reads!per!second!
Data!Nodes!
FlexAsync!Reads!

MySQL Cluster Industries
Telecom
Gaming & Massive Parallel Online Games
Financials

MySQL Cluster Use Cases
•IOT - applications that write massive amounts of data in short periods of
time
•Shopping carts - durable, fast and frequent updates
•User tracking and monitoring
•Authentication and session management systems - real-time user
verification
•User profile and directories - reliable real-time information store
•Messaging systems

Use cluster if you need
Performance
Read and write
scalability
Real-time in-memory
Super low guaranteed
latency
Auto-Sharding
Elastic
Consistency
Always on
Always consistent
ACID
Global consistent view
of the data data
Cross-shard, cross-
replica foreign keys
SQL and NoSQL
Distributed key value
Advanced joins
Parallel distributed
query
Highly Available
Simple HA
Simple programing

MySQL Cluster Usage
•Key Value store

+ High Availability, Scale-Out, Durability
•Transactional object store

+ Multi row transactions and consistency
•Relational database

+ SQL joins, foreign keys, triggers, stored procedures,
generated columns, JSON, GIS, …

Usages

Confidential – Oracle Internal/Restricted/Highly Restricted
Steel Belted Radius Carrier Server
Provides centralized Authentication, Authorization, and
Accounting (AAA or Triple A) management for users who
connect and use a network service.

https://www.juniper.net/documentation/en_US/sbr-carrier8.3.0
Juniper
Radius Server
Switch
Scalable Session State Register
MySQL Cluster

Bredbandsbolaget (B2)
B2 application Nexus to authenticate and grant
customers access to B2 services including
broadband, VoIP and TV broadcasting.
MySQL Cluster for mission critical
authentication and authorisation services.
Up and serving through on-line versions
upgrades and maintenance since 2003!
Real Time Network Operation Platform
Internal Backoffice Systems
Subscriber Profiles
Authentication Services
B2 Core Network

Stream processing in the financial industry
Live trading data
Kafka
Messages
Message
Formatting
Message Event Bus
NDB native
Client processors

Confidential – Oracle Internal/Restricted/Highly Restricted
Hadoop (HopsFS) 

with MySQL Cluster
NameNodes
Leader
HDFS Client
DataNodes
hops.io
ClusterJ
Small
Files

Architecture
Clients
Cluster front-
ends and
connectors
Data Nodes
Partitioning- and distribution engine

Deployment examples
Dev setup
App or MySQL Server
connecting directly
locally to single data
node.
High available setup
Co-located app/MySQL
Server and data nodes
using shared memory
connections.
Classical 3-tier setup
Redundant
SQL and NoSQL mixed
access
Multiple SQL nodes and
native node.js accessing
same datasets

Sharding and redundancy with
no single point of failure
Multiple copies of data are maintained for availability
A group of data nodes shares the same data
1 - 4 replicas

High availability
•Up to 4 replicas per shard
•All replicas active
•Always consistent
•Microsecond failover

High availability
•Majority for 3 and 4 nodes
•Arbitration for 2 replicas
•Proudly using 2-phase

Lets crash an Availability Zone (AD)
•No service 

interruption
source: LogicalClocks, MySQL Innovation Day, January 23, 2019

Synchronous replication locally and asynchronous between physical locations
•With awareness of data locality and availability domains for cloud
Cluster
Data Nodes
Replication
Btw, conflict detection 

and automated resolution.

Data distribution
•Auto-sharding and distribution
No name-node or central master
•Each dataset is split into fragments and
distributed across data nodes
•Within a cluster data is always transactionally
consistent
MySQL Cluster Data Nodes
PK Service Data
253 Tiktok xxx
892 Snapchat xxx
253 Discord xxx
739 Instagram xxx

Data distribution
•Partition of all data to thousands of
virtual partitions
•Virtual partitions distributed to data
nodes

PK Service Data
253 Tiktok xxx
892 Snapchat xxx
253 Discord xxx
739 Instagram xxx

Data distribution
•Deterministic, random distribution by default
•Data in up to 32 data node local partitions
•each controlled by their own threads
•key to parallel query processing
PK Service Data
253 Tiktok xxx
892 Snapchat xxx
253 Discord xxx
739 Instagram xxx

Data distribution and HA
•Each partition is synchronously replicated to
peer data nodes
•ReplicaSets are called Node Group
•All partitions active for read and write
PK Service Data
253 Tiktok xxx
892 Snapchat xxx
253 Discord xxx
739 Instagram xxx
Node Group Node Group

On-line Scaling and Elasticity - Repartitioning
•Virtual partitions re-distributed on-line when adding more data nodes
•Designed to be a slow background process not impacting real-time
performance.

On-line Scaling and Elasticity - Repartitioning
•Minimal amount of data moved
•No re-hashing necessary
•Similar to consistent hashing

Writing data
writes Data Memory
(RAM)
•Memory is locked so it won’t swap

Writing data
•Writes log to disk asynchronously in
background
- key to real-time in-memory transactions
•Partial checkpoints of data memory
- fast node starts and system recovery
Flush writes to disk in
background checkpoints
writes

Data Memory
(RAM)

Writing data
•Writes go to data memory and commit log
•In-memory data reads never involve disks
Flush writes to disk in
background checkpoints
Commit Log (REDO)
writes

time

Data Memory
(RAM)

Data Node
Multiple Threads per data node
Parallel execution of
… multiple queries from …
… multiple users on …
… multiple MySQL Servers
Communication with signals
Goal: minimize context switching
NIC
Main thread
Local
Data
Managers
Receive
Send
Disk/SSD IO
Data Memory
(RAM)

Lock free multi core VM
•Data is partitioned inside the data nodes
•Communication asynchronously, event driven
on cluster’s VM
•No group communication - instead using
Distributed row locks
Non-blocking 2-phase commit
NIC
Main thread
Local
Data
Managers
Receive
Send
Disk/SSD IO
Data Memory
(RAM)

Queries in multithreaded NDB Virtual Machine
•Even a single query from MySQL Server executed in parallel
Data Memory
(RAM)
QUERY …

Massive parallel system executing parallel queries
Receive Send
Transaction Data Manager
Data Node Data Node
Receive Send
Transaction Data Manager

Data distribution awareness
•Key-value with hash on
primary key
•Complemented by
ordered in-memory-
optimised T-Tree
indexes for fast
searches
For PK operations
NDB data partition
is simply calculated
PK Service Data
739 Instagram xxx

Consolidated view of distributed data
•Clients and MySQL
Servers see a
consolidated view of
the distributed data
• Joins are pushed down
to data nodes
• Parallel cross-shard
execution in the data
nodes
• Result consolidation in
MySQL Server
Consolidated view of distributed data

Parallel cross-partition queries
•Parallel execution on the
data nodes and within
data nodes
•64 cpus per node
leveraged
•parallelizes single queries
•144 data nodes 

x 32 partitions 

= 4608! CPUs
+ 32 other processing
threads per node
•automatic batching, event
driven and asynchronous
PK Service Data
253 Tiktok xxx
892 Snapchat xxx
253 Discord xxx
739 Instagram xxx

Parallel cross-partition queries
$ SELECT * FROM services
LEFT JOIN data USING(service)
Data Nodes
Service Data
Snapchat xxx
PK Service
892 Snapchat
PK Service Data
892 Snapchat xxx
… … …
Parallel execution of
single queries
on all data nodes and
within data nodes

MySQL Cluster - matured with advanced features
•Disk tables for even bigger data
•IO control for efficient disk usage
•Local domain / availability domain
•Shared memory connections
•Sub-second failover and self-healing
•Online scaling, maintenance and version upgrades
•Heartbeat mechanisms to support nontrivial network failures
•many more …

Thank You
Bernd Ocklin
Snr Director
MySQL Cluster Development