Demystifying Real-Time Analytics with TiDB

MyDBOPS, 33 slides, Jun 21, 2024

About This Presentation

Are you struggling to gain real-time insights from your data?

Mydbops MyWebinar Edition 33 can help you.

Discover how TiDB can revolutionize your analytics game!

Topic: Demystifying Real-Time Analytics with TiDB
Presenter: Kabilesh PR, Founding Partner, Mydbops

In today's data-driven worl...


Slide Content

Demystifying Real-Time Analytics With TiDB

Unlocking the Power of TiFlash for Real-Time Data Insights
Kabilesh PR
Co-Founder, Mydbops LLP
33rd MyWebinar - Mydbops

About Me
Kabilesh PR
❏Interested in Open Source DB technologies
❏Keen interest in MySQL, TiDB & distributed SQL
❏Active tech speaker/blogger
❏PingCAP Certified TiDB Professional
❏AWS Database Specialty
❏Founding Partner, Mydbops

Mydbops Services
Focus on MySQL, MongoDB, PostgreSQL, TiDB, Cassandra
Consulting Services
Managed Services
24*7 DBA Team
Targeted Engagement

Agenda

❏Introduction

❏TiDB Architecture

❏Understanding Real-time Analytics

❏Analytical Engine - TiFlash

❏Enabling TiFlash

❏Queries with TiFlash

Introduction

TiDB is an open source, distributed HTAP (Hybrid Transactional and Analytical Processing) database compatible with the MySQL protocol.

Understanding TiDB Architecture

The TiDB SQL layer is MySQL compatible and separates compute from storage to make scaling simpler.

The Placement Driver (PD) functions as an orchestrator. It is responsible for the TSO (timestamp oracle), scheduling, shard (Region) maintenance, metadata, and much more.

TiKV is row-based, transactional storage. It offers high availability and strong consistency, and can auto-scale to hundreds of nodes with petabyte-scale data.
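To see these components in a running cluster, TiDB exposes the CLUSTER_INFO system table, which lists every instance with its type (tidb, pd, tikv, tiflash). A minimal sketch, assuming a MySQL client session connected to any TiDB server:

```sql
-- List every component instance: TiDB (SQL layer), PD, TiKV and TiFlash
SELECT TYPE, INSTANCE, VERSION, UPTIME
FROM information_schema.cluster_info
ORDER BY TYPE, INSTANCE;

-- Show the version details of the TiDB server handling this connection
SELECT tidb_version();
```

Because TiDB speaks the MySQL protocol, these statements can be run from any standard MySQL client.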

Advantages of TiDB
Open Source
No vendor lock-in with a
database that's 100% open source.
Horizontal Scaling
Scales out transparently to applications
with automatic sharding; no manual sharding is needed.
High Availability
Guarantees auto-failover and self-healing for
continuous data access.
MySQL Compatibility
Enjoy the most MySQL-compatible
distributed SQL database on the planet.

Multi-Cloud
Deploy database clusters
anywhere in the world.

Mixed Workloads
A streamlined tech stack makes it
easier to produce real-time analytics.
Robust Security
Protect data with enterprise-grade
encryption both in-flight and at-rest.

Global Client Base of TiDB

Understanding Real-time Analytics

❏Real-time analytics: the process of analyzing data as it is created, collected, and processed, to provide
immediate insights that enable prompt decision-making.
❏Use cases:
Real-time fraud detection, market analysis, personalized recommendations, demand forecasting
❏Challenges:
Data Volume and Velocity
Integration
Data Quality
Cost

●With TiDB, get real-time insights as and when the business happens.
●Easy integration and maintenance.

Real-Time Analytical Engine = TiFlash

What is TiFlash?

❏An integrated columnar storage engine built exclusively for analytical workloads.
❏It is tightly integrated with TiKV and uses a ClickHouse-based coprocessor layer to run MPP (Massively
Parallel Processing) analytical queries.

Data Sync with TiFlash

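Data is synced to TiFlash using an extended Raft Learner algorithm: TiFlash joins the Raft group of each Region of a replicated table as a learner, which receives log entries asynchronously from TiKV but does not vote, so TiFlash does not block OLTP writes. Before serving a read, the learner checks the Raft index with the leader, so queries on TiFlash still see consistent data.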

Enabling TiFlash

❏Adding a TiFlash node online won't impact the OLTP workload.
scale-out-topology.yaml:
tiflash_servers:
  - host: 10.0.1.10
tiup cluster scale-out <cluster-name> scale-out-topology.yaml
❏After a TiFlash node is added, replication does not start by default.
❏Replication to TiFlash can be enabled at the table level or at the schema level.
ALTER TABLE table_name SET TIFLASH REPLICA count;
ALTER DATABASE db_name SET TIFLASH REPLICA count;
❏Monitoring of TiFlash replication:
SELECT * FROM information_schema.tiflash_replica;
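A concrete version of the steps above, as a sketch: the schema `shop` and table `orders` are hypothetical names used only for illustration, and one TiFlash replica is assumed because the example cluster has a single TiFlash node.

```sql
-- Hypothetical names: schema shop, table orders
-- Create one TiFlash replica for a single table
ALTER TABLE shop.orders SET TIFLASH REPLICA 1;

-- Or create one TiFlash replica for every table in the schema
ALTER DATABASE shop SET TIFLASH REPLICA 1;

-- Track replication: PROGRESS rises from 0 to 1 as data is synced,
-- and AVAILABLE becomes 1 once the replica can serve queries
SELECT TABLE_SCHEMA, TABLE_NAME, REPLICA_COUNT, AVAILABLE, PROGRESS
FROM information_schema.tiflash_replica
WHERE TABLE_SCHEMA = 'shop';
```

The replica count should not exceed the number of TiFlash nodes in the cluster; TiDB rejects the ALTER statement otherwise.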

Scaling TiFlash

❏Scaling TiFlash nodes out and in is done online and won't impact the OLTP workload.
Node addition (scale-out-topology.yaml):
tiflash_servers:
  - host: 10.0.1.10
  - host: 10.0.1.12
tiup cluster scale-out <cluster-name> scale-out-topology.yaml
Adjust the table / schema replica count:
ALTER TABLE table_name SET TIFLASH REPLICA count;
ALTER DATABASE db_name SET TIFLASH REPLICA count;
Node removal:
Lower the TiFlash replica count of affected tables so that it does not exceed the number of remaining TiFlash nodes (set it to 0 when removing all TiFlash nodes):
ALTER TABLE table_name SET TIFLASH REPLICA 0;
tiup cluster scale-in <cluster-name> --node <tiflash_node_id>
Verify the replica status:
SELECT * FROM information_schema.tiflash_replica;
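Before a scale-in it helps to find every table whose replica count is too high. A minimal sketch, assuming exactly one TiFlash node will remain after the removal (adjust the number for your cluster); it only generates the ALTER statements, which you then review and run.

```sql
-- Assumption: one TiFlash node remains after scale-in.
-- Generate an ALTER statement for each table whose TiFlash
-- replica count exceeds the remaining node count.
SELECT CONCAT('ALTER TABLE `', TABLE_SCHEMA, '`.`', TABLE_NAME,
              '` SET TIFLASH REPLICA 1;') AS fix_stmt
FROM information_schema.tiflash_replica
WHERE REPLICA_COUNT > 1;
```

When all TiFlash nodes are being removed, change the target to `SET TIFLASH REPLICA 0` and the filter to `REPLICA_COUNT > 0`.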

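Queries With TiFlash

Smart Selection

❏The TiDB optimizer automatically decides whether to use TiFlash replicas based on cost.
❏This works even for mixed workloads: the choice is made per table, so a single query can read some tables from TiKV and others from TiFlash.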
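To check which engine the optimizer picked, inspect the plan with EXPLAIN. A sketch against the hypothetical `shop.orders` table (assumed to have a TiFlash replica and `status` and `amount` columns):

```sql
-- Hypothetical table shop.orders with a TiFlash replica
EXPLAIN
SELECT status, COUNT(*) AS order_cnt, SUM(amount) AS revenue
FROM shop.orders
GROUP BY status;
-- In the output, a task column value such as cop[tiflash] or mpp[tiflash]
-- means the TiFlash replica was chosen; cop[tikv] means TiKV.
```

Large aggregations over many rows are the typical case where the cost model favours the columnar replica, while point lookups usually stay on TiKV.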

Engine Isolation

❏You can restrict read queries to replicas of specific engines, as shown below:
Config file (TiDB server):
[isolation-read]
engines = ["tikv", "tidb", "tiflash"]

Session:
SET SESSION tidb_isolation_read_engines = "engine list separated by commas";
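A session-level sketch of engine isolation, again using the hypothetical `shop.orders` table (assumed `created_at` and `amount` columns). The `tidb` entry refers to TiDB's internal memory tables, which hold some system tables, so it is kept in the list.

```sql
-- Restrict reads in this session to TiFlash (plus TiDB's internal memory tables)
SET SESSION tidb_isolation_read_engines = 'tiflash,tidb';

-- This analytical query on hypothetical shop.orders can now read only from TiFlash
SELECT DATE(created_at) AS order_day, SUM(amount) AS revenue
FROM shop.orders
GROUP BY DATE(created_at);

-- Restore the default engine list for this session
SET SESSION tidb_isolation_read_engines = 'tikv,tiflash,tidb';
```

If a table has no TiFlash replica, expect the query to fail with a "no access path" error rather than silently falling back to TiKV.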

Manual Hint

❏You can force TiDB to use a TiFlash replica by adding a hint to the query:

SELECT /*+ READ_FROM_STORAGE(TIFLASH[table_name]) */ ... FROM
table_name;
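The hint can also target individual tables in a join. A sketch with hypothetical tables `shop.orders` and `shop.customers`; when a table has an alias, the hint must name the alias rather than the table.

```sql
-- Hypothetical tables: read orders from TiFlash, customers from TiKV
SELECT /*+ READ_FROM_STORAGE(TIFLASH[o], TIKV[c]) */
       c.region, SUM(o.amount) AS revenue
FROM shop.orders AS o
JOIN shop.customers AS c ON c.id = o.customer_id
GROUP BY c.region;
```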

TiFlash Modes

MPP Mode

❏This mode executes a query in parallel across multiple TiFlash nodes.
❏TiDB automatically determines when to use MPP based on the optimizer's cost estimation.
❏Control variables: tidb_allow_mpp, tidb_enforce_mpp
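A sketch of forcing MPP for one session to compare plans, assuming the hypothetical `shop.orders` and `shop.customers` tables both have TiFlash replicas (an MPP join needs TiFlash replicas on both sides). `tidb_allow_mpp` is ON by default, so only `tidb_enforce_mpp` is changed here.

```sql
-- Ignore cost estimation and use MPP whenever it is possible
SET SESSION tidb_enforce_mpp = ON;

EXPLAIN
SELECT c.region, SUM(o.amount) AS revenue
FROM shop.orders AS o
JOIN shop.customers AS c ON c.id = o.customer_id
GROUP BY c.region;
-- Operators with task mpp[tiflash], such as ExchangeSender and
-- ExchangeReceiver, indicate that the query runs in MPP mode

-- Return to cost-based selection
SET SESSION tidb_enforce_mpp = OFF;
```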

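FastScan Mode

❏With FastScan, TiFlash provides more efficient query performance but sacrifices data
consistency.
❏This mode is disabled by default.
❏Query results might include old data of a table (for example, multiple versions of a row with the same primary key, or rows that have already been deleted).
❏Enable or disable it with the tiflash_fastscan system variable.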
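A minimal session-level sketch of toggling FastScan; `shop.orders` is a hypothetical table with a TiFlash replica, and the setting only affects reads served by TiFlash.

```sql
-- FastScan is OFF by default; enable it for this session only
SET SESSION tiflash_fastscan = ON;

-- Faster TiFlash scan; results may include stale row versions or
-- deleted rows, so use it only where that is acceptable
SELECT COUNT(*) FROM shop.orders;

-- Back to the default, consistent read path
SET SESSION tiflash_fastscan = OFF;
```

Keeping the change at session scope limits the weaker consistency to the reports that explicitly opt in.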

Any Questions?

Thank You