Data partitioning

VinodWilson, 26 slides, Sep 30, 2017

Data partitioning
Prepared by Vinod, Architect, Crestron Electronics

Why partition data?
The design of the data stores that an application uses can have a significant bearing on the performance, throughput, and scalability of a system.
[Diagram: Traditional model, one application stores and retrieves data in a single data store. Large-scale systems, applications 1 to N store and retrieve data in physically partitioned data stores 1 to N.]
Note: this is not the same as SQL Server table partitioning.

Benefits of partitioning data

Designing partitions

Partitioning strategies
Note: all three strategies described here can be combined.

Horizontal partitioning (sharding)
[Diagram: data split into shards by PartitionKey]
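A minimal sketch of horizontal partitioning, assuming hash-based placement: each row's PartitionKey is hashed to pick one of a fixed set of shards, so all rows that share a key land together. The shard names and the use of SHA-256 are illustrative assumptions; range-based or lookup-based placement are equally valid choices.

```python
import hashlib

# Hypothetical shard names; in practice each would be a separate data store.
SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3"]


def shard_for(partition_key: str, shards: list[str] = SHARDS) -> str:
    """Route a partition key to a shard with a stable hash.

    SHA-256 is used instead of the built-in hash(), which is randomized
    per process for strings and would route keys inconsistently.
    """
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


# All rows that share a PartitionKey end up on the same shard.
rows = [
    {"PartitionKey": "tenant-42", "order_id": 1},
    {"PartitionKey": "tenant-42", "order_id": 2},
    {"PartitionKey": "tenant-7", "order_id": 3},
]
placement: dict[str, list[dict]] = {}
for row in rows:
    placement.setdefault(shard_for(row["PartitionKey"]), []).append(row)
```

A plain modulo over the shard count keeps routing simple, but adding a shard remaps most keys, which is one reason shard rebalancing needs planning.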

Vertical partitioning
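Vertical partitioning splits the fields of one entity across stores, for example frequently read fields in one store and rarely used or bulky fields in another, all under the same key. The sketch assumes two in-memory dicts as stand-ins for those stores; the field split (`HOT_FIELDS`) and the product record are hypothetical.

```python
# Hypothetical split: fields needed on most reads go to the "hot" store,
# rarely used or bulky fields go to the "cold" store.
HOT_FIELDS = {"name", "price"}

hot_store: dict[int, dict] = {}
cold_store: dict[int, dict] = {}


def save_product(product_id: int, product: dict) -> None:
    """Store one entity's columns in two stores under the same key."""
    hot_store[product_id] = {k: v for k, v in product.items() if k in HOT_FIELDS}
    cold_store[product_id] = {k: v for k, v in product.items() if k not in HOT_FIELDS}


def load_product(product_id: int, include_details: bool = False) -> dict:
    """Read the hot columns, and touch the cold store only when asked."""
    result = dict(hot_store[product_id])
    if include_details:
        result.update(cold_store[product_id])
    return result


save_product(1, {"name": "Keypad", "price": 99.0, "description": "Long text", "warranty": "2 years"})
```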

Functional partitioning
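Functional partitioning groups data by the part of the system that uses it, for example billing data in one store and inventory data in another. In this sketch, assuming in-memory lists as stores, a hypothetical `OWNER` table decides which functional store owns each kind of record.

```python
# Hypothetical functional stores, one per business area.
stores: dict[str, list[dict]] = {"billing": [], "inventory": []}

# Each kind of record is owned by exactly one functional area.
OWNER = {
    "invoice": "billing",
    "payment": "billing",
    "product": "inventory",
    "stock_level": "inventory",
}


def save(kind: str, record: dict) -> str:
    """Write a record to the store owned by its functional area; return that area."""
    area = OWNER[kind]
    stores[area].append({"kind": kind, **record})
    return area
```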

Issues and considerations
- Minimize cross-partition data access operations.
- Consider replicating static (reference) data in all of the partitions to avoid a separate lookup operation in a different partition.
- Be aware of the additional cost of synchronizing any changes to that reference data across partitions.
- Minimize requirements for referential integrity across vertical and functional partitions.
- Evaluate whether strong consistency is actually a requirement; a common approach in the cloud is to implement eventual consistency.
- When using a horizontal partitioning strategy, consider periodically rebalancing the shards.
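Rebalancing shards can be approached in many ways. As one simple heuristic (an assumption for illustration, not the algorithm Elastic Database uses), the sketch repeatedly moves the largest shardlet that narrows the gap between the fullest and the emptiest shard. Shard names, shardlet keys, and sizes are hypothetical.

```python
def rebalance(shards: dict[str, dict[str, int]]) -> list[tuple[str, str, str]]:
    """Greedily move shardlets from the fullest shard to the emptiest one.

    `shards` maps shard name -> {shardlet key: size}. The dict is modified in
    place; the return value lists the moves as (shardlet key, source, target).
    """
    moves = []
    while True:
        load = {name: sum(sizes.values()) for name, sizes in shards.items()}
        src = max(load, key=load.get)
        dst = min(load, key=load.get)
        gap = load[src] - load[dst]
        # Only a shardlet with 0 < size < gap narrows the difference between
        # the two shards; moving anything else would not improve balance.
        candidates = [(size, key) for key, size in shards[src].items() if 0 < size < gap]
        if not candidates:
            return moves
        _, key = max(candidates)  # largest helpful move first
        shards[dst][key] = shards[src].pop(key)
        moves.append((key, src, dst))


layout = {"shard-1": {"a": 50, "b": 30, "c": 20}, "shard-2": {"d": 10}}
moves = rebalance(layout)
```

Each move strictly reduces the sum of squared shard loads, so the loop always terminates, but a greedy pass is not guaranteed to find the best possible balance.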

Data Partitioning – Elastic Database

Horizontal partitioning with Elastic Database
Limitations of a single SQL database

Horizontal partitioning with Elastic Database
[Diagram: a single large SQL database split into shards 1 to N]

Shard
- Each shard is implemented as a SQL database.
- A shard can hold more than one dataset; a dataset is also referred to as a shardlet.
- Each database maintains metadata that describes the shardlets that it contains.
- A shardlet can be a single data item, or a group of items that share the same shardlet key.
- When sharding data in a multi-tenant application, the shardlet key could be the tenant ID, and all data for a given tenant would be held in the same shardlet.
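The shard/shardlet relationship can be modeled as a shard that keeps its own metadata of which shardlet keys it holds. This is a hypothetical in-memory model, not the Elastic Database client API; the integer shardlet key stands for a tenant ID, following the multi-tenant example above.

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    """One shard (in Elastic Database, a SQL database) holding several shardlets."""
    name: str
    # Per-shard metadata: shardlet key -> items stored under that key.
    shardlets: dict[int, list[dict]] = field(default_factory=dict)

    def add(self, shardlet_key: int, item: dict) -> None:
        # Items that share a shardlet key (here, a tenant ID) stay together.
        self.shardlets.setdefault(shardlet_key, []).append(item)

    def shardlet_keys(self) -> list[int]:
        return sorted(self.shardlets)


shard = Shard("shard-1")
shard.add(101, {"order": 1})
shard.add(101, {"order": 2})
shard.add(202, {"order": 3})
```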

Global shard-map manager
The global shard-map manager is a separate SQL database.

Global shard-map manager
[Diagram: a client application, the global shard-map manager, and shards 1 to N split from a single database]
1. Get a copy of the shard map (listing shards and shardlets) from the global shard-map manager.
2. Cache the shard-map data locally.
3. Connect to the appropriate shard.
Note: replicate the global shard-map manager database to reduce latency and improve availability.
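The three client steps (fetch the shard map, cache it, connect to the right shard) can be sketched as a small routing class. `ShardMapClient` and the `fetch_shard_map` callable are hypothetical stand-ins for a query against the global shard-map manager database; a real client would also open a connection to the shard it returns.

```python
class ShardMapClient:
    """Client-side routing: fetch the shard map once, cache it, route locally."""

    def __init__(self, fetch_shard_map):
        # fetch_shard_map stands in for a query against the global shard-map
        # manager database; it returns {shardlet key: shard name}.
        self._fetch = fetch_shard_map
        self._cache = None
        self.fetch_count = 0

    def _shard_map(self) -> dict:
        if self._cache is None:
            # Steps 1 and 2: fetch a copy of the shard map and keep it locally.
            self._cache = dict(self._fetch())
            self.fetch_count += 1
        return self._cache

    def shard_for(self, shardlet_key) -> str:
        # Step 3: choose the shard to connect to without another round trip.
        return self._shard_map()[shardlet_key]

    def invalidate(self) -> None:
        # Drop the cache, e.g. after shardlets have been moved between shards.
        self._cache = None


client = ShardMapClient(lambda: {101: "shard-1", 202: "shard-2"})
first = client.shard_for(101)
second = client.shard_for(202)  # served from the local cache
```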

Schemes for mapping data to shardlets

List Shard Map
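A list shard map associates individual key values with shardlets, which makes it easy to isolate a tenant in its own shard or to place non-contiguous keys in the same shard. A minimal sketch, assuming a hypothetical dictionary-based map rather than the Elastic Database client library:

```python
# Hypothetical list shard map: each individual key is mapped explicitly.
list_shard_map = {
    101: "shard-1",  # tenant 101 isolated in its own shard
    205: "shard-2",
    317: "shard-2",  # non-contiguous keys can share a shard
}


def lookup_list(key: int) -> str:
    """Return the shard for an explicitly listed key, or raise KeyError."""
    try:
        return list_shard_map[key]
    except KeyError:
        raise KeyError(f"key {key} is not mapped") from None
```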

Range Shard Map
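A range shard map associates a contiguous range of key values with a shardlet. The sketch assumes half-open ranges [low, high) kept sorted by their lower bound, so a binary search (`bisect`) finds the owning range; the ranges and shard names are hypothetical.

```python
import bisect

# Hypothetical range shard map: half-open ranges [low, high) -> shard, sorted by low.
RANGES = [(0, 100, "shard-1"), (100, 500, "shard-2"), (500, 1000, "shard-3")]
_LOWS = [low for low, _, _ in RANGES]


def lookup_range(key: int) -> str:
    """Return the shard whose range contains key, or raise KeyError."""
    i = bisect.bisect_right(_LOWS, key) - 1  # last range whose low <= key
    if i >= 0:
        low, high, shard = RANGES[i]
        if low <= key < high:
            return shard
    raise KeyError(f"key {key} is not in any mapped range")
```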

Hybrid sharding
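A hybrid layout can combine both schemes, for example giving a few large tenants dedicated shards through a list map while grouping the remaining tenants by key range. In this hypothetical sketch the list map is consulted before the range map; that precedence is an assumption made for illustration.

```python
import bisect

# Hypothetical hybrid layout: large tenants get dedicated shards via a list map,
# the rest are grouped by contiguous key ranges [low, high).
LIST_MAP = {42: "shard-big-42", 77: "shard-big-77"}
RANGE_MAP = [(0, 500, "shard-pool-1"), (500, 1000, "shard-pool-2")]
_LOWS = [low for low, _, _ in RANGE_MAP]


def lookup_hybrid(key: int) -> str:
    # Assumed precedence: an explicit list mapping wins over a range mapping.
    if key in LIST_MAP:
        return LIST_MAP[key]
    i = bisect.bisect_right(_LOWS, key) - 1
    if i >= 0 and RANGE_MAP[i][0] <= key < RANGE_MAP[i][1]:
        return RANGE_MAP[i][2]
    raise KeyError(f"key {key} is not mapped")
```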

Things to consider while partitioning
- Avoid operations that need to access data held in multiple shards. Azure SQL Database does not support cross-database joins.
- The data stored in shardlets that belong to the same shard map should have the same schema.
- Transactional operations are only supported for data held within the same shard, not across shards.
- Place shards near the users that access the data in those shards (geo-locate shards) to reduce latency.
- Currently, only a limited set of SQL data types is supported as shardlet keys: int, bigint, varbinary, and uniqueidentifier.
- Elastic Database provides a separate Split/Merge service.
Note: although Azure SQL Database does not support cross-database joins, the Elastic Database API enables you to perform cross-shard queries that can transparently iterate through the data held in all the shardlets referenced by a shard map.
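Because cross-database joins are not available, a query over all shards is typically a fan-out: run the same query against each shard and merge the results in the client. The sketch assumes in-memory lists as stand-ins for shard databases and hypothetical function names; it illustrates the idea and is not the Elastic Database cross-shard query API.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory shards; each list stands in for one shard database.
SHARD_DATA = {
    "shard-1": [{"tenant": 101, "total": 40}, {"tenant": 102, "total": 15}],
    "shard-2": [{"tenant": 205, "total": 70}],
    "shard-3": [],
}


def query_shard(shard: str, min_total: int) -> list[dict]:
    # Stand-in for running the same SQL query against one shard.
    return [row for row in SHARD_DATA[shard] if row["total"] >= min_total]


def fan_out(min_total: int) -> list[dict]:
    """Run the query on every shard in parallel and merge the partial results."""
    with ThreadPoolExecutor() as pool:
        parts = pool.map(lambda s: query_shard(s, min_total), SHARD_DATA)
        merged = [row for part in parts for row in part]
    # Any global ordering has to be re-established after the merge.
    return sorted(merged, key=lambda r: r["total"], reverse=True)
```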

Partitioning strategies for Azure Storage

Azure Storage

Azure storage redundancy

Partitioning Azure table storage
- All entities are stored in a partition.
- Partitions are managed internally by Azure table storage.
- All entities within a partition are sorted lexically, in ascending order, by row key.
- The partition key/row key combination must be unique for each entity and cannot exceed 1 KB in length.
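The uniqueness and ordering rules above can be illustrated with a toy model (not the Azure SDK): entities are grouped by partition key, each partition key/row key pair must be unique, and a scan returns row keys in ascending lexical order. The class name and sample keys are hypothetical, and the 1 KB length limit is not modeled.

```python
from collections import defaultdict


class TableSketch:
    """Toy model of Azure table storage key rules (not the Azure SDK)."""

    def __init__(self) -> None:
        # partition key -> {row key: entity}
        self._partitions = defaultdict(dict)

    def insert(self, partition_key: str, row_key: str, entity: dict) -> None:
        partition = self._partitions[partition_key]
        if row_key in partition:
            # The partition key/row key pair must be unique for each entity.
            raise ValueError(f"duplicate key ({partition_key!r}, {row_key!r})")
        partition[row_key] = entity

    def scan(self, partition_key: str) -> list[str]:
        # Row keys are strings, so ascending order is lexical, not numeric.
        return sorted(self._partitions[partition_key])


table = TableSketch()
for row_key in ["9", "10", "2"]:
    table.insert("tenant-1", row_key, {"value": row_key})
ordered = table.scan("tenant-1")  # "10" sorts before "2" and "9"
```

Because row keys compare as strings, numeric row keys are usually zero-padded (for example "002", "009", "010") when numeric order matters.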

Table storage

Thank You