Data partitioning Prepared by Vinod – Architect – Crestron electronics
Why partition data?
The design of the data stores that an application uses can have a significant bearing on the performance, throughput, and scalability of a system.
[Diagram: in the traditional model, one application stores data in and retrieves data from a single data store. In large-scale systems, applications 1 to N store and retrieve data across physically partitioned data stores 1 to N.]
This is not the same as SQL Server table partitioning.
Benefits of partitioning data
Designing partitions
Partitioning strategies NOTE: all three strategies described here can be combined
Horizontal partitioning ( sharding ) PartitionKey
Vertical partitioning
Functional partitioning
Issues and considerations
- Minimize cross-partition data access operations.
- Consider replicating static data in all of the partitions to reduce the need for a separate lookup operation in a different partition.
- Replication carries an additional cost: any changes to the reference (static) data must be synchronized across partitions.
- Minimize requirements for referential integrity across vertical and functional partitions.
- Evaluate whether strong consistency is actually a requirement.
- A common approach in the cloud is to implement eventual consistency.
- When using a horizontal partitioning strategy, consider periodically rebalancing the shards.
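The trade-off behind replicating static data can be sketched in a few lines of Python. This is only an illustration: the partition layout, the `countries` reference table, and both function names are hypothetical and do not come from any Azure API.

```python
def replicate_reference_data(partitions, reference):
    """Copy the reference table into every partition.

    Returns the number of writes, which is the synchronization cost
    paid each time the reference data changes.
    """
    writes = 0
    for partition in partitions.values():
        partition["countries"] = dict(reference)
        writes += 1
    return writes


def describe_order(partition, order):
    """Resolve the country name using only data held in the same partition."""
    return f'{order["id"]} ships to {partition["countries"][order["country"]]}'


# Hypothetical partitions, each holding its own orders.
partitions = {
    "p1": {"orders": [{"id": "o1", "country": "US"}]},
    "p2": {"orders": [{"id": "o2", "country": "IN"}]},
}
countries = {"US": "United States", "IN": "India"}

writes = replicate_reference_data(partitions, countries)
label = describe_order(partitions["p2"], partitions["p2"]["orders"][0])
```

Every lookup stays inside one partition, but each change to `countries` costs one write per partition.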
Data Partitioning – Elastic Database
Horizontal partitioning with Elastic Database: single SQL database limitations
Horizontal partitioning with Elastic Database
[Diagram: a single large SQL database is split into shards 1 to N.]
Shard
- Each shard is implemented as a SQL database.
- A shard can hold more than one dataset.
- A dataset is also referred to as a shardlet.
- Each database maintains metadata that describes the shardlets that it contains.
- A shardlet can be a single data item, or it can be a group of items that share the same shardlet key.
- When sharding data in a multi-tenant application, the shardlet key could be the tenant ID, and all data for a given tenant would be held as part of the same shardlet.
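As a rough sketch of the multi-tenant case, the Python below groups rows by tenant ID so that each tenant's data forms one shardlet, then derives the per-shard metadata listing which shardlets each shard holds. The sample rows, the `placement` mapping, and the function names are assumptions made for illustration; they are not part of the Elastic Database client library.

```python
from collections import defaultdict

# Hypothetical sample rows; "tenant_id" plays the role of the shardlet key.
rows = [
    {"tenant_id": 7, "order": "A-1"},
    {"tenant_id": 3, "order": "B-1"},
    {"tenant_id": 7, "order": "A-2"},
]


def group_into_shardlets(rows, key="tenant_id"):
    """Group items that share the same shardlet key into one shardlet."""
    shardlets = defaultdict(list)
    for row in rows:
        shardlets[row[key]].append(row)
    return dict(shardlets)


def shard_metadata(placement):
    """Build, per shard, the sorted list of shardlet keys it contains."""
    meta = defaultdict(list)
    for shardlet_key, shard in sorted(placement.items()):
        meta[shard].append(shardlet_key)
    return dict(meta)


# Hypothetical placement: one shard can hold more than one shardlet.
placement = {3: "shard-1", 7: "shard-1", 9: "shard-2"}

shardlets = group_into_shardlets(rows)
metadata = shard_metadata(placement)
```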
Global shard-map manager
It is a separate SQL database.
Global shard-map manager
[Diagram: a client application, the global shard-map manager, and shards 1 to N.]
1. Get a copy of the shard map (listing shards and shardlets).
2. Cache the shard-map data locally.
3. Connect to the appropriate shard.
NOTE: Replicate the global shard-map manager database to reduce latency and improve availability.
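The three client steps can be sketched as follows. `ShardMapManager` and `Client` here are hypothetical stand-ins written for this example, not the Elastic Database client library classes; the sketch only shows that the shard map is fetched once, cached, and then used to pick a shard.

```python
class ShardMapManager:
    """Stand-in for the global shard-map manager database (hypothetical)."""

    def __init__(self, mapping):
        self._mapping = dict(mapping)
        self.fetch_count = 0  # counts round trips to the manager

    def get_shard_map(self):
        self.fetch_count += 1
        return dict(self._mapping)


class Client:
    """Client that caches the shard map locally after the first fetch."""

    def __init__(self, manager):
        self._manager = manager
        self._cache = None

    def shard_for(self, shardlet_key):
        if self._cache is None:
            # Steps 1 and 2: get a copy of the shard map and cache it.
            self._cache = self._manager.get_shard_map()
        # Step 3: the caller would open a connection to the returned shard.
        return self._cache[shardlet_key]


manager = ShardMapManager({"tenant-a": "shard-1", "tenant-b": "shard-2"})
client = Client(manager)
first = client.shard_for("tenant-a")
second = client.shard_for("tenant-b")
```

A real client would also need to refresh the cache when shardlets move, which this sketch leaves out.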
schemes for mapping data to shardlets
List Shard Map
Range Shard Map
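To make the two mapping schemes concrete, here is a minimal Python sketch. It assumes a list shard map assigns individual key values to shards and a range shard map assigns contiguous, half-open key ranges `[low, high)` to shards. The data and function names are hypothetical.

```python
import bisect

# List shard map: each individual shardlet key is mapped to a shard.
list_shard_map = {1: "shard-1", 2: "shard-1", 5: "shard-2"}

# Range shard map: sorted, non-overlapping half-open ranges [low, high).
range_shard_map = [(0, 100, "shard-1"), (100, 200, "shard-2")]
_range_lows = [low for low, _, _ in range_shard_map]


def lookup_list(key):
    """Return the shard for an individually mapped key."""
    return list_shard_map[key]


def lookup_range(key):
    """Return the shard whose range contains key, or raise KeyError."""
    i = bisect.bisect_right(_range_lows, key) - 1
    if i >= 0:
        low, high, shard = range_shard_map[i]
        if low <= key < high:
            return shard
    raise KeyError(key)
```

A list map gives full control over where each key lives, while a range map keeps the map small when many neighboring keys share a shard.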
Hybrid sharding
Things to consider while partitioning
- Avoid operations that need to access data held in multiple shards.
- Azure SQL Database does not support cross-database joins.
- The data stored in shardlets that belong to the same shard map should have the same schema.
- Transactional operations are only supported for data held within the same shard, and not across shards.
- Place shards close to the users that access the data in those shards (geo-locate shards). This strategy helps to reduce latency.
- Currently, only a limited set of SQL data types are supported as shardlet keys: int, bigint, varbinary, and uniqueidentifier.
- Elastic Database provides a separate Split/Merge service.
NOTE: Although Azure SQL Database does not support cross-database joins, the Elastic Database API enables you to perform cross-shard queries that can transparently iterate through the data held in all the shardlets referenced by a shard map.
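The idea of a cross-shard query, running the same statement on every shard and merging the rows, can be illustrated with in-memory SQLite databases standing in for shards. This is not the Elastic Database API; `make_shard`, `multi_shard_query`, and the `orders` table are hypothetical names used only for this sketch.

```python
import sqlite3


def make_shard(rows):
    """Create an in-memory database that stands in for one shard."""
    conn = sqlite3.connect(":memory:")
    # Every shard uses the same schema, as the shard map requires.
    conn.execute("CREATE TABLE orders (tenant_id INTEGER, amount INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    return conn


def multi_shard_query(shards, sql, params=()):
    """Run the same query on each shard and concatenate the results."""
    results = []
    for conn in shards:
        results.extend(conn.execute(sql, params).fetchall())
    return results


shards = [make_shard([(1, 10), (1, 20)]), make_shard([(2, 5)])]
totals = multi_shard_query(
    shards, "SELECT tenant_id, SUM(amount) FROM orders GROUP BY tenant_id"
)
```

Because each tenant lives entirely in one shard here, the per-shard aggregates can simply be concatenated. An aggregate that spans shards would need a further merge step in the application.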
Partitioning strategies for Azure Storage
Azure Storage
Azure storage redundancy
Partitioning Azure table storage
- All entities are stored in a partition.
- Partitions are managed internally by Azure table storage.
- All entities within a partition are sorted lexically, in ascending order, by row key.
- The partition key/row key combination must be unique for each entity and cannot exceed 1 KB in length.
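A small in-memory model can illustrate the uniqueness and ordering rules. The `Table` class below is a hypothetical sketch, not the Azure Storage SDK, and it leaves out the key-length limit.

```python
class Table:
    """Toy model of a table whose entities are grouped by partition key."""

    def __init__(self):
        self._partitions = {}  # partition key -> {row key -> entity}

    def insert(self, partition_key, row_key, entity):
        partition = self._partitions.setdefault(partition_key, {})
        if row_key in partition:
            # The partition key/row key combination must be unique.
            raise ValueError(f"duplicate key: {partition_key}/{row_key}")
        partition[row_key] = entity

    def scan_partition(self, partition_key):
        """Return (row key, entity) pairs in ascending lexical row-key order."""
        partition = self._partitions.get(partition_key, {})
        return [(row_key, partition[row_key]) for row_key in sorted(partition)]


table = Table()
table.insert("customers", "2", {"name": "B"})
table.insert("customers", "10", {"name": "C"})
table.insert("customers", "1", {"name": "A"})
row_keys = [row_key for row_key, _ in table.scan_partition("customers")]
```

Lexical ordering places "10" before "2", so numeric row keys are usually zero-padded when numeric order matters.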