Scaling MongoDB with Horizontal and Vertical Sharding
MyDBOPS
213 views
39 slides
Apr 05, 2023
Slide 1 of 39
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
About This Presentation
How to effectively scale your MongoDB database using horizontal and vertical sharding techniques in this presentation discover the differences, choose the right strategy, and optimize your configuration for maximum performance and scalability. The presentation presented by Manosh Malai, CTO at Mydbo...
How to effectively scale your MongoDB database using horizontal and vertical sharding techniques in this presentation discover the differences, choose the right strategy, and optimize your configuration for maximum performance and scalability. The presentation presented by Manosh Malai, CTO at Mydbops
Size: 11.57 MB
Language: en
Added: Apr 05, 2023
Slides: 39 pages
Slide Content
Scaling MongoDB with Horizontal
and Vertical Sharding
Manosh Malai
CTO, Mydbops LLP
01st April 2023
MongoDB User Group Bangalore
Interested in Open Source technologies
Interested in MongoDB, DevOps & DevOpSec Practices
Tech Speaker/Blogger
CTO, Mydbops LLP
Manosh Malai
About Me
Consulting
Services
Managed
Services
Focuses on MySQL, MongoDB and PostgreSQL
Mydbops Services
Database Sharding
Database sharding is the process of storing a large database across
multiple machines
WHEN TO SHARD ?
When To Shard - I
Size of Data: If your database is becoming too large to fit on a single server,
sharding may be necessary to distribute the data across multiple servers.
Performance: Sharding can improve query performance by reducing the amount
of data that needs to be processed on a single server.
When To Shard - II
Scalability: Sharding enables you to horizontally scale out your MongoDB
database by distributing data across multiple nodes.
Availability and Redundancy: Sharding can improve query performance
by reducing the amount of data that needs to be processed on a single
server.
When To Shard - III
Availability: Sharding can improve the overall availability of your database
by providing redundancy across multiple nodes.
Flexibility: Sharding enables you to distribute data across multiple nodes
based on your specific requirements.
When To Shard - IV
Cost-effectiveness: Sharding can be a cost-effective way to scale out
your database. Rather than purchasing expensive hardware to support a
single, monolithic database.
Type Of Sharding
Vertical
Sharding
Horizontal
Sharding
Vertical Sharding Strategy - Pros
Different data access patterns:
▪Vertical sharding may be useful when different table are accessed at different frequencies or
have different access patterns.
▪By splitting these tables into different shards, the performance of queries that only need to
access a subset of columns can be improved.
Better data management:
▪Vertical sharding can provide better control over data access, as sensitive or confidential data
can be stored separately from other data. This can help with compliance with regulations such
as GDPR or HIPAA.
Vertical Sharding Strategy - Cons
Data Interconnectedness:
▪Vertical sharding may not be the best solution for databases with heavily interconnected data. If
there is a need for complex joins or queries across multiple columns, horizontal sharding or
other scaling strategies may be more appropriate.
Limited Scalability:
▪Only Suitable for Small or Medium data size.
How We Can Achieve Vertical Sharding?
▪Service Discovery
▪Consul
▪Etcd
▪ZooKeeper
▪Data Sync
▪Mongopush
▪mongosync
▪mongodump&mongorestore
Vertical Sharding Strategy
Vertical Sharding: Service Discovery and Data Migration
▪Use Consul to dynamically discover the nodes in your MongoDB cluster and route traffic to them accordingly.
▪Mongopush sync the data from X1 Cluster to X2 Cluster
Type Of Sharding
Vertical
Sharding
Horizontal
Sharding
Will MongoDB Support Horizontal Sharding?
What MongoDB Horizontal Sharding and Its Components
Each shard contains a subset of the sharded data
Mongos
Cong Server
Shards
Shard Key
Collection Shard Key
Divide and distribute collection evenly using shard key
The shard key consists of a field or fields that exists in the every document in a collection
MongoDB Shard Key
IO Scheduler
Range Sharding
Hash Sharding
Zone Sharding
Pros Cons
▪Even Data Distribution
▪Even Read and Write Workload
Distribution
•Range queries likely trigger
expensive
•broadcast operation
Pros Cons
▪Even Data Distribution
▪Target Operation for both single
and ranged queries
▪Even Read and Write Workload
Distribution
•Susceptible to the selection and
usage of good shard key that used
in both read and write queries
Pros Cons
•Isolate a specific subset of data on
the specific set of shards
•Data geographically closet to
application servers
•Data tiering and sla's based on
shard hardware
•Susceptible to the selection and
usage of good shard key that used
in both read and write queries
Shard Key Indexes
Single-eld Ascending Index
Single-eld Hashed Index
Compound Ascending Index
Compound Hashed Index
Declare Shard Key
sh.shardCollection("db.test", {"fieldA" : 1, "fieldB": "hashed"}, false/true, {numInitialChunks: 5, collation: { locale: "simple" }})
sh.shardCollection(namespace, key, unique, options)
▪When the collection is empty, sh.shardCollection() generates an index on the shard key if an index for that
key does not already exist.
▪If the collection is not empty, you must create the index first before using sh.shardCollection()
▪It is not possible to have a shard key index that indicates a multikey index, text index, or geospatial index on
the fields of the shard key.
▪MongoDB can enforce a uniqueness constraint on ranged shard key index only.
▪In a compound index with uniqueness, where the shard key is a prefix
▪MongoDB ensures uniqueness across the entire key combination, rather than individual components of the
shard key.
Shard Key Improvement After MongoDB v4.2
WITHOUT PREFIX COMPRESSION
Mutable Shard key value (v4.2)
Renable Shard Key (v4.4)
Compound Hashed Shard Key (v4.4)
Live Resharding(v5.0)
What and Why Refinable Shard Key (v4.4)
Shard Key: customer_id
Rening Shard
Key db.adminCommand({refineCollectionShardKey:
database.collection, key:{<existing Key>, <New Suffix1>: <1|""hashed">,...}})
21%
15%
64%
Shard A Shard B Shard C
▪Refine at any time
▪No Database downtime
Refining a collection's shard key
improves data distribution and resolves
issues caused by insufficient cardinality
leading to jumbo chunks.
Refinable Shard Key (v4.4)
Shard Key: vehical_no
Rening Shard
Key db.adminCommand({refineCollectionShardKey: "mydb.test", key:
{vehical_no: 1, user_mnumber: "hashed"}})
Avoid changing the range or hashed type for any existing shard key fields, as it can lead to
inconsistencies in data. For instance, refrain from changing a shard key such as { vehicle_no: 1 }
to { vehicle_no: "hashed", order_id: 1 }.
▪For refining shard keys, your cluster must have a version of at least 4.4 and a feature compatibility version of 4.4.
▪Retain the same prefix when defining the new shard key, i.e., it must begin with the same field(s) as the existing
shard key.
▪When refining shard keys, additional fields can only be added as suffixes to the existing shard key.
▪To support the modified shard key, it is necessary to create a new index.
▪Prior to executing the refineCollectionShardKey command, it is essential to stop the balancer.
▪sh.status to see the status
Guidelines for Refining Shard Keys
Compound Hashed Shard Key (v4.4)
21%
15%
64%
Shard A Shard B Shard C
Existing Shard Key: vehical_no
New Shard Key: vehical_no, user_mnumber
sh.shardCollection( "test.order", {"vehical_no": 1, "user_mnumber": "hashed"})
sh.shardCollection( "test.order", {"vehical_no": "hashed", "user_mnumber": 1})
▪Overcome Monotonicall
increase key
Live Resharding(v5.0)
Resharding without downtime
Any Combinations Change
Compound Hash Range
RangeRange
Range Hash
Resharding Process Flow
▪Before starting a sharding operation on a collection of 1 TB size, it is recommended to have a minimum of
1.2 TB of free storage.
▪I/O: Ensure that your I/O capacity is below 50%.
▪CPU load: Ensure your CPU load is below 80%.
Rewrite your application's queries to use both the current shard key and the new shard key
rewrite your application's queries to use the new shard key without reload
Monitor the resharding process, use a $currentOp pipeline stage
Deploy your rewritten application
Resharding Who's Donor and Recipients
•Donor are shards which currently own chunks of the sharded collection
•Recipients are shards which would own chunks of the sharded collection according to the new
shard key and zones
Resharding Internal Process Flow
Commit Phase
Clone, Apply, and Catch-up
Phase
Index Phase
Initialization PhaseThe balancer determines the new data distribution for the sharded collection.
A new empty sharded collection, with the same collection options as the original one, is
created by each shard recipient.
This new collection serves as the target for the new data written by the recipient shards.
Each shard recipient builds the necessary new indexes.
•Each recipient of a shard makes a copy of the initial documents that it would be
responsible for under the new shard key
•Each shard recipient begins applying oplog entries from operations that happened after the
recipient cloned the data.
•When all shards have reached strict consistency, the resharding coordinator commits
the resharding operation and installs the new routing table.
•The resharding coordinator instructs each donor and recipient shard primary,
independently, to rename the temporary sharded collection. The temporary collection
becomes the new resharded collection
•Each donor shard drops the old sharded collection.
To summarize, what issue does this feature resolve?
•Jumbo Chunks
•Uneven Load Distribution
•Decreased Query Performance Over Time by Scatter-gather queries
Reach Us : [email protected]
Thank You
Database End Of The Life
MySQL 5.7 31 Oct 2023
MongoDB 4.2 30 April 2023
MongoDB 4.4 29 Feb 2024
PostgreSQL 11 9 Nov 2023