Database Sharding: Complete understanding

servicesNitor 112 views 13 slides May 24, 2024

About This Presentation

Discover how database sharding https://bityl.co/Q6F3 can transform your application's performance by distributing data across multiple servers in our latest blog. With insights into key sharding techniques, you'll further learn how to implement sharding effectively and avoid common pitfalls....


Slide Content

Software Engineering   |      30 Apr 2024   |     19 min
Database Sharding: Everything You Need to Know
Shubham Alavni
Senior Software Engineer
    
Shubham Alavni is a Senior Software Engineer at Nitor
Infotech. With a seasoned expertise in web
development, he has left his mark on diverse...
Imagine this: your latest application is booming with daily active users, more features are
being added, and data piles up by the second. This may sound like a great success, but it
comes with a catch: your database performance can suffer. To keep up with the data load
and other bottlenecks, database sharding stands out as a strong option.
In this blog, I’ll provide a clear understanding of database sharding, its architecture, and
advantages. Apart from these, you’ll also get a peek into real-life scenarios and use cases
where database sharding shines.
Let’s get started with the basics!
Understanding Sharding
Sharding, derived from the term “shard,” signifies a fraction of a complete entity, and is a
technique used in database management. It involves the division of a large database into
smaller, more manageable units, a process also known as “horizontal scaling” (something
you will explore in a while). This approach involves splitting the rows of a single table into
distinct tables known as “shards.”
Despite maintaining identical schema and columns, each shard houses different rows,
ensuring that the data within each shard remains unique and non-overlapping. This method
effectively addresses the constraints of a single database by segmenting the data into
smaller portions and dispersing them across multiple database servers.
For example, it’s like having smaller buckets instead of one big bucket to carry water – each
bucket is easier to manage than one large, heavy bucket.
Keep reading to know how database sharding can help you.
Benefits of Database Sharding
Database Sharding offers several advantages:
Improved Scalability: Sharding allows you to add more servers to your database, spreading the
load and enabling more traffic and faster processing. This contrasts with the traditional method of
scaling up, which involves adding more resources to a single server.
Increased Operation Capacity: By distributing your database across multiple shards, you can
increase both read and write capacity, provided each operation is served by a single shard.
Expanded Storage Capacity: Sharding also increases the storage capacity of your database, since
total capacity grows with every shard you add.
High Availability: If one shard goes down, the other shards can still be accessed, which prevents a
total system shutdown.
Onwards to the two different scaling techniques of database sharding.
Techniques of Scaling Database Sharding: Vertical
Partitioning vs Horizontal Partitioning
Here’s a tabular comparison of vertical versus horizontal partitioning.
Aspect | Vertical Partitioning | Horizontal Partitioning
Criteria | Divides a table based on columns. | Divides a table based on rows.
Suitability | Useful for tables with many columns, where some columns are rarely used. | Useful for tables with many rows, where data can be divided based on some criteria.
Performance Improvement | Improves query performance by reducing I/O and allowing efficient indexing of relevant columns. | Improves query performance by reducing the number of rows to be scanned for specific queries.
Requirements | May require joins to retrieve data from multiple partitions. | Joins between partitions are typically not required because they contain disjoint sets of rows.
Example | A table with 100 columns, where 20 columns are frequently accessed, and 80 columns are rarely accessed. | A table with 1 billion rows, where 300 million rows are accessed frequently, and 700 million rows are rarely accessed.
To get you some clarity, here’s how vertical and horizontal partitions would appear in
contrast to the original table:

Fig: Original vs. Vertical vs. Horizontal Partitions
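To make the contrast concrete, here is a small Ruby sketch that splits a tiny in-memory table both ways. The sample rows, the column grouping, and the "id modulo partition count" rule are all made up for illustration; a real database would do this at the storage level.

```ruby
# Hypothetical "users" table as an array of row hashes (sample data only).
USERS = [
  { id: 1, name: "Ana",  email: "ana@example.com",  bio: "Likes hiking" },
  { id: 2, name: "Bo",   email: "bo@example.com",   bio: "Plays chess" },
  { id: 3, name: "Chen", email: "chen@example.com", bio: "Bakes bread" },
  { id: 4, name: "Dee",  email: "dee@example.com",  bio: "Runs marathons" }
].freeze

# Vertical partitioning: split by columns. Every partition keeps the key (:id)
# so that rows can be joined back together when needed.
def vertical_partition(rows, column_groups)
  column_groups.map do |columns|
    rows.map { |row| row.slice(:id, *columns) }
  end
end

# Horizontal partitioning: split by rows. Every partition keeps all columns.
# The rule used here (id modulo partition count) is just one possible criterion.
def horizontal_partition(rows, partition_count)
  groups = rows.group_by { |row| row[:id] % partition_count }
  (0...partition_count).map { |index| groups.fetch(index, []) }
end

frequent, rare = vertical_partition(USERS, [[:name, :email], [:bio]])
even_ids, odd_ids = horizontal_partition(USERS, 2)

puts "Vertical, frequently used columns: #{frequent.first.keys.inspect}"
puts "Vertical, rarely used columns:     #{rare.first.keys.inspect}"
puts "Horizontal, partition 0 ids:       #{even_ids.map { |row| row[:id] }.inspect}"
puts "Horizontal, partition 1 ids:       #{odd_ids.map { |row| row[:id] }.inspect}"
```

Note how the vertical partitions each hold all four rows but only some columns, while the horizontal partitions each hold full rows but only some of them.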
I’m confident that you are now clear on the basics of database sharding and want to learn
more about how it works and how to use it correctly. You’ll find the
answers in the next sections!
Architecture of Sharding
After deciding to shard your database, the next step is to determine how to implement it.
This involves the critical process of running queries or distributing incoming data to sharded
tables or databases, ensuring data goes to the appropriate shard to prevent data loss or slow
queries.
In the following section, we will discuss several prevalent sharding architectures:
1. Key-Based Sharding:
This is also known as hash-based sharding. It uses a hash function to distribute data across
shards. A specific data value, such as a user ID, IP address, ZIP code, or Region, is used as
input to the hash function. The output is a Shard ID, which determines where the data will be
stored.
The data value used in the hash function is called the Shard key. The Shard key should be a
static column, like a primary key, to ensure consistent data distribution and efficient update
operations.
Note: Key-based sharding can complicate the process of adding or removing
database servers. As servers change, the data must be remapped and migrated. This can be
an expensive and time-consuming process, potentially causing system downtime.
Despite its challenges, key-based sharding is popular for evenly distributing data across
shards and minimizing the risk of database hotspots, ensuring balanced workloads.
Fig: Key-Based Sharding
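A minimal Ruby sketch of key-based sharding might look like the following. It assumes a simple "hash modulo shard count" scheme and uses Zlib.crc32 only because it is deterministic and ships with Ruby; the hash function, the helper names, and the sample user IDs are assumptions for illustration.

```ruby
require "zlib"

# Key-based (hash-based) sharding: hash the shard key, then take the result
# modulo the number of shards to get a shard ID.
# Zlib.crc32 is an assumption made for this sketch, not a recommendation.
def shard_id_for(shard_key, shard_count)
  Zlib.crc32(shard_key.to_s) % shard_count
end

# Count how many keys would land on a different shard after resizing.
def moved_keys(keys, old_count, new_count)
  keys.count { |key| shard_id_for(key, old_count) != shard_id_for(key, new_count) }
end

user_ids = (1..10_000).to_a

# Distribution of users across 4 shards.
distribution = user_ids.group_by { |id| shard_id_for(id, 4) }
                       .transform_values(&:size)
puts "Users per shard: #{distribution.sort.to_h}"

# Adding a fifth shard changes the modulo, so many keys map to a new shard.
moved = moved_keys(user_ids, 4, 5)
puts "Keys that move when going from 4 to 5 shards: #{moved} of #{user_ids.size}"
```

The second figure illustrates the remapping cost mentioned in the note above: with plain modulo hashing, a key stays put only if its hash gives the same remainder under both shard counts, so a large share of the data has to be migrated.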
2. Range-Based Sharding:
This is a technique that divides data into shards based on a specific range of values. For
example, in a product database, the products could be sharded based on their price
ranges. Products priced from $0 up to (but not including) $100 could be stored in one shard, while
products priced from $100 up to $200 could be stored in another shard.
Range-based sharding is a simple and straightforward method. Each shard contains a unique
set of data but maintains the same schema as the original database. The application then
determines which range a value falls into and writes it to the correct shard.
Note: Range-based sharding can cause uneven data distribution, leading to “database
hotspots” where some shards receive more traffic than others. This can result in
performance issues, slow queries, and imbalanced workloads. For example, a shard
containing products with prices between $0 and $100 may receive more traffic than a shard
containing products with prices between $100 and $200.
Fig: Range-Based Sharding
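The price example can be sketched in Ruby roughly as follows. The shard names and the third "everything from $200 upward" range are assumptions added so that every non-negative price has a home; the boundaries are exclusive at the top so that a price of exactly $100 belongs to one shard only.

```ruby
# Range-based sharding: each shard owns a contiguous, non-overlapping range.
# Shard names and boundaries are made up for this sketch.
PRICE_RANGES = {
  (0...100)   => :shard_low,
  (100...200) => :shard_mid,
  (200..)     => :shard_high # endless range: everything from 200 upward
}.freeze

# Return the shard whose range covers the given price.
def shard_for_price(price)
  raise ArgumentError, "price must be non-negative" if price.negative?

  _range, shard = PRICE_RANGES.find { |range, _shard| range.cover?(price) }
  shard
end

[19.99, 100, 149.5, 999].each do |price|
  puts "#{price} -> #{shard_for_price(price)}"
end
```

Because routing depends only on the price, a catalog where most products cost under $100 would send most reads and writes to a single shard, which is the hotspot risk described in the note above.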
3. Directory-Based Sharding:
This is a database strategy that utilizes a lookup table to dictate data storage locations. It
assigns each key to a specific shard, using the lookup table that contains fixed data location
information. It is adaptable and simplifies the process of adding new shards.
For example, consider a lookup table with columns for Delivery Zone and Shard ID. The
Delivery Zone column serves as the Shard key, directing data from a particular delivery zone
to the corresponding shard ID in the lookup table.
Directory-based sharding provides flexible data distribution, efficient query routing, and
dynamic scalability. It uses a central directory for managing data-to-shard mapping,
optimizing query performance, and enabling efficient load balancing. The system can scale
dynamically by modifying the number of shards without affecting the application logic, so it
adapts easily to changing needs and workloads.
Note: Directory-based sharding can slow down operations, because every query or write must
first consult the lookup table. The lookup table can also become a single point of failure,
making the entire database inaccessible if it fails. Using a distributed lookup table can
mitigate this but adds system complexity.

Fig: Directory-Based Sharding
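As a rough sketch of the delivery zone example, the lookup table can be modeled as a small Ruby class. The class name, zone names, and shard IDs are hypothetical, and the directory lives in memory here, whereas a real one would sit in its own highly available store.

```ruby
# Directory-based sharding: a lookup table (the "directory") maps each shard
# key to a shard ID. Zone names and shard IDs are invented for this sketch.
class ShardDirectory
  def initialize(mapping = {})
    @mapping = mapping.dup
  end

  # Route a key by consulting the directory. Unknown keys raise an error
  # rather than silently landing on a default shard.
  def shard_for(delivery_zone)
    @mapping.fetch(delivery_zone) do
      raise KeyError, "no shard assigned to zone #{delivery_zone.inspect}"
    end
  end

  # Re-pointing a key only changes the directory entry. Migrating the
  # existing rows to the new shard is a separate step not shown here.
  def assign(delivery_zone, shard_id)
    @mapping[delivery_zone] = shard_id
  end
end

directory = ShardDirectory.new({ "North" => 1, "South" => 2, "East" => 1 })
puts "East is served by shard #{directory.shard_for('East')}"

# A new shard (3) comes online and takes over the East zone.
directory.assign("East", 3)
puts "East is now served by shard #{directory.shard_for('East')}"
```

The routing method never changes when shards are added, which is the flexibility described above. The trade-off is that every lookup goes through the directory.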
Onwards to the use cases!
Use Cases of Database Sharding
Sharding finds common application in the following scenarios:
E-commerce Platforms: These platforms deal with large volumes of product data, customer data,
and order data. Here, sharding helps distribute the load across multiple servers and improves
performance.
Social Media Platforms: With billions of users and large amounts of user-generated content,
sharding helps these platforms manage data effectively.
Gaming Platforms: Real-time data management for millions of players in online multiplayer games
benefits from sharding, as it distributes the load and boosts performance.
To make the use case clearer, let’s look at a particular scenario!
SCENARIO: DATABASE SHARDING FOR SCALABILITY
Envision that you’re architecting a user account management system for an application. To
address scalability and performance challenges, you choose to distribute the user
data across multiple database shards.
You can select directory-based sharding, utilizing the country_code as the key attribute for
sharding. The country_code is a three-letter code representing each country. A lookup table
can be used to store the mapping of each country_code to its corresponding shard_id.

Here are the steps that you can follow:
Step 1: Determine the number of shards
Assuming the application is used in 3 countries, we’ll use 3 shards.
Step 2: Lookup table for mapping country_code to shard_id
We’ll create a lookup table to store the mapping of country_code to shard_id. The table will have two
columns: country_code and shard_id.
The country_code column will store the three-letter code for each country.
country_code example: South Korea (KOR), Thailand (THA), and Malaysia (MYS).
country_code shard_id
KOR 1
THA 2
MYS 3
Step 3: Handling the queries
We’ll walk through signing up a new user and show how to
choose the correct shard based on the user’s country_code.
We will also show how to choose the correct shard when fetching user data from the database
during sign-in.
Step 4: Basic Implementation of Database Sharding in Ruby on Rails
Framework (6.1+)
1. First, let’s set up our Rails application with multiple databases. In config/database.yml,
we’ll define our shards:
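A database.yml for this scenario might look roughly like the YAML embedded below. It follows the layout Rails uses for multiple databases (environment, then one named entry per connection), but the adapter, the database names, and the entry names shard_one, shard_two, and shard_three are assumptions. The YAML is wrapped in a Ruby heredoc only so that the sketch can be parsed and checked as a standalone script.

```ruby
require "yaml"

# Hypothetical contents of config/database.yml for one environment.
# One primary (with a replica) plus one entry per shard; all names are
# illustrative assumptions.
DATABASE_YML = <<~YAML
  production:
    primary:
      adapter: postgresql
      database: app_primary
    primary_replica:
      adapter: postgresql
      database: app_primary
      replica: true
    shard_one:
      adapter: postgresql
      database: app_shard_one
    shard_two:
      adapter: postgresql
      database: app_shard_two
    shard_three:
      adapter: postgresql
      database: app_shard_three
YAML

# Parse the YAML and list the shard entries it defines.
config = YAML.safe_load(DATABASE_YML)
shard_entries = config.fetch("production").select { |name, _| name.start_with?("shard_") }
shard_entries.each { |name, settings| puts "#{name}: #{settings.fetch('database')}" }
```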

Next, we’ll make changes to the ApplicationRecord class to connect to the primary and
replica databases, and to the Shard model to connect to the shard databases. We’ll also
define a method to choose the correct shard based on the country_code in this manner:
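The shard-selection method could be sketched in plain Ruby as shown here, using the lookup table from Step 2. The constant and method names and the shard symbols (:shard_one and so on) are hypothetical. In a Rails 6.1+ application, those symbols would be the keys given to connects_to shards: { ... } in an abstract model class, each mapped to an entry in config/database.yml.

```ruby
# Lookup table from Step 2: country_code => shard_id.
SHARD_ID_BY_COUNTRY = { "KOR" => 1, "THA" => 2, "MYS" => 3 }.freeze

# Hypothetical names for the shard connections (assumed for this sketch).
SHARD_NAME_BY_ID = { 1 => :shard_one, 2 => :shard_two, 3 => :shard_three }.freeze

# Choose the shard for a user based on their three-letter country code.
# Unknown codes raise instead of falling back to a default shard.
def shard_for(country_code)
  code = country_code.to_s.strip.upcase
  shard_id = SHARD_ID_BY_COUNTRY.fetch(code) do
    raise ArgumentError, "unsupported country_code: #{country_code.inspect}"
  end
  SHARD_NAME_BY_ID.fetch(shard_id)
end

puts shard_for("KOR")
puts shard_for("mys")
```

Normalizing the code (trimming and upcasing) is a design choice made here so that "mys" and "MYS" route to the same shard.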

2. When a new user signs up, we need to choose the correct shard based on the user’s
country_code. We can do this by using the connected_to method to connect to the correct
shard and then create the user.
3. For user sign-in, we need to choose the correct shard to fetch
the user data from the database.
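The sign-up and sign-in flow can be sketched without Rails by using an in-memory stand-in for the shard databases. The FakeShards class, the shard names, and the sample user are all hypothetical. In a Rails 6.1+ application, the block-based switch would be ActiveRecord::Base.connected_to(role: :writing, shard: ...) for sign-up (or role: :reading for sign-in) wrapped around the usual model calls.

```ruby
# In-memory stand-in for three shard databases, so the flow can run without
# Rails. Each shard holds its own "users" table, modeled as a hash.
class FakeShards
  def initialize(names)
    @tables = names.to_h { |name| [name, {}] }
  end

  # Yield the "users" table of the chosen shard to the block.
  def connected_to(shard:)
    yield @tables.fetch(shard)
  end
end

SHARD_BY_COUNTRY = { "KOR" => :shard_one, "THA" => :shard_two, "MYS" => :shard_three }.freeze
SHARDS = FakeShards.new(SHARD_BY_COUNTRY.values)

# Sign-up: pick the shard from the country_code, then write the user there.
def sign_up(email:, country_code:)
  shard = SHARD_BY_COUNTRY.fetch(country_code)
  SHARDS.connected_to(shard: shard) do |users|
    users[email] = { email: email, country_code: country_code }
  end
end

# Sign-in: the country_code must be known up front (for example, sent with
# the login request) so that only one shard is queried.
def sign_in(email:, country_code:)
  shard = SHARD_BY_COUNTRY.fetch(country_code)
  SHARDS.connected_to(shard: shard) { |users| users[email] }
end

sign_up(email: "mina@example.com", country_code: "KOR")
puts sign_in(email: "mina@example.com", country_code: "KOR").inspect
puts sign_in(email: "mina@example.com", country_code: "THA").inspect
```

The last call looks in the wrong shard and finds nothing, which shows why the sign-in request needs to carry the shard key along with the credentials.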

It’s important to note that the above is a simplified example. In a real-world scenario, you
would need to consider additional factors such as data consistency, replication, and failover.
After all this explanation, you might be asking yourself – “Should I shard my database or
not?”, right?
Keep reading to know the conditions when you can consider sharding!
Factors to be considered before Sharding
Consider the following factors before deciding to shard your database:
Database Size: Sharding is typically used for large databases that have outgrown the capacity of a
single server.
Traffic Patterns: If your database experiences uneven traffic patterns, sharding may be beneficial.
Growth Projections: If your database is expected to scale significantly in the future, sharding may
be a good option.
Complexity: Sharding adds complexity to your database architecture and requires careful planning
and maintenance.
Cost: Sharding can be expensive, as it requires additional hardware resources and infrastructure to
support multiple servers.
Note: Sharding can increase latency, because queries must pass through a dedicated service that
directs them to the right shard. It can also raise maintenance effort, since each shard and any
additional nodes need upkeep, and data updates must be kept in sync if replication is used.
So, database sharding has both its perks and challenges, and you can decide if it suits your
application’s needs.
To know more about database management, reach out to us at Nitor Infotech.