This slide deck provides an introduction to the fundamental concepts and principles of system design. Aimed at aspiring software engineers and professionals looking to strengthen their understanding, the presentation covers key topics such as scalability, reliability, load balancing, caching, CDN, partitioning, indexes, replication etc.
Size: 1.01 MB
Language: en
Added: May 25, 2024
Slides: 40 pages
Slide Content
System Design Basics
Pratyush Majumdar
VP - Technology
91Mobiles
A Domain Name System (DNS) translates a domain name such as www.example.com to an IP address. DNS is
hierarchical, with a few authoritative servers at the top level. Your router or ISP provides information about
which DNS server(s) to contact when doing a lookup. Lower level DNS servers cache mappings, which could
become stale due to DNS propagation delays. DNS results can also be cached by your browser or OS for a
certain period of time, determined by the time to live (TTL).
Some important terms:
1. NS (Name Server record: tells the Internet where to go to find a domain's IP address)
2. CNAME (Canonical Name record: works as an alias for domain names that share a single IP address)
3. A Record (Address record: points to the IP address for a given domain name)
4. MX Record (Mail eXchange record: directs email to a mail exchange server)
5. TXT Record (lets a domain admin leave notes on a DNS server)
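The TTL-based caching described above can be sketched as a tiny in-memory resolver cache (class and host names here are illustrative, not a real resolver):

```python
import time

class DnsCache:
    """Tiny TTL cache sketch: maps hostname -> (ip, absolute expiry time)."""

    def __init__(self):
        self._entries = {}

    def put(self, host, ip, ttl_seconds):
        # Store the mapping together with the time at which it goes stale.
        self._entries[host] = (ip, time.monotonic() + ttl_seconds)

    def get(self, host):
        # Return the cached IP, or None if the record is missing or expired.
        entry = self._entries.get(host)
        if entry is None:
            return None
        ip, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._entries[host]  # evict the stale record
            return None
        return ip

cache = DnsCache()
cache.put("www.example.com", "93.184.216.34", ttl_seconds=300)
print(cache.get("www.example.com"))  # fresh entry -> the cached IP
```

Once the TTL elapses, the next `get` misses and a fresh lookup against the authoritative servers would be needed.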
How Internet Works (DNS)
●Scalability
○Vertical Scaling vs Horizontal Scaling
●Availability/Uptime
○99% ("two nines") allows 3.65 days of downtime per year
○99.9% ("three nines") allows 8.77 hours of downtime per year
○99.99% ("four nines") allows 52.60 minutes of downtime per year
○99.999% ("five nines") allows 5.26 minutes of downtime per year
●Durability/Reliability (ability to perform the function without failure)
AWS S3 is designed to provide 99.99% availability and 99.999999999% durability of objects over a given year.
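The downtime figures above follow directly from the availability percentage; a quick sketch (using a 365.25-day year, which matches the minutes shown):

```python
def downtime_per_year(availability_pct):
    """Allowed downtime in minutes per year for a given availability percentage."""
    minutes_per_year = 365.25 * 24 * 60  # 525,960 minutes
    return (1 - availability_pct / 100) * minutes_per_year

for pct in (99.0, 99.9, 99.99, 99.999):
    print(f"{pct}% -> {downtime_per_year(pct):.2f} minutes/year")
```

Each extra "nine" shrinks the downtime budget by a factor of ten, which is why every additional nine is dramatically harder to engineer.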
Distributed Systems
Horizontal Scaling vs Vertical Scaling
CAP Theorem
The CAP Theorem says that a distributed system can guarantee at most two of the following three properties at once; since network partitions cannot be ruled out in practice, the real choice is usually between consistency and availability:
Consistency: All nodes see the same data at the same time. Consistency is achieved by updating several nodes before
allowing further reads.
Availability: Every request gets a response on success/failure. Availability is achieved by replicating the data across
different servers.
Partition Tolerance: The system continues to work despite message loss or partial failure. A system that is
partition-tolerant can sustain any amount of network failure that doesn’t result in a failure of the entire network. Data is
sufficiently replicated across combinations of nodes and networks to keep the system up through intermittent outages.
CAP Theorem
Load Balancing (1 of 4)
Load Balancing helps to spread the traffic across a cluster of servers to
improve responsiveness and availability of applications, websites or databases.
A Load Balancer can be software based (Eg. HAProxy, Nginx) or a physical
piece of hardware (Eg. F5). A good Load Balancer can also keep track of the
status of all the servers while distributing requests. If a server is not available
to take new requests or is not responding or has an elevated error rate, the
Load Balancer will stop sending traffic to that server. LB helps to prevent Single
Point of Failure (SPOF).
Load Balancer
Load Balancing (2 of 4)
Health Check: To monitor the health of backend servers, an LB regularly performs “health
checks”: it attempts to connect to each backend server to ensure the server is listening.
If a server fails a health check, it is automatically removed
from the pool, and traffic will not be forwarded to it until it responds to the
health checks again. A health check is typically a lightweight HTTP request that verifies
the server responds with an HTTP status code of 200 within a timeout.
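A health check loop like the one described could be sketched as follows; the probe function is injected so a real HTTP GET (e.g. via `urllib`) can be swapped in, and the server names are made up:

```python
def filter_healthy(servers, probe):
    """Keep only servers whose probe returns HTTP 200 without raising."""
    healthy = []
    for server in servers:
        try:
            if probe(server) == 200:
                healthy.append(server)
        except Exception:
            # A timeout or connection error counts as a failed health check.
            pass
    return healthy

# Stub probe standing in for a real HTTP GET against each server:
statuses = {"app1": 200, "app2": 503}
pool = filter_healthy(["app1", "app2"], probe=lambda s: statuses[s])
print(pool)  # only "app1" keeps receiving traffic
```

A real LB would run this on a timer and re-add servers once they start passing checks again.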
Load Balancing (3 of 4)
Load Balancing Algorithms:
●Least Connections
●Least Response Time
●Least Bandwidth
●Round Robin
●Weighted Round Robin
●IP Hash (Stickiness)
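Two of the algorithms above can be sketched in a few lines each (server names are illustrative):

```python
import itertools

class RoundRobinBalancer:
    """Round Robin: cycle through the servers in order, one request at a time."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)

def least_connections(active):
    """Least Connections: pick the server with the fewest open connections."""
    return min(active, key=active.get)

rr = RoundRobinBalancer(["s1", "s2", "s3"])
print([rr.pick() for _ in range(4)])  # ['s1', 's2', 's3', 's1']
print(least_connections({"s1": 12, "s2": 3, "s3": 7}))  # 's2'
```

Weighted Round Robin extends the first by repeating each server in the cycle proportionally to its weight; IP Hash replaces the cycle with a hash of the client IP so the same client sticks to the same server.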
Load Balancing (4 of 4)
High Availability of LB: The load balancer itself can be a single point of failure;
to overcome this, a second load balancer is connected to the first to form an
Active-Passive cluster. Each LB monitors the health of the other and, since both
are equally capable of serving traffic and detecting failures, the secondary load
balancer takes over in the event the primary load balancer fails.
High Availability of Load Balancers
Caching
A cache is like short-term memory: it has a limited amount of space, but is typically faster
than the original data source and contains the most recently accessed items. This is achieved by
storing the data in RAM. A “cache hit” occurs when data is found in the cache; a “cache
miss” occurs when the data must be fetched from the backend servers. On a miss, the
application must also ensure the data is written back to the cache for future use. A
good caching design should achieve a cache hit ratio above 90%.
The most common algorithm used in any Caching mechanism is Least Recently Used (LRU).
This algorithm helps to maximise the use of limited space available in RAM. The process of
removing older items from cache is called eviction or invalidation.
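The LRU eviction described above can be sketched with Python's `OrderedDict` (capacity and keys here are illustrative):

```python
from collections import OrderedDict

class LruCache:
    """Least Recently Used cache sketch: evicts the stalest item when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None  # cache miss: caller fetches from backend, then put()s
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used item

cache = LruCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")         # touch "a", so "b" is now the least recently used
cache.put("c", 3)      # over capacity: evicts "b"
print(cache.get("b"))  # None -> cache miss
```

This is essentially the policy Redis approximates with its `allkeys-lru` eviction setting.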
Caching
Content Distribution Network (CDN)
CDNs are a kind of cache that comes into play for sites serving large amounts
of static content. In a typical CDN setup, a request will first ask the CDN for a
piece of static content; the CDN will serve that content if it has it locally
available. If it isn’t available, the CDN will query the back-end servers for the file,
cache it locally, and serve it to the requesting user.
A CDN consists of multiple servers (called PoPs, Points of Presence, or edge servers)
spread over a large geographic area, which can span the world. This
helps the CDN deliver content to each user from the nearest
available location. Akamai and AWS CloudFront are examples of CDN networks.
CDN Concept
CDN Edge Locations
Indian Edge locations: New Delhi, Chennai, Mumbai, Bangalore, Hyderabad, Kolkata
Sharding/Partitioning (1 of 2)
Data partitioning (also known as sharding) is a technique to break up a big database (DB) into
many smaller parts. It is the process of splitting up a DB or table across multiple machines to
improve the manageability, performance, availability, and load balancing of an application.
The justification for data sharding is that, after a certain scale point, it is cheaper and more
feasible to scale horizontally by adding more machines than to grow it vertically by adding
hardware.
1. Partitioning Methods
●Horizontal partitioning
●Vertical Partitioning
●Directory Based Partitioning
Sharding (Figure)
Sharding/Partitioning (2 of 2)
2. Partitioning Criteria
●Key or Hash-based partitioning
●List partitioning
●Round-robin partitioning
●Composite partitioning
3. Common Problems of Sharding
●Joins and Denormalization
●Referential integrity
●Rebalancing
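Key/hash-based partitioning from the criteria above can be sketched in a few lines. A stable hash (md5 here, purely for illustration) keeps routing consistent across processes; note that this naive modulo scheme is exactly what makes rebalancing painful, since changing the shard count remaps almost every key:

```python
import hashlib

def shard_for(key, num_shards):
    """Route a key to a shard index using a stable hash of the key."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_shards

# The same key always lands on the same shard:
print(shard_for("user:42", 4))
```

Consistent hashing is the usual answer to the rebalancing problem: it remaps only a small fraction of keys when shards are added or removed.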
Indexes
A database index is a data structure that improves the speed of data retrieval operations on a
database table at the cost of additional writes and storage space to maintain the index data
structure. Indexes are used to quickly locate data without having to search every row in a
database table every time a database table is accessed. Indexes can be created using one or
more columns of a database table, providing the basis for both rapid random lookups and
efficient access of ordered records.
MySQL has three types of indexes:
●INDEX
●UNIQUE (which requires each row to have a unique value)
●PRIMARY KEY (which is just a particular UNIQUE index)
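The three index types have close analogues in SQLite, which ships with Python; a minimal sketch (table and index names are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,  -- a particular UNIQUE index
        email TEXT UNIQUE,       -- each row must have a unique value
        city TEXT
    )
""")
conn.execute("CREATE INDEX idx_city ON users(city)")  # a plain (non-unique) INDEX
conn.execute("INSERT INTO users (email, city) VALUES ('a@example.com', 'Mumbai')")

# EXPLAIN QUERY PLAN can confirm whether the lookup uses idx_city
# instead of scanning every row:
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM users WHERE city = 'Mumbai'"
).fetchall()
print(plan)
```

The trade-off from the text applies here too: every `INSERT` now also updates `idx_city`, costing extra writes and storage.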
Redundancy and Replication (1 of 2)
Redundancy is the duplication of critical components or functions of a system with
the intention of increasing the reliability of the system, usually in the form of a backup
or fail-safe, or to improve actual system performance. For example, if there is only one
copy of a file stored on a single server, then losing that server means losing the file.
Since losing data is seldom a good thing, we can create duplicate or redundant copies
of the file to solve this problem. Redundancy plays a key role in removing the single
points of failure in the system and provides backups if needed in a crisis. For
example, if we have two instances of a service running in production and one fails, the
system can failover to the other one.
MySQL Redundancy
Redundancy and Replication (2 of 2)
Replication means sharing information to ensure consistency between
redundant resources, to improve reliability, fault-tolerance or accessibility.
Replication is widely used in many database management systems (DBMS),
usually with a master-slave relationship between the original and the copies.
The master gets all the updates, which then replicate through to the slaves. In a
typical setup, Master-Slave Replication is asynchronous.
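The master-slave flow above can be sketched with an in-memory model; the asynchrony is simulated by an explicit replication step (real MySQL ships a binary log to the slaves, which this toy `log` list only stands in for):

```python
class Master:
    """Toy master: applies writes locally, ships them to slaves later."""

    def __init__(self, slaves):
        self.data = {}
        self.log = []          # stand-in for the binary log
        self.slaves = slaves   # each slave is just a dict here

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))  # recorded, but not yet shipped

    def replicate(self):
        # Asynchronous step: slaves apply the log some time after the write.
        for key, value in self.log:
            for slave in self.slaves:
                slave[key] = value
        self.log.clear()

slave = {}
master = Master(slaves=[slave])
master.write("k", "v")
print(slave.get("k"))  # None: the slave lags until replication catches up
master.replicate()
print(slave.get("k"))  # 'v'
```

The window where the slave returns `None` is exactly the replication lag that makes reads from slaves eventually consistent.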
MySQL Replication
SQL vs NoSQL (1 of 2)
SQL databases, also referred to as relational databases, have a fixed structure and predefined schemas. NoSQL
databases, also referred to as non-relational databases, are unstructured, distributed, and have a dynamic schema.
SQL databases store data in rows and columns. Each row contains all the information about one entity
and each column contains all the separate data points. (Eg. MySQL)
Common types of NoSQL
●Key-Value Store (Eg. Redis)
●Document Database (Eg. MongoDB)
●Wide-Column Database (Eg. Cassandra)
●Graph Database (Eg. Neo4J)
SQL vs NoSQL (2 of 2)
Differences
In most common situations, SQL databases are vertically scalable. We need to
add more memory and CPU to scale a SQL database. On the other hand, NoSQL
databases are horizontally scalable, meaning we can easily add more servers to
our NoSQL database infrastructure.
ACID vs BASE: ACID stands for Atomicity, Consistency, Isolation and Durability.
BASE stands for Basically Available, Soft state, Eventually consistent.
Messaging - Queue vs Topics (1 of 2)
In modern cloud architecture, applications are decoupled into smaller, independent
building blocks that are easier to develop, deploy and maintain. Messaging allows
different parts of a system to communicate and process operations asynchronously.
To send a message, a component called a producer adds a message to the queue.
The message is stored on the Queue until another component called a consumer
retrieves the message and does “something” with it. Many producers and consumers
can use the Queue, but each message is processed only once, by a single consumer.
For this reason, this messaging pattern is often called point-to-point.
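Point-to-point delivery can be sketched with Python's thread-safe `queue.Queue`; each message is removed by the `get()` that retrieves it, so it is processed by exactly one consumer:

```python
import queue

q = queue.Queue()

# Producer side: add messages to the queue (order IDs are illustrative).
for order_id in (101, 102, 103):
    q.put(order_id)

# Consumer side: each get() removes the message from the queue.
processed = []
while not q.empty():
    msg = q.get()
    processed.append(msg)  # "do something" with the message
    q.task_done()

print(processed)  # [101, 102, 103] -> FIFO order, each consumed exactly once
```

With multiple consumer threads the messages would be spread across them, but each message would still be delivered to only one.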
FIFO Queue
Messaging - Queue vs Topics (2 of 2)
When a message needs to be processed by more than one consumer, message
queues can be combined with Publisher/Subscriber messaging. In a
Publisher/Subscriber model, any message published to a topic is immediately
received by all of the subscribers to the topic. The Pub/sub model allows messages
to be broadcast to different parts of a system asynchronously. To broadcast a
message, a component called a publisher simply pushes a message to the topic. All
components that subscribe to the topic will receive every message that is broadcast.
The subscribers to the message topic often perform different functions, and can each
do “something” different with the message in parallel.
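The fan-out behaviour described above can be sketched with a minimal topic object (subscriber callbacks and the topic name are illustrative):

```python
class Topic:
    """Minimal pub/sub sketch: every subscriber receives every message."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, message):
        # Fan-out: broadcast the message to all subscribers.
        for callback in self._subscribers:
            callback(message)

orders = Topic()
audit_log, emails = [], []
orders.subscribe(audit_log.append)                      # one subscriber logs
orders.subscribe(lambda m: emails.append(f"mail:{m}"))  # another sends mail
orders.publish("order-7")
print(audit_log, emails)  # ['order-7'] ['mail:order-7']
```

Contrast this with the queue sketch earlier: there, a message retrieved by one consumer is gone; here, every subscriber gets its own copy.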
Pub/Sub Topic
Long Polling vs Websockets vs Server-Sent Events (1 of 3)
Short/Ajax Polling
Polling is a standard technique used by the vast majority of AJAX applications. The basic idea is that the client
repeatedly polls (or requests) a server for data. The client makes a request and waits for the server to respond
with data. If no data is available, an empty response is returned. The problem with Polling is that the client has to
keep asking the server for any new data. As a result, a lot of responses are empty, creating HTTP overhead.
Long Polling
This is a variation of the traditional polling technique that allows the server to push information to a client
whenever the data is available. With Long-Polling, the client requests information from the server exactly as in
normal polling, but with the expectation that the server may not respond immediately. That’s why this technique is
sometimes referred to as a “Hanging GET”.
Long Polling vs Short Polling
Long Polling vs Websockets vs Server-Sent Events (2 of 3)
Websockets
WebSocket provides full-duplex communication channels over a single TCP
connection. It provides a persistent connection between a client and a server
that both parties can use to start sending data at any time. The client
establishes a WebSocket connection through a process known as the
WebSocket handshake. If the process succeeds, then the server and client can
exchange data in both directions at any time. In this way, a two-way
(bi-directional) ongoing conversation can take place between a client and a
server.
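The WebSocket handshake mentioned above hinges on the server proving it understood the client's `Sec-WebSocket-Key` header: per RFC 6455, the server appends a fixed GUID, hashes with SHA-1, and base64-encodes the result. A sketch:

```python
import base64
import hashlib

WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"  # fixed GUID from RFC 6455

def accept_key(client_key):
    """Compute the Sec-WebSocket-Accept value for a handshake response."""
    digest = hashlib.sha1((client_key + WS_GUID).encode()).digest()
    return base64.b64encode(digest).decode()

# The example key/accept pair given in RFC 6455 itself:
print(accept_key("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

If the value in the server's `101 Switching Protocols` response matches, both sides upgrade the TCP connection and start exchanging WebSocket frames.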
Web Sockets
Long Polling vs Websockets vs Server-Sent Events (3 of 3)
Server-Sent Events
Under SSEs the client establishes a persistent and long-term connection with the server. The server uses this
connection to send data to a client. If the client wants to send data to the server, it would require the use of
another technology/protocol to do so.
1. Client requests data from a server using regular HTTP.
2. The requested web page opens a connection to the server.
3. The server sends the data to the client whenever there’s new information available.
SSEs are best when we need real-time traffic from the server to the client or if the server is generating data in a
loop and will be sending multiple events to the client.
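The wire format SSE uses over that persistent connection is plain text: each event is one or more `data:` lines, optionally preceded by `id:` and `event:` fields, terminated by a blank line. A formatting sketch (field values are illustrative):

```python
def sse_event(data, event=None, event_id=None):
    """Format one Server-Sent Event frame for a text/event-stream response."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    for chunk in data.splitlines() or [""]:
        lines.append(f"data: {chunk}")   # multi-line payloads become several data: lines
    return "\n".join(lines) + "\n\n"     # the blank line ends the event

print(sse_event("price=42", event="tick", event_id=1))
```

A server in a loop would simply keep writing frames like this to the open HTTP response, and the browser's `EventSource` API would deliver each one as an event.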