Data models in NoSQL

2,246 views 29 slides Mar 15, 2021

Slide Content

Dr. Dipali P. Meher, MCS, M.Phil, NET, Ph.D, Modern College of Arts, Science and Commerce, Ganeshkhind, Pune 16 [email protected]/[email protected] DATA MODELS

NoSQL databases are able to run on large clusters. As data size increases, scaling up becomes difficult: we always need to buy a bigger server as the data grows. One solution is to run the database on a cluster of servers. Running a database on a cluster, however, increases its complexity.

Two paths for data distribution: replication and sharding.

Data Distribution Sharding

Replication: Replication takes the same data and copies it over multiple nodes (servers), so each bit of data can be found in multiple places.

Replication: Two Forms. Master-slave and peer-to-peer.

Replication: Master-Slave. Data is replicated across multiple nodes.
MASTER: One node is designated as the master, or primary. It is the authoritative source for the data and is usually responsible for processing any updates to that data. It can be appointed manually or automatically.
SLAVE: The other nodes are slaves, or secondaries. A replication process synchronizes the slaves with the master. After a failure of the master, a slave can be appointed as the new master very quickly.
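The division of work between master and slaves can be illustrated with a minimal Python sketch. The class names `Node` and `MasterSlaveCluster`, and the explicit `replicate()` step, are assumptions made for illustration; they do not correspond to any particular database.

```python
class Node:
    """One server holding a copy of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class MasterSlaveCluster:
    """Hypothetical cluster: one master takes writes, slaves serve reads."""

    def __init__(self, master, slaves):
        self.master = master
        self.slaves = list(slaves)

    def write(self, key, value):
        # Every update is processed by the master, the authoritative copy.
        self.master.data[key] = value

    def replicate(self):
        # The replication process brings each slave in line with the master.
        for slave in self.slaves:
            slave.data = dict(self.master.data)

    def read(self, key, slave_index=0):
        # Reads are served by a slave, so they may lag behind the master.
        return self.slaves[slave_index].data.get(key)


cluster = MasterSlaveCluster(Node("master"), [Node("slave-1"), Node("slave-2")])
cluster.write("user:1", "Asha")
before_sync = cluster.read("user:1")  # the slave has not seen the write yet
cluster.replicate()
after_sync = cluster.read("user:1")   # the slave now matches the master
```

Until `replicate()` runs, a read from a slave does not see the new value, which is the inconsistency window that slow propagation creates.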

MASTER: Can be appointed manually or automatically. Manually: when you configure your cluster, you configure one node as the master. Automatically: you create a cluster of nodes and they elect one of themselves to be the master. Automatic appointment also means that the cluster can appoint a new master when the master fails, reducing downtime.
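Automatic appointment can be sketched as follows. The election rule used here (the reachable node with the smallest name wins) is an assumption chosen to keep the example short; real clusters usually rely on a consensus or election protocol, which the slides do not describe.

```python
def elect_master(nodes, alive):
    """Pick a master among reachable nodes.

    Hypothetical rule: the reachable node with the smallest name wins.
    """
    candidates = sorted(name for name in nodes if alive[name])
    if not candidates:
        raise RuntimeError("no reachable node can become master")
    return candidates[0]


nodes = ["node-a", "node-b", "node-c"]
alive = {name: True for name in nodes}

first_master = elect_master(nodes, alive)

# The master fails; the remaining nodes elect a replacement without an operator.
alive[first_master] = False
new_master = elect_master(nodes, alive)
```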

Replication: Master Slave Replication

Pros and cons of Master-Slave Replication
PROS: To handle more read requests, add more slave nodes and ensure that all read requests are routed to the slaves. Should the master fail, the slaves can still handle read requests (read resilience). Good for read-intensive datasets.
CONS: The master is a bottleneck, limited by its ability to process updates and to pass those updates on. Its failure eliminates the ability to handle writes until the master is restored or a new master is appointed. Slow propagation of changes to the slaves can cause inconsistency. Bad for datasets with heavy write traffic.

Read resilience: handling more and more read requests. To get read resilience, you have to ensure that the read and write paths in your application are different, so that a failure in the write path can be handled separately while reads continue.
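A small sketch of separate read and write paths is given here. `ReadResilientStore` and `MasterUnavailable` are hypothetical names, and replication is done immediately inside `write()` purely to keep the example short.

```python
class MasterUnavailable(Exception):
    """Raised when a write arrives while the master is down."""


class ReadResilientStore:
    def __init__(self):
        self.master = {}
        self.slave = {}
        self.master_up = True

    def write(self, key, value):
        # Write path: goes through the master only.
        if not self.master_up:
            raise MasterUnavailable("cannot accept writes until a master is back")
        self.master[key] = value
        self.slave[key] = value  # simplification: replicate immediately

    def read(self, key):
        # Read path: served by the slave, independent of the master's health.
        return self.slave.get(key)


store = ReadResilientStore()
store.write("page:home", "Welcome")
store.master_up = False  # simulate a master failure

value_during_outage = store.read("page:home")
try:
    store.write("page:home", "Updated")
    write_succeeded = True
except MasterUnavailable:
    write_succeeded = False
```

Because the two paths are separate, the failed write can be handled on its own while reads keep returning the last replicated value.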

Master-slave replication is good for read resilience. It does not scale well for writes, and the master is a bottleneck for updates (write requests). Peer-to-peer replication addresses these issues.

Replication: peer-to-peer. All the replicas have equal weight, and all replicas can process write requests. The loss of any one replica does not prevent access to the data store.

If any node fails, work continues on the other nodes, i.e. the user can ride over node failures without losing access to data. Nodes can easily be added to improve performance (though complications may increase).

Complications in peer-to-peer. The biggest complication is consistency. When you can write to two different places, you run the risk that two people will attempt to update the same record at the same time: a write-write conflict. Inconsistencies on read lead to problems, but at least they are relatively transient. Inconsistent writes are forever.
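A write-write conflict can be reproduced with two in-memory replicas. The room-booking data and the helper `find_conflicts` are made up for illustration.

```python
def find_conflicts(replica_a, replica_b):
    """Return the keys whose values differ between two replicas."""
    shared = replica_a.keys() & replica_b.keys()
    return sorted(key for key in shared if replica_a[key] != replica_b[key])


peer_a = {"room:101": "free", "room:102": "free"}
peer_b = {"room:101": "free", "room:102": "free"}

# Two clients book the same room on different peers at the same time.
peer_a["room:101"] = "booked by client 1"
peer_b["room:101"] = "booked by client 2"

conflicts = find_conflicts(peer_a, peer_b)
```

Each peer accepted its write locally, so neither client saw an error, yet the replicas now disagree about `room:101`.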

Peer-to-peer replication

Peer-to-peer replication: there are two broad options for dealing with inconsistent writes. At one end, we can ensure that whenever we write data, the replicas coordinate to avoid a conflict. We don't need all the replicas to agree on the write, just a majority, so we can still survive losing a minority of the replica nodes. At the other end, we can decide to cope with an inconsistent write.
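The majority rule can be sketched in a few lines. The function `quorum_write` and the convention that an unreachable replica is represented by `None` are assumptions for this example only.

```python
def quorum_write(replicas, key, value):
    """Apply a write only if a majority of replicas can be reached.

    `replicas` is a list of dicts; an unreachable replica is represented by None.
    """
    majority = len(replicas) // 2 + 1
    reachable = [replica for replica in replicas if replica is not None]
    if len(reachable) < majority:
        return False  # not enough replicas can agree, so reject the write
    for replica in reachable:
        replica[key] = value
    return True


one_down = [{}, {}, None]     # 2 of 3 replicas reachable: a majority
two_down = [{}, None, None]   # 1 of 3 replicas reachable: a minority

accepted = quorum_write(one_down, "stock:pen", 40)
rejected = quorum_write(two_down, "stock:pen", 40)
```

With three replicas the majority is two, so the cluster keeps accepting writes after losing one node but not after losing two.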

Sharding: A busy data store is busy because different people are accessing different parts of the dataset. In these circumstances we can support horizontal scalability by putting different parts of the data onto different servers, a technique called sharding.

Sharding Sharding puts different data on different nodes

Sharding: Ideal Case. We have different users all talking to different server nodes. Each user only has to talk to one server, so gets rapid responses from that server. The load is balanced out nicely between servers; for example, if we have ten servers, each one only has to handle 10% of the load.
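The ten-server example can be checked with a short simulation. Routing keys by a hash is an assumption made here (the slides do not prescribe a routing scheme), and `shard_for` is a hypothetical helper. `hashlib` is used instead of Python's built-in `hash()` so the result is the same on every run.

```python
import hashlib
from collections import Counter


def shard_for(key, num_shards):
    """Map a key to a shard number using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


NUM_SHARDS = 10
keys = [f"user:{i}" for i in range(10_000)]

# Count how many keys land on each shard.
load = Counter(shard_for(key, NUM_SHARDS) for key in keys)
shares = {shard: count / len(keys) for shard, count in load.items()}
```

Each entry in `shares` should come out close to 0.10, i.e. roughly 10% of the load per server, and every request for a given key always goes to the same server.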

We have to ensure that data that is accessed together is clumped together on the same node, and that these clumps are arranged on the nodes to provide the best data access. How to clump the data? Using aggregates: aggregates are designed to combine data that is commonly accessed together, so they leap out as an obvious unit of distribution. To improve performance when distributing aggregates: 1) the physical location where aggregates are stored is important; 2) even loading: try to arrange aggregates so they are evenly distributed across the nodes, which then all get equal amounts of the load.
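Using the aggregate as the unit of distribution can be sketched like this. The order aggregate, the node names, and the helpers `node_for`, `save` and `load` are all hypothetical, and hashing the aggregate id is just one possible placement rule.

```python
import hashlib

NODES = ["node-1", "node-2", "node-3"]
storage = {node: {} for node in NODES}  # what each node holds, in memory


def node_for(aggregate_id):
    """Choose the node for an aggregate from a stable hash of its id."""
    digest = hashlib.sha256(aggregate_id.encode("utf-8")).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]


def save(aggregate):
    # The whole aggregate is stored on one node, never split across nodes.
    storage[node_for(aggregate["id"])][aggregate["id"]] = aggregate


def load(aggregate_id):
    # A single node lookup returns everything that is accessed together.
    return storage[node_for(aggregate_id)][aggregate_id]


order = {
    "id": "order:42",
    "customer": "Asha",
    "items": [{"sku": "pen", "qty": 2}, {"sku": "ink", "qty": 1}],
    "shipping_address": "Pune",
}
save(order)
loaded = load("order:42")
holders = [node for node in NODES if "order:42" in storage[node]]
```

The order, its line items and its shipping address travel together, so reading the order touches exactly one node.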

Replication and sharding are orthogonal techniques: You can use either or both of them

Many NoSQL databases offer auto-sharding, where the database takes on the responsibility of allocating data to shards and ensuring that data access goes to the right shard. This can make it much easier to use sharding in an application.

Sharding is particularly valuable for performance. It can improve both read and write performance, and it provides a way to horizontally scale writes.

Combining Sharding and Replication: If we use both master-slave replication and sharding, we have multiple masters, but each data item has only a single master. Depending on your configuration, you may choose a node to be a master for some data and a slave for other data, or you may dedicate nodes to master or slave duties.
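One possible layout for sharding combined with master-slave replication is sketched below. The round-robin assignment of masters is an assumption for illustration; the node and shard counts are arbitrary.

```python
nodes = ["n1", "n2", "n3"]
num_shards = 3

# For every shard, exactly one node is master and the others are slaves.
layout = {}
for shard in range(num_shards):
    master = nodes[shard % len(nodes)]  # hypothetical round-robin choice
    slaves = [node for node in nodes if node != master]
    layout[shard] = {"master": master, "slaves": slaves}

# A node can be master for some shards and slave for others.
n1_master_of = [shard for shard, roles in layout.items() if roles["master"] == "n1"]
n1_slave_of = [shard for shard, roles in layout.items() if "n1" in roles["slaves"]]
```

The cluster as a whole has several masters, but any single data item still has one authoritative copy, because it belongs to one shard and that shard has one master.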

Combining Sharding and Replication

Combining Sharding and Replication: Using peer-to-peer replication and sharding together is a common strategy for column-family databases. Example: a cluster may have tens or hundreds of nodes with data sharded over them. A good starting point for peer-to-peer replication is a replication factor of 3, so each shard is present on three nodes. Should a node fail, the shards on that node will be rebuilt on the other nodes.
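A replication factor of 3 and the rebuild after a node failure can be sketched as follows. The placement rule (consecutive nodes) and the helpers `place_shards` and `handle_failure` are assumptions for illustration, and the sketch assumes the cluster has more nodes than the replication factor.

```python
def place_shards(num_shards, nodes, replication_factor=3):
    """Place each shard on `replication_factor` consecutive nodes."""
    return {
        shard: [nodes[(shard + i) % len(nodes)] for i in range(replication_factor)]
        for shard in range(num_shards)
    }


def handle_failure(placement, failed, nodes):
    """Rebuild every shard that lived on the failed node on another node."""
    survivors = [node for node in nodes if node != failed]
    for holders in placement.values():
        if failed in holders:
            holders.remove(failed)
            # Assumes at least one survivor does not hold this shard yet.
            replacement = next(node for node in survivors if node not in holders)
            holders.append(replacement)
    return placement


nodes = ["n1", "n2", "n3", "n4", "n5"]
placement = place_shards(num_shards=5, nodes=nodes)
shard0_before = list(placement[0])

handle_failure(placement, failed="n2", nodes=nodes)
shard0_after = list(placement[0])
```

After the failure every shard is again present on three distinct nodes, none of which is the failed one.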

Peer to peer replication with sharding

Thank You