Relational and Non-Relational Database Dr. Zainab Namh Lect. 9 Relational vs Non – Relational Database
Introduction to NoSQL? 2
NoSQL NoSQL is a non-relational DMS, that does not require a fixed schema, avoids joins, and is easy to scale. NoSQL database is used for distributed data stores with humongous data storage needs. NoSQL is used for Big data and real-time web apps. For example, companies like Twitter, Facebook, Google that collect terabytes of user data every single day. NoSQL database stands for "Not Only SQL" or "Not SQL“ Though a better term would NoREL . Carl Strozz introduced the NoSQL concept in 1998. 3
What is Big Data? A term for data sets that are so large that traditional methods storage and processing are inadequate. Massive increase in data volume within the last decade or so Social networks, search engines, etc. Challenges in storage, capture, analysis, transfer, etc. 4
Advantages and Limitations of RDBMS Better for relational data Normalization Well known standard language (SQL) ACID Compliance However No flexibility in choosing the right DB for the right application Expensive if runs across a cluster Integrity is more important than availability and low latency 5
Why NoSQL? The concept of NoSQL databases became popular with Internet giants like Google, Facebook, Amazon, etc. who deal with huge volumes of data. The system response time becomes slow when you use RDBMS for massive volumes of data. To resolve this problem, we could "scale up" our systems by upgrading our existing hardware. This process is expensive. The alternative for this issue is to distribute database load on multiple hosts whenever the load increases. This method is known as "scaling out." 6
Horizontal vs Vertical scaling 7
Features of NoSQL Non-relational NoSQL databases never follow the relational model Never provide tables with flat fixed-column records Work with self-contained aggregates Doesn't require data normalization No complex features like query languages, referential integrity joins and ACID Schema-free NoSQL databases are either schema-free or have relaxed schemas Do not require any sort of definition of the schema of the data Offers heterogeneous structures of data in the same domain 8
NoSQL – Data Structure 9
Features of NoSQL Distributed Multiple NoSQL databases can be executed in a distributed fashion Offers auto-scaling and fail-over capabilities Often ACID concept can be sacrificed for scalability and throughput 10
What is the CAP Theorem? CAP theorem is also called brewer's theorem. It states that is impossible for a distributed data store to offer more than two out of three guarantees: Consistency Availability Partition Tolerance As soon as you partition a DB the data might either: Become inconsistent – one replication holds different data from another Become unavailable – to keep consistency, we must propagate updates, which takes time 11
What is the CAP Theorem? Consistency: The data should remain consistent even after the execution of an operation. This means once data is written, any future read request should contain that data. For example, after updating the order status, all the clients should be able to see the same data. 12
What is the CAP Theorem? Availability: The database should always be available and responsive. It should not have any downtime. Partition Tolerance: Partition Tolerance means that the system should continue to function even if the communication among the servers is not stable. For example, the servers can be partitioned into multiple groups which may not communicate with each other. Here, if part of the database is unavailable, other parts are always unaffected. 13
Eventual Consistency - BASE BASE: B asically A vailable, S oft state, E ventual consistency Basically available means DB is available all the time as per CAP theorem Soft state means even without an input; the system state may change Eventual consistency means that the system will become consistent over time 14
Eventual Consistency Example If you pay some money into your bank account, you don’t want some cash machines to know its there and others not to But if you post something on Facebook, you might put up with the fact that a friend in the UK can see it for a moment while an American friend, looking at a slightly older replication cannot. He’ll see it eventually 15
ACID vs BASE 16
NoSQL database types There are mainly four categories of NoSQL databases. Each of these categories has its unique attributes and limitations: Key-value Pair Based Column-oriented Graphs based Document-oriented No specific database is better to solve all problems. You should select a database based on your product needs. 17
Key Value Store NoSQL Database Data is stored in key/value pairs. It is designed in such a way to handle lots of data and heavy load. Key-value pair storage databases store data as a hash table where each key is unique, and the value can be a JSON (JavaScript Object Notation), BLOB(Binary Large Objects), string, etc. For example, a key-value pair may contain a key like "Website" associated with a value like "Guru99". When we try and reflect back on the CAP theorem, it becomes quite clear that key value stores are great around the Availability and Partition aspects but definitely lack in Consistency. 18
Example 19
Column-based NoSQL Database Column stores in NoSQL are similar at first appearance to traditional relational DBMS. The concepts of rows and columns are still there. You also define column families before loading data into the database, meaning that the structure of data must be known in advance. However, column stores organize data differently than relational databases do. Instead of storing data in a row for fast access, data is organized for fast column operations. They deliver high performance on aggregation queries like SUM, COUNT, AVG, MIN etc. as the data is readily available in a column. 20
Column-based NoSQL Database 21
Document Store NoSQL Database Document-Oriented NoSQL DB stores and retrieves data as a key value pair but the value part is stored as a document. The document is stored in JSON or XML formats. 22
NoSQL - Example 23
NoSQL - Example 24
Graph-Based NoSQL Database A graph type database stores entities as well the relations amongst those entities. The entity is stored as a node with the relationship as edges. An edge gives a relationship between nodes. Every node and edge has a unique identifier. 25
Disadvantages of NoSQL No standardization rules Limited query capabilities RDBMS databases and tools are comparatively mature It does not offer any traditional database capabilities, like consistency when multiple transactions are performed simultaneously. When the volume of data increases it is difficult to maintain unique values as keys become difficult Doesn't work as well with relational data 26
NoSQL Example – Review DB Product: iPhone 5 Price: £500 Camera: Fine Screen: Very good Accessories: Headphone, Case, .... Product: iPhone 5 Price: £500 Camera: Excellent Screen: Poor in sunshine Operating system: Easy to use 27
Example Review in JSON { Id:847543, Name:iPhone5, Features:[ GPS,Retina Display,Siri ], Reviews:[ {Reviewer:458743,Date:12.4.1013,Speed:Slow}, {Reviewer:636534,Date:2.5.1013,Camera:Great}, ] } 28
Possible Review DB Design For our product review database, potential units to aggregate are: – Users – keep all a person’s reviews together – Products – keep all the reviews of a product together 29
Review Queries? What are Common Queries? Which design you choose depends on a few things, for example, what is the most common query? – List all reviews from a given customer – List all reviews of a given product – Find all products that are ‘fast’ 30
Transaction Concurrency ACID Recovery 31
ACID- Atomic, Consistent, Isolated, Durable Atomicity: A transaction is an atomic unit of processing; it is either performed in its entirety or not performed at all. Consistency preservation: A transaction is consistency preserving if its complete execution take(s) the database from one consistent state to another. Isolation: A transaction should appear as though it is being executed in isolation from other transactions. That is, the execution of a transaction should not be interfered with by any other transactions executing concurrently. Durability or permanency: The changes applied to the database by a committed transaction must persist in the database. These changes must not be lost because of any failure. 32
What is a Transaction? A transaction is a logical unit of database processing (or a logical unit of work on the database) that includes one or more database access operations: – insertion, – deletion, – modification, or – retrieval operations. 33
What is a Transaction? If the database operations in a transaction do not update the database but only retrieve data, the transaction is called a read-only transaction. Read-only transaction - do not changes the state of a database, only retrieves data. A database system interprets a transaction not as an application program but a logical sequence of read and write operations. 34
Concurrent and Interleaving Multiprogramming operating systems execute some commands from one process, then suspend this process and execute some commands from the next process, and so on. This type of process execution is called interleaving. After a while, the execution of the first process is resumed at the point where it was suspended. Interleaving keeps the CPU busy when a process requires an input or output operation. The CPU is switched to execute another process rather than remaining idle during I/O time. Interleaving also prevents a long process from delaying other processes. 35
Why Interleave? 36
Why concurrency control is needed? 37
Why Recovery Is Needed? Whenever a transaction is submitted to a DBMS for execution, the system is responsible for making sure that either (1) all the operations in the transaction are completed successfully and their effect is recorded permanently in the database, or (2) the transaction has no effect whatsoever on the database or on any other transactions. For the purpose of recovery and concurrency control, we are mainly interested in the read_item and write_item operations of the transactions, as well as the commit and abort operations. 38
Successful Schedule 39
References https://www.guru99.com/nosql-tutorial.html Introduction to NoSQL – Kevin Swingler – 2017. Advanced Database Lectures – Prof. Musbah Aqel – 2011. 40