Use a data parallel approach to process large volumes of data (typically terabytes or petabytes) known as big data.
Focus on the reliability and availability of data.
Added: Apr 30, 2024
Slides: 22 pages
Chapter 4: NoSQL
Dr G Sudha Sadasivam, Mrs R Thirumahal
Agenda
SQL vs NoSQL
Limitations and advantages of NoSQL
Types of NoSQL stores with example: KV store, column family, document, graph
Comparison of NoSQL stores
Principles of NoSQL models: CAP, BASE
Polyglot persistence in an e-commerce application
Introduction
The term NoSQL was coined by Carlo Strozzi in 1998.
Relational systems have ACID properties and are transactional, and hence can suffer performance degradation.
Relational systems have centralised control and a rigid schema, resulting in a lack of flexibility and scalability.
NoSQL: Not only SQL.
NoSQL stores are schema-less and hence offer simple and fast data access.
They can store voluminous data.
They can store unstructured data from multiple sources.
They work with large volumes of distributed data.
They have high operational speed, great flexibility and horizontal scalability.
They follow BASE properties with eventual consistency.
They possess a shared-nothing architecture.
They support auto-sharding and replication, parallelism and distributed querying.
NoSQL systems are complementary to SQL systems.
Limitations
NoSQL stores cannot be used for transactional applications that have constraints and consistency requirements.
Being schema-less, they require the application developer to enforce constraints.
Using multiple data stores makes interoperability difficult.
Eventual consistency: changes in data are propagated to all copies with a time lag.
Vendor lock-in: each NoSQL data store exists as a silo, resulting in high coupling between the data store and the application.
There is a lack of expertise in the usage of NoSQL stores.
NoSQL databases suffer from security issues related to authentication, authorization and storage security.
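The point that a schema-less store leaves constraint checking to the application developer can be illustrated with a minimal Python sketch. The field names (`cust_id`, `name`) and the helper `validate_customer` are hypothetical and are not taken from the slides; the idea is only that the application checks a record before writing it to the store.

```python
# Hypothetical constraints the application enforces, since the store will not.
REQUIRED_FIELDS = {"cust_id": str, "name": str}

def validate_customer(doc):
    """Return a list of problems; an empty list means the document is acceptable."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in doc:
            problems.append(f"missing field: {field}")
        elif not isinstance(doc[field], expected_type):
            problems.append(f"wrong type for {field}")
    return problems

print(validate_customer({"cust_id": "C1", "name": "Asha"}))
print(validate_customer({"cust_id": 7}))
```

A relational system would reject the second record itself through its schema; here the rejection has to be coded in the application.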
Types of NoSQL Stores
Key-value (KV) stores
Associative arrays (dictionaries) of key-value pairs, with a unique ordered key for every value.
Good performance, so they are used for session management and caching.
Data is held in RAM, as in Memcached, or in secondary memory, as in MemcacheDB.
Document stores
Organise data as a collection of documents with unique keys.
Information can be retrieved based on the contents of the document.
Collections are analogous to tables, and documents to records in a table.
Every document can have different fields.
Suitable for managing content and mobile data.
Examples: MongoDB and CouchDB.
Column family stores
Data is stored in columns instead of rows.
Columns with different types of data can be aggregated as a column family for querying.
Examples: HBase and Bigtable.
Graph data stores
Entities in social networks are connected by relationships, which are represented as graphs.
Example: Neo4j.
Example: Relational
KV store: each record is stored in a row and read using RecordReader in HDFS. Attributes are separated by commas and extracted using a comma separator.
Column family store: the Customer table has two column families, Name and Address, along with orders carrying timestamps (TS). The Order table has Price and Item column families.
Document store: two collections, namely Customer and Order. Customer has 2 documents (rows), while Order has 3 documents.
Graph store: entities are CustID with Name and Address, and OrderID with Price and Items.
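The four organisations described above can be sketched with plain Python data structures. This is an illustration under assumptions, not a real database: all sample values (customer names, cities, prices, items), the relation name `PLACED` and the helper functions are hypothetical, since the slide's table contents are not available in the text.

```python
# Hypothetical sample data throughout; only the shapes follow the slide.

# Key-value store: the value is an opaque comma-separated string.
kv_store = {"C1": "Asha,Chennai", "C2": "Ravi,Coimbatore"}

def kv_get_field(key, index):
    """The application, not the store, splits the value on commas."""
    return kv_store[key].split(",")[index]

# Column family store: row key -> column family -> column -> {timestamp: value}.
column_family_store = {
    "C1": {
        "Name": {"first": {1: "Asha"}},
        "Address": {"city": {1: "Chennai", 2: "Madurai"}},
    },
}

def latest(row, family, column):
    """Return the value with the highest timestamp."""
    versions = column_family_store[row][family][column]
    return versions[max(versions)]

# Document store: two collections; documents may have different fields.
document_store = {
    "Customer": [
        {"_id": "C1", "name": "Asha", "city": "Chennai"},
        {"_id": "C2", "name": "Ravi", "city": "Coimbatore", "phone": "000"},
    ],
    "Order": [
        {"_id": "O1", "cust": "C1", "price": 250, "items": ["pen"]},
        {"_id": "O2", "cust": "C1", "price": 900, "items": ["bag", "book"]},
        {"_id": "O3", "cust": "C2", "price": 120, "items": ["ink"]},
    ],
}

def find(collection, **criteria):
    """Retrieve documents by their contents rather than by key."""
    return [d for d in document_store[collection]
            if all(d.get(k) == v for k, v in criteria.items())]

# Graph store: entities are nodes, relationships are labelled edges.
edges = [("C1", "PLACED", "O1"), ("C1", "PLACED", "O2"), ("C2", "PLACED", "O3")]

def neighbours(node, relation):
    return [dst for src, rel, dst in edges if src == node and rel == relation]

print(kv_get_field("C1", 1), latest("C1", "Address", "city"))
print(len(find("Order", cust="C1")), neighbours("C1", "PLACED"))
```

The same customer and order data takes a different shape in each store, which is what drives the comparison of querying capabilities later in the chapter.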
[Figures: logical organization in KV store; physical organization in KV store]
KV stores are simple and powerful, but they cannot process a range of keys.
Ordered KV stores can be used for range processing, but they cannot model values.
Column family stores model values as a map-of-map-of-maps: column families are aggregated from columns, which are in turn aggregated from timestamped values.
Document stores can model values not only as aggregates but also as schemas of arbitrary complexity. They also provide indexing based on field names/keys.
Graph data stores extend ordered KV systems by linking various keys as a graph rather than as a hierarchical model.
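The difference between a plain KV store and an ordered KV store can be sketched in Python. The class `OrderedKVStore` and the key naming scheme (`order:001`) are hypothetical; the sketch keeps keys sorted with the standard `bisect` module so that a range of keys can be scanned, which a hash-based dictionary alone does not support efficiently.

```python
import bisect

class OrderedKVStore:
    """Toy ordered KV store: keys are kept sorted so ranges can be scanned."""

    def __init__(self):
        self._keys = []      # sorted list of keys
        self._values = {}    # key -> opaque value

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key):
        return self._values[key]

    def range(self, start, end):
        """Return (key, value) pairs with start <= key <= end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_right(self._keys, end)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]

store = OrderedKVStore()
for key, value in [("order:003", "ink"), ("cust:001", "Asha"),
                   ("order:001", "pen"), ("order:002", "bag")]:
    store.put(key, value)

print(store.range("order:001", "order:002"))
```

The values remain opaque strings, which is the limitation noted above: the store can order and scan keys but knows nothing about the structure inside a value.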
Comparison
CAP
Eric Brewer proposed the Consistency, Availability, Partition tolerance (CAP) theorem in 2000.
Consistency is the ability to obtain the same data from multiple replicas. Consistency compliance ensures that all the cluster nodes have access to the same data.
Availability is the ability of a system to continue its operation even when some hardware/software components fail.
Partition tolerance is the ability of the system to continue operation in a network partitioned by network failures. It guarantees independence of the various data partitions.
Replication facilitates the availability of data.
Eventual consistency ensures that replicas do not remain stale indefinitely.
Partitioning ensures load distribution and scalability.
Only 2 of the 3 properties can be satisfied at a time.
AP: follows BASE properties with eventual consistency, e.g. Amazon's DynamoDB, without strict consistency.
CP: ACID properties with strict consistency. Pessimistic locking ensures consistency, e.g. MongoDB and Memcache.
CA: a CA system cannot operate under network partitions and hence it is neither ACID nor BASE. The two-phase commit protocol is used, e.g. relational systems and Bigtable.
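The trade-off between CP and AP behaviour during a network partition can be sketched with a toy two-replica cluster in Python. The `Cluster` class, its `mode` flag and the `heal` method are hypothetical simplifications and do not model any real product: a CP cluster refuses writes it cannot replicate, while an AP cluster accepts them and reconciles later.

```python
class Cluster:
    """Toy two-replica cluster; mode is 'CP' or 'AP' (hypothetical model)."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas = [{}, {}]
        self.partitioned = False
        self.pending = []  # writes not yet delivered to replica 1

    def write(self, key, value):
        if self.partitioned:
            if self.mode == "CP":
                # Consistency first: refuse the write, giving up availability.
                raise RuntimeError("unavailable: replica unreachable")
            # Availability first: accept locally, replicate after the partition heals.
            self.replicas[0][key] = value
            self.pending.append((key, value))
            return
        for replica in self.replicas:
            replica[key] = value

    def read(self, replica_index, key):
        return self.replicas[replica_index].get(key)

    def heal(self):
        """End the partition and deliver pending writes (eventual consistency)."""
        self.partitioned = False
        for key, value in self.pending:
            self.replicas[1][key] = value
        self.pending.clear()

ap = Cluster("AP")
ap.write("x", 1)
ap.partitioned = True
ap.write("x", 2)
print(ap.read(0, "x"), ap.read(1, "x"))  # replicas disagree during the partition
ap.heal()
print(ap.read(0, "x"), ap.read(1, "x"))  # replicas agree again
```

In the AP run, the second replica serves stale data until the partition heals; a `Cluster("CP")` in the same situation raises an error on the write instead.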
BASE
Used by Web 2.0 applications.
Basically available, soft state and eventually consistent.
Basically available: the system works basically all the time.
Due to eventual consistency, the system maintains soft state.

ACID | BASE
Atomicity, Consistency, Isolation, Durability | Basically Available, Soft state, Eventually consistent
Strong consistency | Weak consistency
Consistency and Isolation first | Availability first
Nested transactions | Approximate answers
Conservative | Simple
Schema | Schema-less
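The ACID side of the comparison can be illustrated with a short Python sketch using the standard `sqlite3` module. The `account` table, the balances and the `transfer` helper are hypothetical. The sketch shows atomicity: when the second statement of a transaction violates a constraint, the first statement is rolled back as well, so no partial update is ever visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(src, dst, amount):
    """Move money atomically; returns False if the transaction was rolled back."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst))
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

def balances():
    return dict(conn.execute("SELECT id, balance FROM account"))

print(transfer("A", "B", 30), balances())
print(transfer("A", "B", 500), balances())
```

A BASE system would instead accept the update on one replica and let the others catch up, trading this all-or-nothing guarantee for availability.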
Case Study
Polyglot persistence applies multiple data storage technologies to meet the needs of an application.
Consider an e-commerce application with shopping cart, inventory, orders, catalogue and customer details.
1. User sessions / activity logs require efficient read/write operations. KV stores.
2. Point of sales has a high ingestion rate with a high volume of write operations. KV stores (storage); column family stores (analytics).
3. Shopping cart requires high availability and aggregates information. Document store.
4. Product catalogue has frequent reads and infrequent writes. It must also support aggregation. Document stores.
5. Product recommendations are made based on similar products or users. Graph store.
6. Financial data is relational and requires transactional updates. RDBMS.
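The allocation in this case study can be written down as a small Python mapping. The component names used as keys and the helper `components_by_store` are hypothetical labels chosen for illustration; the store types follow the list above.

```python
# Component -> store type, following the case study (keys are illustrative labels).
STORE_FOR = {
    "user_sessions": "key-value",
    "activity_logs": "key-value",
    "point_of_sale_storage": "key-value",
    "point_of_sale_analytics": "column family",
    "shopping_cart": "document",
    "product_catalogue": "document",
    "recommendations": "graph",
    "financial_data": "rdbms",
}

def components_by_store(mapping):
    """Group components by the kind of store that serves them."""
    grouped = {}
    for component, store in mapping.items():
        grouped.setdefault(store, []).append(component)
    return grouped

for store, components in components_by_store(STORE_FOR).items():
    print(store, "->", ", ".join(components))
```

In a real application each store type would sit behind its own client library, with the application routing every request to the store assigned to that component.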
Exercises to be completed
1. Consider the case study of the AAA coffee shop in test 1. Identify the type of NoSQL store that can be used for each component and justify your choice.
2. Consider a table with student details (Roll No, First Name, Last Name, Department, Programme, Year, Semester) and faculty details (FacultyId, First Name, Last Name, Department, Course Handled 1, Course Handled 2, Course Handled 3). Design key-value, column family, document and graph databases for the same.
Exercises in MongoDB
Create a database in MongoDB for storing patient and doctor details.
Insert patient details and doctor details.
Establish a connection between doctor and patient.
Modify the doctor details for a patient.
Add 2 or more doctors for a patient named XXX.
Identify the count of patients under a doctor. If the patient count > 4, allot a new doctor to the patient.
Allot a doctor to a patient based on specialisation.
If the number of patients for a doctor becomes 0, generate an alert message.
If a doctor leaves the hospital, delete the doctor from the database and allot a new doctor, based on speciality, to his/her patients.
Exercises in Neo4j
Create a Neo4j database with 5 people, giving their attributes and friendship relations.
Create new persons with attributes.
Create relationships and modify relationships.
Identify how many friends a person has.
Identify friend-of-friend relationships.
Conclusion
SQL vs NoSQL
Limitations and advantages of NoSQL
Types of NoSQL stores with example: KV store, column family, document, graph
Comparison of NoSQL stores
CAP, BASE
Polyglot persistence in an e-commerce application
Exercises in MongoDB and Neo4j