Database Systems - A Historical Perspective

244 views 51 slides Jun 22, 2023
Slide 1
Slide 1 of 51
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9
Slide 10
10
Slide 11
11
Slide 12
12
Slide 13
13
Slide 14
14
Slide 15
15
Slide 16
16
Slide 17
17
Slide 18
18
Slide 19
19
Slide 20
20
Slide 21
21
Slide 22
22
Slide 23
23
Slide 24
24
Slide 25
25
Slide 26
26
Slide 27
27
Slide 28
28
Slide 29
29
Slide 30
30
Slide 31
31
Slide 32
32
Slide 33
33
Slide 34
34
Slide 35
35
Slide 36
36
Slide 37
37
Slide 38
38
Slide 39
39
Slide 40
40
Slide 41
41
Slide 42
42
Slide 43
43
Slide 44
44
Slide 45
45
Slide 46
46
Slide 47
47
Slide 48
48
Slide 49
49
Slide 50
50
Slide 51
51

About This Presentation

Database Systems - A Historical Perspective

Investigate the history of database systems, from early days to the cloud area.

Covers RDBMS (SQL), NoSQL and other systems.

Topics Covered

* Historical databases
* Relational databases
* Non-relational databases
* Future directions


Slide Content

Database Systems
A Historical Perspective
_
Károly Kálmán
June 22, 2023

Topics Covered
Historical databases
Relational databases
Non-relational databases
Future directions

Historical Databases

Historical Databases (No Database)
All data is stored in memory
It's a start
✔ Fast
✔ Store anything in any format
✖ No persistent and durable storage

Historical Databases (Flat File)
Ted Scott ▫ $100 ▫ Apple ☷ Ai Joe ▫ $900 ▫ Peach ☷
◺ ◿ ↑ ↑
field │ │
value │ └─ record separator
└─ field separator
✔ Persistent
✔ Store anything (records can be different)
✖ Low-level access, programmer needed
✖ Complex queries are hard and slow
➤ Today: for small data sets in some domains

Historical Databases (Hierarchical)
CTO
╱ ╲
Head1 Head2
╱ ╲
Mngr1 Mngr2
✔ Defined structure
✔ Faster than flat file
✖ Navigation through the hierarchy only (up-down)
✖ "Programmer perspective" needed
➤ Today: LDAP, Active Directory

Historical Databases (Navigational)
John ── Alice ── Maggie Rob
│ │ │ │
Richard ── Scott Susie ── Nancy
✔ Relaxed navigation
✔ Very fast
✖ Still pre-determined navigation (no ad hoc queries)
✖ "Programmer perspective" needed
➤ Today: IBM Information Management System v15

Relational Databases

Relational Database Management System (RDBMS)
E. F. Codd in 1970 (IBM)
Relational model of data
Based on formal (math) rules
Optimal database design (NF)
Data access optimization
User friendly
Very popular
MySQL, Oracle, MS SQL, Sybase, MS Access, etc.

RDBMS Concepts (Table)
Database = tables + table cross-reference + keys

RDBMS Concepts (Keys)
Primary Key (PK) is an unique identifier for an entity
Keys are needed to make relations

Structured Query Language (SQL)
SQL = Structured Query Language
ANSI Standard
Declarative language
Focus on what to do, not how to do
User friendly
Abstractions for non-programmers
English like language
Pure SQL applications (MS Access)
Not fancy, but no programming needed

Structured Query Language (Table Operations)
Create table
CREATE TABLE families (f_name char, s_name char, id int);
Modify table (add column)
ALTER TABLE families ADD child_name char;
Delete table
DROP TABLE families;
DROP TABLE = NoSQL :)

Structured Query Language (Data Operations)
Insert new data
INSERT INTO families VALUES ("Philip", "Zimmer", 3);
Query for data
SELECT f_name, s_name FROM families WHERE child_count > 2;
Modify existing data
UPDATE families SET f_name ="Jonas" WHERE f_name = "Jhn";

Structured Query Language (Transaction)
Transaction
Multiple operations treated as a single unit of work
Either all operations succeed or all fail
Example
BEGIN TRANSACTION
INSERT INTO families VALUES ("Philip", "Zimmer", 3);
INSERT INTO families VALUES ("Hans", "Vogler", 347);
END TRANSACTION

ACID Model
ACID defines who sees what changes and when
ACID transaction control properties
Atomic: operations succeed or roll-back (state before)
Consistent: database is in correct state when trans. finished
Isolated: transactions do not disturb/effect another
4 isolation levels (speed vs consistency)
Durable: results are permanent, even if error'd

Typical 3-tier System Architecture

RDBMS Drawbacks
Scaling is hard (ACID)
Expensive
'Free' solutions are not mature for 9...9%
Non-structured data is hard to store
NoSQL for rescue
For majority of uses RDBMS is just enough

Distributed Databases
Multiple database servers
Data duplicated
Performance/availability increases
But complexity too

Distributed Databases (Replicas)
Master-Slave
Master serves r/w and replicates data to slaves
Slaves serve reads only
Master-Master
Multiple masters that serve r/w
Replication between masters

Distributed Databases (Sharding and Federation)
Sharding
Break data into smaller chunks by key
Store chunks on different servers
Federation
Databases by domain functions
No single monolith database
Query impact (linking tables)

Data Warehouse TODO!!!!
Current and historical data
Store structured data (schema)
Query focused (Business Analytic)
Large and central data store

Data Mart TODO!!!!
Specific views by business departments
Based on data warehouse
Multiple data marts, not a single monolith
More summarized than data warehouse

Data Lake TODO!!!!
Central location for all data
Store raw data (no schema)
Purpose of data is not defined
Data science

Data Pipeline TODO!!!!
Process to move or transform data between systems
...

Data Mesh TODO!!!!
Architectural pattern
Data ownership and distribution
Analytical data (optimizing the business)
Historical and aggregated view
Operational data (running the business)
Current and transactional state

Non-Relational Databases
(NoSQL)

NoSQL Databases
Schema/structure definition is optional
Store anything (mix data in collections)
Need to know major use cases before design
Performance
Very good for expected use cases
Bad for unexpected use cases
Varied transaction support (event-cons, quorum)
Query language complexities
Scalable distributed systems

Consistency Models TODO!!!!
When reader sees a system change TODO!!!!
Weak
Reader might or might not (at all) see the change
Eventual
Reader will see the change sometime
Strong
Reader sees the change immediately

CAP Theorem
Eric Brewer (~1997)
CAP theorem (Reliability)
Consistency: a read receives the most recent data or an error
Availability: a request receives a (non-error) response with
(maybe old) data
Partition tolerance: system operates when network is not
reliable
Choose two (but P shall be a must)
Some systems support configurable CAP modes

BASE Model
Similar to ACID, but for NoSQL
BASE model properties
Basically Available: system guarantees availability
Soft state: system state may change over time, even with no
input
Eventual consistency: system will be consistent over a period of
time, if no input received

NoSQL Databases (Historical)
XML Database
Wasteland
Object Store
Programmer's database

NoSQL Databases (Key-Value)
123 ↠ firstName = "Arthur" ⌁ surName = "Legend"
8874 ↠ color = "Black ⌁ make = "Ford"
Very Fast
Simple to use
Access by keys only
Caches (Infinispan, Redis, Memcached, Ignite, etc.)

NoSQL Databases (Document I.)
Store JSON structured data
Documents can have different fields
{ ⌲ Document 1 Start
name: { ⌲ Complex field
first: "John" ⌲ Simple field
last: "Dee"
}
birth: "2/2/1982" ⌲ Document 1 field (only)
} ⌲ Document 1 End
{ ⌲ Document 2 Start
fullName : ⌲ Simple field
"James Doe"
} ⌲ Document 2 End

NoSQL Databases (Document II.)
Effective document (text) store
Free-text search engine
Documents are JSON based
Various query format
Varied transaction support (single doc.)
Couchbase, Elasticsearch, MongoDB, etc.

NoSQL Databases (Wide Column I.)
Rows (keys) with many (~1000) columns
Write optimized (call logs, bank transactions, etc.)
SQL like query language
Limited ACID support
Heavy weight systems
HBase, Cassandra, etc.

NoSQL Databases (Wide Column II.)

NoSQL Databases (Graph I.)
Based on directed graph
Nodes, properties and relations
Replacement for complex relational models
High level query language
ACID transactions
Neo4j (Cypher), GraphDB (SparQL), etc.

NoSQL Databases (Graph II.)

NoSQL Databases (Time I.)
Data points (measurement) over time interval
Regular intervals (metrics)
Irregular intervals (events)
Data is more useful as aggregate (continuous queries)
SQL like query language with time related additions
No transaction concept
PK is time in high precision
Data modification is rare (append only)
InfluxDB, Kdb+, Prometheus, etc.

NoSQL Databases (Time II.)
Example measurement:
weather,location=us-midwest temperature=82 144488740
| ─────────┬────────── ──┬─────────── |
measurement tag field timestamp
measurement ≈ table
tag ≈ indexed field
field ≈ not indexed field

NoSQL Databases (Computing Grid)
Calculations performed in a computing grid
Move program logic to data, not the other way around
Ignite, Infinispan, etc.

NoSQL Drawbacks
Operational/developer experience needed
Complex Infrastructure
Planned usage drives database design
Data de-normalization might be needed (!)
ACID/BASE compliance varies
Complex queries can be hard
Large distributed systems are always in the state of partial failure

>> Distributed systems are hard <<

NoSQL and RDBMS
Term NoSQLRDBMS
ConsistencyWeak Strong
PerformanceVariesGood
Language CustomSQL
DevOps ComplexSimpler
Node Count>3 1
Scalability Good Poor

>> Use whatever is the best for the problem <<

Popular Database Combinations (2019)

A Distributed System Example

Future Directions
Cloud (hosted database)
Hybrid (multiple NoSQL modes)
NewSQL (SQL ↔ NoSQL convergence)
NIC/RDMA (cross memory access)
RAM Store (very fast)
FPGA (custom hardware)

Thank You!
Questions?

github.com/kk-sw
slideshare.net/kksw1/presentations
linkedin.com/in/károly-k-java/
kksw.nhely.hu