Data Storage by Immanuel Trummer part of DBMS

samimuhammadumar 15 views 28 slides Oct 19, 2024
Slide 1
Slide 1 of 28
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9
Slide 10
10
Slide 11
11
Slide 12
12
Slide 13
13
Slide 14
14
Slide 15
15
Slide 16
16
Slide 17
17
Slide 18
18
Slide 19
19
Slide 20
20
Slide 21
21
Slide 22
22
Slide 23
23
Slide 24
24
Slide 25
25
Slide 26
26
Slide 27
27
Slide 28
28

About This Presentation

Data Storage by Immanuel Trummer part of DBMS


Slide Content

Data Storage
Immanuel Trummer

[email protected]
www.itrummer.org

Slides by Immanuel Trummer, Cornell University
Database Management
Systems (DBMS)
DBMS Interface
Application 1
Application 2
...
Data
Connections, Security, Utilities, ...
Query Processor
Query Parser Query Rewriter
Query Optimizer Query Executor
Storage Manager
Data Access Buffer Manager
Transaction Manager Recovery Manager
[RG, Sec. 7, 8]

Slides by Immanuel Trummer, Cornell University
Outline
•Data storage hardware
•What devices to store data on?
•Data storage format
•How to represent relations?

Slides by Immanuel Trummer, Cornell University
Outline
•Data storage hardware
•What devices to store data on?
•Data storage format
•How to represent relations?

Slides by Immanuel Trummer, Cornell University
The Memory Hierarchy
Registers
CPU Cache
Main Memory
Flash/USB Memory
Hard Disk
Tape Backup
Faster Access
Higher Capacity

Slides by Immanuel Trummer, Cornell University
The Memory Hierarchy
Registers
CPU Cache
Main Memory
Flash/USB Memory
Hard Disk
Tape Backup
Faster Access
Higher Capacity
Volatile
Non-Volatile

Slides by Immanuel Trummer, Cornell University
Tape Storage
•Bits as magnetic information on tape
•Very slow access (10s of seconds)
•Moderate read speed (up to 300 MB/second)
•Very cheap (around $0.02 per Gigabyte [source])
•Used for long-term archival (e.g., by Google)
•More info: Why the future of data storage is (still)
magnetic tape, IEEE Spectrum, 2018.
Photo: Victor Prado

Slides by Immanuel Trummer, Cornell University
Hard Disk
•Bits as magnetic information on platter
•Patters spin under read/write heads
•Slow access (10s of milliseconds access time)
•Moderate read speed (around 200 MB/second)
•Cheap (around $0.035 per Gigabyte)
•Used for less frequently accessed data
Photo: Wikimedia Commons

Slides by Immanuel Trummer, Cornell University
Solid State Drives
•Bits as small electric charges
•Elevated price (around $0.25 per Gigabyte)
•Fast access (around 1 millisecond)
•Elevated speed (around 500 MB/second)
•Limited number of write cycles (memory wear)
Photo: Wikimedia Commons

Slides by Immanuel Trummer, Cornell University
Main Memory
•Bits as small electric charges
•Expensive (several dollars per Gigabyte)
•Very fast access (order of nanoseconds)
•High bandwidth (Gigabytes per second)
•Used to access hot data - all if economically feasible!
Photo: Wikimedia Commons

Slides by Immanuel Trummer, Cornell University
Caches
•Bits as small electric charges
•Typically organized as cache hierarchy
•Very expensive (hundreds of dollars per Gigabyte)
•Near-instantaneous access (few nanoseconds)
•Very high bandwidth (tens of Gigabytes per second)
•Used to store immediately relevant data
Photo: Wikimedia Commons

Slides by Immanuel Trummer, Cornell University
Relevance for DBMS
•Capacity limits force data to lower parts of hierarchy
•Data access speed may become bottleneck
•Design algorithms to minimize data movements
•Random data access is expensive
•Read data in larger chunks ("pages")
•Keep related data close together
•Take into account volatility for recovery considerations

Slides by Immanuel Trummer, Cornell University
Outline
•Data storage hardware
•What devices to store data on?
•Data storage format
•How to represent relations?

Slides by Immanuel Trummer, Cornell University
Logical Perspective vs.
Physical Storage
DBMS
User
110101...

Slides by Immanuel Trummer, Cornell University
Tables as Files
•Table schema information is stored in database catalog
•Table content is stored as collection of pages ("file")
•Each page typically stores a few KB of data
•Enough to store multiple rows but not entire table

Slides by Immanuel Trummer, Cornell University
Illustration: Table Storage
Database Catalog
Page 1
Page 2
Page 3

Slides by Immanuel Trummer, Cornell University
From Files to Pages
•Possibility 1: store pages as (doubly) linked list
•Each page contains pointers to next/prior page
•Can use separate lists for full/partially empty pages
•Reference to header page stored in DB catalog
•Possibility 2: directory with pointers to pages
•Directory pages reference data pages with meta-data

Slides by Immanuel Trummer, Cornell University
Files as Linked Lists
Header
Page
Data Page Data Page Data Page
Data Page Data Page Data Page
Full

Pages
Pages with 

Free Space

Slides by Immanuel Trummer, Cornell University
Files via Directories
Header
Page
Directory
Page
Directory
Page
Data PageData Page
Data PageData Page
Data Page
Data Page
Data PageData PageData Page

Slides by Immanuel Trummer, Cornell University
From Pages to Slots
•Pages are divided into slots
•Each slot stores one record (i.e., table row)
•Can refer to records via (pageID, slotID)
•Multiple ways to divide pages into slots
•Fixed-length vs. variable-length records

Slides by Immanuel Trummer, Cornell University
Fixed-Length Records
•Number of bytes per slot is determined a-priori
•Need to keep track of which slots are used (insertions ...)
•Packed representation uses consecutive slots
•Only keep track of number of slots used
•Unpacked representation allows unused slots in-between
•Need bitmap to keep track of used slots

Slides by Immanuel Trummer, Cornell University
Packed Representation
USED
USED
USED
FREE
FREE
FREE
FREE
FREE
FREE
3

Slides by Immanuel Trummer, Cornell University
What's the Problem
with Deletions ... ?

Slides by Immanuel Trummer, Cornell University
Unpacked Representation
USED
FREE
USED
FREE
USED
FREE
FREE
USED
FREE
101010010

Slides by Immanuel Trummer, Cornell University
Variable-Length Records
•E.g., records with variable-length text fields
•Number of bytes per slot is not fixed a-priori
•Each page maintains directory about used slots
•Store first byte and length of slots
•Flexibility to move around records on page
•Can use that for regular compaction

Slides by Immanuel Trummer, Cornell University
From Slots to Fields
•Must divide each slots into fields
•Fixed length vs. variable length fields
•Fixed length: store field sizes in DB catalog
•Variable length: store field sizes on page
•Option 1: use special delimiter symbol between fields
•Option 2: store "field directory" at beginning of record

Slides by Immanuel Trummer, Cornell University
Summary: Files
•Decomposing 

tables into pages, 

pages into slots, 

slots into fields
•Variable length versus fixed length content
•Variable length content can be handled via directories

Slides by Immanuel Trummer, Cornell University
Row Stores vs. 

Column Stores
•So far: have seen how to store data "row-wise"
•I.e., data for same row is close together
•This is done by traditional DBMS like Postgres
•Can also store data "column-wise"
•I.e., data for same column is close together
•Can help if queries access only few columns
•Will see corresponding systems later ...