Database management system chapter thirt
cscmalligawad
32 slides
Sep 27, 2024
Slide 1
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Slide 13- 1
Slide 2
Chapter 13
Disk Storage, Basic File Structures,
and Hashing
Slide 3
Chapter Outline
• Disk Storage Devices
• Files of Records
• Operations on Files
• Unordered Files
• Ordered Files
• Hashed Files
• Dynamic and Extendible Hashing Techniques
• RAID Technology
Slide 4
Disk Storage Devices
• Preferred secondary storage device for high storage capacity and low cost.
• Data stored as magnetized areas on magnetic disk surfaces.
• A disk pack contains several magnetic disks connected to a rotating spindle.
• Disks are divided into concentric circular tracks on each disk surface.
  ◦ Track capacities typically vary from 4 to 50 Kbytes or more.
Slide 5
Disk Storage Devices (contd.)
• A track is divided into smaller blocks or sectors because it usually contains a large amount of information.
  ◦ The division of a track into sectors is hard-coded on the disk surface and cannot be changed.
  ◦ In one type of sector organization, a sector is the portion of a track that subtends a fixed angle at the center.
• A track is divided into blocks.
  ◦ The block size B is fixed for each system.
  ◦ Typical block sizes range from B = 512 bytes to B = 4096 bytes.
• Whole blocks are transferred between disk and main memory for processing.
Slide 6
Disk Storage Devices (contd.)
Slide 7
Disk Storage Devices (contd.)
• A read-write head moves to the track that contains the block to be transferred.
  ◦ Disk rotation moves the block under the read-write head for reading or writing.
• A physical disk block (hardware) address consists of:
  ◦ a cylinder number (imaginary collection of tracks of the same radius from all recorded surfaces)
  ◦ the track number or surface number (within the cylinder)
  ◦ the block number (within the track).
• Reading or writing a disk block is time consuming because of the seek time s and the rotational delay (latency) rd.
• Double buffering can be used to speed up the transfer of contiguous disk blocks.
Slide 8
Disk Storage Devices (contd.)
Slide 9
Typical Disk Parameters
(Courtesy of Seagate Technology)
Slide 10
Records
• Fixed and variable length records
• Records contain fields which have values of a particular type
  ◦ E.g., amount, date, time, age
• Fields themselves may be fixed length or variable length
• Variable length fields can be mixed into one record:
  ◦ Separator characters or length fields are needed so that the record can be "parsed."
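The length-field approach mentioned above can be sketched as follows. This is only an illustration: the 2-byte big-endian length prefix, the UTF-8 encoding, and the function names `encode_record` and `decode_record` are assumptions made for the example, not something the slides prescribe.

```python
import struct

def encode_record(fields):
    """Pack string fields as [2-byte length][value bytes] pairs (assumed layout)."""
    out = bytearray()
    for value in fields:
        data = value.encode("utf-8")
        out += struct.pack(">H", len(data))  # length field precedes each value
        out += data
    return bytes(out)

def decode_record(buf):
    """Parse the record by reading each length field to locate the next value."""
    fields, pos = [], 0
    while pos < len(buf):
        (length,) = struct.unpack_from(">H", buf, pos)
        pos += 2
        fields.append(buf[pos:pos + length].decode("utf-8"))
        pos += length
    return fields

record = encode_record(["Smith", "Computer Science", ""])
fields = decode_record(record)
```

A separator-character scheme would work similarly, but it needs a byte value that can never occur inside a field.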
Slide 11
Blocking
• Blocking:
  ◦ Refers to storing a number of records in one block on the disk.
• Blocking factor (bfr) refers to the number of records per block.
• There may be empty space in a block if an integral number of records does not fit in one block.
• Spanned Records:
  ◦ Refers to records that do not fit in the space left in a block (or that exceed the block size) and hence span a number of blocks.
Slide 12
Files of Records
• A file is a sequence of records, where each record is a collection of data values (or data items).
• A file descriptor (or file header) includes information that describes the file, such as the field names and their data types, and the addresses of the file blocks on disk.
• Records are stored on disk blocks.
• The blocking factor bfr for a file is the (average) number of file records stored in a disk block.
• A file can have fixed-length records or variable-length records.
Slide 13
Blocking Factor Calculation
• Name: 16 char = 16 B
• Age: int = 4 B
• Major: 4 char = 4 B
• GPA: float = 4 B
• Record size: 28 B
• Block size: 4096 B
• bfr = floor(4096 / 28) = floor(146.28) = 146
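The calculation above can be reproduced in a few lines. The helper names `blocking_factor` and `unused_bytes` are illustrative assumptions; the field sizes and the block size are the ones given on the slide, and unspanned blocking is assumed.

```python
def blocking_factor(block_size, record_size):
    """Number of whole fixed-length records that fit in one block (unspanned)."""
    return block_size // record_size

def unused_bytes(block_size, record_size):
    """Space left over in each block when records may not span blocks."""
    return block_size - blocking_factor(block_size, record_size) * record_size

# Field sizes from the slide: Name 16 B, Age 4 B, Major 4 B, GPA 4 B
record_size = 16 + 4 + 4 + 4
bfr = blocking_factor(4096, record_size)
wasted = unused_bytes(4096, record_size)
```

With these numbers each block holds 146 records and leaves 8 bytes unused.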
Slide 14
Files of Records (contd.)
• File records can be unspanned or spanned
  ◦ Unspanned: no record can span two blocks
  ◦ Spanned: a record can be stored in more than one block
• The physical disk blocks that are allocated to hold the records of a file can be contiguous, linked, or indexed.
• In a file of fixed-length records, all records have the same format. Usually, unspanned blocking is used with such files.
• Files of variable-length records require additional information to be stored in each record, such as separator characters and field types.
  ◦ Usually spanned blocking is used with such files.
Slide 15
Operations on Files
• Typical file operations include:
  ◦ OPEN: Readies the file for access, and associates a pointer that will refer to a current file record at each point in time.
  ◦ FIND: Searches for the first file record that satisfies a certain condition, and makes it the current file record.
  ◦ FINDNEXT: Searches for the next file record (from the current record) that satisfies a certain condition, and makes it the current file record.
  ◦ READ: Reads the current file record into a program variable.
  ◦ INSERT: Inserts a new record into the file and makes it the current file record.
  ◦ DELETE: Removes the current file record from the file, usually by marking the record to indicate that it is no longer valid.
  ◦ MODIFY: Changes the values of some fields of the current file record.
  ◦ CLOSE: Terminates access to the file.
  ◦ REORGANIZE: Reorganizes the file records. For example, the records marked deleted are physically removed from the file, or a new organization of the file records is created.
  ◦ READ_ORDERED: Reads the file blocks in order of a specific field of the file.
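A minimal in-memory sketch of the record-at-a-time operations listed above may help make the "current record" idea concrete. The class `RecordFile` and its method names are hypothetical; records are modeled as Python dictionaries, and deletion uses a marker array as the slide describes.

```python
class RecordFile:
    """Toy model of a file with a current-record pointer (hypothetical API)."""

    def __init__(self, records):
        self.records = [dict(r) for r in records]
        self.deleted = [False] * len(self.records)  # deletion markers
        self.current = None                         # index of the current record

    def _scan(self, start, condition):
        for i in range(start, len(self.records)):
            if not self.deleted[i] and condition(self.records[i]):
                self.current = i
                return True
        return False

    def find(self, condition):
        """FIND: first record satisfying the condition becomes current."""
        return self._scan(0, condition)

    def find_next(self, condition):
        """FINDNEXT: continue the search after the current record."""
        start = 0 if self.current is None else self.current + 1
        return self._scan(start, condition)

    def read(self):
        """READ: return the current record."""
        return self.records[self.current]

    def delete(self):
        """DELETE: mark the current record as no longer valid."""
        self.deleted[self.current] = True

    def reorganize(self):
        """REORGANIZE: physically drop records that were marked deleted."""
        self.records = [r for r, d in zip(self.records, self.deleted) if not d]
        self.deleted = [False] * len(self.records)
        self.current = None


f = RecordFile([{"name": "A", "age": 20}, {"name": "B", "age": 30}, {"name": "C", "age": 30}])
is_30 = lambda r: r["age"] == 30
found_first = f.find(is_30)
first_name = f.read()["name"]
f.delete()
found_second = f.find_next(is_30)
second_name = f.read()["name"]
found_third = f.find_next(is_30)
f.reorganize()
```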
Slide 16
Unordered Files
• Also called a heap or a pile file.
• New records are inserted at the end of the file.
• A linear search through the file records is necessary to search for a record.
  ◦ This requires reading and searching half the file blocks on the average, and is hence quite expensive.
• Record insertion is quite efficient.
• Reading the records in order of a particular field requires sorting the file records.
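The cost of a linear search can be illustrated by counting block reads. In this sketch the file is modeled as a list of blocks held in memory; `build_heap_file`, `linear_search`, and the `key_of` parameter are names assumed for the example.

```python
def build_heap_file(records, bfr):
    """Group records into blocks of at most bfr records, in arrival order."""
    return [records[i:i + bfr] for i in range(0, len(records), bfr)]

def linear_search(blocks, key, key_of):
    """Scan blocks in order; return (record or None, number of blocks read)."""
    blocks_read = 0
    for block in blocks:
        blocks_read += 1  # each iteration models one block transfer
        for rec in block:
            if key_of(rec) == key:
                return rec, blocks_read
    return None, blocks_read

records = [(7, "g"), (3, "c"), (9, "i"), (1, "a"), (5, "e"), (8, "h")]
blocks = build_heap_file(records, bfr=2)
hit, hit_cost = linear_search(blocks, 1, key_of=lambda r: r[0])
miss, miss_cost = linear_search(blocks, 42, key_of=lambda r: r[0])
```

An unsuccessful search reads every block, which is why the average of about half the blocks applies only to records that exist.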
Slide 17
Ordered Files
• Also called a sequential file.
• File records are kept sorted by the values of an ordering field.
• Insertion is expensive: records must be inserted in the correct order.
  ◦ It is common to keep a separate unordered overflow (or transaction) file for new records to improve insertion efficiency; this is periodically merged with the main ordered file.
• A binary search can be used to search for a record on its ordering field value.
  ◦ This requires reading and searching about log2(b) blocks on the average for a file of b blocks, an improvement over linear search.
• Reading the records in order of the ordering field is quite efficient.
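A binary search over blocks, rather than over individual records, can be sketched as below. The function name `binary_search_blocks` and the in-memory list-of-blocks model are assumptions for illustration, and every block is assumed to be non-empty.

```python
def binary_search_blocks(blocks, key, key_of):
    """Binary search on an ordered file; returns (record or None, blocks read)."""
    lo, hi = 0, len(blocks) - 1
    blocks_read = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        block = blocks[mid]           # models reading one block from disk
        blocks_read += 1
        if key < key_of(block[0]):    # key precedes the first record in the block
            hi = mid - 1
        elif key > key_of(block[-1]):  # key follows the last record in the block
            lo = mid + 1
        else:                          # key, if present, must be in this block
            for rec in block:
                if key_of(rec) == key:
                    return rec, blocks_read
            return None, blocks_read
    return None, blocks_read

ordered_blocks = [[1, 2], [3, 4], [5, 6], [7, 8]]
identity = lambda r: r
found, found_cost = binary_search_blocks(ordered_blocks, 7, identity)
missing, missing_cost = binary_search_blocks(ordered_blocks, 10, identity)
```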
Slide 18
Handling Overflow in Sequential Files
• 1) Rewrite the file from the insertion point on down (on average ½ of the file) for each insertion
• 2) Have an overflow area (heap/pile file)
  ◦ Do binary search on the sequential file
  ◦ If not found, do linear search in the overflow file
  ◦ Efficient because the sequential file is >> the overflow file
  ◦ E.g., 10,000,000 records in the sequential file, 1,000 in overflow
• Periodically reorganize: sort the overflow and merge to create a larger sequential file
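The second strategy (binary search in the main file, then a linear scan of the overflow area, with periodic merging) might look like this in miniature. The class `OrderedFileWithOverflow` is a hypothetical in-memory stand-in that stores only keys.

```python
import bisect
import heapq

class OrderedFileWithOverflow:
    """Sorted main area plus an unsorted overflow area (illustrative sketch)."""

    def __init__(self, keys):
        self.main = sorted(keys)
        self.overflow = []

    def insert(self, key):
        # Cheap insertion: append to the overflow area instead of shifting records.
        self.overflow.append(key)

    def contains(self, key):
        # Binary search on the sorted main area first.
        i = bisect.bisect_left(self.main, key)
        if i < len(self.main) and self.main[i] == key:
            return True
        # Fall back to a linear scan of the (small) overflow area.
        return key in self.overflow

    def reorganize(self):
        # Sort the overflow and merge it into a larger sorted main area.
        self.main = list(heapq.merge(self.main, sorted(self.overflow)))
        self.overflow = []


f = OrderedFileWithOverflow([10, 20, 30])
f.insert(25)
f.insert(5)
in_overflow = f.contains(25)
absent = f.contains(7)
f.reorganize()
```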
Slide 19
Overflow
• Can append overflow records at the end
  ◦ Bookkeeping in a config file or header to keep track of where the sorted area ends and the unsorted overflow starts
• Preallocate blank areas between records
  ◦ Record
  ◦ NewRecord
  ◦ AnotherNewRecord
  ◦ Record2
• If no blanks are available, rewrite the file (rec, bl, rec, bl, …)
Slide 20
Ordered Files (contd.)
Slide 21
Average Access Times
n The following table shows the average access
time to access a specific record for a given type
of file
Slide 22
Example
• Block: 4096 B; Rec_Size: 28 B
• bfr = floor(4096 / 28) = 146 records/block
• If 100,000 records
  ◦ Numblocks = ceiling(100,000 / 146) = 685 blocks
  ◦ Linear search = ceiling(685 / 2) = 343 block reads
  ◦ Binary search = ceiling(log2 685) = 10 block reads
• If 10,000,000 records
  ◦ Numblocks = ceiling(10,000,000 / 146) = 68,494 blocks
  ◦ Linear search = ceiling(68,494 / 2) = 34,247 block reads
  ◦ Binary search = ceiling(log2 68,494) = 17 block reads
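The figures in this example can be checked with a short script. The function name `access_costs` is an assumption; the formulas are the ones used on the slide (average linear search reads half the blocks, binary search reads about log2 of the block count).

```python
import math

def access_costs(num_records, block_size=4096, record_size=28):
    """Return (bfr, number of blocks, linear-search reads, binary-search reads)."""
    bfr = block_size // record_size
    num_blocks = math.ceil(num_records / bfr)
    linear = math.ceil(num_blocks / 2)          # average case for a successful search
    binary = math.ceil(math.log2(num_blocks))   # block reads for binary search
    return bfr, num_blocks, linear, binary

small = access_costs(100_000)
large = access_costs(10_000_000)
```

Growing the file by a factor of 100 multiplies the linear-search cost by about 100 but adds only 7 block reads to the binary search.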
Slide 23
Hashed Files
• Hashing for disk files is called External Hashing
• The file blocks are divided into M equal-sized buckets, numbered bucket_0, bucket_1, ..., bucket_{M-1}.
  ◦ Typically, a bucket corresponds to one (or a fixed number of) disk blocks.
• One of the file fields is designated to be the hash key of the file.
• The record with hash key value K is stored in bucket i, where i = h(K), and h is the hashing function.
• Search is very efficient on the hash key.
• Collisions occur when a new record hashes to a bucket that is already full.
  ◦ An overflow file is kept for storing such records.
  ◦ Overflow records that hash to each bucket can be linked together.
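A small sketch of static external hashing with per-bucket overflow chains follows. The class `StaticHashFile`, the choice h(K) = K mod M, and the integer keys are assumptions made for the example; the slides do not fix a particular hash function.

```python
class StaticHashFile:
    """M fixed-capacity buckets plus an overflow chain per bucket (sketch)."""

    def __init__(self, num_buckets, bucket_capacity):
        self.M = num_buckets
        self.capacity = bucket_capacity
        self.buckets = [[] for _ in range(num_buckets)]
        self.overflow = [[] for _ in range(num_buckets)]  # linked overflow records

    def h(self, key):
        return key % self.M  # assumed hash function

    def insert(self, key, value):
        i = self.h(key)
        if len(self.buckets[i]) < self.capacity:
            self.buckets[i].append((key, value))
        else:
            # Collision with a full bucket: store the record in the overflow chain.
            self.overflow[i].append((key, value))

    def search(self, key):
        i = self.h(key)
        for k, v in self.buckets[i] + self.overflow[i]:
            if k == key:
                return v
        return None


hf = StaticHashFile(num_buckets=4, bucket_capacity=2)
for k in (1, 5, 9, 2):
    hf.insert(k, f"rec{k}")
```

Keys 1, 5, and 9 all hash to bucket 1 here, so the third one lands in that bucket's overflow chain.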
Slide 24
Hashed Files (contd.)
• There are numerous methods for collision resolution, including the following:
  ◦ Open addressing: Proceeding from the occupied position specified by the hash address, the program checks the subsequent positions in order until an unused (empty) position is found.
  ◦ Chaining: For this method, various overflow locations are kept, usually by extending the array with a number of overflow positions. In addition, a pointer field is added to each record location. A collision is resolved by placing the new record in an unused overflow location and setting the pointer of the occupied hash address location to the address of that overflow location.
  ◦ Multiple hashing: The program applies a second hash function if the first results in a collision. If another collision results, the program uses open addressing or applies a third hash function and then uses open addressing if necessary.
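Open addressing, the first of these methods, can be illustrated with linear probing on a fixed-size table. The function names and the hash function key mod table size are assumptions for the sketch; deletion is left out because it needs extra care with probing.

```python
def insert_open_addressing(table, key):
    """Place key at its hash address or the next free position; return that position."""
    m = len(table)
    start = key % m  # assumed hash function
    for step in range(m):
        pos = (start + step) % m  # check subsequent positions in order, wrapping around
        if table[pos] is None:
            table[pos] = key
            return pos
    raise RuntimeError("hash table is full")

def find_open_addressing(table, key):
    """Follow the same probe sequence; stop at the key or at an empty position."""
    m = len(table)
    start = key % m
    for step in range(m):
        pos = (start + step) % m
        if table[pos] is None:
            return None
        if table[pos] == key:
            return pos
    return None

table = [None] * 5
positions = [insert_open_addressing(table, k) for k in (7, 12, 17, 22)]
```

All four keys hash to address 2, so each insertion probes one position further than the last.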
Slide 25
Hashed Files (contd.)
Slide 26
Hashed Files (contd.)
• To reduce overflow records, a hash file is typically kept 70-80% full.
• The hash function h should distribute the records uniformly among the buckets
  ◦ Otherwise, search time will be increased because many overflow records will exist.
• Main disadvantages of static external hashing:
  ◦ Fixed number of buckets M is a problem if the number of records in the file grows or shrinks.
  ◦ Ordered access on the hash key is quite inefficient (requires sorting the records).
Slide 27
Hashed Files - Overflow handling
Slide 28
Dynamic And Extendible Hashed Files
• Dynamic and Extendible Hashing Techniques
  ◦ Hashing techniques are adapted to allow the dynamic growth and shrinking of the number of file records.
  ◦ Both build a directory on top of the hash table buckets
• Both dynamic and extendible hashing use the binary representation of the hash value h(K) in order to access a directory.
Slide 29
Dynamic Hashing
• Build a binary search tree on top of the hash table
  ◦ Each node in the search tree points to a fixed-size hash file
• As insertions cause the number of buckets to increase, grow the directory by adding nodes
  ◦ E.g., instead of a search tree based on the first 2 bits of the hash key (4 nodes), expand to a search tree based on the first 3 bits of the hash key (8 nodes)
Slide 30
Extendible Hashing
• In extendible hashing the search directory is an array of size 2^d, where d is called the global depth.
  ◦ E.g., if you index into the array with 2 bits, there are 2^2 = 4 elements in the array
  ◦ Each element points to a hash table
• Expand by using the first 3 bits of the hash key: 2^3 = 8 elements in the array
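The directory doubling described above can be sketched as follows. This is a simplified illustration under several assumptions: hash values are 8 bits wide (`HASH_BITS`), h(K) = K mod 256, the directory is indexed by the leading d bits, each bucket tracks a local depth, and the names `Bucket` and `ExtendibleHashFile` are hypothetical.

```python
HASH_BITS = 8  # assumed width of the hash value h(K)

class Bucket:
    def __init__(self, local_depth, capacity):
        self.local_depth = local_depth  # number of leading bits this bucket depends on
        self.capacity = capacity
        self.keys = []

class ExtendibleHashFile:
    def __init__(self, bucket_capacity=2):
        self.global_depth = 0
        self.capacity = bucket_capacity
        self.directory = [Bucket(0, bucket_capacity)]  # 2**0 = 1 entry

    def _hash(self, key):
        return key % (1 << HASH_BITS)

    def _index(self, key):
        # Use the first global_depth bits of the hash value as the directory index.
        return self._hash(key) >> (HASH_BITS - self.global_depth)

    def search(self, key):
        return key in self.directory[self._index(key)].keys

    def insert(self, key):
        bucket = self.directory[self._index(key)]
        if len(bucket.keys) < bucket.capacity:
            bucket.keys.append(key)
            return
        if bucket.local_depth == HASH_BITS:
            raise RuntimeError("bucket cannot be split further")
        if bucket.local_depth == self.global_depth:
            # Double the directory: each old entry becomes two adjacent entries.
            self.directory = [b for b in self.directory for _ in (0, 1)]
            self.global_depth += 1
        self._split(bucket)
        self.insert(key)  # retry; may trigger another split

    def _split(self, bucket):
        depth = bucket.local_depth + 1
        low = Bucket(depth, self.capacity)
        high = Bucket(depth, self.capacity)
        for i, b in enumerate(self.directory):
            if b is bucket:
                # The bit at position `depth` from the left selects the half.
                bit = (i >> (self.global_depth - depth)) & 1
                self.directory[i] = high if bit else low
        for k in bucket.keys:
            self.directory[self._index(k)].keys.append(k)


ehf = ExtendibleHashFile(bucket_capacity=2)
for k in (0, 128, 192, 64, 224):
    ehf.insert(k)
```

After these insertions the directory has grown to 4 entries, and the first two entries still share one bucket because that bucket has not needed to split.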
Slide 31
Dynamic And Extendible Hashing (contd.)
• The directories can be stored on disk, and they expand or shrink dynamically.
  ◦ Directory entries point to the disk blocks that contain the stored records.
• An insertion in a disk block that is full causes the block to split into two blocks, and the records are redistributed between the two blocks.
  ◦ The directory is updated appropriately.
• Dynamic and extendible hashing do not require an overflow area.
• Linear hashing does require an overflow area but does not use a directory.
  ◦ Blocks are split in linear order as the file expands.
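Linear hashing is only mentioned in passing above, so the following is a sketch of one common formulation rather than the slides' own definition. It assumes integer keys, hash functions of the form K mod (N · 2^level), and a split triggered whenever an insertion overflows a bucket; a bucket list longer than `capacity` stands in for an overflow chain. The class name `LinearHashFile` is hypothetical.

```python
class LinearHashFile:
    """Buckets are split in linear order; no directory is kept (sketch)."""

    def __init__(self, initial_buckets=2, capacity=2):
        self.n0 = initial_buckets
        self.capacity = capacity
        self.level = 0        # number of completed doubling rounds
        self.next_split = 0   # index of the next bucket to split
        self.buckets = [[] for _ in range(initial_buckets)]

    def _address(self, key):
        size = self.n0 << self.level
        i = key % size
        if i < self.next_split:
            # This bucket was already split in the current round: use the finer hash.
            i = key % (size * 2)
        return i

    def insert(self, key):
        bucket = self.buckets[self._address(key)]
        bucket.append(key)  # entries beyond `capacity` model overflow records
        if len(bucket) > self.capacity:
            self._split_next()

    def _split_next(self):
        size = self.n0 << self.level
        old = self.buckets[self.next_split]
        self.buckets[self.next_split] = []
        self.buckets.append([])  # new bucket at index next_split + size
        for k in old:
            self.buckets[k % (size * 2)].append(k)
        self.next_split += 1
        if self.next_split == size:  # every bucket of this round has been split
            self.level += 1
            self.next_split = 0

    def search(self, key):
        return key in self.buckets[self._address(key)]


lhf = LinearHashFile(initial_buckets=2, capacity=2)
for k in (0, 2, 4, 6, 1, 8):
    lhf.insert(k)
```

Note that the bucket being split is the one at `next_split`, not necessarily the one that overflowed, which is why bucket 0 still holds an overflow record at the end of this run.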
Slide 32
Extendible Hashing