HAshing ppt to describe tha pplications in DS

Department of CSE & IT
C.V. Raman College of Engineering
Bhubaneswar
1
Data Structure using 'C'
Hashing
Department of CSE and IT

Department of CSE and IT

2

Introduction and Definition

Hash Tables as a Data Structure

Choosing a hash Function

Truncation Method

Mid Square Method

Folding Method

Modular Method

Collision Resolution(Open Hashing)

Separate Chaining

Closed Hashing(Open Addressing)

Linear Probing

Quadratic Probing

Double Hashing

Storage Management

Garbage Collection

Dynamic memory management

Buddy System

Binary Buddy System

Fibonacci Buddy System

Application

Department of CSE and IT3
Introduction to Hashing
 Hashing is the technique used for performing almost constant time search in case of
insertion, deletion and find operation.
 The essence of hashing is to facilitate the next level searching method when
compared with the linear or binary search.
 It is the process of mapping large amount of data item to a smaller table with the help
of a hashing function.
 The advantage of this searching method is its efficiency to hand vast amount of data
items in a given collection (i.e. collection size).
 Due to this hashing process, the result is a Hash data structure that can store or
retrieve data items in an average time disregard to the collection size.

Continued…
Let ‘n’ distinct record Table is one of the important data structure used
for information retrieval.
Records with keys K1,K2-----Kn are stored in a file and we want to
retrieve a record with a given key value K.
One way is to start from the first record by comparing K with its key
value, if found then stop ,otherwise proceed to the next record. The
searching time is directly proportional to the number of records in the
file.
Another way is to use an access table, where the searching time is
significantly reduced.
Department of CSE and IT4

Continued…
Let us assume a function f and apply this function to K, then it returns ‘i’ that
is f(k)=i. The ith entry of the access table gives the location of record with key
value K.
One type of such access table is known as Hash table.
The hash table is an array which contains key values with pointers to the
corresponding records.
The basic idea here is to place a key inside the hash table, and the
location/index of that key will be calculated from the given key value itself.
The one to one correspondence between a key value and index in the hash
table is known as hashing.
Department of CSE and IT5

Department of CSE and IT6
Example: The Search Problem
Find items with keys matching a given search key
Given an array A, containing n keys, and a search key x, find the index i
such as x=A[i]
As in the case of sorting, a key could be part of a large record.

Department of CSE and IT7
Applications
Keeping track of customer account information at a bank
Search through records to check balances and perform transactions
Keep track of reservations on flights
Search to find empty seats, cancel/modify reservations
Search engine
Looks for all documents containing a given word

Department of CSE and IT8
Hash Table
 There are some type of tables which helps to retrieve information in
efficient manner.
The hash table is one of them.
This table is an array of constant size which depends on applications.
The hash table contains key values with pointers to the corresponding
records.
The basic idea is to place key value into location in the hash table and this
location will be calculated from the key value itself.
The one to one correspondence between the key value and index in the
hash table is known as hashing.

Department of CSE and IT9
The ideal hash table structure is merely an array of some fixed size, containing
the items.
A stored item needs to have a data member, called key, that will be used in
computing the index value for the item.
Key could be an integer, a string, etc
e.g. a name or Id that is a part of a large employee structure
The size of the array is TableSize.
The items that are stored in the hash table are indexed by values from 0 to
TableSize – 1.
Each key is mapped into some number in the range 0 to TableSize – 1.
The mapping is called a hash function.
Continued…
A hash table, or a hash map, is a data structure that associates keys
(names) with values (attributes).
• Look-Up Table
• Dictionary
• Cache
• Extended Array

Department of CSE and IT10
Continued…
Example: HASH TABLE
U
(universe of keys)
K
(actual
keys)
0
m - 1
h(k
3
)
h(k
2) = h(k
5)
h(k
1
)
h(k
4)
k
1
k
4k
2
k
5
k
3

Department of CSE and IT11
Choosing a Hash Function
 A hash function maps keys to small integers (buckets). An ideal hash function
maps the keys to the integers in a random-like manner, so that bucket values are
evenly distributed even if there are regularities in the input data.
 This process can be divided into two steps:
 Map the key to an integer.
 Map the integer to a bucket.

Department of CSE and IT12
Example
Hash
Function
mary 28200
dave 27500
phil 31250
john 25000
Items
Hash
Table
key
key
0
1
2
3
4
5
6
7
8
9
mary 28200
dave 27500
phil 31250
john 25000

Department of CSE and IT13
 The hash function:
 must be simple to compute.
 must distribute the keys evenly among the cells.
Different Approaches to generate hash function
1. Truncation Method
2. Mid Square Method
3. Folding Method
4. Modular Method

Department of CSE and IT14
1.Truncation Method
Ignore a part of the key and use the remaining part directly as the index

Ex:1 If a hash table contains 999 entries at the most or 999 different key indexes
may be kept, then a hash function may be defined such that from an eight digit
integer 12345678, first, second and fifth digits from the right may be used to
define a key index i.e. 478, which is the key position in the hash lookup table
where this element will be inserted.
Any other key combination may be used.
Ex:2 If students have an 9-digit identification number, take the last 3 digits as the
table position
e.g. 925371622

becomes

622

Department of CSE and IT15
2. Mid Square Method
In Mid-Square method, the key element is multiplied by itself to yield
the square of the given key.

If the hash table contains maximum 999 entries, then three digits from
the result of square may be chosen as a hash key for mapping the key
element in the lookup table.

 It generates random sequences of hash keys, which are generally key
dependent.

Mid Square method is a flexible method of selecting the hash key
value.
Example: If the input is the number 4567, squaring yields an 8-digit
number, 20857489. The middle two digits of this result are 57. All digits
of the original key value (equivalently, all bits when the number is
viewed in binary) contribute to the middle two digits of the squared
value. Thus, the result is not dominated by the distribution of the
bottom digit or the top digit of the original key value.

Department of CSE and IT16
3. Folding Method
 Partition the key into several parts and combine the parts in a convenient
way (often addition or multiplication) to obtain the index.
 Ex-1:
An eight digit integer can be divided into groups of three, three and two digits
(or any other combination) the groups added together and truncated if
necessary to be in the proper range of indices.

Hence 12345678 maps to 123+456+78 = 657, since any digit can affect the
index, it offers a wide distribution of key values.
 Ex-2: Split a 9-digit number into three 3-digit numbers, and add them
e.g. 925371622 becomes 925 + 376 + 622 = 1923
4. Modular Method
 For mapping a given key element in the hash table, mod operation of
individual key is calculated. The remainder denotes particular address position
of each element.

 The result so obtained is divided by an integer, usually taken to be the size
of the hash table to obtain the remainder as the hash key to place that element
in the lookup table.

Department of CSE and IT17
5. The Division Method
Idea:
Map a key k into one of the m slots by taking the remainder of k divided by
m
h(k) = k mod m
Advantage:
fast, requires only one operation
Disadvantage:
Certain values of m are bad, e.g.,
power of 2
non-prime numbers

Department of CSE and IT18
Example - The Division Method
If m = 2
p
, then h(k) is just the least significant p bits
of k
p = 1  m = 2
 h(k) = , least significant 1 bit of k
p = 2  m = 4
 h(k) = , least significant 2 bits of k
Choose m to be a prime, not close to a
power of 2
Column 2:
Column 3:
{0, 1}
{0, 1, 2, 3}
k mod 97
k mod 100
m
97
m
100

Department of CSE and IT19
Collision
Two or more keys hash to the same slot!!
For a given set K of keys
If |K| ≤ m, collisions may or may not happen, depending on the
hash function
If |K| > m, collisions will definitely happen (i.e., there must be at
least two keys that have the same hash value)
Avoiding collisions completely is hard, even with a good hash
function

Department of CSE and IT20
Collision Resolution(Open Hashing)
If, when an element is inserted, it hashes to the same value as
an already inserted element, then we have a collision and need
to resolve it.
There are several methods for dealing with this:
Separate chaining
Open addressing
Linear Probing
Quadratic Probing
Double Hashing

21
Separate Chaining
The idea is to keep a list of all elements that hash to the same
value.
The array elements are pointers to the first nodes of the lists.
A new item is inserted to the front of the list.
Advantages:
Better space utilization for large items.
Simple collision handling: searching linked list.
Overflow: we can store more items than the hash table size.
Deletion is quick and easy: deletion from the linked list.
Department of CSE and IT
21

22
Example
0
1
2
3
4
5
6
7
8
9
0
81 1
64 4
25
36 16
49 9
Keys: 0, 1, 4, 9, 16, 25, 36, 49, 64, 81
hash(key) = key % 10.
Department of CSE and IT
22

Department of CSE and IT23
Operations
Initialization: all entries are set to NULL
Find:
locate the cell using hash function.
sequential search on the linked list in that cell.
Insertion:
Locate the cell using hash function.
(If the item does not exist) insert it as the first item in the list.
Deletion:
 Locate the cell using hash function.
Delete the item from the linked list.
23

Department of CSE and IT24
24
Handling Collisions Using Separate Chaining
Idea:
Put all elements that hash to the same slot into a linked list
Slot j contains a pointer to the head of the list of all elements that
hash to j

Department of CSE and IT25
Closed Hashing(Open Addressing)
If we have enough contiguous memory to store all the keys (m > N) 
store the keys in the table itself
No need to use linked lists anymore
Basic idea:
Insertion: if a slot is full, try another one,
until you find an empty one
Search: follow the same sequence of probes
Deletion: more difficult ... (we’ll see why)
Search time depends on the length of the
probe sequence!
e.g., insert 14

Department of CSE and IT26
Generalize hash function notation:
A hash function contains two arguments now:
(i) Key value, and (ii) Probe number

h(k,p), p=0,1,...,m-1
Probe sequences
<h(k,0), h(k,1), ..., h(k,m-1)>
Must be a permutation of <0,1,...,m-1>
There are m! possible permutations
Good hash functions should be able to produce all m! probe
sequences
insert 14
<1, 5, 9>
Example

Department of CSE and IT27
Common Open Addressing Methods
Linear probing
Quadratic probing
Double hashing
Note: None of these methods can generate more than m
2
different
probing sequences!

Department of CSE and IT28
Linear probing: Inserting a key
Idea: when there is a collision, check the next available
position in the table (i.e., probing)
h(k,i) = (h
1
(k) + i) mod m
i=0,1,2,...
First slot probed: h
1
(k)
Second slot probed: h
1
(k) + 1
Third slot probed: h
1(k)+2, and so on
Can generate m probe sequences maximum, why?
probe sequence: < h1(k), h1(k)+1 , h1(k)+2 , ....>
wrap around

Department of CSE and IT29
Linear probing: Searching for a key
Three cases:
(1) Position in table is occupied with an element of
equal key
(2) Position in table is empty
(3) Position in table occupied with a different
element
Case 2: probe the next higher index until the
element is found or an empty position is found
The process wraps around to the beginning of the
table
0
m - 1
h(k
3
)
h(k
2
) = h(k
5
)
h(k
1
)
h(k
4
)

Department of CSE and IT30
Linear probing: Deleting a key
Problems
Cannot mark the slot as empty
Impossible to retrieve keys inserted after that slot
was occupied
Solution
Mark the slot with a sentinel value DELETED
The deleted slot can later be used for insertion
Searching will be able to find all the keys
0
m - 1

Department of CSE and IT31
Quadratic Probing
i=0,1,2,...

Department of CSE and IT32
Double Hashing
(1) Use one hash function to determine the first slot
(2) Use a second hash function to determine the increment
for the probe sequence
h(k,i) = (h
1
(k) + i h
2
(k) ) mod m, i=0,1,...
Initial probe: h
1
(k)
Second probe is offset by h
2
(k) mod m, so on ...
Advantage: avoids clustering
Disadvantage: harder to delete an element
Can generate m
2
probe sequences maximum

Department of CSE and IT33
Double Hashing: Example
h
1
(k) = k mod 13
h
2(k) = 1+ (k mod 11)
h(k, i) = (h
1(k) + i h
2(k) ) mod 13
Insert key 14:
h
1(14,0) = 14 mod 13 = 1
h(14,1) = (h
1(14) + h
2(14)) mod 13
= (1 + 4) mod 13 = 5
h(14,2) = (h
1(14) + 2 h
2(14)) mod 13
= (1 + 8) mod 13 = 9
79
69
98
72
50
0
9
4
2
3
1
5
6
7
8
10
11
12
14

Example Linear (Closed) Hashing
D=8, keys a,b,c,d have hash values h(a)=3, h(b)=0, h(c)=4,
h(d)=3
0
2
3
4
5
6
7
1
b
a
c
Where do we insert using linear hashing:
Probe sequence 4%8 = 4
ady fillh
1
(d) = (h(d)+1)%8 = t d? 3 alreed
h
2(d) = (h(d)+2)%8 = 5%8 = 5*
7, 0,1,2
Wraps around the beginning of the table!
d
Already Filled

Department of CSE and IT35
Analysis of Open Addressing
a
1a
(load factor)
k=0

36
 Compilers use hash tables to keep track of declared variables.
A hash table can be used for on-line spelling checkers — if misspelling
detection (rather than correction) is important, an entire dictionary can
be hashed and words checked in constant time.
Game playing programs use hash tables to store seen positions,
thereby saving computation time if the position is encountered
again.
 Hash functions can be used to quickly check for inequality — if
two elements hash to different values they must be different.
Storing sparse data
Applications of Hashing
Department of CSE and IT

Department of CSE and IT37
Storage Management
An executing program uses memory (storage) for many different purposes, such
as for the machine instructions that represent the executable part of the program,
the values of data objects, and the return location for a function invocation.
Static Allocation :
 allocation during translation that remains fixed throughout execution.
 does not allow recursive subprograms
Dynamic Allocation: (Heap Storage Management)

 Memory used for dynamic allocation of data objects in somewhat
unstructured manner is called heap storage.

Memory Management
Memory Areas and their use
Memory Manager Tasks:
acquire
release
Free List Implementations
Singly Linked List
Doubly Linked List
Buddy Systems
38 Department of CSE and IT

Continued…
Memory areas: In languages like C or Java, the memory used by
a program can be allocated from three different areas:
Static: laid out at compilation time, and allocated when the
program starts.
Used for Global variables and constants
Stack: memory is allocated and freed dynamically, in LIFO
order.
Used for Local variables and parameters
Heap: memory is allocated and freed dynamically, in any
order.
Used for data outliving the method which created them. In
Java all objects are stored in the heap
39 Department of CSE and IT

Memory Manager
The memory manager is part of the Operating System.
It must keep track of which parts of the heap are free, and which are
allocated.
A memory manager supports the following operations:
acquire: allocates memory needed by programs
release: deallocates memory no longer needed by programs
It also defragments memory when needed
40 Department of CSE and IT

Problems faced in memory allocation
Memory fragmentation:
External fragmentation: Memory wasted outside allocated
blocks
Internal fragmentation: Memory wasted inside allocated block.
Results when memory allocated is larger than memory requested.
Overhead: Additional memory that must be allocated, above and beyond
that requested by programs, in order to provide for the management of the
heap.
41 Department of CSE and IT

Free List
Memory manager uses a free list data structure that keeps track of free
memory blocks in a scheme for dynamic memory allocation.
Common implementations for free list:
Singly-linked list
Doubly-linked list
Buddy systems: an array of doubly-linked lists
Allocation Policies:
First fit chooses the first block in the free list big enough to satisfy the
request, and split it.
Next fit is like first fit, except that the search for a fitting block will start
where the last one stopped, instead of at the beginning of the free list.
Best fit chooses the smallest block bigger than the requested one.
Worst fit chooses the biggest, with the aim of avoiding the creation of too
many small fragments – but doesn’t work well in practice.
42 Department of CSE and IT

Singly-linked list implementation of free-list
Each node represents a free block of memory
Nodes must be sorted according to start addresses of free blocks so that
adjacent free memory blocks can be combined.
acquire( ) and release( ) operations are O(n); where n is the number of
blocks in the heap.
In order to acquire a block, a node is searched following one of the
allocation policy. If the block is bigger than requested, it is divided into two.
One part is allocated and one remains in the list.
In order to release a block,
a new node must be inserted (if the adjacent block is not on the free list)
or a node, which contains the adjacent free block, must be modified.
Searching for the place of the new or existing node has complexity O(n).
43 Department of CSE and IT

Doubly-linked list implementation of free-list
In this implementation
Nodes are not sorted according to start addresses of free blocks.
All memory blocks have boundary tags between them. The tag has
information about the size and status (allocated/free)
Each node in the doubly linked list represents a free block. It keeps
size & start address of the free block and start addresses & sizes of
the previous and next memory blocks. The adjacent blocks may be or
may not be free
The release operation does not combine adjacent free blocks. It simply
pretends a node corresponding to a released block at the front of the
free list. This operation is thus O(1). Adjacent free blocks are combined
by acquire().
The acquire operation traverses the free list in order to find a free area of
a suitable size. As it does so it also combines adjacent free blocks.
44 Department of CSE and IT

Doubly Linked List Example
Node structure:
Initial state of memory (shaded=allocated, grayed=boundary tags)
The corresponding free list
45 Department of CSE and IT

Doubly Linked List Example (Cont.)
The operation release(400, 4000) will result in:
The node corresponding to the freed block is appended at the
front of the free-list. The nodes x, y, and z correspond to the
three free blocks that have not yet been combined.
46 Department of CSE and IT

Doubly Linked List Example (Cont.)
The operation acquire(600) using the first-fit allocation policy
will first result in the combination of the three adjacent free
blocks:
At this point the corresponding free list is:
47 Department of CSE and IT

Doubly Linked List Example (Cont.)
The required 600 bytes are then allocated, resulting in:
The corresponding free list is:
48 Department of CSE and IT

Buddy Systems implementation of free-list
Instead of having a single free list, it has an array of free lists;
each element of the array holding blocks of the same size. One
type of buddy systems is the binary buddy system.
For a memory of size m, there are free-lists of size 2
0
, 2
1
,
2
2
, . . . , 2
k
, where m  2
k

The heap is viewed as one large block which can be split into
two equal smaller blocks, called buddies. Each of these
smaller blocks can again be split into two equal smaller
buddies, and so on. Each memory block has its “buddy”. The
“buddy” of a block of size 2
k
that starts at address x is the
block of size 2
k
that start at address y = complementBit_k(x),
where the address bits are numbered from right to left starting
with 0.
49 Department of CSE and IT

Buddies
If each block is of size 8 bytes (i.e., 2
3
bytes); then
the buddy of a block is obtained by complementing
bit 3 of its starting address. If each block is of size 4
bytes (i.e., 2
2
bytes); then the buddy of a block is
obtained by complementing bit 2 of its starting
address.
Example: What is the starting address of the buddy
of a block that starts at address 1100101010101101
if each block is 16 bytes?
Solution: 16 = 2
4
; the starting address of the buddy
is obtained by complementing bit 4:
1100101010111101
50 Department of CSE and IT

Binary Buddy System implementation of free-list
Each array element is a list of free blocks of same size. The size of
each block is a power of 2.
51 Department of CSE and IT

Binary Buddy System Algorithms
acquire(x): x <= 2
k
, the corresponding free list is searched
If there is a block in this list, it is allocated;
otherwise a block of size 2
k+1
, 2
k+2
, and so on is searched and
taken off the free list. The block is divided into two buddies. One
buddy is put on the free list for the next lower size and the other
is either allocated or further splinted if needed.
release(x): The block is placed back in the free list of its size, and
if its buddy is also free they are combined to form a free block of
size 2
k+1
. This block is then moved to the corresponding free list.
If its buddy is free they are combined to form a free block of size
2
k+2
, which is then moved to the appropriate free list and so on.
52 Department of CSE and IT

Buddy Systems Advantages/Disadvantages
•Advantage:
–Both acquire( ) and release( ) operations are fast.
•Disadvantages:
–Only memory of size that is a power of 2 can be allocated 
internal fragmentation if memory request is not a power of 2.
–When a block is released, its buddy may not be free, resulting in
external fragmentation.
53 Department of CSE and IT

Department of CSE and IT54
QUIZ Test
1. Technique for direct search is
(A)Binary Search (B) Linear Search (C) Tree Search (D) Hashing
2. If h is any hashing function and is used to hash n keys in to a table of size m,
where n<=m, the expected number of collisions involving a particular key x is :
(A)less than 1. (B) less than n. (C) less than m. (D) less than n/2.
3. The OS of a computer may periodically collect all the free memory space to
form contiguous block of free space. This is called
(A)Concatenation (B) Garbage collection (C) Collision (D) Dynamic Memory Allocation
4. What is the best definition of a collision in a hash table?
A. Two entries are identical except for their keys.
B. Two entries with different data have the exact same key.
C. Two entries with different keys have the same exact hash value.
D. Two entries with the exact same key have different hash values.
5. In an open-address hash table there is a difference between those spots which
have never been used and those spots which have previously been used but no
longer contain an item. Which function has a better implementation because of
this difference?
A. insert B. is present C. remove E. size F. Two or more of the above
functions
-

Department of CSE and IT
55
Thank You
55

HAshing ppt to describe tha pplications in DS

About This Presentation

Slide Content

Tags

Categories

Download

Quick Actions

Statistics

Related Slideshows

HAshing ppt to describe tha pplications in DS

About This Presentation

Slide Content

Slide 1

Slide 2

Slide 3

Slide 4

Slide 5

Slide 6

Slide 7

Slide 8

Slide 9

Slide 10

Slide 11

Slide 12

Slide 13

Slide 14

Slide 15

Slide 16

Slide 17

Slide 18

Slide 19

Slide 20

Slide 21

Slide 22

Slide 23

Slide 24

Slide 25

Slide 26

Slide 27

Slide 28

Slide 29

Slide 30

Slide 31

Slide 32

Slide 33

Slide 34

Slide 35

Slide 36

Slide 37

Slide 38

Slide 39

Slide 40

Slide 41

Slide 42

Slide 43

Slide 44

Slide 45

Slide 46

Slide 47

Slide 48

Slide 49

Slide 50

Slide 51

Slide 52

Slide 53

Slide 54

Slide 55

Tags

Categories

Download

Quick Actions

Statistics

Related Slideshows

8-top-ai-courses-for-customer-support-representatives-in-2025.pptx

7-essential-ai-courses-for-call-center-supervisors-in-2025.pptx

25-essential-ai-courses-for-user-support-specialists-in-2025.pptx

8-essential-ai-courses-for-insurance-customer-service-representatives-in-2025.pptx

Know for Certain

PPT OPD LES 3ertt4t4tqqqe23e3e3rq2qq232.pptx