Hashing a searching technique in data structures

shiks1234 20 views 28 slides Oct 19, 2024
Slide 1
Slide 1 of 28
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9
Slide 10
10
Slide 11
11
Slide 12
12
Slide 13
13
Slide 14
14
Slide 15
15
Slide 16
16
Slide 17
17
Slide 18
18
Slide 19
19
Slide 20
20
Slide 21
21
Slide 22
22
Slide 23
23
Slide 24
24
Slide 25
25
Slide 26
26
Slide 27
27
Slide 28
28

About This Presentation

searching technique


Slide Content

Course Name: Data Structure and Algorithm Topic: Hashing Course code : CS 2103 Credits : 4 Mode of delivery : Hybrid (Power point presentation) Faculty : Dr. Atul Kumar Verma Email-id : [email protected] B.TECH III Sem CSE Academic YEAR: 2024-2025 1 8/2/2024 Satpal Singh Kushwaha

Assessment criteria’s Assignment quiz Mid term examination End term Examination 8/2/2024 Satpal Singh Kushwaha 2

Hashing

Why do we need hashing? Many applications deal with lots of data Search engines and web pages There are myriad look ups. The look ups are time critical. Typical data structures like arrays and lists, may not be sufficient to handle efficient lookups In general : When look-ups need to occur in near constant time. O(1)

Why do we need hashing? Consider the internet(2002 data): By the Internet Software Consortium survey at http://www.isc.org/ in 2001 there are 125,888,197 internet hosts, and the number is growing by 20% every six month! Using the best possible binary search it takes on average 27 iterations to find an entry. By an survey by NUA at http://www.nua.ie/ there are 513.41 million users world wide.

Why do we need hashing? We need something that can do better than a binary search, O(log N). We want, O(1). Solution: Hashing In fact hashing is used in: Web searches Spell checkers Databases Compilers passwords Many others

Bu ildi ng a n i n d ex u si ng H as hM ap s WORD N D O C S PTR jezebel 20 jezer 3 jezerit 1 jeziah 1 jeziel 1 jezliah 1 jezoar 1 jezrahliah 1 jezreel 39 j e z o a r 34 6 1 118 2087 3922 3981 5002 44 3 215 2291 3010 56 4 5 22 134 992 D O C I D O CCU R POS 1 POS 2 . . . 566 3 203 245 287 67 1 132 . . .

The concept Suppose we need to find a better way to maintain a table (Example: a Dictionary) that is easy to insert and search in O(1) .

Big Idea in Hashing Let S={a 1 ,a 2,… a m } be a set of objects that we need to map into a table of size N. Find a function such that H:S [1…n] Ideally we’d like to have a 1-1 map But it is not easy to find one Also function must be easy to compute It is a good idea to pick a prime as the table size to have a better distribution of values

Finding a hash Function Assume that N = 5 and the values we need to insert are: cab, bea, bad etc. Let a=0, b=1, c=2, etc Define H such that H[data] = ( ∑ characters) Mod N H[cab] = (2+0+1) Mod 5 = 3 H[bea] = (1+4+0) Mod 5 = H[bad] = (1+0+3) Mod 5 = 4

Collisions What if the values we need to insert are “abc”, “cba”, “bca” etc… They all map to the same location based on our map H ( obviously H is not a good hash map) This is called “ Collision ” When collisions occur, we need to “handle” them Collisions can be reduced with a selection of a good hash function

Choosing a Hash Function A good hash function must Be easy to compute Avoid collisions How do we find a good hash function? A bad hash function Let S be a string and H(S) = Σ S i where S i is the i th character of S Why is this bad?

Collisions Hash functions can be many-to-1 They can map different search keys to the same hash key. hash1(`a`) == 9 == hash1(`w`) Must compare the search key with the record found If the match fails, there is a collision

Collision Resolving strategies Separate chaining Open addressing Linear Probing Quadratic Probing Double Probing Etc.

Separate Chaining Collisions can be resolved by creating a list of keys that map to the same value

Linear Probing The idea: Table remains a simple array of size N On insert(x), compute f(x) mod N, if the cell is full, find another by sequentially searching for the next available slot Go to f(x)+1, f(x)+2 etc.. On find(x), compute f(x) mod N, if the cell doesn’t match, look elsewhere. Linear probing function can be given by h(x, i) = (f(x) + i) mod N (i=1,2,….)

Linear Probing f(i) = i Probe sequence: th probe = h(k) mod TableSize 1 th probe = (h(k) + 1) mod TableSize 2 th probe = (h(k) + 2) mod TableSize . . . i th probe = (h(k) + i) mod TableSize

Figure 20.4 Linear probing hash table after each insertion

Linear Probing Example Consider H(key) = key Mod 6 (assume N=6) H(11)=5, H(10)=4, H(17)=5, H(16)=4,H(23)=5 Draw the Hash table 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4 5 5 5 5 5 5

Clustering Problem As long as table is big enough, a free cell can always be found, but the time to do so can get quite large. Worse, even if the table is relatively empty, blocks of occupied cells start forming. This effect is known as primary clustering . Any key that hashes into the cluster will require several attempts to resolve the collision, and then it will add to the cluster.

Linear Probing – Clustering no collision no collision collision in small cluster collision in large cluster Primary Clustering. Nodes hash to same key clump. Nodes hash to same area clump.

Linear Probing How about deleting items from Hash table? Item in a hash table connects to others in the table Deleting items will affect finding the others “ Lazy Delete ” – Just mark the items as inactive rather than removing it.

Lazy Delete Naïve removal can leave gaps! I n se rt f Re m o ve e a b c 3 e 5 d 8 j 8 u 10 g 8 s a b c 5 d 3 f 8 j 8 u 10 g 8 s a b c 3 e 5 d 3 f 8 j 8 u 10 g 8 s F i nd f a b c 5 d 3 f 8 j 8 u 10 g 8 s “3 f” means search key f and hash key 3

Lazy Delete Clever removal I n se rt f Re m o ve e a b c 3 e 5 d 8 j 8 u 10 g 8 s a b c g o n e 5 d 3 f 8 j 8 u 10 g 8 s a b c 3 e 5 d 3 f 8 j 8 u 10 g 8 s F i nd f a b c g o n e 5 d 3 f 8 j 8 u 10 g 8 s “3 f” means search key f and hash key 3

Quadratic Probing Quadratic probing is a scheme in computer programming for resolving collisions in hash tables. Quadratic probing operates by taking the original hash value and adding successive values of an arbitrary quadratic polynomial to the starting value.

26 Quadratic Probing f(i) = i 2 Probe sequence: th probe = h(k) mod TableSize 1 th probe = (h(k) + 1) mod TableSize 2 th probe = (h(k) + 4) mod TableSize 3 th probe = (h(k) + 9) mod TableSize . . . i th probe = (h(k) + i 2 ) mod TableSize f ( i +1) = f ( i ) + 2 i + 1 Less likely to encounter Primary Clustering

Quadratic Probing Problem: We may not be sure that we will probe all locations in the table (i.e. there is no guarantee to find an empty cell if table is more than half full.) If the hash table size is not prime this problem will be much severe. Quadratic probing has not yet been mathematically analyzed. Although quadratic probing eliminates primary clustering, elements that hash to the same location will probe the same alternative cells. This is know as secondary clustering.

CENG 213 Data Structures 28 Hashing Applications Compilers use hash tables to implement the symbol table (a data structure to keep track of declared variables). Game programs use hash tables to keep track of positions it has encountered ( transposition table ) Online spelling checkers.
Tags