hashing in data strutures advanced in languae java
ishasharma835109
111 views
47 slides
May 09, 2024
Slide 1 of 47
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
About This Presentation
hashing
Size: 1.03 MB
Language: en
Added: May 09, 2024
Slides: 47 pages
Slide Content
Hashing
Hashing Hashing is a technique of converting an element or a value into a fixed size key, that is used to uniquely identify that element from the group of similar elements. In other words, hashing is a process of mapping an element or a value with a unique key. In hashing, a hash function is used, which takes the value(element) as an input and generates the fixed size key also known as hash code as an output. The generated hash code is thus used as the index position at which the value associated with the key is stored.
Examples of Hashing in Data Structure The following are real-life examples of hashing in the data structure – In schools, the teacher assigns a unique roll number to each student. Later, the teacher uses that roll number to retrieve information about that student. A library has an infinite number of books. The librarian assigns a unique number to each book. This unique number helps in identifying the position of the books on the bookshelf.
Need for Hash data structure Every day, the data on the internet is increasing multifold and it is always a struggle to store this data efficiently. In day-to-day programming, this amount of data might not be that big, but still, it needs to be stored, accessed, and processed easily and efficiently. A very common data structure that is used for such a purpose is the Array data structure. Now the question arises if Array was already there, what was the need for a new data structure! The answer to this is in the word “ efficiency “.
Though storing in Array takes O(1) time, searching in it takes at least O(log n) time. This time appears to be small, but for a large data set, it can cause a lot of problems and this, in turn, makes the Array data structure inefficient. So now we are looking for a data structure that can store the data and search in it in constant time, i.e. in O(1) time. This is how Hashing data structure came into play. With the introduction of the Hash data structure, it is now possible to easily store data in constant time and retrieve them in constant time as well.
Components of Hashing Key: A Key can be anything string or integer which is fed as input in the hash function the technique that determines an index or location for storage of an item in a data structure. Hash Function: The hash function receives the input key and returns the index of an element in an array called a hash table. The index is known as the hash index . Hash Table: Hash table is a data structure that maps keys to values using a special function called a hash function. Hash stores the data in an associative manner in an array where each data value has its own unique index.
Hash Table Hash table is a data structure which uses hashing to store the data in the form of key value pairs, such that the basic operations i.e. the insertion, deletion, and searching can be performed on the data in O(1) time. Consider a hash table as an array, and whenever we need to insert, delete or search an element in it, first the hash code corresponding to that element is computed and then treating the hash code as the index of the array, the required operation is performed on the given element. A simple hashing approach would be to use the modulo(%) operator as a hash function and generate the key for a given value(assuming its numerical). hash= value % hashTableSize
How does Hashing work? Suppose we have a set of strings {“ab”, “cd”, “ efg ”} and we would like to store it in a table. Our main objective here is to search or update the values stored in the table quickly in O(1) time and we are not concerned about the ordering of strings in the table. So the given set of strings can act as a key and the string itself will act as the value of the string but how to store the value corresponding to the key? Step 1: We know that hash functions (which is some mathematical formula) are used to calculate the hash value which acts as the index of the data structure where the value will be stored.
Step 2: So, let’s assign “a” = 1, “b”=2, .. etc , to all alphabetical characters. Step 3: Therefore, the numerical value by summation of all characters of the string: “ab” = 1 + 2 = 3, “cd” = 3 + 4 = 7 , “ efg ” = 5 + 6 + 7 = 18 Step 4: Now, assume that we have a table of size 7 to store these strings. The hash function that is used here is the sum of the characters in key mod Table size. We can compute the location of the string in the array by taking the sum(string) mod 7. Step 5: So we will then store “ab” in 3 mod 7 = 3, “cd” in 7 mod 7 = 0, and “ efg ” in 18 mod 7 = 4.
The above technique enables us to calculate the location of a given string by using a simple hash function and rapidly find the value that is stored in that location. Therefore the idea of hashing seems like a great way to store (key, value) pairs of the data in a table.
int hashTable [10]; int hashTableSize = 10; int hashFunction (int value) { return (value); } void Insert(int value) { //Compute the index using the hash function int index= hashFunction (value) % hashTablesize ; //insert the value in the hashtable at the computed index hashTable [index]=value; } void search(int value) { //Compute the index uing the Hash function int index= hashFunction (value) % hashTableSize ; //Check the value in the hashTable present the computed index if ( hashTable [index]==value) printf(“found”); else printf("not found"); }
What is a Hash function? The hash function creates a mapping between key and value, this is done through the use of mathematical formulas known as hash functions. The result of the hash function is referred to as a hash value or hash. The hash value is a representation of the original string of characters but usually smaller than the original. Types of Hash functions: Division Method. Mid Square Method. Folding Method. Multiplication Method A good hash function should have the following properties: Efficiently computable. Should uniformly distribute the keys (Each table position is equally likely for each. Should minimize collisions. Should have a low load factor(number of items in the table divided by the size of the table).
Problem with Hashing If we consider the above example, the hash function we used is the sum of the letters, but if we examined the hash function closely then the problem can be easily visualized that for different strings same hash value is begin generated by the hash function. For example: {“ab”, “ ba ”} both have the same hash value, and string {“ cd”,”be ”} also generate the same hash value, etc. This is known as collision and it creates problem in searching, insertion, deletion, and updating of value.
What is Collision? The hashing process generates a small number for a big key, so there is a possibility that two keys could produce the same value. The situation where the newly inserted key maps to an already occupied, and it must be handled using some collision handling technology.
How to handle Collisions? There are mainly two methods to handle collision: Separate Chaining: Open Addressing:
Separate Chaining The idea is to make each cell of the hash table point to a linked list of records that have the same hash function value. Chaining is simple but requires additional memory outside the table. Example: We have given a hash function and we have to insert some elements in the hash table using a separate chaining method for collision resolution technique. Hash function = key % 5, Elements = 12, 15, 22, 25 and 37.
step by step approach to how to solve the above problem: Step 1: First draw the empty hash table which will have a possible range of hash values from 0 to 4 according to the hash function provided.
Step 2: Now insert all the keys in the hash table one by one. The first key to be inserted is 12 which is mapped to bucket number 2 which is calculated by using the hash function 12%5=2. Step 3: Now the next key is 22. It will map to bucket number 2 because 22%5=2. But bucket 2 is already occupied by key 12.
Step 4: The next key is 15. It will map to slot number 0 because 15%5=0. Step 5: Now the next key is 25. Its bucket number will be 25%5=0. But bucket 0 is already occupied by key 25. So separate chaining method will again handle the collision by creating a linked list to bucket 0.
Complexity Analysis Insert –O(1) Search > Best :O(1) > Avg: O(N/M) >Worst: O(N) Delete- depends on Time complexity of searching M = Number of slots in hash table N = Number of keys to be inserted in hash table Load factor α = N /M
Open Addressing Open Addressing, which is also known as closed hashing is a technique of collision resolution in hash tables. The main idea of open addressing is to keep all the data in the same table to achieve it, we search for alternative slots in the hash table until it is found. The three Major collision resolution strategies Linear Probing Quadratic Probing Double hashing When using open addressing, a collision is resolved by probing (searching) alternative cells in the hash table until our target cell (empty cell while insertion, and cell with value x while searching x ) is found.
Linear Probing In linear probing, collisions are resolved by searching the hash table consecutively (with wraparound) until an empty cell is found. The definition of collision function f is quite simple in linear probing. As suggested by the name it is a linear function of i or simply f ( i )= i Algorithm of linear probing Insert( x ) - Find the hash value, k of x from the hash function hash ( x ) . Iterate consecutively in the table starting from the k , till you find a cell that is currently not occupied. Place x in that cell.
Search( x ) - Find the hash value k of x from the hash function hash ( x ) . Iterate consecutively in the table starting from the , k , till you find a cell that contains x or which is never been occupied. If we found x , then the search is successful and unsuccessful in the other case. Delete( x ) - Repeat the steps of Search( x ). If element x does not exist in the table then we can't delete it. If x exists in the cell (say k ), put ∞ in cell k to denote it has been occupied some time in the past, but now it is empty.
Pseudocode for Linear Probing class Hashing: size, table[] Hash(x): return x%size Insert(x): k=Hash(x) while(table[k] is not empty): k=(k+1)%size table[k]=x Search(x): k=Hash(x) while(table[k] != x): if(table[k] has never been occupied): return false k=(k+1)%size return table[k]==x Delete(x): k=Hash(x) while(table[k]!=x): if(table[k] has never been occupied): return k=(k+1)%size if(table[k]==x): table[k] = -Infinity
Example of linear probing - Table Size = 7 Hash Function - hash ( key )= key %7 Insert - 16,40,27,9,75 Step 1 - Make an empty hash table of size 7. Step 2 - Inserting 16,40, 27 ℎ(16)=16%7=2 ℎ(40)=40%7=5 ℎ(27)=27%7=6
Step 3 - Inserting 9 and 75. hash (9)=9%7=2 But at index 2 already 16 is placed and hence collision occurs so as per linear probing we will search for consecutive cells till we find an empty cell. So we will probe for hash (9)+1 i . e . cell 3, since the next cell i . e . 3 is not occupied we place 9 in cell 33 . hash (75)=75%7=5 Again collision happens because 40 is already placed in cell 5 . So will search for the consecutive cells, so we search for cell 6 which is also occupied then we will search for cell ( hash (75)+2)%7 i . e . 0 which is empty so we will place 75 there.
Search - 75,21 Step 4 - Search 75 and 21 hash (75)=75%7=5 But at index 5,75 is not present so we search for consecutive cells until we found an empty cell or a cell with a value of 75 . So we search in cell 6 but it does not contain 75 , so we search for 75 in cell and we stop our search here as we have found 75 . h (21)=21%7=0 We will search for 21 in cell but it contains 75 so we will search in the next cell hash (21)+1, i . e . 1 since it is found empty it is clear that 21 do not exist in our table.
Problem With Linear Probing Even though linear probing is intuitive and easy to implement but it suffers from a problem known as Primary Clustering . It occurs because the table is large enough therefore time to get an empty cell or to search for a key k is quite large. w orst case time complexity of searching , insertion and deletion operations to be O ( n ) , where n is the size of the table.
Quadratic Probing The working of quadratic probing involves taking the initial hash value and probing in the hash table by adding successive values of an arbitrary quadratic polynomial. As suggested by its name, quadratic probing uses a quadratic collision function f . One of the most common and reasonable choices for f ( i )= i^ 2
For inserting x we search for the cells hash ( x )+0, hash ( x )+1^2, hash ( x )+2^2,... until we find an empty cell to insert x . For searching x we again search for the cells hash ( x )+0, hash ( x )+1^2, hash ( x )+2^2,... until we find a cell with value x . If we find an empty cell that has never been occupied it means x is not present in the hash table. For deletion, we repeat the search process if a cell is found with value x we replace the value x with a predefined unique value to denote that this cell has contained some value in past.
Example of Quadratic Probing Table Size = 7 Insert = 15, 23, and 85. Search & Delete = 85 Hash Function - Hash ( x )= x %7 Collision Resolution Strategy - f ( i )= i^ 2 Step 1 - Create a table of size 7 . Step 2 - Insert 15 and 23 hash (15)=15%7=1 Since the cell at index 1 is not occupied we can easily insert 15 at cell 1. hash (23)=23%7=2 Again cell 2 is not occupied so place 23 in cell 2. After performing this step our hash table will look like
Step 3 - Inserting 85 hash (85)=85%7=1 In our hash table cell 1 is already occupied so we will search for cell 1+1^2, i . e . cell 2 . Again it is found occupied so we will search for cell 1+2^2 , i . e . cell 5 . It is not occupied so we will place 85 in cell 5 . After performing all these 3 insertions in our hash table it will look like -
Double Hashing Double hashing offers us one of the best techniques for hashing with open addressing. The permutation formed by double hashing is like a random permutation therefore it almost eliminates the chances of cluster forming as it uses a secondary hash function as an offset to deal with collision condition. The hash function that is used in double hashing is of the form - h ( k , i )= hash 1( k )+ i × hash 2( k ), i =0,1,2,3,...
Example of double hashing Table size = 7 hash 1( k ) = k %7 and hash 2( k )=1+ k %6 Insert = 37,25,12,40 and 75 Search & Delete = 75 Step 1 - Create a table of size 77 . Step 2 - Insert 37,25 and 12 hash1(37)=37%7=2 hash1(25)=25%7=4 hash1(12)=12%7=5 There is no collision at any point during inserting these three values so we will simply place these elements at their respective positions. After which, our hash table will look like -
Step 3 - Insert 40 and 74 hash 1(40)=40%7=5, hash 2(40)=1+40%6=5 But at the cell at index 5 is already occupied, so we will check for next cell i . e . hash 1(40)+1× hash 2(40) which will evaluate to (5+5)%7=3 . Cell at index 3 is not occupied so we will place 40 in cell 3. hash 1(74)=74%7=4, hash 2(74)=1+74%6=3 But at the cell at index 4 is already occupied, so we will check for next cell i . e . hash 1(74)+1× hash 2(74) which will evaluate to (4+3)%7=0 . Cell at index 0 is not occupied so we will place 74 in cell 0.
Load Factor Total number of elements per bucket= N/M The load factor of the hash table can be defined as the number of items the hash table contains divided by the size of the hash table. Load factor is the decisive parameter that is used when we want to rehash the previous hash function or want to add more elements to the existing hash table. It helps us in determining the efficiency of the hash function i.e. it tells whether the hash function which we are using is distributing the keys uniformly or not in the hash table.
Rehashing? R ehashing means hashing again. Basically, when the load factor increases to more than its predefined value (the default value of the load factor is 0.75), the complexity increases. So to overcome this, the size of the array is increased (doubled) and all the values are hashed again and stored in the new double-sized array to maintain a low load factor and low complexity.
Hashing Data Structure in JAVA HashMap HashMap class implements the Map interface which allows us to store key and value pair, where keys should be unique. If you try to insert the duplicate key, it will replace the element of the corresponding key Java HashMap contains values based on the key. Java HashMap contains only unique keys. Java HashMap may have one null key and multiple null values. Java HashMap is non synchronized. Java HashMap maintains no order. The initial default capacity of Java HashMap class is 16 with a load factor of 0.75.
Make a HashMap HashMap< keyType , valueType > myMap = new HashMap< keyType , valueType >(); Put and get values into a map myMap.put (key, value); myMap.get (key) // returns the corresponding value Some useful other methods int size = myMap.size ();//returns the size of pairs added into map myMap.containsKey (key); // returns true or false if key is in map myMap.keySet (); //returns the keys in the Map in collection set myMap.values ();//returns the values in the Map in collection set myMap.remove (key); // remove the key value pairs myMAp . isEmpty() ;// returns true if this map contains no key-value mappings. Iterate using a foreach loop for( keyType key : myMap.keySet ()) { // not ordered myMap.get (key); // do something with the key/value pair }
Methods inherited from class java.util.AbstractMap METHOD DESCRIPTION equals() Compares the specified object with this map for equality. hashCode () Returns the hash code value for this map. toString () Returns a string representation of this map.
HashSet in Java A few important features of HashSet are mentioned below: Implements Set Interface . The underlying data structure for HashSet is Hashtable . As it implements the Set Interface, duplicate values are not allowed. Objects that you insert in HashSet are not guaranteed to be inserted in the same order. Objects are inserted based on their hash code. NULL elements are allowed in HashSet. . HashSet does not store duplicate items
Internal Working of a HashSet HashSet uses HashMap for storing its object internally. You must be wondering that to enter a value in HashMap we need a key-value pair, but in HashSet, we are passing only one value. Storage in HashMap: Actually the value we insert in HashSet acts as a key to the map Object and for its value, java uses a constant variable. So in the key-value pair, all the values will be the same.
Make a HashSet HashSet<E> hs = new HashSet<E>(); build an empty HashSet object in which the default initial capacity is 16 and the default load factor is 0.75. Put values into a hashset hs . add (key);// Used to add the specified element if it is not present, if it is present then return false. Some useful other methods hs.remove (key);// It is used to remove the specified element from this set if it is present. Key should be of object type hs.clear ();// It is used to remove all of the elements from the set. hs.contains (key);// used to return true if this set contains the specified element. hs.size ();//returns the number of elements in set
Iterating through set using Iterator () method Iterator<E> i = Hash_Set.iterator (); while ( i.hasNext ()) { // Iterating over elements // using next() method System.out.println( i.next ()); }