UNIT V Searching Sorting Hashing Techniques [Autosaved].pptx

kncetaruna 14 views 75 slides Oct 14, 2024

About This Presentation

Sorting and Hashing


Slide Content

UNIT V SORTING AND HASHING TECHNIQUES

Sorting — Merge sort — Quick sort — Insertion sort — Shell sort — Radix sort; Hashing — Hash Functions — Collision resolution techniques — Linear Probing — Separate Chaining — Open Addressing — Rehashing — Extendible Hashing.

Linear Search It sequentially checks each element of the list for the target value until a match is found. Time complexity is commonly estimated by counting the number of elementary steps an algorithm performs to finish execution. The time complexity of linear search is O(n).

Routine
void linear(int x, int a[], int n)
{
    int i;
    for (i = 0; i < n; i++) {
        if (a[i] == x) {
            printf("element is found at position %d\n", i);
            exit(0);
        }
    }
    printf("element not found");
}

Binary search Binary search requires the list to be sorted. At each step the algorithm compares the search key with the value X at the middle element of the array. If X equals the search key, the element is found. Otherwise, if the search key is smaller than the middle element, apply the same strategy to the sorted subarray to the left of the middle element; if it is greater, apply the same strategy to the right half.
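The strategy above can be sketched as a minimal iterative C routine (function and variable names are illustrative):

```c
/* Iterative binary search on a sorted array.
   Returns the index of x, or -1 if x is not present. */
int binary_search(int a[], int n, int x)
{
    int low = 0, high = n - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;  /* middle element */
        if (a[mid] == x)
            return mid;            /* X equals the search key: found */
        else if (x < a[mid])
            high = mid - 1;        /* apply the strategy to the left half */
        else
            low = mid + 1;         /* apply the strategy to the right half */
    }
    return -1;                     /* not found */
}
```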


SORTING INTRODUCTION Sorting is a technique for arranging data in a particular order; the order can be ascending or descending. Internal Sorting: data resides in the main memory of the computer; applicable when the number of elements in the list is small, e.g. bubble sort, insertion sort, shell sort, quick sort, selection sort, radix sort. External Sorting: handles huge amounts of data residing on a secondary device, e.g. merge sort.

Bubble sort Consecutive adjacent pairs of elements in the array are compared with each other. If the element at the lower index is greater than the element at the higher index, the two elements are interchanged so that the smaller element is placed before the bigger one. This process continues until the list of unsorted elements is exhausted.

Example

Routine
void bubble_sort(int A[], int n)
{
    int temp;
    for (int k = 0; k < n-1; k++) {
        /* (n-k-1) skips elements already placed in earlier passes */
        for (int i = 0; i < n-k-1; i++) {
            if (A[i] > A[i+1]) {
                temp = A[i];
                A[i] = A[i+1];
                A[i+1] = temp;
            }
        }
    }
}

Merge Sort Merge sort is a sorting technique based on the divide-and-conquer technique. It first divides the array into equal halves and then combines them in sorted order. Algorithm: merge sort keeps dividing the list into equal halves until it can no longer be divided. By definition, if there is only one element in the list, it is sorted. Then merge sort combines the smaller sorted lists, keeping the new list sorted too. Step 1 − If there is only one element in the list, it is already sorted; return. Step 2 − Divide the list recursively into two halves until it can no longer be divided. Step 3 − Merge the smaller lists into a new list in sorted order.
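The three steps above can be sketched in C (a minimal sketch; the `left`/`right` bounds are inclusive and the names are illustrative):

```c
#include <string.h>

/* Merge two sorted halves a[left..mid] and a[mid+1..right]. */
void merge(int a[], int left, int mid, int right)
{
    int tmp[right - left + 1];
    int i = left, j = mid + 1, k = 0;
    while (i <= mid && j <= right)           /* pick the smaller front element */
        tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
    while (i <= mid)   tmp[k++] = a[i++];    /* copy any leftovers */
    while (j <= right) tmp[k++] = a[j++];
    memcpy(&a[left], tmp, k * sizeof(int));  /* write the merged run back */
}

void merge_sort(int a[], int left, int right)
{
    if (left >= right)             /* one element: already sorted (Step 1) */
        return;
    int mid = (left + right) / 2;
    merge_sort(a, left, mid);      /* divide recursively (Step 2) */
    merge_sort(a, mid + 1, right);
    merge(a, left, mid, right);    /* merge in sorted order (Step 3) */
}
```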

Divide and Conquer Divide:  This involves dividing the problem into smaller sub-problems. Conquer:  Solve sub-problems by calling recursively until solved. Combine:  Combine the sub-problems to get the final solution of the whole problem.

Quick Sort Quick sort is a highly efficient sorting algorithm based on partitioning an array of data into smaller arrays. A large array is partitioned into two arrays: one holds values smaller than a specified value, say the pivot, based on which the partition is made, and the other holds values greater than the pivot value. Divide: first pick a pivot element; then partition (rearrange) the array into two sub-arrays such that each element in the left sub-array is less than or equal to the pivot and each element in the right sub-array is larger than the pivot. Conquer: recursively sort the two subarrays with quicksort. Combine: combine the already sorted arrays.

Quicksort picks an element as the pivot and then partitions the given array around it. The large array is divided into two arrays: one holds values smaller than the pivot, and the other holds values greater than the pivot. After that, the left and right sub-arrays are also partitioned using the same approach. This continues until only a single element remains in each sub-array.

Choosing the pivot Picking a good pivot is necessary for a fast implementation of quicksort; however, it is difficult to determine a good pivot in general. Some of the ways of choosing a pivot are as follows: the pivot can be random, i.e. select a random element from the given array; the pivot can be either the rightmost or the leftmost element of the given array; or select the median as the pivot element.

In the given array, we consider the leftmost element as the pivot. So, in this case, a[left] = 24, a[right] = 27 and a[pivot] = 24. Since the pivot is at the left, the algorithm starts from the right and moves towards the left. Now a[pivot] < a[right], so the algorithm moves one position towards the left, i.e. -

Now a[left] = 24, a[right] = 19, and a[pivot] = 24. Because a[pivot] > a[right], the algorithm swaps a[pivot] with a[right], and the pivot moves to the right. Now a[left] = 19, a[right] = 24, and a[pivot] = 24. Since the pivot is at the right, the algorithm starts from the left and moves to the right. As a[pivot] > a[left], the algorithm moves one position to the right.

Now a[left] = 9, a[right] = 24, and a[pivot] = 24. As a[pivot] > a[left], the algorithm moves one position to the right. Now a[left] = 29, a[right] = 24, and a[pivot] = 24. As a[pivot] < a[left], swap a[pivot] and a[left]; now the pivot is at the left, i.e. -

Since the pivot is at the left, the algorithm starts from the right and moves to the left. Now a[left] = 24, a[right] = 29, and a[pivot] = 24. As a[pivot] < a[right], the algorithm moves one position to the left. Now a[pivot] = 24, a[left] = 24, and a[right] = 14. As a[pivot] > a[right], swap a[pivot] and a[right]; now the pivot is at the right.

Now a[pivot] = 24, a[left] = 14, and a[right] = 24. The pivot is at the right, so the algorithm starts from the left and moves to the right. Now a[pivot] = 24, a[left] = 24, and a[right] = 24, so pivot, left and right all point to the same element; this terminates the procedure. Element 24, the pivot, is placed at its exact position: the elements to the right of 24 are greater than it, and the elements to its left are smaller than it.

Now, in a similar manner, the quick sort algorithm is applied separately to the left and right sub-arrays. After sorting is done, the array will be -
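The walkthrough above can be sketched in C using the same left/right scanning, pivot-swapping scheme, with the leftmost element as the pivot (a minimal sketch; names are illustrative):

```c
/* Quick sort: the pivot starts at the left end and is swapped back and
   forth toward its final position while left/right scan toward each other. */
void quick_sort(int a[], int left, int right)
{
    if (left >= right)
        return;
    int i = left, j = right;              /* pivot currently at index i */
    while (i < j) {
        while (i < j && a[j] >= a[i])     /* scan from the right */
            j--;
        int t = a[i]; a[i] = a[j]; a[j] = t;  /* pivot moves to the right side */
        while (i < j && a[i] <= a[j])     /* scan from the left */
            i++;
        t = a[i]; a[i] = a[j]; a[j] = t;      /* pivot moves back to the left side */
    }
    quick_sort(a, left, i - 1);   /* elements smaller than the pivot */
    quick_sort(a, i + 1, right);  /* elements larger than the pivot */
}
```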

Insertion Sort (Straight insertion sort) The array of values to be sorted is divided into two sets: one that stores sorted values and another that contains unsorted values. Initially, the element with index 0 (assuming LB = 0) is in the sorted set; the rest of the elements are in the unsorted set. The first element of the unsorted partition has array index 1 (if LB = 0).

Insertion Sort During each iteration of the algorithm, the first element in the unsorted set is picked up and inserted into the correct position in the sorted set. The sorting algorithm proceeds until there are no elements left in the unsorted set.

Insertion Sort

Routine
for (i = 1; i < n; i++) {
    key = a[i];
    j = i - 1;
    while (j >= 0 && a[j] > key) {
        a[j+1] = a[j];
        j = j - 1;
    }
    a[j+1] = key;
}

Selection Sort The algorithm maintains two subarrays in a given array: the subarray which is already sorted, and the remaining subarray which is unsorted. The selection sort algorithm sorts an array by repeatedly finding the minimum element in the unsorted part and putting it at the beginning. In every iteration of selection sort, the minimum element from the unsorted subarray is picked and moved to the sorted subarray.

Selection Sort

Routine
void selection_sort(int A[], int n)   /* n = 8 */
{
    /* temporary variable to store the position of the minimum element */
    int minimum, temp;
    for (int i = 0; i < n-1; i++) {
        /* find the minimum element */
        minimum = i;
        for (int j = i+1; j < n; j++) {
            if (A[j] < A[minimum])
                minimum = j;
        }
        /* put the minimum element in its proper position */
        temp = A[minimum];
        A[minimum] = A[i];
        A[i] = temp;
    }
}

Shell Sort Improved version of straight insertion sort. In shell sort, elements at a specific interval are sorted; the interval between the elements is gradually decreased based on the sequence used. Sequence – Shell's original sequence: N/2, N/4, …, 1. The performance of shell sort depends on the type of sequence used for a given input array. Gap = floor(N/2), Gap1 = floor(Gap/2), Gap2 = floor(Gap1/2), …

Example 1 Shell Sort with Increments of Three

Shell Sort

int shellSort(int arr[], int n)
{
    for (int gap = n/2; gap > 0; gap /= 2) {
        for (int i = gap; i < n; i += 1) {
            int temp = arr[i];
            int j;
            for (j = i; j >= gap && arr[j - gap] > temp; j -= gap)
                arr[j] = arr[j - gap];
            arr[j] = temp;
        }
    }
    return 0;
}

Radix Sort First, sort elements based on the value of the units place. Then, sort elements based on the value of the tens place. This process goes on until the most significant place.

Example [121, 432, 564, 23, 1, 45, 788]

Radix Sort Find the largest element in the array, i.e. max. Let X be the number of digits in max. –X is calculated because we have to go through all the significant places of all elements. In this array [121, 432, 564, 23, 1, 45, 788], –the largest number is 788. –It has 3 digits. –Therefore, the loop should go up to the hundreds place (3 passes).

Radix Sort • Step 1 - Define 10 queues each representing a bucket for each digit from 0 to 9. • Step 2 - Consider the least significant digit of each number in the list which is to be sorted. • Step 3 - Insert each number into their respective queue based on the least significant digit. • Step 4 - Group all the numbers from queue 0 to queue 9 in the order they have inserted into their respective queues. • Step 5 - Repeat from step 3 based on the next least significant digit. • Step 6 - Repeat from step 2 until all the numbers are grouped based on the most significant digit.
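The six steps above can be sketched in C, using ten arrays as the ten queues (a minimal sketch assuming non-negative integers; names are illustrative):

```c
/* LSD radix sort: distribute into ten digit queues, regroup,
   and repeat for each digit position up to the most significant. */
void radix_sort(int a[], int n)
{
    /* find the maximum to know how many digit passes are needed */
    int max = a[0];
    for (int i = 1; i < n; i++)
        if (a[i] > max) max = a[i];

    for (int exp = 1; max / exp > 0; exp *= 10) {
        int bucket[10][n];          /* one queue per digit 0-9 (Step 1) */
        int count[10] = {0};        /* number of items in each queue */

        for (int i = 0; i < n; i++) {     /* Steps 2-3: insert by digit */
            int d = (a[i] / exp) % 10;
            bucket[d][count[d]++] = a[i];
        }
        int k = 0;                        /* Step 4: regroup queues 0..9 */
        for (int d = 0; d < 10; d++)
            for (int i = 0; i < count[d]; i++)
                a[k++] = bucket[d][i];
    }                                     /* Steps 5-6: next digit pass */
}
```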

Time Analysis Bubble sort, insertion sort and selection sort take O(n²) time in the worst case (insertion sort takes O(n) on an already sorted list). Shell sort depends on the gap sequence used (O(n²) worst case for Shell's original sequence). Merge sort takes O(n log n). Quick sort takes O(n log n) on average and O(n²) in the worst case. Radix sort takes O(d(n + k)) for d digit positions and base k.

Adaptive and Non-Adaptive Sorting Algorithms • Adaptive sorting algorithm –Takes advantage of elements already 'sorted' in the list that is to be sorted. –While sorting, if the source list has some elements already sorted, •adaptive algorithms take this into account and try not to re-order them. –Eg: insertion sort, bubble sort •Non-adaptive sorting algorithm –Does not take into account the elements which are already sorted. –Forces every single element to be re-ordered, even if it is already in place. –Eg: selection sort, shell sort, radix sort

Application of sorting • Searching –Binary search enables you to test whether an item is in a dictionary in O(log n) time, once the keys are all sorted. • Closest pair –Given a set of n numbers –Find the pair of numbers that have the smallest difference between them –After the numbers are sorted, the closest pair of numbers will lie next to each other somewhere in sorted order. •Element uniqueness –Are there any duplicates in a given set of n items? –The most efficient algorithm is to sort them and then do a linear scan through them, checking all adjacent pairs.

Application of sorting •Selection –What is the kth largest item in the set? –If the keys are placed in sorted order in an array, the kth largest can be found in constant time by simply looking at the kth position of the array •Government organizations, financial institutions, and commercial enterprises –Organize much of their information by sorting it •Load-balancing problem –M identical processors and N jobs to complete –The goal is to schedule all of the jobs on the processors so that the time when the last job completes is as early as possible.

Generation of Hash Address Generation of the hash address is carried out using a hash function (hash algorithm). Its methods are: Truncation method Folding method Mid-square method Division method

Truncation Method Part of the key is dropped and the remaining digits are used as the hash address. Example: Hash table size 1000 Keys 987456 125978 611790 Hash Address 987 125 611 (the first three digits are retained)

Folding method Given keys are broken down into groups of digits (say 3) and the groups are added up to form the hash address. Example: Hash table size 1000 Key 612111979 Grouping 612 + 111 + 979 = 1702 Note: if the result exceeds the table size, apply the truncation method to the folding result.

Mid-square method The key is squared, and the middle digits of the squared value are taken as the hash address. Example: Hash table size: 1000 Keys: 456 978 294 Squared values: 207936 956484 86436 Mid-square values: 79 64 43

Division Method Modular division is applied and the result is taken as the hash address. H(key) = key % tablesize Example: Hash table size: 11 Keys: 43 10 36 H(43) = 43%11 = 10, H(10) = 10%11 = 10, H(36) = 36%11 = 3
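The division method can be written directly in C; the example values above follow from the function:

```c
/* Division-method hash: the remainder of the key divided by
   the table size is the hash address, e.g. H(43) = 43 % 11 = 10. */
unsigned hash_division(unsigned key, unsigned table_size)
{
    return key % table_size;
}
```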

Collision Resolution Techniques Collision (crash) A collision occurs when two or more elements are hashed to the same value. When two items hash to the same slot, a systematic method is needed for placing the second item in the hash table. This process is called collision resolution.

Separate chaining When a collision occurs, elements with the same hash key will be chained together. A chain is simply a linked list of all the elements with the same hash key .

Advantage: More elements can be inserted, as it uses an array of linked lists. Disadvantage: It requires pointers, which occupy more memory space, and it takes more effort to perform a search.
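A minimal separate-chaining sketch in C, assuming a table of size 11 and the division-method hash (all names and sizes are illustrative):

```c
#include <stdlib.h>

#define TABLE_SIZE 11

/* One node in a chain; each table slot heads a linked list
   of all the elements with the same hash key. */
struct node {
    int key;
    struct node *next;
};

struct node *table[TABLE_SIZE];   /* all slots start as empty chains */

void chain_insert(int key)
{
    int h = key % TABLE_SIZE;             /* division-method hash */
    struct node *n = malloc(sizeof *n);
    n->key = key;
    n->next = table[h];                   /* prepend to the slot's chain */
    table[h] = n;
}

int chain_search(int key)
{
    for (struct node *p = table[key % TABLE_SIZE]; p; p = p->next)
        if (p->key == key)
            return 1;     /* found in the chain */
    return 0;             /* not found */
}
```

Keys 10 and 21 both hash to slot 10, so they end up chained in the same slot's linked list.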

Open addressing Open addressing, also called closed hashing, is an alternative to resolving collisions with linked lists. In this system, if a collision occurs, alternative cells are tried until an empty cell is found. Three common collision resolution strategies are Linear probing Quadratic probing Double hashing

Linear probing The simplest approach to resolve a collision is linear probing. In this technique, if a value is already stored at the location generated by h(k), a collision has occurred, and we do a sequential search to find an empty location. The idea is to place the value in the next available position. Because the searches are performed sequentially, this approach is known as linear probing. The array (hash table) is considered circular: when the last slot is reached without finding an empty location, the search proceeds to the first location of the array.

If the location is empty, store the value; otherwise, try the next location. The following hash function is used to resolve the collision: h(k, i) = [h(k) + i] mod m, where m = size of the hash table and i = the probe number, varying from 0 to m−1.

Insert 42,39,69,21,71,55,33
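The insertion sequence above can be sketched in C, assuming a table of size 10 (m = 10) and -1 marking an empty slot (names and the table size are illustrative):

```c
#define M 10          /* assumed table size for the example keys */

int ht[M];            /* hash table; -1 marks an empty slot */

void lp_init(void)
{
    for (int i = 0; i < M; i++)
        ht[i] = -1;
}

/* Probe h(k, i) = (h(k) + i) mod M until an empty slot is found;
   the table wraps around circularly. */
int lp_insert(int key)
{
    for (int i = 0; i < M; i++) {
        int pos = (key % M + i) % M;
        if (ht[pos] == -1) {
            ht[pos] = key;
            return pos;     /* slot where the key landed */
        }
    }
    return -1;              /* table full */
}
```

With M = 10, the keys 42, 39, 69, 21, 71, 55, 33 land in slots 2, 9, 0, 1, 3, 5 and 4 respectively: 69 wraps past occupied slot 9 to slot 0, 71 probes past slots 1 and 2 to slot 3, and 33 probes past slot 3 to slot 4.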

REHASHING As the name suggests, rehashing means hashing again. When the load factor increases beyond its pre-defined value, the size of the array is increased (doubled) and all the values are hashed again and stored in the new, double-sized array, to maintain a low load factor and low complexity. Builds the new table with twice the size.
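A rehashing sketch in C, assuming open addressing with linear probing and -1 marking an empty slot (names are illustrative; textbooks often grow to the next prime rather than exactly doubling, but the doubling described above is used here):

```c
#include <stdlib.h>

/* Allocate a table of twice the size and re-hash every key
   from the old table into it with the new table size. */
int *rehash(int *old, int old_size, int *new_size_out)
{
    int new_size = old_size * 2;          /* double the table */
    int *fresh = malloc(new_size * sizeof(int));
    for (int i = 0; i < new_size; i++)
        fresh[i] = -1;                    /* all slots start empty */

    for (int i = 0; i < old_size; i++) {  /* hash every old key again */
        if (old[i] == -1) continue;
        int pos = old[i] % new_size;
        while (fresh[pos] != -1)          /* linear probing in the new table */
            pos = (pos + 1) % new_size;
        fresh[pos] = old[i];
    }
    free(old);
    *new_size_out = new_size;
    return fresh;
}
```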

Extendible hashing A dynamic hashing method wherein directories and buckets are used to hash data. Directories: These containers store pointers to buckets. Buckets: They store the hashed keys. Global Depth: Associated with the directories; Global Depth = number of bits in the directory id. Local Depth: Associated with the buckets, not the directories. Bucket Splitting: When the number of elements in a bucket exceeds a particular size, the bucket is split into two parts. Directory Expansion: Takes place when a bucket overflows; it is performed when the local depth of the overflowing bucket equals the global depth.

Extendible hashing allows a find to be performed in two disk accesses. The data consists of six-bit integers. The root of the tree contains four pointers, determined by the leading two bits of the data (D). Each leaf has up to M = 4 elements. D represents the number of bits used by the root; the number of entries in the directory is 2^D.

Suppose we want to insert the key 100100. This would go into the third leaf, but the third leaf is already full; there is no room. So we split this leaf into two leaves, and the directory is expanded to D = 3.

Advantage Provides quick access for insertion and searching. Disadvantage The algorithm does not work if there are more than M duplicates.