Lecture Outline Iter a tive sortin g algo r ithms (com p arison base d ) Selection Sort Bubble Sort Inse r tion Sort Recursive sortin g algo r ithms (com p arison base d ) Mer g e Sort Quick Sort Radix sort (non-comparison base d ) Prope r ties of Sorting In-place sort, stable sort Comp a rison of sortin g algo r ithms Note: we only consid e r sortin g dat a in asc e nd i ng order
Why Study Sorting? When an input is sorted, many problems become easy (e.g. searching , min , max , k-th smallest ) Sorting has a variety of interesting algorithmic solutions that embody many ideas Comp a rison vs no n -comparison based Iter a tive Recursive Recursive Divide-and-conquer Best/worst / average-case bou n ds Rand o mized algo r ithms
Applications of Sorting Uniqueness testing Deleting duplicates Prioritizing events Frequency counting Reconstructing the original order Set intersection/union Finding a target pair x, y such tha t x+y = z Ef f icient searching
Sele c tion Sort Sele c tion Sort
Selection Sort: Idea Given an array of n items 1. Find the larg e st item x , in the ran g e of [ … n − 1 ] 2. Swap x with the ( n − 1 ) th item 3. Reduc e n by 1 and go to Step 1
Selection Sort: Illustration 29 10 14 37 13 37 is the largest, swap it with the last element, i.e. 13 . Q: How to find the largest? 29 10 14 13 37 13 10 14 29 37 13 10 14 29 37 x x x Unsorted items Largest item for current iteration Sorted items 13 10 14 29 37 10 13 14 29 37 Sorted! W e can also find the smallest and put it the front instead http://visualgo.net/sorting?create=29,10,14,37,13&mode=Selection
Selection Sort: Implementation vo i d se l ectio n So r t (int a[ ] , int n) { fo r (int i = n -1; i >= 1; i--) { int maxIdx = i; fo r (int j = 0; j < i; j++) Step 1: fo r (int j = 0; j < i; j++) Search for if (a [ j] >= a [ ma x Id x ]) maxIdx = j; // swap r ou t ine i s i n STL <a l gori t hm> sw a p(a[i], a[maxId x ]); } } Search for ma x imum elem e nt Step 2: Swap 8 } Swap ma x imum elem e nt with the last item i
vo i d se l ectio n So r t (int a[ ] , int n) { fo r (int i = n -1; i >= 1; i--) { int maxIdx = i; fo r (int j = 0; j < i; j++) Selection Sort: Analysis n−1 n−1 Number of t imes exe c uted fo r (int j = 0; j < i; j++) if (a [ j] >= a [ ma x Id x ]) maxIdx = j; // swap r ou t ine i s i n STL <a l gori t hm> sw a p(a[i], a[maxId x ]); } } (n−1 ) +(n−2)+… + 1 = n(n−1)/2 n−1 } T otal • c 1 and c 2 are cost of st a tem e nts in out e r and inne r blocks T otal = c 1 (n−1) + c 2 *n*(n−1)/2 = O(n 2 )
Bubble Sort Bubble Sort
Bubble Sort: Idea Given an array of n items 1. Comp a re pair of a d j a cent items 2. Swap if the items are out of order 3. Repe a t until the end of a r ray Th e largest item will b e at the last position 4. Reduc e n by 1 and go to Step 1 Analogy Lar g e item is like “bu b ble” that floats to the end of t h e arr a y
Bubble Sort: Illustration At the end of Pass 2 , the second largest item 29 is at the second At the end of Pass 1 , the largest item 37 is at the last position. largest item 29 is at the second last position. x x Sorted Item Pair of items under comparison
Bubble Sort: Implementation vo i d bu b bleSo r t (int a[ ] , int n) { fo r (int i = n-1; i >= 1; i--) { fo r (int j = 1; j <= i; j+ + ) { if (a [ j-1] > a[ j ]) swap( a [j ] , a[ j -1] ) ; } } } Step 2: Swap if the items are out Comp a re adjac e nt pairs of num b ers 29 10 14 37 13 items are out of ord e r http://visualgo.net/sorting?create=29,10,14,37,13&mode=Bubble
Bubble Sort: Analysis 1 iteration of the inner loop (test and swap) requires time bounded by a constant c Two nested loops Two nested loops Oute r loop: exactly n itera t ions Inn e r loop: when i=0, ( n − 1 ) itera t ions when i=1, ( n − 2 ) itera t ions … … when i=(n − 1), itera t ions Total number of iterations = 0+1+…+(n − 1) = n(n − 1)/2 Total t ime = c n(n − 1)/2 = O(n 2 )
Bubble Sort: Early Termination Bubble Sort is inef f icient with a O(n 2 ) time complexity However, i t has an interesting property However, i t has an interesting property Given the following arr a y, how many times will the inne r loop swap a p a ir of item? Idea 3 6 1 1 25 39 Idea If we go thro u gh the inne r loop with no swapping the arr a y is sort e d can stop early!
Bubble Sort v2.0 : Implementation vo i d bu b bleSo r t2 (int a[ ] , int n) { fo r (int i = n -1; i >= 1; i--) { bo o l is _ so r ted = t ru e ; fo r (int j = 1; j <= i; j+ + ) { Assume the array is sorted before the inner loop fo r (int j = 1; j <= i; j+ + ) { if (a [ j-1] > a[j ] ) { swap( a [j ] , a[ j -1] ) ; is_so r ted = f al s e; } } // en d of inner lo o p if ( is_so r te d ) ret u rn; the inner loop Any swapping will invalidate the assumption If the flag if ( is_so r te d ) ret u rn; } } If the flag remains true after the inner loop sorted!
Bubble Sort v2.0 : Analysis Worst-case Inp u t is in desce n ding ord e r Runnin g time remai n s the same : O(n 2 ) Runnin g time remai n s the same : O(n 2 ) Best-case Inp u t is already in ascending ord e r Th e algorithm ret u rns afte r a singl e out e r itera t ion Runnin g time: O(n)
In s ertion Sort In s ertion Sort
Insertion Sort: Idea Similar to how most people arrange a hand of poker cards Start with on e card in your hand Start with on e card in your hand Pick t h e next card and insert it into its pr o per sort e d ord e r Repe a t previ o us step for all cards 1 st card : 10♠ 10♠ K♠ 1 5 ♠ ♠ 5♠ 10♠ 2 nd car d : 5♠ 3 rd card : K ♠ … … … …
Insertion Sort: Illustration Start 40 13 20 8 x Sorted x x Unsorted Unsorted T o be inserted Iter a tion 1 13 40 20 8 Iter a tion 2 13 20 40 8 Iter a tion 3 8 13 20 40 http://visualgo.net/sorting?create=40,13,20,8&mode=Insertion
Insertion Sort: Implementation void insertionSort (int a[], int n) { for (int i = 1; i < n; i++) { int next = a[i]; next is the item to be inserted int next = a[i]; int j; for (j = i-1; j >= 0 && a[j] > next; j--) a[j+1] = a[j]; a[j+1] = next; } inserted Shift sorted items to make place for next } } 29 10 14 37 13 Insert next to the correct location http://visualgo.net/sorting?create=29,10,14,37,13&mode=Insertion
Insertion Sort: Analysis Outer-loop executes ( n − 1 ) times Number of times inner-loop is executed depends on the input the input Best-cas e : the arr a y is alre a dy sort e d and ( a[j] > ne x t ) is always false No shifting of dat a is necess a ry Worst-cas e : the arr a y is reve r sely sort e d and ( a[j] > ne x t ) is always true Inse r tion always occ u r at the fro n t Inse r tion always occ u r at the fro n t Therefore, t he best-case time is O(n) And t he worst-case time is O(n 2 )
Merge Sort Merge Sort
Merge Sort: Idea Suppose we only know how to merge two sorted sets o f elements into one Mer g e {1, 5, 9} with {2, 11} {1, 2, 5, 9, 11} Mer g e {1, 5, 9} with {2, 11} {1, 2, 5, 9, 11} Question Where do we get the two sort e d sets in the first pl a ce? Idea (use merge to sort n items) Mer g e each pair of el e men t s into sets o f 2 Mer g e each pair of se t s of 2 into sets o f 4 Repe a t previ o us step for sets o f 4 … Final step: merge 2 sets of n/2 eleme n ts to o b tain a fully sor t ed set
Divide-and-Conq u er Method A powerful problem solving technique Divide-and-conquer method solves problem in the f ollowing steps the f ollowing steps Divid e step Divide the larg e pro b lem into smaller pro b lems Recursively solve th e smaller pro b lems Con q uer step Combin e the results of the smaller problems to prod u ce Combin e the results of the smaller problems to prod u ce the r esult of the larg e r pro b lem
Divide and Conquer: Merge Sort Merge Sort is a divide-and-conquer sorting algorithm Divide step Divide step Divide the arr a y into two (equ a l) halves Recursively sort the two halves Conquer step Mer g e the two halves to form a sorted arr a y
Merge Sort: Illustration 7 2 6 3 8 4 5 Divid e into 7 7 2 2 6 6 3 3 8 8 4 4 5 5 2 3 6 7 4 5 8 Divid e into two hal v es Recursi v ely sort the hal v es Mer g e them 2 3 4 5 6 7 8 Mer g e them 2 3 4 5 6 7 8 Question How should we sort the halves in the 2 nd step?
Merge Sort: Implementation vo i d me r geSor t (int a[ ] , int lo w , int hi g h) { if (l o w < h i gh ) { int mid = (low+h i gh) / 2; me r geSor t (a, low , m i d ); Merge sort on a[lo w ...high] me r geSor t (a, low , m i d ); Di v ide a[ ] into two me r geSor t (a, mid + 1, high); me r ge(a, lo w , mid , h i gh); } } Conquer: merge the Function to merge two sorted halves a[low…mid] and Di v ide a[ ] into two halves and recursi v ely sort them a[low…mid] and a[mid+1…high] into a[low…high] Note mergeSort() is a recursive functi o n low >= hi g h is t he base case, i.e. ther e is 0 or 1 item
Merge Sort: Example mer g eSo r t ( a[l o w…m i d ] ) mer g eSo r t ( a[m i d+1 … h i gh]) mer g e(a [ l o w.. m id], a [ m i d+1 . .hi g h ] ) 38 16 2 7 39 1 2 27 38 16 27 39 1 2 27 a [ m i d+1 . .hi g h ] ) 38 16 38 16 16 38 27 39 1 2 39 12 12 3 9 27 Di v ide Phase Recursive call to mergeSort() Conquer Phase 16 27 3 8 12 2 7 3 9 12 16 2 7 27 3 8 39 Conquer Phase Merge steps http://visualgo.net/sorting?create=38,16,27,39,12,27&mode=Merge
Merge Sort: Merge 3 7 8 a[ .. 2 ] a[ 3 .. 5 ] b[ .. 5 ] 2 4 5 3 7 8 3 7 8 3 7 8 3 7 8 2 4 5 2 4 5 2 4 5 2 4 5 2 2 3 2 3 4 2 3 4 5 3 7 8 3 7 8 2 4 5 2 4 5 2 3 4 5 2 3 4 5 7 8 x x x Unmerged items Items used for comparison Merged items T w o sorted halves to be merged Merged result in a temporar y array
Merge Sort: Merge Implementation vo i d me r ge (int a[], int low, int mid, int high ) { int n = h i gh-low+1; b is a temporary PS: C + + ST L <algorithm> has merge subroutine too in t * b = ne w int[n ] ; int left= l ow , righ t =m i d+1, bI d x=0; wh i le (le f t < = mid && righ t < = h i gh) { if (a[le f t] <= a[ r ig h t]) b [ bIdx++ ] = a[ l ef t ++]; el s e Normal Merging Where both temporary array to store result el s e b [ bIdx++ ] = a[ r ig h t++]; } // co n tinue on next s li d e Where both halves have unmerged items
Merge Sort: Merge Implementation // co n tinue d from p re v ious sl i de wh i le (le f t < = mi d ) b [bIdx++ ] = a[le f t+ + ]; wh i le (ri g ht <= hi g h) b [bIdx++ ] = a [ ri g ht + +]; fo r (int k = 0; k < n; k++) a [ low+ k ] = b [ k]; de l et e [] b ; } Merged result are copied back into a[] Remaining items are copied into b[] } Remember to free allocated memory Question Why do we need a tempo r ary arr a y b[] ?
Merge Sort: Analysis In me r ge S ort() , the bulk of work is done in the merge step For me r ge ( a, lo w , mi d , hi g h) For me r ge ( a, lo w , mi d , hi g h) Let t otal items = k = (hig h − low + 1) Numb e r of compa r isons ≤ k − 1 Numb e r of moves fro m origin a l arr a y to tempo r ary arr a y = k Numb e r of moves fro m tem p ora r y arr a y back to original arr a y = k In t otal, number of operations ≤ 3k − 1 = O(k) The important question is How many times is merge() called?
Merge Sort: Analysis Level 0: mergeSort n items Level 1: mergeSort n/2 items n n/2 n/2 Level 0: 1 call to mergeSort Level 1: mergeSort n/2 items 2 calls to mergeSort Level 2: mergeSort n/2 2 items Level (lg n): mergeSort 1 item n/2 n/2 n/2 2 n/2 2 n/2 2 n/2 2 … 1 1 . . . 1 1 Level 2: 2 2 calls to mergeSort Level (lg n): 2 lg n (= n) calls to mergeSort … … n/(2 k ) = 1 n = 2 k k = lg n
Merge Sort: Analysis Level : 0 call to merge() Level 1 : 1 calls to merge() with n/2 items in each half, O(1 x 2 x n/2) = O(n) time O(1 x 2 x n/2) = O(n) time Level 2 : 2 calls to merge() with n/2 2 items in each half, O(2 x 2 x n/2 2 ) = O(n) time Level 3 : 2 2 calls to merge ( ) with n/2 3 items in eac h half, O( 2 2 x 2 x n/2 3 ) = O(n) time … Level ( lg n) : 2 lg(n) − 1 (= n/2) calls to merg e () with n/2 lg(n) (= 1) item in each half, O(n) time Tot a l time compl e xity = O(n lg(n)) Opt i mal comp a rison-based sortin g met h od
Merge Sort: Pros and Cons Pros Th e perfor m ance is gua r anteed, i.e. unaff e cted by origin a l ord e ring of the input origin a l ord e ring of the input Suitable for extr e mely larg e num b er of in p uts Can o p era t e on the input portion by portion Cons Not e a sy to im p lement Not e a sy to im p lement Requir e s additional stor a ge duri n g mer g ing ope r ation O(n) e x tra mem o ry stor a ge nee d ed
Quick Sort Quick Sort
Quick Sort: Idea Quick S ort is a divide-and-conquer algorithm Divide step Choos e an item p (know n as piv o t ) and partition the Choos e an item p (know n as piv o t ) and partition the items o f a[i...j] into two pa r ts Items that are smaller t han p Items that are gre a ter t han or equ a l to p Recursively sort the two p a rts Con q uer step Do not h ing! Do not h ing! In comparison, Merge Sort spends most o f the time in conquer step but very l i ttle time in divide step
Quick Sort: Divide Step Example 27 38 12 39 27 1 1 6 9 Pivot Choose first element as pivot 12 16 27 39 27 38 Pivot 12 16 27 27 38 39 Pivot Partition a[] about the pivot 27 Recursively sort the two parts the two parts 12 16 27 27 38 39 Notice an y thing special about the position of pi v ot in the final sorted items?
Quick Sort: Implementati o n vo i d qu i ckSor t (int a[ ] , int lo w , int hi g h) { if (l o w < h i gh ) { int pivot I dx = pa r ti t ion(a, l ow, h ig h ); Partition a[low...high] and return the qu i ckSor t (a, low , pi v otId x -1); qu i ckSor t (a, pivot I dx + 1, h ig h ); } } and return the index of the pivot item Recursively sort the two portions partiti o n() splits a[l o w...h i gh] into two portio n s a[l o w ... piv o t–1] and a[ p ivot+1 ... hig h ] Pivot item does not par t icipate in any further sorting
Quick Sort: Partition Algorithm To partition a[i.. . j] , we choose a[i] as the pivot p Why choos e a[i] ? Are th e re oth e r choices? The remaining i t ems (i.e. a[i+1.. . j] ) are divided into 3 The remaining i t ems (i.e. a[i+1.. . j] ) are divided into 3 regions S1 = a[i+1. . .m] wher e items < p S2 = a[m+ 1 ...k-1] wher e item ≥ p Unkn o wn (un p rocessed) = a[k.. . j] , wher e items are yet to be assign e d to S1 or S2 p < p p ? i m k j S1 S2 Unknown
Quick Sort: Partition Algorithm Initially, regions S1 and S2 are empty All ite m s excluding p are in the unkn o wn regi o n For each item a[k] in the unknown region For each item a[k] in the unknown region Comp a re a[k] with p If a[k] >= p , put it into S2 Othe r wise, put a[k] into S1 p ? p ? i k j Unknown
Quick Sort: Partition Algorithm Case 1: if a[k] >= p S1 S2 If a[k] = y p , p < p p ? i m k j x y S1 S2 S1 S2 Incr e ment k p < p > p ? i m k j
Quick Sort: Partition Algorithm Case 2: if a[k] < p If a[k] = y < p p < p x p y ? S1 S2 If a[k] = y < p p < p p ? i m k j Incr e ment m x y p < p y p x ? p < p p ? i m k j x y p < p p ? i m k j y x Swap x and y p < p p ? i m k j Incr e ment k y x
Quick Sort: Partition Implementation int pa r titio n (int a[ ] , int i, int j) { int p = a [ i]; int m = i; p is the pivot S1 and S2 empty PS: C + + ST L <algorithm> has partition subroutine too int m = i; fo r (int k = i + 1; k < = j ; k+ + ) { if (a[k] < p ) { m++; swap( a [k ] , a[m ] ); } el s e { S1 and S2 empty initially Go through each element in unknown region Case 1: Do nothing! Case 2 } } sw a p( a [i], a[ m ]); re t ur n m; } Case 1: Do nothing! Swap pivot with a[m] m is the index of pivot
Quick Sort: Partition Example 46 http://visualgo.net/sorting?create=27,38,12,39,27,16&mode=Quick
Quick Sort: Partition Analysis There is only a single for-loop Numb e r of ite r ations = n u mber of ite m s, n , in the unkn o wn regi o n unkn o wn regi o n n = high − low Complexity is O(n) Similar to Merge Sort, t he complexity is then dependent on the number of t imes partition() is dependent on the number of t imes partition() is called
Quick Sort: Worst Case Analysis When the array is already in ascending order 5 18 23 39 44 19 57 What is the pivot index returned by partition() ? 5 18 23 39 44 19 57 S1 = a[i + 1...m] emp t y when m = i S2 = a[m+1 . ..j] p = a[i] What is the pivot index returned by partition() ? What is t h e effect of swap( a , i, m ) ? S1 is empty, while S2 contains every item except the pivot
Quick Sort: Worst Case Analysis n 1 n-1 T otal n o. of levels = n 1 n-1 1 n-2 1 1 …… As each partition takes linear time, the 1 1 linear time, the algo r ithm in its worst case has n levels and henc e it takes time n+(n-1) + ...+1 = O(n 2 ) cont a ins the pivot only!
Quick Sor t : Best/Average Case Analysis Best case occurs when partition always splits the array into two equal halves Depth of recursion is log n Depth of recursion is log n Each lev e l takes n or fewer comp a risons, so th e time compl e xity is O(n log n) In practice, worst case is rare, and on the average we get some good splits and some bad ones (details in CS3230 :O) ones (details in CS3230 :O) Averag e time is also O(n log n)
Lower Bound: Comparison-Based Sort It is known that All comp a riso n -based sortin g algo r ithms have a compl e xity lower bo u nd of n log n compl e xity lower bo u nd of n log n Therefore, any comparison-based sorting algorithm with worst-case complexity O(n log n) is optimal
Properti e s of Sorting Properti e s of Sorting
In-Place Sorting A sort algorithm is said to be an in-place sort If it re q uires only a co n sta n t amou n t (i.e. O(1) ) of extra sp a ce duri n g the sortin g proc e ss extra sp a ce duri n g the sortin g proc e ss Questions Merge Sort is not in-place, why? Is Qui c k Sort in-place? Is Radi x Sort in-place? Is Radi x Sort in-place? [ CS1020E AY1617S1 Lecture 10 ] 60
Stable Sorting A sorting algorithm is stable if the relative order of elements with the same key value is preserved by the algorithm preserved by the algorithm Example application of stable sort Assume that names have been sort e d in alphabetical ord e r Now, if this list is sort e d agai n by tut o rial grou p Now, if this list is sort e d agai n by tut o rial grou p numb e r, a stable sort algorit h m would ensu r e that all stud e nts in the same tuto r ial gro u ps still app e ar in alph a betical ord e r of their nam e s [ CS1020E AY1617S1 Lecture 10 ] 61
Non-Stable Sort Selection Sort 1285 5 a 4746 602 5 b (8356) 1285 5 5 602 (4746 8356) 1285 5 a 5 b 602 (4746 8356) 602 5 a 5 b (1285 4746 8356) 5 b 5 a (602 1285 4746 8356) Quick Sort 1285 5 150 4746 602 5 8356 (p i vo t =1 2 85) 1285 5 a 150 4746 602 5 b 8356 (p i vo t =1 2 85) 1285 (5 a 150 602 5 b ) (4746 8356) 5 b 5 a 150 602 1285 4746 8356 [ CS1020E AY1617S1 Lecture 10 ] 62
Sorting Algorithms: Summary W orst Case Best Case In-place? Stable? Se l ec t ion So r t O(n 2 ) O(n 2 ) Y es No So r t In s er t ion So r t O(n 2 ) O(n) Y es Y es Bu b bl e Sort O(n 2 ) O(n 2 ) Y es Y es Bu b bl e Sort 2 O(n 2 ) O(n) Y es Y es [ CS1020E AY1617S1 Lecture 10 ] 63 Me r ge Sort O(n lg n) O(n lg n) No Y es Qu i ck Sort O(n 2 ) O(n lg n) Y es No
Summary Comp a rison-Based Sorting Algorit h ms Iter a ti v e Sorting Selection Sort Bubble Sort Inse r tion Sort Recursi v e Sorti n g Mer g e Sort Quick Sort Non-Comparison-Based Sorting Algorith m s Non-Comparison-Based Sorting Algorith m s Radi x Sort Prope r ties of Sorting Algorith m s In-Place Stable [ CS1020E AY1617S1 Lecture 10 ] 64