Sorting algorithms

EleonoraCiceri 2,269 views 82 slides Nov 20, 2015
Slide 1
Slide 1 of 82
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9
Slide 10
10
Slide 11
11
Slide 12
12
Slide 13
13
Slide 14
14
Slide 15
15
Slide 16
16
Slide 17
17
Slide 18
18
Slide 19
19
Slide 20
20
Slide 21
21
Slide 22
22
Slide 23
23
Slide 24
24
Slide 25
25
Slide 26
26
Slide 27
27
Slide 28
28
Slide 29
29
Slide 30
30
Slide 31
31
Slide 32
32
Slide 33
33
Slide 34
34
Slide 35
35
Slide 36
36
Slide 37
37
Slide 38
38
Slide 39
39
Slide 40
40
Slide 41
41
Slide 42
42
Slide 43
43
Slide 44
44
Slide 45
45
Slide 46
46
Slide 47
47
Slide 48
48
Slide 49
49
Slide 50
50
Slide 51
51
Slide 52
52
Slide 53
53
Slide 54
54
Slide 55
55
Slide 56
56
Slide 57
57
Slide 58
58
Slide 59
59
Slide 60
60
Slide 61
61
Slide 62
62
Slide 63
63
Slide 64
64
Slide 65
65
Slide 66
66
Slide 67
67
Slide 68
68
Slide 69
69
Slide 70
70
Slide 71
71
Slide 72
72
Slide 73
73
Slide 74
74
Slide 75
75
Slide 76
76
Slide 77
77
Slide 78
78
Slide 79
79
Slide 80
80
Slide 81
81
Slide 82
82

About This Presentation

Sorting algorithms in C++
An introduction to sorting algorithm, with details on bubble sort and merge sort algorithms
Computer science principles course


Slide Content

Sorting algorithms Eleonora Ciceri , Politecnico di Milano Email: [email protected]

What is sorting Sorting is any process of arranging items systematically , with two distinct meanings: Ordering : arranging items in a sequence ordered by some criterion Categorizing : grouping items with similar properties

What is a sorting algorithm A sorting algorithm is an algorithm that puts elements of a list in a certain order 78 26 4 12 90 Sorting algorithm 4 12 26 78 90

Why do we need to sort? Sorted lists / sequences are useful in the following cases: 1) Efficient lookup and search 56 31 2 47 54 19 64 85 23 56 31 2 47 54 19 64 85 23 56 31 2 47 54 19 64 85 23 56 31 2 47 54 19 64 85 23 56 31 2 47 54 19 64 85 23 56 31 2 47 54 19 64 85 23 Finding “19” Non-ordered list: 6 accesses

Why do we need to sort? Sorted lists / sequences are useful in the following cases: 1) Efficient lookup and search 2 19 23 31 47 54 56 64 85 2 19 23 31 47 54 56 64 85 Finding “19” Ordered list: 2 accesses

Why do we need to sort? Sorted lists / sequences are useful in the following cases: 2) Merge sequences 2 5 9 13 20 1 4 8 21 1 2 4 5 8 9 13 20 21

What do we sort? “Workers sort parcels in a postal facility” (from: Wikipedia, the free encyclopedia)

What do we sort? We are going to order collections of data A couple of examples: Arrays Linked lists

Compare and swap to sort elements

Common building block of sorting algorithms There are several algorithms one could use to order a list of elements Although they are all different, they share a common building block: T hey compare pairs of elements to decide which is their relative order W hen needed, they swap the elements to restore their order 6 8 3 5 9 Compare: 6 5 3 8 9 Swap:

How to compare elements: Lexicographic order Elements are usually ordered according to the lexicographic (or, equivalently, lexicographical ) order Also known as dictionary order The lexicographic order is a generalization of the way the alphabetical order of words is based on the alphabetical order of their component letters

How to compare elements: Some examples Some examples Order numbers from the smallest to the largest Order words alphabetically Order people ( implemented using a struct that contains name, surname, age ) alphabetically by surname People having the same surname are ordered according to their name People having the same name and surname are ordered according to their age Thus: this proves that comparisons could be based on complex rules that sort out what to do in case of ties

Sorting algorithms time complexity

Which is the difference between sorting algorithms? Each sorting algorithm is characterized by a particular sorting strategy Are they all equal? NO ! Some of them are naïve, some of them aren’t Naïve algorithms require a larger number of comparisons A larger number of comparisons amount to a larger time frame spent to order elements in the collection

(Time) complexity of an algorithm In computer science, the time complexity of an algorithm quantifies the amount of time taken by the algorithm to run The time complexity: is expressed as a function of the length of the input is commonly expressed using big-O notation

Big-O notation “The big-O notation describes the limiting behavior of a function when the number of elements it has to process tends to infinity” We’ll try to simplify the concept in this way: number of elements complexity N O ( N 2 ) e.g., the number of elements in the array we want to order e.g., the number of comparisons we need to perform to order the array

Big-O notation: Which algorithm is “the best”? To us, an algorithm is “good” if it allows us to save time Let N be the number of elements we want to process E.g., number of elements in an array What if we have four algorithms whose time complexities are as follows? Algorithm 1 O(N) Algorithm 2 O(N 2 ) Algorithm 3 O(log(N)) Algorithm 4 O(N*log(N))

Big-O notation: Which algorithm is “the best”? Time complexity can be treated as if it were a mathematical function Being mathematical functions, these curves can be drawn on a graph Algorithm 1 O(N) f(N) = N Algorithm 2 O(N 2 ) f(N) = N 2 Algorithm 3 O(log(N)) f(N) = log(N) Algorithm 4 O(N*log(N)) f(N) = N*log(N)

Big-O notation: Which algorithm is “the best”? 2

Big-O notation: Which algorithm is “the best”? 2 The higher the number of required operation, the more the time, the worse the algorithm!

So, how do we select a sorting algorithm? We can identify several “families” of algorithms: Simple sorts Efficient sorts BubbleSort and variants Distribution sorts

So, how do we select a sorting algorithm? We can identify several “families” of algorithms: Simple sorts Efficient sorts BubbleSort and variants Distribution sorts Simple sorting algorithms are efficient on small data amounts (due to low overhead), but generally do not perform well on large lists Insertion sort Selection sort

So, how do we select a sorting algorithm? We can identify several “families” of algorithms: Simple sorts Efficient sorts BubbleSort and variants Distribution sorts Efficient sorting algorithms are those algorithms whose average complexity is the best you can find ( O(N*log(N)) ) Merge sort Heap sort Quick sort

So, how do we select a sorting algorithm? We can identify several “families” of algorithms: Simple sorts Efficient sorts BubbleSort and variants Distribution sorts Bubble sort algorithm is very simple, and the same characteristics is inherited by all its variants. However, it is highly inefficient (i.e., its time complexity is very high: O(N 2 ) ) Bubble sort Shell sort Comb sort

So, how do we select a sorting algorithm? We can identify several “families” of algorithms: Simple sorts Efficient sorts BubbleSort and variants Distribution sorts These algorithms distribute the input to intermediate structures, which are gathered and placed on the output. They are useful in case of very large data sets that do not fit in memory (since the intermediate structures can be deployed on different machines ) Counting sort Bucket sort Radix sort

One of the most inefficient algorithms ( O(N 2 ) ) One of the most efficientalgorithms ( O(N*log(N)) )

A quick note on time complexity A n algorithm performance may vary with different input of the same sizes [1 2 3 4 5] does not require any swap [5 4 3 2 1] requires a lot of swaps! We commonly attribute to each algorithm: A worst-case complexity (maximum amount of time it may require) An average-case complexity (averaged on all possible inputs) We won’t go into the details, since this is just an introduction to sorting algorithms 

An inefficient sorting algorithm: the Bubble Sort algorithm

Bubble sort: the idea The bubble sort algorithm repeatedly steps through the list of elements to be sorted: Comparing each pair of adjacent elements Swapping them if they are in wrong order The algorithm stops when, by going through the whole array, we do not require any swap (i.e., the array is fully ordered)

Bubble sort: the running example Array: [ 5 1 4 2 8] 5 > 1, thus: swap required Swapped something? False End of iteration? False

Bubble sort: the running example Array: [1 5 4 2 8] 5 > 4, thus: swap required Swapped something? Yes End of iteration? False

Bubble sort: the running example Array: [1 4 5 2 8] 5 > 2, thus: swap required Swapped something? Yes End of iteration? False

Bubble sort: the running example Array: [1 4 2 5 8 ] 5 < 8 , thus: swap not required Swapped something? Yes End of iteration? Yes

Bubble sort: the running example Array: [1 4 2 5 8] We need to proceed, since in the current iteration we swapped something Following operations: Exclude “8” from next iteration Restart with a new iteration Swapped something? Yes End of iteration? Yes

Bubble sort: the running example Array: [ 1 4 2 5 8] 1 < 4, thus: swap not required Swapped something? No End of iteration? No

Bubble sort: the running example Array: [1 4 2 5 8] 4 > 2, thus: swap required Swapped something? No End of iteration? No

Bubble sort: the running example Array: [1 2 4 5 8] 4 < 5, thus: swap not required Swapped something? Yes End of iteration? Yes

Bubble sort: the running example Array: [1 2 4 5 8] We need to proceed, since in the current iteration we swapped something Following operations: Exclude “5” from next iteration Restart with a new iteration Swapped something? Yes End of iteration? Yes

Bubble sort: the running example Array: [ 1 2 4 5 8] 1 < 2, thus: swap not required Swapped something? No End of iteration? No

Bubble sort: the running example Array: [1 2 4 5 8] 2 < 4, thus: swap not required Swapped something? No End of iteration? Yes

Bubble sort: the running example Array: [1 2 4 5 8] In the current iteration nothing was swapped The algorithm terminates Swapped something? No End of iteration? Yes

Bubble sort: the pseudo-code procedure bubbleSort ( A : list of sortable items ) n = length(A) repeat swapped = false for i = 1 to n-1 inclusive do if A[i-1] > A[ i ] then swap (A[i-1], A[i]) swapped = true end if end for n = n - 1 until not swapped end procedure (from: Wikipedia, the free encyclopedia)

Bubble sort in C+ +: Iterate over the array We will use the concept of iterators over the array To iterate over the array: int * first int * last while (first < last) { // do what you want first++; }

Bubble sort in C+ +: The algorithm void bubble_sort_int ( int * first, int * last) { bool swapped; do { swapped = false ; int * current = first + 1 ; while ( current < last ) { if (*(current- 1 ) > *current) { swap(current- 1 , current); swapped = true ; } current++; } last--; } while (swapped); }

Bubble sort in C++: Swapping elements void swap ( int * first, int * second) { int temp = *first; *first = *second; *second = temp; }

An efficient sorting algorithm: the Merge Sort algorithm

Merge sort: the idea The merge sort is a divide-and-conquer algorithm where: The unsorted list of N elements is divided into N sub-lists , each containing one element Sub-lists are repeatedly merged to produce new sorted sub-lists

How to merge two sub-lists Sub-lists are merged so as produce a longer sorted sub-list At each step, we: compare the first element of sub-list A and the first element of sub-list B select the smallest element insert it into the resulting list

How to merge two sub-lists: An example How to merge these: to obtain this? 1 6 10 4 5 27 1 4 5 6 10 27

How to merge two sub-lists: An example 1 6 10 4 5 27 1 1)

How to merge two sub-lists: An example 1 6 10 4 5 27 1 6 10 4 5 27 1 4 1) 2 )

How to merge two sub-lists: An example 1 6 10 4 5 27 1 6 10 4 5 27 1 4 6 10 5 27 1 4 5 1) 2 ) 3 )

How to merge two sub-lists: An example 1 6 10 4 5 27 1 6 10 4 5 27 1 4 6 10 5 27 1 4 5 6 10 27 1 4 5 6 1) 2 ) 3 ) 4 )

How to merge two sub-lists: An example 1 6 10 4 5 27 1 6 10 4 5 27 1 4 6 10 5 27 1 4 5 6 10 27 1 4 5 6 10 27 1 4 5 6 10 1) 2 ) 3 ) 4 ) 5 )

How to merge two sub-lists: An example 1 6 10 4 5 27 1 6 10 4 5 27 1 4 6 10 5 27 1 4 5 6 10 27 1 4 5 6 10 27 1 4 5 6 10 27 / 1 4 5 6 10 27 1) 2 ) 3 ) 4 ) 5 ) 6 )

How to merge two sub-lists: The pseudocode function merge(left, right) // merge lists until at least one of them is empty while notempty (left) and notempty (right) if first(left) <= first(right) append first(left) to result left = rest( left ) else append first(right) to result right = rest(right ) // left has elements left while notempty (left) append first(left) to result left = rest( left ) / / right has elements left while notempty (right) append first(right) to result right = rest(right ) return result (from: Wikipedia, the free encyclopedia)

Merge sort: the step-by-step algorithm We start with this list… …and, according to the algorithm, we need to keep dividing it in “ left sub-list ” and “ right sub-list ”…

Merge sort: the step-by-step algorithm …and split again, since sub-lists length is greater than 1…

Merge sort: the step-by-step algorithm … and again.

Merge sort: the step-by-step algorithm Now it’s time to merge the lists, in pairs, so that the resulting list is ordered We start from the bottom, i.e., from the shorter sub-lists we extracted before This is now ordered

Merge sort: the step-by-step algorithm Merge… This is now ordered

Merge sort: the step-by-step algorithm …and merge. This is now ordered

Trying to generalize: Which operations are needed at each iteration? Split phase: If the current list has length 1, stop with the split phase If the current list has length > 1, split it in two parts (left, right) Merge phase: Merge the left and right parts into a unique, ordered list

Trying to generalize: What happens to a list? Given a list: Split in two parts (left and right) 1 Split in two parts (left and right) 1 Do something to order the sub-list Do something to order the sub-list 2 2 Merge the parts in a single list 3 Merge the parts in a single list 3

Trying to generalize: What happens to a list? Algorithmically : [left, right] = split(list) left = order(left) right = order(right) list = merge(left, right)

Trying to generalize: What happens to a list? Algorithmically : [left, right] = split(list) left = mergeSort (left) right = mergeSort (right) list = merge(left, right) This is a recursive call to the merge sort function

Trying to generalize: What happens to a list? Now we finalize the algorithm : function mergeSort (list) { if (length(list) == 1) return list [left, right] = split(list) left = mergeSort (left) right = mergeSort (right) list = merge(left, right) return list }

Trying to generalize: What happens to a list? Now we finalize the algorithm : function mergeSort (list) { if (length(list) == 1) return list [left, right] = split(list) left = mergeSort (left) right = mergeSort (right) list = merge(left, right) return list } This function splits the list in two parts having the same length This function merges the sub-lists so that the result is ordered

Merge sort in C+ +: Iterate over the array We will use the concept of iterators over the array To iterate over the array: int * first int * last while (first < last) { // do what you want first++; }

Merge sort in C+ +: Iterate over the array We will use the concept of iterators over the array To iterate over the array: int * first int * last while (first < last) { // do what you want first++; } Remember: last always points outside the array

Merge sort in C++: Split the array in two parts If the length of the array is even: The sub-lists are as follows: Left: from first (included) to middle (excluded) Right: from middle (included) to last (excluded) int * first int * last int * middle

Merge sort in C++: Split the array in two parts If the length of the array is even: The sub-lists are as follows: Left: from first (included) to middle (excluded) Right: from middle (included) to last (excluded) int * first int * last int * middle LEFT

Merge sort in C++: Split the array in two parts If the length of the array is even: The sub-lists are as follows: Left: from first (included) to middle (excluded) Right: from middle (included) to last (excluded) int * first int * last int * middle RIGHT

Merge sort in C++: Split the array in two parts If the length of the array is odd: The sub-lists are as follows: Left: from first (included) to middle (excluded) Right: from middle (included) to last (excluded) In this case: right is longer than left int * first int * last int * middle

Merge sort in C++: The recursive algorithm void merge_sort_int ( int * first, int * last) { int N = last - first; if (N <= 1 ) return ; int * middle = first + N/ 2 ; merge_sort_int ( first, middle); merge_sort_int ( middle, last); merge (first, middle, last); }

Merge sort in C++: The merge function void merge ( int * first, int * middle, int * last) { std :: size_t N = last - first; int *result = new int [N], * result_current = result, * left_current = first, * right_current = middle; while ( left_current < middle && right_current < last) { if (* left_current <= * right_current ) { * result_current = * left_current ; left_current ++; } else { * result_current = * right_current ; right_current ++; } result_current ++; } fill_with_remaining_elements ( result_current , left_current , middle ); fill_with_remaining_elements ( result_current , right_current , last); for ( std :: size_t i = ; i < N; i++) first[i] = result [i]; delete [] result ; }

Merge sort in C++: The merge function void merge ( int * first, int * middle, int * last) { std :: size_t N = last - first; int *result = new int [N], * result_current = result, * left_current = first, * right_current = middle; while ( left_current < middle && right_current < last) { if (* left_current <= * right_current ) { * result_current = * left_current ; left_current ++; } else { * result_current = * right_current ; right_current ++; } result_current ++; } fill_with_remaining_elements ( result_current , left_current , middle ); fill_with_remaining_elements ( result_current , right_current , last); for ( std :: size_t i = ; i < N; i++) first[i] = result [i]; delete [] result ; } result is a temporary buffer that contains the ordered array At the end of the function, result is restored in the original array and deleted

Merge sort in C++: The merge function void fill_with_remaining_elements ( int *& result_current , int * sub_list_current , int * sub_list_last ) { while ( sub_list_current < sub_list_last ) { * result_current = * sub_list_current ; sub_list_current ++; result_current ++; } }

References

References https://en.wikipedia.org/wiki/ Sorting https://en.wikipedia.org/wiki/ Sorting_algorithm https://en.wikipedia.org/wiki/ Time_complexity https://en.wikipedia.org/wiki/ Big_O_notation https://en.wikipedia.org/wiki/ Bubble_sort https://en.wikipedia.org/wiki/ Merge_sort

References http://stackoverflow.com/questions/24650626/how-to-implement-classic-sorting-algorithms-in-modern-c