Software Engineering Program DATA STRUCTURES AND ALGORITHMS 1
Chapter 1 2
Introduction to Data Structures, Algorithms and Algorithms Analysis 3
Objective At the end of the ses s i on st u dent s shoul d ha v e b as i c understanding of Data structure Classification of data structure Algorithm Characteristics of algorithms Algorithm analysis 4
Why Data Structures? They are essential ingredients in creating fast and powerful algorithms. They help to manage and organize data. They make code cleaner and easier to understand. 7
“Bad programmers worry about the code. Good programmers worry about data structures and their relationships .” Linus Torval ds 8
Introduction Program = Algorithm + Data structure Data Structure: The way data is organized in a computer’s memory so that it can be used effectively. Algorithm : The sequence of computational steps to solve a problem. 9
Con’ t … Most of you have programmed; already been exposed to algorithms and data structure. Perhaps you didn't see them as separate entities; you saw data structures as simple programming constructs (provided by STL--standard template library). However, data structures are quite distinct from algorithms, and very important in their own right. 10
Con’ t … Model defines an abstract view to the problem define the properties of the problem Properties of the problem include The data which are affected and The operations that are involved in the problem. Define Data structure Properties 11
Abstract Data Types Vs. Data Structures 12
Abstract Data Type (ADT) is an abstraction of a data structure which provides only the interface to which a data structure must follow. The interface doesn’t give any specific details about how something should be implemented or in what programming language. 13
Abstract Data Type (ADT) Consists of an abstract data structure and operations Specifies: What can be stored in the ADT What operations can be done on/by the ADT E.g., if we are going to model employees of an organization : ADT stores employees with their relevant attributes and discarding irrelevant attributes. ADT supports hiring, firing, retiring, … operations. 14
Data structure (DS) is a language construct that the programmer has defined in order to implement an abstract data type . There are lots of formalized and standard Abstract data types such as Stacks, Queues, Trees, etc. Do all characteristics need to be modeled? Not at all it depends on the scope of the model the reason for developing the model Abstraction is a process of classifying characteristics as relevant and irrelevant for the particular purpose at hand and ignoring the irrelevant ones. 15
Abstract Data Types Vs. Data Structures Exa m ples 16 Abstraction (ADT) Implementation (DS) List Dynamic array Linked list Queue Linked list Based Array Based Map Tree Map Hash Table Vehicle B i c y c l e Car Train
CLASSIFICATION OF DATA STRUCTURE What is data structure : is basically a technique of organizing and storing of different types of data items in computer memory. It is considered as not only the storing of data elements but also the maintaining of the logical relationship existing between individual data elements . also be defined as a mathematical or logical model , which relates to a particular organization of different data elements. 17
CLASSIFICATION OF DATA STRUCTURE Primitive data structure Non-primitive data structure 18
CLASSIFICATION OF DATA STRUCTURE Primitive data structure: also known as basic data structures. predefined types of data, which are supported by the programming language. Non-Primitive data structure: are highly developed complex data structures. are not defined by the programming language, but are instead created by the programmer. Are responsible for organizing the group of homogeneous and heterogeneous data elements. 19
CLASSIFICATION OF DATA STRUCTURE Non-Primitive data structure: 20
CLASSIFICATION OF DATA STRUCTURE i s t he The linear and non-linear data structure subclassification of Non-primitive data structure. Difference: li n e a r d a t a s tructure arran g es t he da ta int o a sequence and follow some sort of order. non - li near d a t a s t r u c t u r e doe s not org a n i z e the da ta i n a sequential manner. 21
CLASSIFICATION OF DATA STRUCTURE 22
CLASSIFICATION OF DATA STRUCTURE 23
CLASSIFICATION OF DATA STRUCTURE 24
Algo r ithm An algorithm can be thought of as a set of instructions that specifies how to solve a particular problem. is a well-d efin ed computational procedure that takes some value or a set of values as input and produces some value or a set of values as output An algorithm transforms data structures from one state to another state in two ways: An algorithm may change the value held by a data structure An algorithm may change the data structure itself 25
Properties of algorithm Finiteness Def i ni t ene s s Sequence Feasibility Cor r ec t ne ss Language Independent …. 26
Algorithm Analysis Refers to the process of determining the amount of computing time and storage space required by different algorithms. A process of predicting the resource requirement of algorithms in a given environment. Main resources are: Running Time Memory Usage Communication Bandwidth 28
Approaches to measure the efficiency of algorithms Empirical: competing algorithms and trying them on different instances. Theoretical: Determining the quantity of resources required mathematically (Execution time, memory space, etc.) needed by each algorithm. 29
Specific processor speed, Current processor load Specific data for a particular run of the program Input Size, Input Properties Operating Environment 30 However, it is difficult to use actual clock-time as a consistent measure of an algorithm‘s efficiency, because clock-time can vary based on many things. For example ,
Complexity Analysis Is the systematic study of the cost of computation, measured either in time units or in operations performed, or in the amount of storage space required The goal is to have a meaningful measure that permits comparison of algorithms independent of operating platform. 31
Example Algorithms Two algorithms for computing the Factorial Which one is better? int factorial (int n) { if (n <= 1) return 1; else return n * factorial(n-1); } int factorial (int n) { return 1; if (n<=1) else { fact = 1; for (k=2; k<=n; k++) fact *= k; return fact; } } 32
Complexity: A measure of the performance of an algorithm An algorithm’s performance depends on internal and external factors Complexity measures the internal factors (usually more interested in time than space) Internal The algorithm’s efficiency, in terms of: Time required to run Space (memory storage) required to run External Size of the input to the algorithm Speed of the computer on which it is run Quality of the compiler 33
Con’ t … Two things to consider: Time Complexity : Determine the approximate number of operations required to solve a problem of size n. Space Complexity: Determine the approximate memory required to solve a problem of size n. 34
Ben e fits Provides the short hand method, to find how much time algorithm takes (approx.) Its useful way of estimating the efficiency of an algorithm, without having to measure actual run times. Gives the choice to select among the best algorithms available 36
Program running time – Why? When is the running time (waiting time for user) noticeable/important? 37
Program running time – Why? When is the running time (waiting time for user) noticeable/important? web search database search real-time systems with time constraints 38
How to Measure Algorithm Performance What metric should be used to judge algorithms? Length of the program (lines of code) Ease of programming (bugs, maintenance) Memory required Running time Running time is the dominant standard. Quantifiable and easy to compare Often the critical bottleneck 39
Con’t… An algorithm may run differently depending on: the hardware platform (PC , Sun) the programming language (C, Java, C++) the programmer 40
Complexity analysis involves two distinct phases Algorithm Analysis : Analysis of the algorithm or data structure to produce a function T (n) that describes the algorithm in terms of the operations performed in order to measure the complexity of the algorithm. Order of Magnitude Analysis : Analysis of the function T (n) to determine the general complexity category to which it belongs. 41
Analysis Rules: We assume an arbitrary time unit. Execution of one of the following operations takes time 1: Assignment Operation Single Input/output Operation Single Boolean Operations Single Arithmetic Operations Function Return 42
Con’t… 3. Running time of a selection statement (if, switch) is the time for the condition evaluation + the maximum of the running times for the individual clauses in the selection. 43
Con’ t …. 4. Loops: Running time for a loop is equal to the ru n ning t i m e fo r t he s t a te men t s insi d e t he l o o p * number of iterations. Remark: The total running time of a statement inside a group of nested loops is the r u n n in g t i m e o f the s t a te men t s m ulti p l i e d by the product of the sizes of all the loops. For nested loops, analyze inside out. Always assume that the loop executes the maximum number of iterations possible. 44
Con’t….. 5. Running time of a function call is 1 for setup + the time for any parameter calculations + the time required for the execution of the function body. 45
Example int count(){ int k=0; cout <<“Enter an integer”; cin >>n; for ( i =0;i< n;i ++) k=k+1; return 0; } Time Units to Compute 1 for the assignment statement: int k=0 1 for the output statement 1 for the input statement. In the for loop: 1 assignment, n+1 tests, and n increments. n loops of 2 units for an assignment, and an addition. 1 for the return statement. T (n)= 1+1+1+(1+n+1+n)+2n+1 = 4n+6 = O(n) 46
void func () { int x=0; int i =0; int j=1; cout << ”Enter an Integer value”; cin >>n; while ( i <n){ x++; i ++; } while (j<n) { j ++ ; } } Time Units to Compute ------------------------------------------------- 1 for the first assignment statement: x=0; 1 for the second assignment statement: i=0; 1 for the third assignment statement: j=1; 1 for the output statement. 1 for the input statement. In the first while loop: n+1 tests n loops of 2 units for the two increment (addition) operations In the second while loop: n tests n-1 increments ------------------------------------------------------- T (n)= 1+1+1+1+1+n+1+2n+n+n-1 = 5n+5 = O(n) 47
Exam p le for ( i =1; i <n-1; i ++) { for (j=n; j>= i+1; j--) { if (A(j-1) > A(j)) { temp = A(j-1); A(j-1) = A(j); A(j) = tmp ; } } } Complexity Analysis is ? 48
Asymptotic Analysis The term asymptotic means approaching a value or curve arbitrarily closely (i.e., as some sort of limit is taken). Asymptotic analysis refers to computing the running time of any operation in mathematical units of computation. Asymptotic analysis of an algorithm refers to defining the mathematical boundary/framing of its run-time performance. Asymptotic Notations are languages that allow us to analyze an algorithm's running time by identifying its behavior as the input size for the algorithm increases. This is also known as an algorithm's growth rate. 49
Simplifying the Bound T(n) = ck nk + ck-1 nk-1 + ck-2 nk-2 + … + c1 n + co too complicated too many terms Difficult to compare two expressions, each with 10 or 20 terms Do we really need that many terms? 50
Simplifications Keep just one term! the fastest growing term (dominates the runtime) No constant coefficients are kept Constant coefficients affected by machines, languages, etc. Asymptotic behavior (as n gets large) is determined entirely by the leading term. Example . T(n) = 10 n 3 + n 2 + 40n + 800 If n = 1,000, then T(n) = 10,001,040,800 error is 0.01% if we drop all but the n 3 term 51
T( n ) keep one drop coef 3n 2 +4n+1 3 n 2 n 2 101 n 2 +102 101 n 2 n 2 15 n 2 +6n 15 n 2 n 2 a n 2 +bn+c a n 2 n 2 Asymptotic They all have the same “growth” rate 52
Asymptotic Notations Big Oh Notation, Ο The notation Ο(n) is the formal way to express the upper bound of an algorithm's running time. It measures the worst case time complexity or the longest amount of time an algorithm can possibly take to complete. Omega Notation, Ω The notation Ω(n) is the formal way to express the lower bound of an algorithm's running time. It measures the best case time complexity or the best amount of time an algorithm can possibly take to complete. Theta Notation, θ The notation θ(n) is the formal way to express both the lower bound and the upper bound of an algorithm's running time. It measures the average case time complexity. 53
Asymptotic Notations Big-O, “bounded above by”: T(n) = O(f(n)) For some c and N, T(n) c·f (n) whenever n > N. Big-Omega, “bounded below by”: T(n) = (f(n)) For some c>0 and N, T(n) c·f (n) whenever n > N. Same as f(n) = O(T(n)). Big-Theta, “bounded above and below”: T(n) = (f(n)) T(n) = O(f(n)) and also T(n) = (f(n)) 54
By Pictures Big-Oh ( most commonly used ) bounded above Big-Omega bounded below Bi g - Th e ta exactly 55
Average, Best, and Worst-Case The time required by an algorithm falls under the following three types − Average case: — Average time required for program execution Real world distributions difficult to predict Best case: — Minimum time required for program execution Seems unrealistic Worst case: — Maximum time required for program execution Gives an absolute guarantee On which input instances should the algorithm’s performance be Judged? We will use the worst-case measure. 56
Common Asymptotic Notations 57
Summary (Why O(n)?) n + c o T(n) = c k n k + c k-1 n k-1 + c k-2 n k-2 + … + c 1 Too complicated O(n k ) a single term with constant coefficient dropped Much simpler, extra terms and coefficients do not matter asymptotically Other criteria hard to quantify 58
Runtime Analysis (summary) Useful rules simple statements (read, write, assign) O(1) (constant) simple operations (+ - * / == > >= < <= O(1) sequence of simple statements/operations rule of sums for, do, while loops rules of products 59
Runtime Analysis (cont.) Two important rules Rule of sums if you do a number of operations in sequence, the runtime is dominated by the most expensive operation Rule of products if you repeat an operation a number of times, the total runtime is the runtime of the operation multiplied by the iteration count 60
Runtime Analysis (cont.) O(1) T 1 (n) T 2 (n) if (cond) then body 1 else bod y 2 endif T(n) = O(max (T 1 (n), T 2 (n)) 61
Runtime Analysis (cont.) Method calls A calls B B calls C etc. A sequence of operations when call sequences are flattened T(n) = max(T A (n), T B (n), T C (n)) 62
Example int count() { int k=0; cout<< “Enter an integer”; cin>>n; for (i=0;i<n;i++) k=k+1; return 0; } T (n)= 1+1+1+(1+n+1+n)+2n+1 = 4n+6 = O(n) In the for loop: 1 assignment, n+1 tests, and n increments. n loops of 2 units for an assignment, and an addition. 1 for the return statement. Time Units to Compute ------------------------------------------------- 1 for t h e a ss i g n me n t s t a t e ment: i n t k=0 1 for the output statement. 1 for the input statement. 63
Exam p le for ( i =1; i <n; i ++) if A( i ) > maxVal then maxVal = A( i ); maxPos = i ; Asymptotic Complexity: ? 64
Exam p le for (i=1; i<n; i++) if A(i) > maxVal then maxVal= A(i); maxPos= i; Asymptotic Complexity: O(n) 65
Exam p le for ( i =1; i <n-1; i ++) for (j=n; j>= i+1; j--) if (A(j-1) > A(j)) then temp = A(j-1); A(j-1) = A(j); A(j) = tmp ; endif E ndfor endfor Asymptotic Complexity is ? 66
Exam p le for ( i =1; i <n-1; i ++) for (j=n; j>= i+1; j--) if (A(j-1) > A(j)) then temp = A(j-1); A(j-1) = A(j); A(j) = tmp ; endif e ndfor endfor Asymptotic Complexity is O(n 2 ) 67
Run Time for Recursive Programs T(n) is defined recursively in terms of T(k), k<n The recurrence relations allow T(n) to be “unwound” recursively into some base cases (e.g., T(0) or T(1)). Examples: Factorial Hanoi towers 68
Example: Factorial int factorial (int n) { if (n<=1) return 1; else return n * factorial(n-1); } factorial (n) = n*n-1*n-2* … *1 n * factorial(n-1) n - 1 * factorial(n-2) n-2 * factorial(n-3) … 2 *factorial(1) T(n) T(n-1) T(n - 2) T(1) T ( n ) T ( n 1) d T ( n 2) d d T ( n 3) d d d .... T (1) ( n 1)* d c ( n 1)* d O ( n ) 69
Example: Factorial (cont.) int factorial1(int n) { if (n<=1) return 1; else { fact = 1; for (k=2;k<=n;k++) fact *= k; return fact; } } Both algorithms are O(n). O ( 1 ) O ( 1 ) O ( n ) 70
Example: Hanoi Towers Hanoi(n,A,B,C) = Hanoi(n-1,A,C,B)+Hanoi(1,A,B,C)+Hanoi(n-1,C,B,A) T ( n ) 2 T ( n 1) c 2 2 T ( n 2) 2 c c 2 3 T ( n 3) 2 2 c 2 c c .... 2 n 1 T (1) (2 n 2 ... 2 1) c (2 n 1 2 n 2 ... 2 1) c O (2 n ) 71