Introduction to Data Structure and Algorithms 1 CHAPTER-ONE
This Chapter Covers: 2 What is Data Structure Algorithm ? Classification of Data structure Abstract Data Types Algorithm Analysis Properties of algorithms Algorithm Analysis Concepts and categories Order Of Magnitude: 1. Big-O Notation(Upper bound) 2. Big – Omega (Ω ) Lower bound) 3 . Big –Theta (Θ ) Tight bound) 4 . Little o (small o)
introduction to Data Structure(DS) & Algorithm Analysis Introduction: A program is written in order to solve a problem. A solution to a problem actually consists of two basic concepts: A way to organize the data in memory of computer DS Sequence of steps to solve the problem Algorithm A D ata S tructure is a way of storing data in a computer so that it can be used efficiently. Algorithm : is the sequence of computational steps to solve a problem. Therefore, a program is nothing but data structures plus algorithms . Program=Algorithm + Data structure Data structure is the design of both the structural and functional aspect of the program . The main aim of data structure is to increase the efficiency of the program and decrease the storage requirement. 3
Data Structure It is the structural representation of logical relationship between elements of data. In other word it is the way of organizing data items by considering its relationship to each other. Consider as an example books in a library . It can be stored in a shelf arbitrarily or using some defined orders such as sorted by Title, Author and so on. It can be said that the study of data structures involves the storage, retrieval and manipulation of information. In other words, the possible logical and physical arrangement of data items to provide an efficient solution for a problem is called Data Structure . 4
Classification of Data structure 1. Primitive data structures : These are the basic data structure and are directly operated up on the machine instruction , which is in primitive level. E.g.. integers , floating point numbers, characters, string constants , pointers etc. These primitive data structure are the basis for the discussion of more sophisticated (non-primitive) data structures. 2. Non primitive Data structures : It is more sophisticated data structure emphasizing on structuring of a group of homogenous (same type) or heterogeneous (different type) data items. E.g. Array , list, files, linked list, trees and graph fall in this category . 5
… 6 Primitive Data structures are directly supported by the language ie ; any operation is directly performed in these data items. Ex: integer, Character, Real numbers etc. Non-primitiv e data types are not defined by the programming language, but are instead created by the programmer. Linear data structures organize their data elements in a linear fashion, where data elements are attached one after the other. Linear data structures are very easy to implement, since the memory of the computer is also organized in a linear fashion. Some commonly used linear data structures are arrays, linked lists, stacks and queues. In nonlinear data structures, data elements are not organized in a sequential fashion. Data structures like multidimensional arrays, trees, graphs, tables and sets are some examples of widely used nonlinear data structures.
Classification of Data structure … 7
Operations on the Data Structures 8 Following operations can be performed on the data structures: Traversing- It is used to access each data item exactly once so that it can be processed. Searching - It is used to find out the location of the data item if it exists in the given collection of data items. Inserting - It is used to add a new data item in the given collection of data items. Deleting - It is used to delete an existing data item from the given collection of data items. Sorting - It is used to arrange the data items in some order i.e. in ascending or descending order in case of numerical data and in dictionary order Merging - It is used to combine the data items of two sorted files into single file in the sorted form.
Abstract Data Types An entity with the properties just described is called an abstract data type (ADT). An ADT consists of an abstract data structure and operations. Put in other terms, an ADT is an abstraction of a data structure. The ADT specifies: What can be stored in the Abstract Data Type What operations can be done on/by the Abstract Data Type . For example, if we are going to model employees of an organization: This ADT stores employees with their relevant attributes and discarding irrelevant attributes. This ADT supports hiring, firing, retiring, operations . 9
Abstraction Is a process of classifying characteristics as relevant and irrelevant for the particular purpose at hand and ignoring the irrelevant ones. Applying abstraction correctly is the essence of successful programming. to solve the problem is obtaining ones own abstract view, or model , of the problem. This process of modeling is called abstraction. 10
Algorithm Analysis Algorithm is a step by step procedure to solve a problem. E.g. baking cake, industrial activities, Student Registration, etc., all need an algorithm to follow. The purpose of an algorithm is to accept input values, to change a value hold by a data structure, to re-organize the data structure itself (e.g. sorting), to display the content of the data structure, and so on. An algorithm transforms data structures from one state to another state in two ways: An algorithm may change the value held by a data structure An algorithm may change the data structure itself. More than one algorithm is possible for the same task. 11
Properties of algorithms Finiteness : any algorithm should have finite number of steps to be followed. Definiteness : ( un ambiguity ) the algorithm should have one and only one interpretation during execution. Sequential : it should have a start step and a halt step for a final result Correctness : It must compute correct answer all possible legal inputs. Feasibility : It must be possible to perform each instruction Generality : Algorithm should be valid on all possible inputs. Input/Output : zero or more input, one or more output. etc 12
Algorithm Analysis Concepts Algorithm analysis is the process of determining how much computing time and storage that algorithms will require. In order to solve a problem, there are many possible algorithms. One has to be able to choose the best algorithm for the problem at hand using some scientific method. Complexity analysis is concerned with determining the efficiency of algorithms. How do we measure the efficiency of algorithms? The precise ways of analyzing them are: Running Time Memory Usage Communication Bandwidth 13
Space and Time Complexity 14 Space complexity : is the process of determining a formula for predicting how much memory space will be required for a program execution. Time complexity : is the process of determine a formula for predicting the time required for a program execution. Asymptotic notation : to make meaningful statement about the efficiency of algorithms about time and space complexity.(Big-O, Big- Omega, Big-Theta) Two major steps to do complexity analysis: 1, producing a function f(n) that define algorithms 2, Order of magnitude determine O(f(n)) that define the general complexity category
Cont.. Running time is usually treated as the most important since computational time is the most precious resource in most problem domains. There are two approaches to measure the efficiency of algorithms: Empirical/ Expermental : Programming competing algorithms and trying them on different instances. Theoretical : Determining the quantity of resources required mathematically (Execution time, memory space, etc.) needed by each algorithm. 15
Analysis of Algorithms An algorithm is a finite set of precise instructions for performing a computation or for solving a problem. What is the goal of analysis of algorithms? To compare algorithms mainly in terms of running time but also in terms of other factors (e.g., memory requirements, programmer's effort etc.) What do we mean by running time analysis? Determine how running time increases as the size of the problem increases. 16
Algorithm Analysis Categories Algorithm must be examined under different situations to correctly determine their efficiency for accurate comparisons In order to determine the running time of an algorithm it is possible to define three functions T best (n) , T avg (n) and T worst (n) We need to distinguish a number of different cases: Best case (minimum number of operations) Worst case (maximum number of operations) Average case (average number of operations) 17
Best Case Analysis Assumes that data are arranged in the most advantageous order . It also assumes the minimum input size . E.g. For sorting – the best case is if the data are arranged in the required order. For searchin g – the required item is found at the first position. Note: Best Case computes the lower boundary of T(n) It causes fewest number of executions Best Case ( T best ): The amount of time the algorithm takes on the smallest possible set of inputs. 18
Worst case analysis Assumes that data are arranged in the disadvantageous order . It also assumes that the input size is infinite. Eg . For sorting – data are arranged in opposite required order For searching – the required item is found at the end of the item or the item is missing It computes the upper bound of T(n) and causes maximum number of execution. Worst Case ( T worst ): The amount of time the algorithm takes on the worst possible set of inputs. 19
Average case Analysis Assumes that data are found in random order. It also assumes random or average input size. E.g. For sorting – data are in random order. For searching – the required item is found at any position or missing. Average Case ( T avg ): The amount of time the algorithm takes on an " average " set of inputs. It also causes average number of execution. Best case and average case can not be used to estimate (determine) complexity of algorithms Worst case is the best to determine the complexity of algorithms. 20
Order Of Magnitude Order Of Magnitude refers to the rate at which the storage or time grows as function of problem size (function (n)). It is expressed in terms of its relationship to some known functions – asymptotic analysis. Asymptotic complexity of an algorithm is an approximation of the computational complexity that holds for large amounts of input data. Types of Asymptotic Notations: 1. Big-O Notation(Upper bound) 2. Big – Omega (Ω) (Lower bound) 3. Big –Theta (Θ ) (Tight bound) 4. Little o (small o) 21
1. Big-O Notation Definition : The function T(n) is O(F(n)) if there exist constants c and N such that T(n) ≤ c.F (n) for all n ≥ N. As n increases, T(n) grows no faster than F(n) or in the long run (for large n) T grows at most as fast as F It computes the tight upper bound of T(n) Describes the worst case analysis . 0<f(n)<c g(n) i.e f(n) lies below g(n) 22
Examples Time c.g(n) f(n) f(n)<=c.g(n) f(n)=O(g(n)) n>= n c>0 n n n >=1 f (n)=O(g(n)) Example f (n) =3n+2 g (n) =n f (n)<=c.g(n) 3n+2<=c.n let’s take n=3, c=4 3n+2<=4.n then n>= 2 8<=12 23
Examples Find F(n) such that T(n) = O(F(n)) for function f(n) = 3n+5 let n=6 , c=2 Solution: 3n+5 ≤ c n c=6, N=2 F(n)=n because c=6 for 3n+5 ≤ 6n, for All n >=2 3(2)+5 < 6*(2) 6+5 < 12 11 < 12 hence, f(n)<=g(n) T(n) = O(n ) 24
Example T(n) = n 2 + 5n T(n) ≤ c.F (n) n 2 + 5n ≤ c. n 2 1 + (5/n) ≤ c F(n) = n 2 T(n) = O(n 2 ) Therefore if we choose N=5 , then c= 2; if we choose N=6, then c=1.83 , and so on. So what are the ‘correct’ values for c and N ? The answer to this question, it should be determined for which value of N a particular term in T(n) becomes the largest and stays the largest. In the above example, the n 2 term becomes larger than the 5n term at n>5 , so N=5, c=2 is a good choice. 25
2. Big – Omega Notation Definition : The function T(n) is Ω(F(n)) if there exist Constants c and N such that , T(n) ≥ c.F (n) for all n ≥ N. As n increases T(n) grows no slower than F(n) or in the long run (for large n) T grows at least as fast as F It computes the tight lower bound of T(n). Describes the best case analysis Example: Find F(n) such that T(n) = Ω(F(n)) for T(n) = 3n+5 F(n) = n as 3n+5>= c.n for c=1 and n=0 26
… 27 The notation Ω(n) is the formal way to express the lower bound of an algorithm's running time. It measures the best case time complexity or the best amount of time an algorithm can possibly take to complete. For example , for a function f (n) Ω( f (n)) ≥ { g (n) : there exists c > 0 and n0 such that g (n) ≤ c. f (n) for all n > n0. } 0< c g(n)< f(n) i.e f(n) lies below g(n )
3. Big Theta Notation Definition: The function T(n) is Θ(F(n)) if there exist constants c1, c2 and N such that c1.g(n) ≤ f(n) ≤ c2.g(n ) for all n ≥ N. The notation θ(n) is the formal way to express both the lower bound and the upper bound of an algorithm's running time. It is represented as follows θ( f (n)) = { g (n) if and only if g (n) = Ο( f (n)) and g (n) = Ω( f (n)) for all n > n0. } As n increases, T(n) grows as fast as F(n) It computes the tight optimal bound of T(n). Describes the average case analysis . Example : Find F(n) such that T(n) = Θ(F(n)) for T(n)=2n 2 . Solution: c1n 2 ≤ 2n 2 ≤ c2n 2 F(n)=n 2 because for c1=1 and c2=3, n 2 ≤ 2n 2 ≤ 3n 2 for all n. 28
4. Little o (small o) notation Defn : The function T(n) is o(F(n)) if T(n) is O(F(n)) but T(n) is not Θ(F(n) or for some constants c, N: if T(n) < cF (n) for all n>=N As n increases F(n) grows strictly faster than T(n). It computes the non-tight upper bound of T(n) Describes the worst case analysis Example : Find F(n) such that T(n) = o (F(n)) for T(n)=n 2 . Solution: n 2 <c n 2 F(n)=n 2 because for c=2, n 2 <2n 2 for all n>=1 29