RAW DATA
Unclassified data are called raw data.

CLASSIFICATION OF DATA
The process of grouping data according to their characteristics is known as classification of data.

Objectives of Classification:
a] To simplify complex data
b] To eliminate unnecessary details
c] To help comparison
d] To make analysis and interpretation easy
e] To arrange the data according to their common characteristics
• Chronological organisation: Organisation based on time is called chronological organisation. Chronologically organised data are called a time series. Example: data arranged by year, month, week, day, etc.
• Spatial or geographical organisation: The data are classified with reference to geographical locations such as countries, states, cities, districts, etc.
• Quantitative organisation: Data organised with reference to quantities form a quantitative classification. Examples: height, weight, age, income, marks of students, etc.
• Qualitative organisation: Data organised with reference to qualities form a qualitative classification. Examples: nationality, literacy, religion, gender, marital status, etc.
VARIABLES
A variable is a characteristic whose value changes from one observation to another. Variables are broadly classified into two types:
• Continuous variable
• Discrete variable

A continuous variable can take any numerical value. It may take integral values (1, 2, 3, 4, ...), fractional values (1/2, 2/3, 3/4, ...), and values that are not exact fractions (such as √2). Examples: height, weight, marks, etc.

A discrete variable can take only certain values. Its value changes only by finite "jumps". Examples: number of children in a family, number of books in a library, etc.
FREQUENCY DISTRIBUTION
A frequency distribution is a comprehensive way to classify raw data of a quantitative variable. It shows how the different values of a variable are distributed among classes, along with their corresponding class frequencies.

CLASS    FREQUENCY
0-10     1
10-20    3
20-30    5
30-40    1
40-50    2
CLASS LIMITS
Each class in a frequency distribution table is bounded by class limits. Class limits are the two ends of a class. The lowest value is called the Lower Class Limit and the highest value the Upper Class Limit.
Example: in the class 10-20, 10 is the lower limit and 20 is the upper limit.

MID-POINT
The Class Mid-Point or Class Mark is the middle value of a class. It lies halfway between the lower class limit and the upper class limit of a class.
Class Mid-Point or Class Mark = (Upper Class Limit + Lower Class Limit) / 2
CLASS INTERVAL
The class interval is the difference between the upper limit and the lower limit of a class.
Class interval = Upper limit - Lower limit
Example: for the class 10-20, class interval = 20 - 10 = 10

FREQUENCY
The frequency of a class is the number of items (observations) falling within that class.
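The class mark and class interval formulas above can be checked with a small Python sketch. The function names `class_mark` and `class_interval` are illustrative choices made here, not part of the original material.

```python
def class_mark(lower, upper):
    """Mid-point of a class: (upper limit + lower limit) / 2."""
    return (upper + lower) / 2


def class_interval(lower, upper):
    """Width of a class: upper limit - lower limit."""
    return upper - lower


# The class 10-20 used in the examples above
print(class_mark(10, 20))      # 15.0
print(class_interval(10, 20))  # 10
```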
While preparing a frequency distribution, the following four questions need to be addressed:
• How many classes should we have?
• What should be the size of each class?
• How should we determine the class limits?
• How should we get the frequency for each class?

• How many classes should we have?
The number of classes can be calculated by dividing the range (the difference between the largest and the smallest values of the variable) by the size of the class intervals.
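As a rough sketch of this calculation, the snippet below divides the range by the class size. Rounding the quotient up to a whole number is an assumption made here, since a fractional number of classes is not possible. The function name `number_of_classes` is hypothetical, and the data are the 20 observations from the worked example later in this material.

```python
import math


def number_of_classes(values, class_size):
    """Range divided by class size, rounded up (rounding is an assumption)."""
    data_range = max(values) - min(values)
    return math.ceil(data_range / class_size)


data = [2, 4, 17, 9, 10, 20, 32, 50, 23, 41,
        6, 9, 48, 11, 58, 12, 8, 3, 36, 40]

print(max(data) - min(data))        # range: 58 - 2 = 56
print(number_of_classes(data, 10))  # 56 / 10 = 5.6, rounded up to 6
```

This is only a starting estimate: depending on where the first class begins, one more class may be needed to cover the largest value.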
• What should be the size of each class?
Given the range of the variable, we can determine the number of classes once we decide the class interval. Thus, these two decisions are interlinked: we cannot decide on one without deciding on the other.
• How should we determine the class limits?
The lower and upper class limits should be chosen so that the observations in each class tend to concentrate around the middle of the class interval.
Class intervals are of two types:
(i) Inclusive class intervals: values equal to the lower and upper limits of a class are included in the frequency of that same class. Here the upper limit of a class and the lower limit of the next class are not the same.
(ii) Exclusive class intervals: an item equal to the upper class limit is excluded from the frequency of that class and counted in the next class. Here the upper limit of a class and the lower limit of the next class are the same.
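The difference between the two types comes down to how the upper limit is treated. A minimal sketch, using hypothetical helper names `in_exclusive_class` and `in_inclusive_class`:

```python
def in_exclusive_class(value, lower, upper):
    """Exclusive method: the upper limit itself is NOT counted in the class."""
    return lower <= value < upper


def in_inclusive_class(value, lower, upper):
    """Inclusive method: both limits are counted in the class."""
    return lower <= value <= upper


# Exclusive classes 10-20 and 20-30: the value 20 goes to 20-30
print(in_exclusive_class(20, 10, 20))  # False
print(in_exclusive_class(20, 20, 30))  # True

# Inclusive classes 10-19 and 20-29: 19 stays in 10-19
print(in_inclusive_class(19, 10, 19))  # True
print(in_inclusive_class(19, 20, 29))  # False
```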
[Example tables: EXCLUSIVE and INCLUSIVE class intervals]
• How should we get the frequency for each class?
The frequency of an observation is the number of times that observation occurs in the raw data. Class frequencies are counted by putting a tally mark (/) against the class for each observation that falls in it. Counting is easier when the tallies are grouped in fives: four strokes are written as //// and the fifth stroke is drawn across them.
Prepare a frequency distribution from the following data by using the exclusive method:
2, 4, 17, 9, 10, 20, 32, 50, 23, 41, 6, 9, 48, 11, 58, 12, 8, 3, 36, 40

Range = L - S = 58 - 2 = 56
Number of classes = 56 / 10 = 5.6, so 6 classes of size 10 are needed.

CLASS    TALLY      FREQUENCY
0-10     //// //    7
10-20
20-30
30-40
40-50
50-60
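One way to check the tally for this exercise is to count the observations programmatically. The sketch below assumes classes of size 10 starting at 0, as in the table. The function name `exclusive_frequency_table` is a hypothetical choice.

```python
data = [2, 4, 17, 9, 10, 20, 32, 50, 23, 41,
        6, 9, 48, 11, 58, 12, 8, 3, 36, 40]


def exclusive_frequency_table(values, start, class_size, n_classes):
    """Return (lower, upper, frequency) rows using the exclusive method."""
    table = []
    for i in range(n_classes):
        lower = start + i * class_size
        upper = lower + class_size
        # Exclusive method: count lower <= value < upper
        freq = sum(1 for v in values if lower <= v < upper)
        table.append((lower, upper, freq))
    return table


table = exclusive_frequency_table(data, start=0, class_size=10, n_classes=6)
for lower, upper, freq in table:
    print(f"{lower}-{upper}: {freq}")
# 0-10: 7
# 10-20: 4
# 20-30: 2
# 30-40: 2
# 40-50: 3
# 50-60: 2
```

The frequencies add up to 20, the number of observations, which is a quick check that no value was missed or counted twice.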
STEPS REQUIRED FOR THE CONVERSION OF INCLUSIVE INTO EXCLUSIVE:
1. Find the difference between the lower limit of the second class and the upper limit of the first class.
2. Divide the difference obtained in step 1 by two.
3. Subtract the value obtained in step 2 from the lower limits of all classes.
4. Add the value obtained in step 2 to the upper limits of all classes.
Inclusive classes:

CLASS    F
0-4      1
5-9      2
10-14    3
15-19    4
20-24    5

Exclusive classes (adjustment = (5 - 4) / 2 = 0.5):

CLASS          F
-0.5 - 4.5     1
4.5 - 9.5      2
9.5 - 14.5     3
14.5 - 19.5    4
19.5 - 24.5    5
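The four conversion steps can be sketched in a few lines of Python. The function name `inclusive_to_exclusive` is hypothetical. Because step 3 applies to all classes, following the steps literally gives the first class a lower limit of -0.5.

```python
def inclusive_to_exclusive(classes):
    """Convert inclusive (lower, upper) class limits to exclusive ones."""
    # Step 1: gap between the second lower limit and the first upper limit
    gap = classes[1][0] - classes[0][1]
    # Step 2: halve the gap
    adjustment = gap / 2
    # Steps 3 and 4: subtract from every lower limit, add to every upper limit
    return [(lower - adjustment, upper + adjustment) for lower, upper in classes]


inclusive = [(0, 4), (5, 9), (10, 14), (15, 19), (20, 24)]
exclusive = inclusive_to_exclusive(inclusive)
print(exclusive)
# [(-0.5, 4.5), (4.5, 9.5), (9.5, 14.5), (14.5, 19.5), (19.5, 24.5)]
```

After conversion, the upper limit of each class equals the lower limit of the next, and every class has the same width of 5.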
Loss of information: Once data are grouped, statistical calculations are based only on the class marks and not on the actual values of the observations in each class. The individual values within a class are no longer considered. This is called loss of information.
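A small illustration of loss of information, using the 20 observations from the earlier exercise and assuming exclusive classes of size 10 starting at 0: the mean computed from the raw values differs from the mean computed from class marks and frequencies. The variable names are illustrative.

```python
data = [2, 4, 17, 9, 10, 20, 32, 50, 23, 41,
        6, 9, 48, 11, 58, 12, 8, 3, 36, 40]
class_size = 10

# Group the data: key = lower limit of the class, value = class frequency
freq = {}
for v in data:
    lower = (v // class_size) * class_size
    freq[lower] = freq.get(lower, 0) + 1

# Mean from the actual observations
raw_mean = sum(data) / len(data)

# Mean from class marks only: every value is replaced by its class mid-point
grouped_mean = sum((lower + class_size / 2) * f
                   for lower, f in freq.items()) / len(data)

print(raw_mean)      # about 21.95
print(grouped_mean)  # 23.0
```

The gap between the two means arises because every observation in a class is treated as if it were equal to the class mark.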
Frequency distribution with unequal classes

Equal classes:

CLASS    F
0-10     5
10-20    7
20-30    25
30-40    30
40-50    3

Unequal classes:

CLASS    F
0-10     5
10-20    7
20-25    10
25-30    15
30-35    17
35-40    13
40-50    3
FREQUENCY ARRAY
For a discrete variable, the classification of its data is known as a Frequency Array. Since a discrete variable takes only integral values and not intermediate fractional values between two integral values, we have frequencies that correspond to each of its integral values.

SIZE OF HOUSEHOLD    F
1                    5
2                    7
3                    25
4                    30
5                    3
Prepare a frequency array from the following data:
23, 27, 23, 29, 27, 23, 22, 21, 24, 25, 25, 26, 22, 30, 28, 24, 29, 30

ITEMS    TALLY    F
21       /        1
22       //       2
23
24
25
26
27
28
29
30
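The remaining rows of this exercise can be checked with a short sketch that counts how often each value occurs. It uses `collections.Counter` from the Python standard library. The tally column is printed as plain strokes, which is adequate here because no value occurs five or more times.

```python
from collections import Counter

data = [23, 27, 23, 29, 27, 23, 22, 21, 24,
        25, 25, 26, 22, 30, 28, 24, 29, 30]

# Count the occurrences of each distinct value
frequency_array = Counter(data)

for item in sorted(frequency_array):
    count = frequency_array[item]
    print(item, "/" * count, count)
# 21 / 1
# 22 // 2
# 23 /// 3
# 24 // 2
# 25 // 2
# 26 / 1
# 27 // 2
# 28 / 1
# 29 // 2
# 30 // 2
```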
UNIVARIATE AND BIVARIATE FREQUENCY DISTRIBUTION:
The frequency distribution of a single variable is called a univariate frequency distribution.
A bivariate frequency distribution is the frequency distribution of two variables.