CHAPTER 2: METHODS OF DATA COLLECTION AND PRESENTATION

Objectives
At the end of this chapter, students will be able to:
- Arrange raw data in an array and classify it to construct a frequency table and a cumulative frequency table.
- Organize data using a frequency distribution.
- Present data using suitable graphs or diagrams.
2.1 Methods of Data Collection
Data are the raw material of statistics. They can be obtained by measuring, counting, or observing.
Sources of data
Statistical data may be classified into two categories depending on the source.
1. Primary data: data collected by the investigator or researcher personally for the purpose of a specific inquiry or study.
2. Secondary data: data that have already been collected by others and are reused by the investigator. Sources of secondary data include books, journals, reports, etc.
2.2. Methods of Data Presentation
The presentation of data is broadly classified into two categories:
- Tabular presentation
- Diagrammatic and graphic presentation
The process of arranging data into classes or categories according to their similarities is called classification. It eliminates inconsistency and brings out the points of similarity and dissimilarity among the collected items. Classification is necessary because it is not practical to draw inferences and conclusions directly from a large set of raw data.
2.2.1. Frequency distribution
Frequency: the number of times a certain value or class of values occurs.
Frequency distribution (FD): the organization of raw data in table form using classes and frequencies.
There are three types of FD, each with its own construction procedure:
I. Categorical FD
II. Ungrouped FD
III. Grouped FD
I. Categorical FD
Used for data that can be placed in specific categories, such as data at the nominal or ordinal level.
Example 2.1: Twenty-five patients were given a blood test to determine their blood type. The data are shown below:
A B B AB O A O O B AB B B B O A O O O AB AB A O O B A
Solution: Since the data are categorical, we take the four blood types as classes and construct an FD as follows.
Step 1: Make a table as shown below.
Step 2: Tally the data and place the results under the column Tally.
Step 3: Count the tallies and place the results under the column Frequency.
Step 4: Find the percentage of values in each class using % = f/n × 100%, where f = frequency and n = total number of observations.

Class   Tally         Frequency   Percent
A       /////         5           5/25 × 100 = 20%
B       ///// //      7           28%
AB      ////          4           16%
O       ///// ////    9           9/25 × 100 = 36%
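The four steps above can be sketched in Python as a quick check on the table. This is a minimal illustration using only the standard library; the variable names (blood_types, table) are chosen here and are not part of the example.

```python
from collections import Counter

# Blood types of the 25 patients in Example 2.1
blood_types = "A B B AB O A O O B AB B B B O A O O O AB AB A O O B A".split()

counts = Counter(blood_types)   # Steps 2-3: tally and count each class
n = sum(counts.values())        # total number of observations

# Step 4: percentage = f / n * 100 for each class
table = {cls: (counts[cls], counts[cls] / n * 100) for cls in ["A", "B", "AB", "O"]}

for cls, (f, pct) in table.items():
    print(f"{cls:<3} {f:>2} {pct:5.1f}%")
```

The counter plays the role of the tally column: each occurrence of a blood type adds one to its class.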
II. Ungrouped Frequency Distribution (UFD)
A UFD is often constructed for a small data set or for data on a discrete variable.
Constructing an ungrouped frequency distribution:
- First find the smallest and largest raw scores in the collected data.
- Arrange the data in order of magnitude and count the frequency of each value.
- To facilitate counting, one may include a column of tallies.
Example 2.2: The following data represent the number of days of sick leave taken by each of 50 workers of a company over the last 6 weeks.
i. Construct an ungrouped frequency distribution.
ii. How many workers had at least 1 day of sick leave?
iii. How many workers had more than 2 but fewer than 6 days of sick leave?
2 5 8 3 4 1 7 1 7 1 5 4 4 1 8 9 7 1 7 2 5 5 4 3 3 2 5 1 3 2 4 5 5 7 5 1 1 2
(Only the 38 nonzero values are listed; the remaining 12 workers took no sick leave.)
Solution:
i. Since this data set contains only a relatively small number of distinct values, it is convenient to represent it in a frequency table that lists each distinct value along with its frequency of occurrence.

Class (days)   Frequency   Cumulative frequency
0              12          12
1              8           20
2              5           25
3              4           29
4              5           34
5              8           42
7              5           47
8              2           49
9              1           50

ii. Since 12 of the 50 workers had no days of sick leave, the answer is 50 - 12 = 38.
iii. The answer is the sum of the frequencies for the values 3, 4 and 5, that is, 4 + 5 + 8 = 17.
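The ungrouped table and the two counts can be reproduced with a short Python sketch. One assumption is made and flagged in the code: the data listing shows 38 nonzero values, so 12 zeros are added to match the statement in the solution that 12 of the 50 workers had no sick leave. Question iii is read as strictly between 2 and 6, which is how the solution treats it.

```python
from collections import Counter
from itertools import accumulate

# The 38 nonzero values listed in Example 2.2
nonzero = [2, 5, 8, 3, 4, 1, 7, 1, 7, 1, 5, 4, 4, 1, 8, 9, 7, 1, 7,
           2, 5, 5, 4, 3, 3, 2, 5, 1, 3, 2, 4, 5, 5, 7, 5, 1, 1, 2]

# Assumption: the other 12 workers took 0 days, as stated in the solution
days = [0] * 12 + nonzero

freq = Counter(days)
values = sorted(freq)                               # distinct values in order
cum = list(accumulate(freq[v] for v in values))     # cumulative frequencies

at_least_one = sum(f for v, f in freq.items() if v >= 1)        # question ii
between_2_and_6 = sum(f for v, f in freq.items() if 2 < v < 6)  # question iii

for v, c in zip(values, cum):
    print(v, freq[v], c)
```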
III. Grouped Frequency Distribution (GFD)
When the range of the data is large, the data must be grouped into classes that are more than one unit in width.
Definition of some basic terms
Grouped frequency distribution: an FD in which several values are grouped into one class.
Class limits (CL): the values that separate one class from another. The limits can actually appear in the data, and there is a gap between the upper limit of one class and the lower limit of the next class.
Unit of measure (U): the smallest possible difference between successive values, e.g. 1, 0.1, 0.01, 0.001, etc.
Class boundaries: the values that separate one class in a grouped frequency distribution from the next. A boundary has one more decimal place than the raw data, and there is no gap between the upper boundary of one class and the lower boundary of the succeeding class. The lower class boundary is found by subtracting half the unit of measure from the lower class limit, and the upper class boundary is found by adding half the unit of measure to the upper class limit.
Class width (W): the difference between the upper and lower boundaries of any class. It is also the difference between the lower limits (or the upper limits) of two consecutive classes.
Class mark (midpoint): found by adding the lower and upper class limits (or boundaries) and dividing the sum by two.
Cumulative frequency (CF): the number of observations less than the upper class boundary, or greater than the lower class boundary, of a class.
CF (less than type): the number of values less than the upper class boundary of a given class.
CF (greater than type): the number of values greater than the lower class boundary of a given class.
Relative frequency (Rf): the frequency divided by the total frequency. This gives the proportion of values falling in that class.
Rf_i = f_i / n = f_i / ∑f_i
Relative cumulative frequency (RCf): the running total of the relative frequencies, or equivalently the cumulative frequency divided by the total frequency. It gives the proportion of values that are less than the upper class boundary (or, for the greater than type, greater than the lower class boundary).
RCf_i = Cf_i / n = Cf_i / ∑f_i
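To make the definitions concrete, the sketch below computes each quantity for a small set of classes. The class limits and frequencies are made up purely for illustration (they do not come from this chapter), and the unit of measure is assumed to be 1.

```python
from itertools import accumulate

# Hypothetical class limits and frequencies, for illustration only
limits = [(10, 19), (20, 29), (30, 39)]
freqs = [2, 5, 3]
U = 1  # assumed unit of measure (whole-number data)

n = sum(freqs)

# Boundaries: lower limit - U/2 and upper limit + U/2
boundaries = [(lo - U / 2, hi + U / 2) for lo, hi in limits]

# Width: upper boundary minus lower boundary of any class
width = boundaries[0][1] - boundaries[0][0]

# Class mark: midpoint of the class limits
marks = [(lo + hi) / 2 for lo, hi in limits]

rf = [f / n for f in freqs]             # relative frequency
cf_less = list(accumulate(freqs))       # less than cumulative frequency
rcf = [c / n for c in cf_less]          # relative cumulative frequency

print(boundaries, width, marks, rf, cf_less, rcf)
```

Note that the width (10) also equals the difference between consecutive lower limits (20 - 10), and that the upper boundary of each class coincides with the lower boundary of the next.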
STEPS IN CONSTRUCTING A GFD
1. Find the highest (H) and the smallest (L) value.
2. Compute the range: R = H - L.
3. Select the number of classes desired (K), either
   I. by choosing arbitrarily between 5 and 15, or
   II. by using Sturges' formula K = 1 + 3.322 log n (logarithm to base 10), where n = total frequency.
4. Find the class width (W) by dividing the range by the number of classes and rounding the result to the nearest integer: W = R/K.
5. Identify the unit of measure (U), usually 1, 0.1, 0.01, ...
6. Pick a suitable starting point less than or equal to the minimum value. The starting point is the lower limit of the first class. Then keep adding the class width to get the remaining lower class limits.
7. Find the upper class limits. The upper limit of the first class is UCL_1 = LCL_2 - U; then keep adding the class width to get the remaining upper class limits.
8. Find the class boundaries: LCB_i = LCL_i - ½U and UCB_i = UCL_i + ½U.
9. Find the class marks: CM_i = (UCL_i + LCL_i)/2 or CM_i = (UCB_i + LCB_i)/2.
10. Tally the data.
11. Find the frequencies.
12. Find the cumulative frequencies, if they are needed for what you are trying to accomplish.
13. If necessary, find Rf and RCf.
Example 2.3: The blood glucose levels, in milligrams per deciliter, of 60 patients are shown below. Construct a grouped frequency distribution for the data set.
55 70 85 90 93 86 103 74 92 63
101 83 82 100 97 97 109 84 84 75
92 68 114 84 101 81 91 82 115 86
69 59 56 84 77 90 77 97 80 101
61 74 87 80 58 81 78 88 86 59
82 83 59 78 116 72 62 105 65 78
Solution:
1) Highest value = 116, lowest value = 55
2) Range = 116 - 55 = 61
3) K = 1 + 3.322 log 60 = 1 + 3.322(1.78) = 6.9 ≈ 7
4) W = R/K = 61/7 = 8.7 ≈ 9
5) U = 1
6) LCL_1 = 55
7) Find the upper class limits.
8) Find the class boundaries.
9) Find the class marks.
Class limit   Frequency   Class boundary   Class mark   CF(<)   CF(>)   Rf     % Rf
55 - 63       9           54.5 - 63.5      59           9       60      0.15   15
64 - 72       5           63.5 - 72.5      68           14      51      0.08   8
73 - 81       12          72.5 - 81.5      77           26      46      0.20   20
82 - 90       17          81.5 - 90.5      86           43      34      0.28   28
91 - 99       7           90.5 - 99.5      95           50      17      0.12   12
100 - 108     6           99.5 - 108.5     104          56      10      0.10   10
109 - 117     4           108.5 - 117.5    113          60      4       0.07   7
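The construction steps can be sketched in Python as a check on the table for Example 2.3. This is a minimal sketch under stated assumptions, not the only way to do it: it uses Sturges' formula with a base-10 logarithm, rounds K and W to the nearest integer as in the worked solution, and starts the first class at the minimum value. Variable names are illustrative.

```python
import math
from itertools import accumulate

# Blood glucose levels (mg/dL) of the 60 patients in Example 2.3
glucose = [
    55, 70, 85, 90, 93, 86, 103, 74, 92, 63,
    101, 83, 82, 100, 97, 97, 109, 84, 84, 75,
    92, 68, 114, 84, 101, 81, 91, 82, 115, 86,
    69, 59, 56, 84, 77, 90, 77, 97, 80, 101,
    61, 74, 87, 80, 58, 81, 78, 88, 86, 59,
    82, 83, 59, 78, 116, 72, 62, 105, 65, 78,
]

n = len(glucose)
H, L = max(glucose), min(glucose)          # step 1
R = H - L                                  # step 2: range
K = round(1 + 3.322 * math.log10(n))       # step 3: Sturges' formula
W = round(R / K)                           # step 4: class width
U = 1                                      # step 5: unit of measure

lower = [L + i * W for i in range(K)]      # step 6: lower class limits
upper = [lo + W - U for lo in lower]       # step 7: upper class limits
bounds = [(lo - U / 2, hi + U / 2) for lo, hi in zip(lower, upper)]  # step 8
marks = [(lo + hi) / 2 for lo, hi in zip(lower, upper)]              # step 9

# Steps 10-11: count the values falling within each pair of class limits
freq = [sum(lo <= x <= hi for x in glucose) for lo, hi in zip(lower, upper)]

# Steps 12-13: cumulative and relative frequencies
cf_less = list(accumulate(freq))
cf_more = list(accumulate(reversed(freq)))[::-1]
rf = [round(f / n, 2) for f in freq]

for row in zip(lower, upper, freq, bounds, marks, cf_less, cf_more, rf):
    print(row)
```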
2.2.2. Diagrammatic presentation of data: Bar charts, Pie chart, Cartograms
The most convenient and popular way of describing data is graphical presentation. Data are usually easier to understand and interpret when presented graphically than when described in words or listed in a frequency table, because a graph presents them in a simple and clear way.
The three most commonly used diagrammatic presentations for discrete as well as qualitative data are:
- Pie charts
- Bar charts
- Pictograms
Pie chart
A pie chart is a circle that is divided into sections or wedges according to the percentage of frequencies in each category of the distribution. The angle of each sector is obtained using:
Angle of sector = (frequency of the category / total frequency) × 360°
Example 2.4: Using the immunization status of children in a certain area given in Example 2.5, draw the pie chart.
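As a sketch of the computation behind Example 2.4, the sector angles can be worked out in Python. It assumes the usual rule that each sector's angle is its share of the total frequency times 360°, and it uses the three frequencies given in Example 2.5; the dictionary name is illustrative.

```python
# Frequencies from Example 2.5 (immunization status of children)
status = {
    "Non-immunized": 75,
    "Partially immunized": 57,
    "Fully immunized": 78,
}

n = sum(status.values())  # total frequency

# Assumed rule: angle = (frequency / total frequency) * 360 degrees
angles = {name: f / n * 360 for name, f in status.items()}

for name, angle in angles.items():
    print(f"{name:<20} {angle:6.1f} degrees")
```

The angles add up to 360°, which is a useful check before drawing the chart.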
Bar Charts
Bar charts are used to represent and compare the frequency distributions of discrete variables and of attributes or categorical series. Bars can be drawn either vertically or horizontally. When presenting data using a bar diagram:
- All bars must have equal width, and the distance between bars must be equal.
- The height or length of each bar indicates the size (frequency) of the figure represented.
There are different types of bar charts. The most common are:
- Simple bar chart
- Component (subdivided) bar chart
- Multiple bar chart
I. Simple bar chart
Simple bar charts are used to display data on one variable. They are thick lines (narrow rectangles) of equal breadth. The magnitude of a quantity is represented by the height or length of the bar.
Example 2.5: Consider the immunization status of children in a certain area:

Immunization status (class)   Number (frequency)   Relative frequency (%)
Non-immunized                 75                   35.7%
Partially immunized           57                   27.2%
Fully immunized               78                   37.1%
Total                         210                  100%

Draw a simple bar chart of the immunization status of children.
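A rough text rendering of a horizontal simple bar chart for these data can be produced with the standard library alone. This is only a sketch: the scale of one `#` per 3 children is an arbitrary choice made here, and a plotting library would normally be used for a real chart.

```python
# Frequencies from Example 2.5
status = {
    "Non-immunized": 75,
    "Partially immunized": 57,
    "Fully immunized": 78,
}

scale = 3  # arbitrary choice: one '#' represents 3 children

# Every bar has the same thickness (one text row); only its length varies
bars = {name: "#" * round(f / scale) for name, f in status.items()}

for name, bar in bars.items():
    print(f"{name:<20} {bar} {status[name]}")
```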
II. Component Bar chart
When there is a desire to show how a total (or aggregate) is divided into its component parts, we use a component bar chart. Each bar represents the total value of a variable, broken into its component parts, and different colors or designs are used to identify the components.
Example 2.6: Consider the data on immunization status of women by marital status:
                   Immunization status
Marital status     Immunized        Not immunized     Total
                   No.     %        No.     %
Single             58      24.7     177     75.3      235
Married            156     34.7     294     65.3      450
Divorced           10      35.7     18      64.3      28
Widowed            7       50.0     7       50.0      14
Total              231     31.8     496     68.2      727

Draw a component (subdivided) bar chart of the immunization status of women by marital status.
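Before drawing a component bar chart, each bar has to be split into its parts. The sketch below computes the two percentage segments for every marital status from the counts in Example 2.6, plus the overall split; the names used are illustrative.

```python
# (immunized, not immunized) counts by marital status, from Example 2.6
counts = {
    "Single": (58, 177),
    "Married": (156, 294),
    "Divorced": (10, 18),
    "Widowed": (7, 7),
}

# Each bar is one marital status; its two segments are percentages of the row total
segments = {}
for status, (imm, not_imm) in counts.items():
    total = imm + not_imm
    segments[status] = (round(imm / total * 100, 1), round(not_imm / total * 100, 1))

# Overall split across all women
total_imm = sum(imm for imm, _ in counts.values())
total_not = sum(not_imm for _, not_imm in counts.values())
grand_total = total_imm + total_not
overall = (round(total_imm / grand_total * 100, 1),
           round(total_not / grand_total * 100, 1))

for status, (p_imm, p_not) in segments.items():
    print(f"{status:<9} immunized {p_imm:5.1f}%  not immunized {p_not:5.1f}%")
print("Overall", overall)
```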
III. Multiple Bar charts
These are used to display data on more than one variable. They are used for comparing different variables at the same time.
Example 2.7: Draw a multiple bar chart to represent the immunization status of women by marital status given in Example 2.6.
Solution: [Figure: multiple bar chart of immunization status by marital status]
2.2.4 Graphical Presentation of data
The histogram, the frequency polygon, and the cumulative frequency graph (ogive) are the most commonly applied graphical representations for continuous data.
Procedures for constructing statistical graphs:
1. Draw and label the x and y axes.
2. Choose a suitable scale for the frequencies or cumulative frequencies and label it on the y-axis.
3. Represent the class boundaries (for the histogram or ogive) or the midpoints (for the frequency polygon) on the x-axis.
4. Plot the points.
5. Draw the bars or lines to connect the points.
Histogram
A histogram is a graph that displays the data using vertical bars of various heights to represent frequencies. Class boundaries are placed along the horizontal axis; class marks or class limits are sometimes used on the x-axis instead.
Example 2.8: Construct a histogram to represent the blood glucose levels of the 60 patients given in Example 2.3.
Solution: [Figure: histogram of blood glucose level for the 60 patients]
Frequency polygon
If we join the midpoints of the tops of the adjacent rectangles of the histogram with line segments, a frequency polygon is obtained. When the polygon is extended to the x-axis at the class marks just outside the range of the data, the total area under the polygon equals the total area under the histogram.
Example 2.9: Construct a frequency polygon to represent the following data.
Class limit   Frequency   Class mark   Class boundaries   R.F.   % R.F.   Less than C.F.   More than C.F.
15 - 24       3           19.5         14.5 - 24.5        0.06   6%       3                50
25 - 34       4           29.5         24.5 - 34.5        0.08   8%       7                47
35 - 44       10          39.5         34.5 - 44.5        0.20   20%      17               43
45 - 54       15          49.5         44.5 - 54.5        0.30   30%      32               33
55 - 64       12          59.5         54.5 - 64.5        0.24   24%      44               18
65 - 74       4           69.5         64.5 - 74.5        0.08   8%       48               6
75 - 84       2           79.5         74.5 - 84.5        0.04   4%       50               2
Solution: Adding two class marks with f_i = 0, namely 9.5 at the beginning and 89.5 at the end, the following frequency polygon is plotted.
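The points to be joined in the frequency polygon of Example 2.9 can be listed with a short Python sketch. It takes the class limits and frequencies from the table, computes the class marks, and adds one zero-frequency class mark at each end, one class width beyond the first and last marks. Variable names are illustrative.

```python
# Class limits and frequencies from Example 2.9
limits = [(15, 24), (25, 34), (35, 44), (45, 54), (55, 64), (65, 74), (75, 84)]
freqs = [3, 4, 10, 15, 12, 4, 2]

marks = [(lo + hi) / 2 for lo, hi in limits]   # class marks: 19.5, 29.5, ...
W = limits[1][0] - limits[0][0]                # class width: 25 - 15 = 10

# Close the polygon on the x-axis with a zero-frequency class at each end
xs = [marks[0] - W] + marks + [marks[-1] + W]
ys = [0] + freqs + [0]

points = list(zip(xs, ys))
print(points)
```

Joining these points in order with straight line segments gives the polygon.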
Ogive (cumulative frequency polygon)
An ogive (pronounced "oh-jive") is a line graph that depicts cumulative frequencies, just as the cumulative frequency distribution lists them. The ogive uses class boundaries along the horizontal scale; the graph begins at the lower boundary of the first class and ends at the upper boundary of the last class. An ogive is useful for determining the number of values below some particular value.
There are two types of ogive: the less than ogive and the more than ogive. The difference is that the less than ogive plots the less than cumulative frequency on the y-axis, while the more than ogive plots the more than cumulative frequency.
Example 2.10: i) Draw a less than ogive for the blood glucose level data of the 60 patients given in Example 2.3.
Note: For both ogives, one class with frequency zero is added, for the same reason as with the frequency polygon.
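The points for both ogives of the blood glucose data can be sketched in Python. The class boundaries and frequencies are those of Example 2.3. One reading of the note above is assumed: the less than ogive starts at cumulative frequency 0 at the lowest boundary, and the more than ogive ends at 0 at the highest boundary.

```python
from itertools import accumulate

# Class boundaries and frequencies from the table in Example 2.3
boundaries = [54.5, 63.5, 72.5, 81.5, 90.5, 99.5, 108.5, 117.5]
freq = [9, 5, 12, 17, 7, 6, 4]
n = sum(freq)

# Number of values less than each boundary (0 at the first boundary)
less_than = [0] + list(accumulate(freq))

# Number of values greater than each boundary (0 at the last boundary)
more_than = [n - c for c in less_than]

less_ogive = list(zip(boundaries, less_than))
more_ogive = list(zip(boundaries, more_than))

print(less_ogive)
print(more_ogive)
```

Reading the less than ogive at a boundary gives the number of patients below that glucose level; for example, 43 patients are below 90.5 mg/dL.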