Statistical Techniques for Interpreting and Reporting Quantitative Data - I

1,195 views 120 slides Feb 05, 2021

About This Presentation

Meaning of Statistics, Classification & Tabulation of data, Measurement of Scales


Slide Content

Statistical Techniques for Interpreting and Reporting Quantitative Data - I
M. Vijayalakshmi, M.Sc., M.Phil. (Life Sciences), M.Ed., M.Phil. (Education), NET (Education), PGDBI
Assistant Professor (Former), Sri Ramakrishna Mission Vidyalaya College of Education (Autonomous), Coimbatore - 641020.

Meaning of 'Statistics'
The word 'Statistics' appears to have been derived from the Latin word 'Status', meaning 'a political state'. Some believe that the word has its root in the German word 'Statistik'. Statistics was originally simply the collection of numerical data.

Definition
Statistics is defined as the scientific study of handling quantitative information. It embodies the methodology of collection, classification, description and interpretation of data obtained through surveys and experiments. Its essential purpose is to describe and draw inferences about the numerical properties of populations.

Croxton and Cowden: "The collection, presentation, analysis and interpretation of numerical data."

Examples of Statistics are:
- The number of teachers recruited every year;
- The number of colleges functioning in Tamil Nadu;
- The number of science graduates produced in a year;
- The number of candidates selected for IAS in a year, etc.

Statistics deals with facts and figures. It is the scientific method of collecting the appropriate data, classifying and tabulating the collected data, analyzing them with appropriate statistical techniques and finally drawing truthful inferences and conclusions.

Importance of the study of Statistics
Statistics helps a teacher in:
- Knowing the performance of students in different subjects
- Comparing their achievements with those of students of other institutions
- Identifying the students who need help in order to secure more marks
- Selecting students for admission to higher courses or for jobs, based on their performance in entrance/competitive examinations
- Developing norms for achievement and psychological tests
- Constructing and standardizing scholastic ability tests, etc.

Steps involved in the Statistical Method
I. Collection of Data
II. Classification & Tabulation
III. Statistical Analysis of Data
IV. Drawing of Inferences

I. Collection of Data:
i) Identify the variables and their nature.
ii) Select the appropriate scales of measurement.
iii) Obtain accurate quantitative measurements.

II. Classification & Tabulation: Transforming the raw data into a suitable frequency distribution.
III. Statistical Analysis of Data:
- Descriptive Analysis
- Inferential Analysis

IV. Drawing of Inferences: Avoiding Type I and Type II errors; levels of significance of inferences.

Descriptive Analysis
To describe the properties of the given group taken for study, we calculate certain measures such as averages (mean, median, mode), measures of dispersion, or measures of association. These are called 'descriptive statistics'.

The three major aspects of descriptive statistics are:
- Measures of central tendency
- Measures of dispersion / deviation
- Measures of association / relationship

Inferential Statistics
Analysis in inferential statistics is based on the 'sampling technique'. To study a large population, we normally choose a small random sample from it and obtain the descriptive statistics (measures pertaining to the sample), from which we try to infer the measures pertaining to the large population.

Example
To estimate the mean performance of +2 students in Maths in Tamil Nadu, we may choose a small sample of 500 students from among those who are about to appear for the public examination, conduct a Maths test for the 500 students selected, and calculate the mean of the scores obtained in the test. This is a sample 'statistic'. From this, we infer the mean of the population of +2 students in Tamil Nadu, which is called a 'parameter'. Thus inferential statistics is the technique of estimating the population parameter from the known sample statistic.
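
The relationship between a sample statistic and a population parameter can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of 10,000 scores is randomly generated (hypothetical, not real examination data), and the helper name `sample_mean` is made up for this sketch. In practice the population mean is unknown, which is why it has to be estimated.

```python
import random
import statistics

def sample_mean(population, sample_size, seed=0):
    """Draw a simple random sample and return its mean (the sample statistic)."""
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)
    return statistics.mean(sample)

# Hypothetical population: 10,000 made-up test scores between 0 and 100.
rng = random.Random(42)
population = [rng.randint(0, 100) for _ in range(10_000)]

parameter = statistics.mean(population)   # population mean (normally unknown)
statistic = sample_mean(population, 500)  # mean of a random sample of 500

print(f"population mean (parameter): {parameter:.2f}")
print(f"sample mean (statistic):     {statistic:.2f}")
```

With a random sample of 500, the two values are usually close, which is what makes the sample mean a usable estimate of the parameter.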

Types of Variables
Any quantity or trait whose value changes is called a 'variable'.
E.g.: i) Height of pupils ii) Weight of students iii) Achievement scores, etc.
A 'constant' is one which has a fixed value at all times, in all places.
Variables are of two types: i) Continuous ii) Discrete

Continuous Variable: A variable which can take all possible values from -∞ to +∞ is called an 'Infinite Continuous Variable'. Variables which can take all possible values between any two specified limits are called 'Finite Continuous Variables'.
E.g.: Expenditure - Infinite Continuous Variable; Achievement scores - Finite Continuous Variable.

Discrete Variable: Variables which can take only certain specified or allowed values (and not any other values) are called Discrete Variables.

Scales of Measurement
There are four types of scales of measurement:
i) Nominal Scale
ii) Ordinal Scale
iii) Interval Scale
iv) Ratio Scale
Depending upon the nature of the variable, a suitable scale of measurement should be employed.

Nominal Scale
Meant for variables which can merely be labelled or categorized, such as:
- Males & Females
- Married & Single
- Rural & Urban People
- Hostellers & Day Scholars
- High & Low Socio-Economic Status, etc.

Ordinal Scale
Meant for variables which cannot be accurately measured but can be rated and ordered. The ordinal scale conveys more information than the nominal scale.
E.g.: Variables like beauty, singing ability, selling ability, oratorical skill, etc. can only be rated and ranked.

Interval Scale
Meant for variables which can be measured accurately.
E.g.: Achievement scores, temperature measured on the Fahrenheit scale, etc.
In an interval scale the zero point is arbitrary: the absolute zero (complete absence of the trait) is not known.

Ratio Scale
Meant for variables which can be measured accurately as in the interval scale; in addition, the absolute zero value (absence of the trait) is meaningfully known.
E.g.: C.G.S. scales in Physics, height, weight, the Kelvin (absolute) scale of thermometry, etc.
Here two values of a variable can be expressed as a ratio.

Primary and Secondary Data
Primary Data:
i) When data is collected for the first time, directly from the sources, it is called primary data.
ii) It is original in character.
iii) Primary data is like raw material: it must be classified, tabulated and interpreted.

Methods of collecting Primary Data:
i) Direct personal contact
ii) Post and correspondence
iii) Schedules through enumerators
iv) Combination of the above methods

Secondary Data: Secondary data is also called second-hand data. Data which was already collected for some purpose is now made use of for a totally different purpose. It is like a finished product.

Sources of Secondary Data:
i) Official publications like U.N.O. reports, I.M.F. (International Monetary Fund) reports, Central and State Government publications, etc.
ii) Semi-official publications like reports of city corporations, L.I.C., Reserve Bank, etc.
iii) Private publications.
iv) Journals, newspapers, published research articles, etc.
v) Unpublished data like registers of companies and schools, Govt. audit reports, unpublished research theses, etc.

Raw and Grouped Data
A group of obtained individual scores is known as 'Raw Data'. If the number of such scores is small, then we can handle them as such to calculate sample statistics like the mean, standard deviation, correlation coefficient, etc.

However, if the group of scores is large (usually, a group containing 30 or more scores is referred to as a large group), it is very difficult to handle the scores as such to compute the required sample statistics. In such cases we organize the scores into a number of classes and find how many items fall in each of these classes.

This table in which raw scores are arranged in the form of classes and class frequencies is called 'Frequency Distribution'. Data that is present in the form of a frequency distribution is known as 'Grouped Data'.

Classification of data
Classification is the grouping of related facts into different classes. Facts in one class differ from those of another class with respect to some characteristic, called a basis of classification. Sorting facts on one basis of classification and then on another basis is called cross-classification.

Types of classification
- Geographical: area-wise (e.g. cities, districts)
- Chronological: on the basis of time
- Qualitative: according to some attributes
- Quantitative: in terms of magnitudes

Chronological classification

Qualitative classification

Quantitative classification

Forming a Frequency Distribution
A frequency distribution is a table in which raw scores are arranged in the form of classes and class frequencies. In a frequency distribution table, there will be a number of classes of equal size. The number of score values which fall in a particular class interval is known as the frequency of that class.

Example
Step I: Find the maximum and the minimum values. The difference between the two is called the Range. Here the Range is 96 - 4 = 92. Usually we round the maximum and minimum values outward so that the range becomes a multiple of 5. So, taking the maximum value as 100 and the minimum value as 0, we have the Range 100 - 0 = 100.

Step II: Determine the width 'i' of the class interval. Usually it is desirable to have i = 5, 10, 15, 20, 25, 50, 100 or a multiple of 100.

Step III: Determine the number of class intervals (n), using the relation
$$n = \frac{\text{Range}}{i}$$
Usually, it is desirable to have 'n' ranging between 5 and 15. Of course, it is not a hard and fast rule. When Steps II and III are considered together, in our example we can have i = 10; hence n = 100/10 = 10.

Classification according to class intervals
- Class limits: the lower and upper limit of a class
- Class interval: the difference between the upper and lower limit
- Class frequency: the number of observations corresponding to the particular class
- Class mid-point: (upper limit of the class + lower limit of the class) / 2
Two methods of forming class intervals:
- Exclusive method
- Inclusive method

Step IV: Write the class intervals (C.I.) either in the Exclusive type (where the upper limit of a class becomes the lower limit of the succeeding class) or in the Inclusive type (where both the upper and lower limits of the class are included in the same class interval; naturally, the upper limit of a class is one score less than the lower limit of the succeeding class).

Exclusive class interval type    Inclusive class interval type
0-10                             0-9
10-20                            10-19
20-30                            20-29
30-40                            30-39
40-50                            40-49
50-60                            50-59
60-70                            60-69
70-80                            70-79
80-90                            80-89
90-100                           90-99
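
The difference between the two types shows up at a boundary score such as 10. The sketch below is a minimal illustration, assuming whole-number scores, classes that start at 0, and the usual convention that in the exclusive type the upper limit itself belongs to the next class. The function names are hypothetical, introduced only for this example.

```python
def exclusive_class(score, width=10):
    """Class limits in the exclusive type, e.g. 10 falls in 10-20 (20 itself excluded)."""
    lower = (score // width) * width
    return lower, lower + width

def inclusive_class(score, width=10):
    """Class limits in the inclusive type, e.g. 10 falls in 10-19 (both limits included)."""
    lower = (score // width) * width
    return lower, lower + width - 1

def midpoint(lower, upper):
    """Class mid-point: (upper limit + lower limit) / 2."""
    return (lower + upper) / 2

print(exclusive_class(10), midpoint(*exclusive_class(10)))
print(inclusive_class(10), midpoint(*inclusive_class(10)))
```

Note that the same score gets a different mid-point depending on which type of class interval is used.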

Step V: Check the individual values, and mark each one as a 'tally' against the C.I. in which it falls. To make counting easy, every fifth tally mark against any class interval is drawn as a horizontal line.
Step VI: Count the tally marks against each class interval and write down the number, which is the frequency of that class.
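
Steps I to VI can be sketched in Python as follows. This is a rough illustration, not the presenter's procedure: the twelve scores are made up (chosen only so that the minimum is 4 and the maximum is 96, as in the example), the classes are of the inclusive type, and the function name `frequency_distribution` is hypothetical.

```python
def frequency_distribution(scores, low, high, width):
    """Count scores per class for inclusive classes such as 0-9, 10-19, ..."""
    n_classes = (high - low) // width          # Step III: n = Range / i
    table = []
    for k in range(n_classes):
        lower = low + k * width                # Step IV: inclusive class limits
        upper = lower + width - 1
        # Steps V and VI: tally and count the scores falling in this class
        freq = sum(1 for s in scores if lower <= s <= upper)
        table.append((lower, upper, freq))
    return table

# Hypothetical raw scores (illustrative only).
scores = [4, 96, 35, 47, 52, 58, 61, 63, 70, 12, 88, 45]

# Steps I and II: rounded range 0 to 100, class width i = 10.
table = frequency_distribution(scores, low=0, high=100, width=10)

for lower, upper, freq in table:
    print(f"{lower:>2}-{upper:<2}  {'|' * freq:<5} {freq}")
```

The printed column of `|` characters plays the role of the tally marks; the final number in each row is the class frequency.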

Relative Frequency Table
$$\text{Relative frequency} = \frac{\text{class frequency}}{\text{sum of all frequencies}}$$

Cumulative Frequency Table
Rating    Frequency    Cumulative Frequency
0-2       20           20
3-5       14           34
6-8       15           49
9-11      2            51
12-14     1            52
Total Frequency: 52

Relative Frequency Table
Rating    Frequency    Relative Frequency
0-2       20           38.5%
3-5       14           26.9%
6-8       15           28.8%
9-11      2            3.8%
12-14     1            1.9%
Total Frequency: 52
20/52 = 38.5%, 14/52 = 26.9%, etc.
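
Both tables can be reproduced from the five class frequencies. The short Python sketch below uses the rating data from the slides; the variable names are chosen only for this illustration.

```python
from itertools import accumulate

ratings = ["0-2", "3-5", "6-8", "9-11", "12-14"]
frequencies = [20, 14, 15, 2, 1]

total = sum(frequencies)                                     # sum of all frequencies
cumulative = list(accumulate(frequencies))                   # running totals
relative = [round(100 * f / total, 1) for f in frequencies]  # percentages, 1 decimal

for r, f, c, p in zip(ratings, frequencies, cumulative, relative):
    print(f"{r:<6} {f:>3} {c:>3} {p:>5}%")
```

Because each percentage is rounded to one decimal place, the relative frequencies here add up to 99.9% rather than exactly 100%.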

Fiducial limits [fə¦dü·shəl ′lim·əts] (statistics): The boundaries within which a parameter is considered to be located; a concept in fiducial inference.

What is a fiducial confidence interval? A fiducial confidence interval is a confidence interval based on fiducial statistical theory, which considers unknown population parameters to be random variables. Fiducial confidence intervals are primarily used in probit analysis. For a 100(x)% fiducial confidence interval, the probability that the population parameter falls within the interval is (x). This interpretation is fundamentally different from that of standard confidence intervals. Standard confidence intervals do not consider population parameters to be random variables, but fixed values, and consider the confidence interval itself to be random, because the interval is derived from a random sample.

What is the difference between fiducial limits and confidence limits/intervals?
Confidence limits (95% or 99%) are calculated for either a mean or a proportion. In either case the underlying distribution is Normal when the sample size is adequately large. Fiducial limits are applicable only in the case of the lethal dose required for 50% mortality (LD50) or 90% mortality (LD90). The underlying distribution is a logistic growth or S-shaped curve. Even though we transform the data (% values into probits and dose values into log dose) to fit a linear regression equation, $Y_{\text{probit}} = a + b \cdot \log(\text{dose})$, the upper limit of the conventional confidence limits may go beyond the possible maximum (say, 110%), which cannot happen with logistic growth. In such situations fiducial limits are more appropriate than confidence limits.

Tabulation of Data
- A table is a systematic arrangement of statistical data in columns and rows.
- The statistical table is one of the simplest and most revealing devices for summarising data and presenting them in a meaningful fashion.
- Tables are devices used to present data in a simple form.
- Tabulation is probably the first step before the data is used for analysis or interpretation.

General principles of designing tables
- The tables should be numbered, e.g. Table 1, Table 2, etc.
- A title must be given to each table; it should be brief and self-explanatory.
- The headings of columns and rows should be clear and concise.
- The data must be presented according to size or importance, or chronologically, alphabetically or geographically.
- If percentages or averages are to be compared, they should be placed as close together as possible.
- No table should be too large.
- Most people find a vertical arrangement better than a horizontal one, because it is easier to scan the data from top to bottom than from left to right.
- Footnotes may be given, where necessary, providing explanatory notes or additional information.

Parts of a Table
- Table number: at the top or bottom
- Title of the table: should be suitable
- Caption: column heading
- Stub: row heading
- Body of the table: numerical information
- Headnote: brief explanatory statement
- Footnote: explanations that help the reader understand the table

Types of Tables

Simple and Complex tables

General Purpose and Special Purpose Tables
General Purpose: reference or repository tables; provide information for general use or reference.
Special Purpose: summary, analytical or derivative tables; provide information for a particular discussion.

Charting Data
- One of the most convincing and appealing ways in which data may be presented is through charts.
- A picture is said to be worth 10,000 words.
- Data presented in an interesting form has a greater memorizing effect.
- Diagrams and Graphs

Diagrams
General Rules:
- Title
- Proportion between width and height
- Selection of appropriate scale
- Footnotes
- Index
- Neatness and cleanliness
- Simplicity

Types of Diagrams
1. One-dimensional diagrams, e.g. bar diagrams
2. Two-dimensional diagrams, e.g. rectangles, squares and circles
3. Pictograms and Cartograms

1. One-dimensional diagrams
Bar diagrams
- The data presented is categorical.
- Data is presented in the form of rectangular bars of equal breadth.
- Each bar represents one variate/attribute.
- A suitable scale should be indicated, and the scale starts from zero.
- The width of the bars and the gaps between the bars should be equal throughout.
- The length of a bar is proportional to the magnitude/frequency of the variable.
- The bars may be vertical or horizontal.

Types of Bar Diagrams

Simple Bar Diagrams
Represent only one variable.

Horizontal Bar Diagrams

A single bar is subdivided to indicate the composition of the total; it is divided into sections according to their relative proportions.

Multiple Bar Diagrams
Each observation has more than one value, represented by a group of bars.
E.g.: Percentage of males and females in different countries; percentage of deaths from heart diseases in old and young age groups.

Deviation Bars
- Represent net quantities: excess or deficit
- E.g. net profit, net loss, net exports or imports, etc.
- Have both positive and negative values

Broken Bars
Used when there are wide variations in values: some very small, others very large.

Two-dimensional Diagrams
- The length as well as the width of the bars is considered.
- The area of the bar represents the given data.
- Also known as surface diagrams or area diagrams.
- Types: Rectangles, Squares, Circles

Rectangles

Pie diagram
Consists of a circle whose area represents the total frequency (100%), divided into segments. Each segment represents a proportional part of the total frequency.
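
Since the whole circle (360 degrees) stands for the total, the angle of each segment can be worked out as its share of the total times 360. The sketch below illustrates this with three made-up category frequencies; the function name `pie_angles` and the data are assumptions for this example only.

```python
def pie_angles(values):
    """Angle in degrees of each pie segment, proportional to its share of the total."""
    total = sum(values)
    return [360 * v / total for v in values]

# Hypothetical frequencies for three categories (illustrative only).
values = [50, 30, 20]
angles = pie_angles(values)
print(angles)
```

The angles always add up to 360 degrees, whatever the individual frequencies are.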

Pictogram Diagram
- A popular method of presenting data to those who cannot understand orthodox charts.
- Small pictures or symbols are used to present the data, e.g. a picture of a doctor to represent the population per physician.
- A fraction of the picture can be used to represent numbers smaller than the value of the whole symbol.

Cartogram Diagram
- Statistical maps
- Used to give quantitative information on a geographical basis
- Represent spatial distribution
- Shown in many ways: shades of colours, dots, pictograms, or numerical figures placed in each geographical unit

Line Graphs
A line graph is a diagram showing the relationship between two numeric variables (as in a scatter diagram), but the points are joined together to form a line (either a broken line or a smooth curve). It is used to show the trend of events with the passage of time.

Band graphs
A band graph is a type of line graph which shows the total for successive time periods broken up into sub-totals for each of the component parts of the total. The various component parts of the whole are plotted one over the other, and the gaps between the successive lines are filled with different shades, colours, etc., so that the graph has the appearance of a series of bands.

Histogram
- It is very similar to the bar chart, with the difference that the rectangles or bars are adjacent (without gaps).
- It is used for presenting a class frequency table (continuous data).
- It is used for quantitative, continuous variables, i.e. variables which have no gaps, e.g. age, weight, height, blood pressure, blood sugar, etc.
- It consists of a series of blocks. The class intervals are given along the horizontal axis and the frequencies along the vertical axis.

Frequency Polygon
- Derived from a histogram by connecting the mid-points of the tops of the rectangles in the histogram.
- The line connecting the centres of the tops of the histogram rectangles is called the frequency polygon.
- We can draw the polygon without the rectangles, which gives a simpler form of line graph.
- A special type of frequency polygon is the Normal Distribution Curve.
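
The points of a frequency polygon are simply (class mid-point, class frequency) pairs. As a small sketch, the Python below computes them for the rating table used earlier in the slides; the function name `polygon_points` is hypothetical, and the actual drawing is left to whatever plotting tool is at hand.

```python
def polygon_points(class_limits, frequencies):
    """Return one (mid-point, frequency) pair per class."""
    return [((lower + upper) / 2, f)
            for (lower, upper), f in zip(class_limits, frequencies)]

# Classes and frequencies from the rating table in the slides.
class_limits = [(0, 2), (3, 5), (6, 8), (9, 11), (12, 14)]
frequencies = [20, 14, 15, 2, 1]

points = polygon_points(class_limits, frequencies)
print(points)
```

Joining these points in order with straight lines gives the frequency polygon.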

Smoothed Frequency Curve

Cumulative frequency diagram or Ogive
- Here the frequency for each category is the sum of the frequencies of that category and all the preceding categories.
- Cumulative frequencies are plotted against the group limits of the variable.
- These points are joined by a smooth free-hand curve to get a cumulative frequency diagram or Ogive.
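
The points to be plotted for an ogive can be sketched as follows. This example assumes a 'less than' ogive, where each cumulative frequency is plotted against the upper limit of its class (the slides say only 'group limits'), and it reuses the rating table from earlier in the slides. The function name `ogive_points` is hypothetical.

```python
from itertools import accumulate

def ogive_points(upper_limits, frequencies):
    """Pair each upper class limit with the cumulative frequency up to that class."""
    return list(zip(upper_limits, accumulate(frequencies)))

# Upper limits and frequencies from the rating table in the slides.
upper_limits = [2, 5, 8, 11, 14]
frequencies = [20, 14, 15, 2, 1]

points = ogive_points(upper_limits, frequencies)
print(points)
```

The last point always carries the total frequency (52 here), so the curve rises steadily and then flattens out at the total.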