INTRODUCTION
MEASURES OF CENTRAL TENDENCY
MEASURES OF DISPERSION
CORRELATION
INTRODUCTION
Statistics is defined as "the discipline concerned with the collection, organization, analysis, summarization, interpretation and presentation of data".
Definitions of statistics:
A. L. Bowley: the science of counting, or the science of averages.
Turtle: a body of principles and techniques for collecting, classifying, presenting, comparing and interpreting data.
Wallis and Roberts: a body of methods for making decisions in the face of uncertainty.
Croxton and Cowden: the collection, presentation, analysis and interpretation of numerical data.
Descriptive statistics: methods of collecting, presenting and characterizing a set of data. They describe the various features of the collected sample data and include graphical representations (e.g. bar charts, line graphs) and quantitative measures.
Inferential statistics: methods for characterizing a population, or for making decisions about it, on the basis of results obtained from a sample of that population.
The larger unit about which the analysis is to be done is called the population; a fraction or portion of that population is called a sample.
Biostatistics: a special science concerned with collecting, analyzing and interpreting the data (figures) obtained from an experimental study or a survey. It is the branch of statistics that deals with data relating to living organisms, and it is applied to the collection, analysis and interpretation of biological data, especially data relating to human biology, health and medicine. It is used in the biological and medical sciences, nursing and public health.
Uses of biostatistics:
To check whether a difference between two populations in a particular attribute is real or a chance occurrence.
To evaluate the efficacy of vaccines.
To fix priorities in public health programs.
Steps in biostatistics:
1. Generation of a hypothesis.
2. Collection of experimental data.
3. Classification of the collected data.
4. Categorization and analysis of the collected data.
5. Interpretation of the data.
Data: the different observations used for statistical analysis and interpretation.
Frequency distribution: a statistical method for summarizing data. The data are arranged in groups according to conveniently established divisions of the range of the observations, and when the frequencies are listed in a table, the table is known as a 'frequency distribution' or 'frequency table'. A frequency distribution is thus a series in which observations with the same or closely related values are put into separate groups.
Objectives of frequency distribution:
To estimate the frequencies of the population.
To facilitate the analysis of data.
To facilitate the computation of various statistical measures.
In a frequency distribution, the raw data are presented in distinct groups known as classes.
Components of a frequency distribution:
Class: a group formed according to the size of the data.
Class limits: the smallest and largest possible measurements in each class, called the lower limit and the upper limit.
Class mark: also known as the mid value. Class mark = ½(Lower limit + Upper limit)
Class interval = Upper limit − Lower limit
Class frequency: the number of observations falling in each class.
Tally marks: strokes recorded against each class, one for every observation.
Example:
Class (x)   Frequency   Tally marks
10-20       2           ||
20-30       5           ||||/
30-40       5           ||||/
40-50       4           ||||
For the class 40-50: lower limit = 40, upper limit = 50, class mark = ½(lower limit + upper limit) = ½(40 + 50) = 0.5 × 90 = 45.
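The class mark and class interval formulas can be checked for every class in the table above. This is a minimal Python sketch; the function names `class_mark` and `class_interval` are illustrative choices, not part of any library.

```python
# Classes from the example table, as (lower limit, upper limit) pairs.
classes = [(10, 20), (20, 30), (30, 40), (40, 50)]

def class_mark(lower, upper):
    """Mid value of a class: half the sum of its lower and upper limits."""
    return (lower + upper) / 2

def class_interval(lower, upper):
    """Width of a class: upper limit minus lower limit."""
    return upper - lower

for lower, upper in classes:
    print(f"{lower}-{upper}: class mark = {class_mark(lower, upper)}, "
          f"class interval = {class_interval(lower, upper)}")
```

For the class 40-50 this prints a class mark of 45.0 and a class interval of 10, matching the hand calculation.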
Frequency distribution types
1. Discrete or ungrouped frequency distribution
The data are not arranged in groups; they form an individual series arranged in ascending order. There is no continuity from one class to the next. The number of times a particular value is repeated is called the frequency of that class. The exact measurement of the units is clearly stated, and there is a definite difference between the values of different groups of items.
Example: From the following data, make an ungrouped frequency distribution.
11, 12, 5, 3, 11, 13, 17, 13, 5, 5, 11, 5
X    Frequency   Tally marks
3    1           |
5    4           ||||
11   3           |||
12   1           |
13   2           ||
17   1           |
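The same ungrouped distribution can be produced by counting each distinct value and listing the values in ascending order. A minimal Python sketch using the standard-library `collections.Counter`; the variable names are illustrative.

```python
from collections import Counter

data = [11, 12, 5, 3, 11, 13, 17, 13, 5, 5, 11, 5]

# Counter records how many times each distinct value occurs.
counts = Counter(data)

# An ungrouped distribution lists each distinct value once, in ascending order.
frequency_table = sorted(counts.items())

for value, freq in frequency_table:
    print(value, freq)

# The frequencies must add up to the number of observations.
assert sum(freq for _, freq in frequency_table) == len(data)
```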
2. Grouped frequency distribution: the data are grouped into classes to form a frequency distribution table. The class intervals theoretically run continuously, without a break, from the beginning of the frequency distribution to the end.
Example: From the following data, construct a grouped frequency distribution.
3, 8, 5, 2, 15, 16, 13, 12, 10, 19, 18, 11
Types of class intervals
Exclusive method: the upper limit of one class is the lower limit of the next class; an observation equal to an upper limit is counted in the next class.
Inclusive method: overlapping is avoided, and both the upper and lower limits are included in the class interval.
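A grouped frequency distribution for the data of the grouped example can be sketched with exclusive classes. The class width of 5 starting at 0 (0-5, 5-10, 10-15, 15-20) is an assumed, convenient choice; the notes do not fix the classes. The helper `grouped_frequency` is hypothetical.

```python
data = [3, 8, 5, 2, 15, 16, 13, 12, 10, 19, 18, 11]

# Assumed class layout (not given in the notes): width 5, starting at 0, 4 classes.
START, WIDTH, N_CLASSES = 0, 5, 4

def grouped_frequency(values, start, width, n_classes):
    """Exclusive method: each class includes its lower limit but not its
    upper limit, so a value equal to an upper limit falls in the next class."""
    table = []
    for k in range(n_classes):
        lower = start + k * width
        upper = lower + width
        freq = sum(1 for v in values if lower <= v < upper)
        table.append((lower, upper, freq))
    return table

table = grouped_frequency(data, START, WIDTH, N_CLASSES)
for lower, upper, freq in table:
    print(f"{lower}-{upper}: {freq}")
```

Under the inclusive method the classes would instead be written with non-overlapping limits (for example 1-5, 6-10, ...) and tested with `lower <= v <= upper`.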
Open-end classes: the lower limit of the first class interval, the upper limit of the last class interval, or both are not specified. This situation arises in many practical settings, such as economic and medical data, when a few very high or very low values lie far from the majority of observations.
Range: the difference between the largest and smallest values, denoted by R.
R = Largest value − Smallest value, i.e. R = L − S
Mid value: the central point of a class interval is called the mid value or midpoint. It is calculated by adding the upper and lower limits of the class and dividing by 2:
Mid value = (L + U) / 2, where here L is the lower limit and U is the upper limit of the class.
Number of class intervals: the number of classes should not be too large. For an ideal frequency distribution, the number of class intervals can vary from 5 to 15. The difference between the highest and lowest values (the range) helps to fix the number of class intervals.
Sturges' rule: K = 1 + 3.322 log10 N, where N = total number of observations and K = number of class intervals.
If the number of observations N = 10, then K = 1 + 3.322 log10 10 = 1 + 3.322 = 4.322 ≈ 4.
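Sturges' rule and the range can be combined to get a rough class width (range divided by the number of classes). A minimal Python sketch; it rounds K to the nearest whole number as in the worked case above, although some texts round up instead. The function names are illustrative.

```python
import math

def sturges_classes(n):
    """Sturges' rule K = 1 + 3.322 log10(N), rounded to the nearest whole number."""
    return round(1 + 3.322 * math.log10(n))

def data_range(values):
    """Range R = largest value - smallest value."""
    return max(values) - min(values)

def approx_class_width(values):
    """Rough class width: range divided by the Sturges number of classes."""
    return data_range(values) / sturges_classes(len(values))

print(sturges_classes(10))  # 1 + 3.322 * 1 = 4.322, which rounds to 4
```

In practice the rough width is then adjusted to a convenient round number.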
Cumulative frequency distribution: shows the number of data items with values less than or equal to the upper class limit of each class.
Cumulative relative frequency distribution: gives the proportion of data items with values less than or equal to the upper class limit of each class.
Cumulative percentage frequency distribution: shows the percentage of data items with values less than or equal to the upper class limit of each class.
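The three cumulative distributions follow from running totals of the class frequencies. This Python sketch reuses the classes 10-20 to 40-50 from the earlier class-mark example; `itertools.accumulate` is a standard-library function.

```python
from itertools import accumulate

# Upper class limits and frequencies from the earlier class-mark example.
upper_limits = [20, 30, 40, 50]
frequencies = [2, 5, 5, 4]

n = sum(frequencies)
cumulative = list(accumulate(frequencies))               # running totals of frequency
cumulative_relative = [cf / n for cf in cumulative]      # proportions of n
cumulative_percent = [100 * p for p in cumulative_relative]

for u, cf, p in zip(upper_limits, cumulative, cumulative_percent):
    print(f"up to {u}: {cf} items ({p:.2f}%)")
```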
Measures of central tendency
A measure of central tendency is also known as a measure of central value or a measure of location. It is a statistical measure that locates the central point of a data set and so describes the central tendency of the whole data. Averages are values that lie between the smallest and the largest observations; they are also known as measures of central tendency.
Importance of central tendency:
To find a representative value: it gives one value that represents the entire distribution.
To condense data.
To make comparisons between two or more distributions.
It is helpful in further statistical analysis, including the calculation of other statistical measures such as dispersion (statistical dispersion is the extent to which numerical data vary about an average value).
Properties of a good measure of central tendency:
It should be rigidly defined.
It should be easy to understand and calculate.
It should be unaffected by extreme values.
It should be capable of being used in further statistical computation.
It should be based on all the items in the series.
The various measures of central tendency are: arithmetic mean, median, mode, geometric mean and harmonic mean.
Geometric mean (GM): the nth root of the product of n numbers, where n is the total number of data values: GM = (x1 × x2 × ... × xn)^(1/n).
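The definition translates directly into code: multiply the values and take the nth root. A minimal Python sketch; `geometric_mean` here is an illustrative helper, compared against the standard-library `statistics.geometric_mean` (Python 3.8+).

```python
import math
import statistics

def geometric_mean(values):
    """n-th root of the product of n positive values."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs positive values")
    return math.prod(values) ** (1 / len(values))

print(geometric_mean([2, 4, 8]))             # close to 4 (floating-point rounding)
print(statistics.geometric_mean([2, 4, 8]))  # standard-library version
```

For long data series the product can overflow or underflow; the standard-library function works through logarithms, which avoids this.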
Harmonic mean (HM): the reciprocal of the average of the reciprocals of the data values, HM = n / Σ(1/x). It is based on all the observations and is rigidly defined. The harmonic mean gives less weight to large values and more weight to small values. In general, it is used when greater weight must be given to the smaller items, for example when averaging times and rates.
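A typical rate example: if equal distances are travelled at 60 km/h and at 40 km/h, the average speed is the harmonic mean of the two speeds, not their arithmetic mean. The speeds are illustrative values, and `harmonic_mean` is a hypothetical helper checked against `statistics.harmonic_mean`.

```python
import statistics

def harmonic_mean(values):
    """Reciprocal of the arithmetic mean of the reciprocals: n / sum(1/x)."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("this sketch expects positive values")
    return len(values) / sum(1 / v for v in values)

# Illustrative rates: equal distances driven at 60 km/h and at 40 km/h.
speeds = [60, 40]
print(harmonic_mean(speeds))             # average speed over the whole trip, about 48 km/h
print(statistics.harmonic_mean(speeds))  # standard-library version
```

The arithmetic mean of the two speeds would be 50 km/h, which overstates the true average because more time is spent at the slower speed.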
Different central values are classified as follows:
Mathematical averages: all the values of the items in the series are considered when taking the average, e.g. arithmetic mean, geometric mean, harmonic mean.
Positional averages: the average depends on the position of the items rather than on their values, e.g. median, mode, percentiles.
Applications of the arithmetic mean (AM):
Standard deviation and variance are calculated from it.
Correlation and regression analyses use the mean.
In bioequivalence studies, the means (e.g. of AUC and Cmax) and the residual error are determined.
Material attributes (e.g. particle size) and product properties are expressed as means, e.g. mean dissolution, mean weight of product, mean disintegration time, mean content uniformity, mean assay, mean potency.
Merits of the mean:
It considers all observations.
It can be used for comparisons.
It is simple to calculate and understand.
It can be used in algebraic calculations.
It needs no sorting or arrangement (ascending or descending order).
It is relatively stable, being less affected by sampling fluctuations than other averages.
Limitations of the arithmetic mean. The arithmetic mean is:
Very much affected by extreme values.
Not determinable by inspection; computation is essential.
Not suitable for qualitative (non-numerical) data.
Not an appropriate measure for skewed distributions.
Not applicable to nominal or categorical data (e.g. stages of cancer), for which it gives no meaningful conclusions.
Characteristics of a good average (against which the arithmetic mean is judged):
It is rigidly defined, with no scope for different interpretations.
It is not unduly affected by extreme values or fluctuations.
It possesses sampling stability.
It is capable of being used for statistical comparison.
It is easy to calculate and understand.
Methods of calculating the mean
1. Arithmetic mean: individual series
2. Arithmetic mean: discrete series (ungrouped data)
3. Arithmetic mean: continuous series (grouped data)
Individual series/direct method
I. Calculation of the mean: individual series of data
Prob) The hardness of 6 tablets is measured (kg/cm²), as given below.
Hardness (kg/cm²): 5.2, 4.8, 5.4, 5.0, 4.6, 5.2
Sol) Sum of observations: Σx = 5.2 + 4.8 + 5.4 + 5.0 + 4.6 + 5.2 = 30.2 kg/cm²
Number of observations: n = 6
Mean: x̄ = Σx / n = 30.2 / 6 ≈ 5.03 kg/cm²
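The direct method for an individual series is just Σx / n. A short Python sketch reproducing the tablet-hardness calculation, with `statistics.fmean` (Python 3.8+) as a cross-check.

```python
import statistics

hardness = [5.2, 4.8, 5.4, 5.0, 4.6, 5.2]  # tablet hardness, kg/cm^2

total = sum(hardness)  # sum of observations, Σx
n = len(hardness)      # number of observations
mean = total / n       # x̄ = Σx / n
print(f"Σx = {total:.1f}, n = {n}, mean = {mean:.2f} kg/cm^2")
print(statistics.fmean(hardness))  # standard-library check
```

The first line prints Σx = 30.2, n = 6 and a mean of 5.03 kg/cm^2.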
Prob) Tablets (samples) are taken from a batch and weighed. The weights of the tablets are close to one another and occur with frequencies. Calculate the mean weight of the tablets from the following data.
Sol) The mean can be calculated as follows:
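For a discrete series, each value is weighted by its frequency: mean = Σfx / Σf. The tablet-weight table is not reproduced in these notes, so this Python sketch applies the formula to the ungrouped example (X = 3, 5, 11, 12, 13, 17) given earlier; `discrete_mean` is a hypothetical helper.

```python
def discrete_mean(values, frequencies):
    """Mean of a discrete series: sum(f * x) / sum(f)."""
    total_f = sum(frequencies)
    if total_f == 0:
        raise ValueError("total frequency must be positive")
    return sum(f * x for x, f in zip(values, frequencies)) / total_f

# X and frequency columns of the ungrouped example earlier in these notes.
values = [3, 5, 11, 12, 13, 17]
frequencies = [1, 4, 3, 1, 2, 1]
print(discrete_mean(values, frequencies))  # 111 / 12 = 9.25
```

The result equals the plain mean of the 12 raw observations, since the frequency table is only a compact form of the same data.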
Prob ) The particle sizes (in a powder) are measured using the microscopic method. The experimental data are reported in the table given below. Find the mean particle size using the direct method.
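For continuous (grouped) data, the direct method treats every observation in a class as lying at the class mid value m, so mean = Σfm / Σf. The particle-size table for this problem is not reproduced here; for illustration, this Python sketch reuses the size distribution from the median illustration in these notes (20-30 µm to 60-70 µm). `grouped_mean` is a hypothetical helper.

```python
def grouped_mean(classes, frequencies):
    """Direct method for grouped data: sum(f * m) / sum(f),
    where m is the mid value of each class."""
    mids = [(lower + upper) / 2 for lower, upper in classes]
    total_f = sum(frequencies)
    return sum(f * m for f, m in zip(frequencies, mids)) / total_f

# Particle-size classes (µm) and frequencies from the median illustration.
classes = [(20, 30), (30, 40), (40, 50), (50, 60), (60, 70)]
frequencies = [3, 5, 20, 10, 5]
print(round(grouped_mean(classes, frequencies), 2))
```

Here Σfm = 2025 and Σf = 43; the mid-value assumption is what makes the grouped mean an approximation of the raw-data mean.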
Median
Merits of the median:
It is easily defined and understood.
It can be evaluated by graphical methods.
It is useful for open-end classes.
It can be applied to unequal distributions.
Demerits:
It does not take into account the values of the largest and smallest items in a series.
It is not based on all the observations (it is a positional average).
It is harder to determine when the number of observations is even.
It is affected by sampling fluctuations more than the mean is.
Applications: the median can be used to understand the features of a data set when the observations are qualitative in nature, when extreme points are present in the data set, or when a fast estimate of an average is needed.
Methods of calculating the median
1. Median: individual series
2. Median: discrete series (ungrouped data)
3. Median: continuous series (grouped data)
When n is an even number, there is no single central observation, so the median is estimated from the two middle values: it is the mean of the (n/2)th and (n/2 + 1)th observations in the arranged data. The disintegration times (in seconds) of six tablets are given below.
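The odd/even rule for an individual series can be sketched as follows. The disintegration-time data are not reproduced in these notes, so the example uses the six tablet-hardness values from the mean calculation (an even n). `median` here is an illustrative helper checked against `statistics.median`.

```python
import statistics

def median(values):
    """Middle value of the sorted data; for even n, the mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

hardness = [5.2, 4.8, 5.4, 5.0, 4.6, 5.2]  # kg/cm^2
print(median(hardness))  # sorted: 4.6, 4.8, 5.0, 5.2, 5.2, 5.4 -> (5.0 + 5.2) / 2
```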
Illustration: the median Cmax is calculated from Cmax data obtained in bioequivalence studies of a drug formulation (given below). The data are arranged and the cumulative frequencies obtained.
In the above table, the median position (the 22nd term) is first reached in the row with the value 135 g/mL. Therefore, the median Cmax = 135 g/mL.
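Locating the median through cumulative frequencies can be sketched in Python: walk down the table until the cumulative frequency first reaches the required position. The Cmax table is not reproduced in these notes, so the sketch uses the ungrouped example (X = 3, 5, 11, 12, 13, 17) given earlier. Both helper names are hypothetical.

```python
def value_at_position(position, values, frequencies):
    """Value of the item at a 1-based position, found via cumulative frequency."""
    cumulative = 0
    for value, freq in zip(values, frequencies):
        cumulative += freq
        if cumulative >= position:
            return value
    raise ValueError("position exceeds total frequency")

def discrete_median(values, frequencies):
    """Median of a discrete series; values must be in ascending order."""
    n = sum(frequencies)
    if n % 2 == 1:
        return value_at_position((n + 1) // 2, values, frequencies)
    # Even n: average the (n/2)th and (n/2 + 1)th items.
    lower = value_at_position(n // 2, values, frequencies)
    upper = value_at_position(n // 2 + 1, values, frequencies)
    return (lower + upper) / 2

# Ungrouped distribution from the earlier example.
values = [3, 5, 11, 12, 13, 17]
frequencies = [1, 4, 3, 1, 2, 1]
print(discrete_median(values, frequencies))
```

With n = 12 the 6th and 7th items are both 11, because the cumulative frequency first reaches 6 (and 7) in the row X = 11.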
Illustration: particle size distribution data for tablets in a sample, with the number of particles (frequency) in each size range, are used to calculate the median particle size. The size ranges form a continuous distribution with a uniform interval. The data are arranged to obtain the cumulative frequencies.
Size range x (µm)      Frequency (f)   Cumulative frequency (c.f)   Observation
20-30                  3               3
30-40                  5               8                            Cumulative frequency of the class preceding the median class (c.f)
40-50 (median class)   20 (f)          28                           Cumulative frequency of the median class
50-60                  10              38
60-70                  5               43
Σf = n = 43; n/2 = 43/2 = 21.5
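From the table, the median size lies in the 40-50 µm class (not necessarily at its midpoint, because the cumulative frequency of the preceding class is also used in the computation). The exact median is estimated using the equation
Md = L + [(n/2 − c.f) / f] × i
where L = lower limit of the median class, n = total frequency, c.f = cumulative frequency of the class preceding the median class, f = frequency of the median class and i = class interval.
Data: L = 40 µm; n = 43; c.f = 8; f = 20; i = 10 µm; Md = ?
Md = 40 + [(21.5 − 8) / 20] × 10 = 40 + 6.75 = 46.75 µm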
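The grouped-median formula Md = L + [(n/2 − c.f) / f] × i can be sketched in Python: the median class is the first class whose cumulative frequency reaches n/2. This assumes equal-width, continuous classes as in the particle-size table; `grouped_median` is a hypothetical helper.

```python
def grouped_median(lower_limits, width, frequencies):
    """Md = L + ((n/2 - c.f) / f) * i for equal-width continuous classes."""
    n = sum(frequencies)
    half = n / 2
    cumulative = 0  # c.f of the classes before the current one
    for lower, f in zip(lower_limits, frequencies):
        if cumulative + f >= half:
            # Median class: first class whose cumulative frequency reaches n/2.
            return lower + (half - cumulative) / f * width
        cumulative += f
    raise ValueError("frequencies must be non-empty and positive")

# Particle-size data from the table (lower limits in µm, class interval 10 µm).
lower_limits = [20, 30, 40, 50, 60]
frequencies = [3, 5, 20, 10, 5]
print(grouped_median(lower_limits, 10, frequencies))  # about 46.75 µm
```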
Mode: the observation that occurs most frequently in the data. It is used with nominal scales.
Merits:
It is simple to find.
It can be applied to open-end distributions.
It can be identified merely by examining the data, so its computation is easy.
It is only moderately affected by extreme items.
It is a representative value, since it is associated with the highest frequency.
Demerits:
In a bimodal distribution there is no single mode, so the mode is ill-defined.
It is based on only a few observations rather than on all of them.
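Python's standard-library `statistics.multimode` (Python 3.8+) returns every value that shares the highest frequency, which makes the bimodal case visible instead of silently picking one value. The first data set is the ungrouped example from earlier in these notes; the second is toy data chosen to be bimodal.

```python
import statistics

data = [11, 12, 5, 3, 11, 13, 17, 13, 5, 5, 11, 5]
print(statistics.multimode(data))             # the value 5 occurs most often (4 times)

bimodal = [1, 1, 2, 2, 3]                     # toy data with two equally common values
print(statistics.multimode(bimodal))          # both modes are reported
```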