Intro
Applied Statistics for Agricultural Sciences
Stat-500 3(3-0)
Muhammad Arslan Bhatti
M.Phil,PhD in Statistics (UAF)
University of Agriculture,
Faisalabad
Muhammad Arslan Bhatti, M.Phil, PhD in Statistics (UAF), University of Agriculture, Faisalabad. Stat-500
Contents
1. Introduction to Statistics
2. Presentation of Data
3. Measure of Location
4. Measure of Variability
5. Skewness
6. Kurtosis
7. Box-Whisker Plot
8. Chebyshev's Rule
Introduction
Meaning of Statistics
The word Statistics is derived from the Latin word Status, the Italian word Statistica, or the German word Statistik, each of which means an organized political state.
Meaning of the term Statistics
The word Statistics is generally used in the following three senses, and it is defined differently in each:
1. Plural sense
2. Singular sense
3. Plural of the word Statistic
Plural sense
In the plural sense, statistics denotes a systematic collection of numerical data about some topic or topics; any purposefully and systematically collected data are called statistics.
For example, statistics on population growth, statistics on industrial production, statistics of accidents, etc.
Singular sense
In the singular sense, the word statistics denotes a somewhat specialised human activity (technique) concerned with the collection, ordering, analysis, and interpretation of such data.
Word Statistic
The word Statistic (a technical term) refers to a numerical quantity, such as the mean, median, or variance, calculated from sample values. Simply put, a numerical value calculated from a sample is called a statistic.
Statistics
Statistics may be defined as the science of the collection, presentation, analysis and interpretation of numerical data under uncertain conditions.
Descriptive Statistics
Deals with the collection, classification, summarization and presentation of data.
Inferential Statistics
Deals with drawing conclusions about a population using the data of a sample taken from that population.
Population
A population is the totality of the observations made on all the objects (under investigation) possessing some common specific characteristics, which are of particular interest to researchers.
Example 1
The heights of all the students enrolled at SUMMIT College in the BBA degree in Spring 2014, the wages of all employees of a mill in a given year, etc.
Sample
A sample is a representative part of the population which is selected to obtain information concerning the characteristics of the population. The number of observations in a sample is called the size of the sample, which is denoted by n.
Parameter
A parameter is a numerical characteristic of a population, such as its mean or standard deviation. Parameters are fixed constants that characterize a population, and they are usually denoted by Greek letters (for example μ and σ).
Statistic
A statistic is a numerical characteristic of a sample, such as its mean or standard deviation. Statistics are used to draw valid inferences about the population, and they are usually denoted by Latin letters (for example x̄ and S). A statistic is a variable quantity: its value changes from sample to sample.
Statistical Observations
The sequence of observations made on a set of objects included in the sample selected from a given population is known as statistical observations. (A single numerically recorded piece of information is called an observation or datum.)
Data
The set of observations is called data.
Variable
Any characteristic which may vary with respect to individual, time, and (or) place. For example:
Number of leaves per plant
Number of plants
Weight of an individual
Selling price of a crop
Quality of seed
Performance
Variables are usually represented by the last letters of the alphabet, such as X, Y, Z.
Types of Variables
(i) Quantitative variable
(ii) Qualitative variable
Quantitative variable
A variable is called a quantitative variable when the characteristic can be expressed numerically, such as weight, income, or number of children.
Qualitative variable
If a characteristic is non-numerical, such as sex, colour, or education, the variable is called a qualitative variable.
Types of Quantitative variable
(i) Discrete variable
(ii) Continuous variable
Discrete variable
A variable which can assume only some specific values within a given range is called a discontinuous or discrete variable. For example, the number of students in a class, the number of trees in a field, or the number of leaves on a tree. A discrete variable typically takes values which are integers or whole numbers.
Continuous variable
A variable which can assume any value (fractional or integral) within a given range is called a continuous variable. For example, the height of a plant or the temperature at a place.
Measurement
Measurement is the process of assigning numbers or labels to objects, persons, states, or events in accordance with specific, logically accepted rules for representing quantities or qualities of attributes or characteristics.
Data can be classified according to levels of measurement. The level of measurement of the data often dictates the calculations that can be done to summarize and present the data. It also determines the statistical tests that should be performed. There are four levels of measurement: nominal, ordinal, interval, and ratio [Stevens 1951]. The lowest, or most primitive, is the nominal level. The highest, the level that gives the most information about the observation, is the ratio level.
Scales of Measurement
Nominal Scale: A scale in which objects or individuals are broken into categories that have no numerical properties; that is, the data can be placed only into unordered categories (categories with no logical order). For example, eye colour, religion, sex, etc.
Ordinal Scale: A scale in which objects or individuals are categorized and the categories form a rank order along a continuum; that is, the data can be placed into ordered categories (categories with a logical order). For example, cricket teams' standings in the ICC ranking, students' grades, etc.
Interval Scale: A scale in which the units of measurement (intervals) between the numbers on the scale are all equal in size. Equal differences in the characteristic are represented by equal differences in the measurements, but there is no true zero point. For example, temperature in degrees Celsius, shoe size, IQ scores, etc.
Ratio Scale: A scale in which, in addition to order and equal units of measurement, there is an absolute zero that indicates an absence of the variable being measured. Equal differences in the characteristic are represented by equal differences in the measurements, and the zero point is the absence of the characteristic. For example, bank balance, weight, height, etc.
Primary and Secondary Data
According to its source of collection, statistical data may be of the following types:
(i) Primary Data
(ii) Secondary Data
Primary Data: Primary data are the observed values in their order of collection. They are original in nature and have not undergone any statistical treatment. They form the raw material of a statistical inquiry.
Secondary Data: Secondary data are observations that have undergone some statistical treatment at least once.
Methods of collection of Primary Data
The primary data may be collected by the following methods:
(i) Direct personal investigation
(ii) Indirect personal investigation
(iii) Investigation through questionnaires to be filled in by the informants
(iv) Investigation through questionnaires in the charge of enumerators
(v) Estimation through local correspondents
Collection of Secondary Data
Secondary data may be obtained from the following sources:
(i) Official publications
(ii) Semi-official publications
(iii) Private sources
(iv) Research organizations
(v) Technical, trade, economic, and commercial journals and newspapers
Presentation of Data
A major reason for calculating statistics is to describe and summarize
a set of data. A mass of numbers is not usually very informative so
we need to find ways of abstracting the key information that allows
us to present the data in a clear and comprehensible form. In this
chapter we shall be looking at an example of a collection of data
and considering the best way of describing and summarizing it.
Frequency Distribution
A frequency distribution is a tabular arrangement of data which shows the distribution of the observations (or objects) among different classes.
The number of observations falling in a particular class is referred to as the class frequency and is denoted by "f".
Data presented in the form of a frequency distribution are also called grouped data. Data which have not been arranged in a systematic order are called raw data or ungrouped data.
The values of the variable which are used to separate two classes are called class limits. The smaller number is called the lower class limit (L.C.L.) and the larger number is called the upper class limit (U.C.L.).
The class mark or midpoint is the value which divides a class into two equal parts.
The class interval is the length of a class. It is obtained as:
The difference between the upper class boundary (U.C.B.) and the lower class boundary (L.C.B.), or
The difference between two successive lower class limits or two successive upper class limits, or
The difference between two successive midpoints.
A uniform class interval is usually denoted by "h".
CONSTRUCTION OF A FREQUENCY DISTRIBUTION
Decide the number of classes: The number of classes K is determined by the formula K = 1 + 3.3 log(n), where n is the number of observations and the logarithm is to base 10.
Determine the range of variation of the data: R = largest observation - smallest observation.
Determine the approximate size of the class interval: Class Interval = Range / No. of Classes.
Decide where to locate the class limits: Start just below the smallest value in the data and then add the class interval to get the lower class limit of the next class. Repeat this process until the lower class limit of the last class is reached.
Distribute the data into the appropriate classes: Take each observation and mark a vertical bar "|" (tally) against the class to which it belongs.
Example 1:
The following data are the final plant heights (cm) of thirty wheat plants. Construct a frequency distribution.
87 91 89 88 91 87 92 90 98 95
97 96 100 101 96 98 99 100 102 99
101 105 103 107 105 106 107 112 89 98
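The construction steps can be sketched in Python for the wheat heights of Example 1. This is a minimal illustration, not a prescribed solution: rounding K and the class interval upward, and starting the first class at the multiple of h at or below the smallest value, are assumptions made here for convenience.

```python
import math

heights = [87, 91, 89, 88, 91, 87, 92, 90, 98, 95,
           97, 96, 100, 101, 96, 98, 99, 100, 102, 99,
           101, 105, 103, 107, 105, 106, 107, 112, 89, 98]

n = len(heights)
k = math.ceil(1 + 3.3 * math.log10(n))      # number of classes, rounded up (assumption)
data_range = max(heights) - min(heights)    # R = largest - smallest
h = math.ceil(data_range / k)               # class interval, rounded up (assumption)
start = (min(heights) // h) * h             # assumed starting point at or below the minimum

table = []                                  # rows of (lower limit, upper limit, frequency)
lower = start
while lower <= max(heights):
    upper = lower + h - 1                   # integer data, so class limits are inclusive
    f = sum(lower <= x <= upper for x in heights)
    table.append((lower, upper, f))
    lower += h

for lower, upper, f in table:
    print(f"{lower:>3}-{upper:<3} {'|' * f:<10} {f}")
```

With these choices the data fall into six classes of width 5, beginning at 85-89.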
Example 2:
The following data represent the number of goals scored by a team in 10 matches: 0, 0, 1, 1, 3, 1, 3, 0, 2, 0. Make a discrete frequency distribution.
Example 3:
The following data represent the gender of 10 students: Male, Male, Female, Male, Female, Male, Female, Male, Male, Female. Construct a frequency distribution.
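For discrete and qualitative data such as Examples 2 and 3, no class intervals are needed: each distinct value or category is its own class. A short sketch using Python's standard `collections.Counter` (the variable names are illustrative only):

```python
from collections import Counter

goals = [0, 0, 1, 1, 3, 1, 3, 0, 2, 0]
genders = ["Male", "Male", "Female", "Male", "Female",
           "Male", "Female", "Male", "Male", "Female"]

goal_freq = Counter(goals)        # value -> frequency
gender_freq = Counter(genders)    # category -> frequency

print("Goals  f")
for value in sorted(goal_freq):
    print(f"{value:>5}  {goal_freq[value]}")

print("Gender  f")
for category in sorted(gender_freq):
    print(f"{category:<7} {gender_freq[category]}")
```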
STEM AND LEAF DISPLAY
A relatively small data set can be represented by a stem and leaf display. In addition to information on the number of observations falling in the various classes, it displays details of what those observations actually are. In a stem and leaf display the range of the variable is divided into classes, and these are indicated by numbers printed to the left of the stem (usually the vertical axis), which together with knowledge of the class width determine the classes. The digits printed horizontally, starting from the stem, are the leaves.
Example 1:
Represent the following data by STEM-AND-LEAF DISPLAY
32 45 38 41 49 36 52
62 63 59 68 60 56 51
(a) Taking 10 units as the width of the class.
(b) Taking 5 units as the width of the class.
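A possible way to produce the display for a class width of 10 units is sketched below: the tens digit is the stem and the units digit is the leaf. Sorting the leaves within each stem is a presentational choice made here, not something the example requires.

```python
from collections import defaultdict

values = [32, 45, 38, 41, 49, 36, 52,
          62, 63, 59, 68, 60, 56, 51]

stems = defaultdict(list)            # stem (tens digit) -> list of leaves (units digits)
for v in sorted(values):
    stem, leaf = divmod(v, 10)
    stems[stem].append(leaf)

for stem in sorted(stems):
    leaves = " ".join(str(leaf) for leaf in stems[stem])
    print(f"{stem} | {leaves}")
```

For a class width of 5 units, each stem would be split into two rows, one for leaves 0 to 4 and one for leaves 5 to 9.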
Back-to-Back Stem-and-Leaf Display
GRAPHICAL REPRESENTATION
Visual representation of statistical data in the form of points, lines, or areas is known as graphical representation.
Such visual representations can be divided into two groups:
(i) Graph
(ii) Diagram
The basic difference between a graph and a diagram is that a graph is a representation of data by a continuous curve, while a diagram is any other one-, two- or three-dimensional form of visual representation.
Types of Diagram
(a) Simple bar chart
(b) Multiple bar chart
(c) Component bar chart
(d) Pie chart
Types of Graph
(a) Graph of time series or Historigram
(b) Histogram
(c) Frequency Polygon
(d) Frequency Curve
(e) Cumulative Frequency Polygon or Ogive
Types of Frequency Distribution
(1) Symmetrical distribution
(2) Skewed distribution
Symmetrical distribution: A frequency distribution or curve is said to be symmetrical if values equidistant from a central maximum have the same frequencies.
Skewed distribution: A frequency distribution or curve is said to be skewed when it departs from symmetry.
Some Important Notations
Sum ($\sum X$): Sum of all the values of a variable X.
Product ($\prod X$): Product of all the values of a variable X.
Absolute value ($|X|$): Magnitude of the value of variable X.
Deviation ($X - a$): Deviation of the observations of a variable X from a, where 'a' is any constant.
Squared deviations ($(X - a)^2$): Squared deviations of the observations of variable X from 'a'.
Measure of Central Tendency or Location
Measure of Location
Measures of location are designed to provide the analyst with quantitative values indicating where the center, or some other location, of the data lies. The diagrammatic representation of a set of data can give us some impression of its distribution. Even then there remains a need for a single quantitative measure which can be used to indicate the center of the distribution. An average is a single value which represents a set of data or a distribution as a whole. It is more or less the central value around which the observations in the set of data or distribution tend to cluster. Such a central value is also called a measure of central tendency or measure of location. The decision as to which measure to use depends on the particular situation under consideration.
Types of Averages
The following are the types of averages:
Arithmetic Mean
Median
Mode
Geometric Mean
Harmonic Mean
Arithmetic Mean
Ungrouped Data: Suppose that the observations in a sample are $x_1, x_2, \ldots, x_n$. The sample mean, denoted by $\bar{x}$, is
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$
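Grouped Data: If a frequency distribution of a variable X consists of the values $x_1, x_2, \ldots, x_k$ with frequencies $f_1, f_2, \ldots, f_k$ respectively, then their arithmetic mean is given by
$$\bar{x} = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_k x_k}{f_1 + f_2 + \cdots + f_k} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i} = \frac{\sum_{i=1}^{k} f_i x_i}{n}$$
where $n = \sum_{i=1}^{k} f_i$ is the total number of observations.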
Weighted Arithmetic Mean
If a variable X has the values $x_1, x_2, \ldots, x_n$ with weights $w_1, w_2, \ldots, w_n$ respectively, then their weighted arithmetic mean, denoted by $\bar{x}_w$, is given by
$$\bar{x}_w = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$
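The grouped mean is a weighted mean in which the frequencies act as weights, so one helper covers both formulas. The sketch below uses the thirty wheat heights from the frequency distribution example; the six classes 85-89, ..., 110-114 and their midpoints are an assumed grouping chosen for illustration.

```python
heights = [87, 91, 89, 88, 91, 87, 92, 90, 98, 95,
           97, 96, 100, 101, 96, 98, 99, 100, 102, 99,
           101, 105, 103, 107, 105, 106, 107, 112, 89, 98]

def arithmetic_mean(values):
    """Sum of the observations divided by their number."""
    return sum(values) / len(values)

def weighted_mean(values, weights):
    """Sum of w*x divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Assumed grouping: classes 85-89, 90-94, ..., 110-114 (midpoints and frequencies).
midpoints = [87, 92, 97, 102, 107, 112]
frequencies = [5, 4, 9, 6, 5, 1]

ungrouped_mean = arithmetic_mean(heights)
grouped_mean = weighted_mean(midpoints, frequencies)

print(f"Ungrouped mean: {ungrouped_mean:.2f} cm")
print(f"Grouped mean:   {grouped_mean:.2f} cm")
```

The two results differ slightly because grouping replaces every observation by its class midpoint.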
Combined Mean
If a distribution consists of k subgroups consisting of $n_1, n_2, \ldots, n_k$ observations ($\sum n_j = n$) with means $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_k$ respectively, then the combined mean $\bar{x}_c$ of all $n = \sum n_j$ observations is given by
$$\bar{x}_c = \frac{\sum_{j=1}^{k} n_j \bar{x}_j}{\sum_{j=1}^{k} n_j}$$
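As a small check of the combined mean formula, suppose (hypothetically) that the thirty wheat heights of the frequency distribution example were recorded as three subgroups of ten plants, one per row of the data, with row means 90.8, 98.8 and 103.3 cm. The combined mean should then equal the mean of all thirty values.

```python
# Hypothetical subgroups: the three rows of the wheat data, ten plants each.
sizes = [10, 10, 10]
means = [90.8, 98.8, 103.3]

# Combined mean: sum of n_j * mean_j divided by the sum of n_j.
combined_mean = sum(n * m for n, m in zip(sizes, means)) / sum(sizes)

print(f"Combined mean: {combined_mean:.2f} cm")
```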
Median
Ungrouped Data: Given that the observations in a sample, arranged in increasing order of magnitude, are $x_1, x_2, \ldots, x_n$, the sample median is
$$\tilde{x} = \begin{cases} x_{(n+1)/2}, & \text{if } n \text{ is odd,} \\ \frac{1}{2}\left(x_{n/2} + x_{n/2+1}\right), & \text{if } n \text{ is even.} \end{cases}$$
Grouped Data: The class containing the median in a grouped frequency distribution is the class containing the $\frac{n}{2}$th value of the distribution, that is, the first class whose cumulative frequency is at least $n/2$. Then
$$\text{Median} = \tilde{x} = l + \frac{h}{f}\left(\frac{n}{2} - c\right)$$
where
l = lower class boundary of the median class
h = class interval of the median class
f = frequency of the median class
c = cumulative frequency of the class preceding the median class
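Both median rules can be sketched as follows, using the thirty wheat heights from the frequency distribution example. The class boundaries 84.5, 89.5, ... correspond to an assumed grouping into classes 85-89, ..., 110-114; the function names are illustrative.

```python
heights = [87, 91, 89, 88, 91, 87, 92, 90, 98, 95,
           97, 96, 100, 101, 96, 98, 99, 100, 102, 99,
           101, 105, 103, 107, 105, 106, 107, 112, 89, 98]

def median_ungrouped(values):
    xs = sorted(values)
    n = len(xs)
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]                    # the (n+1)/2-th value (1-based)
    return (xs[mid - 1] + xs[mid]) / 2    # mean of the n/2-th and (n/2 + 1)-th values

def median_grouped(lower_boundaries, frequencies, h):
    n = sum(frequencies)
    c = 0                                 # cumulative frequency before the current class
    for l, f in zip(lower_boundaries, frequencies):
        if c + f >= n / 2:                # first class reaching n/2 is the median class
            return l + (h / f) * (n / 2 - c)
        c += f
    raise ValueError("empty frequency table")

# Assumed grouping of the same data (class width 5).
lower_boundaries = [84.5, 89.5, 94.5, 99.5, 104.5, 109.5]
frequencies = [5, 4, 9, 6, 5, 1]

print("Ungrouped median:", median_ungrouped(heights))
print("Grouped median:  ", round(median_grouped(lower_boundaries, frequencies, 5), 2))
```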
Quantiles
Quartiles: Quartiles are the values that divide a set of data into four equal parts after arranging them in ascending order of magnitude. Quartiles are denoted by $Q_1$, $Q_2$, and $Q_3$. $Q_1$ is called the lower quartile, $Q_3$ is called the upper quartile, and $Q_2$ is also called the median.
For ungrouped data
$$Q_j = \frac{j(n+1)}{4}\text{th value of the distribution}, \quad j = 1, 2, 3$$
For grouped data
$$Q_j = l + \frac{h}{f}\left(\frac{jn}{4} - c\right), \quad j = 1, 2, 3$$
where l, h, f and c are defined as for the median but refer to the class containing $Q_j$.
Deciles: Deciles are the values that divide a set of data into ten equal parts after arranging them in ascending order of magnitude. Deciles are denoted by $D_1, D_2, D_3, \ldots, D_9$.
For ungrouped data
$$D_j = \frac{j(n+1)}{10}\text{th value of the distribution}, \quad j = 1, 2, 3, \ldots, 9$$
For grouped data
$$D_j = l + \frac{h}{f}\left(\frac{jn}{10} - c\right), \quad j = 1, 2, 3, \ldots, 9$$
Percentiles: Percentiles are the values that divide a set of data into 100 equal parts after arranging them in ascending order of magnitude. Percentiles are denoted by $P_1, P_2, P_3, \ldots, P_{99}$.
For ungrouped data
$$P_j = \frac{j(n+1)}{100}\text{th value of the distribution}, \quad j = 1, 2, 3, \ldots, 99$$
For grouped data
$$P_j = l + \frac{h}{f}\left(\frac{jn}{100} - c\right), \quad j = 1, 2, 3, \ldots, 99$$
Quantiles: Quantiles are the values that divide a set of data into equal parts. Quartiles, deciles and percentiles are collectively called quantiles.
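The three ungrouped formulas differ only in the divisor (4, 10 or 100), so a single helper can serve for all of them. The sketch below is applied to the thirty wheat heights from the frequency distribution example. When the position j(n+1)/m is not a whole number, it interpolates linearly between the two neighbouring ordered values; that interpolation rule is an assumption, since the formulas above do not say how to handle fractional positions.

```python
heights = [87, 91, 89, 88, 91, 87, 92, 90, 98, 95,
           97, 96, 100, 101, 96, 98, 99, 100, 102, 99,
           101, 105, 103, 107, 105, 106, 107, 112, 89, 98]

def quantile(values, j, parts):
    """j-th quantile when the data are divided into `parts` equal parts."""
    xs = sorted(values)
    n = len(xs)
    pos = j * (n + 1) / parts        # 1-based position of the quantile
    whole = int(pos)
    frac = pos - whole
    if whole < 1:                    # position before the first value
        return float(xs[0])
    if whole >= n:                   # position at or beyond the last value
        return float(xs[-1])
    # Linear interpolation between the neighbouring ordered values (assumption).
    return xs[whole - 1] + frac * (xs[whole] - xs[whole - 1])

q1, q2, q3 = (quantile(heights, j, 4) for j in (1, 2, 3))
d5 = quantile(heights, 5, 10)
p90 = quantile(heights, 90, 100)

print("Q1, Q2, Q3:", q1, q2, q3)
print("D5:", d5)
print("P90:", round(p90, 2))
```

The second quartile, the fifth decile and the fiftieth percentile all coincide with the median.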
Mode
The mode is defined as the value in the data which occurs the greatest number of times, provided such a value exists.
For ungrouped data
$$\text{Mode} = x_{mf}$$
where $x_{mf}$ is the value with maximum frequency.
For grouped data
$$\text{Mode} = l + \frac{(f_m - f_1)\,h}{(f_m - f_1) + (f_m - f_2)}$$
where
l = lower class boundary of the modal class
h = class interval of the modal class
$f_m$ = frequency of the modal class
$f_1$ = frequency of the class preceding the modal class
$f_2$ = frequency of the class succeeding the modal class
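A brief sketch of both mode rules, using the thirty wheat heights from the frequency distribution example. The ungrouped helper returns a list because a data set may have more than one most frequent value; the grouped figures (modal class 95-99 with frequency 9, neighbours 4 and 6) come from an assumed grouping into classes of width 5.

```python
from collections import Counter

heights = [87, 91, 89, 88, 91, 87, 92, 90, 98, 95,
           97, 96, 100, 101, 96, 98, 99, 100, 102, 99,
           101, 105, 103, 107, 105, 106, 107, 112, 89, 98]

def modes_ungrouped(values):
    """All values that occur with the maximum frequency."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

def mode_grouped(l, h, fm, f1, f2):
    """Grouped-data mode from the modal class and its two neighbours."""
    return l + (fm - f1) * h / ((fm - f1) + (fm - f2))

print("Ungrouped mode(s):", modes_ungrouped(heights))
print("Grouped mode:", mode_grouped(l=94.5, h=5, fm=9, f1=4, f2=6))
```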
Relation between Mean, Median and Mode
For a symmetrical distribution: Mode = Median = Mean
For a moderately skewed distribution (empirical relation, approximate): Mode = 3(Median) - 2(Mean)
Geometric Mean
The geometric mean, G, of a set of n positive values $X_1, X_2, X_3, \ldots, X_n$ is the positive nth root of the product of the values. For ungrouped data the geometric mean is given by
$$G.M. = \left(\prod_{i=1}^{n} X_i\right)^{1/n} = \text{Antilog}\left(\frac{\sum_{i=1}^{n} \log X_i}{n}\right)$$
For grouped data with k classes and $n = \sum f_i$
$$G.M. = \left(\prod_{i=1}^{k} X_i^{f_i}\right)^{1/\sum f_i} = \text{Antilog}\left(\frac{\sum_{i=1}^{k} f_i \log X_i}{n}\right)$$
Harmonic Mean
The harmonic mean, H, of a set of n values $X_1, X_2, X_3, \ldots, X_n$ is defined as the reciprocal of the arithmetic mean of the reciprocals of the values.
For ungrouped data the harmonic mean is given by
$$H.M. = \frac{n}{\sum_{i=1}^{n} \frac{1}{X_i}}$$
For grouped data
$$H.M. = \frac{\sum f_i}{\sum_{i=1}^{k} \frac{f_i}{X_i}}$$
Note: The geometric mean and the harmonic mean are useful measures of central tendency for averaging rates and ratios.
Relation between A.M., G.M. and H.M.
For positive values,
$$A.M. \geq G.M. \geq H.M.$$
The three means are equal only when all the observations are identical.
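The ordering of the three means can be checked numerically. The sketch below uses a small made-up data set (2, 4, 8) and computes the geometric mean through logarithms, mirroring the antilog form of the definition; natural logarithms are used, which gives the same result as base 10.

```python
import math

def arithmetic_mean(values):
    return sum(values) / len(values)

def geometric_mean(values):
    # Antilog of the mean of the logs; requires positive values.
    return math.exp(sum(math.log(x) for x in values) / len(values))

def harmonic_mean(values):
    # Reciprocal of the arithmetic mean of the reciprocals.
    return len(values) / sum(1 / x for x in values)

values = [2, 4, 8]                   # illustrative data, not from the course notes
am = arithmetic_mean(values)
gm = geometric_mean(values)
hm = harmonic_mean(values)

print(f"A.M. = {am:.4f}, G.M. = {gm:.4f}, H.M. = {hm:.4f}")
print("A.M. >= G.M. >= H.M.:", am >= gm >= hm)
```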
Measure of Dispersion
By dispersion we mean the extent to which the values are spread out from the average. A quantity that measures this characteristic is called a measure of dispersion, or variability. Measures of central tendency summarize what is typical of the elements of a data set, but not every element is typical. Are all the elements close to each other? Are most of the elements close to each other? What is the biggest difference between elements? On average, how far are the elements from each other? Measures of spread or variability answer these questions.
Types of Measure of Dispersion
There are two types of measure of dispersion:
Absolute Measure of Dispersion
Relative Measure of Dispersion
Absolute Measure of Dispersion
An absolute measure of dispersion is one that measures the dispersion in the same units as the data.
Relative Measure of Dispersion
A relative measure of dispersion is one that is expressed as a ratio, coefficient or percentage and is independent of the units of measurement.
Types of Absolute Measure of Dispersion
The Range
The Semi-Interquartile Range or Quartile Deviation
The Mean Deviation or Average Deviation
The Variance and The Standard Deviation
Types of Relative Measure of Dispersion
Coefficient of Dispersion
Coefficient of Quartile Deviation
Coefficient of Mean Deviation
Coefficient of Variation
Range
The range R is the difference between the largest and the smallest observation in a set of data.
$$R = \text{largest value} - \text{smallest value} = X_m - X_0$$
For grouped data, the range is the difference between the upper boundary of the highest class and the lower boundary of the lowest class.
Quartile Deviation
The quartile deviation is half of the difference between the third and the first quartile of a set of data.
$$Q.D. = \frac{Q_3 - Q_1}{2}$$
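A short sketch of both measures for the thirty wheat heights from the frequency distribution example. It relies on `statistics.quantiles` from the Python standard library (Python 3.8 or later), whose default "exclusive" method places the quartiles at the j(n+1)/4-th positions and interpolates linearly between neighbouring values.

```python
import statistics

heights = [87, 91, 89, 88, 91, 87, 92, 90, 98, 95,
           97, 96, 100, 101, 96, 98, 99, 100, 102, 99,
           101, 105, 103, 107, 105, 106, 107, 112, 89, 98]

data_range = max(heights) - min(heights)          # R = X_m - X_0

q1, q2, q3 = statistics.quantiles(heights, n=4)   # default method is "exclusive"
quartile_deviation = (q3 - q1) / 2                # semi-interquartile range

print("Range:", data_range)
print("Q1 =", q1, " Q3 =", q3)
print("Quartile deviation:", quartile_deviation)
```

Unlike the range, the quartile deviation ignores the lowest and highest quarters of the data, so it is less affected by extreme values.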
Mean Deviation or Average Deviation
The mean deviation is defined as the arithmetic mean of the absolute deviations of the observations measured from the mean, median or mode.
The mean deviation is denoted by M.D. and defined as follows.
For ungrouped data
$$M.D. = \frac{\sum |X - \bar{X}|}{n} \quad\text{or}\quad M.D. = \frac{\sum |X - X_{med}|}{n} \quad\text{or}\quad M.D. = \frac{\sum |X - X_{mode}|}{n}$$
For grouped data
$$M.D. = \frac{\sum f|X - \bar{X}|}{\sum f} \quad\text{or}\quad M.D. = \frac{\sum f|X - X_{med}|}{\sum f} \quad\text{or}\quad M.D. = \frac{\sum f|X - X_{mode}|}{\sum f}$$
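The ungrouped formulas share one pattern: average the absolute deviations from a chosen center. A minimal sketch for the thirty wheat heights from the frequency distribution example, with `mean_deviation` as an illustrative helper name:

```python
import statistics

heights = [87, 91, 89, 88, 91, 87, 92, 90, 98, 95,
           97, 96, 100, 101, 96, 98, 99, 100, 102, 99,
           101, 105, 103, 107, 105, 106, 107, 112, 89, 98]

def mean_deviation(values, center):
    """Arithmetic mean of the absolute deviations from `center`."""
    return sum(abs(x - center) for x in values) / len(values)

md_mean = mean_deviation(heights, statistics.mean(heights))
md_median = mean_deviation(heights, statistics.median(heights))

print(f"M.D. about the mean:   {md_mean:.3f}")
print(f"M.D. about the median: {md_median:.3f}")
```

The mean deviation is never larger when measured from the median than from any other center, which is one reason the median is often preferred here.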
The Variance and Standard Deviation
The Variance
The variance is defined as the arithmetic mean of the squared deviations of the observations measured from their mean. The sample variance is denoted by $S^2$ and defined as follows.
For ungrouped data
$$S^2 = \frac{\sum (X - \bar{X})^2}{n}$$
For grouped data
$$S^2 = \frac{\sum f(X - \bar{X})^2}{\sum f}$$
The Standard Deviation
Variance measures the variation in the data in the square of the units of measurement of the data, so it is difficult to interpret directly. The positive square root of the variance is called the standard deviation, which is expressed in the same units as the data. It is denoted by S for a sample.
For ungrouped data
$$S = \sqrt{\frac{\sum (X - \bar{X})^2}{n}}$$
For grouped data
$$S = \sqrt{\frac{\sum f(X - \bar{X})^2}{\sum f}}$$
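The following sketch implements the variance and standard deviation with divisor n (or the total frequency for grouped data), as in the formulas above. Note that many textbooks and software defaults divide by n - 1 for a sample instead; in Python's standard library, `statistics.pvariance` and `statistics.pstdev` correspond to the divisor-n versions. The midpoints and frequencies are an assumed grouping of the wheat heights from the frequency distribution example.

```python
import math

def variance(values):
    """Mean of the squared deviations from the mean (divisor n)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n

def standard_deviation(values):
    return math.sqrt(variance(values))

def grouped_variance(midpoints, frequencies):
    """Divisor is the total frequency; X is taken as the class midpoint."""
    total = sum(frequencies)
    mean = sum(f * x for x, f in zip(midpoints, frequencies)) / total
    return sum(f * (x - mean) ** 2 for x, f in zip(midpoints, frequencies)) / total

sample = [2, 4, 6, 8, 10]                      # illustrative data
print("Variance:", variance(sample))
print("Standard deviation:", round(standard_deviation(sample), 4))

# Assumed grouping of the wheat heights: classes 85-89, ..., 110-114.
midpoints = [87, 92, 97, 102, 107, 112]
frequencies = [5, 4, 9, 6, 5, 1]
print("Grouped variance:", round(grouped_variance(midpoints, frequencies), 3))
```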
Symmetrical and Skewed Distributions
Symmetrical Distribution
A frequency distribution is said to be symmetrical if the values equidistant from the central maximum have the same frequency.
In a symmetrical distribution the values of the arithmetic mean, median and mode coincide.
The two tails of the frequency curve of the distribution are equal in length from the central value.
Skewed Distribution
A distribution that lacks symmetry with respect to a vertical axis is said to be skewed.
If the right tail of a distribution is longer than its left tail, the distribution is said to be positively skewed.
If the left tail of a distribution is longer than its right tail, the distribution is said to be negatively skewed.
Relation between mean, median, mode:
Mean = Median = Mode for a symmetrical distribution
Mean > Median > Mode for a positively skewed distribution
Mean < Median < Mode for a negatively skewed distribution
Measure of Skewness
Coefficient of Skewness:
The coefficient of skewness measures the direction and the degree of departure of a distribution from symmetry. An ideal measure of skewness should have the following desirable qualities.
1. It should be a pure number, i.e., independent of origin and of unit of measurement.
2. It should be zero for a symmetrical distribution.
$$Sk = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^3}{n\,S^3}$$
If the coefficient of skewness = 0, the distribution is symmetrical.
If the coefficient of skewness > 0, the distribution is positively skewed.
If the coefficient of skewness < 0, the distribution is negatively skewed.
Measure of Kurtosis
Kurtosis
The term kurtosis is used to indicate the length of the tails and the peakedness or flatness of symmetrical distributions.
If the peak of the curve is relatively high, the curve is leptokurtic.
On the other hand, a flat-topped curve is called platykurtic.
A curve which is neither very peaked nor very flat-topped is called mesokurtic or normal.
Measure of Kurtosis
Kurtosis is measured by
$$K = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^4}{n\,S^4} - 3$$
If K < 0, the distribution is platykurtic.
If K = 0, the distribution is mesokurtic.
If K > 0, the distribution is leptokurtic.
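Skewness and kurtosis can both be sketched from central moments. The code below assumes the moment-based forms with divisor n, that is Sk = m3 / S^3 and K = m4 / S^4 - 3, where m_r is the r-th central moment and S^2 = m2. Subtracting 3 is what makes K = 0 correspond to a mesokurtic (normal) curve, and a positive Sk indicates a longer right tail under the usual convention. The data sets are made up for illustration.

```python
def central_moment(values, r):
    """r-th moment about the mean, with divisor n."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** r for x in values) / n

def skewness(values):
    # m3 / S^3, where S^2 = m2
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def kurtosis(values):
    # m4 / S^4 - 3, so that a normal curve gives 0
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

symmetric = [2, 4, 6, 8, 10]         # illustrative data
right_tailed = [1, 2, 3, 4, 10]      # one large value stretches the right tail

print("Skewness (symmetric):   ", skewness(symmetric))
print("Skewness (right-tailed):", round(skewness(right_tailed), 3))
print("Kurtosis (symmetric):   ", round(kurtosis(symmetric), 3))
```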
Five Number Summary
The five number summary of a data set consists of
The Minimum Value (X0)
The First Quartile (Q1)
The Second Quartile or Median (Q2)
The Third Quartile (Q3)
The Maximum Value (Xm)
Box-and-Whisker Plot
Box Plot
A box-and-whisker plot is a diagrammatic representation of some main features of observed data. The diagram consists of a rectangle (box) over the central part of the observed data, with whiskers drawn from the rectangle to the lowest and highest values. The limits of the box are the lower and upper quartiles. A line is drawn within the box to indicate the position of the median.
A box-and-whisker plot is a five number summary presented in the form of a diagram.
Interpretation of BOX-WHISKER plot
The line within the box (the median) indicates the average size of the data.
The length of the graph or box indicates the variation in the data.
The position of the line within the box indicates the shape of the data:
A line at the center of the box indicates that the data are symmetrical.
A line above the center of the box (nearer Q3 in a vertically drawn plot) indicates that the data are negatively skewed.
A line below the center of the box (nearer Q1 in a vertically drawn plot) indicates that the data are positively skewed.
Identification of outliers
Mark the inner and outer fences, where IQR = Q3 - Q1 is the interquartile range:
Lower Inner Fence (LIF) = Q1 - 1.5(IQR)
Upper Inner Fence (UIF) = Q3 + 1.5(IQR)
Lower Outer Fence (LOF) = Q1 - 3(IQR)
Upper Outer Fence (UOF) = Q3 + 3(IQR)
Any value within the inner fences → consistent with the rest of the data values
Any value outside the inner fences but inside the outer fences → possible / suspected outlier
Any value outside the outer fences → sure outlier
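The fence rules can be sketched for the thirty wheat heights from the frequency distribution example. The quartiles are taken from `statistics.quantiles` (Python 3.8 or later, default "exclusive" method, which uses the j(n+1)/4-th positions); `classify` is an illustrative helper, and the values 125 and 140 are hypothetical readings added only to show the other two categories.

```python
import statistics

heights = [87, 91, 89, 88, 91, 87, 92, 90, 98, 95,
           97, 96, 100, 101, 96, 98, 99, 100, 102, 99,
           101, 105, 103, 107, 105, 106, 107, 112, 89, 98]

q1, _, q3 = statistics.quantiles(heights, n=4)
iqr = q3 - q1

lif, uif = q1 - 1.5 * iqr, q3 + 1.5 * iqr    # inner fences
lof, uof = q1 - 3 * iqr, q3 + 3 * iqr        # outer fences

def classify(x):
    if lif <= x <= uif:
        return "consistent"
    if lof <= x <= uof:
        return "suspected outlier"
    return "outlier"

print(f"Inner fences: {lif} to {uif}")
print(f"Outer fences: {lof} to {uof}")
for x in (112, 125, 140):                    # 125 and 140 are hypothetical values
    print(x, "->", classify(x))
```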
Chebyshev’s Rule
Chebyshev's theorem is used in describing the spread of observations in a distribution. Regardless of how the data are distributed (i.e., regardless of the shape of the data), at least $(1 - 1/k^2) \times 100\%$ of the values will fall within k standard deviations of the mean (for k > 1).
$(1 - 1/k^2) \times 100\% = 0\%$ for k = 1 ($\bar{X} \pm 1S$), so the bound gives no information
$(1 - 1/k^2) \times 100\% = 75\%$ for k = 2 ($\bar{X} \pm 2S$)
$(1 - 1/k^2) \times 100\% \approx 89\%$ for k = 3 ($\bar{X} \pm 3S$)
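As an illustration, the sketch below compares Chebyshev's lower bound with the proportion of the thirty wheat heights (from the frequency distribution example) that actually fall within k standard deviations of the mean. The standard deviation is computed with divisor n, an assumption chosen to match the variance formula used in these notes.

```python
heights = [87, 91, 89, 88, 91, 87, 92, 90, 98, 95,
           97, 96, 100, 101, 96, 98, 99, 100, 102, 99,
           101, 105, 103, 107, 105, 106, 107, 112, 89, 98]

def chebyshev_bound(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    return 1 - 1 / k ** 2

def fraction_within(values, k):
    """Observed proportion of values within k standard deviations of the mean."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((x - mean) ** 2 for x in values) / n) ** 0.5   # divisor n
    return sum(abs(x - mean) <= k * sd for x in values) / n

for k in (2, 3):
    print(f"k = {k}: bound {chebyshev_bound(k):.1%}, "
          f"observed {fraction_within(heights, k):.1%}")
```

The observed proportions are usually well above the bound, because the bound must hold for every possible distribution shape.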
The Empirical Rule
If the data distribution is approximately bell-shaped, then the interval:
$(\bar{X} \pm 1S)$ contains about 68% of the values
$(\bar{X} \pm 2S)$ contains about 95% of the values
$(\bar{X} \pm 3S)$ contains about 99.7% of the values