Detailed discussion of the types of statistics, from Measures of Central Tendency, Measures of Dispersion, Skewness and Kurtosis to Probability Distributions and much more, with their use cases
Added: Jan 13, 2022
Slides: 29 pages
Slide Content
Statistics-I
Type of Statistics
Types
Statistics is a branch of mathematics dealing
with the collection, organization, analysis,
interpretation and presentation of data.
There are two types:
1. Descriptive statistics
2. Inferential statistics
Rupak Roy
Definition
1. Inferential statistics: uses a sample
of data to draw conclusions about the
general population.
In simple words, it provides conclusions about the
population by taking samples of data from it.
Definition
2. Summary statistics are used to summarize
a set of observations in order to
communicate as much information
about the data as possible. They are part
of descriptive statistics and are used to
summarize or describe a set of
observations.
They are also known as descriptive statistics.
Example
If the weights of the population are
45 kg
57 kg
72 kg
52 kg
then what is the summary of the weight for the
population? We can say the average weight of
the population is (45 + 57 + 72 + 52) / 4 = 56.5 kg,
and so we are able to describe the population in
the simplest way possible.
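The same average can be checked with a minimal Python sketch. It assumes only the standard-library `statistics` module; the variable names are illustrative.

```python
from statistics import mean

# Weights from the example above, in kilograms.
weights_kg = [45, 57, 72, 52]

# The mean is the sum of the observations divided by their count.
average_weight = mean(weights_kg)
print(average_weight)  # 56.5
```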
Now let’s study the summary statistics in detail.
Types
Summary statistics
Measures of Central Tendency
1. Mean
2. Median
3. Mode
4. Geometric Mean
Measures of Dispersion
1. Standard Deviation
2. Variance
3. Interquartile Range
Others
1. Coefficient
2. Skewness
3. Kurtosis
4. Probability Distributions
5. Distribution plot
Definitions
○ Measures of central tendency: values that describe
the central point around which the data cluster. In
simple words, they are a way to describe the center of a data set.
And what is the center of the data? A single number that
summarizes the entire dataset, obtained with techniques such as
the mean (average) or the median of the dataset.
○ Measures of dispersion: "dispersion (also called
variability, scatter or spread) is the extent to which a distribution
of data is stretched or squeezed".
In the graph, the
distribution of data (assume population)
is more stretched on the right side,
ranging from 50 to 80.
Measures of Central Tendency
1. Mean: the average of the observations.
Most effective when the data is not heavily skewed.
2. Median: the middle value of the sorted dataset.
Useful for skewed data.
We will talk about skewed data in the upcoming slides.
3. Mode: the value that occurs the maximum number of
times in the dataset.
4. Geometric mean: the nth root of the product of n numbers. It
is used when we want the average rate of an
event and the rate is determined by multiplication (for example, growth rates).
○ Formula for calculating Geometric Mean
$GM = \sqrt[n]{x_1 \cdot x_2 \cdots x_n}$
If we have two numbers, 23 and 56, the
geometric mean is $\sqrt{23 \cdot 56}$.
Example: Geometric Mean of 23, 56, 66?
$\sqrt[3]{23 \cdot 56 \cdot 66} = \sqrt[3]{85008} \approx 43.9696761$,
which means that 43.9696761 raised to the power 3
is 85008.
Note:
If one of the observations is zero, the Geometric
Mean becomes zero. It also does not work with
negative numbers like -1, -4, -5 and so on.
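The formula and the note can be tried out with a short Python sketch. The helper `geometric_mean` is written here for illustration (it is not from the slides) and rejects negative inputs, in line with the note above.

```python
import math

def geometric_mean(values):
    """Return the nth root of the product of n non-negative numbers."""
    if not values:
        raise ValueError("at least one value is required")
    if any(v < 0 for v in values):
        raise ValueError("negative values are not supported")
    return math.prod(values) ** (1 / len(values))

print(geometric_mean([23, 56, 66]))  # roughly 43.97
print(geometric_mean([0, 23, 56]))   # 0.0, a single zero makes the product zero
```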
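Calculation of Mode:
For ungrouped data: Mode = the value that occurs the maximum number of times
Example: 23, 45, 76, 33, 54, 33, 76, 33. Therefore Mode = 33 (it occurs 3 times)
For grouped data: $\text{Mode} = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times i$
where $\Delta_1 = f_1 - f_0$
and $\Delta_2 = f_1 - f_2$.
Here L is the lower boundary of the modal class, $f_1$ is its frequency, $f_0$ and $f_2$ are the frequencies of the classes before and after it, and i is the class width.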
Nowadays we don’t have to worry about the calculation, as
statistical software such as R or Excel will automatically do the
intensive calculation for large amounts of data.
For more in-depth information you can visit this website:
https://www.mathsisfun.com/data/frequency-grouped-mean-median-mode.html
Mathematically, ‘Delta’ can be represented as Δ.
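Both cases can be sketched in Python. The ungrouped part uses the values from the example above. The grouped part assumes the standard textbook formula Mode = L + Δ1 / (Δ1 + Δ2) × i with Δ1 = f1 - f0 and Δ2 = f1 - f2; the function `grouped_mode` and the frequency table are hypothetical and not taken from the slides.

```python
from statistics import mode

# Ungrouped data from the example above.
values = [23, 45, 76, 33, 54, 33, 76, 33]
print(mode(values))  # 33, which occurs three times

def grouped_mode(lower, width, f0, f1, f2):
    """Estimate the mode of grouped data.

    lower: lower boundary L of the modal class
    width: class width i
    f0, f1, f2: frequencies of the class before the modal class,
                the modal class itself, and the class after it
    """
    delta1 = f1 - f0
    delta2 = f1 - f2
    return lower + delta1 / (delta1 + delta2) * width

# Hypothetical frequency table: modal class 20-30 with frequency 12,
# neighbouring classes with frequencies 7 and 8.
estimate = grouped_mode(lower=20, width=10, f0=7, f1=12, f2=8)
print(round(estimate, 2))
```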
Measures of dispersion
Standard Deviation is a measure of how near or far the
observations are from the mean. It is the square root of the variance.
Variance: is the average of the squared differences between each
observation and the mean. A value of zero means that there is no variability: all the
values in the data set are the same.
Let’s understand this with the help of an example.
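As one possible illustration, here is a minimal Python sketch that reuses the four weights from the earlier example (45, 57, 72 and 52 kg). It relies on the standard-library `statistics` module. Treating the four values as the whole population is an assumption made here; the sample variance is shown alongside for comparison.

```python
from statistics import pstdev, pvariance, variance

weights_kg = [45, 57, 72, 52]

# Population variance: the mean of the squared deviations from the mean (56.5).
population_variance = pvariance(weights_kg)   # 98.25

# Standard deviation: the square root of the variance, about 9.91 kg.
population_sd = pstdev(weights_kg)

# Sample variance divides by n - 1 instead of n.
sample_variance = variance(weights_kg)        # 131

# Identical values have no variability at all.
no_spread = pvariance([52, 52, 52, 52])       # 0

print(population_variance, round(population_sd, 2), sample_variance, no_spread)
```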
Measures of dispersion
Interquartile Range: is a measure of variability
obtained by dividing a sorted data set into parts, i.e. into quartiles.
Let’s say,
Q1 is the middle value in the first half of the data set.
Q2 is the median value.
Q3 is the middle value in the
second half of the data set.
Therefore, interquartile range (IQR) = Q3 – Q1
Example: Interquartile Range:
Assume we have a data set of 9 values:
2, 4, 6, 8, 10, 14, 18, 19, 23.
Here, Median = 10 (Q2).
As each half of the dataset (2, 4, 6, 8 and 14, 18, 19, 23)
has an even number of values, the middle value of each half
is the average of its two middle values.
Q1 = (4+6)/2 = 5, Q3 = (18+19)/2 = 37/2 = 18.5
Therefore, interquartile range (IQR) = Q3 – Q1
= 18.5 – 5 = 13.5
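The quartile steps can be sketched in Python as follows. The sketch assumes the data set is 2, 4, 6, 8, 10, 14, 18, 19, 23, i.e. that the first value is 2 so the lower half is 2, 4, 6, 8, which is what Q1 = (4+6)/2 implies. It uses the "median of each half" method, leaving the overall median out of both halves when the number of values is odd. The function name `quartiles` is illustrative.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # For an odd count, skip the overall median when forming the upper half.
    upper = data[half + 1:] if n % 2 else data[half:]
    return median(lower), median(data), median(upper)

q1, q2, q3 = quartiles([2, 4, 6, 8, 10, 14, 18, 19, 23])
iqr = q3 - q1
print(q1, q2, q3, iqr)  # 5.0 10 18.5 13.5
```

Other software may use a different quartile convention and return slightly different values for the same data.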
Skewness: Refers to the lack of symmetry or imbalance in a data
distribution.
In a symmetric distribution, such as the
normal distribution, the mean,
median and mode are at the same point.
However, real-life data is rarely perfectly
symmetric; such data is called skewed data.
If the left side has the longer tail, then the mass
of the distribution is concentrated on the right
side, which is known as negatively skewed.
If the right side has the longer tail, then the
mass of the distribution is
concentrated on the left side, which is
known as positively skewed.
[Figure: summary of the types of skewness]
Example (skewed data)
Temperature (°C)
10
40
35
33
35
-----------
153 (Total)
Mean = 153/5 = 30.6. Using the mean of 30.6 as the typical value
would be misleading, since four of the five
values are 33 or above.
So we use the median. For ungrouped data, sort the values and take the
((n+1)/2)th term.
That will be the ((5+1)/2)th = 6/2 = 3rd term
of the sorted values 10, 33, 35, 35, 40, which is 35.
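A short Python sketch of the same comparison, using the standard-library `statistics` module. `median` sorts the values internally before picking the middle one.

```python
from statistics import mean, median

temperatures_c = [10, 40, 35, 33, 35]

mean_temp = mean(temperatures_c)      # 30.6, pulled down by the single low reading of 10
median_temp = median(temperatures_c)  # 35, the 3rd of the sorted values 10, 33, 35, 35, 40

print(mean_temp, median_temp)
```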
For grouped data:
$\text{Median} = L + \frac{n/2 - B}{G} \times W$
where L is the lower class boundary of the group containing the median,
n is the total number of observations,
B is the cumulative frequency of the groups before the median group,
G is the frequency of the median group,
W is the width/range of the group.
Again, we don’t have to worry about the calculation, as statistical software such as R or
Excel will automatically do the intensive calculation for large amounts of data. For
more in-depth information you can visit this website:
https://www.mathsisfun.com/data/frequency-grouped-mean-median-mode.html
Kurtosis: is a measure of whether data are peaked or flat relative to the
normal distribution.
(+) Leptokurtic
This means the distribution is more clustered near the mean and has a
relatively smaller standard deviation.
(-) Platykurtic
The distribution is less clustered around the mean and has a standard
deviation larger than a Leptokurtic one.
(0) Mesokurtic
Kurtosis is typically measured with respect to the normal distribution.
A Mesokurtic distribution has tails similar to the normal distribution, i.e. neither high nor low;
it is considered to be the baseline for the other two.
How to check whether the data is skewed or not
in Excel:
=SKEW(select the range of values/numbers)
=SKEW(10.24,9.48……….-0.42,-0.95)
= -0.27, which means negatively skewed.
And to check the Kurtosis in Excel:
=KURT(select the values/numbers)
=KURT(10.24,9.48……….-0.42,-0.95)
= -1.6, which means it is Platykurtic.
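Outside Excel, the same two checks can be sketched in plain Python. The functions below assume that Excel's SKEW and KURT use the usual bias-adjusted sample formulas (KURT returning excess kurtosis, so 0 corresponds to the normal distribution). The function names are illustrative. Because the slide's full list of values is not shown, the temperature data from the earlier example is used instead.

```python
from math import sqrt

def _mean_and_sample_sd(values):
    n = len(values)
    m = sum(values) / n
    s = sqrt(sum((x - m) ** 2 for x in values) / (n - 1))
    return m, s

def skew(values):
    """Bias-adjusted sample skewness (assumed to match Excel's SKEW)."""
    n = len(values)
    if n < 3:
        raise ValueError("skewness needs at least 3 values")
    m, s = _mean_and_sample_sd(values)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in values)

def kurt(values):
    """Bias-adjusted excess kurtosis (assumed to match Excel's KURT)."""
    n = len(values)
    if n < 4:
        raise ValueError("kurtosis needs at least 4 values")
    m, s = _mean_and_sample_sd(values)
    fourth = sum(((x - m) / s) ** 4 for x in values)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    return scale * fourth - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))

temperatures_c = [10, 40, 35, 33, 35]
print(skew(temperatures_c) < 0)   # True: the low reading of 10 gives a longer left tail
print(kurt([1, 2, 3, 4, 5]) < 0)  # True: evenly spread values are flatter than normal
```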
Population versus Sample
Population refers to the whole data,
which is in turn often very difficult to collect and
analyze.
So it is easier to take a subset of the
population, referred to as a sample,
that represents the population and
can provide similar insights.
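The idea can be sketched with simulated data in Python. Everything in the sketch is an assumption made for illustration: the population is 10,000 randomly generated weights (not data from the slides), and the sample is 100 of them drawn at random.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 simulated weights in kg.
population = [random.gauss(56.5, 10) for _ in range(10_000)]

# A sample is a randomly chosen subset of the population.
sample = random.sample(population, k=100)

# The sample mean is usually close to, but not exactly, the population mean.
print(round(mean(population), 1), round(mean(sample), 1))
```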
What is probability?
It is the quality or state of being probable, i.e.
how likely it is that something will
happen.
Simply, it is the measure of the likelihood
that an event will occur.
Probability is quantified as a number
between 0 and 1.
Example: if the probability is ½, i.e. 0.5 (or 50%),
then event A and its alternative, event B, are
equally likely to occur.
Probability
Formula for calculating probability:
P(A) = number of outcomes in event A / total number of
possible outcomes (assuming all outcomes are equally likely)
Example:
A bag has 10 balls, of which 6 are blue,
3 are red and 1 is green. What is the probability of drawing a
particular ball?
Probability of drawing a blue ball = 6/10=0.6
Probability of drawing a red ball = 3/10 = 0.3
Probability of drawing a green ball = 1/10 = 0.1
Probability of drawing a ball not green = 9/10 = 0.9
Or
1 – ( probability of drawing a green ball ) = 1-0.1 = 0.9
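The ball example can be reproduced with a small Python sketch. It uses `fractions.Fraction` so that the results stay exact; the helper `probability` is illustrative and assumes every ball is equally likely to be drawn.

```python
from fractions import Fraction

# Ball counts from the example above.
balls = {"blue": 6, "red": 3, "green": 1}
total = sum(balls.values())

def probability(colour):
    """Favourable outcomes divided by the total number of outcomes."""
    return Fraction(balls[colour], total)

p_blue = probability("blue")            # 6/10
p_red = probability("red")              # 3/10
p_green = probability("green")          # 1/10
p_not_green = 1 - p_green               # complement rule: 9/10

print(float(p_blue), float(p_red), float(p_green), float(p_not_green))
```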
Probability with and without replacement
Without replacement
Say a bag of 10 marbles contains 2 marbles of the colour we want,
and we take 2 marbles from it, one after another. We don’t put the first
marble back after picking it up. So what is the
probability that both marbles are of that colour?
P = (2/10 * 1/9), because only 9 marbles remain for the second draw.
With replacement
Now if we put the marble back, what will be the
probability that both marbles are of that colour?
P = (2/10 * 2/10)
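A minimal Python sketch of the two cases, assuming that 2 of the 10 marbles are of the colour of interest (the reading suggested by the factor 2/10). Without replacement, the second draw is made from the 9 remaining marbles, of which 1 is favourable, so the second factor is 1/9.

```python
from fractions import Fraction

total = 10       # marbles in the bag
favourable = 2   # marbles of the colour of interest (an assumption)

# Without replacement: one favourable marble and one marble overall are gone.
without_replacement = Fraction(favourable, total) * Fraction(favourable - 1, total - 1)

# With replacement: the bag is unchanged for the second draw.
with_replacement = Fraction(favourable, total) * Fraction(favourable, total)

print(without_replacement, with_replacement)  # 1/45 1/25
```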
Some important concepts.
Independent events: two events are said to be
independent of each other when the
occurrence of one event in no way affects
the probability of the other event occurring.
Example: when you roll a die and flip a coin,
the two events are independent of each other and
therefore do not affect each other.
Mutually Exclusive Events: if A happens, B cannot
happen. A clear example is the set of outcomes of a
single coin toss, which can result in either heads or
tails, but not both.
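Both concepts can be checked by listing every outcome of the die-and-coin example. The Python sketch below assumes a fair six-sided die and a fair coin; the helper `prob` is illustrative. For independent events, P(A and B) equals P(A) × P(B); for mutually exclusive events, P(A and B) is 0.

```python
from fractions import Fraction
from itertools import product

die = [1, 2, 3, 4, 5, 6]
coin = ["heads", "tails"]
outcomes = list(product(die, coin))  # 12 equally likely (die, coin) pairs

def prob(event):
    """Share of outcomes for which the event function returns True."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

# Independent events: rolling a six and flipping heads.
p_six = prob(lambda o: o[0] == 6)                         # 1/6
p_heads = prob(lambda o: o[1] == "heads")                 # 1/2
p_both = prob(lambda o: o[0] == 6 and o[1] == "heads")    # 1/12
print(p_both == p_six * p_heads)  # True

# Mutually exclusive events: one toss cannot be heads and tails at once.
p_heads_and_tails = prob(lambda o: o[1] == "heads" and o[1] == "tails")
print(p_heads_and_tails)  # 0
```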
Random Variable
Is a variable whose values are the outcomes of a random phenomenon.
There are two types of random variables:
A discrete random variable is one which may take on only
a countable number of distinct values such as 2, 3, 4.
A continuous random variable is one which can take an infinite
number of possible values, usually measurements like
height, weight or amount. For example, a weight
between 45 and 46 kilos can be
45.1, 45.2 … 45.99999 and so on, with infinitely many values before 46.
In mathematics, there are infinitely many decimals between any two
numbers.
Probability distribution
A probability distribution is a depiction of all possible
values that a random variable can take within a given
range, together with their likelihoods.
So a probability distribution covers all possible
outcomes of an experiment repeated n times.
A discrete probability distribution lists the probability
of each discrete value.
A continuous probability distribution describes a variable
which can take an infinite number of possible values.
We will study those in depth in our next slides.
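As a rough preview, the two kinds of distribution can be sketched in Python. Both examples are assumptions chosen for illustration and are not from the slides: a fair six-sided die for the discrete case, and a weight spread uniformly between 45 and 46 kg for the continuous case.

```python
from fractions import Fraction

# Discrete: every possible value is listed with its probability.
die_distribution = {face: Fraction(1, 6) for face in range(1, 7)}
total_probability = sum(die_distribution.values())  # always 1

# Continuous: probabilities belong to intervals, not to single values.
def uniform_probability(a, b, low=45.0, high=46.0):
    """P(a <= X <= b) for X uniform on [low, high]."""
    a, b = max(a, low), min(b, high)
    return max(b - a, 0.0) / (high - low)

print(total_probability)                 # 1
print(uniform_probability(45.1, 45.2))   # about 0.1
print(uniform_probability(45.5, 45.5))   # 0.0, a single exact value has probability zero
```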
Recap
What have we learned so far?
○ Descriptive statistics, Inferential statistics.
○ Probability.
○ Continuous and Discrete Random Variables.
○ Skewness.
Next
In detail about probability distributions and
how they are used in predicting a future event.