Types of Statistics

RupakRoy4, 29 slides, Jan 13, 2022

About This Presentation

Detailed discussion of the types of statistics, from Measures of Central Tendency, Measures of Dispersion, Skewness, Kurtosis and Probability Distributions to much more, with their use cases.


Slide Content

Statistics-I
Types of Statistics

Types
Statistics is a branch of mathematics dealing
with the collection, organization, analysis,
interpretation and presentation of data.
There are two types:
1. Descriptive statistics
2. Inferential statistics
Rupak Roy

Definition
Inferential statistics: uses a sample of data
to represent the general population.
In simple words, it draws conclusions about the
population by taking samples of data from it.

Definition
Summary statistics are used to summarize
a set of observations in order to
communicate as much information
about the data as simply as possible. They are
part of descriptive statistics, which is used to
summarize or describe a set of observations,
and the two terms are often used interchangeably.

Example
If the weights of a population are
45 kg
57 kg
72 kg
52 kg
how can we summarize the weight of this
population? We can say the average weight of
the population is (45 + 57 + 72 + 52) / 4 = 226 / 4 = 56.5 kg,
which describes the population in the simplest
possible way.
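As a quick check of the arithmetic, here is a minimal Python sketch using the standard-library `statistics` module; the variable name `weights_kg` is only illustrative.

```python
from statistics import mean

# Weights (kg) from the example above
weights_kg = [45, 57, 72, 52]

average = mean(weights_kg)  # (45 + 57 + 72 + 52) / 4
print(average)  # 56.5
```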
Now let’s study the summary statistics in detail.

Types
Summary statistics
Measures of Central Tendency
1. Mean
2. Median
3. Mode
4. Geometric Mean
Measures of Dispersion
1. Standard Deviation
2. Variance
3. Interquartile Range
Others
1. Coefficient
2. Skewness
3. Kurtosis
4. Probability Distributions
5. Distribution plot

Definitions
○ Measures of central tendency: a single value that describes
the central position around which a group of data clusters. In
simple words, it is a way to describe the center of a data set.
What, then, is the center of the data? A single number that
summarizes the entire dataset, computed using techniques such as
the mean (average) or the median of the dataset.
○ Measures of Dispersion: “Dispersion (also called
variability, scatter or spread) is the extent to which a distribution
is stretched or squeezed.”
In the graph, we can see that the
distribution of the data (assume it is the population)
is more stretched on the right side,
ranging from 50 to 80.

Measures of Central Tendency
1. Mean: the average of the observations (their sum divided by their count).
Most effective when the data are not heavily skewed.
2. Median: the middle value of the dataset once it is sorted.
Useful for skewed data.
We will talk about skewed data in the upcoming slides.
3. Mode: the value that occurs the maximum number of times
in the dataset.
4. Geometric mean: the nth root of the product of n numbers. It
is used to find the average rate of an event when the
rate is determined by multiplication (for example, compound growth rates).
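A minimal sketch of the first three measures using Python's standard-library `statistics` module; the data are the same values used in the mode example of this section, chosen only for illustration.

```python
from statistics import mean, median, mode

# Illustrative data: the same values as the mode example in this section
data = [23, 45, 76, 33, 54, 33, 76, 33]

print(mean(data))    # 46.625: the sum (373) divided by the count (8)
print(median(data))  # 39.0: average of the two middle sorted values, 33 and 45
print(mode(data))    # 33: the value that occurs most often (three times)
```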

○ Formula for calculating Geometric Mean
$GM = \sqrt[n]{x_1 \times x_2 \times \cdots \times x_n}$
For two numbers, for example 23 and 56, the geometric mean is $\sqrt{23 \times 56}$.
Example: what is the Geometric Mean of 23, 56 and 66?
$GM = \sqrt[3]{23 \times 56 \times 66} = \sqrt[3]{85008} = 43.9696761$, which means 43.9696761 multiplied by itself three times ($43.9696761^3$) is 85008.
Note:
If one of the observations is zero, the Geometric
Mean becomes zero, and it does not work with
negative numbers such as -1, -4, -5 and so on.
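A minimal sketch of the formula, assuming Python 3.8+ (for `math.prod`). The helper name `geometric_mean` is hypothetical; it is written by hand so that the zero and negative cases from the note are explicit.

```python
import math

def geometric_mean(values):
    """nth root of the product of n values.

    Returns 0.0 if any value is zero and rejects negative values,
    following the note above.
    """
    if not values:
        raise ValueError("need at least one value")
    if any(v < 0 for v in values):
        raise ValueError("negative values are not supported")
    product = math.prod(values)  # e.g. 23 * 56 * 66 = 85008
    return product ** (1 / len(values))

print(round(geometric_mean([23, 56, 66]), 7))  # 43.9696761
print(geometric_mean([23, 56]))                # square root of 23 * 56 = 1288
print(geometric_mean([23, 0, 66]))             # 0.0: one zero makes the product zero
```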

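Calculation of Mode
For ungrouped data: the value that occurs the maximum number of times.
Example: 23, 45, 76, 33, 54, 33, 76, 33. Therefore Mode = 33 (it occurs three times).
For grouped data: $Mode = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times i$

Where $\Delta_1 = f_1 - f_0$
and $\Delta_2 = f_1 - f_2$;
L is the lower class boundary of the modal group (the group with the highest frequency), f1 is its frequency, f0 and f2 are the frequencies of the groups just before and after it, and i is the group width.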
Nowadays, we don't have to worry about the calculation, as statistical software such as R or Excel
performs these intensive calculations automatically, even for large amounts of data.
For more in-depth information you can visit this website:
https://www.mathsisfun.com/data/frequency-grouped-mean-median-mode.html
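The grouped-data formula for the mode can be sketched in Python as below. The function name `grouped_mode` and the frequency table are hypothetical, made up for illustration. The sketch assumes equal-width groups, picks the first group if several share the highest frequency, treats a missing neighbouring group as having frequency 0, and assumes the modal group is strictly more frequent than at least one neighbour (otherwise the denominator is zero).

```python
def grouped_mode(lower_bounds, freqs, width):
    """Estimate the mode of grouped data: L + d1 / (d1 + d2) * width.

    lower_bounds[k] is the lower class boundary of group k, freqs[k] its frequency.
    """
    k = max(range(len(freqs)), key=lambda j: freqs[j])  # modal group (first one on ties)
    f1 = freqs[k]
    f0 = freqs[k - 1] if k > 0 else 0               # group before (assumed 0 if none)
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0  # group after (assumed 0 if none)
    d1, d2 = f1 - f0, f1 - f2                       # Delta 1 and Delta 2
    return lower_bounds[k] + d1 / (d1 + d2) * width

# Hypothetical frequency table for the groups 0-10, 10-20, 20-30, 30-40, 40-50
lower_bounds = [0, 10, 20, 30, 40]
freqs = [5, 8, 12, 7, 3]
print(round(grouped_mode(lower_bounds, freqs, 10), 2))  # 24.44, i.e. 20 + 4/9 * 10
```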

Measures of dispersion
Standard Deviation measures how near or far the
observations are from the mean; it is the square root of the variance.
Variance: the average of the squared deviations of the observations from the mean,
i.e. a measure of how different or spread out the values are. A value of zero means that there is no variability: all the
values in the data set are the same.
Let’s understand this with the help of an example.
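As an illustration, the following minimal Python sketch computes the variance and standard deviation of the weights from the earlier example (45, 57, 72 and 52 kg), first by hand and then with the standard-library `statistics` module. `pvariance`/`pstdev` divide by n (population), while `variance`/`stdev` divide by n - 1 (sample).

```python
import math
from statistics import pstdev, pvariance, stdev, variance

weights_kg = [45, 57, 72, 52]  # weights from the earlier example

# By hand: average of the squared deviations from the mean
m = sum(weights_kg) / len(weights_kg)              # 226 / 4 = 56.5
squared_devs = [(x - m) ** 2 for x in weights_kg]  # 132.25, 0.25, 240.25, 20.25
pop_var = sum(squared_devs) / len(weights_kg)      # 393 / 4 = 98.25
pop_sd = math.sqrt(pop_var)                        # square root of the variance

# The statistics module agrees for the population versions (divide by n)...
print(math.isclose(pop_var, pvariance(weights_kg)), math.isclose(pop_sd, pstdev(weights_kg)))  # True True

# ...while the sample versions divide by n - 1
sample_var = variance(weights_kg)  # 393 / 3 = 131
sample_sd = stdev(weights_kg)      # square root of 131
print(pop_var, round(pop_sd, 2), sample_var, round(sample_sd, 2))
```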


Measures of dispersion
Interquartile Range (IQR): a measure of variability
obtained by dividing a sorted data set into four parts, i.e. quartiles.
Let's say,
Q1 is the middle value in the first (lower) half of the data set.
Q2 is the median value.
Q3 is the middle value in the
second (upper) half of the data set.
Therefore, interquartile range (IQR) = Q3 – Q1

Example: Interquartile Range:
Assume we have a data set of 9 values:
6, 4, 6, 8, 10, 14, 18, 19, 23.
Sorted, this is 4, 6, 6, 8, 10, 14, 18, 19, 23, so the median is 10 (Q2).
The lower half (4, 6, 6, 8) and the upper half (14, 18, 19, 23) each have an
even number of values, so each quartile is the average of
the two middle values of its half.
Q1 = (6 + 6)/2 = 6, Q3 = (18 + 19)/2 = 37/2 = 18.5
Therefore, interquartile range (IQR) = Q3 – Q1
= 18.5 – 6 = 12.5
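The same procedure as a minimal Python sketch (the function name `iqr_median_of_halves` is hypothetical). It follows the median-of-halves rule used above; spreadsheet and library quartile functions often use interpolation rules instead, so they can return slightly different quartiles for the same data.

```python
from statistics import median

def iqr_median_of_halves(values):
    """Sort, split around the median, and take the median of each half."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]            # values below the median position
    upper = data[half + (n % 2):]  # values above it (skip the median itself when n is odd)
    q1, q3 = median(lower), median(upper)
    return q1, q3, q3 - q1

q1, q3, iqr = iqr_median_of_halves([6, 4, 6, 8, 10, 14, 18, 19, 23])
print(q1, q3, iqr)  # 6.0 18.5 12.5
```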

Skewness: refers to the lack of symmetry, or imbalance, in a data
distribution.
In a symmetric distribution, such as the normal distribution,
the mean, median and mode are at the same point.
However, real-life data is rarely perfectly symmetric;
data with an asymmetric distribution is called skewed data.
If the left side has a longer tail, then the mass of the
distribution is concentrated on the right
side; this is known as negatively skewed.


If the right side has a longer tail, then the
mass of the distribution is
concentrated on the left side; this is
known as positively skewed.
The figure below summarizes the types of skewness.

Example (skewed data)
Temperature (°C)
10
40
35
33
35
-----------
153 (Total)
Mean = 153/5 = 30.6. Using the mean of 30.6 would be misleading,
since four of the five values lie between 33 and 40; the single
low value of 10 pulls the mean down.
So we use the median instead. For ungrouped data it is the
((n+1)/2)th value of the sorted data.
That is ((5+1)/2)th = 6/2 = 3rd value
of the sorted list 10, 33, 35, 35, 40, which is 35.
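A minimal Python sketch comparing the two measures on the temperature data, using the standard-library `statistics` module:

```python
from statistics import mean, median

temps_c = [10, 40, 35, 33, 35]  # temperatures (°C) from the example

print(mean(temps_c))    # 30.6, pulled down by the single low value 10
print(median(temps_c))  # 35, the 3rd value of the sorted list [10, 33, 35, 35, 40]
```

`median` sorts the data internally, so the values do not need to be sorted beforehand.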
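For grouped data: $Median = L + \frac{n/2 - B}{G} \times w$
Where L is the lower class boundary of the group containing the median,
n is the total number of values,
B is the cumulative frequency of the groups before the median group,
G is the frequency of the median group,
and w is the width (range) of the group.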
Again, we don't have to worry about the calculation, as statistical software such as R or
Excel performs these intensive calculations automatically, even for large amounts of data. For
more in-depth information you can visit this website:
https://www.mathsisfun.com/data/frequency-grouped-mean-median-mode.html
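A minimal sketch of the grouped-data median formula in Python. The function name `grouped_median` and the frequency table are hypothetical, and the sketch assumes equal-width groups listed in increasing order.

```python
def grouped_median(lower_bounds, freqs, width):
    """Estimate the median of grouped data: L + (n/2 - B) / G * w."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("empty frequency table")
    cumulative = 0  # B: total frequency of the groups before the current one
    for lower, group_freq in zip(lower_bounds, freqs):
        if cumulative + group_freq >= n / 2:  # this group contains the middle position
            return lower + (n / 2 - cumulative) / group_freq * width
        cumulative += group_freq

# Hypothetical frequency table for the groups 0-10, 10-20, 20-30, 30-40, 40-50
lower_bounds = [0, 10, 20, 30, 40]
freqs = [5, 8, 12, 7, 3]
print(grouped_median(lower_bounds, freqs, 10))  # 23.75: n = 35, median group 20-30, B = 13, G = 12
```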

Kurtosis: a measure of whether data are peaked or flat (heavy- or light-tailed) relative to the
normal distribution. It is usually reported as excess kurtosis, for which the normal distribution scores 0:
(+) Leptokurtic
(-) Platykurtic
(0) Mesokurtic
(+) Leptokurtic
The distribution has a sharper peak, with more of the data clustered near the mean,
and heavier tails than the normal distribution.
(-) Platykurtic
The distribution is flatter and less clustered around the mean, with lighter
tails than a Leptokurtic (or normal) distribution.
(0) Mesokurtic is typically measured with respect to the normal distribution.
A Mesokurtic distribution has tails similar to the normal distribution, i.e. neither heavy nor light;
it is considered the baseline for the other two.

Now, how do we check whether data is skewed
in Excel?
=SKEW(select the range of values/numbers)
=SKEW(10.24,9.48……….-0.42,-0.95)
= -0.27, which means negatively skewed.
And to check the Kurtosis in Excel:
=KURT(select the range of values/numbers)
=KURT(10.24,9.48……….-0.42,-0.95)
= -1.6, which means it is Platykurtic (KURT returns excess kurtosis, so a normal distribution scores 0).
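Outside Excel, the same statistics can be computed from the formulas Excel documents for `SKEW` (adjusted sample skewness) and `KURT` (sample excess kurtosis). This is a minimal Python sketch; the function names `excel_skew` and `excel_kurt` are hypothetical, and the test data are made up because the full data set above is not given.

```python
from statistics import mean, stdev

def excel_skew(values):
    """Sample skewness, same formula as Excel's SKEW (needs n >= 3)."""
    n = len(values)
    m, s = mean(values), stdev(values)  # stdev is the sample standard deviation (n - 1)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in values)

def excel_kurt(values):
    """Sample excess kurtosis, same formula as Excel's KURT (needs n >= 4)."""
    n = len(values)
    m, s = mean(values), stdev(values)
    a = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    b = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return a * sum(((x - m) / s) ** 4 for x in values) - b

print(abs(excel_skew([1, 2, 3, 4, 5])) < 1e-9)  # True: symmetric data has zero skew
print(round(excel_kurt([1, 2, 3, 4, 5]), 6))    # -1.2: flatter than normal (platykurtic)
print(excel_skew([1, 1, 1, 2, 10]) > 0)         # True: long right tail, positively skewed
```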

Population versus Sample
Population refers to the whole set of data,
which is often very difficult to collect and
analyze.
So it is easier to take a subset of the
population, referred to as a sample, that
represents the population and, if chosen well (for example, at random),
provides similar valuable insights.
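A minimal Python sketch of drawing a simple random sample; the population here is hypothetical (the whole numbers 1 to 10,000), and the seed is fixed only so the run is repeatable.

```python
import random
from statistics import mean

# Hypothetical population: the whole numbers 1 to 10,000 (true mean 5000.5)
population = list(range(1, 10_001))

rng = random.Random(42)               # fixed seed so the run is repeatable
sample = rng.sample(population, 500)  # simple random sample, drawn without replacement

print(mean(population))  # 5000.5
print(mean(sample))      # close to, but usually not exactly, the population mean
```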

What is probability?
It is the quality or state of being probable:
the extent to which something is likely to
happen.
Simply, it is the measure of the likelihood
that an event will occur.
Probability is quantified as a number
between 0 and 1.
Example: a probability of ½, i.e. 0.5 (or 50%), means an equal
likelihood that event A or event B will
occur, such as heads or tails when tossing a fair coin.

Probability
Formula for calculating probability:
P(A) = number of outcomes favorable to event A / total number of
possible outcomes (when all outcomes are equally likely)
Example:
A bag has 10 balls, of which 6 are blue,
3 are red and 1 is green. What is the probability of drawing a
ball of each color?
Probability of drawing a blue ball = 6/10=0.6
Probability of drawing a red ball = 3/10 = 0.3
Probability of drawing a green ball = 1/10 = 0.1
Probability of drawing a ball not green = 9/10 = 0.9
Or
1 – ( probability of drawing a green ball ) = 1-0.1 = 0.9
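The bag example can be reproduced exactly with Python's `fractions.Fraction`, which keeps probabilities as exact ratios; a minimal sketch, where the dictionary `bag` and the helper `p` are illustrative names.

```python
from fractions import Fraction

bag = {"blue": 6, "red": 3, "green": 1}
total = sum(bag.values())  # 10 balls

def p(color):
    """P(color) = favorable outcomes / total outcomes (each ball equally likely)."""
    return Fraction(bag[color], total)

print(p("blue"), p("red"), p("green"))  # 3/5 3/10 1/10
print(1 - p("green"))                   # 9/10, the complement rule
```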

Probability with and without replacement
Without Replacement
Say a bag of 10 marbles contains 2 marbles of one color (say blue), and we take 2
marbles one after another without putting the first
marble back. What is the
probability that both marbles are blue?
P = 2/10 × 1/9 = 2/90 = 1/45 ≈ 0.022
(after the first blue marble is removed, only 1 blue marble is left among 9)
With replacement
Now if we put the marble back, what will be the
probability that both marbles are blue?
P = 2/10 × 2/10 = 4/100 = 0.04
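Both answers can be checked by brute-force enumeration of every ordered draw; a minimal Python sketch, assuming a bag of 10 marbles of which 2 are of the color of interest (labelled "blue" here only for illustration).

```python
from fractions import Fraction
from itertools import permutations, product

# Assumed bag: 10 marbles, of which 2 are blue (indexes 0 and 1)
marbles = ["blue"] * 2 + ["other"] * 8

def both_blue(pair):
    return marbles[pair[0]] == "blue" and marbles[pair[1]] == "blue"

# Without replacement: ordered pairs of two different marbles (10 * 9 = 90 outcomes)
without_repl = list(permutations(range(10), 2))
p_without = Fraction(sum(map(both_blue, without_repl)), len(without_repl))

# With replacement: the same marble may be drawn twice (10 * 10 = 100 outcomes)
with_repl = list(product(range(10), repeat=2))
p_with = Fraction(sum(map(both_blue, with_repl)), len(with_repl))

print(p_without, p_with)  # 1/45 1/25
```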

Some important concepts.
Independent events: two events are independent
if the occurrence of one in no way affects
the probability of the other occurring, so that P(A and B) = P(A) × P(B).
Example: when you roll a die and flip a coin,
the two events are independent of each other;
the result of one does not affect the other.

Mutually Exclusive Events: if A happens, B cannot
happen. A clear example is the set of outcomes of a
single coin toss, which can result in either heads or
tails, but not both.
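Both definitions can be checked by enumerating a sample space; a minimal Python sketch for a fair die and a fair coin, where the helper names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Sample space for rolling a fair die and flipping a fair coin: 6 * 2 = 12 equally likely outcomes
space = list(product(range(1, 7), ["H", "T"]))

def prob(event):
    """Probability of an event, given as a True/False test on one outcome."""
    return Fraction(sum(1 for outcome in space if event(outcome)), len(space))

def is_six(outcome):
    return outcome[0] == 6

def is_heads(outcome):
    return outcome[1] == "H"

def is_tails(outcome):
    return outcome[1] == "T"

# Independent events: P(six and heads) equals P(six) * P(heads) = 1/6 * 1/2 = 1/12
print(prob(lambda o: is_six(o) and is_heads(o)) == prob(is_six) * prob(is_heads))  # True

# Mutually exclusive events: heads and tails cannot happen on the same toss
print(prob(lambda o: is_heads(o) and is_tails(o)))  # 0
print(prob(lambda o: is_heads(o) or is_tails(o)) == prob(is_heads) + prob(is_tails))  # True
```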

Random Variable
A random variable is a variable whose values are the outcomes of a random phenomenon.
There are two types of random variables:
A discrete random variable is one which may take on only
a countable number of distinct values, such as 2, 3, 4.
A continuous random variable is one which can take an infinite
number of possible values; these are usually measurements like
height, weight or amount. For example, a weight between 45 and 46 kilos
could be 45.1, 45.2, …, 45.99999 and so on, up to 46,
because in mathematics there are infinitely many decimal numbers between any two
numbers.
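A minimal Python sketch contrasting the two kinds of random variable by simulation. The fair die and the uniform distribution between 45 and 46 kg are illustrative assumptions, not part of the original example.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

# Discrete random variable: a fair die can take only the distinct values 1, 2, ..., 6
die_rolls = [rng.randint(1, 6) for _ in range(10)]

# Continuous random variable: a weight anywhere between 45 and 46 kg
# (modelled as uniform purely as an illustrative assumption)
weights_kg = [rng.uniform(45, 46) for _ in range(3)]

print(die_rolls)                          # whole numbers from 1 to 6 only
print([round(w, 5) for w in weights_kg])  # decimal values anywhere in the 45-46 range
```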

Probability distribution
A probability distribution is a depiction of all possible
values that a random variable can take within a given range,
together with their likelihoods.
So a probability distribution covers all possible
outcomes of an experiment and how likely each one is.
A discrete probability distribution lists each
discrete value with its probability.
A continuous probability distribution is the one for a variable
which can take an infinite number of possible values.
We will study those in depth in our next slides.
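As a small illustration of a discrete probability distribution, this Python sketch lists every possible sum of two fair dice with its exact probability (the two-dice experiment is an illustrative choice).

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Every outcome of rolling two fair dice (36 equally likely pairs)
outcomes = list(product(range(1, 7), repeat=2))
counts = Counter(a + b for a, b in outcomes)

# Discrete probability distribution of the sum: each possible value with its probability
distribution = {total: Fraction(c, len(outcomes)) for total, c in sorted(counts.items())}

for total, p in distribution.items():
    print(total, p)  # one line per sum; the most likely sum is 7, with probability 1/6
print(sum(distribution.values()))  # 1: the probabilities of all possible values add up to 1
```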

Recap
What have we learned so far?
○ Descriptive statistics, Inferential statistics.
○ Probability.
○ Continuous and Discrete Random Variables.
○ Skewness.

Next
Probability distributions in detail, and
how they are used in predicting future events.

To be continued.