Slide 1zjzckkasjasfjsajkfakjlasfasajfdfjaksdffj.pdf

BirBetalMatketing 8 views 36 slides May 08, 2024


Slide Content

Research needs good understanding of data analysis
Vikash Raj Satyal
Summarize your Data:

What to look for in the dataset?
If our study has a large data set, we (researchers) are interested to know:
•What the central value is,
•What the spread from the center is,
•What the shape and size of the data distribution are

Major economic dataset
Questions
•What is per capita GDP?
•Whose per capita GDP is this?
•Did you earn $1191 in this FY 142920 (Rs. 126,018)? (Rs. 11,910 monthly)

•Nepal's per capita GDP is about 55 times lower than that of the USA, and 165 times lower than that of Monaco

Nepali Database

Research Paradigm
1. Set up research hypothesis / refine (literature review)
2. Develop instruments
3. Survey (collect data)
4. Statistical analysis
5. There is not enough evidence to support the research (alternative) hypothesis (H_A) = failure of research hypothesis
5a. Report writing
6. Research hypothesis accepted: H_A is true
7. Report writing

Why do Dolpa and Mugu also have the highest annual growth rates?
Why do Achham and Palpa have some of the lowest growth rates?

•What is the general IQ of US university students?
•In the US the mean IQ for persons completing no more than a…..
•Bachelor’s degree 113 (80th centile)
•Master’s degree 117 (87th centile)
•PhD, LLD, MD 124 (95th centile)

Central Tendency in large sample data
In any large data set, the data cluster around a center, so researchers focus on finding that central value.
Depending on the shape of the data distribution, the center is calculated with a different statistical formula.

When….

Statistical way of measuring
the center of a data set
•Mean(AM, GM, HM, Weighted mean)
•Median
•Mode
•Partition values

Median, not mean, for:
(i) Open-end classes.
(ii) Data tables with unequal class intervals.
(iii) Data with several extreme values (outliers).
(iv) Qualitative data (in frequency).
(v) Data that strongly lack normality.
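
The outlier case can be illustrated with a minimal Python sketch. The income values below are made up for illustration and do not come from the slides: one extreme observation drags the mean far from the bulk of the data, while the median stays at the middle observation.

```python
from statistics import mean, median

# Hypothetical monthly incomes (in thousands); the last value is an outlier.
incomes = [10, 12, 13, 15, 200]

income_mean = mean(incomes)      # pulled upward by the single outlier
income_median = median(incomes)  # the middle observation, unaffected by it

print("mean:", income_mean)      # mean: 50
print("median:", income_median)  # median: 13
```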

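Quantiles or partition values
•Quartiles (3 cut points, 4 equal parts)
•Deciles (9 cut points, 10 equal parts)
•Percentiles (99 cut points, 100 equal parts)
•Quintiles (4 cut points, 5 equal parts; not to be confused with "quantile")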
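
As a rough sketch, the partition values can be computed with `statistics.quantiles` from the Python standard library. The data (the integers 1 to 100) are an assumption chosen only for illustration; `n` is the number of equal parts, so the function returns `n - 1` cut points.

```python
from statistics import quantiles

# Hypothetical data: the integers 1..100 (for illustration only).
data = list(range(1, 101))

quartiles = quantiles(data, n=4)      # 3 cut points -> 4 equal parts
quintiles = quantiles(data, n=5)      # 4 cut points -> 5 equal parts
deciles = quantiles(data, n=10)       # 9 cut points -> 10 equal parts
percentiles = quantiles(data, n=100)  # 99 cut points -> 100 equal parts

for name, cuts in [("quartiles", quartiles), ("quintiles", quintiles),
                   ("deciles", deciles), ("percentiles", percentiles)]:
    print(name, len(cuts))

# The second quartile (Q2) is the median.
print("Q2 =", quartiles[1])  # Q2 = 50.5
```

Different software packages interpolate between observations in slightly different ways, so cut points computed elsewhere may differ a little from these.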

Mode is the most frequently occurring value
•Less used
•Popular in business and industry
•The only way to locate a central value when data are nominal
(How many of type A were sold? What is the most preferred flavor of ice cream?)

Mode & symmetry

Which average is better?
AM is best for interval data; however, it should not be used:
•For highly skewed data
•In open-end classes
•When there are very large and very small items (outliers)
•For averaging ratios and rates of change
Median is the best average for:
•Open-end classes
•Skewed data or data with outliers
•Ordinal qualitative data, e.g. less honest, honest, very honest
Mode is used for qualitative nominal data and is frequently used in business and industry
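
To illustrate why the arithmetic mean is a poor choice for rates of change, here is a small Python sketch with made-up growth factors (an assumption, not data from the slides). A quantity that doubles and then halves has not changed overall; the geometric mean reflects this, while the arithmetic mean suggests growth.

```python
from statistics import mean, geometric_mean

# Hypothetical growth factors: +100% in year 1, then -50% in year 2.
factors = [2.0, 0.5]

arithmetic = mean(factors)           # 1.25: wrongly suggests 25% growth per year
geometric = geometric_mean(factors)  # about 1.0: no net change per year

start = 100.0
end = start * factors[0] * factors[1]  # the value is back where it started

print(arithmetic, round(geometric, 6), end)
```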

Do the shape and size of the data matter?
•Elongation of the left or right tail is skewness
•Skewness describes a dataset's symmetry, or lack of symmetry.
•A perfectly symmetrical data set will have a skewness of 0.

Skewness
•Negative (left) skewness indicates more small values (on the left tail)
•Positive (right) skewness indicates more large values (on the right tail)

•Kurtosis measures extreme values in either tail.
•The normal curve has zero excess kurtosis (its kurtosis is 3).
•Kurtosis is measured by comparison with the normal curve.
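
A minimal sketch of how skewness and excess kurtosis can be computed from central moments. The helper names and the small data sets are hypothetical, and the formulas are the plain moment-based (population) versions; spreadsheet functions typically apply small-sample corrections, so their results will differ slightly.

```python
from statistics import fmean

def central_moment(data, k):
    """k-th central moment: the mean of (x - mean)**k."""
    m = fmean(data)
    return fmean([(x - m) ** k for x in data])

def skewness(data):
    """Moment-based skewness: m3 / m2**1.5 (0 for symmetric data)."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def excess_kurtosis(data):
    """Moment-based kurtosis minus 3, so a normal curve scores 0."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]      # hypothetical, perfectly symmetric
right_skewed = [1, 1, 1, 2, 10]  # hypothetical, long right tail
left_skewed = [-x for x in right_skewed]

print(skewness(symmetric))         # 0.0
print(skewness(right_skewed) > 0)  # True
print(skewness(left_skewed) < 0)   # True
```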

Calculation in EXCEL:
Statistics for lexp (life expectancy) using NHDR2014.xlsx

Calculation in EXCEL:
Statistics for price using auto.xlsx

Use the nhdr2014 data to calculate the following:
1. Average life expectancy (‘life’)
2. Average GDP per capita (‘income’)
3. Average life expectancy (‘life’) of the 3 ecologies (e.g., average life(mountain) = ….)
4. Calculate Q1, Q2, Q3 of ‘income’
5. Using the 3 quartiles of ‘income’, we can divide any other data into 4 equal parts.
Make a new variable, call it ‘groups’, that will have 4 value labels according to
the criteria below:
‘poor’ if below Q1
‘below average’ if between Q1 and Q2
‘above average’ if between Q2 and Q3
‘rich’ if above Q3
6. Find the average of ‘life’ and ‘hdi’ for each of the 4 groups of this newly created variable
7. How many ‘districts’ fall in each of these ‘groups’? And which districts have the
highest and lowest ‘life’ values in each of these 4 ‘groups’?
8. Save this data for your future use
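
Steps 4 to 7 of the exercise could be sketched in Python as follows. The rows are made-up stand-ins for nhdr2014 (the district names `D1` to `D8` and all values are hypothetical), and the handling of values that fall exactly on a quartile is an assumption, since the criteria above do not specify it.

```python
from collections import defaultdict
from statistics import mean, quantiles

# Made-up records standing in for rows of nhdr2014 (NOT real figures).
rows = [
    {"district": "D1", "income": 500, "life": 60},
    {"district": "D2", "income": 700, "life": 62},
    {"district": "D3", "income": 900, "life": 64},
    {"district": "D4", "income": 1100, "life": 66},
    {"district": "D5", "income": 1300, "life": 68},
    {"district": "D6", "income": 1500, "life": 70},
    {"district": "D7", "income": 1700, "life": 72},
    {"district": "D8", "income": 1900, "life": 74},
]

# Step 4: the three quartiles of 'income'.
q1, q2, q3 = quantiles([r["income"] for r in rows], n=4)

def income_group(income):
    """Step 5: label by quartile. Assumption: a value equal to a
    cut point is placed in the upper of the two groups."""
    if income < q1:
        return "poor"
    if income < q2:
        return "below average"
    if income < q3:
        return "above average"
    return "rich"

groups = defaultdict(list)
for r in rows:
    groups[income_group(r["income"])].append(r)

# Steps 6-7: average 'life', count, and extreme districts per group.
summary = {}
for label, members in groups.items():
    summary[label] = {
        "count": len(members),
        "mean_life": mean(m["life"] for m in members),
        "highest_life": max(members, key=lambda m: m["life"])["district"],
        "lowest_life": min(members, key=lambda m: m["life"])["district"],
    }

for label, stats in summary.items():
    print(label, stats)
```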

Dispersion in data is meaningful

Central value alone can disguise the picture

Variability is beauty of the wild nature
•Geographical variation generates
variety in species of flora and fauna
•Ethnography –cultural diversity
•Epidemiology treats variation in
disease

How to measure data dispersion?
Range
Standard Deviation
Quartile Deviation
Coefficient of variation

1. Range
Range= Largest value–Smallest value
•A high range in temperature contributes to desertification
•Range of mobile sets
•Range of social disparity

2. Quartile deviation (semi-interquartile range)
•Inter-quartile range = Q3 − Q1
•Quartile deviation (Q.D.):
$$\text{Q.D.} = \frac{Q_3 - Q_1}{2}$$
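
A short Python sketch of the range, the inter-quartile range and the quartile deviation. The values are hypothetical, and the quartiles come from `statistics.quantiles` with its default interpolation rule, which is only one of several conventions in use.

```python
from statistics import quantiles

# Hypothetical observations (for illustration only).
values = [500, 700, 900, 1100, 1300, 1500, 1700, 1900]

value_range = max(values) - min(values)  # largest minus smallest
q1, _, q3 = quantiles(values, n=4)       # first and third quartiles
iqr = q3 - q1                            # inter-quartile range
quartile_deviation = iqr / 2             # semi-interquartile range

print(value_range, iqr, quartile_deviation)  # 1400 900.0 450.0
```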

3. Variance & Standard Deviation
•Most popular measure of variation
•It uses all observations
•Std (standard deviation) is the square root of the variance
•Std = $\sqrt{\text{Variance}}$

Sample vs. population variance
For population
Individual data:
$$s^2 = \frac{\sum (x - \bar{x})^2}{n} = \frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2$$
Grouped data (with $N = \sum f$):
$$s^2 = \frac{\sum f (x - \bar{x})^2}{N} = \frac{\sum f x^2}{N} - \left(\frac{\sum f x}{N}\right)^2$$
For sample
$$S^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
Also,
$$S^2 = \frac{n}{n-1}\, s^2 = \frac{n}{n-1}\left[\frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2\right]$$
When n → ∞, sample mean → population mean

Example: Variance and std of the life of electric bulbs (in hours)

Length of life   No. of bulbs
500–700            5
700–900           11
900–1100          26
1100–1300         10
1300–1500          8

Length of life   No. of bulbs (f)   Mid-value (X)   fX       fX²
500–700            5                  600             3000    1800000
700–900           11                  800             8800    7040000
900–1100          26                 1000            26000   26000000
1100–1300         10                 1200            12000   14400000
1300–1500          8                 1400            11200   15680000
SUM               60                                 61000   64920000

Mean = 1016.67
Variance = 48388.89
Std = 219.9747
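
The bulb example can be reproduced with a few lines of Python. This is a sketch that uses the grouped-data formula with divisor N (the population form), which is the form that yields the figures shown above; the variable names are illustrative.

```python
from math import sqrt

# Class mid-values X and frequencies f from the bulb table.
mid_values = [600, 800, 1000, 1200, 1400]
frequencies = [5, 11, 26, 10, 8]

n = sum(frequencies)                                               # 60
sum_fx = sum(f * x for f, x in zip(frequencies, mid_values))       # 61000
sum_fx2 = sum(f * x * x for f, x in zip(frequencies, mid_values))  # 64920000

mean_life = sum_fx / n                   # sum(fX) / N
variance = sum_fx2 / n - mean_life ** 2  # sum(fX^2)/N - (sum(fX)/N)^2
std = sqrt(variance)

print(round(mean_life, 2), round(variance, 2), round(std, 4))
# 1016.67 48388.89 219.9747

# The sample variance uses divisor n - 1 instead of n.
sample_variance = variance * n / (n - 1)
```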

4. Coefficient of Variation (C.V.)
The coefficient of variation is a relative measure based on the standard deviation. It is defined as the ratio of the standard deviation to the mean, expressed in percent.
$$\text{C.V.} = \frac{\sigma}{\mu} \times 100\%$$
It is used to compare the compactness of two or more data sets.
A smaller C.V. indicates more consistent (less variable) data.
C.V. is unit-less, so data in the same or different units can be compared with it, e.g. weights in kg and in pounds.

Which type of electric bulb has better consistency in life span?

Length of life   No. of bulbs (alpha, fa)   No. of bulbs (beta, fb)
500–700            5                          4
700–900           11                         30
900–1100          26                         12
1100–1300         10                          8
1300–1500          8                          6

Length of life   fa   fb   Mid-value (X)   Xfa     Xfb     X²fa       X²fb
500–700           5    4     600            3000    2400    1800000    1440000
700–900          11   30     800            8800   24000    7040000   19200000
900–1100         26   12    1000           26000   12000   26000000   12000000
1100–1300        10    8    1200           12000    9600   14400000   11520000
1300–1500         8    6    1400           11200    8400   15680000   11760000
SUM              60   60                   61000   56400   64920000   55920000

mean(a) = 1016.7   mean(b) = 940.0
std(a) = 220.0     std(b) = 220.0
CV(a) = 21.64%     CV(b) = 23.4%
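
The comparison can be checked with a short Python sketch. The helper `grouped_cv` is a hypothetical name introduced here; it applies the grouped-data variance with divisor N and then divides the standard deviation by the mean.

```python
from math import sqrt

def grouped_cv(mid_values, frequencies):
    """Coefficient of variation (in percent) for grouped data,
    using the variance with divisor N."""
    n = sum(frequencies)
    mean = sum(f * x for f, x in zip(frequencies, mid_values)) / n
    variance = sum(f * x * x for f, x in zip(frequencies, mid_values)) / n - mean ** 2
    return sqrt(variance) / mean * 100

# Mid-values and frequencies from the table above.
mid_values = [600, 800, 1000, 1200, 1400]
cv_alpha = grouped_cv(mid_values, [5, 11, 26, 10, 8])
cv_beta = grouped_cv(mid_values, [4, 30, 12, 8, 6])

# The smaller C.V. marks the more consistent bulb type.
more_consistent = "alpha" if cv_alpha < cv_beta else "beta"
print(round(cv_alpha, 2), round(cv_beta, 2), more_consistent)
```

Both types have almost the same standard deviation, so the difference in C.V. comes from the higher mean life of type alpha.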

Hans Rosling
(27 July 1948 –7 February 2017)
most admired TED shows
Swedish epidemiologist with high data exploratory power
Gapminder foundation
2014 second time in Nepal from UNESCO
How not to be ignorant /The Joy of Statistics
(first 5 minutes of the 1-hour video)
http://www.gapminder.org/videos/the-joy-of-stats/

Thanks