GIS Pre PhD Research Methodology (1).ppt

BiswajitRath23, 76 slides, Jul 05, 2024

Operations in Data Processing & Measures of Central Tendency
Dr. B. Umaprasada Rao
Assistant Professor
Department of Mathematics
Institute of Science
GITAM (Deemed to be University)
Visakhapatnam

Operations in Data Processing
Editing
Coding
Classification
Tabulation

Stage I – Editing
The editing of data is the first step of data processing.
Editing of data is the process of examining the collected raw data in order to detect errors and omissions and to correct them where possible.
In the process of editing, the completed questionnaires and/or schedules are carefully scrutinized.
Editing of data is done to ensure that the data are accurate, consistent with other data, uniformly entered and as complete as possible.

During the process of editing, it is also checked that the data have been well arranged to facilitate the further steps, i.e. coding and tabulation.
Editing thus involves scrutinizing the collected data to identify and minimize, to the extent possible, errors, incompleteness, misclassification and gaps in the information obtained from respondents.
In large-scale surveys, the company undertaking the research project appoints supervisors, or editors, to edit the data.

Depending upon the stage at which editing is done, editing is classified into two types:
A. Field Editing
Field editing consists of the investigator reviewing the reporting forms to complete (translate or rewrite) what he or she has written in abbreviated and/or illegible form at the time of recording the respondents’ responses.
This type of editing is necessary because individual writing styles can often be difficult for others to decipher.

B. Central Editing
Central editing should take place when all forms or schedules have been completed and returned to the office. This type of editing implies that all forms get a thorough editing by a single editor in a small study and by a team of editors in a large inquiry.

Operations in Data Processing
Stage II – Coding
Coding is the process of assigning symbols to answers so that responses can be put into a limited number of categories or classes.
The symbols assigned can be numerical, alphabetical or both.
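Coding can be pictured as a lookup from each answer to a numeric symbol. The sketch below assumes a hypothetical codebook for one closed-ended question. The categories, the codes and the "9 = missing" convention are all illustrative assumptions and do not come from the slides.

```python
# Hypothetical codebook for one closed-ended question ("Do you smoke?").
# The categories and numeric codes are illustrative assumptions.
CODEBOOK = {"yes": 1, "no": 2, "occasionally": 3}
MISSING_CODE = 9  # assumed code for blank or unrecognised answers


def code_response(answer):
    """Map a raw answer to its numeric code, ignoring case and surrounding spaces."""
    if answer is None:
        return MISSING_CODE
    return CODEBOOK.get(answer.strip().lower(), MISSING_CODE)


raw_answers = ["Yes", "no ", "Occasionally", "", None, "YES"]
codes = [code_response(a) for a in raw_answers]
print(codes)  # [1, 2, 3, 9, 9, 1]
```

Reserving a code for missing answers keeps every response inside "a limited number of categories", which is the aim of coding as described above.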

Stage III – Classification
Voluminous raw data collected through a survey must be reduced into homogeneous groups in order to facilitate meaningful analysis. This is achieved through classification of data.
Classification refers to arranging data in groups or classes on the basis of some common characteristics.
Data having a common characteristic are placed in one class, and in this way the entire data set gets divided into a number of classes.

Definitions
❑ According to Conner, "Classification is the process of arranging things (either actually or notionally) in groups or classes according to their resemblances and affinities, and gives expression to the unity of attributes that may subsist amongst a diversity of individuals."
❑ According to Secrist, "Classification is the process of arranging data into sequences and groups according to their common characteristics."

Purpose of Classification
A. Simplifying and condensing the data.
B. Comparing characteristics.
C. Preparing data for tabulation and statistical analysis.
D. Drawing meaningful conclusions.
E. Studying relationships.

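Types of Classification
Classification can be of the following two types:
A. Classification based on attributes (qualitative characteristics such as sex or religion).
B. Classification based on class intervals (quantitative characteristics such as age or weight).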
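Classification based on class intervals can be sketched as assigning each value to the interval that contains it and then counting. This example reuses the sports-injury figures from the median exercise later in the deck. The class width of 20, the starting point of 0 and the exclusive method (upper limit not included) are illustrative assumptions.

```python
from collections import Counter

# Data from the sports-injury exercise in this deck; the class width of 20
# and the lower limit of 0 are assumptions chosen for illustration.
injuries = [37, 57, 65, 46, 12, 14, 19, 23, 56, 78, 5, 33]
WIDTH = 20


def class_label(x, width=WIDTH):
    """Return the class interval [lower, lower + width) that contains x (exclusive method)."""
    lower = (x // width) * width
    return (lower, lower + width)


frequency = Counter(class_label(x) for x in injuries)
for lower, upper in sorted(frequency):
    print(f"{lower}-{upper}: {frequency[(lower, upper)]}")
```

Each value falls into exactly one class, so the class frequencies always add up to the number of observations (12 here).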

Stage IV – Tabulation
Tabulation refers to the process of arranging data in a tabular format.
Usually, data are presented in the form of statistical tables, which undergo further analysis.
In other words, tabulation is an orderly arrangement of data in columns and rows.

Parts of Table
a) Table number
b) Table title
c) Head notes (also known as prefatory notes)
d) Captions (Column headings)
e) Stubs (Row Headings)
f) Body of the table
g) Footnote
h) Source note

Purpose of Tabulation
a) Tables present data in a condensed manner.
b) Data presented in tables are easily understandable.
c) Tables furnish the maximum information in the minimum space.
d) Tables facilitate easy comparison between two or more parameters of interest.
e) Tables facilitate comparison.
f) Data in tables can be subjected to various statistical computations (analysis of data).

Classification of Tabulation
a) Simple Tabulation
b) Complex Tabulation (Cross-Tabulation)
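The difference between simple and complex (cross) tabulation can be sketched with a counter. Simple tabulation counts one characteristic. Cross-tabulation counts pairs of characteristics. The six records below are hypothetical and were invented only to show the mechanics. They are not survey results.

```python
from collections import Counter

# Hypothetical coded records (sex, smoker); invented only to show the mechanics.
records = [
    ("M", "yes"), ("F", "no"), ("M", "no"),
    ("F", "no"), ("M", "yes"), ("F", "yes"),
]

# Simple tabulation: frequencies of a single characteristic.
by_sex = Counter(sex for sex, _ in records)

# Complex (cross) tabulation: joint frequencies of two characteristics.
cross = Counter(records)

rows = sorted(by_sex)                    # ['F', 'M']
cols = sorted({s for _, s in records})   # ['no', 'yes']
print(f"{'Sex':<5}" + "".join(f"{c:>6}" for c in cols) + f"{'Total':>7}")
for r in rows:
    cells = "".join(f"{cross[(r, c)]:>6}" for c in cols)
    print(f"{r:<5}{cells}{by_sex[r]:>7}")
```

The row totals of the cross-table reproduce the simple tabulation. This is a quick consistency check when building tables by hand.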

Statistics
Definition (Webster): Statistics are “classified facts representing the conditions of the people in a state … especially those facts which can be stated in numbers or in any other tabular or classified arrangement.”
Another definition (Bowley): Statistics are “numerical statements of facts in any department of enquiry placed in relation to each other.”

There are two major areas of statistics:
1. Descriptive statistics: descriptive statistics concern the development of certain indices (summary measures) from the raw data.
2. Inferential statistics: inferential statistics are concerned with the process of generalization, i.e. drawing conclusions about a population from a sample.

The important statistical measures that are used to summarize survey or research data are:
(i) Measures of central tendency
(ii) Measures of dispersion
(iii) Measures of asymmetry (skewness)
(iv) Measures of relationship

• Measures of central tendency are also usually called averages.
• They give us an idea about the concentration of the values in the central part of the distribution.
• The following six measures of central tendency are in common use:
(i) Arithmetic mean (ii) Median (iii) Mode (iv) Geometric mean (v) Harmonic mean (vi) Weighted mean.

MEASURES OF CENTRAL TENDENCY
Mean: the average of the data
Median: the middle value of the data
Mode: the most commonly occurring value

Mean (Average)
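For reference, the standard textbook formulas for the arithmetic mean are given below. The notation is assumed. x_i are the observations (for grouped data, the class mid-points), f_i are the frequencies, and N is the sum of the f_i.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
\qquad \text{(ungrouped data)}

\bar{x} = \frac{\sum_{i} f_i x_i}{\sum_{i} f_i} = \frac{\sum_{i} f_i x_i}{N}
\qquad \text{(frequency distribution)}
```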

Merits:
It is easy to understand and easy to calculate.
It is based upon all the observations.
It is familiar to the common man and is rigidly defined.
It is capable of further mathematical treatment.
Of all the averages, it is the least affected by sampling fluctuations; hence it is more stable.

Demerits
It cannot be determined by inspection.
The arithmetic mean cannot be used when dealing with qualitative characteristics that cannot be measured quantitatively, such as caste, religion or sex.
The arithmetic mean cannot be obtained if even a single observation is missing or lost.
The arithmetic mean is very much affected by extreme values.

Find the mean days of confinement after delivery in the following series:
Days of confinement | No. of patients
6 | 5
7 | 4
8 | 4
9 | 3
10 | 2
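A minimal sketch of the frequency-distribution mean, sum(f·x)/N, applied to the confinement series above. The variable names are illustrative.

```python
# Discrete series from the slide: days of confinement (x) and number of patients (f).
days = [6, 7, 8, 9, 10]
patients = [5, 4, 4, 3, 2]

n = sum(patients)                                   # N = sum of f
total = sum(f * x for x, f in zip(days, patients))  # sum of f*x
mean = total / n
print(n, total, round(mean, 2))  # 18 137 7.61
```

So the mean stay is 137/18, or about 7.61 days.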

Median
1. A measure of central tendency.
2. The median is determined by sorting the data set from the lowest to the highest value and taking the data point in the middle of the sequence.
3. It is the middle value in the ordered sequence:
• If n is odd, it is the middle value of the sequence.
• If n is even, it is the average of the two middle values.
4. It is not affected by extreme values.

Merits
It is rigidly defined.
It is easy to understand and easy to calculate.
It is not affected at all by extreme values.
It can be calculated for distributions with open-end classes.
Median is the only average to be used when dealing with qualitative data that cannot be measured but can be arranged in order (ranked).
It can be determined graphically (using ogives).

Demerits
In the case of an even number of observations, the median cannot be determined exactly; it is taken as the average of the two middle values.
It is not based on all the observations.
It is not capable of further mathematical treatment.

Find the median number of sports injuries that happened in cricket in all terms:
37, 57, 65, 46, 12, 14, 19, 23, 56, 78, 5, 33.
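The rule above (sort the data, then take the middle value, or the average of the two middle values when n is even) can be sketched as a small function. The function name `median` is illustrative. Python's standard `statistics.median` follows the same rule.

```python
def median(values):
    """Median by the rule above: sort, then take the middle value (odd n)
    or the average of the two middle values (even n)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


injuries = [37, 57, 65, 46, 12, 14, 19, 23, 56, 78, 5, 33]
print(sorted(injuries))  # [5, 12, 14, 19, 23, 33, 37, 46, 56, 57, 65, 78]
print(median(injuries))  # 35.0
```

With n = 12, the two middle values are 33 and 37, so the median is 35.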

For grouped data
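For reference, the standard interpolation formula for the median of grouped data is given below. It uses the same symbols as the worked example that follows. L is the lower limit of the median class and N is the total frequency. C_f is the cumulative frequency of the class preceding the median class. f is the frequency of the median class and h is the class width.

```latex
\text{Median} = L + \frac{\frac{N}{2} - C_f}{f} \times h
```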

Find the median weight of 590 infants born in a hospital in one year from the following table.
Weight of infant (kg) | No. of infants
2.0-2.5 | 37
2.5-3.0 | 117
3.0-3.5 | 207
3.5-4.0 | 155
4.0-4.5 | 48
4.5 and above | 26

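Weight of infant (kg) | No. of infants | Cumulative frequency
2.0-2.5 | 37 | 37
2.5-3.0 | 117 | 154
3.0-3.5 | 207 | 361
3.5-4.0 | 155 | 516
4.0-4.5 | 48 | 564
4.5 and above | 26 | 590
N/2 = 590/2 = 295
The cumulative frequency first exceeds 295 in the class 3.0-3.5, so the median class is 3.0-3.5, with L = 3.0, f = 207, Cf = 154 (the cumulative frequency of the preceding class) and h = 0.5.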
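A sketch of the grouped-median formula, Median = L + (N/2 − Cf)/f × h. The function locates the median class from the cumulative frequencies and then interpolates. The name `grouped_median` is hypothetical. The lower limit of the open-ended last class is taken as 4.5 kg. This does not affect the result, because that class is never the median class here.

```python
from itertools import accumulate


def grouped_median(lowers, freqs, h):
    """Median = L + (N/2 - Cf) / f * h.

    lowers: lower class limits; freqs: class frequencies; h: class width.
    The median class is the first class whose cumulative frequency reaches N/2.
    """
    n = sum(freqs)
    cum = list(accumulate(freqs))
    i = next(k for k, c in enumerate(cum) if c >= n / 2)
    cf_before = cum[i - 1] if i > 0 else 0  # Cf: cumulative frequency of preceding class
    return lowers[i] + (n / 2 - cf_before) / freqs[i] * h


# Infant-weight table (kg).
lowers = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
freqs = [37, 117, 207, 155, 48, 26]
m = grouped_median(lowers, freqs, h=0.5)
print(round(m, 2))  # 3.34
```

Substituting the slide's values gives 3.0 + (295 − 154)/207 × 0.5 ≈ 3.34 kg.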

For grouped data
Calculate the median for the following data:
Class interval | Frequency
5-9 | 2
10-14 | 11
15-19 | 26
20-24 | 17
25-29 | 8
30-34 | 6
35-39 | 3
40-44 | 2
45-49 | 1

Class interval | Frequency | Cumulative frequency
5-9 | 2 | 2
10-14 | 11 | 13
15-19 | 26 | 39
20-24 | 17 | 56
25-29 | 8 | 64
30-34 | 6 | 70
35-39 | 3 | 73
40-44 | 2 | 75
45-49 | 1 | 76
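Here the classes are written inclusively (5-9, 10-14, …). The usual practice, assumed in this sketch, is to convert them to class boundaries (4.5-9.5, 9.5-14.5, …) before applying the median formula. With N/2 = 38, the median class is 15-19, so L = 14.5, Cf = 13, f = 26 and h = 5. The helper name `grouped_median` is hypothetical.

```python
from itertools import accumulate


def grouped_median(lowers, freqs, h):
    """Median = L + (N/2 - Cf) / f * h, where lowers are lower class boundaries."""
    n = sum(freqs)
    cum = list(accumulate(freqs))
    i = next(k for k, c in enumerate(cum) if c >= n / 2)
    cf_before = cum[i - 1] if i > 0 else 0
    return lowers[i] + (n / 2 - cf_before) / freqs[i] * h


# Inclusive classes 5-9, 10-14, ..., 45-49 -> boundaries 4.5-9.5, 9.5-14.5, ...
boundaries = [lo - 0.5 for lo in range(5, 50, 5)]
freqs = [2, 11, 26, 17, 8, 6, 3, 2, 1]
m = grouped_median(boundaries, freqs, h=5)
print(round(m, 2))  # 19.31
```

Using the stated limits (L = 15) instead of the boundary (L = 14.5) would shift the answer by 0.5. This is why the conversion matters for inclusive classes.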

Mode
1. A measure of central tendency.
2. The mode is the most frequently occurring value in the data set.
3. There may be no mode or several modes.

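Merits
Mode is readily comprehensible and easy to calculate.
Mode is not at all affected by extreme values.
Mode can be conveniently located even if the frequency distribution has class intervals of unequal magnitude, provided the modal class and the classes immediately preceding and following it are of the same magnitude.
Open-end classes also do not pose any problem in the location of the mode.
Mode is the average to be used for finding the ideal (most typical or most popular) size, e.g. the most commonly demanded shoe or garment size.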

Demerits
•Mode is ill defined.
•It is not based upon all the observations.
•It is not capable of further mathematical
treatment.
•As compared with mean, mode is affected to a
great extent by fluctuations of sampling.

Mode example
No mode
Raw data: 10.3, 4.9, 8.9, 11.7, 6.3, 7.7
One mode
Raw data: 6.3, 4.9, 8.9, 6.3, 4.9, 4.9 (mode = 4.9)
More than one mode
Raw data: 21, 28, 28, 41, 43, 43 (modes = 28 and 43)

Mode for ungrouped data
2, 2, 3, 4, 6, 7, 4, 4, 4, 4, 8, 9, 0: mode is 4
10, 10, 3, 3, 4, 2, 1, 6, 7: modes are 10 and 3
10, 34, 23, 12, 11, 3, 4: no mode
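A sketch of finding the mode(s) of ungrouped data by counting occurrences. It follows the slides' convention that when no value repeats there is "no mode", shown here as an empty list. Python's `statistics.multimode` differs on that point: it returns every value when all counts are 1. The function name `modes` is illustrative.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs most often, in order of first appearance.
    An empty list means "no mode" (no value repeats)."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []
    return [v for v, c in counts.items() if c == top]


print(modes([2, 2, 3, 4, 6, 7, 4, 4, 4, 4, 8, 9, 0]))  # [4]
print(modes([10, 10, 3, 3, 4, 2, 1, 6, 7]))           # [10, 3]
print(modes([10, 34, 23, 12, 11, 3, 4]))              # []
```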

Mode for grouped data
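For reference, the standard interpolation formula for the mode of grouped data is given below. The symbols are assumed. L is the lower limit of the modal class (the class with the highest frequency). f_1 is its frequency, f_0 and f_2 are the frequencies of the preceding and succeeding classes, and h is the class width.

```latex
\text{Mode} = L + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times h
```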

Q. Find the mode for the following grouped data:
Age group | No. of persons
20-30 | 3
30-40 | 20
40-50 | 27
50-60 | 15
60-70 | 9

Solution
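A minimal sketch of the grouped-mode formula, Mode = L + (f1 − f0)/(2f1 − f0 − f2) × h, applied to the age table. The modal class is 40-50 (f1 = 27, f0 = 20, f2 = 15). The helper name `grouped_mode` is hypothetical. It treats a missing neighbouring class as frequency 0.

```python
def grouped_mode(lowers, freqs, h):
    """Mode = L + (f1 - f0) / (2*f1 - f0 - f2) * h, taking the class with the
    highest frequency as the modal class; f0/f2 are 0 at the ends."""
    i = max(range(len(freqs)), key=lambda k: freqs[k])
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0
    f2 = freqs[i + 1] if i + 1 < len(freqs) else 0
    return lowers[i] + (f1 - f0) / (2 * f1 - f0 - f2) * h


# Age groups 20-30, 30-40, ..., 60-70
lowers = [20, 30, 40, 50, 60]
freqs = [3, 20, 27, 15, 9]
result = grouped_mode(lowers, freqs, h=10)
print(round(result, 2))  # 43.68
```

That is, Mode = 40 + 7/19 × 10 ≈ 43.68 years.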

Q. Calculate the mode for the following frequency distribution:
IQ range | Frequency
90-100 | 11
100-110 | 27
110-120 | 36
120-130 | 38
130-140 | 43
140-150 | 28
150-160 | 16
160-170 | 1
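The same grouped-mode sketch, applied to the IQ distribution. The modal class is 130-140 (f1 = 43, f0 = 38, f2 = 28, h = 10). `grouped_mode` is a hypothetical helper name.

```python
def grouped_mode(lowers, freqs, h):
    """Mode = L + (f1 - f0) / (2*f1 - f0 - f2) * h for the highest-frequency class."""
    i = max(range(len(freqs)), key=lambda k: freqs[k])
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0
    f2 = freqs[i + 1] if i + 1 < len(freqs) else 0
    return lowers[i] + (f1 - f0) / (2 * f1 - f0 - f2) * h


# IQ ranges 90-100, 100-110, ..., 160-170
lowers = list(range(90, 170, 10))
freqs = [11, 27, 36, 38, 43, 28, 16, 1]
result = grouped_mode(lowers, freqs, h=10)
print(result)  # 132.5
```

Here Mode = 130 + 5/20 × 10 = 132.5.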

Relation between mean, median and mode
For moderately skewed distributions, the empirical relation is:
Mode = 3*Median - 2*Mean

Summary of Central Tendency Measures
Measure | Description
Mean | Balance point
Median | Middle value when ordered
Mode | Most frequent value

Calculate the mean, median and mode for the following data:
Age group | No. of patients
25-30 | 4
30-35 | 3
35-40 | 2
40-45 | 3
45-50 | 4
50-55 | 8
55-60 | 6
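A sketch that applies all three grouped-data formulas (mean from mid-points, median by interpolation, mode by interpolation) to the patient table. The classes are continuous (25-30, 30-35, …), so the lower limits are used directly as L.

```python
from itertools import accumulate

# Age groups 25-30, 30-35, ..., 55-60 (continuous classes, width 5).
lowers = [25, 30, 35, 40, 45, 50, 55]
freqs = [4, 3, 2, 3, 4, 8, 6]
h = 5
N = sum(freqs)

# Mean: sum(f * mid-point) / N
mids = [lo + h / 2 for lo in lowers]
mean = sum(f * m for f, m in zip(freqs, mids)) / N

# Median: L + (N/2 - Cf) / f * h
cum = list(accumulate(freqs))
i = next(k for k, c in enumerate(cum) if c >= N / 2)
cf_before = cum[i - 1] if i > 0 else 0
median = lowers[i] + (N / 2 - cf_before) / freqs[i] * h

# Mode: L + (f1 - f0) / (2*f1 - f0 - f2) * h
j = max(range(len(freqs)), key=lambda k: freqs[k])
f0 = freqs[j - 1] if j > 0 else 0
f2 = freqs[j + 1] if j + 1 < len(freqs) else 0
mode = lowers[j] + (freqs[j] - f0) / (2 * freqs[j] - f0 - f2) * h

print(N, round(mean, 2), round(median, 2), round(mode, 2))  # 30 45.5 48.75 53.33
```

The median class is 45-50 (cumulative frequency 16 ≥ 15), and the modal class is 50-55 (f = 8).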

The following table gives the frequency distribution of marks obtained by 2300 medical students of Gujarat in the MCQ paper of the PSM exam. Find the mean, median and mode.
Marks | No. of students
11-20 | 141
21-30 | 221
31-40 | 439
41-50 | 529
51-60 | 495
61-70 | 322
71-80 | 153
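A sketch of the three grouped-data calculations for the marks table. It assumes the usual conversion of the inclusive classes 11-20, 21-30, … to boundaries 10.5-20.5, 20.5-30.5, …, so the class mid-points are 15.5, 25.5, … and h = 10.

```python
from itertools import accumulate

# Inclusive classes 11-20, ..., 71-80 converted to boundaries 10.5-20.5, ..., 70.5-80.5.
lowers = [10.5 + 10 * k for k in range(7)]
freqs = [141, 221, 439, 529, 495, 322, 153]
h = 10
N = sum(freqs)

# Mean from class mid-points (15.5, 25.5, ...)
mean = sum(f * (lo + h / 2) for f, lo in zip(freqs, lowers)) / N

# Median: L + (N/2 - Cf) / f * h
cum = list(accumulate(freqs))
i = next(k for k, c in enumerate(cum) if c >= N / 2)
cf_before = cum[i - 1] if i > 0 else 0
median = lowers[i] + (N / 2 - cf_before) / freqs[i] * h

# Mode: L + (f1 - f0) / (2*f1 - f0 - f2) * h
j = max(range(len(freqs)), key=lambda k: freqs[k])
f0 = freqs[j - 1] if j > 0 else 0
f2 = freqs[j + 1] if j + 1 < len(freqs) else 0
mode = lowers[j] + (freqs[j] - f0) / (2 * freqs[j] - f0 - f2) * h

print(N, round(mean, 2), round(median, 2), round(mode, 2))  # 2300 46.78 47.1 47.76
```

As a rough cross-check, 3 × median − 2 × mean ≈ 47.74, which is close to the interpolated mode of about 47.76.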

Geometric mean
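For reference, the standard definition of the geometric mean of n positive observations is given below. It is computed in practice through logarithms. The notation is assumed. For a frequency distribution, each log x_i is weighted by its frequency f_i and divided by N.

```latex
G = \left(x_1 x_2 \cdots x_n\right)^{1/n}
  = \operatorname{antilog}\left(\frac{1}{n}\sum_{i=1}^{n}\log x_i\right)

G = \operatorname{antilog}\left(\frac{\sum_{i} f_i \log x_i}{N}\right)
\qquad \text{(frequency distribution)}
```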

Geometric mean for Group data
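A sketch of the grouped geometric mean, antilog(Σ f·log x / N), using base-10 logarithms as in log-table calculations. Reusing the days-of-confinement series from the mean example is an illustrative choice. It is not an exercise from the deck.

```python
import math

# Grouped (discrete) GM = antilog( sum(f * log x) / N ).
# Data reused from the days-of-confinement example for illustration.
days = [6, 7, 8, 9, 10]
patients = [5, 4, 4, 3, 2]

N = sum(patients)
log_sum = sum(f * math.log10(x) for x, f in zip(days, patients))
gm = 10 ** (log_sum / N)  # antilog (base 10) of the mean log
print(round(gm, 2))  # 7.5
```

The result (about 7.50) is below the arithmetic mean of the same series (about 7.61). This is expected, since the geometric mean never exceeds the arithmetic mean for positive data.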


Harmonic mean
The harmonic mean (formerly sometimes called the subcontrary mean) is one of several kinds of average.
It is the reciprocal of the arithmetic mean of the reciprocals of the observations: HM = n / (1/x1 + 1/x2 + … + 1/xn).
It is generally used when averaging rates and ratios, such as speeds, where the same quantity (e.g. the same distance) underlies each rate.

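Rahul drives a car at 20 mph over the first half of a trip and at 30 mph over the second half (equal distances). What’s his average speed?

Because equal distances are covered at each speed, we need the harmonic mean:
=2/(1/20+1/30)
=2/(3/60+2/60)
=2/(5/60)
= 24 mph
(If instead he drove for equal times, e.g. one hour at each speed, the arithmetic mean (20+30)/2 = 25 mph would apply.)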
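A sketch that checks the harmonic-mean answer in two ways: with Python's `statistics.harmonic_mean`, and as total distance over total time for two legs of equal length. The leg length of 60 miles is a hypothetical choice. Any equal length gives the same average.

```python
from statistics import harmonic_mean

speeds = [20, 30]  # mph over two legs of equal length
print(round(harmonic_mean(speeds), 6))  # 24.0

# Cross-check with total distance / total time, assuming (hypothetically)
# 60 miles per leg: 3 h at 20 mph plus 2 h at 30 mph.
leg = 60
avg_speed = (2 * leg) / (leg / 20 + leg / 30)
print(avg_speed)  # 24.0
```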

Weighted Mean

A weighted mean is a kind of average. Instead of each data point contributing equally to the final mean, some data points contribute more “weight” than others.
If all the weights are equal, the weighted mean equals the arithmetic mean (the “regular average” you’re used to).
Weighted means are very common in statistics, especially when studying populations.

Steps:
1. Multiply the numbers in your data set by the weights.
2. Add up the numbers from step 1. Set this number aside for a moment.
3. Add up all of the weights.
4. Divide the number you found in step 2 by the number you found in step 3.

You take three 100-point exams in your statistics class and score 80, 80 and 95. The last exam is much easier than the first two, so your professor has given it less weight. The weights for the three exams are:
• Exam 1: 40% of your grade.
• Exam 2: 40% of your grade.
• Exam 3: 20% of your grade.
• Note: 40% as a decimal is 0.4.

Step 1: Multiply the numbers in your data set by the weights:
0.4(80) = 32
0.4(80) = 32
0.2(95) = 19
Step 2: Add the numbers up: 32 + 32 + 19 = 83.
Step 3: Add up the weights: 0.4 + 0.4 + 0.2 = 1.
Step 4: Divide: 83/1 = 83.
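The four steps can be sketched as one function. The name `weighted_mean` is illustrative. The function first checks that every value has a weight.

```python
def weighted_mean(values, weights):
    """Steps 1-4: sum(w * x) / sum(w)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    weighted_sum = sum(w * x for x, w in zip(values, weights))  # steps 1-2
    total_weight = sum(weights)                                 # step 3
    return weighted_sum / total_weight                          # step 4


scores = [80, 80, 95]
weights = [0.4, 0.4, 0.2]
print(round(weighted_mean(scores, weights), 6))  # 83.0

# With equal weights the weighted mean reduces to the arithmetic mean.
print(weighted_mean([85, 95, 90], [1, 1, 1]))  # 90.0
```

Dividing by the total weight (step 3) means the weights do not have to add up to 1. For example, percentages such as 40, 40, 20 give the same result.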

The arithmetic mean is best used when the sum of the values is significant, for example your grade in a statistics class where every test counts equally. If you were to get 85 on the first test, 95 on the second test and 90 on the third test, your average grade would be (85 + 95 + 90)/3 = 90.

Thank You