The editing of data is the first step of data processing.
Editing of data is a process of examining the collected raw data in order to detect errors and omissions and to correct these when possible.
In the process of editing, a careful scrutiny of the completed questionnaires and/or schedules is made.
Editing of data is done to ensure that the data are accurate, consistent with other data, uniformly entered and as complete as possible.
Added: Jul 05, 2024
Measures of central tendency
Dr. B.Umaprasada Rao
Assistant Professor
Department of Mathematics
Institute of Science
GITAM (Deemed to be University)
Visakhapatnam
Operations in Data Processing
Editing
Coding
Classification
Tabulation
Stage I – Editing
The editing of data is the first step of data
processing.
Editing of data is a process of examining the
collected raw data in order to detect errors and
omissions and to correct these when possible.
In the process of editing, a careful scrutiny of the
completed questionnaires and/or schedules is
made.
Editing of data is done to ensure that the data
are accurate, consistent with other data, uniformly
entered and as complete as possible.
During editing, it is also checked that the
data are well arranged to facilitate the
further steps, i.e. coding and tabulation.
Editing thus involves scrutinizing the collected
data to identify and minimize errors,
incompleteness, misclassification and gaps in the
information obtained from respondents, to the
extent possible.
In large-scale surveys, the company undertaking
the research project appoints supervisors, or
editors, to edit the data.
Depending upon the stage at which editing is
done, editing is classified into two types:
A. Field Editing
Field editing consists in the review of the reporting
forms by the investigator for completing
(translating or rewriting) what the latter has
written in abbreviated and/or in illegible form at
the time of recording the respondents’ responses.
This type of editing is necessary in view of the fact
that individual writing styles often can be difficult
for others to decipher.
B. Central Editing
Central editing should take place when all forms
or schedules have been completed and returned
to the office. This type of editing implies that all
forms should get a thorough editing by a single
editor in a small study and by a team of editors in
case of a large inquiry.
Stage II – Coding
Coding is the process of assigning
symbols to answers, so that responses
can be put into a limited number of
categories or classes.
The symbols assigned can be numerical
or alphabetical or both.
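A minimal sketch of coding in Python. The survey question, the codebook and the code 9 for missing answers are all hypothetical, chosen only to illustrate assigning numerical symbols to answers.

```python
# Hypothetical codebook: each answer category gets a numeric symbol.
CODEBOOK = {"yes": 1, "no": 2, "undecided": 3}
MISSING = 9  # assumed code for blank or unrecognised answers

def code_responses(responses):
    """Replace each raw answer with its code from CODEBOOK."""
    return [CODEBOOK.get(r.strip().lower(), MISSING) for r in responses]

raw = ["Yes", "no", " Undecided", "", "YES"]
coded = code_responses(raw)
print(coded)  # [1, 2, 3, 9, 1]
```

After coding, the five free-text answers fall into a limited number of classes that can be counted and tabulated.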
Stage III – Classification
Voluminous raw data collected through a survey
must be reduced into homogeneous groups in
order to facilitate meaningful analysis. This is
achieved through Classification of data.
Classification refers to arranging data in
groups or classes on the basis of some common
characteristics.
Data having a common characteristic are placed
in one class and in this way, the entire data get
divided into a number of classes.
Definitions
❑ According to Connor, "Classification is the
process of arranging things (either actually or
notionally) in groups or classes according to their
resemblances and affinities, and gives expression
to the unity of attributes that may subsist amongst a
diversity of individuals".
❑ According to Secrist, "Classification is the
process of arranging data into sequences and
groups according to their common characteristics".
Purpose of Classification
A. Simplifying and condensing the data.
B. Comparison of characteristics.
C. Rendering data ready for tabulation and
statistical analysis.
D. Drawing meaningful conclusions.
E. Studying relationships.
Types of Classification
Classification can be of the following two
types
A. Classification Based on Attributes.
B. Classification based on Class
Intervals.
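The two types can be sketched in Python as follows. The records, the attribute name "sex" and the 10-year age classes are hypothetical illustration data, not taken from the slides.

```python
from collections import Counter

def classify_by_attribute(records, key):
    """Count records in each category of a qualitative attribute."""
    return Counter(r[key] for r in records)

def classify_by_interval(values, lower, width, n_classes):
    """Count values in the classes [lower, lower+width), [lower+width, ...), ..."""
    counts = [0] * n_classes
    for v in values:
        idx = int((v - lower) // width)
        if 0 <= idx < n_classes:
            counts[idx] += 1
    return counts

people = [
    {"sex": "F", "age": 23}, {"sex": "M", "age": 37},
    {"sex": "F", "age": 41}, {"sex": "F", "age": 35},
    {"sex": "M", "age": 29},
]
by_sex = classify_by_attribute(people, "sex")
by_age = classify_by_interval([p["age"] for p in people],
                              lower=20, width=10, n_classes=3)
print(dict(by_sex))  # {'F': 3, 'M': 2}
print(by_age)        # [2, 2, 1] for classes 20-30, 30-40, 40-50
```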
Stage IV – Tabulation
Tabulation refers to the process of
arranging data in a tabular format.
Usually, data is presented in the form
of statistical tables, which undergo
further analysis.
In other words, tabulation is an orderly
arrangement of data in columns and
rows.
Parts of Table
a) Table number
b) Table title
c) Head notes (also known as prefatory notes)
d) Captions (Column headings)
e) Stubs (Row Headings)
f) Body of the table
g) Foot note
h) Source note
Purpose of Tabulation
a) Tables present data in a condensed manner.
b) Data presented in tables are easily understandable.
c) Tables furnish maximum information.
d) Tables facilitate easy comparison between two or
more parameters of interest.
e) Data in tables can be subjected to various
statistical computations (analysis of data).
Classification of Tabulation
a) Simple Tabulation
b) Complex Tabulation (Cross Tabulation)
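As a rough sketch, simple tabulation counts one characteristic, while cross tabulation counts two characteristics jointly, with stubs as rows and captions as columns. The records and the field names "sex" and "smoker" are hypothetical.

```python
from collections import Counter

def cross_tabulate(records, row_key, col_key):
    """Return {row: {col: count}} for two characteristics."""
    counts = Counter((r[row_key], r[col_key]) for r in records)
    rows = sorted({r[row_key] for r in records})
    cols = sorted({r[col_key] for r in records})
    return {row: {col: counts[(row, col)] for col in cols} for row in rows}

records = [
    {"sex": "F", "smoker": "no"}, {"sex": "M", "smoker": "yes"},
    {"sex": "F", "smoker": "no"}, {"sex": "M", "smoker": "no"},
    {"sex": "F", "smoker": "yes"},
]
simple = Counter(r["smoker"] for r in records)   # simple (one-way) table
cross = cross_tabulate(records, "sex", "smoker") # cross (two-way) table
print(dict(simple))  # {'no': 3, 'yes': 2}
print(cross)         # {'F': {'no': 2, 'yes': 1}, 'M': {'no': 1, 'yes': 1}}
```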
Statistics
Definition (Webster): Statistics are “classified facts
representing the conditions of the people in a state
… especially those facts which can be stated in
numbers or in any other tabular or classified
arrangement”.
Another definition (Bowley): Statistics are numerical statements
of facts in any department of enquiry placed in relation to each other.
There are two major areas of Statistics:
1. Descriptive statistics: Descriptive statistics
are concerned with the development of certain indices from
the raw data.
2. Inferential statistics: Inferential statistics
are concerned with the process of generalization.
The important statistical measures that are used to
summarize the survey or research data are:
(i) Measures of central tendency
(ii) Measures of dispersion
(iii) Measures of asymmetry (skewness)
(iv) Measures of relationship
• Measures of central tendency are also commonly
called averages.
• They give us an idea about the concentration of
the values in the central part of the distribution.
• The following are the six measures of central
tendency that are in common use:
(i) Arithmetic mean (ii) Median (iii) Mode
(iv) Geometric mean (v) Harmonic mean
(vi) Weighted mean
MEASURES OF CENTRAL TENDENCY
Mean: the average of the data
Median: the middle value of the data
Mode: the most commonly occurring value
Demerits of the arithmetic mean
It cannot be determined by inspection.
Arithmetic mean cannot be used if we are dealing with
qualitative characteristics, which cannot be measured
quantitatively, like caste, religion, sex.
Arithmetic mean cannot be obtained if a single observation
is missing or lost.
Arithmetic mean is very much affected by extreme values.
Find the mean days of confinement after delivery in the following series:
Days of confinement    No. of patients
6     5
7     4
8     4
9     3
10    2
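A short sketch of the calculation for this series, using the frequency-weighted formula mean = Σfx / Σf. The function name is illustrative.

```python
days = [6, 7, 8, 9, 10]       # x: days of confinement
patients = [5, 4, 4, 3, 2]    # f: number of patients

def frequency_mean(values, freqs):
    """Arithmetic mean of a discrete frequency distribution: sum(f*x) / sum(f)."""
    total = sum(x * f for x, f in zip(values, freqs))  # 137 for this series
    n = sum(freqs)                                     # 18 patients
    return total / n

mean_days = frequency_mean(days, patients)
print(round(mean_days, 2))  # 7.61
```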
Median
1. Measure of central tendency.
2. The median is determined by sorting the data
set from lowest to highest values and taking the
data point in the middle of the sequence.
3. Middle value in ordered sequence
• If n is odd, the middle value of the sequence.
• If n is even, the average of the two middle values.
4. Not affected by extreme values.
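The rule above can be sketched in a few lines of Python; the sample values are made up for illustration.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    if not values:
        raise ValueError("median of empty data is undefined")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([7, 3, 9, 5, 1]))     # 5   (odd n)
print(median([7, 3, 9, 5]))        # 6.0 (even n: average of 5 and 7)
print(median([1, 2, 3, 4, 1000]))  # 3   (the extreme value 1000 has no effect)
```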
Demerits
With an even number of observations, the median
cannot be determined exactly; it is estimated as the
average of the two middle values.
It is not based on all the observations.
It is not capable of further mathematical
treatment.
Weight of infant in kg    No. of infants    Cumulative frequency
2.0-2.5          37     37
2.5-3.0          117    154
3.0-3.5          207    361
3.5-4.0          155    516
4.0-4.5          48     564
4.5 and above    26     590
N/2 = 590/2 = 295
Median class: 3.0-3.5, so L = 3.0, f = 207, Cf = 154, h = 0.5
For grouped data
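For reference, the usual interpolation formula for the median of grouped data, applied to the infant-weight table above. Here L is the lower boundary of the median class, N the total frequency, Cf the cumulative frequency of the class before the median class, f the frequency of the median class and h the class width. The formula is the standard textbook form and is assumed to be the one intended here.

```latex
\[
\text{Median} = L + \frac{\tfrac{N}{2} - Cf}{f} \times h
= 3.0 + \frac{295 - 154}{207} \times 0.5
\approx 3.0 + 0.34 = 3.34 \text{ kg}
\]
```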
Calculate the median for the following data
Class interval    Frequency
5-9 2
10-14 11
15-19 26
20-24 17
25-29 8
30-34 6
35-39 3
40-44 2
45-49 1
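A sketch of the grouped-median calculation for this series, assuming the standard formula Median = L + ((N/2 − Cf) / f) × h. Because the classes are inclusive (5-9, 10-14, ...), the code first converts the limits to class boundaries (4.5-9.5, 9.5-14.5, ...); the function name is illustrative.

```python
def grouped_median(classes, freqs):
    """Median of grouped data; classes are (lower_limit, upper_limit) pairs of equal width."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("empty distribution")
    gap = classes[1][0] - classes[0][1]  # 1 for 5-9, 10-14; 0 for continuous classes
    cum = 0                              # cumulative frequency before the current class
    for (lower, upper), f in zip(classes, freqs):
        if cum + f >= n / 2:             # first class whose cumulative frequency reaches N/2
            L = lower - gap / 2          # lower class boundary
            h = (upper - lower) + gap    # class width
            return L + (n / 2 - cum) / f * h
        cum += f

classes = [(5, 9), (10, 14), (15, 19), (20, 24), (25, 29),
           (30, 34), (35, 39), (40, 44), (45, 49)]
freqs = [2, 11, 26, 17, 8, 6, 3, 2, 1]
result = grouped_median(classes, freqs)
print(round(result, 2))  # 19.31
```

Here N = 76 and N/2 = 38, so the median class is 15-19 with L = 14.5, Cf = 13, f = 26 and h = 5.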
Mode
1. Measure of central tendency.
2. The mode is the most frequently occurring value in
the data set.
3. There may be no mode or several modes.
Merits
Mode is readily comprehensible and easy to
calculate.
Mode is not at all affected by extreme values.
Mode can be conveniently located even if the
frequency distribution has class intervals of
unequal magnitude.
Open-end classes also do not pose any problem
in the location of mode.
Mode is the average to be used to find the ideal
size.
Demerits
• Mode is ill-defined.
• It is not based upon all the observations.
• It is not capable of further mathematical
treatment.
• As compared with mean, mode is affected to a
great extent by fluctuations of sampling.
Mode example
No mode
Raw data: 10.3 4.9 8.9 11.7 6.3 7.7
One mode
Raw data: 6.3 4.9 8.9 6.3 4.9 4.9
More than one mode
Raw data: 21 28 28 41 43 43
Mode for ungrouped data
2, 2, 3, 4, 6, 7, 4, 4, 4, 4, 8, 9, 0: mode is 4
10, 10, 3, 3, 4, 2, 1, 6, 7: modes are 10 and 3
10, 34, 23, 12, 11, 3, 4: no mode
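The three cases can be checked with a small Python sketch. It assumes the convention used above that data in which every value occurs only once has no mode; the function name is illustrative.

```python
from collections import Counter

def modes(values):
    """Return all most-frequent values, or [] when every value occurs only once."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # no value repeats: no mode
    return [v for v, c in counts.items() if c == top]

print(modes([2, 2, 3, 4, 6, 7, 4, 4, 4, 4, 8, 9, 0]))  # [4]
print(modes([10, 10, 3, 3, 4, 2, 1, 6, 7]))            # [10, 3]
print(modes([10, 34, 23, 12, 11, 3, 4]))               # []
```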
Mode for grouped data
Q. Find the mode for the following grouped data
Age group    No. of persons
20-30 3
30-40 20
40-50 27
50-60 15
60-70 9
Solution
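A sketch of one way to obtain the solution, assuming the standard grouped-mode formula Mode = L + ((f1 − f0) / (2·f1 − f0 − f2)) × h, where L is the lower boundary of the modal class, f1 its frequency, f0 and f2 the frequencies of the classes before and after it, and h the class width. The function name is illustrative.

```python
def grouped_mode(lower_bounds, width, freqs):
    """Mode of grouped data with equal class widths (standard interpolation formula)."""
    i = freqs.index(max(freqs))                       # modal class: highest frequency
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0                 # class before the modal class
    f2 = freqs[i + 1] if i + 1 < len(freqs) else 0    # class after the modal class
    return lower_bounds[i] + (f1 - f0) / (2 * f1 - f0 - f2) * width

ages = [20, 30, 40, 50, 60]      # lower boundaries of 20-30, ..., 60-70
persons = [3, 20, 27, 15, 9]
age_mode = grouped_mode(ages, 10, persons)
print(round(age_mode, 2))  # 43.68
```

The modal class is 40-50, so Mode = 40 + (27 − 20) / (54 − 20 − 15) × 10 = 40 + 70/19 ≈ 43.68 years.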
Q. Calculate the mode for the following
frequency distribution
IQ Range Frequency
90-100 11
100-110 27
110-120 36
120-130 38
130-140 43
140-150 28
150-160 16
160-170 1
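A worked solution, assuming the standard grouped-mode formula. The modal class is 130-140 (highest frequency, 43), so L = 130, f_1 = 43, f_0 = 38 (class before), f_2 = 28 (class after) and h = 10.

```latex
\[
\text{Mode} = L + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times h
= 130 + \frac{43 - 38}{2(43) - 38 - 28} \times 10
= 130 + \frac{5}{20} \times 10 = 132.5
\]
```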
Relation between mean, median and
mode
For a moderately skewed distribution, the empirical relation is:
Mode = 3 × Median − 2 × Mean
Summary of Central Tendency
Measures
Measure Description
Mean Balance Point
Median Middle value when ordered
Mode Most frequent
The following table gives the frequency distribution of marks
obtained by 2300 medical students of Gujarat in the MCQ of the PSM
exam. Find the mean, median and mode.
Marks No. of Students
11-20 141
21-30 221
31-40 439
41-50 529
51-60 495
61-70 322
71-80 153
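A sketch of the three calculations for this table, assuming the standard grouped-data formulas and class boundaries of 10.5-20.5, 20.5-30.5, and so on (the classes are inclusive). The variable names are illustrative, and the mode step assumes the modal class is not the first or last class, which holds here.

```python
classes = [(11, 20), (21, 30), (31, 40), (41, 50), (51, 60), (61, 70), (71, 80)]
freqs = [141, 221, 439, 529, 495, 322, 153]

n = sum(freqs)                                 # 2300 students
gap = classes[1][0] - classes[0][1]            # 1 mark between consecutive classes
h = classes[0][1] - classes[0][0] + gap        # class width: 10
bounds = [lo - gap / 2 for lo, _ in classes]   # lower boundaries 10.5, 20.5, ...
mids = [(lo + hi) / 2 for lo, hi in classes]   # mid-points 15.5, 25.5, ...

# Mean: sum(f * mid-point) / N
mean = sum(f * x for f, x in zip(freqs, mids)) / n

# Median: L + ((N/2 - Cf) / f) * h for the first class reaching N/2
cum = 0
for i, f in enumerate(freqs):
    if cum + f >= n / 2:
        median = bounds[i] + (n / 2 - cum) / f * h
        break
    cum += f

# Mode: L + ((f1 - f0) / (2*f1 - f0 - f2)) * h for the modal class
m = freqs.index(max(freqs))
f1, f0, f2 = freqs[m], freqs[m - 1], freqs[m + 1]
mode = bounds[m] + (f1 - f0) / (2 * f1 - f0 - f2) * h

print(round(mean, 2), round(median, 2), round(mode, 2))  # 46.78 47.1 47.76
```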
Geometric mean
Geometric mean for grouped data
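A minimal sketch, assuming the standard definition: the geometric mean of n positive values is the n-th root of their product, computed here through logarithms. For grouped data each value (or class mid-point) is weighted by its frequency. The sample numbers are made up for illustration.

```python
import math

def geometric_mean(values, freqs=None):
    """Geometric mean of positive values; optional frequencies for grouped data."""
    if freqs is None:
        freqs = [1] * len(values)
    n = sum(freqs)
    log_sum = sum(f * math.log(x) for x, f in zip(values, freqs))
    return math.exp(log_sum / n)  # antilog of the mean of the logs

gm_plain = geometric_mean([2, 8])            # sqrt(2 * 8)
gm_grouped = geometric_mean([2, 8], [3, 1])  # (2**3 * 8) ** (1/4)
print(round(gm_plain, 6))    # 4.0
print(round(gm_grouped, 6))  # 2.828427
```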
Harmonic mean
Harmonic mean (formerly sometimes called the
subcontrary mean) is one of several kinds of
average.
It is the reciprocal of the arithmetic mean of the
reciprocals of the values.
It is generally used when averaging rates and
ratios, such as speeds.
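Rahul drives a car at 20 mph for the first
half of the distance and at 30 mph for the second half. What is his
average speed?
Because the two distances are equal, we need the harmonic mean:
= 2/(1/20 + 1/30)
= 2/(5/60)
= 24 mph
(Had he driven for equal times, e.g. one hour at each speed, the
average speed would be the arithmetic mean, 25 mph.)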
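A quick numerical check in Python. The harmonic mean of 20 and 30 is exactly 24 when no intermediate rounding is done, and it gives the true average speed when equal distances are covered at each speed; the 60-mile leg is an assumed figure used only for the cross-check.

```python
from statistics import harmonic_mean

speeds = [20, 30]  # mph

# Harmonic mean: n divided by the sum of reciprocals
hm = len(speeds) / sum(1 / s for s in speeds)
print(round(hm, 6))            # 24.0
print(harmonic_mean(speeds))   # standard-library equivalent

# Cross-check: total distance / total time over two equal legs
leg = 60                             # miles per leg (assumed)
total_time = leg / 20 + leg / 30     # 3 h + 2 h
average_speed = 2 * leg / total_time
print(average_speed)                 # 24.0
```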
Weighted Mean
A weighted mean is a kind of average. Instead of
each data point contributing equally to the final
mean, some data points contribute more “weight”
than others.
If all the weights are equal, then the weighted
mean equals the arithmetic mean (the “regular
average” you’re used to).
Weighted means are very common in statistics,
especially when studying populations.
Steps:
1. Multiply the numbers in your data set by the
weights.
2. Add up the numbers from step 1. Set this number
aside for a moment.
3. Add up all of the weights.
4. Divide the number you found in step 2 by the
number you found in step 3.
You take three 100-point exams in your statistics
class and score 80,80 and 95. The last exam is
much easier than the first two, so your professor
has given it less weight. The weights for the three
exams are :
• Exam 1: 40% of your grade.
• Exam 2: 40% of your grade.
• Exam 3: 20% of your grade.
Note: 40% as a decimal is 0.4.
Step 1: Multiply the numbers in your data set by
the weights:
0.4(80) = 32
0.4(80) = 32
0.2(95) = 19
Step 2: Add the numbers up: 32 + 32 + 19 = 83.
Step 3: Add up the weights: 0.4 + 0.4 + 0.2 = 1.
Step 4: 83/1 = 83
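The four steps can be sketched as a short Python function; the function name is illustrative.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w * x) / sum(w)."""
    total = sum(w * x for x, w in zip(values, weights))  # steps 1 and 2
    weight_sum = sum(weights)                            # step 3
    return total / weight_sum                            # step 4

scores = [80, 80, 95]
weights = [0.4, 0.4, 0.2]
grade = weighted_mean(scores, weights)
equal = weighted_mean(scores, [1, 1, 1])
print(round(grade, 2))  # 83.0
print(round(equal, 2))  # 85.0 (equal weights give the arithmetic mean)
```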
The arithmetic mean is best used when the sum of
the values is significant. For example, consider your grade
in your statistics class: if you were to get 85 on
the first test, 95 on the second test, and 90 on the
third test, your average grade would be (85 + 95 + 90)/3 = 90.