STATISTICS
IN ECONOMICS AND BUSINESS
Nguyen Huyen Trang
Faculty of Statistics, National Economics University
LECTURE 2: DATA COLLECTION
•Data measurement
•Source of data
•Sampling process
•Types of sampling
DATA MEASUREMENT
•Nominal
•Ordinal
•Interval
•Ratio
NOMINAL SCALE
•The labels are numerically coded
•There is no logical order among the labels and numbers
Sex Code
Male 1
Female 2
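ORDINAL SCALE
The data can be classified and ranked or ordered.
Statement: LEARNING STATISTICS IS INTERESTING!
Coding 1: Strongly Disagree = 1, Disagree = 2, Neutral = 3, Agree = 4, Strongly Agree = 5
Coding 2: Strongly Disagree = 5, Disagree = 4, Neutral = 3, Agree = 2, Strongly Agree = 1
Either coding is valid, because the numbers only express the order of the responses.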
INTERVAL SCALE
Similar to the ordinal level, but differences between data values
are equal and meaningful
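Ranks in each round (ordinal data):
Team  Round 1  Round 2  Round 3
A  1  2  3
B  2  3  1
C  3  1  2
Which team is the winner? From the ranks alone we cannot tell: each team's ranks sum to 1 + 2 + 3 = 6.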
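Scores in each round (interval data):
Team  Round 1  Round 2  Round 3  Total
A  18  18  16  52
B  15  16  18  49
C  11  19  17  47
Because the differences between scores are equal and meaningful, the scores can be added: Team A is the winner!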
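The rank-versus-score comparison in this example can be checked with a minimal Python sketch. It assumes ranks are assigned per round with 1 = highest score and no ties; the names `RANKS`, `SCORES`, `totals`, and `ranks_from_scores` are illustrative only. It re-derives the rank table from the scores, then shows that summed ranks tie while summed scores separate the teams.

```python
# Data copied from the interval-scale example: three teams over three rounds.
RANKS = {"A": [1, 2, 3], "B": [2, 3, 1], "C": [3, 1, 2]}
SCORES = {"A": [18, 18, 16], "B": [15, 16, 18], "C": [11, 19, 17]}


def totals(table):
    """Add up each team's values across all rounds."""
    return {team: sum(values) for team, values in table.items()}


def ranks_from_scores(scores):
    """Turn scores into ranks per round (1 = highest score; assumes no ties)."""
    teams = list(scores)
    n_rounds = len(scores[teams[0]])
    ranks = {team: [] for team in teams}
    for r in range(n_rounds):
        ordered = sorted(teams, key=lambda t: scores[t][r], reverse=True)
        for position, team in enumerate(ordered, start=1):
            ranks[team].append(position)
    return ranks


print(ranks_from_scores(SCORES) == RANKS)  # True: the rank table matches the scores
print(totals(RANKS))   # {'A': 6, 'B': 6, 'C': 6}: ranks cannot pick a winner
print(totals(SCORES))  # {'A': 52, 'B': 49, 'C': 47}: Team A wins
```

Converting scores to ranks throws away the size of each gap, which is exactly the information the interval scale adds.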
There is no natural zero point → ratios of values cannot be calculated (e.g., 20°C is not twice as hot as 10°C)
RATIO SCALE
The interval level with a natural zero starting point
All arithmetic operations can be used, including calculating ratios
LEVEL OF MEASUREMENT
Qualitative (Categorical):
•Nominal: coded by numbers. Operations: listing, grouping
•Ordinal: used to rank. Operations: listing, grouping, sorting, maybe ±
Quantitative (Scale), either Discrete or Continuous:
•Interval and Ratio. Operations: listing, grouping, sorting, math operations (±, ×, ÷, …); ratios of values are meaningful only on the ratio scale
EXERCISE 1 – GROUP WORK
What type of data and measurement scale would each of the following represent?
(1) What is your favorite sport?
(2) Do you like opera?
(3) How many hours per week do you watch television?
(4) What kind of music do you like?
(5) To what degree do you enjoy reading novels?
(6) On a scale from 1 (Dislike) to 7 (Like), how much do you like Italian food?
(7) In what state were you born?
Place these variables in the following classification tables:
Qualitative: Nominal | Ordinal
Quantitative: Discrete | Continuous, and Interval | Ratio
EXERCISE 2
What is the level of measurement for each of the following variables?
A. student’s major
B. distance students travel to class
C. student scores on the first statistics test
D. a classification of students by state of birth
E. a ranking of students as freshmen, sophomores, juniors, and seniors
F. number of hours students study per week
SOURCES OF DATA
Primary source: collected for the particular purpose
Secondary source: already exists, collected for some other purpose
Both must be:
•Relevant
•Accurate
•Current
•Impartial
GROUP WORK
What are the advantages and disadvantages of primary and secondary data?
PRIMARY VS SECONDARY DATA
PRIMARY VS SECONDARY DATA?
•Focus group
•Statistical Yearbook of Vietnam
•Survey
•Interview
•Trade Association Report
POPULATION VS SAMPLE
Population:
The set of all elements of interest
N denotes the population size, which may be infinite
Sample:
A part of the population that is selected to represent the entire group
n denotes the sample size, which is finite
CENSUS VS SAMPLING
A census is a study of every unit, everyone or everything, in a population
Sampling is a method of studying a few selected items instead of the entire large number of units
REASON TO TAKE SAMPLE
•Collecting information from the entire population is sometimes impossible
•Enables research/surveys to be done more quickly and on time
•Less expensive and often more accurate than a large census
•Allows for minimal damage or loss
•Can be used to validate census data
AN IMPORTANT REQUIREMENT
A sample must be representative of the population.
SAMPLING PROCESS
i. Define Population
ii. Specify Sampling Frame
iii. Determine Sampling Method: Probability Sampling or Non-Probability Sampling
iv. Determine Appropriate Sample Size
v. Execute Sampling Design
MOVING FROM POPULATION TO SAMPLE
Population → Sampling frame (a list of all items of the population) → Sample
TYPES OF SAMPLING
PROBABILITY VS NON-PROBABILITY SAMPLING
Feature | Probability sampling | Non-probability sampling
Meaning | Every subject in the population has a known, non-zero chance of being selected for the sample (an equal chance in simple random sampling) | The researcher selects the sample based on subjective judgment rather than random selection
Also known as | Random sampling | Non-random sampling
Basis of selection | Random | Arbitrary
Opportunity of selection | Fixed and known | Not specified and unknown
Research | Conclusive | Exploratory
Result | Unbiased | Biased
Method | Objective | Subjective
Inferences | Statistical | Analytical
Hypothesis | Tested | Generated
PROBABILITY SAMPLING
SIMPLE RANDOM SAMPLING
•Informal method: pick at random by hand. This is the easiest way and can be applied to a small population (picking a name out of a hat, drawing the short straw, a lottery draw, …)
•Formal method: use a table of random numbers or a software program
•Five steps in applying this method
i. Obtain a complete sampling frame
ii. Give each case a unique number starting at one
iii. Decide on the required sample size
iv. Select numbers for the sample size from a table of random numbers
v. Select the cases that correspond to the randomly chosen numbers
•Example: randomly call on a few students to check attendance
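The five steps can be sketched in Python, with the standard `random` module standing in for a table of random numbers. This is a minimal sketch, not a prescribed procedure; the class list and the function name `simple_random_sample` are hypothetical.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Select n cases from the sampling frame without replacement.

    A pseudo-random number generator stands in for the table of random numbers.
    """
    if not 0 < n <= len(frame):
        raise ValueError("n must be between 1 and the size of the sampling frame")
    # Step ii: give each case a unique number starting at one.
    numbered = {number: case for number, case in enumerate(frame, start=1)}
    # Step iv: select n distinct numbers at random.
    rng = random.Random(seed)
    chosen_numbers = rng.sample(range(1, len(frame) + 1), n)
    # Step v: select the cases that correspond to the chosen numbers.
    return [numbered[number] for number in chosen_numbers]


# Step i: a hypothetical sampling frame, a class list of 40 students.
class_list = [f"Student {i:02d}" for i in range(1, 41)]
# Step iii: decide on the sample size, here 5 students to call for attendance.
print(simple_random_sample(class_list, 5, seed=2024))
```

Passing a fixed `seed` makes the draw reproducible, which is useful when the selection has to be documented or repeated.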
TABLE OF RANDOM NUMBERS
54033935397490257237839400383070718700154548745727980851451238614
92744532239060836942713827136865638241139237439008765537928614332
17716956902158444015676229532821217209443022673254405063880850946
99153066304828763905436109753715845172953932721392847397407180258
32607841095616987115942179304181437842233892577017804827078893096
25123113078887615580354701526692263495085960382354937822477557586
62173290616858276463262616861677488615331677798307562492997096282
60706305347561481804102397653551098788062405943888305213011918724
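Reading the table by hand can also be sketched in code. The reading rule here is an assumption, since tables can be read in several ways: row by row, left to right, as many digits at a time as the population size has, skipping numbers outside 1 to N, repeats, and a leftover digit at the end of a row. The function name `select_from_table` is hypothetical.

```python
# First two rows of the table of random numbers above.
TABLE_ROWS = [
    "54033935397490257237839400383070718700154548745727980851451238614",
    "92744532239060836942713827136865638241139237439008765537928614332",
]


def select_from_table(rows, population_size, n):
    """Read the table in fixed-width chunks and keep usable case numbers."""
    width = len(str(population_size))  # digits needed to number cases 1..N
    chosen = []
    for row in rows:
        # A leftover digit at the end of a row is ignored.
        for start in range(0, len(row) - width + 1, width):
            number = int(row[start:start + width])
            # Skip numbers outside 1..N and numbers already chosen.
            if 1 <= number <= population_size and number not in chosen:
                chosen.append(number)
                if len(chosen) == n:
                    return chosen
    raise ValueError("ran out of digits: read more rows of the table")


# Choose 5 students from a sampling frame numbered 01 to 40.
print(select_from_table(TABLE_ROWS, 40, 5))  # [3, 39, 35, 25, 37]
```

With N = 40, the first pair 54 is skipped, then 03, 39, and 35 are kept, the second 39 is a repeat, 74 and 90 are out of range, and 25 and 37 complete the sample.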
WHEN DO WE APPLY THIS?
▪Have a good sampling frame
▪Population is geographically concentrated
▪Data collection technique does not involve travelling
SYSTEMATIC RANDOM SAMPLING
Choose every kth individual to be part of the sample
Steps to obtain a systematic sample:
•Obtain a sampling frame
•Determine the population size: N
•Determine the sample size required: n
•Divide the population of N individuals into n groups of k individuals, where k = N/n
•Randomly select one individual from the 1st group
•Select every kth individual thereafter
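A minimal Python sketch of these steps, assuming k is rounded down when N/n is not a whole number. Under that simple rule the last N - n·k units can never be selected, which is a limitation of this version rather than of the method. The frame and the name `systematic_sample` are hypothetical.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Take every kth case after a random start in the first group of k cases."""
    N = len(frame)
    if not 0 < n <= N:
        raise ValueError("n must be between 1 and N")
    k = N // n  # sampling interval k = N / n, rounded down
    start = random.Random(seed).randrange(k)  # random position 0..k-1 in the 1st group
    return [frame[start + i * k] for i in range(n)]


population = list(range(1, 101))  # hypothetical frame with N = 100
sample = systematic_sample(population, 10, seed=7)  # n = 10, so k = 10
print(sample)
```

Only the starting point is random; once it is fixed, every other selection follows from k.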
STRATIFIED RANDOM SAMPLING
•The population is divided into two or more groups called strata
•Subsamples are randomly selected from each stratum
•The sampling procedure is more complicated
•Steps to take a stratified sample:
i. Select the stratifying variable
ii. Divide the sampling frame into strata or categories
iii. Draw a systematic or random sample from each stratum
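The three steps can be sketched in Python. The slide does not say how many units to take from each stratum, so this sketch assumes proportional allocation (the same fraction of every stratum, at least one unit) and a simple random sample within each stratum. The student data and the name `stratified_sample` are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(frame, stratum_of, fraction, seed=None):
    """Split the frame into strata, then draw a simple random sample from each."""
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)  # step ii: divide the frame into strata
    rng = random.Random(seed)
    # Step iii, assuming proportional allocation: about `fraction` of each stratum.
    return {
        name: rng.sample(units, max(1, round(fraction * len(units))))
        for name, units in strata.items()
    }


# Hypothetical frame of (student id, year of study); step i: year is the stratifying variable.
sizes = {"Freshman": 60, "Sophomore": 30, "Junior": 10}
students = [
    (f"{year[:2].upper()}{i:02d}", year)
    for year, count in sizes.items()
    for i in range(count)
]
sample = stratified_sample(students, stratum_of=lambda s: s[1], fraction=0.1, seed=3)
print({year: len(units) for year, units in sample.items()})
# {'Freshman': 6, 'Sophomore': 3, 'Junior': 1}
```

Because every stratum is sampled, even the small Junior group is guaranteed a place, which is the "effective representation of all subgroups" advantage.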
CLUSTER RANDOM SAMPLING
▪Divide the population into separate groups, called clusters.
▪Two types of cluster sampling:
•One-stage cluster
•Two-stage cluster
▪One-stage cluster
•Randomly select subsets (clusters)
•Include every participant in the selected subsets
▪Two-stage cluster
•Randomly select subsets (clusters)
•Conduct simple random sampling of participants within the selected subsets
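The difference between the two types can be sketched in Python. This is a minimal illustration under assumed data (eight hypothetical classes of 30 students acting as clusters); the function names are illustrative, not standard.

```python
import random


def one_stage_cluster(clusters, m, seed=None):
    """Randomly pick m clusters and keep every unit inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)
    return {name: list(clusters[name]) for name in chosen}


def two_stage_cluster(clusters, m, per_cluster, seed=None):
    """Randomly pick m clusters, then a simple random sample inside each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)
    return {
        name: rng.sample(clusters[name], min(per_cluster, len(clusters[name])))
        for name in chosen
    }


# Hypothetical clusters: 8 classes of 30 students each.
classes = {f"Class {c}": [f"C{c}-S{s:02d}" for s in range(1, 31)] for c in range(1, 9)}
print({name: len(units) for name, units in one_stage_cluster(classes, 2, seed=1).items()})
print({name: len(units) for name, units in two_stage_cluster(classes, 2, 5, seed=1).items()})
```

Only the chosen classes need to be visited in either case, which is where the time and field-cost savings come from.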
MULTI-STAGE RANDOM SAMPLING
•A complex form of cluster and stratified sampling
•Carried out in stages
•Uses smaller and smaller sampling units at each stage
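A minimal Python sketch of the staged idea, assuming a hypothetical frame of provinces, districts, and households. It shows only the cluster side (random selection of smaller units at each stage); stratifying within a stage is not modelled. The name `multi_stage_sample` is illustrative.

```python
import random


def multi_stage_sample(frame, sizes, rng):
    """Select units stage by stage, moving to smaller units each time.

    frame: nested dicts for the upper stages and a list at the last stage.
    sizes: how many units to draw at each stage, e.g. [provinces, districts, households].
    """
    size = sizes[0]
    if isinstance(frame, dict):
        chosen = rng.sample(sorted(frame), min(size, len(frame)))
        return {unit: multi_stage_sample(frame[unit], sizes[1:], rng) for unit in chosen}
    return rng.sample(frame, min(size, len(frame)))


# Hypothetical frame: 5 provinces -> 4 districts each -> 20 households each.
country = {
    f"Province {p}": {
        f"District {p}.{d}": [f"Household {p}.{d}.{h}" for h in range(1, 21)]
        for d in range(1, 5)
    }
    for p in range(1, 6)
}
result = multi_stage_sample(country, [2, 2, 5], random.Random(11))
for province, districts in result.items():
    for district, households in districts.items():
        print(province, district, households)
```

A full list is only needed for the units inside the areas chosen at the previous stage, not for the whole country.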
PROBABILITY SAMPLING
Technique | Advantages | Disadvantages
Simple random | Easy to conduct; requires no additional information except contact details; meets the assumptions of many statistical procedures | Identifying all members of the population can be difficult; can be expensive and infeasible for a large population
Systematic | Easy to construct, execute, compare, and understand; spread over the population | High sampling bias if periodicity exists
Stratified | More accurate sample; effective representation of all subgroups | Problematic if strata are not clearly defined; complex to apply in practice
Cluster | Time efficient; cost efficient (reduces field costs); applicable where no complete list of units is available | May not be representative of the whole population
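The periodicity risk listed for systematic sampling can be illustrated with a small Python sketch. The frame is hypothetical: 10 weeks of daily records listed in date order, so the list repeats every 7 entries. When the interval k matches that cycle, every systematic sample lands on a single weekday.

```python
# Hypothetical frame: 10 weeks of daily sales records, listed in date order.
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
frame = [DAYS[i % 7] for i in range(70)]


def days_covered(frame, n, start):
    """Weekdays hit by a systematic sample of size n starting at position start."""
    k = len(frame) // n
    return {frame[start + i * k] for i in range(n)}


# n = 10 gives k = 7, the same as the weekly cycle: each start hits one weekday only.
for start in range(7):
    print(start, sorted(days_covered(frame, 10, start)))

# n = 14 gives k = 5, which does not line up with the cycle, so all weekdays appear.
print(sorted(days_covered(frame, 14, 0)))
```

If, say, weekend sales differ from weekday sales, the k = 7 design would give a badly biased estimate whichever start is drawn.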
NON-PROBABILITY SAMPLING
The process of selecting a sample without using statistical probability theory
QUOTA SAMPLING
▪Similar to stratified sampling: the population is divided into subsets
▪Participants are selected from each subset according to a specified proportion, but not at random
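A minimal Python sketch of quota sampling, assuming quotas are set from specified proportions and then filled with whoever turns up first. That first-come rule is what makes the method non-random. The proportions, the stream of passers-by, and the function names are all hypothetical.

```python
from collections import Counter


def quota_sizes(proportions, n):
    """Turn the specified proportions into a number of people per subset."""
    return {group: round(share * n) for group, share in proportions.items()}


def quota_sample(arrivals, group_of, quotas):
    """Accept people in the order they turn up until every quota is full."""
    filled = Counter()
    sample = []
    for person in arrivals:
        group = group_of(person)
        if filled[group] < quotas.get(group, 0):
            sample.append(person)
            filled[group] += 1
            if all(filled[g] >= q for g, q in quotas.items()):
                break
    return sample


quotas = quota_sizes({"Male": 0.4, "Female": 0.6}, n=10)  # {'Male': 4, 'Female': 6}
# Hypothetical stream of passers-by: every third person is male.
arrivals = [(f"P{i:02d}", "Male" if i % 3 == 0 else "Female") for i in range(1, 31)]
chosen = quota_sample(arrivals, group_of=lambda p: p[1], quotas=quotas)
print([person_id for person_id, _ in chosen])
```

The sample matches the target proportions, but who ends up in it depends entirely on arrival order, so results cannot be generalised with probability theory.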
PURPOSIVE SAMPLING
-Also known as: Judgmental Sampling, Selective Sampling, Subjective Sampling
-Relies on the judgement of the researcher
VOLUNTEER SAMPLING
▪Participants self-select to become part of a study because
they volunteer when asked, or respond to an advert
▪Two types of volunteer sampling:
-Snowball
-Self-selection
SNOWBALL SAMPLING
▪Known as network or chain-referral sampling
▪Existing participants recruit future participants from among their acquaintances
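Chain referral can be sketched as a breadth-first walk over a referral network. This is a minimal Python illustration under assumed data: the names and who refers whom are hypothetical, and `snowball_sample` is an illustrative name. It also shows the over-representation problem: a person nobody refers is never reached.

```python
from collections import deque


def snowball_sample(referrals, seeds, target):
    """Start from seed participants and follow their referrals, wave by wave."""
    sample = []
    queue = deque(seeds)
    seen = set(seeds)
    while queue and len(sample) < target:
        person = queue.popleft()
        sample.append(person)
        for contact in referrals.get(person, []):
            if contact not in seen:  # each acquaintance is recruited at most once
                seen.add(contact)
                queue.append(contact)
    return sample


# Hypothetical referral network: who each participant refers.
referrals = {
    "Ann": ["Binh", "Chi"],
    "Binh": ["Dung"],
    "Chi": ["Dung", "Em"],
    "Dung": ["Giang"],
    "Em": [],
    "Giang": [],
    "Hoa": [],  # belongs to the population, but nobody refers this person
}
print(snowball_sample(referrals, ["Ann"], target=5))   # ['Ann', 'Binh', 'Chi', 'Dung', 'Em']
print(snowball_sample(referrals, ["Ann"], target=10))  # Hoa is never reached
```

The final sample is shaped by the seeds and their social ties, which is why snowball samples tend to over-represent one network.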
SELF SELECTION SAMPLING
▪Individuals identify their wish to take part in the study
▪Individuals volunteer to be part of the sample
CONVENIENCE SAMPLING
▪Known as Haphazard or Accidental sampling
▪Sample units are only selected if they can be accessed easily and
conveniently
NON-PROBABILITY SAMPLING
Technique | Advantages | Disadvantages
Quota | Low cost, time, and administration; no need for a list of population elements | Depends on subjective decisions; not possible to generalise
Purposive | Selects only individuals who are relevant to the research purpose; less costly, more convenient | No guarantee that the chosen sample is truly representative of the population; limited generalisability
Volunteer | Participants may have an interest in the subject, so they are less likely to give biased information; does not require much screening | Over-representation of a particular network; can take a long time to recruit enough participants for the study
Convenience | High levels of simplicity and ease; less time and cost required; useful in pilot studies | Highest level of sampling error; sample is not representative of the population