Introduction to Sampling and Sampling Procedure Department of Epidemiology Faculty of Public Health Universitas Indonesia
Objectives of the course. At the end of the session, you should be able to: (1) identify and describe common methods of sampling; (2) discuss problems of bias that should be avoided when selecting a sample; (3) decide on the sampling method(s) most appropriate for the research design you are developing.
What do we want to achieve by sampling? A sample that is representative, with results that are generalizable. What else? 4/27/2024
Why sampling? It reduces time and cost while still collecting accurate data. Data quality can be better than in a study of the total population because: fewer staff are required, so the best ones can be chosen and trained better; there are fewer errors because of better quality control. Our choice is usually the sampling method that gives the highest degree of accuracy and precision for a given amount of resources.
Web comic xkcd by Randall Patrick Munroe ( http://www.xkcd.com/2618/ ). Problem with this finding?
Let’s learn from others’ experience: Selection Bias - Are you choosing the right way? (productiveclub.com)
Levels of subject selection (1). Main exclusions at each level of selection: (a) not assessed, assessed and found not eligible, or not classified because of inadequate data; (b) death, inability to cooperate, administrative issues, confidentiality, voluntary non-response... (do not enter study); (c) failure to complete study requirements, missing data, loss to follow-up (do not complete study).
Direction of selection of subjects: from the target population down to the study participants. Direction of application of results: from the study participants back up to the target population.
Target population: the population to which the results can be applied.
Source population: the population defined in general terms and enumerated if possible, from which eligible subjects are drawn.
Eligible population: the population of subjects eligible for inclusion in the study; should be defined precisely.
Study entrants: individuals who enter the study; should be defined and counted. All non-participants should be accounted for, with reasons for non-participation.
Study participants: individuals who contribute data to the study; the results apply directly only to these subjects.
Example: a clinical trial assessing different treatments in the management of acute myocardial infarction, carried out in a major teaching hospital.
Levels of selection, example (cont’d):
Target population: patients seen in other geographical areas or other countries, including patients seen at a future time.
Source population: patients admitted to the teaching hospital, or to the particular clinical unit, over a certain period of time.
Eligible population: all patients with an appropriate diagnosis seen at the participating hospital, within preset limits of age and other factors, who do not have the various clinical contraindications that will be defined in the trial protocol.
Study entrants: patients who enter the study; those whose outcome information is not available are not full participants (they do not contribute to the key analysis).
Study participants: patients who enter the study and provide outcome data.
Terms (1). Census: complete enumeration of a population. Sample: a part of a population that has properties that give information about the whole population. Sampling: the process of selecting a representative part of a population for the purpose of estimating or investigating parameters of the whole population.
Terms (2). Sampling error: the inaccuracy that results from sampling particular sections of the population. Non-representativeness of the sample: sampling bias. Measurement error.
Sampling. The sample should be as representative as possible of the study population, so that generalizations of results to the whole population are more likely to be accurate. The sampling frame should be: complete and inclusive; up-to-date; relevant for the research topic.
Good sampling consists of …
Sampling procedures
SAMPLING METHODS. Non-probability sampling (the likelihood of any member of the population being selected is unknown): purposive sampling strategies for qualitative studies. Probability sampling (the likelihood of any member of the population being selected is known): random sampling strategies to collect quantitative data. MA. Metodologi Penelitian Epidemiologi
Formal sampling procedure. Sampling error can be calculated. For the results to be valid, the response rate / participation rate needs to be as large as possible. Low participation rates are prone to bias (responders and non-responders almost always differ in important ways).
Non-responder/Non-participation (1). Non-responders are those who were selected to be included in the sample but were not included due to non-response. Example response rates: in-person interview: 75%; involving physical examination: 55-65%; completing a mailed questionnaire: <60%.
Non-responder/Non-participation (2). If a large proportion are non-responders, sample the non-responders to obtain their information and compare their characteristics with those of the responders. Adjustments can then be made to the estimates. Non-responders might be very different from responders. How? They may be seriously ill or too old to participate, or they may be too well (very healthy) and have little interest in participating.
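A minimal Python sketch of the adjustment mentioned above, assuming the quantity of interest is a simple prevalence and that a follow-up subsample of non-responders has been obtained. The function name and all numbers are hypothetical, and weighting each group by its share of the selected sample is one common approach, not necessarily the one the slide has in mind.

```python
def adjusted_prevalence(n_selected, n_responders, p_responders, p_nonresponder_subsample):
    """Combine responder and non-responder estimates, weighted by their share of the selected sample."""
    if not 0 < n_responders <= n_selected:
        raise ValueError("n_responders must be between 1 and n_selected")
    response_rate = n_responders / n_selected
    adjusted = (response_rate * p_responders
                + (1 - response_rate) * p_nonresponder_subsample)
    return response_rate, adjusted

# Hypothetical numbers: 1000 people selected, 600 responded.
# Prevalence is 10% among responders and 25% in a follow-up subsample of non-responders.
response_rate, adjusted = adjusted_prevalence(1000, 600, 0.10, 0.25)
print(f"response rate = {response_rate:.2f}, adjusted prevalence = {adjusted:.2f}")
```

With these made-up inputs, the responder-only estimate of 10% moves to 16% once the non-responders are taken into account.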
Definition (1). Sampling unit: the smallest unit to which the sampling procedure will be applied. For example: to measure the prevalence of Upper Respiratory Tract Infection (ISPA), the sampling unit is the individual; to measure the proportion of households with ISPA, the sampling unit is the household. Sampling frame: list of all sampling units in a population.
Definition (2) Sample: sampling unit chosen from an eligible population Probability sampling: sampling in which each sampling unit has a known, nonzero probability of being included in the sample
Definition (3) Target Population Source Population: includes eligible and non-eligible population Eligible Population: includes those who were selected and participate, those who were selected but do not participate, and those who were excluded based on exclusion criteria (e.g.: those who already died) Study Sample: includes individuals who were selected and participate in the study
Non-probability sampling (1). In this approach, one ensures that one will get a variety of people. These methods are best for qualitative, not quantitative, studies. Non-probability sampling includes: purposive sampling; convenience sampling; snowball/network sampling; opportunistic sampling; theoretical sampling.
Non-probability sampling (2). Theoretical sampling: developing theoretical insights into the research problem by consciously changing the type of people interviewed. Network sampling: possibly limits the diversity of respondents; usually used for hard-to-reach populations. Strategy: increase the number of sources.
What are the disadvantages of non-probability sampling in epidemiologic research? Can we address the problem by increasing the sample size? No, because the same source of bias is still present. In fact, false confidence may be placed in the resulting estimates (highly significant, but false).
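The point that a larger sample does not remove selection bias can be illustrated with a small simulation. This is a sketch under invented assumptions: the population sizes, the "easy to reach" and "hard to reach" groups, and the prevalences are all hypothetical.

```python
import random

# Hypothetical population of 100,000 people.
# 20,000 are "easy to reach" with 60% prevalence;
# 80,000 are "hard to reach" with 20% prevalence. True prevalence is 28%.
easy = [1] * 12_000 + [0] * 8_000
hard = [1] * 16_000 + [0] * 64_000
population = easy + hard
true_prevalence = sum(population) / len(population)

rng = random.Random(2024)  # fixed seed so the run is repeatable

def prevalence(sample):
    """Proportion of cases (coded 1) in the sample."""
    return sum(sample) / len(sample)

results = {}
for n in (100, 1_000, 10_000):
    convenience = rng.sample(easy, n)    # only easy-to-reach people can be selected
    srs = rng.sample(population, n)      # every person has the same chance of selection
    results[n] = (prevalence(convenience), prevalence(srs))
    print(f"n={n:>6}: convenience={results[n][0]:.3f}  "
          f"random={results[n][1]:.3f}  truth={true_prevalence:.3f}")
```

As n grows, the convenience estimate settles ever more tightly around 60%, far from the true 28%, while the random-sample estimate approaches the truth. The extra precision only makes the biased figure look more convincing.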
Conclusion: the majority of the balls are red. Valid conclusion?
Probability sampling (random sampling). Used if we want to generalise the findings obtained from a sample to the total study population. Each sampling unit has a known, nonzero probability of being included in the sample. Uses random selection procedures to ensure that each unit of the sample is chosen on the basis of chance.
Example 1. A study to examine risk factors of obesity among junior high school students in Bandung District. Researchers selected 2 public junior high schools and used purposive sampling to enrol students from Grades 1 and 2. The researchers got help from school staff and teachers, who visited the classes and selected 10 students from each class. Some teachers asked the students, “Who wants to join this survey?” Other teachers selected 10 students who they thought would cooperate well. What do you think about the above sampling strategy?
Example 2. A study to examine the cure rate of women with cervical cancer in DKI Jakarta. Researchers conduct the study in Fatmawati Hospital and Dharmais Hospital. What do you think about the above sampling strategy?
Probability sampling methods: simple random sampling; systematic sampling; stratified sampling; cluster sampling; multistage sampling.
Simple random sampling (1) Each sampling unit in the population has an equal chance of being included in the sample. Advantages: Simple to carry out and to understand
Simple random sampling (2). Disadvantages: Not the most efficient method: it often does not provide the most precise estimates for a given amount of money. Requires knowledge of the complete sampling frame in advance. Care is needed that each unit really does have an equal chance of being included in the sample; e.g., when sampling from telephone directories in which not all people are listed, the sampling frame is not complete and the sample is not representative.
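A minimal Python sketch of simple random sampling from a complete sampling frame, using the standard library. The frame of 100 unit IDs and the function name are hypothetical; the sketch assumes the frame really does list every unit exactly once.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n units without replacement; every unit has the same inclusion probability n/N."""
    if n > len(frame):
        raise ValueError("sample size cannot exceed the size of the sampling frame")
    rng = random.Random(seed)
    return rng.sample(frame, n)

frame = list(range(1, 101))          # hypothetical frame: unit IDs 1..100
sample = simple_random_sample(frame, 10, seed=1)
print(sorted(sample))
```

If the frame is incomplete (as in the telephone directory case), the code still runs, but the units missing from the frame have zero chance of selection.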
Systematic sampling (1). Sampling units are selected systematically throughout the sampling frame: every kth unit is selected (e.g., every 3rd or every 10th unit).
Systematic sampling (2). Calculate and fix the sampling interval k (the number of elements in the population divided by the number of elements needed for the sample). Choose a random starting point r between 1 and the sampling interval, so that 1 ≤ r ≤ k. After the first unit is selected, every kth unit after it is included. E.g., let N = 100 and suppose we want to sample n = 10. Then the sampling interval is k = 100/10 = 10. Let the random start r be selected between 1 and 10 (say, r = 2). The sample will then consist of the units with serial indexes 2, 12, 22, ..., 92, i.e., r, r+k, r+2k, ..., r+(n-1)k.
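The procedure above can be sketched in Python as follows. The function name and the optional `start` argument are illustrative assumptions; the sketch also assumes that using the whole-number part of N/n as the interval is acceptable when N is not an exact multiple of n.

```python
import random

def systematic_sample(N, n, start=None, seed=None):
    """Return the serial indexes r, r+k, ..., r+(n-1)k with k = N // n."""
    if not 0 < n <= N:
        raise ValueError("need 0 < n <= N")
    k = N // n                                       # sampling interval
    if start is None:
        start = random.Random(seed).randint(1, k)    # random start r, 1 <= r <= k
    elif not 1 <= start <= k:
        raise ValueError("start must lie between 1 and k")
    return [start + i * k for i in range(n)]

# Reproduces the worked example: N = 100, n = 10, r = 2.
print(systematic_sample(100, 10, start=2))
# [2, 12, 22, 32, 42, 52, 62, 72, 82, 92]
```

Leaving `start` unset draws the random start between 1 and k, which is what gives every unit a known chance of selection.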
Systematic sampling (3). Advantages: No need to know the sampling frame in advance; the frame can be constructed as the study progresses. Example: in a newborn screening test run for 1 year at a hospital, the number of births is not known in advance, so the sampling frame (SF) is not known; however, the SF can be estimated (from the hospital's annual number of births), so that the selection scheme can be determined.
Systematic sampling (4). Advantages (cont.): Often simpler to implement in the field than other sampling methods; e.g., it is easier to visit every 5th house on a block than to visit houses chosen with a random number table.
Systematic sampling (5). Advantages (cont.): If a trend is present in the sampling frame (e.g., units are ordered from small to large values), this method will ensure coverage of the spectrum of units (Kelsey et al., page 316).
Systematic sampling (6). Disadvantages: If there is a cyclical trend in the data, the method could consistently include the peaks or the lows. Example: selecting child patients on Mondays only (every 7th day, starting on a Monday) would include sicker children. Another example: Kelsey et al., p. 317.
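The Monday example can be made concrete with a small, fully deterministic Python sketch. The severity scores (8 for Mondays, 5 for other days) and the one-year clinic calendar are invented for illustration only.

```python
# Hypothetical severity scores for 52 weeks of clinic days.
# Day 0 is a Monday; Monday patients are assumed to be sicker (score 8 versus 5).
severity = [8 if day % 7 == 0 else 5 for day in range(364)]
true_mean = sum(severity) / len(severity)

def systematic_mean(values, k, start):
    """Mean of every k-th value, beginning at position `start` (0-based)."""
    picked = values[start::k]
    return sum(picked) / len(picked)

mean_k7 = systematic_mean(severity, k=7, start=0)   # interval equals the cycle length
mean_k5 = systematic_mean(severity, k=5, start=0)   # interval does not match the cycle

print(f"true mean           = {true_mean:.2f}")
print(f"every 7th day (Mon) = {mean_k7:.2f}")
print(f"every 5th day       = {mean_k5:.2f}")
```

With an interval of 7 starting on a Monday, only Mondays are picked, so the sample mean is 8 while the true mean is about 5.43. An interval that does not coincide with the weekly cycle stays close to the true mean in this toy setting.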
Example 3. A study aims to investigate the mortality rate of CVD in Indonesia. We know that the incidence of CVD is higher among men than women, and that mortality from CVD is also higher among men than women. If we use random sampling or systematic sampling, what would we expect to get in our sample in terms of distribution by sex? More men than women. Do you have any concern about that?
Stratified sampling (1). Why do we need stratified sampling? People in the population differ systematically on some characteristics, and these characteristics might relate to the factor being studied.
Stratified sampling (2). The population is divided into strata or sub-groups with similar characteristics (e.g., old vs young, women vs men, etc.); the sample is then chosen randomly from each stratum / sub-group. Stratified sampling is widely used. Advantages: Ensures that each sub-group of the population will be represented. Means can be calculated separately for each sub-group in addition to the overall mean.
Stratified sampling (3). Advantages (cont.): Gives more precise parameter estimates than simple random sampling, because the variance of the overall estimate is based on within-stratum variance (especially if the subgroups are homogeneous; a subgroup is generally more homogeneous than the entire population). Can save money, especially if an expensive procedure is involved (e.g., x-ray). Example: use a questionnaire to determine which members of a large sample have symptoms suggestive of osteoarthritis; those with positive symptoms are then sampled more heavily for the x-ray procedure than those with no symptoms.
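The precision argument can be written out as a formula. What follows is the standard textbook expression for stratified random sampling, not something taken from these slides, and the symbols are introduced here as assumptions: stratum h has population size N_h, sample size n_h, sample mean \bar{y}_h and within-stratum variance S_h^2, and W_h is its share of the population.

```latex
\bar{y}_{st} = \sum_{h=1}^{H} W_h \, \bar{y}_h ,
\qquad
\operatorname{Var}(\bar{y}_{st}) = \sum_{h=1}^{H} W_h^{2} \left( 1 - \frac{n_h}{N_h} \right) \frac{S_h^{2}}{n_h} ,
\qquad
W_h = \frac{N_h}{N}
```

Only the within-stratum variances S_h^2 appear in the expression, so differences between the stratum means do not add to the variance of the overall estimate.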
Stratified sampling (4). Advantages (cont.): Easier to implement when samples can be divided into strata by geographic area or period of time. Strata can be handled as separate administrative units using the same procedures. Disadvantages: Loss of some precision if very small numbers of units are sampled within individual sub-groups.
Stratified sampling (5). Nevertheless, stratification should always be considered whenever sampling is being planned: it yields more homogeneous subgroups, and using these subgroups as strata can increase precision.
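A minimal Python sketch of stratified sampling with the same sampling fraction in every stratum (proportionate allocation). The frame of 700 men and 300 women, the function name and the 10% fraction are hypothetical; other allocations, such as sampling one stratum more heavily, would change only how the per-stratum sample size is computed.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, fraction, seed=None):
    """Draw a simple random sample of the same fraction from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = {}
    for name, members in strata.items():
        n_h = max(1, round(fraction * len(members)))   # at least one unit per stratum
        sample[name] = rng.sample(members, n_h)
    return sample

# Hypothetical frame: 700 men and 300 women, each unit identified by (id, sex).
frame = [(i, "men") for i in range(700)] + [(i, "women") for i in range(700, 1000)]
sample = stratified_sample(frame, stratum_of=lambda unit: unit[1], fraction=0.10, seed=7)
print({name: len(members) for name, members in sample.items()})
```

Because each stratum is sampled separately, both sexes are guaranteed to appear in the sample, and a mean can be computed for each stratum as well as overall.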
Stratified sampling (6)
Cluster sampling (1) Clusters are sampled from all available clusters in a defined area, followed by selecting all individuals in the selected clusters as the study sample. Unlike strata, clusters should be as heterogeneous as possible.
Cluster sampling (2). Example: to get the prevalence of dental caries among elementary school children in a defined area, select elementary schools at random from all available elementary schools, then visit the selected schools and examine all of their children. This is simpler than listing all elementary school children in the area, drawing a random sample from that list, and examining the children selected.
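A minimal Python sketch of the school example, assuming a one-stage cluster design in which every child in a selected school is examined. The 20 schools of 30 children each, the naming scheme and the function name are all hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Select clusters at random, then keep every individual in the selected clusters."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical frame of 20 schools with 30 children each.
schools = {
    f"school_{s:02d}": [f"school_{s:02d}/child_{c:02d}" for c in range(30)]
    for s in range(20)
}
selected = cluster_sample(schools, n_clusters=5, seed=3)
children = [child for members in selected.values() for child in members]
print(len(selected), "schools,", len(children), "children")
```

Only the list of schools is needed before selection; children are enumerated only within the schools that were chosen.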
Cluster sampling (3). Advantages: No need to list all members of the population, just the members of the selected clusters. More economical than simple random sampling, although precision is poorer than that of a simple random sample drawn from the total population. Disadvantages: For the same sample size, the variance from cluster sampling is greater (but larger numbers can reduce the variance).
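The loss of precision is often summarised with a design effect. The sketch below uses a widely used approximation for clusters of equal size, 1 + (m - 1) x ICC, which is not stated in the slides; the intracluster correlation (ICC) of 0.05 and the survey dimensions are hypothetical.

```python
def design_effect(cluster_size, icc):
    """Approximate variance inflation for clusters of equal size: 1 + (m - 1) * icc."""
    return 1 + (cluster_size - 1) * icc

def effective_sample_size(n, cluster_size, icc):
    """Size of a simple random sample that would give roughly the same precision."""
    return n / design_effect(cluster_size, icc)

# Hypothetical survey: 20 schools x 30 children = 600 children, ICC = 0.05.
deff = design_effect(30, 0.05)
n_eff = effective_sample_size(600, 30, 0.05)
print(f"design effect = {deff:.2f}, effective sample size = {n_eff:.0f}")
```

Under these assumptions, 600 children sampled in clusters carry roughly as much information as about 245 children sampled by simple random sampling, which is why a larger number of clusters is needed to recover precision.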
Multistage sampling(1) Primary sampling units are first selected (e.g.: municipalities). Then, secondary (smaller) sampling units (e.g.: city blocks) are selected from each primary sampling unit. Could be extended to tertiary sampling units (e.g.: household) or further (e.g.: individuals). May include several sampling procedures at different stages.
Multistage sampling (2). Multistage sampling differs from cluster sampling in that sampling is still done at the secondary stage, whereas in cluster sampling all individuals in the selected clusters (secondary stage) are included in the sample. In multistage sampling, different sampling methods can be used at different stages / levels. National surveys usually employ this sampling procedure to get a representative sample at the national level efficiently.
Multi-stage cluster sampling. For example, to carry out an immunization survey of school children in a given province:
Step 1: Select m kabupaten from the M mutually exclusive and exhaustive kabupaten composing the province.
Step 2: Select a sample of kecamatan within each of the kabupaten selected at the first step.
Step 3: Select a sample of schools within each of the kecamatan selected at the second step.
Step 4: Select a sample of classrooms within each of the schools selected at the third step.
Step 5: Take every child within the classrooms selected at the fourth step.
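The five steps above can be sketched in Python with a nested dictionary standing in for the province. Everything about the data is hypothetical (4 kabupaten, 3 kecamatan each, 3 schools each, 2 classrooms each, 5 children each), as are the function name and the number of units chosen at each stage; simple random selection is assumed at every stage.

```python
import random

def multistage_sample(units, sizes, rng):
    """Select sizes[0] units at this stage, then recurse; at the last stage take every child."""
    if not sizes:
        return list(units)                       # Step 5: every child in the classroom
    chosen = rng.sample(sorted(units), min(sizes[0], len(units)))
    children = []
    for name in chosen:
        children.extend(multistage_sample(units[name], sizes[1:], rng))
    return children

# Hypothetical province: kabupaten -> kecamatan -> school -> classroom -> children.
province = {
    f"kabupaten_{a}": {
        f"kecamatan_{b}": {
            f"school_{s}": {
                f"class_{c}": [f"{a}-{b}-{s}-{c}-child_{i}" for i in range(5)]
                for c in range(2)
            } for s in range(3)
        } for b in range(3)
    } for a in range(4)
}

rng = random.Random(11)
# Steps 1-4: choose 2 kabupaten, 2 kecamatan, 2 schools, 1 classroom at each level.
sample = multistage_sample(province, sizes=[2, 2, 2, 1], rng=rng)
print(len(sample), "children selected")
```

With these settings the sample contains 2 x 2 x 2 x 1 x 5 = 40 children. A different selection method could be substituted at any stage without changing the overall structure.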
Other sampling methods. Area sampling: a combination of cluster and multistage sampling. A geographic area is divided into smaller areas, and a sample of these smaller areas is randomly selected. Within these areas, either units are enumerated and all are included (cluster), or units are enumerated and then sampled randomly (multistage).
Other sampling methods. Multiphase sampling: obtaining some information from the entire sample, and other information from sub-samples of the full sample. Sequential sampling: the sample is determined after another study has been done (or after the results of a group of observations are obtained). Most commonly used in clinical trials.
So, which sampling method to choose? Check your: research aims (question, objective, etc.); resources.