Definitions
Sampling unit: the basic unit around which a sampling procedure is planned, such as a person, household or school
Sampling frame: a list of all of the sampling units in a population
Sample: a collection of sampling units drawn from the eligible population
Why sample?
Unable to study all members of a population
Reduce bias
Save money and time
Measurements may be better in a sample than in the entire population
Feasibility
SAMPLING DESIGNS
PROBABILITY (RANDOM) SAMPLING
NON-PROBABILITY (NON-RANDOM) SAMPLING
Probability (random) sampling
The scientific approach: it provides a rigorous basis for estimating how faithfully the phenomena observed in the sample represent those in the population, and for computing statistical significance
A random sample is one in which each sampling unit has a known, non-zero probability of being included in the sample
Types of probability (random) sampling
Simple random sampling
Systematic sampling
Stratified random sampling
Cluster sampling
Multistage sampling
Which sampling design is best?
Choose the method that gives the greatest accuracy and precision for a given cost
Simple random sampling
Each sampling unit has an equal chance of being included in the sample
In epidemiology, sampling is generally done without replacement: no unit can be selected twice, so the sample covers more distinct units and, as a result, standard errors are smaller
Advantages
Simple process and easy to understand
Easy calculation of means and variances
Disadvantages
Not the most efficient method, that is, it does not give the most precise estimate for the cost
Requires knowledge of the complete sampling frame
Cannot always be certain that every unit has an equal chance of selection
Simple random sampling
Example: estimate haemoglobin levels in patients with sickle cell disease (SCD)
1. Determine the sample size.
2. Obtain a list of all patients with SCD in a hospital or clinic (the sampling frame).
3. The patient is the sampling unit.
4. Use a table of random numbers to select units from the sampling frame.
5. Measure haemoglobin in all selected patients.
6. Calculate the mean and standard deviation of the sample.
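The steps above can be sketched in Python, with a pseudo-random number generator standing in for the table of random numbers. This is a minimal illustration under assumed inputs: the frame of 200 patient IDs and the haemoglobin values are simulated, not clinical data, and `simple_random_sample` is a hypothetical helper name. Because sampling is without replacement, the standard error includes the finite population correction sqrt(1 - n/N).

```python
import math
import random
import statistics

def simple_random_sample(frame, n, seed=None):
    """Draw n units without replacement, so each unit has the same chance (n/N) of selection."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

# Hypothetical sampling frame: 200 patient IDs with simulated haemoglobin values (g/dL).
# These numbers are made up for illustration only.
sim = random.Random(1)
frame = list(range(1, 201))
haemoglobin = {pid: round(sim.uniform(6.0, 10.0), 1) for pid in frame}

sample_ids = simple_random_sample(frame, n=30, seed=42)
values = [haemoglobin[pid] for pid in sample_ids]

mean_hb = statistics.mean(values)
sd_hb = statistics.stdev(values)                # sample SD, n - 1 denominator
fpc = math.sqrt(1 - len(values) / len(frame))   # finite population correction (without replacement)
se_hb = sd_hb / math.sqrt(len(values)) * fpc

print(f"n={len(values)}  mean={mean_hb:.2f}  SD={sd_hb:.2f}  SE={se_hb:.2f} g/dL")
```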
Systematic sampling
The sampling units are spaced regularly throughout the sampling frame, e.g. every 3rd unit is selected
It may be used as either a probability or a non-probability sample:
it is a probability sample only if the starting point is randomly selected;
it is a non-random sample if the starting point is determined by some mechanism other than chance
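A minimal Python sketch of systematic selection from an ordered list follows. It assumes the sampling interval k has already been chosen (commonly about N/n); the helper name `systematic_sample` and the 30-unit frame are hypothetical. When k does not divide N evenly, the number of units selected can differ by one depending on the random start.

```python
import random

def systematic_sample(frame, k, seed=None):
    """Select every k-th unit of an ordered frame, starting at a random position 0..k-1.
    Choosing the start by chance is what makes this a probability sample."""
    if k < 1:
        raise ValueError("sampling interval k must be at least 1")
    start = random.Random(seed).randrange(k)  # random start in [0, k)
    return frame[start::k]

# Hypothetical ordered frame of 30 units; a 1-in-3 sample gives 10 units.
frame = [f"unit_{i}" for i in range(1, 31)]
sample = systematic_sample(frame, k=3, seed=7)
print(sample)
```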
Advantages
Sampling frame does not need to be defined in advance
Easier to implement in the field
If there are unrecognized trends in the sampling frame, a systematic sample ensures coverage of the full spectrum of units
Disadvantages
Variance cannot be estimated unless assumptions are made
Systematic sampling
Example: estimate HIV prevalence in children born during a specified period at a hospital
1. It is impossible to construct the sampling frame in advance.
2. Select a random number between some prespecified bounds.
3. Beginning with the random number chosen, take every 5th birth and test for HIV infection.
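Because the births arrive one at a time, the selection rule can be applied as each birth is recorded, without ever holding a complete frame. The sketch below assumes bounds of 1 to 5 for the random start (matching the 1-in-5 interval); the generator name `systematic_stream` and the simulated stream of birth IDs are hypothetical, and the HIV testing itself is not modelled.

```python
import random
from typing import Iterable, Iterator, Optional

def systematic_stream(units: Iterable, k: int, seed: Optional[int] = None) -> Iterator:
    """Yield every k-th unit from a stream whose length is not known in advance.
    The first selected unit is arrival number r, with r drawn at random from 1..k."""
    r = random.Random(seed).randint(1, k)  # random start between the prespecified bounds 1 and k
    for position, unit in enumerate(units, start=1):
        if position >= r and (position - r) % k == 0:
            yield unit

# Hypothetical stream of births recorded one at a time during the study period.
births = (f"birth_{i}" for i in range(1, 48))
selected = list(systematic_stream(births, k=5, seed=3))  # these births would be tested for HIV
print(selected)
```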
Stratified random sampling
The sampling frame is divided into groups, or strata, that share certain characteristics
A sample of units is selected from each group or stratum
Advantages
Assures that certain subgroups are represented in the sample
Allows the investigator to estimate parameters in different strata
Gives more precise estimates of the parameters because strata are more homogeneous, i.e. variance within strata is smaller
Strata of interest can be sampled more intensively, e.g. groups with the greatest variance
Other administrative advantages
Disadvantages
Loss of precision if a small number of units is sampled from a stratum
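A standard textbook form of the stratified estimator shows where these advantages and disadvantages come from. The notation is assumed here, not taken from the slides: H strata, N_h units and n_h sampled units in stratum h, weight W_h = N_h/N, stratum sample mean \bar{y}_h and stratum sample variance s_h^2.

```latex
\bar{y}_{st} = \sum_{h=1}^{H} W_h \, \bar{y}_h ,
\qquad
\widehat{\operatorname{Var}}\left(\bar{y}_{st}\right)
  = \sum_{h=1}^{H} W_h^{2} \left(1 - \frac{n_h}{N_h}\right) \frac{s_h^{2}}{n_h}
```

Only the within-stratum variances s_h^2 enter the variance, so homogeneous strata give a small variance; a stratum sampled with a small n_h contributes a large s_h^2/n_h term, which is the loss of precision noted above.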
Stratified random sampling
Example: assess dietary intake in adolescents
1. Define three age groups: 11-13, 14-16 and 17-19 years.
2. Stratify the age groups by sex.
3. Obtain a list of children in this age range from schools.
4. Randomly select children from each of the 6 strata until the sample size is reached.
5. Measure dietary intake.
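The stratified selection in this example can be sketched as follows. The roster is simulated, and equal allocation of 20 children per stratum is an assumption (the example does not say how the sample is divided among strata; proportional allocation is another common choice). `age_group` and `stratified_sample` are hypothetical helper names.

```python
import random
from collections import defaultdict

AGE_GROUPS = [(11, 13), (14, 16), (17, 19)]  # age bands from the example

def age_group(age):
    """Return the age band label for an age, or None if outside 11-19."""
    for low, high in AGE_GROUPS:
        if low <= age <= high:
            return f"{low}-{high}"
    return None

def stratified_sample(children, n_per_stratum, seed=None):
    """Assign each child to an (age band, sex) stratum, then draw a simple random
    sample without replacement of n_per_stratum children from every stratum."""
    strata = defaultdict(list)
    for child in children:
        band = age_group(child["age"])
        if band is not None:
            strata[(band, child["sex"])].append(child)
    rng = random.Random(seed)
    return {
        key: rng.sample(members, min(n_per_stratum, len(members)))
        for key, members in sorted(strata.items())
    }

# Hypothetical school roster with simulated ages and sexes.
sim = random.Random(0)
children = [{"id": i, "age": sim.randint(11, 19), "sex": sim.choice("FM")} for i in range(600)]

sample = stratified_sample(children, n_per_stratum=20, seed=1)
for (band, sex), members in sample.items():
    print(band, sex, len(members))
```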
Cluster sampling
Clusters of sampling units are first selected randomly
All individual sampling units within each selected cluster are then included in the sample
Advantages
The entire sampling frame need not be enumerated in advance, only the clusters, once identified
More economical in terms of resources than simple random sampling
Disadvantages
Loss of precision, i.e. wider variance, but this can be offset by sampling a larger number of clusters
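A commonly used approximation quantifies this loss of precision: Kish's design effect for clusters of equal size \bar{m} with intracluster correlation coefficient \rho (notation assumed here, not from the slides), where n is the total number of units measured.

```latex
\text{deff} = 1 + (\bar{m} - 1)\,\rho ,
\qquad
n_{\text{eff}} = \frac{n}{\text{deff}}
```

For a fixed total n = k\bar{m} spread over k clusters, taking more, smaller clusters lowers \bar{m} and hence the design effect, which is why precision improves with a larger number of clusters.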
Cluster sampling
Example: estimate the prevalence of dental caries in school children
1. Among the schools in the catchment area, list all of the classrooms in each school.
2. Take a simple random sample of classrooms (clusters of children).
3. Examine all children in each selected cluster for dental caries.
4. Estimate the prevalence of caries within clusters, then combine these into an overall estimate, with its variance.
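A sketch of this single-stage design in Python, under stated assumptions: the 80 classrooms and their caries counts are simulated, and `cluster_prevalence` is a hypothetical helper. Combining the cluster estimates weighted by cluster size is the same as dividing total cases by total children examined (a ratio estimator); the standard error uses the usual approximate variance for that estimator, which is driven by variation between clusters rather than between children.

```python
import math
import random

def cluster_prevalence(clusters, n_clusters, seed=None):
    """Single-stage cluster sample: select n_clusters classrooms by simple random
    sampling and examine every child in each one.
    clusters: list of (children_examined, children_with_caries) per classroom.
    Returns (prevalence, standard_error), using the ratio estimator
    sum(cases) / sum(children) and its usual approximate variance,
    with a finite population correction for the clusters."""
    N = len(clusters)
    if not 2 <= n_clusters <= N:
        raise ValueError("need at least 2 sampled clusters and no more than N")
    chosen = random.Random(seed).sample(clusters, n_clusters)
    sizes = [m for m, _ in chosen]
    cases = [a for _, a in chosen]
    p = sum(cases) / sum(sizes)
    m_bar = sum(sizes) / n_clusters
    # Variation between clusters in (cases - p * size) drives the variance.
    s2 = sum((a - p * m) ** 2 for m, a in zip(sizes, cases)) / (n_clusters - 1)
    var = (1 - n_clusters / N) * s2 / (n_clusters * m_bar ** 2)
    return p, math.sqrt(var)

# Hypothetical catchment area: 80 classrooms with simulated caries counts.
sim = random.Random(5)
classrooms = []
for _ in range(80):
    size = sim.randint(20, 35)
    classrooms.append((size, sum(sim.random() < 0.3 for _ in range(size))))

p_hat, se = cluster_prevalence(classrooms, n_clusters=15, seed=11)
print(f"prevalence = {p_hat:.3f}, SE = {se:.3f}")
```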
Multistage sampling
Similar to cluster sampling, except that there are two sampling events instead of one:
primary units are randomly selected, and then
individual units within the primary units are randomly selected for measurement
Multistage sampling
Example: estimate the prevalence of dental caries in school children
1. Among the schools in the catchment area, list all of the classrooms in each school.
2. Take a simple random sample of classrooms (clusters of children).
3. Enumerate the children in each selected classroom.
4. Take a simple random sample of children within each classroom.
5. Examine the selected children in each classroom for dental caries.
6. Estimate the prevalence of caries within clusters, then combine these into an overall estimate, with its variance.
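The two-stage version differs from the cluster design only in the second draw within each classroom. In this sketch (simulated data, hypothetical helper `two_stage_prevalence`), each classroom's sample prevalence is weighted by its enrolment before combining. The variance calculation is left out for brevity; for a two-stage design it has both a between-classroom and a within-classroom component.

```python
import random

def two_stage_prevalence(classrooms, n_classrooms, n_children, seed=None):
    """Two-stage sample.
    Stage 1: simple random sample of n_classrooms classrooms (primary units).
    Stage 2: simple random sample of n_children within each selected classroom
             (all children if the classroom is smaller).
    classrooms: dict mapping classroom id -> list of 0/1 caries results for every
    enrolled child (hypothetical; in practice only sampled children are examined).
    Returns the overall prevalence, combining classroom-level estimates
    weighted by classroom enrolment."""
    rng = random.Random(seed)
    selected = rng.sample(sorted(classrooms), n_classrooms)
    weighted_cases, enrolled = 0.0, 0
    for room in selected:
        children = classrooms[room]
        examined = rng.sample(children, min(n_children, len(children)))
        room_prevalence = sum(examined) / len(examined)
        weighted_cases += room_prevalence * len(children)  # estimated cases in this classroom
        enrolled += len(children)
    return weighted_cases / enrolled

# Hypothetical catchment area: 60 classrooms of 20-35 children with simulated results.
sim = random.Random(2)
classrooms = {f"room_{r}": [int(sim.random() < 0.25) for _ in range(sim.randint(20, 35))]
              for r in range(60)}
p_hat = two_stage_prevalence(classrooms, n_classrooms=12, n_children=10, seed=4)
print(f"estimated prevalence = {p_hat:.3f}")
```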
Non-probability sampling
Convenience sampling
A non-random collection of sampling units from an undefined sampling frame
Advantage
Convenient and easy to perform
Disadvantage
No statistical justification for the sample
Examples:
Case series of patients with a particular condition at a certain hospital
Graduate students walking down the hall are asked to donate blood for a study
Children with febrile seizures reporting to an emergency room
Investigator decides whom to enrol in a study
Consecutive sampling
A case series of consecutive patients with a condition of interest
A consecutive series means all patients with the condition within the hospital or clinic, not just the patients the investigator happens to know about
Advantages
Removes the investigator from deciding who enters a study
Requires a protocol with definitions of the condition of interest
Straightforward way to enrol subjects
Disadvantages
Non-random
Explicit efforts must be made to identify and recruit all persons with the condition of interest
Examples:
Outcome of 1,000 consecutive patients presenting to the emergency room with chest pain
Natural history of all 125 patients with HIV-associated TB during a 5-year period
Capture-recapture sampling
A non-random method of sampling that relies on lists of sampling units obtained from multiple sources
The overlap between the lists allows one to estimate the number of individuals not captured by any list
Uses of this method:
Estimate a parameter when incomplete information is available from two or more sources
Refine prevalence or incidence estimates from population surveys
Assess completeness of event reporting
Derive plausible upper and lower limits on the total population affected
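For the simplest case of two sources, the idea behind the overlap can be written down directly: if the second list behaves like a random "recapture", the fraction of its members already on the first list (m/n2) estimates the first list's coverage (n1/N), so N is approximately n1*n2/m. The sketch below uses Chapman's bias-corrected version of this estimator and its usual variance formula; the function name and the case IDs are hypothetical, and the estimator relies on the assumptions listed in its docstring.

```python
import math

def chapman_estimate(list_a, list_b):
    """Two-source capture-recapture using Chapman's bias-corrected form of the
    Lincoln-Petersen estimator.
    Assumes a closed population, exact matching of individuals across lists,
    and that being on one list does not change the chance of being on the other.
    Returns (estimated total, estimated number missed by both lists, standard error)."""
    a, b = set(list_a), set(list_b)
    n1, n2, m = len(a), len(b), len(a & b)
    n_hat = (n1 + 1) * (n2 + 1) / (m + 1) - 1
    var = (n1 + 1) * (n2 + 1) * (n1 - m) * (n2 - m) / ((m + 1) ** 2 * (m + 2))
    missed = n_hat - len(a | b)
    return n_hat, missed, math.sqrt(var)

# Hypothetical matched case IDs from two sources.
hospital = range(1, 61)   # 60 cases
clinic = range(41, 81)    # 40 cases, 20 of whom also appear on the hospital list
total, missed, se = chapman_estimate(hospital, clinic)
print(f"estimated total = {total:.1f}, missed by both = {missed:.1f}, SE = {se:.1f}")
```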
Advantages
Does not require a random sample
Can give a more precise estimate of a parameter than a probability sample
Easy to perform in the field
Useful in estimating events in difficult-to-access populations
Disadvantages
Analysis of lists may be complicated
Need to be able to match individuals across lists
Requires assumptions regarding the probability of being listed by each source
Unfamiliar to many epidemiologists
Capture-recapture
Example: estimate the number of AIDS cases among injecting drug users (IDUs) in a city
1. From hospital and clinic records, obtain lists of persons diagnosed with HIV/AIDS during the study period.
2. Determine IDU status.
3. Identify people who appear on multiple lists.
4. Use nested log-linear models to estimate the number of IDUs with AIDS not captured by any of the lists.
5. Use the list of reported cases and the estimate of non-reported cases to obtain an overall estimate of the number of IDUs with AIDS (with confidence intervals).
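Step 4 can be illustrated with the simplest member of the log-linear family, the independence model for three lists, fitted as a Poisson regression. This is a sketch under strong assumptions, not the nested-model analysis itself: it assumes the three lists are independent (no interaction terms; adding a column such as A*B would relax that), omits model selection and confidence intervals, and relies on NumPy. The function name and the counts are hypothetical; the counts were chosen so the independence model fits them exactly.

```python
import numpy as np

def loglinear_missing_cell(counts, max_iter=100, tol=1e-10):
    """Three-list capture-recapture with the log-linear independence model
        log E[y] = b0 + b1*A + b2*B + b3*C,
    where A, B, C are 0/1 indicators of appearing on each list. The model is
    fitted to the 7 observable cells as a Poisson GLM by iteratively
    reweighted least squares (IRLS); exp(b0) is then the fitted count for
    the unobservable cell (0, 0, 0), i.e. people on none of the lists.
    counts: dict mapping (a, b, c) tuples to observed counts."""
    patterns = sorted(p for p in counts if p != (0, 0, 0))
    X = np.array([[1.0, a, b, c] for a, b, c in patterns])
    y = np.array([counts[p] for p in patterns], dtype=float)
    beta = np.zeros(4)
    beta[0] = np.log(y.mean())  # start from a constant fit
    for _ in range(max_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu   # working response for the log link
        XtW = X.T * mu            # X^T diag(w) with Poisson weights w = mu
        new_beta = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return float(np.exp(beta[0]))

# Hypothetical counts of IDUs with AIDS by list membership
# (hospital A, clinic B, third source C); not real data.
counts = {(1, 0, 0): 240, (0, 1, 0): 160, (0, 0, 1): 60, (1, 1, 0): 160,
          (1, 0, 1): 60, (0, 1, 1): 40, (1, 1, 1): 40}
missing = loglinear_missing_cell(counts)
total = sum(counts.values()) + missing  # reported cases plus estimated non-reported cases
print(f"not captured: {missing:.1f}, estimated total: {total:.1f}")
```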
Sample size determination
Why sampling?
Get information about large populations
Lower costs
Less field time
More accuracy, i.e. a better job of data collection can be done
When it is impossible to study the whole population