Session Outline: What is sample size? Basic information needed for sample size calculation. Why determine sample size? How large a sample do we need? What are the methods of determining it? What are the factors that affect it? Types of measurement in research. How do we determine sample size? Conclusion.
What is a Sample? A sample is the sub-population to be studied in order to draw an inference about a reference population (the population to which the findings of the study are to be generalized). In a census, the sample size is equal to the population size. In research, however, because of time and budget constraints, a representative sample is normally used. The larger the sample, the more precise the findings from a study will be.
Cont’d…………… Availability of resources sets upper limit of the sample size. Required accuracy sets lower limit of sample size. Thus, an optimum sample size is an essential component of any research.
Basic Information Needed for Sample Size Calculation: The approach to sample size calculation can be arrived at by thinking through the following set of questions. What type of study is this? Single sample (prevalence survey); comparison of two groups (cross-sectional, case-control, cohort study). What is the main (primary) outcome? Mean of a measurement (mean blood pressure); proportion; ordered scale (pain scores). What is the expected variability between the subjects? How large a difference would be considered clinically important and reasonable?
What is sample size determination? Sample size determination is the mathematical estimation of the number of subjects/units to be included in a study. When a representative sample is taken from a population, the findings are generalized to the population. Optimum sample size determination is required for the following reasons: to allow appropriate analysis; to provide the desired level of accuracy; to give validity to the significance test.
How large a sample do we need? If the sample is too small: Even a well-conducted study may fail to answer its research question. It may fail to detect important effects or associations. It may estimate an effect or association imprecisely.
Cont’d…………… If the sample size is too large: The study will be difficult and costly. It will take longer to complete. Accuracy may be lost. Hence, the optimum sample size must be determined before a study commences.
Types of Measurement in Research: Random error; Systematic error (bias); Precision (reliability); Accuracy (validity); Effect size; Design effect; Type I (α) error; Type II (β) error; Power (1 - β); Null hypothesis; Alternative hypothesis
Definition of terms. Random error: Errors that occur by chance. Sources are sample variability, subject-to-subject differences and measurement errors. These can be reduced by averaging, increasing the sample size, or repeating the experiment. Systematic error: Deviations not due to chance alone. Several factors, e.g. patient selection criteria, may contribute. It can be reduced by good study design and conduct of the experiment. Precision: The degree to which a variable has the same value when measured several times. It is a function of random error. Accuracy: The degree to which a variable actually represents the true value. It is a function of systematic error.
Cont’d…………… Power: This is the probability that the test will correctly identify a significant difference, effect or association in the sample should one exist in the population. Power increases with sample size (although not in direct proportion): the larger the sample, the greater the power of the study to detect a significant difference, effect or association. Effect size: A measure of the strength of the relationship between two variables in a population. The bigger the effect in the population, the easier it will be to detect.
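The link between sample size and power can be checked numerically. The block below is a minimal sketch, assuming a two-sided comparison of two proportions with equal group sizes under a normal approximation; the function name `approx_power` and the inputs (p = 0.4, d = 0.1) are illustrative assumptions, not values from a particular study.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(n_per_group, p, d, alpha=0.05):
    """Approximate power to detect a difference d between two proportions.

    Assumes equal group sizes, a common proportion p for the variance term,
    and a two-sided normal-approximation test (the far tail is ignored).
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    se = sqrt(2 * p * (1 - p) / n_per_group)       # standard error of the difference
    return NormalDist().cdf(abs(d) / se - z_crit)

# Power rises as the number of subjects per group grows.
for n in (100, 200, 400, 800):
    print(n, round(approx_power(n, p=0.4, d=0.1), 2))
```

Under these assumptions the power climbs towards 1 as n grows, but not linearly: doubling the sample does not double the power.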
Cont’d…………… Design effect: Geographic clustering is generally used to make the study easier and cheaper to perform. The effect on the sample size depends on the number of clusters and the variance between and within the clusters. In practice, this is determined from previous studies and is expressed as a constant called the ‘design effect’, often between 1.0 and 2.0. The sample size for a simple random sample is multiplied by the design effect to obtain the sample size for the cluster sample.
Cont’d…………… Null hypothesis: It states that there is no difference among groups or no association between the predictor and the outcome variable. This hypothesis needs to be tested. Alternative hypothesis: It contradicts the null hypothesis. The alternative hypothesis cannot be tested directly; it is accepted by exclusion if the test of significance rejects the null hypothesis. There are two types: one-tailed (one-sided) or two-tailed (two-sided).
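The design-effect adjustment described above is a single multiplication. A minimal sketch follows, assuming a simple-random-sample size of 384 and a design effect of 1.5 purely for illustration; the function name `cluster_sample_size` is hypothetical.

```python
from math import ceil

def cluster_sample_size(n_srs, design_effect):
    """Inflate a simple-random-sample size by the design effect."""
    if design_effect < 1.0:
        # Assumption for this sketch: clustering never reduces the required size.
        raise ValueError("design effect is assumed to be at least 1.0 here")
    return ceil(n_srs * design_effect)

print(cluster_sample_size(384, 1.5))  # 576
```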
Cont’d…………… A type I error occurs if you reject the null hypothesis when it is true. A type II error occurs if you do not reject the null hypothesis when it is false.
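A type I error rate can be illustrated by simulation. The sketch below assumes a setting chosen only for illustration: repeated studies of n = 30 drawn from a normal population whose true mean is 0, each tested with a two-sided z-test at the 0.05 level with known standard deviation 1. The function name and all parameter values are assumptions.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

def type_i_error_rate(n=30, trials=2000, alpha=0.05, seed=1):
    """Fraction of simulated studies that reject a true null hypothesis.

    Each study draws n values from a normal population with mean 0 and
    standard deviation 1, so the null hypothesis 'mean = 0' is true, and
    applies a two-sided z-test with known standard deviation.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(0, 1) for _ in range(n)]
        z = mean(sample) * sqrt(n)   # (sample mean - 0) / (1 / sqrt(n))
        if abs(z) > z_crit:
            rejections += 1          # a type I error: rejecting a true null
    return rejections / trials

print(type_i_error_rate())
```

The printed fraction should lie close to the chosen significance level, since that level is by definition the probability of a type I error.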
At what stage can sample size be addressed? It can be addressed at two stages: During the planning stage, while designing the study, when the optimum sample size is calculated from information on some parameters. At the stage of interpretation of the results.
Approaches for estimating sample size: Approaches for estimating sample size depend primarily on the study design and the main outcome measure of the study. There are distinct approaches for calculating sample size for different study designs and different outcome measures.
Procedure for calculating sample size: There are 3 procedures that could be used for calculating sample size: use of formulae; ready-made tables; computer software.
Sample Size Formula: The formula requires that we (i) specify the amount of confidence we wish to have, (ii) estimate the variance in the population, and (iii) specify the level of accuracy we want. When we specify the above, the formula tells us what sample size, n, we need to use.
Use of formulae for sample size calculation & power analysis: There are many formulae for calculating sample size and power in different situations for different study designs. The appropriate sample size for a population-based study is determined largely by 3 factors: the estimated prevalence of the variable of interest; the desired level of confidence; the acceptable margin of error.
Cont’d…………… To calculate the minimum sample size required for accuracy in estimating proportions, the following decisions must be taken: Decide on a reasonable estimate of the key proportion (p) to be measured in the study. Decide on the degree of accuracy (d) that is desired in the study, typically 1% to 5% (0.01 to 0.05). Decide on the confidence level (Z) you want to use, usually 95% (Z = 1.96). Determine the size (N) of the population that the sample is supposed to represent. Decide on the minimum difference you expect to detect as statistically significant.
For population > 10,000
$n = \frac{z^2 pq}{d^2}$
n = desired sample size (when the population > 10,000)
z = standard normal deviate, usually set at 1.96 (or approximately 2), which corresponds to the 95% confidence level
p = proportion in the target population estimated to have a particular characteristic. If there is no reasonable estimate, use 50% (i.e. 0.5)
q = 1 - p (proportion in the target population not having the particular characteristic)
d = degree of accuracy required, usually set at 0.05 (occasionally at 0.02)
Example 1: If the proportion of a target population with a certain characteristic is 0.50, the Z statistic is 1.96 and we desire accuracy at the 0.05 level, then the sample size is: $n = \frac{(1.96)^2 (0.5)(0.5)}{(0.05)^2} = 384.16 \approx 384$.
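The arithmetic in Example 1 can be reproduced in a few lines. This is a minimal sketch of the single-proportion formula with the example's inputs; the function name `sample_size_proportion` is an assumption introduced here.

```python
def sample_size_proportion(p, d, z=1.96):
    """n = z^2 * p * q / d^2 for estimating a single proportion."""
    q = 1 - p
    return (z ** 2) * p * q / (d ** 2)

n = sample_size_proportion(p=0.5, d=0.05)
print(round(n, 2))  # 384.16
```

The slide rounds 384.16 to 384; rounding up to 385 is the more conservative choice, since the formula gives a minimum.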
If study population is < 10,000
$n_f = \frac{n}{1 + n/N}$
$n_f$ = desired sample size when the study population is < 10,000
n = desired sample size when the study population is > 10,000
N = estimate of the population size
Example: if n were found to be 400 and the population size were estimated at 1000, then $n_f$ is calculated as follows: $n_f = \frac{400}{1 + 400/1000} = \frac{400}{1.4} \approx 286$.
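The finite-population adjustment in the example above (n = 400, N = 1000) can be checked directly. A minimal sketch, with the function name `finite_population_sample_size` assumed for illustration:

```python
def finite_population_sample_size(n, N):
    """nf = n / (1 + n / N): shrink n when the population N is small."""
    return n / (1 + n / N)

nf = finite_population_sample_size(n=400, N=1000)
print(round(nf))  # 286
```

Note that the denominator is the whole of (1 + n/N); reading the expression left to right without the brackets would give a different and incorrect result.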
Sample size formula for comparison of groups
If we wish to test a difference (d) between two sub-samples regarding a proportion and can assume an equal number of cases ($n_1 = n_2 = n'$) in the two sub-samples, the formula for $n'$ is
$n' = \frac{2 z^2 pq}{d^2}$
E.g. suppose we want to compare an experimental group against a control group with regard to women using contraception. If we expect p to be 40% (0.4) and wish to conclude that an observed difference of 0.10 or more is significant at the 0.05 level, the sample size will be:
$n' = \frac{2 (1.96)^2 (0.4)(0.6)}{(0.1)^2} = 184.4 \approx 184$
Thus, 184 experimental subjects and another 184 control subjects are required.
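The two-group example can be verified the same way. This is a minimal sketch using the slide's formula and inputs; the function name `sample_size_per_group` is an assumption.

```python
def sample_size_per_group(p, d, z=1.96):
    """n' = 2 * z^2 * p * q / d^2 subjects in each of two equal groups."""
    q = 1 - p
    return 2 * (z ** 2) * p * q / (d ** 2)

n_prime = sample_size_per_group(p=0.4, d=0.10)
print(round(n_prime))  # 184
```

This formula contains no term for power. Formulae that also fix the desired power (for example 80%) add a second normal deviate and therefore give a larger number per group.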
Use of ready-made tables for sample size calculation: How large a sample of patients should be followed up if an investigator wishes to estimate the incidence rate of a disease to within 10% of its true value with 95% confidence? The table shows that for e = 0.10 and a confidence level of 95%, a sample size of 385 would be needed. The table can be used to find the sample size for other values of relative precision and confidence level, e.g. if the level of confidence is reduced to 90%, the sample size would be 271. Such tables, which give ready-made sample sizes, are available for different designs and situations.
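The table itself is not reproduced here, but the two quoted values are consistent with the relation n = (z / e)^2 rounded up, where z is the normal deviate for the chosen confidence level and e is the relative precision. Treat that relation as an assumption about how the table was built; the sketch below only shows that it recovers 385 and 271.

```python
from math import ceil
from statistics import NormalDist

def incidence_rate_sample_size(rel_precision, confidence):
    """Assumed table formula: n = (z / e)^2, rounded up to a whole subject."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z / rel_precision) ** 2)

print(incidence_rate_sample_size(0.10, 0.95))  # 385
print(incidence_rate_sample_size(0.10, 0.90))  # 271
```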
Use of computer software for sample size calculation & power analysis: The following software can be used for calculating sample size and power: Epi Info; nQuery; Power & Precision; Sample; STATA; SPSS.
Epi Info for sample size determination: In STATCALC: 1. Select SAMPLE SIZE & POWER. 2. Select POPULATION SURVEY. 3. Enter the size of the population (e.g. 15 000). 4. Enter the expected frequency (an estimate of the true prevalence, e.g. 80% ± your minimum standard). 5. Enter the worst acceptable result (e.g. 75%), i.e. the margin of error is 5%.
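For comparison, the same inputs (population 15 000, expected frequency 80%, margin of error 5%) can be run through the formulae given earlier in this session. This is a sketch only: it assumes a 95% confidence level and combines the single-proportion formula with the finite-population adjustment; the software's internal calculation may differ, and the function name `survey_sample_size` is hypothetical.

```python
def survey_sample_size(N, p, d, z=1.96):
    """Single-proportion sample size with finite-population adjustment."""
    n = (z ** 2) * p * (1 - p) / (d ** 2)   # size for a very large population
    return n / (1 + n / N)                  # adjusted for population size N

print(round(survey_sample_size(N=15000, p=0.80, d=0.05)))  # 242
```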
CONCLUSIONS: Sample size determination is one of the most essential components of every research study. The larger the sample size, the higher the degree of accuracy, but this is limited by the availability of resources. Sample size can be determined using formulae, ready-made tables and computer software. Steps: 1st, formulate a research question. 2nd, select an appropriate study design, primary outcome measure and level of statistical significance. 3rd, use the appropriate formula to calculate the sample size.