Unit 5 Threats to validity in epidemiological study

subin940516 10 views 76 slides Sep 17, 2025

About This Presentation

BHCM SEM IV Notes


Slide Content

Unit 5. Threats to validity in epidemiological study - Dr. RAJI SHRESTHA

CONTENTS
• Define bias (systematic error) and types of bias (selection bias, misclassification/information bias, and confounding bias)
• Non-differential misclassification of disease and exposure
• Differential misclassification of exposure in a case-control study (recall bias)
• Ways to control confounding in the design phase of a study (matching, restriction, randomization)
• Ways to control for confounding in the analysis phase of a study (standardization, Mantel-Haenszel, regression)

Define bias and types of bias BIAS Bias is any systematic error in the determination of the association between the exposure and disease. The possibility of bias must be considered when evaluating a possible cause and effect relationship.

TYPES OF BIAS Many varieties of bias may arise in epidemiological studies. Some of these are: bias due to confounding, memory or recall bias, selection bias, Berksonian bias, and interviewer's bias.

(a) Bias due to confounding: Confounding bias arises when a confounding variable distorts the observed association between exposure and disease. Confounding has already been mentioned as an important source of bias. In case-control studies this bias can be reduced by matching.

(b) Memory or recall bias : When cases and controls are asked questions about their past history, it may be more likely for the cases to recall the existence of certain events or factors, than the controls who are healthy persons. For example, those who have had a myocardial infarction might be more likely to remember and recall certain habits or events than those who have not. Thus cases may have a different recall of past events than controls.

(c) Selection bias: The cases and controls may not be representative of cases and controls in the general population. There may be systematic differences in characteristics between cases and controls. Selection bias is best controlled by preventing it.

(d) Berksonian bias: A special example of bias is Berksonian bias, named after Dr. Joseph Berkson, who recognized this problem. The bias arises because of the different rates of admission to hospitals for people with different diseases (i.e., hospital cases and controls).

(e) Interviewer's bias Bias may also occur when the interviewer knows the hypothesis and also knows who the cases are. This prior information may lead him to question the cases more thoroughly than controls regarding a positive history of the suspected causal factor. A useful check on this kind of bias can be made by noting the length of time taken to interview the average case and the average control. This type of bias can be eliminated by double-blinding.

Information Bias Due to systematic measurement error or misclassification of subjects on one or more variables, either risk factor or disease status. Types: Interviewer Bias Recall bias Observer bias Loss to follow up Hawthorne effect Surveillance bias

Types of Information Bias Interviewer Bias – an interviewer’s knowledge may influence the structure of questions and the manner of presentation, which may influence responses Recall Bias – those with a particular outcome or exposure may remember events more clearly or amplify their recollections Observer Bias – observers may have preconceived expectations of what they should find in an examination Loss to follow-up – those that are lost to follow-up or who withdraw from the study may be different from those who are followed for the entire study

Information Bias (cont.) Hawthorne effect – an effect first documented at a Hawthorne manufacturing plant; people act differently if they know they are being watched. Surveillance bias – the group with the known exposure or outcome may be followed more closely or longer than the comparison group.

Misclassification bias Bias resulting from flawed definition of study variables or measurement of study variables Results in erroneous classification of subjects with regard to exposure and/or outcome. This is called misclassification. Misclassification (or classification error) happens when a participant is placed into the wrong population subgroup or category because of some kind of observational or measurement error. When this happens, the true link between exposure and outcome is distorted.

Cont …. People might be placed into the wrong groups because of: Incomplete medical records. Recording errors in records. Misinterpretation of records. Errors in records, like incorrect disease codes, or patients completing questionnaires incorrectly.

Types of Misclassification Bias Differential misclassification Non- differential misclassification

Differential misclassification Differential misclassification happens when the information errors differ between groups. In other words, the bias is different for exposed and non-exposed subjects, or between those who have the disease and those who do not. Example: emphysema (a condition in which the air sacs of the lungs are damaged and enlarged, causing breathlessness) is diagnosed more frequently in smokers than in non-smokers. However, smokers may visit the doctor more often than non-smokers for other conditions (e.g. bronchitis). One reason smokers are diagnosed with emphysema more often could therefore simply be that they see a doctor more often, not that they actually have higher odds of getting the disease. Unless steps are taken to control for this possibility, emphysema will be under-diagnosed in non-smokers. This is a differential classification error because the chance of being diagnosed depends on how often smokers, compared with non-smokers, visit the doctor.

Misclassification Bias (cont.)

True classification:
              Cases   Controls   Total
Exposed         100         50     150
Nonexposed       50         50     100
Total           150        100     250
OR = ad/bc = 2.0; RR = [a/(a+b)] / [c/(c+d)] = 1.3

Differential misclassification (exposure overestimated for 10 cases only, which inflates the estimates, i.e. increases them by a large or excessive amount):
              Cases   Controls   Total
Exposed         110         50     160
Nonexposed       40         50      90
Total           150        100     250
OR = ad/bc = 2.8; RR = [a/(a+b)] / [c/(c+d)] = 1.5
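The two measures quoted for these tables can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, and the cell layout (a = exposed cases, b = exposed controls, c = nonexposed cases, d = nonexposed controls) is the one implied by the slide's formulas.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio ad/bc for a 2x2 table."""
    return (a * d) / (b * c)


def risk_ratio(a, b, c, d):
    """Ratio of row proportions: [a/(a+b)] / [c/(c+d)]."""
    return (a / (a + b)) / (c / (c + d))


# Cell counts taken from the slide (a, b, c, d).
true_table = (100, 50, 50, 50)
differential_table = (110, 50, 40, 50)  # 10 nonexposed cases recorded as exposed

for label, table in [("true", true_table), ("differential", differential_table)]:
    print(label, round(odds_ratio(*table), 2), round(risk_ratio(*table), 2))
```

Moving only 10 cases from the nonexposed to the exposed row pushes both measures away from the null value of 1.0, which is the inflation the slide describes.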

Non-differential (random) misclassification Non-differential classification error happens when the information is incorrect, but is the same across groups. It happens when exposure is unrelated to other variables (including disease), or when the disease is unrelated to other variables (including exposure). Bias introduced by non-differential misclassification is usually predictable (it goes towards the null value), but this isn’t always the case. In case-control studies, non-differential misclassification can happen when exposure status is incorrect for both controls and cases. In cohort studies, it happens when exposure status is incorrect for people with the disease and those without the disease.

Cont …. Many studies ask if a patient has “ever used” a particular drug. As this question covers an extremely large time span (possibly many decades), drug use might get erroneously (incorrectly) linked to some disease or condition. But as everyone in the study is asked the same error-inducing question, misclassification happens to everyone in the study. Note: The null value is a number corresponding to no effect, that is, no association between exposure and the health outcome. In epidemiology, the null value for a risk ratio or rate ratio is 1.0, and it is also 1.0 for odds ratios and prevalence ratios (terms you will come across).

Misclassification Bias (cont.)

True classification:
              Cases   Controls   Total
Exposed         100         50     150
Nonexposed       50         50     100
Total           150        100     250
OR = ad/bc = 2.0; RR = [a/(a+b)] / [c/(c+d)] = 1.3

Non-differential misclassification (exposure overestimated in 10 cases and 10 controls, which biases the OR towards the null):
              Cases   Controls   Total
Exposed         110         60     170
Nonexposed       40         40      80
Total           150        100     250
OR = ad/bc = 1.8; RR = [a/(a+b)] / [c/(c+d)] = 1.3

OR… • There are two types of misclassification: – Non-differential misclassification – Differential misclassification • Definitions of these terms depend on the variable being measured (i.e., exposure or outcome )

Types of misclassification of outcome variables – Non-differential misclassification of outcome • The degree of outcome misclassification is not related to exposure status – Differential misclassification of outcome • The degree of outcome misclassification depends on the exposure status – this is a more serious problem

Types of misclassification of exposure variables – Non-differential misclassification of exposure • The degree of exposure misclassification is not related to outcome status – Differential misclassification of exposure • The degree of exposure misclassification varies by outcome status – this is a more serious problem

Non-differential misclassification of disease and exposure Non-differential misclassification:
• Results in a bias toward the null when the exposure or disease that is misclassified is binary.
• For example, when a binary exposure is measured with an equal amount of error in the case and control groups, it washes out the exposure-outcome association.
• This is a conservative bias, and the investigator at least knows that she/he is not presenting an artificially large association.
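The pull toward the null can be illustrated by applying the same imperfect exposure measurement to cases and controls. The sketch below is an assumption-laden illustration, not part of the original notes: the sensitivity (0.8) and specificity (0.9) values are hypothetical, the function names are invented, and the counts are expected values rather than a simulation.

```python
def misclassify(exposed, nonexposed, sensitivity, specificity):
    """Expected observed (exposed, nonexposed) counts for one group."""
    obs_exposed = sensitivity * exposed + (1 - specificity) * nonexposed
    obs_nonexposed = (1 - sensitivity) * exposed + specificity * nonexposed
    return obs_exposed, obs_nonexposed


def odds_ratio(a, b, c, d):
    return (a * d) / (b * c)


# True table from the slides: cases 100 exposed / 50 nonexposed,
# controls 50 exposed / 50 nonexposed.
SENS, SPEC = 0.8, 0.9  # hypothetical, identical for cases and controls

a_obs, c_obs = misclassify(100, 50, SENS, SPEC)  # cases
b_obs, d_obs = misclassify(50, 50, SENS, SPEC)   # controls

true_or = odds_ratio(100, 50, 50, 50)
observed_or = odds_ratio(a_obs, b_obs, c_obs, d_obs)
print(round(true_or, 2), round(observed_or, 2))
```

Because the error rates are the same in both groups, the observed odds ratio falls between 1.0 and the true value of 2.0 rather than beyond it.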

Non-differential misclassification when there are more than two categories of the exposure or disease does not necessarily result in bias towards the null. Categorization of a variable that has non-differential misclassification can generate differential misclassification.

Differential misclassification of exposure in a case-control study (recall bias) Differential misclassification of exposure or disease results in a bias in an unpredictable direction – it may be toward the null or away from the null. • It is possible to evaluate the bias on a case-by-case basis and speculate the direction of the bias, however the possibility of bias away from the null is problematic. • Generally considered a more serious problem than bias towards the null because – (a) the investigator does not know the direction of the bias with certainty, and – (b) if the bias is away from the null, the investigator risks presenting an artificially inflated effect estimate vs. an attenuated one

Recall bias Some specific exposure related information biases – Recall bias : occurs when participants are asked about past exposure after the outcome in question has occurred (or not), as often happens in case control studies.

CONFOUNDING FACTOR ……

What is a confounding factor? A "confounding factor" is defined as one which is associated both with exposure and disease, and is distributed unequally in study and control groups. An example is given below to explain confounding. In the study of the role of alcohol in the etiology of oesophageal cancer, smoking is a confounding factor because: - it is associated with the consumption of alcohol - it is an independent risk factor for oesophageal cancer. In these conditions, the effects of alcohol consumption can be determined only if the influence of smoking is neutralized, for example by matching.

Ways to control confounding in the design phase of a study (randomization, restriction, matching) Randomization Restriction Matching There are various ways to modify a study design to actively exclude or control confounding variables including Randomization, Restriction and Matching.

Continued…… Confounding is a major problem in epidemiologic research, and it accounts for many of the discrepancies among published studies. Nevertheless, there are ways of minimizing confounding in the design phase of a study, and there are also methods for adjusting for confounding during analysis of a study.

1. RANDOMIZED TRIAL (Randomization) Randomization: study subjects are randomly assigned to the comparison groups to even out known and unknown confounders. The ideal way to minimize the effects of confounding is to conduct a large randomized clinical trial so that each subject has an equal chance of being assigned to any of the treatment options. If this is done with a sufficiently large number of subjects, other risk factors (i.e., confounding factors) should be equally distributed among the exposure groups. The beauty of this is that even unknown confounding factors will be equally distributed among the comparison groups. If all of these other factors are distributed equally among the groups being compared, they will not distort the association between the treatment being studied and the outcome.

The success of randomization is usually evaluated in one of the first tables in a clinical trial, i.e., a table comparing characteristics of the exposure groups. If the groups have similar distributions of all of the known confounding factors, then randomization was successful. However, if randomization was not successful in producing equal distributions of confounding factors, then methods of adjusting for confounding must be used in the analysis of the data.
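The baseline comparison described above can be sketched as a small simulation. Everything here is hypothetical: the cohort of 10,000 subjects, the 30% smoking prevalence, the seeds, and the function names are illustrative assumptions, and simple coin-flip allocation stands in for whatever scheme a real trial would use.

```python
import random


def randomize(subjects, seed=42):
    """Assign each subject to 'treatment' or 'control' with probability 1/2."""
    rng = random.Random(seed)
    groups = {"treatment": [], "control": []}
    for subject in subjects:
        arm = "treatment" if rng.random() < 0.5 else "control"
        groups[arm].append(subject)
    return groups


def proportion(group, key):
    """Share of subjects in the group for whom subject[key] is True."""
    return sum(1 for s in group if s[key]) / len(group)


# Hypothetical cohort: about 30% smokers (a potential confounder).
population_rng = random.Random(1)
subjects = [{"id": i, "smoker": population_rng.random() < 0.3}
            for i in range(10_000)]

groups = randomize(subjects)
p_treat = proportion(groups["treatment"], "smoker")
p_ctrl = proportion(groups["control"], "smoker")
print(f"smokers: treatment {p_treat:.3f}, control {p_ctrl:.3f}")
```

With a large sample the two proportions come out close to each other; with a small sample the same code can show a noticeable imbalance, which is the limitation noted for small trials.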

Cont …….. Randomization is a method of allocating subjects to the treatment or control group purely by chance, with no direction and no prediction, so that conscious and unconscious bias cannot influence the allocation. Because all units of the population are mixed without direction or prediction, every individual has an equal chance of being placed in either group. It is an ideal method, usually applicable in experimental studies, which tends to distribute potential confounding variables equally between the study groups, i.e. treatment and control. The main purpose is to achieve ‘equality’ of baseline characteristics in both the treatment and control groups.

Strengths of Randomization There is no limit on the number of confounders that can be controlled. It controls for both known and unknown confounders. If successful, there is no need to "adjust" for confounding.

Limitations of Randomization to Control for Confounding It is limited to intervention studies (clinical trials) It may not be completely effective for small trials

2. RESTRICTION Limiting the study to subjects in one category of the confounder is a simple way of ensuring that all participants have the same level of the confounder. For example:
• If smoking is a confounding factor, one could limit the study population to only non-smokers or only smokers.
• If gender is a confounding factor, limit the participants to only men or only women.
• If age is a confounding factor, restrict the study to subjects in a specific age category, e.g., persons >65.

Cont ……… Restriction is the process of limiting enrollment to individuals who have particular characteristics and excluding those who do not meet the criteria for study subjects. Eligibility for entry into an analytic study is restricted to individuals within a certain range of values for the confounding factors, such as age, to reduce the effect of the confounding factor when it cannot be controlled by randomization. Subjects chosen for study are restricted to only those possessing a narrow range of characteristics, to equalize important extraneous factors. For example, in a study of the effect of smoking on lung cancer, participation in the study could be restricted to alcohol users, thus removing any potential confounding by alcohol consumption. Example (OCP): restrict the study to women who have had at least one child.

Drawbacks of Restriction Restriction is simple and generally effective, but it has several drawbacks:
• It can only be used for known confounders, and only when the status of potential subjects is known with respect to that variable.
• Residual confounding may occur if restriction is not narrow enough. For example, a study of the association between physical activity and heart disease might be restricted to subjects between the ages of 30 and 60, but that is a wide age range, and the risk of heart disease still varies widely within it.
• Investigators cannot evaluate the effect of the restricted variable, since it doesn't vary.
• Restriction limits the number of potential subjects and may limit sample size.
• If restriction is used, one cannot generalize the findings to those who were excluded.
• Restriction is particularly cumbersome if used to control for multiple confounding variables.

3. MATCHING Another risk factor can only cause confounding if it is distributed differently in the groups being compared. Therefore, another method of preventing confounding is to match the subjects with respect to confounding variables. This method can be used in both cohort studies and in case-control studies in order to enroll a reference group that has artificially been created to have the same distribution of a confounding factor as the index group.

Cont … Matching is the process of making a study group and a comparison group comparable with respect to extraneous factors. The individual cases are matched with individual controls having similar characteristics. Controls are selected in such a way that they are similar to the cases in certain characteristics, such as age, sex, race, occupation and socio-economic status, so that the selected controls have the same background as the cases and an equal chance of being exposed to the risk factor.

For example: In a case-control study of lung cancer where age is a potential confounding factor, match each case with one or more control subjects of similar age. If this is done the age distribution of the comparison groups will be the same, and there will be no confounding by age. In a cohort study on effects of smoking each smoker (the index group) who is enrolled is matched with a non-smoker (reference group) of similar age. Once again, the groups being compared will have the same age distribution, so confounding by age will be prevented

Matching: for each patient in one group there are one or more patients in the comparison group with the same characteristics, except for the factor of interest. E.g. matching done for age, sex, race, etc.

Cont ….. There are several kinds of matching procedure. Group matching: Group matching can be done by assigning cases to sub-categories (strata) based on their characteristics (e.g., age, occupation, social class) and then establishing appropriate controls. The frequency distribution of the matched variable must be similar in the study and comparison groups. Pair matching: matching is also done by pairs. For example, for each case, a control is chosen which can be matched quite closely. Thus, if we have a 50-year-old worker with a particular disease, we will search for a 50-year-old worker without the disease as a control. Thus one can obtain pairs of patients and controls of the same sex, age, duration and severity of illness, etc.
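Pair matching as described above can be sketched as a greedy search for the closest eligible control. This is a minimal illustration under stated assumptions: the records, the two-year age window (`max_age_gap`), and the function name are all hypothetical, and real studies often use more careful matching algorithms.

```python
def pair_match(cases, controls, max_age_gap=2):
    """Match each case to the closest-age unused control of the same sex."""
    unused = list(controls)
    pairs = []
    for case in cases:
        candidates = [c for c in unused
                      if c["sex"] == case["sex"]
                      and abs(c["age"] - case["age"]) <= max_age_gap]
        if not candidates:
            continue  # no eligible control; this case stays unmatched
        best = min(candidates, key=lambda c: abs(c["age"] - case["age"]))
        unused.remove(best)  # each control is used at most once
        pairs.append((case["id"], best["id"]))
    return pairs


# Hypothetical records for illustration only.
cases = [{"id": "case1", "age": 50, "sex": "M"},
         {"id": "case2", "age": 62, "sex": "F"}]
controls = [{"id": "ctrl1", "age": 49, "sex": "M"},
            {"id": "ctrl2", "age": 51, "sex": "F"},
            {"id": "ctrl3", "age": 63, "sex": "F"},
            {"id": "ctrl4", "age": 50, "sex": "M"}]

pairs = pair_match(cases, controls)
print(pairs)
```

The 50-year-old male case is paired with the 50-year-old male control, mirroring the worker example in the text; widening `max_age_gap` makes matches easier to find but leaves more room for residual confounding by age.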

Advantages of Matching Matching is particularly useful when trying to control for complex or difficult to measure confounding variables, e.g., matching by neighborhood to control for confounding by air pollution. It can also be used in case-control studies with few cases when additional control subjects are enrolled to increase statistical power, e.g., 4 to 1 matching of controls to cases.

Drawbacks of Matching
• It can only be used for known confounders.
• It can be difficult, expensive, and time-consuming to find appropriate matches.
• One cannot evaluate the effect of the matched variable.
• Matching requires special analytic methods.
• It controls bias only for those factors involved in the match.
• It is usually not possible to match for more than a few factors because of the practical difficulties of finding patients that meet all matching criteria.

Ways to control for confounding in the analysis phase of a study (Standardization, Mantel-Haenszel, Regression) Standardization Mantel-Haenszel Regression

1. Standardization Standardization is a method of computing and comparing adjusted rates of disease that indicate how the groups would have differed if they had had the same distribution of confounders. For example, if age is a confounding factor when evaluating an association, another strategy is to evaluate the association in different age groups and calculate the measure of association in each stratum of age. For example, if age is a confounder of the relation between physical activity and CHD, we could stratify the analysis into separate age groups in order to evaluate the association between activity and CHD separately for each age group.

Cont ….. Standardization is a set of techniques used to remove as far as possible the effects of differences in age or other confounding variables when comparing two or more populations. The method uses weighted averaging of rates specific for age, sex, or some other potentially confounding variable(s), according to some specified distribution of these variables. Standard population: A population in which the age and sex composition is known precisely, as a result of a census or by an arbitrary means, e.g. an imaginary population, the “standard million”, in which the age and sex composition is arbitrary. A standard population is used as the comparison group in the actuarial procedure of standardization of mortality rates (e.g. Segi world population, European standard population).

Cont …. Types of standardization: Direct: the specific rates in a study population are averaged using as weights the distribution of a specified standard population. The standardized rate so obtained represents what the rate would have been in the study population if that population had the same distribution as the standard population with respect to the variables for which the adjustment or standardization was carried out. Indirect: used to compare study populations for which the specific rates are either statistically unstable or unknown. The specific rates of the standard population are averaged using as weights the distribution of the study population. The ratio of the crude rate for the study population to the weighted average so obtained is known as the standardized mortality (or morbidity) ratio, or SMR.
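Both kinds of standardization reduce to weighted averages, which a short sketch can make concrete. All numbers below are invented for illustration (two age groups, made-up populations and rates), and the function names are hypothetical. The SMR is computed as observed over expected deaths, which is equivalent to the ratio of crude rate to weighted average described above.

```python
def direct_standardized_rate(study_rates, standard_pop):
    """Study-population specific rates weighted by the standard population."""
    total = sum(standard_pop.values())
    return sum(study_rates[g] * standard_pop[g] for g in standard_pop) / total


def smr(observed_deaths, standard_rates, study_pop):
    """Observed deaths divided by deaths expected at standard-population rates."""
    expected = sum(standard_rates[g] * study_pop[g] for g in study_pop)
    return observed_deaths / expected


# Hypothetical data for two age groups.
study_pop = {"young": 1000, "old": 3000}
study_deaths = {"young": 2, "old": 60}
study_rates = {g: study_deaths[g] / study_pop[g] for g in study_pop}

standard_pop = {"young": 6000, "old": 4000}
standard_rates = {"young": 0.001, "old": 0.01}

crude_rate = sum(study_deaths.values()) / sum(study_pop.values())
direct_rate = direct_standardized_rate(study_rates, standard_pop)
ratio = smr(sum(study_deaths.values()), standard_rates, study_pop)
print(crude_rate, direct_rate, ratio)
```

In this made-up example the study population is older than the standard, so its crude rate (0.0155) is higher than the directly standardized rate (0.0092), while the SMR of 2.0 says it experienced twice the deaths expected at the standard rates.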

2. Mantel-Haenszel method (Stratification) The Mantel-Haenszel method is a technique that generates an estimate of an association between an exposure and an outcome after adjusting for confounding. The method is used with a dichotomous outcome variable and a dichotomous risk factor. It is a non-regression technique used to identify confounders and to control for confounding in the statistical analysis phase rather than the design phase of a study. Mantel-Haenszel methods are available for odds ratios, rate ratios, and risk differences; the same principle applies to each (stratify, then use M-H to summarize and test).

Cont … Stratification: The process of or the result of separating a sample into several sub-samples according to specified criteria such as age groups, socio-economic status etc. The effect of confounding variables may be controlled by stratifying the analysis of results After data are collected, they can be analyzed and results presented according to subgroups of patients, or strata, of similar characteristics.
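A stratified analysis summarized with the Mantel-Haenszel odds ratio can be sketched as follows. The two strata (smokers and non-smokers) and their counts are invented purely to show confounding, and the function names are hypothetical. The pooled estimate uses the standard formula: the sum over strata of ad/n divided by the sum over strata of bc/n.

```python
def odds_ratio(a, b, c, d):
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Pooled OR: sum(a*d/n) / sum(b*c/n) over strata of (a, b, c, d)."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Hypothetical strata: (exposed cases, exposed controls,
#                       nonexposed cases, nonexposed controls)
smokers = (80, 40, 20, 10)
non_smokers = (10, 40, 20, 80)
strata = [smokers, non_smokers]

# Collapsing the strata gives the crude (unadjusted) table.
crude_table = tuple(sum(cells) for cells in zip(*strata))
crude_or = odds_ratio(*crude_table)
adjusted_or = mantel_haenszel_or(strata)
print(round(crude_or, 2), round(adjusted_or, 2))
```

Here each stratum has an odds ratio of 1.0, yet the crude table suggests an odds ratio of about 2.5. The Mantel-Haenszel estimate recovers 1.0, showing that the crude association was produced entirely by the stratifying variable.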

Using stratification in confounding For example: exposure = gender, outcome = depression.

Gender    % Depressed
Male      17.7%
Female    26.0%
RR = 1.47

Is pain severity a confounder? Pain is associated with gender (exposure) and with depression (outcome), and is not a result of gender. So it is a possible confounder.

3. Regression Including confounding variables in a regression model allows the analysis to control for them and prevent the spurious effects that the omitted variables would have caused otherwise. Theoretically, you should include all independent variables that have a relationship with the dependent variable. Regression analysis is a form of predictive modelling technique which investigates the relationship between a dependent (target) and independent variable(s) (predictor). This technique is used for forecasting, time series modelling and finding the causal effect relationship between the variables. For example , relationship between rash driving and number of road accidents by a driver is best studied through regression.
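The effect of including a confounder in a regression model can be shown with a tiny linear example. This is a sketch under strong assumptions: the data are constructed so that the outcome depends only on the confounder z, the hand-written `ols` helper is a stand-in for a statistics package, and a linear model is used for simplicity even though binary outcomes in epidemiology are more often analysed with logistic regression.

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [v - factor * p for v, p in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Hypothetical data: (exposure x, confounder z, number of subjects).
# The outcome is y = 1 + 3*z, so x has no true effect on y.
cells = [(1, 1, 8), (0, 1, 2), (1, 0, 2), (0, 0, 8)]
rows = [(x, z) for x, z, n in cells for _ in range(n)]
y = [1 + 3 * z for _, z in rows]

crude = ols([[1, x] for x, _ in rows], y)        # y ~ x
adjusted = ols([[1, x, z] for x, z in rows], y)  # y ~ x + z
print("crude x coefficient:", round(crude[1], 3))
print("adjusted x coefficient:", round(adjusted[1], 3))
```

Omitting z gives the exposure a spurious coefficient of 1.8, because exposed subjects are more likely to have z = 1. Once z is in the model, the exposure coefficient drops to zero.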

Regression……. (continued..)

Errors in epidemiology: All epidemiological studies are subject to error, which can obscure or distort the truth about the size and nature of a causal relationship. In epidemiology, an error can be defined as the deviation from the true value or finding. Error can occur at any stage of the research process, from study design to data collection, analysis and interpretation. Types of Error in Epidemiological Studies A. Random Error B. Systematic Error

A. Random Error • Random error refers to the fluctuations around a true value. • The effect of random error may produce an estimate that is different from the true underlying value. That means it may result in either an underestimation or overestimation of the true value. • It can produce type I or type II errors. In statistics, a Type I error means rejecting the null hypothesis when it's actually true, while a Type II error means failing to reject the null hypothesis when it's actually false. Sources of Random Error 1. Biologic Variation: It refers to the fluctuation in biological processes in the same individual over time. 2. Sampling Error: The part of the total estimation error caused by random influences on who or what is selected for the study. 3. Measurement Error: The error resulting from random fluctuations in measurement

Minimizing Random Errors • Sampling error can be reduced by increasing the size of a sample population. (the more individuals drawn from a population, the more likely it is that the sample will reflect the true composition of that population) • Measurement error can be minimized by ensuring that optimal instruments are used to provide the most accurate measurement of the exposure (for example, alcohol consumption or cigarettes smoked) and the outcomes (i.e. disease, injury, state of health or function).
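The link between sample size and sampling error can be made concrete with the standard error of a sample proportion, sqrt(p(1-p)/n). The prevalence of 0.2 and the sample sizes below are hypothetical values chosen for illustration, and the function name is invented.

```python
import math


def standard_error(p, n):
    """Standard error of a sample proportion under simple random sampling."""
    return math.sqrt(p * (1 - p) / n)


prevalence = 0.2  # hypothetical true prevalence
for n in (100, 400, 1600):
    se = standard_error(prevalence, n)
    # An approximate 95% margin of error is 1.96 standard errors.
    print(n, round(se, 4), round(1.96 * se, 4))
```

Each fourfold increase in sample size halves the standard error, so precision improves with larger samples but with diminishing returns. Note that a larger sample reduces random error only; it does nothing to remove systematic error (bias).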

B. Systematic Error (Bias) • The systematic error refers to any difference between the true value and the actual value obtained in the study that is not the result of random error. • It is the use of an invalid measure that misclassifies cases in one direction and misclassifies controls in another. • Systematic error or bias is more problematic as it can significantly affect the validity of a study.

Main sources of Systematic Error 1. Selection bias • Selection bias can result when the selection of subjects into a study leads to a result that is different from the results if you had enrolled the entire target population. 2. Information bias • Information bias results from systematic differences in the way data on exposure or outcome are obtained from the various study groups. Information bias occurs when information is collected differently between two groups, leading to an error in the conclusion of the association.

3. Observer bias • This may be a result of the investigator’s prior knowledge of the hypothesis under investigation or knowledge of an individual’s exposure or disease status. 4. Interviewer bias • This occurs where an interviewer asks leading questions that may systematically influence the responses given by interviewees. 5. Confounding • The word comes from the Latin word “ confundere ” meaning to mix together. • It occurs when an unstudied risk factor is associated with both the study exposure and the outcome, which results in a distortion of the estimated effect of an exposure on an outcome.

Minimizing Systematic Errors • Minimize the chance of bias in the study . Example: use more than one control group. • Clear definition of the study population • Set up strict guidelines for data collection • Train observers or interviewers to obtain data in the same fashion • Randomly allocate observers/interviewers data collection assignments • Use multiple sources of information • Institute a masking process if appropriate • Build-in methods to minimize loss to follow-up • Standardize measurement instruments

CONCLUSION: Strategies to reduce confounding are:
• Randomization (aim is random distribution of confounders between study groups)
• Restriction (restrict entry to the study of individuals with confounding factors; risks bias in itself)
• Matching (of individuals or groups; aim for equal distribution of confounders)
• Mantel-Haenszel/Stratification (confounders are distributed evenly within each stratum)
• Standardization/adjustment (usually distorted by choice of standard)
• Regression/multivariate analysis (only works if you can identify and measure the confounders)

Assignment-5

Short questions
• Define bias and list out its types.
• Type I and Type II error.
• Define confounding.
• Define confounding variable.
• Define differential misclassification.
• What is recall bias?
• Berksonian bias.
• Define Hawthorne effect.

Long questions
• Define bias and types of bias used in epidemiological study.
• Define bias and explain its types and ways of control and mitigation.
• Define confounding. Describe methods to control confounding along with examples.
• How do you control bias in the design phase of the study? Explain.
• How do you control bias in the analysis phase of the study? Explain.
• Define error and explain the major sources of error in measurement of disease.
• What is systematic error? Give examples and ways to control them. (5)

ASSIGNMENT-5
• Define bias and explain the types of bias.
• Briefly explain misclassification.
• Describe information bias.
• Explain the ways to control confounding in the design phase of a study.
• Explain the ways to control for confounding in the analysis phase of a study.
• Write short notes on: non-differential misclassification of disease and exposure; differential misclassification of exposure in a case-control study; confounding factor and its examples.

Case study unit 5 In a research study aimed at investigating the prevalence of a specific disease in a population, a team of epidemiologists encountered challenges related to data collection errors. These errors have the potential to impact the accuracy and reliability of the study’s findings. As a researcher/epidemiologist, you have been tasked with identifying the sources of error and proposing strategies to mitigate them. Questions: a) What are the potential sources of error in epidemiological data collection? b) How can the research team ensure the accuracy and completeness of collected data? c) What strategies can be implemented to reduce recall bias in retrospective studies? d) How can the team ensure the reliability and validity of self-reported data? e) How can the team conduct quality control checks to identify and rectify errors in the data?