RoB, Quality Assessment and GRADE for Systematic Reviews: Dr Carlos Andrade

ACSRM · Nov 02, 2025

About This Presentation

Objective: Clarify the distinctions and interrelations
between risk of bias, methodological quality, and GRADE.

Why it matters:
Systematic reviews and meta-analyses rely on the accuracy of included studies. Misunderstanding these terms can lead to incorrect conclusions or misleading recommendations.


Slide Content

RISK OF BIAS, QUALITY
ASSESSMENT AND GRADE:
WHAT’S THE DIFFERENCE?
Webinar series
Carlos Alexandre Andrade

A LITTLE BIT ABOUT
MY BACKGROUND...
DDS in Dentistry – University of Brasília (Brazil)
Master’s in Dentistry – University of Brasília (Brazil)
Specialization in Dental Implants – Instituto Aria (Brazil)
PhD in Health Sciences – University of Debrecen (Hungary)
Researcher – Ministry of Health (Brazil)
Postdoctoral Fellowship – University of Liège (Belgium)

SOURCE MATERIAL

Objective: Clarify the distinctions and interrelations
between risk of bias, methodological quality, and GRADE.
Why it matters:
Systematic reviews and meta-analyses rely on the
accuracy of included studies.
Misunderstanding these terms can lead to incorrect
conclusions or misleading recommendations.
For example: an RCT with unclear randomization may produce
misleading results despite having good reporting quality.
INTRODUCTION

CAKE ANALOGY
RISK OF BIAS: Was each ingredient fresh and uncontaminated? If the eggs are rotten, the cake will taste bad no matter how well you bake it.
METHODOLOGICAL QUALITY: Did the baker follow the recipe carefully? Mixing correctly, baking at the right temperature, not skipping steps.
GRADE: Does the final cake taste and look consistently good? The recipe needs to be good enough to send to a friend, or to repeat the process and have the cake come out with the same taste and look.

Bias
Systematic error that can be
introduced at any stage of a study,
leading to distortions in its results
and, therefore, threatening internal
validity.
Generally, this error is unintentional
and may be related to the
impossibility of a practical solution,
as well as to the researchers' failure
to recognize the error.
INTRODUCTION
Internal Validity
This is the degree of confidence
that a study's conclusions are
valid for the sample studied.
Extent to which a study
accurately demonstrates a causal
relationship between variables,
free from the influence of
confounding factors or biases.
External Validity
This refers to the applicability of a study's conclusions to the population
from which the sample was drawn or to other populations.

Risk of Bias
Focuses specifically on internal
validity: whether design,
conduct, or analysis distorted
results.
Involves critical judgment of
potential systematic errors.
Uses domain-based tools (e.g.,
RoB 2.0, ROBINS-I). No scoring.
Rather, classifies risk as low,
some concerns, or high.
Quality Assessment
Broader concept: evaluates how
well a study was designed and
conducted.
May include aspects like
sample size calculation, ethical
approval, and adherence to
reporting guidelines
(CONSORT, STROBE).
Often uses scales or checklists
(e.g., Jadad, PEDro, NOS) that
generate scores.
INTRODUCTION
Key Difference:
➡️ Quality assessment measures methodological rigor;
➡️ Risk of bias measures trustworthiness of results.

INTRODUCTION
1. Learn the specifics of the
tool
Before applying it, read the
original article and official
documentation of the tool. Ensure
the review team fully understands
each domain and criterion.
2. Train and calibrate reviewers
Perform pilot testing with a few
studies (3–6). Discuss results to
align interpretation and judgment
among reviewers, improving
reliability.
3. Independent assessment
At least two reviewers evaluate
each study separately to
minimize subjective influence
and bias.
4. Consensus meeting
Compare results and resolve
disagreements through
discussion. If disagreement
persists, involve a third reviewer
as a tiebreaker.
5. Interpret and report results
Describe clearly why a study was
judged low/high risk of bias.
Present results in tables or figures
and integrate them into the
review’s Results, Discussion, and
Conclusions sections.
STEPS FOR RISK OF BIAS OR QUALITY ASSESSMENT:
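Steps 3 and 4 above follow a simple decision rule that can be sketched in code. The function below is purely illustrative (the name `resolve` and the judgment labels are not part of any official tool); it assumes a three-level judgment scale and a third reviewer who is consulted only when the first two disagree.

```python
def resolve(reviewer_a, reviewer_b, tiebreaker=None):
    """Consensus judgment for one domain of one study.

    Illustrative judgment labels: 'low', 'some concerns', 'high'.
    """
    if reviewer_a == reviewer_b:
        return reviewer_a          # agreement: no discussion needed
    if tiebreaker is not None:
        return tiebreaker          # step 4: third reviewer breaks the tie
    return "unresolved"            # flag the study for the consensus meeting
```

In practice the tiebreaker is consulted only after discussion fails, as step 4 describes; "unresolved" marks items for the consensus meeting.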

1. Selection bias
Arises from flaws in random
allocation or concealment,
leading to non-comparable
groups. Proper random
sequence generation and
allocation concealment are
essential to prevent selection
bias.
UNDERSTANDING TYPES OF BIAS
Randomized Clinical Trials (RCTs)
4. Detection bias
Happens when outcomes are
measured inaccurately or
differently across groups. This
is especially problematic for
subjective measures. Double-
blinding and standardized
procedures help minimize it.
2. Performance bias
Occurs when researchers apply
different care or treatments to
groups, often due to lack of
blinding. These performance
differences distort the true
intervention effect.
3. Attrition bias
Results from participant
dropout or incomplete outcomes.
If missing data are related to the
results or differ between groups,
estimates become biased.
Transparent handling and
explanation of losses are crucial.
5. Reporting bias
Involves reporting only
favorable or statistically
significant outcomes while
omitting others. Comparing the
final report with the original
protocol helps detect this “cherry-
picking.”
6. Funding bias
Emerges from financial or
professional conflicts of interest.
Industry-sponsored studies may
favor positive results. Declaring
and critically evaluating funding
sources helps ensure impartiality.

1. Confounding Bias
Occurs when external
factors (e.g., disease severity,
comorbidities, socioeconomic
status) influence both the
intervention and outcome,
distorting the true causal
effect.
UNDERSTANDING TYPES OF BIAS
Non-Randomized Studies of Intervention
4. Reporting Bias
Happens when only
favorable or significant
results are reported, while
other findings are omitted or
altered, creating a distorted
view of the evidence.
2. Selection Bias
Arises when participants or
data are systematically
excluded or lost to follow-up,
altering the relationship
between intervention and
outcome. More common in
non-randomized studies.
3. Information Bias
Results from inaccurate
measurement or
classification of exposure or
outcomes (e.g., recall errors,
uncalibrated tools), leading
to misrepresentation of true
effects.

Research where investigators do not intervene. They only observe, record, and analyze
data statistically.
I. Descriptive Observational Studies
No comparison group; used to describe disease patterns and generate hypotheses.
Case Report: Description of up to three patients.
Case Series: Description of more than three patients, often consecutive cases.
II. Analytic Observational Studies
Include a comparison or control group; used to test associations or hypotheses.
A. Group-Level Analysis
Ecological Study: Unit of analysis is a group (e.g., region or country); data are
aggregated, not individual.
B. Individual-Level Analysis
Cross-Sectional Study: Measures exposure and outcome at the same time; useful for
prevalence, not causation.
Case-Control Study: Starts with the outcome and looks backward to assess exposure.
Cohort Study: Starts with exposure and follows participants over time to observe
outcomes; can be prospective or retrospective.
UNDERSTANDING TYPES OF BIAS
Observational Studies

1. Selection Bias
Occurs when participants in exposed or
outcome groups differ systematically from
controls, affecting comparability and
generalizability.
UNDERSTANDING TYPES OF BIAS
Observational Studies
4. Performance Bias
Occurs when awareness of group
allocation changes participant or
researcher behavior.
Example: participants altering habits
during follow-up because they know they
are being observed.
2. Information Bias
Results from inaccurate measurement or
classification of exposure or outcome.
Often arises when assessors are unblinded
or instruments are poorly calibrated.
3. Confounding Bias
A third variable influences both exposure
and outcome, distorting the true
association.
Can create false associations, hide real
ones, or misestimate effects.
Often controlled with statistical
adjustments or careful study design.

Risk of Bias
• Interventional: RCT → RoB 2; NRSI → ROBINS-I
• Diagnostic: QUADAS-2
• Prognosis: QUIPS
• Overviews: ROBIS
Methodological Quality Assessment
• Interventional: RCT → Jadad, PEDro
• Observational: Cohort → JBI, NOS; Cross-sectional/Prevalence → JBI; Case-control → JBI, NOS
• Overviews: AMSTAR 2
GRADE

[Tool map, highlighting RoB 2 (risk of bias for interventional RCTs)]

Tools for Systematic Reviews
1. Methodological Milestone
RoB 1.0, developed by Cochrane, was a key tool for assessing risk of bias in randomized trials, focusing solely on bias and using
separate domains without scoring systems.
2. Initial Methodological Superiority
It improved upon earlier tools that mixed bias with precision, applicability, or reporting completeness, requiring transparent
reasoning for each judgment.
3. Excessive Use of “Unclear Risk”
A common limitation was frequent assignment of “unclear risk,” leading to few studies being classified as low risk of bias.
4. Inconsistency and Limited Scope
Sometimes domains were inconsistently added or removed; the tool was less suitable for complex designs like cross-over or
cluster-randomized trials.
5. Interpretation Challenges and Outcome Focus
Difficulties included assessing incomplete outcomes and selective reporting, plus a lack of clear guidance for overall risk
assessment for specific outcomes, motivating the development of RoB 2.0.
Risk of Bias for RCTs
ROB 1.0 (COCHRANE RISK OF BIAS TOOL):

Tools for Systematic Reviews
Key Concepts and Changes Introduced by RoB 2.0
• Bias vs. Imprecision: The tool distinguishes bias (systematic error that threatens internal validity) from imprecision (random
error related to sample size and reflected in the confidence interval). An RCT can be well-reported (following CONSORT guidelines)
but still have a high risk of bias.
• Focus on the Outcome: A major shift from the previous version (RoB 1.0) is that RoB 2.0 assesses bias with a focus on the
specific result (outcome) rather than the study as a whole. For instance, lack of blinding may lead to high risk of bias for subjective
outcomes (like pain) but low risk for objective outcomes (like mortality).
• Structure: RoB 2.0 utilizes five fixed domains to cover all major sources of bias that can affect an RCT, followed by a formal
overall risk of bias judgment.
• Judgment Process: The tool uses signaling questions and a supporting algorithm to guide reviewers to a judgment for each
domain. The judgment options are "Low risk of bias," "High risk of bias," or "Some concerns" (the latter replacing the ambiguous
"unclear risk" found in RoB 1.0).
• Effect of Interest: Assessing bias depends on specifying the nature of the effect being analyzed: the effect of assignment
(intention-to-treat) or the effect of adherence (per-protocol).
Risk of Bias for RCTs
ROB 2.0 (COCHRANE RISK OF BIAS TOOL 2):

Tools for Systematic Reviews
Possible answers to the questions:
(1) Yes;
(2) Probably yes;
(3) Probably no;
(4) No;
(5) No information;
(6) Not applicable
Overall Risk of Bias Judgment RoB 2.0
The overall risk of bias for a specific result is determined based on the combination of the five domains:
• Low Risk: All five domains must be judged as having a low risk of bias.
• Some Concerns: One of the domains is judged to have some concerns, with no domains classified as high
risk of bias.
• High Risk: One or more domains are judged as "High Risk of Bias," or multiple domains are classified as
having "Some Concerns".
The results of the RoB 2.0 assessment should be reported narratively and typically illustrated using graphical
displays, such as the "traffic light" plot or "weighted bar" graphs.
Risk of Bias for RCTs
ROB 2.0 (COCHRANE RISK OF BIAS TOOL 2):
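The combination rule quoted above is mechanical enough to express in code. The sketch below follows the slide's rule literally; note that the official RoB 2.0 guidance additionally lets reviewers judge whether multiple "some concerns" truly lower confidence enough to warrant high risk, which this simplification does not capture.

```python
def rob2_overall(domains):
    """Overall RoB 2.0 judgment for one result.

    domains: the five domain judgments, each
    'low', 'some concerns', or 'high'.
    """
    if "high" in domains:
        return "high"                    # any high-risk domain dominates
    concerns = domains.count("some concerns")
    if concerns == 0:
        return "low"                     # all five domains low
    if concerns == 1:
        return "some concerns"
    return "high"                        # slide's rule: multiple 'some concerns'
```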

Tools for Systematic Reviews
Risk of Bias for RCTs
ROB 2.0 (COCHRANE RISK OF BIAS TOOL 2):
Overall judgment: the worst domain (cannot “average” scores)
Focus: specific outcome, not the whole study
No numeric scoring: qualitative, domain-based reasoning

Let’s take a look at...
RoB 2.0 tool

[Tool map, adding Jadad (methodological quality for RCTs) alongside RoB 2]

Tools for Systematic Reviews
Developed in 1996 at the University of Oxford, the Jadad Scale originated as a tool for assessing the methodological quality of RCTs, initially in pain
research, but was later adapted for use across various health fields.
Key aspects of the Jadad Scale include:
• Popularity: It is a highly cited and frequently used instrument within the scientific community due to its perceived ease of
application and interpretation.
• Structure and Scoring: The scale uses five binary questions (Yes = 1 point, No = 0 points) to evaluate three specific methodological
domains: randomization, blinding (masking), and reporting of participant losses/withdrawals (dropouts). Points can be added or
subtracted if the reported method for randomization or blinding is considered inadequate or incorrect.
• Interpretation: The total score ranges from 0 to 5. Studies scoring 3 to 5 points are categorized as having "high methodological
quality," while scores of 0 to 2 points indicate "low methodological quality".
• Limitation: Despite its widespread use, a notable limitation of the Jadad Scale is that it does not evaluate allocation concealment,
which is a crucial element for assessing potential bias in RCTs.
• Reporting: The results of the Jadad Scale assessment can be presented in an additional column within the main descriptive table of
included studies in a Systematic Review (SR) or in a separate, dedicated table.
Methodological Quality Assessment for RCTs
JADAD SCALE
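The scoring arithmetic described above (five binary items, with deductions for inadequate methods) can be sketched as follows. Parameter names are illustrative, not from the original instrument.

```python
def jadad_score(randomized, random_method, blinded, blind_method, dropouts_described):
    """Jadad score (0-5) as summarized above.

    random_method / blind_method: 'adequate', 'inadequate', or 'unreported'.
    """
    score = 0
    if randomized:
        score += 1
        if random_method == "adequate":
            score += 1                   # appropriate randomization method
        elif random_method == "inadequate":
            score -= 1                   # deduction for an incorrect method
    if blinded:
        score += 1
        if blind_method == "adequate":
            score += 1
        elif blind_method == "inadequate":
            score -= 1
    if dropouts_described:
        score += 1                       # losses/withdrawals reported
    return max(score, 0)

def jadad_quality(score):
    """3-5 points: high methodological quality; 0-2: low (per the slide)."""
    return "high" if score >= 3 else "low"
```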

Tools for Systematic Reviews
Methodological Quality Assessment for RCTs
JADAD SCALE
Quick, simple checklist, mainly for RCTs
Focuses on reporting quality, not deep bias analysis
Easy but limited: may overrate well-written but biased
trials

Let’s take a look at...
Jadad scale

[Tool map, adding PEDro (methodological quality for RCTs)]

Tools for Systematic Reviews
• Origin and Purpose: The PEDro Scale was developed in 1998 by a team of physiotherapists at the Institute for
Musculoskeletal Health at the University of Sydney. Its initial goal was to maximize the effectiveness of physiotherapy
services by facilitating the clinical application of the best available evidence, but its use has since expanded to other health
areas.
• Methodology and Reliability: The scale is based on the work of Verhagen et al. Its reliability has been classified as
acceptable, and it is considered suitable for assessing methodological quality in RCTs included in Systematic Reviews (SRs).
• Structure and Scoring: The PEDro Scale consists of 11 criteria.
◦ The first item ("eligibility criteria were specified") relates to external validity and is not counted in the final
methodological quality score.
◦ The final score ranges from 0 to 10 points, based on criteria 2 through 11.
Methodological Quality Assessment for RCTs
PEDRO SCALE (PHYSIOTHERAPY EVIDENCE DATABASE)

Tools for Systematic Reviews
Key Domains Assessed
◦ Random allocation (randomization).
◦ Allocation concealment.
◦ Baseline comparability of groups regarding prognostic indicators.
◦ Blinding of subjects, therapists, and outcome assessors.
◦ Adequate follow-up (measurements of at least one key outcome obtained from >85% of subjects).
◦ Intention-to-treat analysis (or adherence to assigned treatment/control condition).
◦ Reporting of between-group statistical comparisons.
◦ Reporting of both point measures and measures of variability for at least one key outcome.
Methodological Quality Assessment for RCTs
PEDRO SCALE (PHYSIOTHERAPY EVIDENCE DATABASE)

Tools for Systematic Reviews
• Interpretation of Scores:
◦ 0 to 4 points: Low quality.
◦ 5 to 6 points: Intermediate quality.
◦ 7 to 10 points: High quality.
• Training: Online training is available for the scale.
• Context in SRs: While the PEDro Scale is widely accepted and easy to apply, it is a tool for assessing
methodological quality, not assessing the risk of bias.
Therefore, for SRs of interventions, tools like the Cochrane RoB 2.0 should be the primary choice, although
the simultaneous use of methodological quality instruments like PEDro is not precluded.
Methodological Quality Assessment for RCTs
PEDRO SCALE (PHYSIOTHERAPY EVIDENCE DATABASE)

Tools for Systematic Reviews
Methodological Quality Assessment for RCTs
PEDRO SCALE (PHYSIOTHERAPY EVIDENCE DATABASE)
1: eligibility criteria and source of participants; 2: random allocation; 3: concealed allocation; 4: baseline
comparability; 5: blinded participants; 6: blinded therapists; 7: blind assessors; 8: adequate follow up; 9:
intention-to-treat analysis; 10: between group comparison; 11: point estimates and variability.
*Item 1 does not contribute to the total score.
The final score can range from 0 to 10
points.
The first criterion is not considered in
the final score, as it assesses the
external validity of the study.
Therefore, criteria 2 to 11, when
classified as satisfactory, are added
together, resulting in the final score.
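The scoring rule above (item 1 excluded; satisfied items 2-11 summed; bands for interpretation) can be sketched as below. The dict-based representation is an illustrative choice, not part of the scale itself.

```python
def pedro_score(criteria):
    """PEDro score (0-10).

    criteria: dict mapping item number (1-11) to True/False.
    Item 1 (eligibility criteria) assesses external validity and
    does not contribute to the total, per the scale's instructions.
    """
    return sum(1 for item, met in criteria.items() if item != 1 and met)

def pedro_quality(score):
    """Interpretation bands from the slide: 0-4 low, 5-6 intermediate, 7-10 high."""
    if score <= 4:
        return "low"
    if score <= 6:
        return "intermediate"
    return "high"
```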

Let’s take a look at...
PEDro scale

[Tool map, adding ROBINS-I (risk of bias for NRSI)]

TOOLS FOR SYSTEMATIC REVIEWS
Designed specifically to assess the risk of bias in NRSI.
Purpose and Context:
• Necessity: NRSIs are the first choice when conducting RCTs is not feasible due to ethical, financial, or
practical constraints, such as assessing interventions in public health, long-term outcomes, rare adverse effects,
or real-world populations.
• Tool Development: ROBINS-I was created through expert consensus, drawing on prior experience with tools
like the RoB 1.0.
• Study Designs: The tool is specific for estimating the effectiveness (harm or benefit) of an intervention in
studies that did not use randomization. Although it can be applied to designs like case-control, cross-sectional,
and interrupted time series, its detailed guidance is specifically tailored for cohort-type designs where
individuals receiving different interventions are followed over time.
The application of ROBINS-I is based on evaluating the NRSI against a hypothetical "target randomized
controlled trial (RCT)".
Risk of bias for Non-randomized Studies of Interventions
ROBINS-I tool (Risk Of Bias in Non-randomized Studies of Interventions)

TOOLS FOR SYSTEMATIC REVIEWS
• Establishing Comparability: Reviewers must first establish the details of this ideal
target RCT (population, experimental intervention, comparator, and outcomes).
• Low Risk Equivalence: If an NRSI is judged to have a low risk of bias using ROBINS-I, it
is considered comparable to a well-executed RCT. This distinction is critical because it
allows NRSI that receive a "low risk of bias" judgment to start with high certainty of
evidence when assessed using the GRADE approach.
• Effect of Interest: Before evaluation, the reviewers must specify the desired effect: the
intention-to-treat effect (based on baseline allocation) or the per-protocol effect (based on
adherence to the treatment regimen).
Risk of bias for Non-randomized Studies of Interventions
ROBINS-I tool (Risk Of Bias In Non-randomized Studies of Interventions)

TOOLS FOR SYSTEMATIC REVIEWS
Risk of bias for Non-randomized Studies of Interventions
ROBINS-I tool (Risk Of Bias In Non-randomized Studies of Interventions)
Pre-Assessment Requirements
Reviewers must identify and list potential issues
that are endemic to non-randomized designs:
1. Confounding Factors: Prognostic factors that
predict both receiving the intervention and the
outcome (e.g., pre-existing disease severity,
socioeconomic status, access to care).
2. Co-interventions: Interventions or exposures
received by participants concurrently or after the
intervention of interest, which are related to the
intervention received and prognostic for the
outcome.
The Seven Domains of Bias
The ROBINS-I tool is composed of seven fixed domains of
bias, grouped by the timing of their occurrence relative to
the intervention:
Pre-intervention
1. Bias due to confounding
2. Bias in selection of study participants
At intervention
3. Bias in classification of intervention
Post-intervention
4. Bias due to deviation from intended intervention
5. Bias due to missing data
6. Bias due to measurement of the outcome
7. Bias due to selective reporting of results

TOOLS FOR SYSTEMATIC REVIEWS
Scoring and Interpretation
• Signaling Questions: Each domain contains hierarchical "signaling questions"
which guide the reviewer's judgment. Responses include "Yes," “Probably Yes,”
"No," “Probably No,” "No information," and "Not applicable".
• Domain Judgment: The judgment for each domain uses four categories: Low,
Moderate, Serious, or Critical. These judgments assess the potential gravity of the
bias and its consequences for the outcome.
• Overall Judgment: The Overall Risk of Bias for a specific outcome is based on
the highest risk assigned across the seven domains. For instance, if one domain
receives a "Critical risk of bias," the overall judgment is "Critical".
Risk of bias for Non-randomized Studies of Interventions
ROBINS-I tool (Risk Of Bias In Non-randomized Studies of Interventions)
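The overall-judgment rule above (overall risk equals the worst of the seven domain judgments) reduces to a one-line maximum over an ordered severity scale, sketched here for illustration:

```python
# Severity order for ROBINS-I domain judgments, least to most severe.
SEVERITY = ["low", "moderate", "serious", "critical"]

def robins_i_overall(domains):
    """Overall risk of bias for one outcome: the highest risk
    assigned across the seven ROBINS-I domains."""
    return max(domains, key=SEVERITY.index)
```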

TOOLS FOR SYSTEMATIC REVIEWS
Risk of bias for Non-randomized Studies of Interventions
ROBINS-I tool (Risk Of Bias In Non-randomized Studies of Interventions)

TOOLS FOR SYSTEMATIC REVIEWS
Risk of bias for Non-randomized Studies of Interventions
ROBINS-I tool (Risk Of Bias In Non-randomized Studies of Interventions)

TOOLS FOR SYSTEMATIC REVIEWS
Risk of bias for Non-randomized Studies of Interventions
ROBINS-I tool Version 2
Flaw in v1: ROBINS-I v1 (2016) was criticized for being complex, subjective, and inconsistent,
as it relied too much on assessor judgment without clear algorithms.
Improved structure: ROBINS-I v2 introduces explicit algorithms that map answers to risk-of-
bias judgments, improving transparency and reproducibility.
Clearer questions: It refines signaling questions and answer options (adding strong/weak
yes/no), reducing ambiguity and improving inter-rater agreement.
Efficiency feature: A new triage section allows quick identification of studies at critical risk of
bias, making the assessment process more efficient.
Domain updates: Bias domains are reorganized and expanded, addressing new issues such as
immortal time bias and clarifying differences between intention-to-treat and per-protocol
effects.
Overall benefit: ROBINS-I v2 enhances clarity, usability, and consistency, directly correcting
the main weaknesses of v1 and aligning better with the RoB 2 tool.

Let’s take a look at...
ROBINS-I v2

[Tool map, adding QUADAS-2 (risk of bias for diagnostic accuracy studies)]

TOOLS FOR SYSTEMATIC REVIEWS
QUADAS = Quality Assessment of Diagnostic Accuracy Studies
I. Background and Development
• Initial Tool: The original QUADAS tool was developed after a Delphi procedure involving
experts and was based on prior systematic reviews of diagnostic accuracy studies.
• Need for Revision: User feedback and the Cochrane Collaboration suggested improvements,
addressing issues like classifying certain items (e.g., patient spectrum, indeterminate results,
withdrawals) and overlap among domains.
• QUADAS-2 Creation: Developed through expert consensus, QUADAS-2 resulted from
refining the original tool and integrating new evidence on sources of bias and variation in
diagnostic accuracy studies. A key decision in the new version was to separate "quality" into
"risk of bias" and "applicability concerns".
Risk of Bias Analysis of Diagnostic Accuracy Studies
QUADAS 2 TOOL

II. Structure and Domains of QUADAS-2
The QUADAS-2 tool is composed of four domains, which are evaluated for Risk of Bias and, for the
first three domains, Applicability Concerns:
TOOLS FOR SYSTEMATIC REVIEWS
Risk of Bias Analysis of Diagnostic Accuracy Studies
QUADAS 2 TOOL
Patient Selection
Assesses if the selection of patients
could introduce bias (e.g.,
inappropriate exclusions, avoiding
case-control design) and if the
patients included match the target
population of the review question.
Index Test (s)
Evaluates if the conduct or
interpretation of the index test (the
test being studied) could introduce
bias.
It also assesses if the test methods
align with the review question.
Reference Standard
Determines if the conduct or
interpretation of the "gold standard"
test could introduce bias.
It also assesses if the target
condition, as defined by the
reference standard, matches the
review question.
Flow and Timing
Assesses the patient flow through
the study and the timing of the tests,
addressing potential biases such as
inappropriate intervals between the
index test and reference standard,
whether all patients received the
same reference standard, and
whether all patients were included
in the analysis.

TOOLS FOR SYSTEMATIC REVIEWS
III. Assessment and Interpretation
• Signaling Questions: Each domain includes "signaling questions" to guide the reviewer's judgment, answered as
"Yes," "No," or "Unclear".
• Risk of Bias Judgment: The risk of bias for each domain is judged as "Low," "High," or "Unclear". If all signaling
questions yield "Yes," the domain is judged "Low risk of bias".
• Applicability Judgment: Concerns about applicability are judged as "Low," "High," or "Uncertain". This judgment
determines if the study's patient population, index test, or reference standard differs significantly from what is specified
in the systematic review's question.
• Overall Judgment: An overall judgment of "low risk of bias" or "low concern regarding applicability" is considered
appropriate only if all domains are classified as "Low". If one or more domains are classified as "High" or "Uncertain," the
overall judgment may be "High risk of bias" or "High applicability concerns".
IV. Relevance to Systematic Reviews
For diagnostic accuracy studies, QUADAS-2 is the recommended tool regardless of the specific study design included.
The results should be clearly reported using narrative descriptions, tables, or figures (like traffic light plots or weighted
bar graphs) to summarize the quality of the evidence.
Risk of Bias Analysis of Diagnostic Accuracy Studies
QUADAS 2 TOOL
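The judgment rules in section III can be sketched as below. One simplification is loudly assumed here: mapping any "no" answer straight to high risk is illustrative shorthand, whereas real QUADAS-2 use asks the reviewer to judge the potential for bias in that case.

```python
def quadas2_domain(signaling_answers):
    """Domain judgment from signaling answers ('yes', 'no', 'unclear').

    All 'yes' -> low risk, per the tool. Assumption for this sketch:
    any 'no' -> high; otherwise 'unclear'. Real use requires judgment.
    """
    if all(a == "yes" for a in signaling_answers):
        return "low"
    return "high" if "no" in signaling_answers else "unclear"

def quadas2_overall(domain_judgments):
    """Overall 'low' only if all four domains are low, per the slide."""
    if all(j == "low" for j in domain_judgments):
        return "low"
    return "high" if "high" in domain_judgments else "unclear"
```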

QUADAS 2 TOOL

Let’s take a look at...
QUADAS-2

[Tool map, adding QUIPS (risk of bias for prognostic studies)]

TOOLS FOR SYSTEMATIC REVIEWS
I. Purpose and Development of QUIPS
QUIPS =Quality In Prognosis Studies
The goal of SRs of prognostic studies is to synthesize clinical variables that
predict future events or explore etiological risk factors.
Given the methodological complexities and lack of standardization in evaluating
these studies, the QUIPS tool was developed by a working group of
epidemiologists and statisticians to standardize and improve bias analysis.
QUIPS was based on a prior identification of six potential sources of bias in
prognostic research, which then formed the six main domains of the tool.
Risk of Bias Analysis of Prognostic Studies
QUIPS Tool

TOOLS FOR SYSTEMATIC REVIEWS
II. The Six Domains of the QUIPS Tool
The QUIPS tool assesses the risk of bias across six fixed domains for prognostic studies:
1. Study Participation: Assesses the representativeness of the sample relative to the target population.
2. Study Attrition (Loss to Follow-up): Determines if the participants who provide follow-up data
adequately represent the original recruited sample.
3. Prognostic Factor Measurement: Evaluates whether the prognostic factor was measured reliably and
similarly across all participants.
4. Outcome Measurement: Assesses if the outcome was measured reliably and similarly for all
participants.
5. Study Confounding: Evaluates whether important potential confounding factors have been appropriately
accounted for in the study design or analysis.
6. Statistical Analysis and Reporting: Addresses the appropriateness of the statistical analysis and the
completeness of the study report. A crucial concern here is selective reporting, where researchers may only
report prognostic factors positively associated with the outcome.
Risk of Bias Analysis of Prognostic Studies
QUIPS Tool

TOOLS FOR SYSTEMATIC REVIEWS
III. Interpretation of QUIPS Results
For each of the six domains, reviewers provide a judgment of "High," "Moderate," or
"Low" risk of bias. These judgments rely on responses to "prompting items" (signaling
questions) and extracted methodological comments from the study.
A drop-down menu is used to rate the adequacy of reporting for each prompting item as "yes," "partial," "no," or "unsure".
The final overall risk of bias judgment typically classifies a prognostic study as
low risk of bias when all, or at least the most important, domains are judged
"Low". The source material advises against using summed scores or fixed cut-off points
for determining the overall quality.
A known limitation of QUIPS is that many prompting items relate strongly to the
completeness of reporting rather than strictly to inherent methodological flaws, which
can complicate the accurate judgment of bias in poorly reported studies.
Risk of Bias Analysis of Prognostic Studies
QUIPS Tool
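One defensible reading of the interpretation rule above (worst judgment among all domains, or among a pre-specified set of most important ones; no summed scores) is sketched below. Which domains count as "important" is a choice the review team makes up front; the defaults here are illustrative only.

```python
# Severity order for QUIPS domain judgments, least to most severe.
ORDER = ["low", "moderate", "high"]

def quips_overall(judgments, important=None):
    """Overall risk of bias for a prognostic study.

    judgments: dict mapping each QUIPS domain name to
    'low'/'moderate'/'high'. If `important` (a list of domain names)
    is given, only those pre-specified domains are considered.
    """
    considered = important if important else list(judgments)
    return max((judgments[d] for d in considered), key=ORDER.index)
```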

TOOLS FOR SYSTEMATIC REVIEWS
Risk of Bias Analysis of Prognostic Studies
QUIPS Tool

Let’s take a look at...
QUIPS tool

[Tool map, adding NOS (methodological quality for cohort and case-control studies)]

TOOLS FOR SYSTEMATIC REVIEWS
Key Characteristics and Purpose of NOS
• Necessity: Observational studies (such as cohort and case-control) are required in
SRs when ethical constraints, complexity, or feasibility issues prevent the use of RCTs,
particularly for questions concerning association or risk factors.
• Acceptance: The NOS is the only tool accepted by the Cochrane Collaboration for
assessing the quality of non-randomized observational studies.
• Development: The tool resulted from a collaboration between the Universities of
Newcastle (Australia) and Ottawa (Canada). Its design focused on incorporating
methodological quality assessment into the interpretation of meta-analysis results.
• Star system: The NOS utilizes a star system, where studies are rated across the
criteria to achieve a semi-quantitative quality score, ranging from zero to nine stars.
Analysis of the methodological quality of observational studies (cohort and case-control)
Newcastle-Ottawa Scale (NOS) tool

Structure and Scoring
The NOS is structured around three main dimensions to evaluate the internal validity of observational studies:
TOOLS FOR SYSTEMATIC REVIEWS
Analysis of the methodological quality of observational studies (cohort and case-control)
Newcastle-Ottawa Scale (NOS) tool
1. Selection: assesses the selection and definition of the exposed/non-exposed groups
(for cohort studies) or cases/controls (for case-control studies). This domain has a
maximum of four stars.
2. Comparability: assesses the comparability between groups based on the study design
or statistical analysis, particularly concerning the control of confounding factors.
This domain can receive a maximum of two stars.
3. Outcome (cohort) / Exposure (case-control): assesses how the outcome (in cohort
studies) or the exposure (in case-control studies) was determined and reported. This
domain can receive a maximum of three stars.

TOOLS FOR SYSTEMATIC REVIEWS
Focus on Study Designs
• Cohort Studies: NOS emphasizes the representativeness of the exposed cohort, favoring studies derived from the general
community over selected groups. A high score also depends on the selection of the non-exposed cohort, the accurate
determination of exposure, and the demonstration that the outcome was absent at the start of the study.
• Case-Control Studies: High-quality scores are favored for studies that use community-based controls (population controls)
over hospital-based controls. The scale also assigns higher scores when the ascertainment of exposure is done blindly or
through structured interviews, and when the non-response rates are comparable between cases and controls.
The star rating system is used to classify the overall quality of each study. Although the
developers did not endorse rigid cut-off points, the scores are often categorized in the literature
to define study quality as "Good," "Fair," or "Poor".
The NOS has also seen adaptations for assessing cross-sectional studies. However, these
modified versions for cross-sectional research were not proposed or validated by the developers
of the NOS tool.
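The star tally and categorization above can be sketched in a few lines of Python. This is a minimal illustration, not part of the NOS itself: the developers did not endorse fixed cut-offs, so the thresholds below (a commonly cited AHRQ-style convention) are an assumption, and the function name is ours.

```python
# Sketch of NOS star tallying and a commonly used "Good/Fair/Poor"
# categorisation. The cut-offs are an assumption (AHRQ-style convention);
# the NOS developers did not endorse fixed thresholds.

def nos_category(selection: int, comparability: int, outcome: int) -> str:
    """Classify a study from its NOS domain stars.

    selection: 0-4 stars, comparability: 0-2 stars, outcome/exposure: 0-3 stars.
    """
    assert 0 <= selection <= 4 and 0 <= comparability <= 2 and 0 <= outcome <= 3
    if selection >= 3 and comparability >= 1 and outcome >= 2:
        return "Good"
    if selection == 2 and comparability >= 1 and outcome >= 2:
        return "Fair"
    return "Poor"   # low selection, zero comparability, or weak outcome stars

study = {"selection": 3, "comparability": 2, "outcome": 2}   # 7 of 9 stars
print(sum(study.values()), nos_category(**study))            # 7 Good
```

Note that, as the slides stress, the categorical judgment depends on which domains earned the stars, not on the summed score alone.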
Analysis of the methodological quality of observational studies (cohort and case-control)
Newcastle-Ottawa Scale (NOS) tool

TOOLS FOR SYSTEMATIC REVIEWS
Analysis of the methodological quality of observational studies (cohort and case-control)
Newcastle-Ottawa Scale (NOS) tool

Let’s take a look at...
NOS tool

Overview of tools by purpose and study design:
• Risk of Bias
  - Interventional, RCT → RoB 2
  - Interventional, NRSI → ROBINS-I
  - Diagnostic → QUADAS-2
  - Prognosis → QUIPS
• Methodological Quality Assessment
  - Interventional, RCT → Jadad, PEDro
  - Observational, cohort → JBI, NOS
  - Observational, case-control → JBI, NOS
  - Observational, cross-sectional / prevalence → JBI
• GRADE

The JBI tools are designed to determine the extent to which a study has addressed the possibility of bias in its design, conduct, and
analysis. These tools are checklists, and the responses to their questions are typically "Yes," "No," "Unclear," or "Not applicable". We will
focus on the checklists for three types of observational studies used in SRs of association/risk factors:
TOOLS FOR SYSTEMATIC REVIEWS
Analysis of the Methodological Quality of Observational Studies (Cohort, Case-Control and Cross-Sectional)
Joanna Briggs Institute (JBI) Tools
1. Checklist for Cohort Studies (11 Questions)
This tool assesses the quality of cohort studies, focusing
on factors crucial for minimizing selection and
confounding bias over time:
a. Group Comparability: Checks if the exposed and
unexposed groups were similar and recruited from the
same population, except for the exposure status.
b. Exposure Measurement: Evaluates if exposure was
measured similarly, validly, and reliably for both groups.
c. Confounding Factors: Assesses if potential
confounding factors were identified and if strategies
(e.g., statistical adjustment or stratification) were used to
deal with them.
d. Outcome Status at Baseline: Confirms that
participants were free of the outcome at the start of the
study.
e. Outcome Measurement: Verifies if outcomes were
measured validly and reliably.
f. Follow-up Adequacy: Ensures the follow-up time was
reported and long enough for the outcome to occur.
g. Completeness of Follow-up: Assesses if follow-up was
complete (generally ≥80% retention) and whether reasons
for loss were described and explored.
h. Statistical Analysis: Checks if appropriate statistical
analysis was used, particularly how confounding and
unequal follow-up were handled.

TOOLS FOR SYSTEMATIC REVIEWS
Analysis of the Methodological Quality of Observational Studies (Cohort, Case-Control and Cross-Sectional)
Joanna Briggs Institute (JBI) Tools
2. Checklist for Case-Control Studies (10 Questions)
This tool is tailored for case-control studies, where biases like
selection and recall bias are prominent:
a. Group Comparability: Determines if cases and controls
were comparable, except for the outcome/disease status.
b. Matching and Source: Checks if cases and controls were
appropriately matched and drawn from the same source
population.
c. Case/Control Definition: Assesses if the same criteria were
used for case and control identification.
d. Exposure Measurement: Evaluates if exposure was
measured in a standard, valid, and reliable way, and that the
method was the same for cases and controls.
e. Confounding Factors: Verifies if confounding factors
were identified and if strategies (e.g., matching,
multivariable regression) were used to deal with them.
f. Outcome Assessment: Asks if outcomes were assessed
in a standard, valid, and reliable way for both groups.
g. Exposure Period: Considers if the exposure period of
interest was long enough to be meaningful in the context of
the outcome.
h. Statistical Analysis: Assesses if appropriate statistical
analysis was used.

TOOLS FOR SYSTEMATIC REVIEWS
Analysis of the Methodological Quality of Observational Studies (Cohort, Case-Control and Cross-Sectional)
Joanna Briggs Institute (JBI) Tools
3. Checklist for Analytical Cross-sectional Studies
(8 Questions)
This tool addresses cross-sectional studies, recognizing the
challenge of establishing temporality:
a. Inclusion Criteria: Asks if the criteria for inclusion
were clearly defined.
b. Subject/Setting Description: Checks if the subjects
and the study setting were described in adequate detail for
generalizability.
c. Exposure Measurement: Assesses if exposure was
measured in a valid and reliable way.
d. Objective Criteria for Condition: Determines if
objective, standard criteria were used for the
measurement of the condition (outcome).
e. Confounding Factors: Checks if confounding factors
were identified and if strategies were used to address
them.
f. Outcome Measurement: Assesses if outcomes were
measured in a valid and reliable way.
g. Statistical Analysis: Verifies if appropriate
statistical analysis was used, including how
confounding was addressed.

TOOLS FOR SYSTEMATIC REVIEWS
Analysis of the Methodological Quality of Observational Studies (Cohort, Case-Control and Cross-Sectional)
Joanna Briggs Institute (JBI) Tools
4. Checklist for Prevalence Studies (9 Questions)
Focus on Prevalence: The tool is designed to evaluate studies that report prevalence data (point, period, or cumulative/lifetime
prevalence), regardless of whether the original study design was cross-sectional or cohort.
1. Sample Frame Appropriateness: Assesses if the study's
sample frame is appropriate to address the target population
(demographics and characteristics relevant to the condition).
2. Appropriate Sampling Method: Evaluates if participants
were sampled appropriately.
3. Adequate Sample Size: Determines if the sample size was
large enough to guarantee the precision of the prevalence
estimate, considering the effect on the confidence interval.
4. Detailed Description of Subjects and Setting: Checks if the
study subjects and setting were described in enough detail for
comparison with the target population.
5. Sufficient Data Coverage: Assesses if data analysis was conducted
with sufficient coverage of the identified sample, specifically
examining issues like non-response bias.
6. Valid Methods for Condition Identification: Valid methods
should be based on validated diagnostic criteria rather than
unvalidated or self-reported scales.
7. Standardized and Reliable Measurement: Determines if the
condition was measured in a standardized and reliable way for all
participants.
8. Appropriate Statistical Analysis: Confirms that the statistical
analysis was appropriate for the research method.
9. Adequate Response Rate Management: Assesses if the response
rate was adequate, and if not, whether the authors appropriately
managed or discussed the low response rate and potential biases.

Interpretation and Reporting
The JBI tools result in a quality score, but the developers emphasize flexibility in
interpretation. Reviewers must explicitly agree on the scoring system and criteria (e.g.,
determining what constitutes "High," "Moderate," or "Low" quality) before analysis
begins. The final results should be presented narratively, often supplemented by tables
or figures like "traffic light" or "weighted bar" graphs, to clearly indicate the
methodological limitations of the included studies.
TOOLS FOR SYSTEMATIC REVIEWS
Analysis of the Methodological Quality of Observational Studies (Cohort, Case-Control and Cross-Sectional)
Joanna Briggs Institute (JBI) Tools

TOOLS FOR SYSTEMATIC REVIEWS
Joanna Briggs Institute (JBI) Tool

Let’s take a look at...
JBI
critical appraisal tools

Overview of tools by purpose and study design:
• Risk of Bias
  - Interventional, RCT → RoB 2
  - Interventional, NRSI → ROBINS-I
  - Diagnostic → QUADAS-2
  - Prognosis → QUIPS
  - Overviews → ROBIS
• Methodological Quality Assessment
  - Interventional, RCT → Jadad, PEDro
  - Observational, cohort → JBI, NOS
  - Observational, case-control → JBI, NOS
  - Observational, cross-sectional / prevalence → JBI
  - Overviews → AMSTAR 2
• GRADE

GRADE

GRADE
Grading of Recommendations, Assessment,
Development and Evaluation approach

GRADE
Grading of Recommendations, Assessment,
Development and Evaluation approach
• Purpose: GRADE is a framework developed by the GRADE Working Group and adopted by Cochrane to assess the certainty
of evidence derived from SRs. It helps clarify how confident researchers can be that the estimated effect of an
intervention reflects the true effect in reality.
• Certainty Levels: GRADE assigns certainty levels to the evidence, ranging from High, Moderate, Low, to Very Low.
◦ High certainty suggests that new research is unlikely to change the confidence in the final estimate.
◦ Moderate certainty means new research might have a significant impact on confidence and could change
the final estimate.
◦ Low certainty indicates that new research is highly likely to impact confidence and probably change the
final estimate.
◦ Very Low certainty means there is very little confidence that the final estimate reflects the true effect.
• Clinical Importance: High or moderate certainty levels suggest that the results are reliable enough for direct
clinical application. Low or very low certainty indicates that caution must be used, as the observed result might
differ significantly from the real-world effect.
• Risk of Bias Link: The confidence in the estimate of effect decreases if the studies contributing to the evidence
have methodological biases. The risk of bias is one of the five core domains analyzed by GRADE to determine the
certainty of evidence.

GRADE
Grading of Recommendations, Assessment,
Development and Evaluation approach
The starting certainty level for evidence depends on the study design included in the SR:
• RCTs: These studies typically start with High certainty of evidence because
randomization balances confounding factors across groups.
• NRSI and Observational Studies: These studies typically start with Low certainty of evidence
because they are susceptible to confounding factors due to the lack of randomization.
• Exception for NRSI: If an SR including NRSI uses the ROBINS-I tool to assess the risk of bias,
and the study is judged to have a low risk of bias, it might start with High certainty of
evidence. However, achieving a low-risk judgment with ROBINS-I, particularly for confounding
bias, is considered very difficult.

GRADE
Grading of Recommendations, Assessment, Development and Evaluation approach
Domains for downgrading certainty
Regardless of the starting point, the
certainty of evidence can be re-evaluated
(downgraded) based on five domains common
to all study designs:
1. Risk of Bias.
2. Inconsistency.
3. Indirectness (indirect evidence).
4. Imprecision.
5. Publication Bias.
Domains for upgrading certainty
For NRSI and observational studies (which start
at a lower certainty level), the certainty can be
upgraded based on three additional domains:
1. Large magnitude of effect.
2. Dose-response gradient.
3. Residual confounding or bias: when all plausible
confounding would reduce the demonstrated effect (or
suggest a spurious effect where none was observed),
which increases confidence in the estimate.

GRADE
Grading of Recommendations, Assessment, Development and Evaluation approach
To judge the risk of bias using the GRADE approach, a detailed assessment of risk of
bias in the primary studies must first be performed.
The severity of problems related to the risk of bias dictates how many levels the
certainty of evidence should be downgraded (re-evaluated):
• No serious problems detected: Do not downgrade certainty (Decision 1).
• Serious problems detected: Downgrade certainty by one level (Decision 2).
• Very serious problems detected: Downgrade certainty by two levels (Decision 3).
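The three decisions above amount to a simple ladder over the four certainty levels. The sketch below makes that explicit; the function name is illustrative, and the starting points follow the slides (RCT bodies start High, NRSI/observational bodies start Low).

```python
# Minimal sketch of the GRADE risk-of-bias downgrade logic: no serious
# problems -> keep level; serious -> drop one; very serious -> drop two.
# Certainty cannot fall below "Very Low".

LEVELS = ["Very Low", "Low", "Moderate", "High"]

def downgrade(start: str, severity: str) -> str:
    steps = {"none": 0, "serious": 1, "very serious": 2}[severity]
    return LEVELS[max(0, LEVELS.index(start) - steps)]

print(downgrade("High", "serious"))       # RCT body, Decision 2 -> Moderate
print(downgrade("Low", "very serious"))   # observational body -> Very Low
```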

GRADE
Grading of Recommendations, Assessment, Development and Evaluation approach
Sensitivity analysis is the most common strategy to determine the impact of bias on
the final results.
This involves removing studies judged to have a high risk of bias from the meta-analysis
and checking if the summarized effect changes significantly.
• If the overall estimate remains similar after removing high-risk bias studies, the risk of bias did
not impact the final estimate, and the certainty is not downgraded (Decision 1).
• If removing high-risk bias studies changes the overall estimate, this indicates a serious
problem with risk of bias, warranting a downgrade of at least one level (Decision 2).
• If removing the high-risk bias studies causes the estimate to switch dramatically (e.g., favoring
the opposite intervention), the problem is considered very serious, potentially leading to a two-
level downgrade (Decision 3).
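The comparison behind these three decisions can be sketched numerically. Below is a toy example, assuming a simple inverse-variance fixed-effect pooling; the study data are invented for illustration, and real reviews would use dedicated meta-analysis software.

```python
# Hedged sketch of the sensitivity analysis described above: pool the
# per-study effects with inverse-variance weights, re-pool after dropping
# studies judged at high risk of bias, and compare the two estimates.
# Study data are made up for illustration only.

def pooled_effect(studies):
    """Inverse-variance weighted mean of per-study effect estimates."""
    weights = [1 / s["se"] ** 2 for s in studies]
    return sum(w * s["effect"] for w, s in zip(weights, studies)) / sum(weights)

studies = [
    {"effect": 0.40, "se": 0.10, "rob": "low"},
    {"effect": 0.35, "se": 0.15, "rob": "some concerns"},
    {"effect": 0.90, "se": 0.20, "rob": "high"},   # outlying high-RoB study
]

all_pooled = pooled_effect(studies)
low_rob = pooled_effect([s for s in studies if s["rob"] != "high"])
print(f"all studies: {all_pooled:.2f}, excluding high RoB: {low_rob:.2f}")
# A large shift between the two estimates supports downgrading (Decision 2
# or 3); a negligible shift supports not downgrading (Decision 1).
```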

GRADE
Grading of Recommendations, Assessment, Development and Evaluation approach
Sensitivity analysis (worked forest-plot examples)

GRADE
Grading of Recommendations, Assessment, Development and Evaluation approach
For NRSI or observational studies, where every included study is expected to carry some
degree of bias, a sensitivity analysis that excludes high-risk studies may not be
meaningful, as there may be no studies left to pool.
In these cases, reviewers must judge the severity of the risk of bias problems to
determine the appropriate downgrade level.

GRADE
The GRADE approach provides a structured way to determine how
much confidence can be placed in the evidence generated by an SR.
Incorporating the results of risk of bias analysis, preferably via
GRADE, ensures transparency and rigor in reporting findings, which
is essential for informed clinical decision-making.

Concept | Level of Assessment | Focus | Typical Tools | Outcome
Risk of Bias | Individual study | Internal validity | RoB 2.0, ROBINS-I, QUADAS-2, QUIPS, or JBI | Bias judgment
Quality Assessment | Individual study | Methodological rigor | NOS, Jadad, PEDro, or JBI | Quality score
GRADE | Body of evidence (per outcome) | Certainty across studies | GRADE | Certainty rating
Key takeaway:
Risk of bias ≠ Quality score
GRADE ≠ Methodological assessment tool
They work sequentially:
Study quality → Risk of bias → Certainty of evidence (GRADE)

THANK YOU! [email protected]