Item Analysis - Outline
1. Types of test items
A. Selected response items
B. Constructed response items
2. Parts of test items
3. Guidelines for writing test items
4. Item Analysis
A. Distracter measures
B. Item difficulty measures
C. Item discrimination measures
5. Item Response Theory
A. ICCS
B. Adaptive testing
1. Types of test items
A. Selected response
Multiple choice
Likert scale
Category
Q-sort
B. Constructed response
A. Selected response
• Multiple choice or forced choice
  • Task is to choose between set answers
  • Advantage: ease of scoring
  • Advantage: scoring requires little skill
  • Disadvantage: may test memory rather than comprehension
  • Correct response must be distinct
  • Distracters should not be obvious or ambiguous
  • If distracters are bad, adding more of them makes the test less reliable
  • Use 3-4 distracters per item
• Likert format
  • Test-taker chooses a point on a scale that expresses their attitude or belief
  • Data lend themselves to factor analysis
Likert scale example item:
"Parking costs at the university are fair"
1 = strongly agree, 2 = agree, 3 = neutral, 4 = disagree, 5 = strongly disagree
• Category format
  • Similar to Likert but with more choices
  • Test-taker's commitment
  • Reliability depends on good instructions & number of categories (≤ 10)
  • Scoring shows context effects
• Q-sort
  • A large set of cards, each with a statement referring to a "target"
  • Test-taker sorts the cards into piles in terms of how accurate the statements are as a description of the target
  • Generally 9 piles
B. Constructed response
Free response
Fill-in-the-blank
Essay tests
Portfolios
In-basket technique
B. Constructed response items
• Free response
  • Test-taker responds without constraint
  • Describes what is important to him/her
• Fill-in-the-blank
  • Used to test for knowledge or to find out about beliefs and attitudes
• Essay tests
  • Preferred when you want to assess the test-taker's ability to think analytically, integrate ideas, and express themselves
• Portfolios
  • Not really a test
  • Collections of things the person being evaluated has produced
  • Let you evaluate things you can't assess with a selected response test
• In-basket technique
  • Used in business
  • Job candidate gets a set of "everyday" problems and says how he or she would deal with them
  • Requires expert raters to grade responses
B. Constructed response items
• Strengths
  • Assess higher-order skills
  • More useful feedback to the test-taker
  • Positive influence on study habits?
  • Easier to create items
• Weaknesses
  • Time-consuming to use
  • Possible subjectivity in scoring
2. Parts of test items
A. Stimulus or item stem: what the subject responds to
B. Response format or method: typically multiple choice or constructed response
C. Conditions governing the response: e.g., time limits; allowing probes for ambiguous responses; how the response is recorded...
D. Procedures for scoring the response: particularly important for constructed response items
To some extent, your choices on each of these parts will be dictated by:
• Precedent: what did you do last time?
• Experience: did that work?
• Practical considerations: how many people have to be tested? How much time is available?
3. Writing test items – guidelines
A. Define clearly
  • Why are you testing? What do you want to know?
B. Generate a pool of potential items
  • The larger the pool of items you select from, the better the test
  • Selection from this pool is based on item analysis (see below)
C. Monitor reading level
  • Level too low? More sophisticated test-takers may get bored
  • Level too high? You're testing reading skill as well as the domain you think you're testing
D. Use unitary items
  • Then the meaning of the response is clear
E. Avoid long items
  • Longer items are more likely to be misinterpreted by test-takers
  • Short items are more likely to be unitary
F. Break any response "set"
  • Use reverse-scored items to prevent test-takers from getting into a response set, such as just responding "5" for every item on a Likert scale (see the sketch below)
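To make reverse-scored items usable, they must be flipped back at scoring time. Here is a minimal Python sketch of that step, assuming a 1-5 Likert scale; the function name and the choice of which items are reverse-keyed are illustrative, not from the slides.

def score_likert(responses, reverse_keyed, scale_max=5):
    # Flip reverse-keyed items so that a higher score always means
    # more agreement with the construct being measured
    return [
        (scale_max + 1 - r) if i in reverse_keyed else r
        for i, r in enumerate(responses)
    ]

# A test-taker who answers "5" to every item is caught by the flip:
# the reverse-keyed items become 1, pulling the total toward the midpoint
print(score_likert([5, 5, 5, 5], reverse_keyed={1, 3}))  # [5, 1, 5, 1]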
4. Item analysis
A. Multiple choice distracter analysis
B. Item difficulty measure P
C. Discrimination index D
D. Item-total correlation
A. Multiple choice – distracter measures
• How many people choose each distracter? (a tally like the sketch below answers this)
• Distracters should be equally attractive
• The correct choice should be based on knowledge
• Where knowledge is lacking, choice should be random
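As a concrete illustration, this small Python sketch tallies option choices for one item; the data layout (one chosen option label per test-taker) is an assumption made for the example.

from collections import Counter

def distracter_counts(responses, correct):
    # responses: option label chosen by each test-taker, e.g. ["A", "C", ...]
    # correct: the keyed (correct) option label
    counts = Counter(responses)
    distracters = {opt: n for opt, n in counts.items() if opt != correct}
    return counts[correct], distracters

chosen = ["B", "B", "A", "C", "B", "D", "A", "B", "C", "B"]
n_correct, by_distracter = distracter_counts(chosen, "B")
print(n_correct)      # 5 chose the keyed answer
print(by_distracter)  # {'A': 2, 'C': 2, 'D': 1} -- roughly equally attractive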
B. Item difficulty measure P
• Difficulty is determined by the item and the population tested
• P(i) = (# who got item i correct) / (# taking the test)
• P = .50 is best (item variance, and hence the potential to discriminate, is greatest there)
• P = 0 or P = 1: such items do not distinguish ability levels (a sketch of the computation follows)
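In code, P is just a proportion. A minimal sketch, assuming item scores are coded 1 for correct and 0 for incorrect:

def item_difficulty(item_scores):
    # P(i) = (# who got item i correct) / (# taking the test)
    return sum(item_scores) / len(item_scores)

print(item_difficulty([1, 1, 0, 1, 0, 0, 1, 1]))  # 0.625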
C. Item Discrimination Measures
• Discrimination index D
• Item-total correlation
Discrimination index D
• Extreme groups method:
  U = # getting the item correct in the 'top' group
  L = # getting the item correct in the 'bottom' group
  n_U = # in the top group
  n_L = # in the bottom group
  D = U/n_U - L/n_L (sketched in code below)
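A small Python sketch of the extreme-groups computation; the 27% split used to form the top and bottom groups is a common convention, not something specified on the slide.

def discrimination_index(item_correct, total_scores, fraction=0.27):
    # item_correct: 1/0 per test-taker for this item
    # total_scores: total test score per test-taker
    n = len(total_scores)
    k = max(1, int(n * fraction))  # size of each extreme group
    order = sorted(range(n), key=lambda i: total_scores[i])
    bottom, top = order[:k], order[-k:]
    U = sum(item_correct[i] for i in top)     # correct in 'top' group
    L = sum(item_correct[i] for i in bottom)  # correct in 'bottom' group
    return U / k - L / k                      # D = U/n_U - L/n_L

item = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]
totals = [95, 88, 90, 60, 85, 55, 40, 92, 50, 45]
print(discrimination_index(item, totals))  # 1.0 -- top scorers pass, bottom scorers miss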
Item-total correlation
• Good item: high correlation
  • People who get the item correct have a high score on the test
  • People who get the item wrong have a low score on the test
• Poor item: low correlation
  • Look at the wording: the item may be testing reading skill
(a sketch of the computation follows)
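This is a Pearson correlation between the 0/1 item score and the total test score (a point-biserial correlation). A standard-library sketch with invented data follows; in practice the item is often removed from the total first (the "corrected" item-total correlation).

from statistics import mean, pstdev

def item_total_correlation(item_scores, total_scores):
    # Pearson r between 0/1 item scores and total test scores
    mx, my = mean(item_scores), mean(total_scores)
    cov = mean((x - mx) * (y - my) for x, y in zip(item_scores, total_scores))
    return cov / (pstdev(item_scores) * pstdev(total_scores))

item = [1, 1, 0, 1, 0, 0, 1, 0]
totals = [90, 85, 55, 88, 60, 45, 92, 50]
print(round(item_total_correlation(item, totals), 2))  # 0.97 -- a good item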
5. Item Response Theory
A. Item characteristic curves
B. Adaptive testing using computers
A. Item characteristic curves
• Most important idea: Item Characteristic Curves (ICCs)
• One curve for each test item
• X axis: test-taker ability (given by test score)
• Y axis: probability of a correct response
[Figure: ICCs for three items, plotting probability of a correct response against test score]
• Slope: how quickly the curve rises
  • Indicates how well the item discriminates among persons of differing abilities
  • Analogous to the discrimination index D in Classical Test Theory, but sample-invariant (see the ICC sketch below)
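For concreteness, here is a sketch of one common ICC form, the two-parameter logistic (2PL), where b is the item's location (difficulty) and a is its slope (discrimination); the parameter values are made up for illustration.

import math

def icc(theta, a, b):
    # P(correct | ability theta) for a 2PL item:
    # 1 / (1 + exp(-a * (theta - b)))
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# A steep slope (large a) discriminates sharply around the item's difficulty b
for theta in (-2, -1, 0, 1, 2):
    print(theta, round(icc(theta, a=2.0, b=0.0), 3))
# -2 0.018 / -1 0.119 / 0 0.5 / 1 0.881 / 2 0.982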
Problems with Item Response Theory
• Obtaining stable estimates of IRT parameters requires rather large samples
• Computationally complex
• The IRT model assumes that the trait being measured is one-dimensional; it may not be
B. Adaptive testing using computers
• The computer selects harder or easier questions as the test-taker gets each question right or wrong (see the sketch below)
• Lets you tailor questions to each test-taker
• The test-taker does not spend most of their time on questions that are too easy or too difficult
• Facilitates testing of diverse ability groups
• Output = the level of difficulty the test-taker can deal with
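A toy Python sketch of the idea: difficulty steps up after a correct answer and down after a miss (a simple staircase, not a full IRT-based computerized adaptive test; the item bank and answer rule are invented for illustration).

def adaptive_test(item_bank, answer_fn, n_items=5):
    # item_bank: dict mapping difficulty level -> question
    # answer_fn(question) -> True if the test-taker answers correctly
    levels = sorted(item_bank)
    i = len(levels) // 2  # start at a middling difficulty
    for _ in range(n_items):
        correct = answer_fn(item_bank[levels[i]])
        # Step harder after a correct answer, easier after a miss
        i = min(i + 1, len(levels) - 1) if correct else max(i - 1, 0)
    return levels[i]  # the difficulty level the test-taker can deal with

bank = {1: "easy item", 2: "medium item", 3: "hard item"}
print(adaptive_test(bank, answer_fn=lambda q: q != "hard item"))
# 3 -- this test-taker passes medium items but misses hard ones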