Item analysis in testing and evaluation.ppt


About This Presentation

item analysis


Slide Content

1
Item Analysis - Outline
1. Types of test items
A. Selected response items
B. Constructed response items
2. Parts of test items
3. Guidelines for writing test items

2
Item Analysis - Outline
4. Item Analysis
A. Distracter measures
B. Item difficulty measures
C. Item discrimination measures
5. Item Response Theory
A. ICCs
B. Adaptive testing

3
1. Types of test items
A. Selected response
Multiple choice
Likert scale
Category
Q-sort
B. Constructed response

4
A. Selected response
• Multiple choice or forced choice
• Task is to choose between set answers
• Advantage: ease of scoring
• Advantage: scoring requires little skill
• Disadvantage: may test memory rather than comprehension

5
A. Selected response
• Multiple choice or forced choice
• Correct response must be distinct
• Distracters should not be obvious or ambiguous
• If the distracters are poorly written, adding more of them makes the test less reliable
• Use 3-4 distracters per item

6
A. Selected response
• Multiple choice or forced choice
• Likert format
• Test-taker chooses a point on a scale that expresses their attitude or belief
• Data lend themselves to factor analysis

7
Likert scale example item
"Parking costs at the university are fair"
1 = Strongly agree   2 = Agree   3 = Neutral   4 = Disagree   5 = Strongly disagree

8
A. Selected response
• Multiple choice or forced choice
• Likert format
• Category
• Similar to Likert but with more choices
• Test-taker’s commitment
• Reliability depends on good instructions & # of categories (≤ 10)
• Scoring shows context effects

9
A. Selected response
• Multiple choice or forced choice
• Likert format
• Category
• Q-sort
• A large set of cards, each with a statement referring to a “target”
• Test-taker sorts the cards into piles in terms of how accurate the statements are as a description of the target
• Generally 9 piles

10
1. Types of test items
A. Selected response
B. Constructed response
Free response
Fill-in-the-blank
Essay tests
Portfolios
In-basket technique

11
B. Constructed response items
• Free response
• Test-taker responds without constraint
• Describes what is important to him/her

12
B. Constructed response items
• Free response
• Fill-in-the-blank
• Used to test for knowledge or to find out about beliefs and attitudes

13
B. Constructed response items
• Free response
• Fill-in-the-blank
• Essay tests
• Preferred when you want to assess the test-taker’s ability to think analytically, integrate ideas, and express himself

14
B. Constructed response items
• Free response
• Fill-in-the-blank
• Essay tests
• Portfolios
• Not really a test
• Collections of things the person being evaluated has produced
• Let you evaluate things you can’t assess with a selected response test

15
B. Constructed response items
• Free response
• Fill-in-the-blank
• Essay tests
• Portfolios
• In-basket technique
• Used in business
• Job candidate gets a set of “everyday” problems, says how he or she would deal with those problems
• Requires expert raters to grade responses

16
B. Constructed response items
• Strengths
• Assess higher-order skills
• More useful feedback to test-taker
• Positive influence on study habits?
• Easier to create items

17
B. Constructed response items
• Weaknesses
• Time consuming to use
• Possible subjectivity in scoring

18
2. Parts of test items
A. Stimulus or item stem
B. Response format or method
C. Conditions governing the response
D. Procedures for scoring the response

19
2. Parts of test items
A. Stimulus or item stem
• What the subject responds to

20
2. Parts of test items
B. Response format or method
• Typically multiple choice or constructed response

21
2. Parts of test items
C. Conditions governing the response
• e.g., time limits; allowing probes for ambiguous responses; how the response is recorded...

22
2. Parts of test items
D. Procedures for scoring the response
• Particularly important for constructed response items

23
2. Parts of test items
• To some extent, your choices on each of these parts will be dictated by:
• Precedent: What did you do last time?
• Experience: Did that work?
• Practical considerations: How many people have to be tested? How much time is available?

24
3. Writing test items – guidelines
A. Define clearly
B. Generate a pool of potential items
C. Monitor reading level
D. Use unitary items
E. Avoid long items
F. Break any response “set”

25
3. Writing test items – guidelines
A. Define clearly
• Why are you testing?
• What do you want to know?

26
3. Writing test items – guidelines
A. Define clearly
B. Generate a pool of potential items
• The larger the pool of items you select from, the better the test
• Selection from this pool is based on item analysis (see below)

27
3. Writing test items – guidelines
A. Define clearly
B. Generate a pool of potential items
C. Monitor reading level
• Level too low? More sophisticated test-takers may get bored
• Level too high? You’re testing reading skill as well as the domain you think you’re testing

28
3. Writing test items – guidelines
A. Define clearly
B. Generate a pool of potential items
C. Monitor reading level
D. Use unitary items
• Then the meaning of the response is clear

29
3. Writing test items – guidelines
A. Define clearly
B. Generate a pool of potential items
C. Monitor reading level
D. Use unitary items
E. Avoid long items
• Longer items are more likely to be misinterpreted by test-takers
• Short items are more likely to be unitary

30
3. Writing test items - guidelines
A. Define clearly
B. Generate a pool of potential items
C. Monitor reading level
D. Use unitary items
E. Avoid long items
F. Break any response “set”
• Use reverse-scored items to prevent test-takers from getting into a response set, such as just responding “5” for every item on a Likert scale

31
4. Item analysis
A. Multiple choice distracter analysis
B. Item difficulty measure P
C. Discrimination index D
D. Item-total correlation

32
A. Multiple choice – distracter measures
• How many people choose each distracter?
• Distracters should be equally attractive
• Correct choice should be based on knowledge
• Where knowledge is lacking, choice should be random
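
A minimal sketch of a distracter tally, assuming responses are recorded as the option letter each test-taker chose; the item data and the function name distracter_counts are hypothetical, not from the slides.

from collections import Counter

def distracter_counts(choices):
    """Count how many test-takers picked each option for one item.

    choices: list of chosen options, e.g. ['A', 'C', 'B', ...].
    Ideally the wrong options (distracters) draw roughly equal numbers
    of the test-takers who miss the item.
    """
    return Counter(choices)

# Hypothetical responses to one 4-option item whose key is 'B':
# 'B' chosen 5 times, 'A' and 'C' twice each, 'D' once.
print(distracter_counts(['B', 'B', 'A', 'C', 'B', 'D', 'B', 'A', 'B', 'C']))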

33
B. Item Difficulty Measure P
• Difficulty is determined by the item and the population tested
• P(i) = (number who got item i correct) / (number taking the test)
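
A minimal sketch of the P(i) computation, assuming items are scored 0/1; the scores shown are made up for illustration.

def item_difficulty(scores):
    """P(i): proportion of test-takers who answered item i correctly.

    scores: one 0/1 value per test-taker for a single item.
    """
    return sum(scores) / len(scores)

# Hypothetical 0/1 scores for ten test-takers on one item: P = 0.7
print(item_difficulty([1, 0, 1, 1, 0, 1, 1, 0, 1, 1]))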

34
B. Item Difficulty Measure P
• P = .50 is best
• P = 0 or P = 1 – such items do not distinguish ability levels

35
C. Item Discrimination Measures
• Discrimination index D
• Item-total correlation

36
Discrimination Index D
• Extreme groups method
• U = # getting item correct in ‘top’ group
• L = # getting item correct in ‘bottom’ group
• n_U = # in top group
• n_L = # in bottom group
• D = U/n_U – L/n_L
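
A minimal sketch of the extreme-groups computation, assuming 0/1 item scores and total test scores in the same test-taker order; the 27% split used to form the top and bottom groups is a common convention assumed here, not something the slides specify.

def discrimination_index(item_scores, total_scores, fraction=0.27):
    """Extreme-groups discrimination index D = U/n_U - L/n_L.

    item_scores: 0/1 scores on one item, one per test-taker
    total_scores: total test scores, same order
    fraction: share of test-takers placed in each extreme group
              (27% is a common convention, assumed here)
    """
    n = len(total_scores)
    k = max(1, round(n * fraction))
    # Rank test-takers by total score, highest first
    order = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    top, bottom = order[:k], order[-k:]
    p_top = sum(item_scores[i] for i in top) / k       # U / n_U
    p_bottom = sum(item_scores[i] for i in bottom) / k  # L / n_L
    return p_top - p_bottom

D near +1 means high scorers get the item right far more often than low scorers; values near 0 or negative flag the item for review.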

37
Item-Total Correlation
• Good item: high correlation
People who get the item correct have a high score on the test
People who get the item wrong have a low score on the test
• Poor item: low correlation – look at the wording; the item may be testing reading skill
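
A minimal sketch of an item-total correlation using the standard-library Pearson correlation (Python 3.10+); the corrected total (total minus the item’s own contribution) is a common refinement assumed here, and the function name is hypothetical.

from statistics import correlation  # Pearson r, Python 3.10+

def item_total_correlation(item_scores, total_scores):
    """Correlation between 0/1 item scores and total test scores.

    Correlating against the corrected total (total minus the item's own
    contribution) keeps the item from being correlated with itself,
    which matters most on short tests.
    """
    corrected = [t - i for i, t in zip(item_scores, total_scores)]
    return correlation(item_scores, corrected)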

38
5. Item Response Theory
A. Item characteristic curves
B. Adaptive testing using computers

39
A. Item characteristic curves
• Most important idea: Item Characteristic Curves (ICCs)
• One curve for each test item
• X axis: test-taker ability (given by test score)
• Y axis: probability of choosing an answer

40
[Figure: item characteristic curves for Items 1–3; x-axis: test score; y-axis: probability of a correct response]
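
A minimal sketch of one common ICC form, the two-parameter logistic model, where b sets the curve’s location (difficulty) and a sets its slope (discrimination); the parameter values below are purely illustrative.

import math

def icc_2pl(theta, a, b):
    """Two-parameter logistic ICC: probability of a correct response at
    ability theta, with discrimination a (slope) and difficulty b
    (where the curve is centered along the ability axis)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Two illustrative items with the same difficulty but different slopes:
# the steeper item (a=2.0) discriminates more sharply around theta = 0.
for theta in (-2, -1, 0, 1, 2):
    print(theta, round(icc_2pl(theta, 0.8, 0.0), 2), round(icc_2pl(theta, 2.0, 0.0), 2))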

41
A. Item Characteristic Curves
• Slope: how quickly the curve rises
• Indicates how well the item discriminates among persons of differing abilities
• Like the discrimination index D in Classical Test Theory, but sample-invariant

42
Problems with Item Response Theory
• Obtaining stable estimates of IRT parameters requires rather large samples
• Computationally complex
• The IRT model assumes that the trait being measured is one-dimensional. It may not be.

43
B. Adaptive Testing Using Computers
• Computer selects harder or easier questions as the test-taker gets each question right or wrong
• Lets you tailor questions for each test-taker
• Test-taker does not spend most of their time on questions that are too easy or too difficult

44
B. Adaptive Testing Using Computers
• Facilitates testing of diverse ability groups
• Output = level of difficulty the test-taker can deal with
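
A minimal sketch of a toy adaptive rule under two assumptions not stated on the slides: the next item is simply the unanswered one whose difficulty is closest to the current ability estimate, and the estimate is nudged up or down by a fixed step after each answer (a real computerized adaptive test would re-score with an IRT model instead).

def next_item(item_difficulties, answered, ability):
    """Pick the unanswered item whose difficulty is closest to the
    current ability estimate (toy selection rule)."""
    candidates = [i for i in range(len(item_difficulties)) if i not in answered]
    return min(candidates, key=lambda i: abs(item_difficulties[i] - ability))

def update_ability(ability, correct, step=0.5):
    """Move the ability estimate up after a correct answer and down after
    an incorrect one (stand-in for a proper IRT scoring update)."""
    return ability + step if correct else ability - step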