RELIABILITY AND VALIDITY OF ASSESSMENT.ppt

JayLagman3 · 88 slides · Jul 03, 2024

About This Presentation

RELIABILITY


Slide Content

RELIABILITY AND
VALIDITY OF
ASSESSMENT

ITEM ANALYSIS

•Item analysis has to be done before a meaningful and scientific inference about the test can be made in terms of its validity, reliability, objectivity and usability.
•It is the process of examining students' responses to individual test items in order to assess the quality of the items and of the test as a whole.

•The tools include:
Item difficulty.
Item discrimination.
Item distractors.

THE PURPOSES OF ITEM
ANALYSIS

•Improve test items and identify unfair items.
•Reveal which questions were most difficult.
•If a particular distractor is the most frequently chosen answer, the item must be examined.
•To identify common misconceptions among
students about a particular concept.

•To improve the quality of tests.
•If items are too hard, teachers can adjust the
way they teach.

Item Difficulty

•It is the percentage of students taking the test who answered the item correctly.
•The higher the value, the easier the item.

•D = (R / N) × 100
•R – number of pupils who answered the item correctly.
•N – total number of pupils who attempted the item.

Example
•Number of pupils who answered the item correctly = 40
Total number of pupils who attempted the item = 50
D = (40 / 50) × 100 = 80 %
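
For illustration, a minimal Python sketch of the difficulty calculation above; the response list and the helper name item_difficulty are made up for this example.

```python
# Item difficulty: percentage of examinees who answered the item correctly.
# Hypothetical responses to one item (1 = correct, 0 = incorrect):
# 40 correct out of 50, as in the example above.
responses = [1] * 40 + [0] * 10

def item_difficulty(responses):
    """D = (R / N) x 100, where R = number correct, N = number who attempted."""
    return 100 * sum(responses) / len(responses)

print(item_difficulty(responses))  # 80.0 -> a fairly easy item
```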

Ideal difficulty levels for multiple-
choice items
•Format – Ideal difficulty (%)
•Five-response multiple-choice – 70
•Four-response multiple-choice – 74
•Three-response multiple-choice – 77
•True-false – 85

Item Discrimination

•Ability of an item to differentiate among the
students on the basis of how well they know the
material being tested.
•A good item discriminates between those who do
well on the test and those who do poorly.
•The higher the discrimination index, the better the item.

DI = (RU – RL) / (½ × N)
•RU – number of correct responses from the upper group.
•RL – number of correct responses from the lower group.
•N – total number of pupils who attempted the item.

Example
•Total score – 60
Total sample – 50
Upper group – 25
Lower group – 25
DI = (22 – 10) / (½ × 50) = 0.48

Interpretation
•0.40 or higher – very good items.
•0.30 to 0.39 – good items.
•0.20 to 0.29 – fairly good items.
•0.19 or less – poor items.
•So the item in the example, with DI = 0.48, is a very good item.
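
A minimal Python sketch of the discrimination-index calculation and the interpretation bands above; the function names are made up for illustration, and the counts are taken from the example.

```python
def discrimination_index(upper_correct, lower_correct, n_total):
    """DI = (RU - RL) / (N / 2)."""
    return (upper_correct - lower_correct) / (n_total / 2)

def interpret(di):
    """Classify an item using the bands from the interpretation slide."""
    if di >= 0.40:
        return "very good item"
    if di >= 0.30:
        return "good item"
    if di >= 0.20:
        return "fairly good item"
    return "poor item"

di = discrimination_index(upper_correct=22, lower_correct=10, n_total=50)
print(round(di, 2), interpret(di))  # 0.48 very good item
```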

Distractors

•Analyzing the distractors (i.e., incorrect alternatives) is useful in determining the relative usefulness of the decoys in each item.
•Alternatives that attract few or no examinees are probably totally implausible and therefore of little use as decoys in multiple-choice items.

•One way to study responses to distractors is with a frequency table that tells you the proportion of students who selected a given distractor.
•Remove or replace distractors selected by few or no students, because students find them implausible.
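
A small sketch of the frequency-table idea using Python's collections.Counter; the responses and the keyed correct answer are hypothetical.

```python
from collections import Counter

# Hypothetical responses to one multiple-choice item (keyed answer: "B").
responses = ["B", "B", "A", "B", "C", "B", "B", "A", "B", "D",
             "B", "B", "C", "B", "B", "A", "B", "B", "B", "B"]

counts = Counter(responses)
n = len(responses)
for option in "ABCD":
    proportion = counts.get(option, 0) / n
    print(f"{option}: {proportion:.0%}")
# An option chosen by almost no one (here "D") is an implausible distractor
# and is a candidate for removal or replacement.
```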

RELIABILITY

•Reliability is the degree to which an assessment tool produces stable and consistent results.

TYPES OF RELIABILITY

•Test-retest reliability
•Parallel forms reliability
•Inter-rater reliability
•Internal consistency reliability
•Form equivalence (alternate form) reliability

Test-retest reliability

•Obtained by administering the same test twice
over a period of time to a group of
individuals.
•Scores from Time 1 and Time 2 can then be
correlated to evaluate the test for stability.
•Also known as temporal stability.
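
A minimal sketch of the stability check described above, assuming numpy is available for the Pearson correlation; the two score lists are invented for illustration.

```python
import numpy as np

# Hypothetical scores for the same group on two administrations of the same test.
time1 = [12, 15, 9, 20, 17, 11, 14, 18]
time2 = [13, 14, 10, 19, 18, 10, 15, 17]

# The correlation between Time 1 and Time 2 scores estimates test-retest reliability.
r = np.corrcoef(time1, time2)[0, 1]
print(round(r, 2))  # values near 1.0 indicate stable (temporally reliable) scores
```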

Parallel forms reliability

•It is obtained by administering different
versions of an assessment tool to the same
group of individuals.
•Scores from the two versions can then be
correlated to evaluate the consistency of results
across alternate versions.

Inter-rater reliability

•Used to assess the degree to which different
judges or raters agree in their assessment
decisions.
•Useful because human observers will not
necessarily interpret answers the same way.
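
The slides do not prescribe a particular statistic; one simple (assumed) illustration is the raw proportion of agreement between two raters, sketched below with made-up pass/fail decisions. Chance-corrected measures such as Cohen's kappa are a common refinement.

```python
# Hypothetical pass/fail decisions by two raters on the same ten scripts.
rater_a = ["pass", "pass", "fail", "pass", "fail", "pass", "pass", "fail", "pass", "pass"]
rater_b = ["pass", "fail", "fail", "pass", "fail", "pass", "pass", "fail", "pass", "pass"]

# Proportion of scripts on which the two raters made the same decision.
agreement = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)
print(agreement)  # 0.9 -> the raters agree on 90% of the scripts
```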

Internal consistency reliability

•It is used to evaluate the degree to which different test items that probe the same construct produce similar results.
•Two types are
Average inter-item correlation
Split-half reliability

Average inter-item correlation
•Obtained by taking all of the items on a test that probe the same construct, determining the correlation coefficient for each pair of items, and finally taking the average of all of these correlation coefficients.
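
A sketch of the average inter-item correlation, assuming numpy (any correlation routine would do) and a small hypothetical item-score matrix in which the rows are students and the columns are items probing the same construct.

```python
import numpy as np
from itertools import combinations

# Hypothetical 0/1 item scores: rows are students, columns are items.
items = np.array([
    [1, 1, 0, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 0, 1, 1],
    [1, 1, 1, 0],
])

# Correlate every pair of items, then average the coefficients.
pair_correlations = [
    np.corrcoef(items[:, i], items[:, j])[0, 1]
    for i, j in combinations(range(items.shape[1]), 2)
]
print(round(float(np.mean(pair_correlations)), 2))
```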

Split-half reliability
•All items of a test are “split in half” to form two “sets” of items.
•The total score for each “set” is computed.
•The correlation between the two total “set” scores gives the split-half reliability.
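
A sketch of split-half reliability, assuming an odd/even split of the items (the slide only says “splitting in half”, so the split rule here is an assumption) and numpy for the correlation of the two set totals.

```python
import numpy as np

# Hypothetical 0/1 item scores: rows are students, columns are the test items.
scores = np.array([
    [1, 0, 1, 1, 0, 1, 1, 0],
    [1, 1, 1, 0, 1, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 0, 1, 0, 0, 1, 0],
])

# Split the items into two "sets" (even-numbered vs odd-numbered columns),
# total each student's score on each set, and correlate the two totals.
half1 = scores[:, 0::2].sum(axis=1)
half2 = scores[:, 1::2].sum(axis=1)
split_half_r = np.corrcoef(half1, half2)[0, 1]
print(round(float(split_half_r), 2))
```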

Form equivalence (Alternate form)

•Also known as alternate form reliability.
•Two different forms of the test, based on the same content, are administered on one occasion to the same examinees.
•Reliability is stated as correlation between
scores of Test 1 and Test 2.

VALIDITY

•An indication of how well an assessment
actually measures what it is supposed to
measure.
•Refers to the accuracy of an assessment.
•It is the veracity of an assessment instrument.

TYPES OF VALIDITY

•Face validity
•Construct validity
•Content validity
•Criterion related validity
•Formative validity
•Sampling validity

Face Validity

•Measure of the extent to which an examination
looks like an examination in the subject
concerned and at the appropriate level.
•Candidates, teachers and the public have
expectations as to what an examination looks
like and how it is conducted.

Construct Validity

•The extent to which an assessment
corresponds to other variables, as predicted by
some rationale or theory.
•It is also known as theoretical construct.

Content Validity

•The extent to which a measure adequately
represents all facets of a concept.
•It is the extent to which the content of the test matches the instructional objectives.

Criterion-Related validity

•Degree to which content on a test (predictor) correlates with performance on relevant criterion measures (a concrete criterion in the "real" world).

Formative Validity

•When applied to outcomes assessment it is
used to assess how well a measure is able to
provide information to help improve the
program under study.

Sampling Validity

•It is similar to content validity.
•It ensures that the measure covers the broad
range of areas within the concept under study.

FACTORS THAT CAN LOWER
VALIDITY

•Unclear directions
•Difficult reading vocabulary and sentence
structure
•Ambiguity in statements
•Inadequate time limits
•Inappropriate level of difficulty

Cont’d
•Poorly constructed test items.
•Test items inappropriate for the outcomes
being measured.
•Tests that are too short.
•Administration and scoring.

Cont’d
•Improper arrangement of items (e.g., from complex to easy).
•Identifiable patterns of answers.
•Teaching.
•Students.
•Nature of the criterion.

WAYS TO IMPROVE
VALIDITY AND RELIABILITY

IMPROVING RELIABILITY
•First, calculate the item-test correlations and
rewrite or reject any that are too low.
•Second, look at the items that did correlate well and write more like them. The longer the test, the higher the reliability, up to a point.

IMPROVING VALIDITY
•Make sure your goals and objectives are
clearly defined and operationalized.
•Expectations of students should be written
down.
•Match your assessment measure to your goals
and objectives.

Cont’d
•Have the test reviewed by faculty at other
schools to obtain feedback from an outside
party who is less invested in the instrument.
•Get students involved; have the students look over the assessment for troublesome wording.

RELATIONSHIP BETWEEN
RELIABILITY AND VALIDITY

•The two do not necessarily go hand-in-hand.
•We can illustrate it as follows.
o Reliable but not valid – an archer who always hits about the same place, but not near the bullseye.

o Valid but not reliable – an archer who hits various places centered around the bullseye, but not very accurately.
o Neither reliable nor valid – an archer who hits various places, all off to the same side of the bullseye.

Cont’d
o Both reliable and valid – an archer who hits consistently close to the bullseye.
•A valid assessment is always reliable, but a
reliable assessment is not necessarily valid.

FACTORS IN RESOLVING CONFLICTS
BETWEEN VALIDITY AND RELIABILITY

•Validity is paramount.
•Validity will not damage educational
effectiveness but excessive concern for
reliability or costs may do so.
•Staff costs are limited by the credits in the
workload planning system being used.

Cont’d
•Student time costs are limited by the planned
learning hours allocated to them.
•Reliability cannot be 100% for any one
assessment and may need to be compromised.
•Between-marker reliability can be improved
by marker training and monitoring.

Cont’d
•Clear, detailed criteria will maximise examiner
reliability and validity.
•Educationally effective coursework assessments are often simultaneously designed to prevent plagiarism.

Cont’d
•Where each student produces a number of similar
assignments they can be randomly sampled.
•Self- and peer-assessment can reduce staff costs and can also be used as a learning activity.
•High-reliability assessment is costly and so
should be used only where it is critical.

Cont’d
•Programme-wide design of assessment can
avoid the worst of the conflicts.
•Designing good assessments is a creative, challenging task that demands expertise in the teaching of the subject and time, and is improved by peer support and review.

QUESTIONS