•Item analysis must be done before a meaningful and scientific inference about the test can be made in terms of its validity, reliability, objectivity and usability.
•It is the process of examining students' responses to individual test items in order to assess the quality of the items and of the test as a whole.
•The tools include:
Item difficulty.
Item discrimination.
Item distractors.
THE PURPOSES OF ITEM ANALYSIS
•Improve test items and identify unfair items.
•Reveal which questions were most difficult.
•If a particular distractor is the most often chosen answer, the item must be examined.
•To identify common misconceptions among
students about a particular concept.
•To improve the quality of tests.
•If items are too hard, teachers can adjust the
way they teach.
Item Difficulty
•It is the percentage of students taking the test
who answered the item correctly.
•The higher the value, the easier the item.
•D = (R / N) × 100
•R – number of pupils who answered the item correctly.
•N – total number of pupils who tried the item.
Example
•Number of pupils who answered the item correctly = 40
•Total number of pupils who tried the item = 50
•D = (40 / 50) × 100 = 80%
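The difficulty calculation above can be sketched in Python (a minimal illustration; the function name is mine, not from the text):

```python
def item_difficulty(correct, attempted):
    """Difficulty index D = (R / N) * 100: the percentage of
    pupils who answered the item correctly."""
    return correct / attempted * 100

# The worked example from the text: 40 of 50 pupils answered correctly.
print(item_difficulty(40, 50))  # → 80.0
```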
Item Discrimination
•The ability of an item to differentiate among students on the basis of how well they know the material being tested.
•A good item discriminates between those who do
well on the test and those who do poorly.
•The higher the discrimination index, the better the item.
•DI = (RU – RL) / (½ × N)
•RU – number of correct responses from the upper group.
•RL – number of correct responses from the lower group.
•N – total number of pupils who tried the item.
Example
•Total score – 60
•Total sample – 50
•Upper group – 25
•Lower group – 25
•DI = (22 – 10) / (½ × 50) = 12 / 25 = 0.48
Interpretation
•0.40 or higher – very good items.
•0.30 to 0.39 – good items.
•0.20 to 0.29 – fairly good items.
•0.19 or less – poor items.
•So the item in the example is a very good item.
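The discrimination calculation and the interpretation bands can be sketched in Python (function names are mine). Note that for the example's numbers, (22 − 10) / (½ × 50) works out to 12 / 25 = 0.48, which falls in the "very good" band:

```python
def discrimination_index(ru, rl, n):
    """DI = (RU - RL) / (N/2), where N is the total number of pupils
    in the upper and lower groups combined."""
    return (ru - rl) / (n / 2)

def interpret(di):
    """Band labels from the interpretation table in the text."""
    if di >= 0.40:
        return "very good"
    if di >= 0.30:
        return "good"
    if di >= 0.20:
        return "fairly good"
    return "poor"

di = discrimination_index(22, 10, 50)
print(di, interpret(di))  # → 0.48 very good
```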
Distractors
•Analyzing the distractors (i.e., the incorrect alternatives) is useful in determining the relative usefulness of the decoys in each item.
•If almost no one selects an alternative, it is probably totally implausible and therefore of little use as a decoy in a multiple-choice item.
•One way to study responses to distractors is with a frequency table that tells you the proportion of students who selected a given distractor.
•Remove or replace distractors selected by few or no students, because students find them implausible.
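Such a frequency table is easy to build; here is a small Python sketch with made-up responses (the item key and response letters are hypothetical):

```python
from collections import Counter

def distractor_frequencies(responses):
    """Proportion of students who selected each alternative."""
    counts = Counter(responses)
    total = len(responses)
    return {option: counts[option] / total for option in sorted(counts)}

# Hypothetical responses to one item whose keyed answer is 'B'.
responses = list("BBABBCBBDB" * 2)
print(distractor_frequencies(responses))
# Each distractor here draws 10% of responses; one chosen by
# (almost) no students would be a candidate for replacement.
```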
RELIABILITY
•Reliability is the degree to which an assessment tool produces stable and consistent results.
Test-retest reliability
•Obtained by administering the same test twice over a period of time to a group of individuals.
•Scores from Time 1 and Time 2 can then be correlated to evaluate the test for stability.
•Also known as temporal stability.
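The correlation step can be illustrated with a hand-rolled Pearson coefficient; the two score lists below are made up for illustration:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson product-moment correlation of two score lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

# Hypothetical scores from two administrations of the same test.
time1 = [55, 60, 48, 72, 66]
time2 = [58, 62, 45, 70, 68]
print(round(pearson(time1, time2), 2))  # → 0.96: highly stable over time
```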
Parallel forms reliability
•It is obtained by administering different
versions of an assessment tool to the same
group of individuals.
•Scores from the two versions can then be
correlated to evaluate the consistency of results
across alternate versions.
Inter-rater reliability
•Used to assess the degree to which different
judges or raters agree in their assessment
decisions.
•Useful because human observers will not
necessarily interpret answers the same way.
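One basic way to quantify agreement between two raters is simple percent agreement, sketched below (the ratings are hypothetical; the text does not prescribe a particular statistic):

```python
def percent_agreement(ratings1, ratings2):
    """Percentage of cases on which two raters gave the same decision."""
    agreements = sum(a == b for a, b in zip(ratings1, ratings2))
    return agreements / len(ratings1) * 100

# Hypothetical pass/fail decisions by two independent raters.
rater1 = ["pass", "fail", "pass", "pass", "fail"]
rater2 = ["pass", "fail", "fail", "pass", "fail"]
print(percent_agreement(rater1, rater2))  # → 80.0
```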
Internal consistency reliability
•It is used to evaluate the degree to which different test items that probe the same construct produce similar results.
•Two types are
Average inter-item correlation
Split-half reliability
Average inter-item correlation
•Obtained by taking all of the items on a test that probe the same construct, determining the correlation coefficient for each pair of items, and finally taking the average of all of these correlation coefficients.
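The three steps just described (pair up the items, correlate each pair, average) can be sketched in Python with made-up 0/1 item scores:

```python
from itertools import combinations
from math import sqrt

def pearson(xs, ys):
    """Pearson product-moment correlation of two score lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

# Hypothetical 0/1 scores of five students on three items
# assumed to probe the same construct.
items = {
    "q1": [1, 1, 0, 1, 0],
    "q2": [1, 1, 0, 1, 1],
    "q3": [1, 0, 0, 1, 0],
}
pair_rs = [pearson(items[a], items[b]) for a, b in combinations(items, 2)]
print(round(sum(pair_rs) / len(pair_rs), 2))  # → 0.56
```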
Split-half reliability
•All the items of a test are “split in half” to form two “sets” of items.
•The total score for each “set” is computed.
•The correlation between the two total “set” scores gives the split-half reliability.
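A common way to form the two “sets” is to take odd- and even-numbered items; the sketch below uses a hypothetical 0/1 item-score matrix and the plain correlation the text describes:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson product-moment correlation of two score lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

# Hypothetical matrix: one row per student, one 0/1 entry per item.
scores = [
    [1, 1, 0, 1, 1, 0],
    [1, 0, 1, 1, 0, 1],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
]
# Split items into odd- and even-numbered sets; total each set per student.
odd_totals = [sum(row[0::2]) for row in scores]
even_totals = [sum(row[1::2]) for row in scores]
print(round(pearson(odd_totals, even_totals), 2))  # → 0.94
```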
Form equivalence (Alternate form)
•Also known as alternate form reliability.
•Two different forms of a test, based on the same content, are given on one occasion to the same examinees.
•Reliability is stated as the correlation between the scores on Test 1 and Test 2.
VALIDITY
•An indication of how well an assessment
actually measures what it is supposed to
measure.
•Refers to the accuracy of an assessment.
•It is the veracity of an assessment instrument.
•Measure of the extent to which an examination
looks like an examination in the subject
concerned and at the appropriate level.
•Candidates, teachers and the public have
expectations as to what an examination looks
like and how it is conducted.
Construct Validity
•The extent to which an assessment
corresponds to other variables, as predicted by
some rationale or theory.
•It is also known as theoretical construct validity.
Content Validity
•The extent to which a measure adequately
represents all facets of a concept.
•It is the extent to which the content of the test matches the instructional objectives.
Criterion-Related validity
•The degree to which content on a test (the predictor) correlates with performance on relevant criterion measures (a concrete criterion in the "real" world).
Formative Validity
•When applied to outcomes assessment it is
used to assess how well a measure is able to
provide information to help improve the
program under study.
Sampling Validity
•It is similar to content validity.
•It ensures that the measure covers the broad
range of areas within the concept under study.
FACTORS THAT CAN LOWER
VALIDITY
•Unclear directions
•Difficult reading vocabulary and sentence
structure
•Ambiguity in statements
•Inadequate time limits
•Inappropriate level of difficulty
Cont’d
•Poorly constructed test items.
•Test items inappropriate for the outcomes
being measured.
•Tests that are too short.
•Administration and scoring.
Cont’d
•Improper arrangement of items (e.g., complex to easy).
•Identifiable patterns of answers.
•Teaching.
•Students .
•Nature of criterion.
WAYS TO IMPROVE
VALIDITY AND RELIABILITY
IMPROVING RELIABILITY
•First, calculate the item-test correlations and
rewrite or reject any that are too low.
•Second, look at the items that did correlate well and write more like them. The longer the test, the higher the reliability, up to a point.
IMPROVING VALIDITY
•Make sure your goals and objectives are
clearly defined and operationalized.
•Expectations of students should be written
down.
•Match your assessment measure to your goals
and objectives.
Cont’d
•Have the test reviewed by faculty at other
schools to obtain feedback from an outside
party who is less invested in the instrument.
•Get students involved; have the students look over the assessment for troublesome wording.
RELATIONSHIP BETWEEN RELIABILITY AND VALIDITY
•The two do not necessarily go hand-in-hand.
•We can illustrate it as follows.
oReliable but not valid - an archer who always hits about the same place, but not near the bullseye.
oValid but not reliable - an archer who hits various places centered around the bullseye, but not very consistently.
oNeither reliable nor valid - an archer who hits various places, all off to the same side of the bullseye.
Cont’d
oBoth reliable and valid - an archer who hits consistently close to the bullseye.
•A valid assessment is always reliable, but a
reliable assessment is not necessarily valid.
FACTORS IN RESOLVING CONFLICTS
BETWEEN VALIDITY AND RELIABILITY
•Validity is paramount.
•Validity will not damage educational
effectiveness but excessive concern for
reliability or costs may do so.
•Staff costs are limited by the credits in the
workload planning system being used.
Cont’d
•Student time costs are limited by the planned
learning hours allocated to them.
•Reliability cannot be 100% for any one
assessment and may need to be compromised.
•Between-marker reliability can be improved
by marker training and monitoring.
Cont’d
•Clear, detailed criteria will maximise examiner
reliability and validity.
•Educationally effective coursework assessments are often simultaneously designed to prevent plagiarism.
Cont’d
•Where each student produces a number of similar
assignments they can be randomly sampled.
•Self and peer assessment can reduce staff costs and can double as a learning activity.
•High-reliability assessment is costly and so
should be used only where it is critical.
Cont’d
•Programme-wide design of assessment can
avoid the worst of the conflicts.
•Designing good assessments is a creative, challenging task that demands expertise in teaching the subject and demands time, and it is improved by peer support and review.