Welcome to Test Item Writing Training for Item Writers and Reviewers
Course Syllabus: Test Item Writing Training

Course description
This Test Item Writing training is designed to equip item writers with the knowledge and skills required to carry out effective item development activities. The course emphasizes the principles and guidelines of effective item writing, so that you can write well-structured items that assess the application of basic and clinical science knowledge.

This training module is organized into the following four sessions:
Session 1: Introduction to assessment
Session 2: Written cognitive knowledge test (MCQ) and how to construct it
Session 3: MCQ flaws and how to avoid them
Session 4: Writing well-structured MCQs using patient vignettes to assess the application of knowledge

Objectives of the training
After engaging in this training, you will be able to:
• Recognize basic principles of assessment related to item writing
• Recognize the structure of MCQ items and the principles for constructing each component
• Identify flaws that question writers inadvertently introduce into test questions
• Write well-structured MCQs for basic and clinical sciences to assess the application of knowledge

Prerequisites
This item writing training course requires successful completion of basic Effective Teaching Skill (ETS) or the Higher Diploma Program.

Teaching-learning methods and resources
This is a self-paced e-learning course. E-content, complemented by supplemental resources, is offered to you. You are free to learn at your own pace based on your needs and interests. Supplemental resources for further reading and understanding are offered at the end of each session.

Learning assessment methods
Formative assessment: During each session, you will be given practice questions and exercises to assess your learning progress and check your achievement of the objectives.
Summative evaluation: The quality of the work you submit will be assessed using a set of criteria. As part of the summative assessment, you must therefore complete and submit all of the assigned activities at the end of the course.
N.B. To receive certification and earn the assigned ———— CEU for the module/course, you must complete all tasks according to the course requirements.
Test Item Writing Session 1: Introduction to Assessment This session will help you grasp the fundamental concept of assessment by focusing on assessment principles and threats to effective assessment. This session will last approximately 20 to 25 minutes. You must read and analyze all of the important content provided in the form of text and graphics, as well as practice the exercises.
Session Objectives
At the end of this session, you will be able to:
• Recognize basic principles of assessment related to item writing
• Recognize threats to assessment and their remedies
Introductory activity
Before moving on to the concepts, please write what you think the differences are between ASSESSMENT, MEASUREMENT, EVALUATION and TEST in the box provided (feel free to write whatever you think). You can compare your response with a few points given by clicking the feedback button.
1.1. Assessment: Concept
Assessment is a systematic process of collecting and evaluating information/evidence (direct or indirect) from tests and other sources and making judgments about the progress of learning, achievement of objectives, achievement of competency after a learning program, etc.
Keywords: collecting evidence, evaluating evidence, and making a judgment
What is evidence?
1.1. Assessment: Concept…
Evidence is information gathered in the assessment process that provides proof of competence: proof that supports the candidate's claim of competency.

Direct evidence
Direct evidence is evidence that can be observed or witnessed by the teacher/assessor. This may include:
• Direct observation of performance of a task, or range of tasks, either in the workplace or in a simulated work environment, witnessed directly by an assessor
• Direct observation of a student's performance during oral examination/questioning
• Direct observation of a student's performance while taking a test or exam

Indirect evidence
Indirect evidence is evidence of the trainee's work that can be reviewed or examined by the teacher (not directly observed while the students are working, but reviewed afterwards). This may include:
• Review of reports, procedures, assignments, etc. completed by the student (for example, a log book or portfolio)
• Review of photographs, videos, etc. showing performance of a task when the assessor cannot be present
• Review of the technical qualities of a finished product made by the trainees
• Review of a completed project

Supplementary evidence
Third-party reports: documented and verified reports from a supervisor, colleague, subject expert, trainer, or others
Assessment
Assessment is used as a broad category that includes all of the various methods used to determine the extent to which students are achieving the intended learning outcomes of instruction. This includes both testing (knowledge test) and performance assessment. The knowledge test tells how well the student knows what to do, and the performance assessment tells how skillfully the student can do it.
1.2. Effective assessment
What does effective assessment require?
Effective assessment requires:
• A clear conception of the intended learning outcomes
• A variety of assessment methods that are relevant to the instruction
• An adequate sample of tasks
• Procedures that are fair to everyone
• Performance criteria (what are the students achieving, and how well?)
• Timely and detailed feedback to the student
• A grading and reporting system that is in harmony with the assessment program
1.3. Purposes of assessment
Different assessments have different purposes and benefits. Some of the benefits/purposes of an assessment include:
• To better prepare graduates for their careers
• To ensure safe and effective practice
• To help students focus on critical components of the training
• To encourage students to improve their skills
• To tailor students to their future jobs
• To create a sense of healthy competition among institutions in terms of preparing students before an exam

Select the other purposes of an assessment (multiple responses can apply).
Which of the following do you think are purposes of assessment?
• To assess progress of learning
• To check achievement of objectives
• To determine competency after a learning program
• To provide implicit feedback on where the student stands
• To improve teaching methods/instruction, curricula, and quality of learning
• To determine placement and advancement, certify achievement, etc.
Purpose of National Licensing Examination
What do you think is the purpose of assessment in the National Licensing Examination (NLE)? Which could be the main purpose of the NLE? (Multiple responses can apply.)
The purpose of the National Licensing assessment is:
• To assess progress of learning
• To check achievement of learning objectives
• To determine competency after a learning program
• To provide implicit feedback on where the student stands
• To improve teaching methods/instruction, curricula, and quality of learning
1.4. Principles of assessment: Validity
Among the several principles of assessment, validity and reliability are the key principles particularly related to item development.

1. Validity
Validity is the appropriateness, meaningfulness, and usefulness of the specific inferences made from test scores. To what extent are the inferences we make from test scores appropriate, meaningful, and useful?
So, how do we know our test item is valid? What is the evidence that the inferences from our test scores are appropriate, meaningful, and useful?
Write some of the evidence of validity related to item development.

Some of the evidence of validity related to items:
• A large number of questions
• Question content is aligned to the objectives
• Question content comprises a good sample of the discipline's objectives
• Questions are not too difficult or too easy
• Questions are well written and clear
Threats to validity
Any factor that interferes with the meaningful interpretation of assessment data is a threat to validity!
Can you guess which factors will affect test validity (threats to test validity)?
• Too few questions/cases
• Inadequate sample
• Ambiguous questions
• Inappropriate difficulty
• Too easy
• Poorly crafted items/cases (incomplete statements, flaws, heterogeneous options, etc.)
• Ill-trained examiners
• Trivial questions
Validity Threats vs. Remedies
Test reliability
Test reliability asks:
• How consistent or reproducible are assessment results or scores over time or occasions?
• How consistent and dependable are our ratings of performance?
• To what extent will the result be free from errors?

Key reliability questions
Are you rating the same things similarly each time? Are your ratings similar to someone else's ratings of the same behaviors?

Remember! Assessment is unreliable if a candidate is likely to get a significantly different result:
• if different examiners were used, or
• if the assessment were conducted at a different time, or in a different context, using the same assessment task

RELIABILITY: NECESSARY, BUT NOT SUFFICIENT, FOR VALID INFERENCES
Reliability Threats vs. Remedies
Session Summary
• Test validity is the appropriateness, meaningfulness, and usefulness of the specific inferences made from test scores.
• Any factor that interferes with the meaningful interpretation of assessment data is a threat to validity.
• Test reliability is how consistent or reproducible assessment results or scores are over time or occasions.
• RELIABILITY: NECESSARY, BUT NOT SUFFICIENT, FOR VALID INFERENCES

Remember: poorly crafted test items are one of the main threats to test validity.
Next up, we will see how to craft a well-written cognitive test item (MCQ) to ensure test validity.
Session 2: Written cognitive knowledge test
Session objectives
At the end of this session, you will be able to:
• Describe desirable characteristics of a written cognitive knowledge test
• Recognize the structure of an MCQ
• Recognize principles of constructing a well-structured MCQ item
Written cognitive knowledge test (MCQ): structure and how to construct it
Assessment includes both testing (knowledge test) and performance assessment. The knowledge test tells how well the student knows what to do, and the performance assessment tells how skillfully the student can do it.
Assessment of cognitive knowledge
Assessment of cognitive knowledge encompasses all testing intended to measure examinees' mental learning, i.e.:
• Acquisition of new information
• Integration of new information with prior knowledge
• Change and/or modification of the initial mental model

Assessment of cognitive knowledge is:
• The most common and, arguably, the most important type of evaluation used in medical education
• Best done using written test formats

Assessment of the cognitive domain has to reach higher-order learning, that is, assessment of the application of knowledge, synthesis, and problem-solving ability. Properly constructed written assessment can address all levels of the cognitive domain.
2.2. Desirable characteristics of written cognitive knowledge test
Practice exercise: desirable characteristics
Match the desired characteristics of a written cognitive knowledge test with their correct meanings.
• The correct keyed response is indeed correct - objectivity
• Ensuring the legitimacy and accuracy of the test scores and their inferences - defensibility
• Ensuring the test comprises an adequate representation of each domain and is large enough - valid evidence
• A second administration of the same examination to the same examinees (assuming that they have learned or forgotten nothing) should produce about the same test scores - reproducibility
Practice exercise: desirable characteristics
Match the desired characteristics of a written cognitive knowledge test with their correct meanings.
• Able to defend the keyed correct answer, its test scores, and its pass-fail decisions to the students and in court - defensibility
• Ensuring questions are not too difficult or too easy and are well written and clear - valid evidence
• Little or no subjectivity or judgment involved in deciding whether a particular examinee's answer or response is right or wrong - objectivity
• Rating the same things similarly each time, or similarly to someone else rating the same behaviors - reproducibility
2.3. Written cognitive assessment formats
• Selected-response items are test items that require examinees to select a response.
• Constructed-response items require examinees to create a response.
2.4. Strengths and limitation of selected-response item formats
2.5. Structure of MCQ and tips for constructing a well-structured MCQ item
The three components of an MCQ:
1. Stem
2. Lead-in
3. Option set
Tips: MCQ as a whole
Practice exercise
Critically review the following sample MCQ and decide whether the question as a whole fulfills the basic rules of MCQ.

A 65-year-old man has difficulty rising from a seated position and straightening his trunk, but he has no difficulty flexing his leg. Which of the following muscles is most likely to have been injured?
A. Gluteus maximus
B. Gluteus minimus
C. Hamstrings
D. Iliopsoas
E. Obturator internus

Is the question as a whole relevant to health workers' clinical work and practice? Does it:
• Address important content?
• Test application of knowledge?
• Align with the aims and objectives of the health professional training program?
Or is it trivial content or simple recall?
Write your response in the space provided.
Evaluate the following sample stem using the above criteria
Tips: Constructing the stem
Tips : Constructing Lead-in and Option
Sample well-structured Single Best Answer (SBA)
A 47-year-old man is brought to the emergency department 2 hours after the sudden onset of shortness of breath, severe chest pain, and sweating. He has no history of similar symptoms. He has hypertension treated with hydrochlorothiazide. He has smoked one pack of cigarettes daily for 30 years. His pulse is 110/min, respirations are 24/min, and blood pressure is 110/50 mm Hg. A grade 3/6, diastolic blowing murmur is heard over the left sternal border and radiates to the right sternal border. Femoral pulses are decreased bilaterally. An ECG shows left ventricular hypertrophy. Which of the following is the most likely diagnosis?
A. Acute myocardial infarction
B. Aortic dissection
C. Oesophageal rupture
D. Mitral valve prolapse
E. Pulmonary embolism
Session 3: MCQ FLAWS AND HOW TO AVOID THEM
This session will teach you how to identify and prevent common flaws. These are problems that question writers inadvertently put into their test questions because they are focused on the topic and do not consider test-wise trainees, who may benefit from these cues and answer questions correctly despite knowing very little about the material.
Session objectives
At the end of this session, you will be able to:
• Recognize the two major types of technical question flaws
• Identify flaws that question writers inadvertently introduce into test questions
3.1. Technical question flaws: What are they?
You want your trainees to demonstrate their content knowledge rather than their test-taking ability! Flaws provide an advantage to less knowledgeable trainees who are familiar with test-taking strategies. We recommend that you familiarize yourself with these flaws so that your questions are well structured and free of them.
3.2. The two major types of technical question flaws
3.3. Flaws related to test-wiseness
3.3.1. Grammatical cues
Example
An asymptomatic 57-year-old man comes to the physician for a routine health maintenance examination. He has smoked one pack of cigarettes daily for 37 years. His blood pressure is 180/112 mm Hg, and pulse is 82/min. Abdominal examination shows a bruit in the right upper quadrant and no masses. His hematocrit is 42%, serum urea nitrogen concentration is 23 mg/dL, and serum concentration is 1.4 mg/dL. The most likely cause of this patient's bruit is an:
A. accumulation of lipids in the arterial wall
B. hypertrophy of the arterial wall media
C. giant cell infiltration in the arterial wall
D. round cell infiltration in the arterial wall

Where is the error in the design of this question?
In this sample question, only the correct answer (A) grammatically follows from the lead-in. This type of flaw is easily avoided by using "closed lead-ins", that is, lead-ins written as complete sentences in question format.
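As a rough illustration, a check for this cue can be sketched in Python. The function name `options_fitting_article` and the first-letter vowel rule are assumptions made for this sketch, not part of the course material. The rule is crude (it misjudges words such as "hour" or "unit"), so a hit is only a prompt for a human reviewer to look again.

```python
import re

VOWELS = "aeiou"


def options_fitting_article(lead_in, options):
    """Return the labels of options that agree with a trailing "a"/"an".

    Returns None when the lead-in does not end in an indefinite article,
    which is what a closed lead-in (a full question) looks like here.
    """
    match = re.search(r"\b(an?)\s*:?\s*$", lead_in.strip(), flags=re.IGNORECASE)
    if match is None:
        return None
    article = match.group(1).lower()
    fitting = []
    for label, text in options.items():
        # Hypothetical heuristic: "an" goes with a leading vowel letter.
        starts_with_vowel = text.strip()[0].lower() in VOWELS
        if starts_with_vowel == (article == "an"):
            fitting.append(label)
    return fitting


bruit_options = {
    "A": "accumulation of lipids in the arterial wall",
    "B": "hypertrophy of the arterial wall media",
    "C": "giant cell infiltration in the arterial wall",
    "D": "round cell infiltration in the arterial wall",
}
cued = options_fitting_article(
    "The most likely cause of this patient's bruit is an:", bruit_options
)
print(cued)
```

If exactly one label comes back, the article alone singles out an option, and rewriting the lead-in as a closed question removes the cue.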
3.3.2. Logical cues
Logical cues occur when a subset of the options is collectively exhaustive, indicating that one of these options must be the answer.
Example
Crime is
A. equally distributed among the social classes
B. overrepresented among the poor
C. overrepresented among the middle class and rich
D. primarily an indication of psychosexual maladjustment
E. reaching a plateau of tolerability for the nation

Where is the error in the design of this question?
Here, options A, B, and C are mutually exclusive and exhaustive, covering all possibilities, so one of these is probably the answer. In the example above, D and E should be omitted since they are not logical to include, and the test-wise trainee will simply rule them out.
You really don't need to have five options! Use the number of options that is right for the stem.
3.3.3. Absolute terms
Absolute terms such as "always" or "never" should be avoided. Test-wise trainees understand that options stated in absolutes are unlikely to be the correct answer.
Example
Which of the following statements about the memory defect associated with advanced dementia, Alzheimer's type, is true?
A. It can be treated adequately with phosphatidylcholine (lecithin)
B. It could be a sequela of early parkinsonism
C. It is never seen in patients with neurofibrillary tangles at autopsy
D. It is never severe
E. It possibly involves the cholinergic system

Where is the error in the design of this question?
Here, trainees will gain an advantage by eliminating C and D as possible answers. In addition, inclusion of the word "adequately" in option A makes that option nearly absolute as well: there must be at least a few patients for whom lecithin is inadequate.
3.3.4. Long options
By long options we mean options that actually have more words in them than other options! Almost always, long options are the correct answer.
Example
An otherwise healthy 28-year-old woman presented with a two-day history of cough, fever and shortness of breath, and the following chest radiograph. What is the most likely diagnosis?
A. Tuberculosis
B. Community-acquired streptococcal pneumonia
C. Varicella pneumonia

Where is the error in the design of this question?
When authors write their questions, they tend to make the correct answer longer, more specific, or more complete than the other options, and test-wise trainees know that long options are typically the correct answer. When you write your questions, be sure to check that the option length and specificity are similar so there is no cueing.
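The length check described above can be approximated with a short Python sketch. The function name `flag_long_option` and the threshold `ratio=1.5` are arbitrary assumptions chosen for illustration, and word count is used only because the text defines long options by number of words. It does not measure specificity, which still needs a human read.

```python
def flag_long_option(options, ratio=1.5):
    """Return the label of an option that is much longer than the rest.

    An option is flagged when its word count is at least `ratio` times
    the word count of the next-longest option; otherwise None is returned.
    Expects at least two options.
    """
    lengths = {label: len(text.split()) for label, text in options.items()}
    ranked = sorted(lengths, key=lengths.get, reverse=True)
    longest, runner_up = ranked[0], ranked[1]
    if lengths[longest] >= ratio * lengths[runner_up]:
        return longest
    return None


radiograph_options = {
    "A": "Tuberculosis",
    "B": "Community-acquired streptococcal pneumonia",
    "C": "Varicella pneumonia",
}
flagged = flag_long_option(radiograph_options)
print(flagged)
```

A flagged label does not prove the item is flawed; it signals that the options should be rewritten to similar length before the item is used.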
3.3.5. Clang association
This is a very common flaw introduced by question authors who know the content and don't think about repetition of words as they write their options. Test-wise trainees who do not know the answer are cued when a word or phrase used in the stem is repeated in the correct answer.
Example
A 67-year-old woman is brought to the emergency department because of severe chest pain four hours after undergoing outpatient endoscopy and dilatation of an oesophageal stricture caused by reflux. At discharge, she reported no chest pain. Three hours later, she vomited a small amount of blood and had severe pain. She appears pale. Her temperature is 38°C (100.4°F), blood pressure is 140/85 mm Hg, pulse is 125/min, and respirations are 22/min. Examination shows crepitus in the neck and moderate epigastric tenderness. The lungs are clear to auscultation, and breath sounds are equal bilaterally. Rectal examination shows no masses; test of the stool for occult blood is positive. Which of the following is the most likely cause of these symptoms?
A) Oesophageal perforation
B) Mallory-Weiss syndrome
C) Myocardial infarction
D) Perforated gastric ulcer

Where is the error in the design of this question? Can you see the clang?
Here, the word "oesophageal" is used in the stem and in the options. In addition, the use of "perforated" in option D provides another clue that this is part of the correct answer. Adding this up, A looks like the right answer! That flaw is a "clang" association.
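A simple way to screen for repeated words is sketched below in Python. The function name `clang_words` and the four-letter minimum (used to skip short function words) are assumptions made for this sketch. It only catches exact word repeats between the stem and an option, so related forms such as "perforated" and "perforation" still have to be spotted by the reviewer.

```python
import re


def clang_words(stem, options, min_length=4):
    """Map each option label to the words it shares with the stem.

    Only options with at least one shared word are included.
    """
    def words(text):
        found = re.findall(r"[a-z]+", text.lower())
        return {w for w in found if len(w) >= min_length}

    stem_words = words(stem)
    shared = {}
    for label, text in options.items():
        overlap = sorted(words(text) & stem_words)
        if overlap:
            shared[label] = overlap
    return shared


# First sentence of the stem above; the remaining sentences add no new overlaps.
stem = (
    "A 67-year-old woman is brought to the emergency department because of "
    "severe chest pain four hours after undergoing outpatient endoscopy and "
    "dilatation of an oesophageal stricture caused by reflux."
)
endoscopy_options = {
    "A": "Oesophageal perforation",
    "B": "Mallory-Weiss syndrome",
    "C": "Myocardial infarction",
    "D": "Perforated gastric ulcer",
}
clangs = clang_words(stem, endoscopy_options)
print(clangs)
```

When only one option shares a word with the stem, either remove the word from the stem or make sure it appears in several options.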
3.3.6. Convergence strategy
Convergence strategy occurs when the correct answer includes the most elements in common with the other options.
Example
Testing of which of the following nerves will provide the most useful information to establish the diagnosis?
A) Median motor, peroneal motor, facial, and sural
B) Median motor, peroneal motor, tibial H reflex, and facial
C) Median motor, ulnar motor, facial, and sural
D) Peroneal motor, tibial motor, peroneal F wave, and facial

Where is the error in the design of this question? Can you see the convergence?
Here, the repetition of phrases in the options clearly suggests that A is the correct answer because it includes the most commonly appearing terms in the other options. So, if we look more closely, the terms appear like this:
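The tally of repeated terms can be reproduced with a small Python sketch. The function name `convergence_scores` and the scoring rule (sum, for each option, of how many options contain each of its terms) are assumptions chosen for illustration; the options are entered as hand-split lists of terms.

```python
from collections import Counter


def convergence_scores(options):
    """Score each option by how often its terms occur across all options."""
    term_counts = Counter(
        term for terms in options.values() for term in set(terms)
    )
    return {
        label: sum(term_counts[term] for term in set(terms))
        for label, terms in options.items()
    }


nerve_options = {
    "A": ["median motor", "peroneal motor", "facial", "sural"],
    "B": ["median motor", "peroneal motor", "tibial H reflex", "facial"],
    "C": ["median motor", "ulnar motor", "facial", "sural"],
    "D": ["peroneal motor", "tibial motor", "peroneal F wave", "facial"],
}
scores = convergence_scores(nerve_options)
most_convergent = max(scores, key=scores.get)
print(scores, most_convergent)
```

Here "facial" appears in all four options, "median motor" and "peroneal motor" in three each, and "sural" in two, so option A collects the highest score. Balancing how often each term appears across the options removes the cue.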
3.4. Flaws related to irrelevant difficulty
These flaws add difficulty to a question that contributes nothing and is irrelevant to the content being tested.
3.4.1. Overlapping options
Example
Following a second episode of salpingitis, what is the likelihood that a woman is infertile?
A) Less than 20%
B) 20% to 30%
C) Greater than 50%
D) 90%

Where is the error in the design of this question? Can you see the overlapping options?
The options overlap. Notice that option C includes option D, which in all probability rules out D as the correct answer. Mixing formats makes questions unnecessarily confusing and adds irrelevant difficulty.
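For numeric option sets, overlap can be checked mechanically. The Python sketch below assumes each option has been encoded by hand as a closed range of whole percentages (for example, "Less than 20%" as 0 to 19 and "Greater than 50%" as 51 to 100). That encoding and the function name `overlapping_pairs` are assumptions made for this sketch.

```python
from itertools import combinations


def overlapping_pairs(ranges):
    """Return pairs of option labels whose closed ranges share any value."""
    pairs = []
    for (label_a, range_a), (label_b, range_b) in combinations(ranges.items(), 2):
        low_a, high_a = range_a
        low_b, high_b = range_b
        # Two closed ranges overlap when each starts before the other ends.
        if low_a <= high_b and low_b <= high_a:
            pairs.append((label_a, label_b))
    return pairs


# Hand-encoded ranges for the salpingitis example (whole percentages).
infertility_options = {
    "A": (0, 19),    # Less than 20%
    "B": (20, 30),   # 20% to 30%
    "C": (51, 100),  # Greater than 50%
    "D": (90, 90),   # 90%
}
overlaps = overlapping_pairs(infertility_options)
print(overlaps)
```

An empty result means no two options can both be true at once; any pair returned should be rewritten so that the ranges are mutually exclusive.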
3.4.2. Use of vague terms
Example
Which of the following statements concerning rhabdomyosarcoma is true?
A) It typically arises in the striated muscle
B) It is the most common malignant pelvic tumour in children
C) Orbital tumours are frequently invasive
D) Lymph node involvement usually occurs in tumours of the extremities
E) Metastasis to the liver rarely occurs

Where is the error in the design of this question? Can you see the use of vague terms?
The vague terms included (typically, common, frequently, usually, rarely) make this poorly structured question even more problematic. Although the question is technically an SBA, it is structured as a true/false (TF) question and does not satisfy the cover-the-options rule; the options are heterogeneous. The question author and the trainees responding to this question are unlikely to have the same interpretation of the vague terms used in the options, which adds irrelevant difficulty. There is some research evidence to support this assertion. These findings illustrate that there is no shared definition for these vague terms, so they should never be used in MCQs.
3.4.3. Use of "none of the above"
Example
A 39-year-old man is brought to the Emergency Department by his brother because he has become increasingly forgetful and confused over the past 24 hours. The brother relays that the patient has been drinking heavily and eating very little for the past month and has become tremulous and slightly nauseous. He wanders at night because he cannot sleep. On admission, intravenous administration of 5% dextrose in water is initiated. Two hours later, the patient has ophthalmoplegia and is completely confused. Which of the following is the most appropriate next step in patient care?
A. Administration of an anticoagulant
B. Administration of diazepam
C. Administration of large doses of vitamin B1, intravenously
D. Administration of large doses of vitamin C, intravenously
E. None of the above

Where is the error in the design of this question? Can you see the problem?
Use of "None of the above" as an option adds irrelevant difficulty. If trainees choose this option as the answer, it is not clear whether they don't know the answer or were thinking of a better answer than the options provided on the list. For this question, we would suggest changing the lead-in to "Which of the following is the most appropriate next step in pharmacotherapy?" and modifying option E to "No pharmacotherapy is indicated at this time."
3.5. Additional practice exercises
Exercise #1
Identify the flaws (no need to be a content expert).
Question #1
The purpose of the carm bar is to reinforce the:
A. Carm
B. Dentile
C. Holtz
D. Menton

Which flaw do you think the question illustrates?
This is an example of the "clang association." The word "carm" is given in the stem and repeated in the correct answer.
Exercise #2
Identify the flaws (no need to be a content expert).
Question #2
Who co-authored the series of mystery books featuring Richard Masters as the lead investigator?
A. Cronbach & Ebel
B. Cronbach & Linn
C. Cronbach & Meehl
D. Meehl & Byron

Which flaw do you think the question illustrates?
Convergence flaw. "Cronbach" is used three times and "Meehl" two times. The option with both Cronbach and Meehl is the correct answer.
Exercise #3
Identify the flaws (no need to be a content expert).
Question #3
The menton is structurally reinforced by a:
A. Amthalen
B. Immont
C. Octone
D. Swanz

Which flaw do you think the question illustrates?
The flaw in this example is grammatical: only the correct answer (Swanz) grammatically follows from the lead-in. It is easy to avoid this flaw by using only complete sentences that end in a question mark. For example, "Which of the following reinforces the menton?" Consistent use of closed lead-ins will also help with satisfying the "cover-the-options" rule and with writing homogeneous options.
Exercise #4
Identify the flaws (no need to be a content expert).
Question #4
The zirbser exhaust system occasionally backfires because:
A. Its operating temperature is sometimes low
B. No fuel ever reaches the exhaust pipe
C. The pressure on it is always extreme
D. The strain on it is never reduced

Which flaw do you think the question illustrates?
The use of vague and absolute terms is cueing the correct answer in this question. Trainees know to avoid absolutes and will quickly pick up that A is the correct answer because it includes the vague term "sometimes."
3.6. Your checklist for question writing
Session 4: Writing well-structured MCQ using Patient Vignettes to assess application of Basic and Clinical Science Knowledge
This session emphasizes the important guidelines for writing effective MCQs for the basic and clinical sciences to assess learners' application of knowledge. In addition, the session provides various templates that you can use to structure your MCQs for a variety of different purposes.
Session Objectives
At the end of this session, you will be able to:
• Describe basic guidelines for writing MCQs for the basic and clinical sciences
• Write well-structured MCQs for basic and clinical sciences to assess the application of knowledge using templates
4.1. Purpose of basic and clinical science exams
Basic and clinical science exams should:
• Be congruent with and reinforce major curricular goals and objectives
• Influence what faculty teach and what students learn
• Promote both horizontal integration across basic science disciplines and units, and vertical integration with clinical medicine
Regardless of the purpose or context of testing, you should strive to write questions that assess application of basic and clinical science knowledge, rather than recall of isolated facts.

General guidelines for writing MCQs for the basic and clinical sciences
• Test application of knowledge rather than recall of facts, using vignettes
• Focus questions on key concepts and principles that are essential information
• Focus on relevant tasks
• Test material relevant to learning in clinical clerkships, graduate medical education, and beyond
• Focus on clinical situations that either occur frequently or are critically important
4.2. Assessing application of basic science knowledge using patient vignettes
For many (but not all) basic science topics, it is useful to structure MCQs so that the stem consists of a description of a clinical situation; we term such stems "patient vignettes."

Structure of a patient vignette
The specific information included in the stem varies from vignette to vignette, but the first sentence should generally include:
• Age
• Gender
• Site of care
• Presenting complaint(s)
• Duration
This is followed by:
1. History of the chief complaint
2. Associated symptoms
3. Family history (if relevant)
4. Physical findings
5. Results of diagnostic studies
6. and/or the patient's response to initial treatment
4.2.1. Templates to assess application of basic science knowledge
You will have noticed already that we advocate consistent design in MCQs. Accordingly, there are various templates that you can use to structure your MCQs for a variety of different purposes. The next screen provides some more templates to assess application of basic science knowledge, along with an example of a vignette.
Templates for assessing application of basic science knowledge
4.3. Assessing application of clinical science knowledge
More typically, patients first describe their symptoms and history, and physicians elicit, interpret, and integrate the information (sorting contributory from incidental) and make some type of clinical decision (test, treat, reassure) based on the clinical picture.
You can read two approaches to testing clinical sciences. We recommend that writers use this type of approach (Question 2): think first of a presenting problem around which a clinical vignette can be constructed.
4.3.1. Writing clinical vignettes: Vignette worksheet and templates
The vast majority of your MCQs in clinical sciences are best written in the form of a clinical vignette, as shown in the following template.
4.3.2. Writing clinical vignettes: Vignette worksheet and templates
The example below shows how the vignette worksheet is used as the basis of the patient vignette. Look at the information assembled in the worksheet.
Based on the information given in the table, write the patient vignette in the space provided.
Stem
Lead-in
Option set

Stem: A 47-year-old man is brought to the emergency department 2 hours after the sudden onset of shortness of breath, severe chest pain, and sweating. He has no history of similar symptoms. He has hypertension treated with hydrochlorothiazide. He has smoked one pack of cigarettes daily for 30 years. His pulse is 110/min, respirations are 24/min, and blood pressure is 110/50 mm Hg. A grade 3/6, diastolic blowing murmur is heard over the left sternal border and radiates to the right sternal border. Femoral pulses are decreased bilaterally. An ECG shows left ventricular hypertrophy.
Lead-in: Which of the following is the most likely diagnosis?
Option set
A. Acute myocardial infarction
B. Aortic dissection
C. Oesophageal rupture
D. Mitral valve prolapse
E. Pulmonary embolism
4.3.3.Some more templates for you
Diagnosis template
Diagnostic studies template
Mechanism template
Final course assessment
You have successfully completed the course.
Remember! Completing the e-course is necessary but not sufficient for certification as an item writer. You are required to complete the following tasks and submit your work to the responsible body.

Activity
Write well-structured MCQs for basic and clinical sciences to assess application of knowledge using the templates provided, with one well-structured MCQ for each type of template, i.e.:
• 5 MCQs using clinical vignettes (one for each type of template)
• 6 MCQs using basic science templates (one for each type of template)

Plagiarism is totally unacceptable and will be checked using software. Write the questions on your own, using a variety of resources.
GOOD LUCK