Test-Retest Reliability
EDF 204: Advanced Statistics
Presentation by: Jerson D. Jocutan, MASE
Reliability: A value that expresses the degree to which a test consistently produces the same result.
EDF 204: Advanced Statistics: Test-retest Reliability
Reliability and validity are related concepts. If an instrument is unreliable, it cannot yield valid outcomes; as reliability improves, validity may improve (or may not). However, if an instrument is shown scientifically to be valid, then it is almost certain that it is also reliable.
Types of Reliability Measures
Measure of Internal Consistency: Split Half
PROCEDURE: Give the test once. Score equivalent halves of the test (e.g., odd- and even-numbered items).
STATISTICAL MEASURE: Pearson r and Spearman-Brown formula
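The split-half procedure can be sketched in Python as follows. This is a minimal illustration, not the presenter's own computation: the `responses` data are hypothetical, the helper names are made up for this example, and the Spearman-Brown step assumes the usual form r_full = 2·r_half / (1 + r_half).

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson r from paired raw scores (computational formula)."""
    n = len(x)
    num = n * sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y)
    den = sqrt((n * sum(a * a for a in x) - sum(x) ** 2)
               * (n * sum(b * b for b in y) - sum(y) ** 2))
    return num / den

def split_half_reliability(item_scores):
    """item_scores: one list of item scores (1 = correct, 0 = wrong) per examinee."""
    odd_half = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_half = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    r_half = pearson_r(odd_half, even_half)
    # Spearman-Brown: step the half-test correlation up to full test length
    return 2 * r_half / (1 + r_half)

# Hypothetical data: 5 examinees x 6 items, for illustration only
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]

reliability = split_half_reliability(responses)
print(round(reliability, 2))
```

The correction is needed because each half is only half as long as the full test, so the raw correlation between halves tends to understate the reliability of the whole test.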
Kuder-Richardson
PROCEDURE: Give the test once, then correlate the proportion/percentage of the students passing and not passing a given item.
STATISTICAL MEASURE: Kuder-Richardson Formula 20 and 21
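A minimal Python sketch of the two Kuder-Richardson formulas, assuming their common textbook forms: KR-20 = (k/(k−1))·(1 − Σpq/σ²) and KR-21 = (k/(k−1))·(1 − M(k−M)/(kσ²)), where k is the number of items, p and q are the proportions passing and not passing each item, M is the mean total score, and σ² is the variance of total scores. The population variance is used here; some texts use the sample variance instead, which changes the result slightly. The `responses` data and function names are hypothetical.

```python
from statistics import mean, pvariance

def kr20(item_scores):
    """KR-20 for right/wrong items; one list of 0/1 scores per examinee."""
    n = len(item_scores)
    k = len(item_scores[0])
    var_total = pvariance([sum(row) for row in item_scores])
    sum_pq = 0.0
    for i in range(k):
        p = sum(row[i] for row in item_scores) / n  # proportion passing item i
        sum_pq += p * (1 - p)                       # q = 1 - p
    return (k / (k - 1)) * (1 - sum_pq / var_total)

def kr21(item_scores):
    """KR-21: needs only the mean and variance of total scores."""
    k = len(item_scores[0])
    totals = [sum(row) for row in item_scores]
    m = mean(totals)
    return (k / (k - 1)) * (1 - m * (k - m) / (k * pvariance(totals)))

# Hypothetical data: 5 examinees x 6 items, for illustration only
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]

print(round(kr20(responses), 2), round(kr21(responses), 2))
```

KR-21 treats all items as equally difficult, so it is quicker to compute by hand but is generally not larger than KR-20 on the same data.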
Cronbach Coefficient Alpha
PROCEDURE: Give the test once, then estimate the reliability by using the standard deviation per item and the standard deviation of the test scores.
STATISTICAL MEASURE: Cronbach's coefficient alpha, a generalization of Kuder-Richardson Formula 20 to items that are not scored right/wrong
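A minimal Python sketch of coefficient alpha, assuming the usual variance-based form α = (k/(k−1))·(1 − Σσᵢ²/σₜ²), where σᵢ² is the variance (squared standard deviation) of item i and σₜ² is the variance of the total scores. The `ratings` data are hypothetical 5-point responses and the function name is illustrative.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """item_scores: one list of item scores per respondent (any numeric scale)."""
    k = len(item_scores[0])
    # zip(*rows) turns respondent rows into item columns
    item_variances = [variance(col) for col in zip(*item_scores)]
    total_variance = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical data: 5 respondents x 4 items on a 1-5 scale, for illustration only
ratings = [
    [4, 5, 4, 5],
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [5, 5, 4, 5],
    [1, 2, 2, 1],
]

alpha = cronbach_alpha(ratings)
print(round(alpha, 2))
```

Sample variance is used for both the items and the totals; using population variance for both gives the same alpha, because the n/(n−1) factor cancels in the ratio.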
Measure of Stability and Equivalence: Test-Retest with Equivalent Forms
PROCEDURE: Give parallel forms of the test, with an increased time interval between forms.
STATISTICAL MEASURE: Pearson r
Measure of Equivalence: Equivalent Forms
PROCEDURE: Give parallel forms of the test at the same time, with no time interval between forms.
STATISTICAL MEASURE: Pearson r
Measure of Stability: Test-Retest
PROCEDURE: Give a test twice to the same group, with any time interval between the two administrations, from several minutes to several years.
STATISTICAL MEASURE: Pearson r
Pearson Product Moment Correlation
FORMULA:
$$ r = \frac{N\sum XY - \sum X \sum Y}{\sqrt{\left[N\sum X^2 - \left(\sum X\right)^2\right]\left[N\sum Y^2 - \left(\sum Y\right)^2\right]}} $$
where:
N = number of respondents/examinees
X = score in the test (Test 1)
Y = score in the retest (Test 2)
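A minimal Python sketch of Pearson r, assuming the usual raw-score (computational) form r = (NΣXY − ΣXΣY) / √([NΣX² − (ΣX)²][NΣY² − (ΣY)²]). The function and variable names are illustrative; the two score lists are the Test 1 and Test 2 scores of the 15 students in this presentation's worked example.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson r from raw scores: x = Test 1 scores, y = Test 2 (retest) scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired lists with at least two scores")
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Scores of the 15 students in the worked example of this presentation
test_1 = [50, 43, 48, 45, 40, 47, 52, 39, 44, 43, 41, 46, 39, 51, 49]
test_2 = [51, 42, 48, 44, 41, 47, 51, 38, 43, 42, 41, 45, 39, 50, 48]

r = pearson_r(test_1, test_2)
print(round(r, 2))  # → 0.99
```

If every examinee gets the same score on one of the administrations, the denominator is zero and r is undefined; this sketch lets that case raise `ZeroDivisionError` rather than returning a value.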
Test-Retest: Measure of Stability using Pearson r
Example: This example tests the reliability of a teacher-made test using the Pearson r statistical measure.

STUDENT (N)      X     Y     XY
1                50    51
2                43    42
3                48    48
4                45    44
5                40    41
6                47    47
7                52    51
8                39    38
9                44    43
10               43    42
11               41    41
12               46    45
13               39    39
14               51    50
15               49    48
SUMMATION (Σ)    ΣX    ΣY    ΣXY
Solution: r = 0.99
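The intermediate steps behind this result can be written out as follows. This is a sketch that assumes the raw-score form of Pearson r; the sums are computed from the 15 pairs of scores in the example table (ΣX² and ΣY² are not listed in the table and were computed here from the X and Y columns).

```latex
\begin{aligned}
N &= 15,\quad \sum X = 677,\quad \sum Y = 670,\quad \sum XY = 30495,\quad
     \sum X^2 = 30817,\quad \sum Y^2 = 30184 \\[4pt]
r &= \frac{N\sum XY - \sum X \sum Y}
          {\sqrt{\left[N\sum X^2 - \left(\sum X\right)^2\right]
                 \left[N\sum Y^2 - \left(\sum Y\right)^2\right]}} \\[4pt]
  &= \frac{15(30495) - (677)(670)}
          {\sqrt{\left[15(30817) - 677^2\right]\left[15(30184) - 670^2\right]}} \\[4pt]
  &= \frac{457425 - 453590}{\sqrt{(462255 - 458329)(452760 - 448900)}} \\[4pt]
  &= \frac{3835}{\sqrt{(3926)(3860)}} \approx \frac{3835}{3892.86} \approx 0.985 \approx 0.99
\end{aligned}
```

The unrounded value is about 0.985, which rounds to the 0.99 reported above.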
INTERPRETATION

VALUE          DESCRIPTIVE EQUIVALENCE
0.00           zero correlation
0.01 – 0.20    negligible correlation
0.21 – 0.40    low correlation
0.41 – 0.70    moderate correlation
0.71 – 0.90    high correlation
0.91 – 0.99    very high correlation

Note: For a teacher-made test to pass a reliability test, the result should be 0.85 or above.
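The interpretation table and the 0.85 cutoff can be expressed as a small lookup, sketched below in Python. Three choices here are assumptions not stated in the table: r is rounded to two decimals before lookup, the sign of r is ignored so only its magnitude is classified, and a value of 1.00 is labeled "perfect correlation". The function names are illustrative.

```python
# (upper bound, label) pairs taken from the interpretation table
BANDS = [
    (0.00, "zero correlation"),
    (0.20, "negligible correlation"),
    (0.40, "low correlation"),
    (0.70, "moderate correlation"),
    (0.90, "high correlation"),
    (0.99, "very high correlation"),
]

def describe_r(r):
    """Return the descriptive equivalence for a correlation coefficient."""
    magnitude = round(abs(r), 2)  # assumption: classify by magnitude, 2 decimals
    for upper, label in BANDS:
        if magnitude <= upper:
            return label
    return "perfect correlation"  # assumption: 1.00 is not listed in the table

def passes_teacher_made_threshold(r, cutoff=0.85):
    """True if r meets the 0.85 cutoff noted for teacher-made tests."""
    return r >= cutoff

print(describe_r(0.99), passes_teacher_made_threshold(0.99))
# → very high correlation True
```

Note that the descriptive label and the pass/fail cutoff are separate checks: a value such as 0.80 is described as a high correlation but would still fall short of the 0.85 cutoff.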
RESULTS
r = 0.99
INTERPRETATION: The r value of 0.99 denotes a very high relationship. This implies that the students who got very high scores in the first administration of the test also got very high scores in the second administration. Likewise, those who got low scores in the first administration got low scores in the second. Hence, the test is highly reliable.
Important Notes
A typical assessment would involve giving participants the same test on two separate occasions. If the same or similar results are obtained, then external reliability is established.
The timing of the test is important; if the interval is too brief, then participants may recall information from the first test, which could bias the results.
Alternatively, if the interval is too long, it is feasible that the participants could have changed in some important way, which could also bias the results.
Potential Bias in Test-Retest Reliability

Practice Effect
A practice effect occurs when participants simply get better at a test due to practice. This means they are likely to show better results during later tests because they have had time to practice and improve. The way to prevent this type of bias is to give individuals tests that are of equal difficulty but have a different variety of questions, so that they cannot memorize the answers to the types of questions asked on the first test.

Fatigue Effect
A fatigue effect occurs when participants get worse at a test because they are mentally drained or fatigued from taking previous tests. The way to prevent this type of bias is to provide plenty of time between tests (ideally weeks or even months) so that participants are fresh when taking both tests.

Differences in Conditions
When participants take the two tests under different conditions (e.g., different lighting, different time of day, different time allowed to complete the test), it is possible that they score differently on the tests simply due to differences in the testing environment. The way to prevent this type of bias is to ensure that participants take both tests under identical conditions, i.e., during the same time of day, with the same general lighting and environment, and with the same amount of time to complete the test.
Let’s Practice!
Teacher Rica prepared a teacher-made test. She administered the test twice, with a 2-week interval, and gathered the data shown in the table. Determine the reliability of the test made by Teacher Rica.

STUDENT (N)      X     Y     XY
1                50    35
2                43    32
3                48    45
4                45    36
5                40    49
6                47    43
7                52    44
8                39    42
9                44    45
10               43    43
11               41    44
12               46    35
13               39    37
14               51    48
15               49    45
16               51    52
17               42    43
18               48    47
19               44    43
20               41    42
21               47    48
22               51    50
23               38    42
24               43    43
25               42    45
26               41    39
27               45    43
28               39    38
29               50    50
30               48    45
SUMMATION (Σ)    ΣX    ΣY    ΣXY
Thank You for Listening!
Presentation by: Jerson D. Jocutan, LPT
For inquiries, please contact the following:
E-mail: [email protected]
Facebook/Messenger: https://www.facebook.com/jrxnjc.tan/