RELIABILITY Reliability is the degree to which a test consistently measures whatever it measures. In research, the term reliability means repeatability or consistency: a measurement procedure is reliable when it yields consistent scores while the phenomenon being measured is not changing. It is the degree to which scores are free of measurement error.
It is the degree to which an assessment tool produces stable and consistent results. It refers to the extent to which a test is internally consistent and the extent to which it yields consistent results on testing and retesting. A measure is considered reliable if it gives the same result over and over again.
TYPES OF RELIABILITY
1. Test-retest reliability
2. Equivalent or parallel-forms reliability
3. Inter-rater or inter-observer reliability
4. Internal consistency reliability
TEST-RETEST RELIABILITY Test-retest reliability is the degree to which scores are consistent over time. It indicates the score variation that occurs from one testing session to another as a result of measurement error. Test-retest reliability is obtained by administering the same test twice, over a period of time, to the same group of individuals. Example: an interview followed by a re-interview.
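As a rough illustration, test-retest reliability can be estimated as the correlation between the two administrations. A minimal sketch in Python, using invented scores for eight hypothetical examinees, computes the Pearson correlation explicitly:

```python
import numpy as np

# Invented scores for the same eight people tested twice, some weeks apart.
time1 = np.array([12, 15, 9, 20, 17, 11, 14, 18], dtype=float)
time2 = np.array([13, 14, 10, 19, 18, 10, 15, 17], dtype=float)

# Pearson r = cov(X, Y) / (sd_X * sd_Y), written out explicitly.
x = time1 - time1.mean()
y = time2 - time2.mean()
r = (x * y).sum() / np.sqrt((x ** 2).sum() * (y ** 2).sum())
print(f"Test-retest reliability (Pearson r) = {r:.2f}")
```

A value near 1 indicates that examinees keep roughly the same relative standing across the two sessions.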
Factors contributing to test-retest reliability > Clear instructions for administrators, research participants, and raters. > Tasks in the participants' first language, or in the target language at an appropriate level of difficulty. > Unambiguously phrased tasks/questions.
PARALLEL-FORMS RELIABILITY A related strategy, splitting a single test in two, is especially appropriate when the test is very long; the most commonly used method of splitting is the odd-even strategy. Since longer tests tend to be more reliable, and since split-half reliability represents the reliability of a test only half as long as the actual test, the estimate must be adjusted upward (the Spearman-Brown correction, shown with the split-half procedure below). Parallel forms proper are usually used in educational contexts, where alternative forms are needed because of the frequency of retesting and where many equivalent questions can be sampled.
The alternative-forms technique for estimating reliability is similar to the test-retest method, except that different measures of a behaviour are collected at different times. Different versions of an assessment tool are administered to the same group of individuals. Both versions must cover the same construct, skill, knowledge, etc. It is a widely used method.
Contributing factors > The development of equivalent forms from specifications that describe the tool's content. > Trialling of tools before data collection to ensure equivalence.
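In practice, parallel-forms reliability is estimated by correlating scores on the two forms. A minimal sketch, with invented data for one group taking two supposedly equivalent forms:

```python
import numpy as np

# Invented scores of one group on two equivalent forms of a test.
form_a = np.array([24, 30, 18, 27, 21, 33, 25], dtype=float)
form_b = np.array([26, 29, 17, 28, 20, 31, 27], dtype=float)

# Parallel-forms reliability is the correlation between the two forms.
reliability = np.corrcoef(form_a, form_b)[0, 1]
print(f"Parallel-forms reliability = {reliability:.2f}")
```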
INTER-RATER RELIABILITY Inter-rater reliability is the extent to which two or more individuals (raters or observers) agree. It addresses the consistency of the implementation of a rating system. This form of reliability concerns the examiner as a source of error, rather than the test itself.
Cohen's Kappa (k) It measures inter-rater agreement for qualitative (categorical) items. It is generally considered a more robust measure than a simple percent-agreement calculation, since k takes into account the agreement that would occur by chance.
k = (Po - Pe) / (1 - Pe), where Po = the relative observed agreement among raters, and Pe = the hypothetical probability of chance agreement.
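A minimal sketch of this calculation, using invented ratings from two hypothetical raters classifying the same ten items:

```python
from collections import Counter

# Invented categorical judgements by two raters on the same 10 items.
rater1 = ["yes", "yes", "no", "yes", "no", "no", "yes", "no", "yes", "yes"]
rater2 = ["yes", "no",  "no", "yes", "no", "yes", "yes", "no", "yes", "yes"]

n = len(rater1)
po = sum(a == b for a, b in zip(rater1, rater2)) / n  # observed agreement

# Chance agreement: sum over categories of p_rater1(cat) * p_rater2(cat).
c1, c2 = Counter(rater1), Counter(rater2)
pe = sum((c1[cat] / n) * (c2[cat] / n) for cat in set(rater1) | set(rater2))

kappa = (po - pe) / (1 - pe)
print(f"Po = {po:.2f}, Pe = {pe:.2f}, kappa = {kappa:.2f}")
```

Here Po is 0.80 but roughly half of that agreement would be expected by chance, so kappa comes out near 0.58, a more conservative figure than raw percent agreement.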
INTERNAL CONSISTENCY RELIABILITY Reliability refers to the consistency of scores obtained in an experiment. Specifically, the internal consistency method refers to the consistency of scores using only a single administration of an instrument. Types of internal consistency: the split-half procedure, the Kuder-Richardson approach, and the alpha coefficient.
Split-Half Coefficient In this procedure, the test is split in half and each half is scored separately, usually odd items versus even items. A coefficient is then calculated to determine whether the two halves of the test give the same results; because each half is only half the length of the full test, the correlation is then stepped up with the Spearman-Brown formula, as in the sketch below.
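A minimal sketch of the split-half procedure, with invented 0/1 item scores for five examinees:

```python
import numpy as np

# Invented item scores: rows = examinees, columns = test items (0/1).
items = np.array([
    [1, 1, 0, 1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 0, 1],
    [1, 0, 1, 1, 1, 1, 1, 0],
    [1, 1, 1, 1, 0, 1, 1, 1],
    [0, 0, 0, 1, 0, 0, 1, 0],
])

odd_half = items[:, 0::2].sum(axis=1)   # items 1, 3, 5, 7
even_half = items[:, 1::2].sum(axis=1)  # items 2, 4, 6, 8

r_half = np.corrcoef(odd_half, even_half)[0, 1]
# Spearman-Brown correction: estimated reliability of the full-length test.
r_full = 2 * r_half / (1 + r_half)
print(f"Half-test r = {r_half:.2f}, corrected full-test r = {r_full:.2f}")
```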
Kuder-Richardson Approach This is the most frequently employed method. It checks the internal consistency of measurements with dichotomous choices (e.g., right/wrong items). The KR-20 is used for items that have varying difficulty; if all questions in your binary test are equally challenging, use the KR-21. Reliability with this test should be .70 or higher.
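A sketch of both formulas on invented dichotomous data, where k is the number of items and p is the proportion answering each item correctly:

```python
import numpy as np

# Invented 0/1 (dichotomous) item responses: rows = examinees.
X = np.array([
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0],
    [1, 1, 1, 0, 1],
    [0, 1, 0, 0, 0],
])
k = X.shape[1]                  # number of items
total = X.sum(axis=1)           # total score per examinee
var_total = total.var(ddof=1)   # sample variance of total scores

# KR-20: uses each item's own difficulty p_i.
p = X.mean(axis=0)              # proportion correct per item
q = 1 - p
kr20 = (k / (k - 1)) * (1 - (p * q).sum() / var_total)

# KR-21: assumes all items are equally difficult (uses only the mean score).
m = total.mean()
kr21 = (k / (k - 1)) * (1 - m * (k - m) / (k * var_total))

print(f"KR-20 = {kr20:.2f}, KR-21 = {kr21:.2f}")
```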
Alpha Coefficient Cronbach's alpha is a measure of internal consistency, that is, of how closely related a set of items are as a group. Example: in a happiness survey, you might have five questions all asking different things, but when combined, they could be said to measure overall happiness. A highly reliable (consistent) test will produce the same or similar results when the same individual retakes the survey under the same conditions.
The alpha coefficient ranges in value from 0 to 1, with higher values indicating greater internal consistency.
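A minimal sketch of Cronbach's alpha for the five-question happiness survey example above, with invented 1-5 ratings:

```python
import numpy as np

# Invented 5-item happiness survey, rated 1-5 (rows = respondents).
scores = np.array([
    [4, 5, 4, 4, 5],
    [3, 3, 2, 3, 3],
    [5, 4, 5, 5, 4],
    [2, 2, 3, 2, 2],
    [4, 4, 4, 5, 4],
])
k = scores.shape[1]
item_vars = scores.var(axis=0, ddof=1)      # variance of each item
total_var = scores.sum(axis=1).var(ddof=1)  # variance of total scores

# alpha = (k / (k-1)) * (1 - sum of item variances / total-score variance)
alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(f"Cronbach's alpha = {alpha:.2f}")
```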
VALIDITY The term validity refers to whether or not a test measures what it intends to measure, and to the extent to which it does so. The question of validity is raised in the context of three points: the form of the test, the purpose of the test, and the population for whom it is intended.
TYPES OF VALIDITY
1. Internal validity
2. External validity
3. Content validity
4. Face validity
5. Test validity
6. Construct validity
INTERNAL VALIDITY Internal validity occurs when it can be concluded that there is a causal relationship between the variables being studied. It is related to the design of the experiment.
EXTERNAL VALIDITY External validity occurs when the causal relationship discovered can be generalized to other people, times, and contexts. Correct sampling allows generalization and hence gives external validity.
CONTENT VALIDITY To find out whether the entire content of the behaviour/construct/area is represented in the test, we compare the test tasks with the content of the behaviour. This is a logical method, not an empirical one. Example: in a test of knowledge of American geography, it is not fair to have most questions limited to the geography of New England.
FACE VALIDITY Face validity occurs when something appears to be valid. This depends very much on the judgement of the observer.
TEST VALIDITY > Criterion: correlation with an established standard. > Predictive: predicts future values of the criterion. > Concurrent: correlates with other tests given at the same time.
CONSTRUCT VALIDITY Construct validity is the degree to which a test accurately represents the underlying construct. > Convergent: simultaneous measures of the same construct correlate. > Discriminant: the test does not measure what it should not measure.
Relationship between validity and reliability Reliability and validity are closely related. A test cannot be considered valid unless the measurements resulting from it are reliable. Likewise, results from a test can be reliable and not necessarily valid.