Norms and Interpretation of Test Scores.pptx

mianarslankasuri7800, 27 slides, Jun 30, 2024

About This Presentation

This presentation is very helpful for psychology students and other medical students.


Slide Content

Norms and Interpretation of Test Scores

Scores on psychological tests are most commonly interpreted by reference to norms, which represent the test performance of the standardization sample, that is, the average performance of a representative group. To interpret an individual’s score, the raw score is converted into some relative measure (a derived score), because a raw score taken by itself is affected by the difficulty level of the test. Once raw scores have been converted, the individual’s relative performance on different tests or subtests can be compared.

Fundamentally, however, derived scores are expressed in one of two major ways: (1) the developmental level attained, or (2) the relative position within a specified group.

STATISTICAL CONCEPTS

A first step in bringing order into a mass of raw scores is to tabulate them into a frequency distribution. The scores are grouped into class intervals of convenient width, chosen according to the spread of the data (for instance, intervals of 5 or 10 score points), and the number of persons scoring within each interval (the frequency) is recorded. The sum of these frequencies equals N, the total number of cases in the group.
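The tabulation step can be sketched in Python as follows. This is a minimal illustration, assuming integer raw scores and equal-width class intervals; the function name `frequency_distribution` and the scores are hypothetical, not taken from the presentation.

```python
from collections import Counter

def frequency_distribution(scores, width=5):
    """Tally integer raw scores into class intervals of equal width.

    Returns (lower limit, upper limit, frequency) tuples, lowest interval first.
    The first interval starts at the multiple of `width` at or below the lowest score.
    """
    start = (min(scores) // width) * width
    counts = Counter((s - start) // width for s in scores)  # interval index -> frequency
    top = (max(scores) - start) // width
    return [(start + i * width, start + (i + 1) * width - 1, counts[i])
            for i in range(top + 1)]  # empty intervals get frequency 0

# Hypothetical raw scores for 14 examinees
scores = [12, 17, 18, 21, 22, 23, 23, 24, 26, 27, 28, 31, 33, 38]
table = frequency_distribution(scores, width=5)
for lower, upper, f in table:
    print(f"{lower:>3}-{upper:<3} {f}")
print("N =", sum(f for _, _, f in table))  # equals len(scores)
```

Choosing a wider interval (for example `width=10`) gives fewer, coarser classes; the frequencies still sum to N.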

The information provided by a frequency distribution can also be presented graphically in the form of a distribution curve. The graph can be plotted in two ways, both of which are in common use. In the histogram, the height of the column erected over each class interval corresponds to the number of persons scoring in that interval; we can think of each individual standing on another’s shoulders to form the column. In the frequency polygon, the number of persons in each interval is indicated by a point placed at the center of the class interval and across from the appropriate frequency, and successive points are joined by straight lines. Except for minor irregularities, the distribution portrayed in Figure 1 resembles the bell-shaped normal curve.
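Both graphic forms can be derived from the same frequency table. The sketch below, using a hypothetical table, prints a text histogram (one mark per person) and computes the (midpoint, frequency) points of a frequency polygon. Closing the polygon with zero-frequency points at the empty intervals on either side is a common drawing convention, assumed here; the function names are illustrative.

```python
# Hypothetical frequency distribution: (lower limit, upper limit, frequency)
table = [(10, 14, 1), (15, 19, 2), (20, 24, 5), (25, 29, 3), (30, 34, 2), (35, 39, 1)]

def histogram_rows(table, mark="#"):
    """One text row per class interval; the bar length equals the frequency."""
    return [f"{lo:>3}-{hi:<3} {mark * f}" for lo, hi, f in table]

def polygon_points(table):
    """(midpoint, frequency) pairs for a frequency polygon.

    A zero-frequency point is added at the midpoint of the empty interval
    just below and just above the data, so the polygon meets the baseline.
    """
    width = table[0][1] - table[0][0] + 1
    points = [((lo + hi) / 2, f) for lo, hi, f in table]
    first, last = points[0][0], points[-1][0]
    return [(first - width, 0)] + points + [(last + width, 0)]

print("\n".join(histogram_rows(table)))
print(polygon_points(table))
```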

The mean is the arithmetic average of the scores in a group. It is found by adding all the scores and dividing the sum by the number of cases (N). Another measure of central tendency is the mode, or most frequent score. In a frequency distribution, the mode is the midpoint of the class interval with the highest frequency. A third measure of central tendency is the median, or middlemost score when all scores have been arranged in order of size. The median is the point that bisects the distribution, half the cases falling above it and half below.

The most serviceable measure of variability is the standard deviation (symbolized by either SD or σ). It is based on the deviation (x) of each score from the mean; because positive and negative deviations cancel out when summed, the negative signs are legitimately eliminated by squaring each deviation. This procedure has been followed in the last column of Table 2. The sum of this column divided by the number of cases (Σx²/N) is known as the variance, or mean square deviation, and is symbolized by σ². The standard deviation is the square root of the variance. The variance has proved extremely useful in sorting out the contributions of different factors to individual differences in test performance.
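These measures can be computed directly from a list of raw scores. This Python sketch uses hypothetical scores; it finds the mode from ungrouped scores rather than from class intervals, and it divides the sum of squared deviations by N (not N - 1), in line with the variance definition given here. The function name `describe` is illustrative.

```python
import math
from collections import Counter

def describe(scores):
    """Central tendency and variability for a list of raw scores."""
    n = len(scores)
    mean = sum(scores) / n
    ordered = sorted(scores)
    mid = n // 2
    # Median: middlemost score; average the two middle scores when N is even
    median = ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2
    # Mode: most frequent score (ties go to the score encountered first)
    mode = Counter(scores).most_common(1)[0][0]
    # Squaring each deviation from the mean removes the negative signs
    squared_devs = [(x - mean) ** 2 for x in scores]
    variance = sum(squared_devs) / n  # mean square deviation, sigma squared
    sd = math.sqrt(variance)          # standard deviation
    return {"N": n, "mean": mean, "median": median, "mode": mode,
            "variance": variance, "SD": sd}

print(describe([2, 4, 4, 4, 5, 5, 7, 9]))  # hypothetical scores
```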

SD and proportion of cases.
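In a normal distribution, the proportion of cases falling within a given number of SDs of the mean is fixed: roughly 68 percent within ±1 SD, 95 percent within ±2 SD, and 99.7 percent within ±3 SD. A short check using the Python standard library’s error function (`math.erf`) is shown below as an illustration; the function name is hypothetical.

```python
import math

def proportion_within(k):
    """Proportion of a normal distribution lying within +/- k SD of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within +/-{k} SD: {proportion_within(k):.2%}")
```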

DEVELOPMENTAL NORMS

Developmental norms indicate how far along the normal developmental path the individual has progressed. Thus an 8-year-old who performs as well as the average 10-year-old on an intelligence test may be described as having a mental age of 10. In a different context, a fourth-grade child may be characterized as reaching the sixth-grade norm in a reading test and the third-grade norm in an arithmetic test. When a test is divided into year levels, with items grouped according to the age at which they are typically passed, a child’s score corresponds to the highest year level that he can successfully complete.

Mental age norms have also been employed with tests that are not divided into year levels. In that case, the child’s raw score is compared with the mean raw scores obtained by children in each year group of the standardization sample. Because intellectual development progresses more rapidly at the earlier ages and gradually slows as the individual approaches his mature limit, the mental age unit shrinks correspondingly with age.

Grade equivalents. Scores on educational achievement tests are often interpreted in terms of grade equivalents. This practice is understandable because the tests are employed within an academic setting. Grade norms are found by computing the mean raw score obtained by children in each grade. Thus, if the average number of problems solved correctly on an arithmetic test by the fourth graders in the standardization sample is 23, then a raw score of 23 corresponds to a grade equivalent of 4.
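Building grade norms and reading off a grade equivalent can be sketched in Python as below. The standardization data are hypothetical, and the linear interpolation between adjacent grade means (used to produce fractional grade equivalents such as 4.5) is an assumption of this sketch, not a procedure stated in the presentation; published tests use their own smoothing methods.

```python
from statistics import mean

def grade_norms(sample):
    """Mean raw score for each grade; `sample` maps grade -> list of raw scores."""
    return {grade: mean(scores) for grade, scores in sorted(sample.items())}

def grade_equivalent(raw, norms):
    """Convert a raw score to a grade equivalent.

    Assumes grade means rise with grade. Linear interpolation between adjacent
    grade means is an illustrative assumption; scores outside the normed range
    are clamped rather than extrapolated.
    """
    grades = sorted(norms)
    if raw <= norms[grades[0]]:
        return float(grades[0])
    for lower, upper in zip(grades, grades[1:]):
        lo_mean, hi_mean = norms[lower], norms[upper]
        if raw <= hi_mean:
            return lower + (raw - lo_mean) / (hi_mean - lo_mean) * (upper - lower)
    return float(grades[-1])

# Hypothetical standardization sample (grade -> raw arithmetic scores)
norms = grade_norms({3: [15, 17, 19], 4: [20, 23, 26], 5: [25, 28, 31]})
print(grade_equivalent(23, norms))    # raw score equal to the fourth-grade mean
print(grade_equivalent(25.5, norms))  # halfway between the grade 4 and grade 5 means
```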

Grade norms are also subject to misinterpretation unless the test user keeps firmly in mind the manner in which they were derived. For example, if a fourth-grade child obtains a grade equivalent of 6.9 in arithmetic, it does not mean that he has mastered the arithmetic processes taught in the sixth grade. He undoubtedly obtained his score largely by superior performance in fourth-grade arithmetic. It certainly could not be assumed that he has the prerequisites for seventh-grade arithmetic.

Ordinal scales

Empirical observation of behavior development in infants and young children led to the description of behavior typical of successive ages in such functions as locomotion, sensory discrimination, linguistic communication, and concept formation. These descriptions are closely related to the familiar developmental milestones of childhood. Gesell and his co-workers emphasized the sequential patterning of early behavior development. They cited extensive evidence of uniformities of developmental sequences and an orderly progression of behavior changes. For example, the child’s reactions toward a small object placed in front of him exhibit a characteristic chronological sequence in visual fixation and in hand and finger movements.

Scales developed within the framework of Piaget’s developmental theory are likewise ordinal scales, in which the attainment of one stage is contingent upon completion of the earlier stages in the development of the concept. The tasks are designed to reveal the dominant aspects of each developmental stage.

WITHIN-GROUP NORMS

Percentiles. Percentile scores are expressed in terms of the percentage of persons in the standardization sample who fall below a given raw score. For example, if 28 percent of the persons obtain fewer than 15 problems correct on an arithmetic reasoning test, then a raw score of 15 corresponds to the 28th percentile. Percentiles should not be confused with percentage scores: percentage scores are raw scores, expressed in terms of the percentage of correct items, whereas percentiles are derived scores, expressed in terms of the percentage of persons.
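The percentile example, and its contrast with percentage scores, can be reproduced with a small hypothetical norm group. In this Python sketch, 7 of 25 persons score below 15, so a raw score of 15 falls at the 28th percentile. Tied scores are not counted as "below," following the definition above; the 40-item test length and the function names are assumptions for illustration.

```python
def percentile_rank(raw, norm_scores):
    """Percentage of the normative sample scoring below `raw` (ties not counted)."""
    below = sum(1 for s in norm_scores if s < raw)
    return 100 * below / len(norm_scores)

def percentage_correct(items_correct, total_items):
    """Percentage score: a raw score restated as percent of items answered correctly."""
    return 100 * items_correct / total_items

# Hypothetical norm group of 25 persons: 7 solve fewer than 15 problems
norm_scores = [10] * 7 + [15] * 5 + [20] * 13
print(percentile_rank(15, norm_scores))  # percent of persons below 15
print(percentage_correct(15, 40))        # percent of items, on a 40-item test
```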

Standard scores. Current tests are making increasing use of standard scores, which are the most satisfactory type of derived score from most points of view. Standard scores express the individual’s distance from the mean in terms of the standard deviation of the distribution. Linearly derived standard scores are often designated simply as “standard scores” or “z scores.” To compute a z score, we find the difference between the individual’s raw score and the mean of the normative group and then divide this difference by the SD of the normative group: z = (X - M) / SD, where X is the raw score and M is the normative mean.
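A minimal Python sketch of the z-score computation follows. The normative mean and SD (60 and 8) and the small norm group are hypothetical; `statistics.pstdev` divides by N, which matches the variance definition used earlier in this presentation.

```python
from statistics import mean, pstdev

def z_from_norms(raw, norm_mean, norm_sd):
    """z = (X - M) / SD: distance from the normative mean in SD units."""
    return (raw - norm_mean) / norm_sd

def z_from_sample(raw, norm_scores):
    """Same computation, taking M and SD (divided by N) from normative scores."""
    return z_from_norms(raw, mean(norm_scores), pstdev(norm_scores))

print(z_from_norms(72, 60, 8))                     # hypothetical norms: M = 60, SD = 8
print(z_from_sample(9, [2, 4, 4, 4, 5, 5, 7, 9]))  # hypothetical norm group
```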

Because z scores involve decimals and negative values, they are often converted to a more convenient scale. To convert an original standard score to a new scale with, for example, a mean of 500 and an SD of 100, it is simply necessary to multiply the standard score by the desired SD (100) and add the product to the desired mean (500); a negative product is thus subtracted from it. Scores on the separate subtests of the Wechsler Intelligence Scales, for instance, are converted to a distribution with a mean of 10 and an SD of 3. All such measures are examples of linearly transformed standard scores. In order to achieve comparability of scores from dissimilarly shaped distributions, nonlinear transformations may be employed to fit the scores to any specified type of distribution curve.
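The linear conversion amounts to one line of arithmetic. The Python sketch below (function name hypothetical) applies it to a few z scores for the 500/100 scale and for a Wechsler-style subtest scale with a mean of 10 and an SD of 3.

```python
def rescale(z, new_mean, new_sd):
    """Linear transformation of a z score: new score = new mean + z * new SD.

    A negative z gives a negative product, which places the score below the mean.
    """
    return new_mean + z * new_sd

for z in (-1.5, 0.0, 1.0):
    # Mean 500 / SD 100 scale, and a subtest scale with mean 10 / SD 3
    print(z, rescale(z, 500, 100), rescale(z, 10, 3))
```

Because the transformation is linear, it preserves the shape of the original distribution; only the mean and SD change.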

Normalized standard scores are standard scores expressed in terms of a distribution that has been transformed to fit a normal curve. They are expressed in the same form as linearly derived standard scores, viz., with a mean of zero and an SD of 1. Thus, a normalized score of zero indicates that the individual falls at the mean of a normal curve, excelling 50 percent of the group. If the normalized standard score is multiplied by 10 and added to or subtracted from 50, it is converted into a T score, a type of score first proposed by McCall (1922). On this scale, a score of 50 corresponds to the mean, a score of 60 to 1 SD above the mean, and so forth.
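Normalization works through percentiles: the proportion of the normative sample falling below a raw score is mapped to the z value that cuts off the same proportion of a normal curve, and that z can then be converted to a T score. This Python sketch uses `statistics.NormalDist().inv_cdf` for the normal-curve step; counting tied scores as half below (a mid-rank convention) is an assumption made here so the proportion stays strictly between 0 and 1, and the normative sample is hypothetical.

```python
from statistics import NormalDist

def normalized_z(raw, norm_scores):
    """Normalized standard score for `raw` relative to a normative sample.

    Tied scores count as half below (an assumed mid-rank convention); the
    resulting proportion is mapped to the z with that area below it.
    """
    n = len(norm_scores)
    below = sum(1 for s in norm_scores if s < raw)
    ties = sum(1 for s in norm_scores if s == raw)
    p = (below + 0.5 * ties) / n
    if not 0 < p < 1:
        raise ValueError("raw score lies outside the range of the normative sample")
    return NormalDist().inv_cdf(p)

def t_score(z):
    """T score: normalized z rescaled to a mean of 50 and an SD of 10."""
    return 50 + 10 * z

norms = list(range(1, 100))  # hypothetical normative sample of 99 scores
for raw in (20, 50, 84):
    z = normalized_z(raw, norms)
    print(raw, round(z, 2), round(t_score(z), 1))
```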

Another well-known transformation is represented by the stanine scale, developed by the United States Air Force during World War II. This scale provides a single-digit system of scores with a mean of 5 and an SD of approximately 2. The name stanine (a contraction of “standard nine”) is based on the fact that the scores run from 1 to 9. For a further explanation of stanines, see this video: https://www.youtube.com/watch?v=l01Y6QeiGFw&t=198s
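Stanines are commonly assigned by cutting the distribution into nine bands containing, from lowest to highest, about 4, 7, 12, 17, 20, 17, 12, 7, and 4 percent of the cases. The Python sketch below applies that conventional split to a percentile rank using `bisect.bisect_right`; treating a rank that falls exactly on a boundary as belonging to the higher stanine is an assumption of this illustration, and the names are hypothetical.

```python
from bisect import bisect_right

# Conventional stanine bands: 4, 7, 12, 17, 20, 17, 12, 7, 4 percent of cases,
# giving these cumulative percentile boundaries between stanines 1-2, 2-3, ..., 8-9.
STANINE_CUTS = [4, 11, 23, 40, 60, 77, 89, 96]

def stanine(percentile_rank):
    """Map a percentile rank (0-100) to a stanine from 1 to 9."""
    return bisect_right(STANINE_CUTS, percentile_rank) + 1

for pr in (2, 10, 50, 80, 97):
    print(pr, stanine(pr))
```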

The deviation IQ. In an effort to convert MA scores into a uniform index of the individual’s relative status, the ratio IQ (Intelligence Quotient) was introduced in early intelligence tests. Such an IQ was simply the ratio of mental age to chronological age, multiplied by 100 to eliminate decimals (IQ = 100 × MA/CA). Obviously, if a child’s MA equals his CA, his IQ will be exactly 100. An IQ of 100 thus represents normal or average performance. IQ’s below 100 indicate retardation; those above 100, acceleration. A major technical difficulty is that, unless the SD of the IQ distribution remains approximately constant with age, IQ’s will not be comparable at different age levels. An IQ of 115 at age 10, for example, may indicate the same degree of superiority as an IQ of 125 at age 12, since both may fall at a distance of 1 SD from the means of their respective age distributions. The deviation IQ overcomes this problem: it is a standard score with a mean of 100 and an SD that is held constant across age levels (15 on the Wechsler scales).
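The ratio IQ formula and the comparability problem can both be checked with a few lines of Python. Ages are expressed in months so the ratio needs no fractional years. The age-group SDs in the second part (15 at age 10 and 25 at age 12, both around a mean of 100) are hypothetical values chosen so that both IQ’s fall 1 SD above the mean, as in the example above.

```python
def ratio_iq(mental_age, chronological_age):
    """Ratio IQ = 100 x MA / CA, with both ages in the same units (here months)."""
    return 100 * mental_age / chronological_age

# An 8-year-old (96 months) performing like an average 10-year-old (120 months)
print(ratio_iq(120, 96))

# Comparability across ages: each IQ as a distance from the age-group mean in SD units.
# Hypothetical SDs: 15 at age 10 and 25 at age 12, both with a mean of 100.
print((115 - 100) / 15, (125 - 100) / 25)
```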

In connection with the last point, a reexamination of the meaning of a ratio IQ on such a test as the Stanford-Binet will show that these IQ’s can themselves be interpreted as standard scores. If we know that the distribution of Stanford-Binet ratio IQ’s had a mean of 100 and an SD of approximately 16, we can conclude that an IQ of 116 falls at a distance of 1 SD above the mean and represents a standard score of +1.00. Similarly, an IQ of 132 corresponds to a standard score of +2.00, an IQ of 76 to a standard score of −1.50, and so forth. Moreover, a Stanford-Binet ratio IQ of 116 corresponds to a percentile rank of approximately 84, because in a normal curve 84 percent of the cases fall below +1.00 SD (Figure 4).
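The conversions in this paragraph can be reproduced in Python. The sketch assumes, as stated above, a mean of 100 and an SD of 16 for Stanford-Binet ratio IQ’s and a normal distribution; `statistics.NormalDist().cdf` supplies the proportion of cases below a given z. The function names are illustrative.

```python
from statistics import NormalDist

def iq_to_standard_score(iq, mean=100.0, sd=16.0):
    """Express a ratio IQ as a standard score (defaults: mean 100, SD 16)."""
    return (iq - mean) / sd

def percentile_rank_normal(z):
    """Percentage of a normal distribution falling below z."""
    return 100 * NormalDist().cdf(z)

for iq in (76, 116, 132):
    z = iq_to_standard_score(iq)
    print(iq, z, round(percentile_rank_normal(z)))
```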

RELATIVITY OF NORMS

An individual’s relative standing in different functions may be grossly misrepresented through lack of comparability of test norms. Suppose, for example, that an individual has been given a verbal ability test and a spatial aptitude test to determine his relative standing in the two fields. If the verbal ability test was standardized on a random sample of high school students, while the spatial test was standardized on a selected group of boys attending elective shop courses, the examiner might erroneously conclude that the individual is much more able along verbal than along spatial lines, when the reverse may actually be the case.

The Normative Sample

Any norm, however expressed, is restricted to the particular normative population from which it was derived. The test user should never lose sight of the way in which norms are established. Psychological test norms are in no sense absolute, universal, or permanent. They merely represent the test performance of the subjects constituting the standardization sample. In choosing such a sample, an effort is usually made to obtain a representative cross section of the population for which the test is designed.

Specific Norms

Another approach to the nonequivalence of existing norms, and probably a more realistic one for most tests, is to standardize tests on more narrowly defined populations, so chosen as to suit the specific purposes of each test. In such cases, the limits of the normative population should be clearly reported with the norms. Thus, the norms might be said to apply to “employed clerical workers in large business organizations” or to “first-year engineering students.”