Test standardization


Standardization is the process of trying out the test on a group of people to see which scores are typically obtained. This standardization provides a mean (average) and standard deviation (spread) relative to a certain group. When an individual takes the test, she can determine how far above or below the average her score is relative to the normative group. A standardized test is a test administered and scored in a consistent manner. Tests are designed so that the “questions, conditions for administering, scoring procedures, and interpretations are consistent and are administered and scored in a predetermined, standard manner.”

Understanding Norms and Test Scores

Standardization is the process of testing a group of people to see the scores that are typically attained. With a standardized test, a participant can see where her score falls compared to the standardization group's performance. In standardization, the normative group must reflect the population for which the test was designed; that group's performance is the basis for the test norms.

What is standardized testing? Standardized tests are tools designed to measure student performance relative to all others taking the same test.
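To make this concrete, here is a minimal Python sketch of how an examinee's standing in the normative group can be expressed: the z-score counts how many standard deviations the raw score lies above or below the norm group's mean. The norm mean, standard deviation, and raw score below are hypothetical values, not taken from any real norm table.

# Hypothetical norm values, for illustration only.
norm_mean = 72.0   # mean of the standardization (normative) group
norm_sd = 8.0      # standard deviation of that group

def z_score(raw_score, mean, sd):
    # How many standard deviations the raw score lies above (+) or below (-) the norm mean.
    return (raw_score - mean) / sd

print(z_score(84.0, norm_mean, norm_sd))  # 1.5 -> one and a half SDs above the norm group's average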

History of Standardized Testing

1909 - The Thorndike Handwriting Scale was the first popular standardized achievement test used in public schools.
1930 - Most schools in the United States and Canada were using some form of standardized testing.
1950 - A student would graduate from high school having taken perhaps three standardized tests; today, when children take between 18 and 21 tests, it is easy to believe that the “volume of testing has an annual growth rate of 10-20 percent.”
1965 - Standardized tests were not used in the early grades, because these were considered years of growth and development.
1980 - Sixteen states, and districts in 21 others, required children to take a standardized test before entering kindergarten, and districts in at least 42 states required students to pass a standardized test before “graduating” from kindergarten.

Types of Standardized Testing

Norm-referenced testing measures performance relative to all other students taking the same test. Use it when you want to know how a student compares to the rest. Criterion-referenced testing measures factual knowledge of a defined body of material. The multiple-choice test people take to get a driver's license and a test on fractions are both examples of this type of testing.

Application in Classroom and Similar Settings

Standardized tests are intended to help a teacher, school, or district decide what is working in the classroom, how to improve instruction, and how to help a specific student. However, standardized test scores should not be the only thing a teacher, school, or district looks at when making decisions about programs or students. Other areas to consider include: observations in the classroom; evaluation of day-to-day class work, homework, and assignments; meetings with parents; and observation of student change and growth throughout the year.

Establishing Test Validity

According to Calmorin, the degree of validity is the most important attribute of a test. Validity refers to the degree to which a test is capable of achieving certain aims. Validity must be determined with reference to the particular use for which the test is being considered, and it must always be judged in relation to the purpose it serves. Validity is always specific to some definite situation; a test is never valid in general, but only for a particular purpose and group.

Item Analysis

Item analysis is done after the first try-out of the test. One method of conducting item analysis is the U-L Index Method:
1. Score the papers and rank them from highest to lowest according to total score.
2. Separate the upper 27% and the lower 27% of the papers.
3. Tally the responses made to each test item by each student in the upper 27%, then do the same for the lower 27%.
4. Compute the percentage of the upper group that got the item right; this is called U.
5. Compute the percentage of the lower group that got the item right; this is called L.
6. Average the U and L percentages; the result is the difficulty index.
7. Subtract the L percentage from the U percentage; the result is the discrimination index.

After the item analysis, the tester uses the following table of equivalents to interpret the difficulty index:

.00 - .20    Very Difficult
.21 - .80    Moderately Difficult
.81 - 1.00   Very Easy
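Below is a minimal Python sketch of the U-L Index Method and the interpretation table above, assuming dichotomously scored items (1 = correct, 0 = wrong). The paper data are made up for illustration, and the function names are the writer's own, not part of any standard library.

def ul_item_analysis(responses, item):
    # Return (difficulty index, discrimination index) for one item.
    ranked = sorted(responses, key=sum, reverse=True)   # rank papers by total score, highest first
    n = max(1, round(0.27 * len(ranked)))               # size of the upper and lower 27% groups
    upper, lower = ranked[:n], ranked[-n:]
    u = sum(paper[item] for paper in upper) / n         # proportion of upper group answering correctly (U)
    l = sum(paper[item] for paper in lower) / n         # proportion of lower group answering correctly (L)
    return (u + l) / 2, u - l                           # difficulty = average of U and L; discrimination = U - L

def interpret_difficulty(d):
    # Table of equivalents from the slide above.
    if d <= 0.20:
        return "Very Difficult"
    if d <= 0.80:
        return "Moderately Difficult"
    return "Very Easy"

# Ten hypothetical papers, four items each.
papers = [
    [1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 0, 1], [1, 1, 1, 0], [1, 0, 1, 0],
    [1, 0, 0, 1], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 0, 0],
]
for i in range(4):
    diff, disc = ul_item_analysis(papers, i)
    print(f"Item {i + 1}: difficulty={diff:.2f} ({interpret_difficulty(diff)}), discrimination={disc:.2f}")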

Item Revision

On the basis of the item analysis data, test items are revised for improvement. After revising the items that need revision, the tester conducts another try-out; the revised test must be administered to the same set of samples.

Third Try-out

After two revisions, the test is considered ready for its final form. The test is now acceptable in terms of its difficulty and discrimination indices. At this point, the test is ready for reliability testing.

How to Establish Reliability

Reliability may be estimated through a variety of methods that fall into two types: single-administration and multiple-administration. Multiple-administration methods require that two assessments be administered. Test-retest reliability is estimated as the Pearson product-moment correlation coefficient between two administrations of the same measure; this is sometimes known as the coefficient of stability. Alternative-forms reliability is estimated as the Pearson product-moment correlation coefficient between two different forms of a measure, usually administered together; this is sometimes known as the coefficient of equivalence.
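Both multiple-administration coefficients come down to a Pearson product-moment correlation between two sets of scores. The self-contained Python sketch below illustrates this; the two score lists are hypothetical and stand in for the first and second administrations (or for two alternative forms).

from math import sqrt

def pearson_r(x, y):
    # Pearson product-moment correlation coefficient between two score lists.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

first_administration = [55, 60, 62, 70, 75, 80, 82, 90]
second_administration = [57, 58, 65, 69, 78, 79, 85, 88]
print(pearson_r(first_administration, second_administration))  # coefficient of stability (test-retest)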

Single-administration methods include split-half and internal consistency. Split-half reliability treats the two halves of a measure as alternative forms. The “halves reliability” estimate is then stepped up to the full test length using the Spearman-Brown prediction formula; this is sometimes referred to as the coefficient of internal consistency. The most common internal-consistency measure is Cronbach's alpha, which is usually interpreted as the mean of all possible split-half coefficients. Cronbach's alpha is a generalization of an earlier form of estimating internal consistency, the Kuder-Richardson Formula 20.
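As an illustration of the internal-consistency idea, here is a short Python sketch of Cronbach's alpha in its usual form, alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores). The item-score matrix is hypothetical; a real analysis would normally rely on a statistics package.

def variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def cronbach_alpha(scores):
    # scores: one row per examinee, one column per item.
    k = len(scores[0])                                   # number of items
    item_columns = list(zip(*scores))                    # transpose to get per-item score lists
    item_var_sum = sum(variance(col) for col in item_columns)
    total_var = variance([sum(row) for row in scores])   # variance of examinees' total scores
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

scores = [
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 1, 0, 1],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
]
print(round(cronbach_alpha(scores), 3))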

Reliability Estimation Using a Split-half Methodology

The split-half design in effect creates two comparable test administrations. The items in a test are split into two half-tests that are equivalent in content and difficulty, often by separating odd- and even-numbered items. This assumes that the assessment is homogeneous in content.
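A Python sketch of the odd/even split just described follows, with the half-test correlation stepped up to full length by the Spearman-Brown prophecy formula, r_full = 2 * r_half / (1 + r_half). The item-score matrix is invented for illustration.

from math import sqrt

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sqrt(sum((a - mx) ** 2 for a in x)) * sqrt(sum((b - my) ** 2 for b in y)))

def split_half_reliability(scores):
    # scores: one row of item scores per examinee; returns the stepped-up coefficient.
    odd_totals = [sum(row[0::2]) for row in scores]    # items 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in scores]   # items 2, 4, 6, ...
    half_r = pearson_r(odd_totals, even_totals)        # "halves reliability" estimate
    return (2 * half_r) / (1 + half_r)                 # Spearman-Brown step-up to full length

scores = [
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 1, 0, 0],
    [1, 1, 1, 1, 0, 1],
    [0, 0, 0, 0, 0, 0],
]
print(round(split_half_reliability(scores), 3))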

Estimating Reliability Using Kuder-Richardson Formula 20

The rationale for Kuder and Richardson's most commonly used procedure is roughly equivalent to:
1. Securing the mean inter-correlation of the k items in the test.
2. Considering this to be the reliability coefficient for the typical item in the test.
3. Stepping up this average with the Spearman-Brown formula to estimate the reliability coefficient of an assessment of k items.

Formula for Kuder-Richardson Formula 20:

KR-20 = [k / (k − 1)] × [1 − (Σpq / SD²)]

Where:
k - the number of items in the test
SD - the standard deviation of the total test scores
p - the proportion of examinees who got an item correct
q - the proportion who got the item incorrect
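A direct Python translation of KR-20 as written above, assuming dichotomous item scores; the score matrix is hypothetical, and SD² is computed here as the variance of examinees' total scores.

def kr20(scores):
    # scores: one row per examinee of dichotomous item scores (1 = correct, 0 = wrong).
    k = len(scores[0])                                            # number of items
    n = len(scores)
    totals = [sum(row) for row in scores]
    mean_total = sum(totals) / n
    sd_squared = sum((t - mean_total) ** 2 for t in totals) / n   # SD^2 of the total scores
    pq_sum = 0.0
    for item in range(k):
        p = sum(row[item] for row in scores) / n                  # proportion who got the item correct
        pq_sum += p * (1 - p)                                     # p * q
    return (k / (k - 1)) * (1 - pq_sum / sd_squared)

scores = [
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 1],
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
]
print(round(kr20(scores), 3))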

SUBMITTED BY: Aileen B. Ferriols
SUBMITTED TO: MRS. KATHERINE PARANGAT