Characteristics of Effective Selection Techniques
An effective selection technique has four characteristics: it is reliable, valid, cost-efficient, and legally defensible.
Reliability
Reliability is the extent to which a score from a selection measure is stable and free from error. If a score from a measure is not stable or error-free, it is not useful.
1. Test-Retest Reliability – each of several people takes the same test twice. Scores from the first administration are correlated with scores from the second to determine whether they are similar; if they are, the test has temporal stability.
- The time interval should be long enough that specific test answers have not been memorized, but short enough that the person has not changed significantly; typical intervals range from 3 days to 3 months.
- The longer the time interval, the lower the reliability coefficient.
- Not appropriate for all kinds of tests, for example:
- Trait Anxiety – the amount of anxiety an individual has all the time
- State Anxiety – the amount of anxiety an individual has at any given moment
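As a rough illustration (not from the source), temporal stability can be checked by correlating the scores from the two administrations; the score arrays below are hypothetical.

```python
import numpy as np

# hypothetical scores for the same five people, tested twice 3 weeks apart
first_administration = np.array([82, 75, 91, 60, 78])
second_administration = np.array([80, 77, 88, 63, 75])

# a high Pearson correlation suggests temporal stability
r = np.corrcoef(first_administration, second_administration)[0, 1]
print(f"test-retest reliability: {r:.2f}")
```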
2. Alternate-Forms Reliability – two forms of the same test are constructed. If the scores on the two forms are similar when correlated, the test has form stability.
- The test-taking order is counterbalanced to eliminate any effects that taking one form of the test may have on scores on the second form.
- Applicants retaking the same cognitive ability test will increase their scores about twice as much as applicants taking an alternate form of the cognitive ability test.
- With knowledge tests, retaking the test will still increase test scores, but the increase is at the same level whether the second test is the same test or an alternate form of the same test.
- The time interval between the two administrations should be as short as possible; with a longer interval, a low correlation could reflect a lack of either form stability or temporal stability.
- Two forms of a test should also have the same mean and standard deviation.
- Any change in a test potentially changes its reliability, its validity, its difficulty, or all three.
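Because the two forms should correlate highly and have matching means and standard deviations, both checks can be run together; this sketch uses made-up scores.

```python
import numpy as np

# hypothetical scores of the same applicants on Form A and Form B
form_a = np.array([70, 85, 60, 92, 77, 66])
form_b = np.array([72, 83, 63, 90, 75, 68])

print(f"form stability (r): {np.corrcoef(form_a, form_b)[0, 1]:.2f}")
# the two forms should also have roughly equal means and SDs
print(f"means: {form_a.mean():.1f} vs {form_b.mean():.1f}")
print(f"SDs:   {form_a.std(ddof=1):.1f} vs {form_b.std(ddof=1):.1f}")
```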
3. Internal Reliability – the extent to which similar items are answered in similar ways is referred to as internal consistency, and it measures item stability.
- Longer tests have higher internal consistency.
- Item Homogeneity – do all the items measure the same thing, or do they measure different constructs? The more homogeneous the items, the higher the internal consistency.
- Split-Half Method – the items on the test are split into two groups, and the scores on the two halves are correlated.
- Spearman-Brown Prophecy Formula – applied to the split-half correlation to adjust for the fact that splitting the test cut the number of items in half.
- Cronbach's Coefficient Alpha and K-R 20 – more popular and accurate methods of determining internal reliability; both look at the consistency with which an applicant responds to items measuring a similar dimension or construct.
- Both are complicated to compute by hand and are typically calculated by a computer program.
- K-R 20 is used for tests containing dichotomous items.
- Coefficient alpha can be used not only for dichotomous items but also for tests containing interval and ratio items, such as five-point rating scales.
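A minimal sketch of these indices, assuming an items matrix with one row per test taker and one column per item (all names are hypothetical):

```python
import numpy as np

def split_half_reliability(items):
    """Split-half reliability with the Spearman-Brown correction."""
    odd = items[:, 0::2].sum(axis=1)   # score on the odd-numbered items
    even = items[:, 1::2].sum(axis=1)  # score on the even-numbered items
    r_half = np.corrcoef(odd, even)[0, 1]
    # Spearman-Brown prophecy formula: adjusts the correlation upward
    # because each half contains only half of the items
    return 2 * r_half / (1 + r_half)

def cronbach_alpha(items):
    """Cronbach's coefficient alpha; K-R 20 is the special case for
    dichotomous (0/1) items."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)
```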
4. Scorer Reliability – an issue in projective or subjective tests in which there is no one correct answer; even tests scored with the use of keys suffer from scorer mistakes. A test or inventory can have homogeneous items and yield stable scores and still not be reliable if the person scoring the test makes mistakes.
- Interrater Reliability – the consistency among raters when human judgment of performance is involved.
To decide whether a test demonstrates sufficient reliability, two factors must be considered: the magnitude of the reliability coefficient and the people who will be taking the test.
- To evaluate the coefficient, compare it with the reliability coefficients typically obtained for similar types of tests.
- For example, if you will be using the test for managers but the reliability coefficient in the test manual was established with high school students, you would have less confidence that the reliability coefficient would generalize well to your organization.
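One simple way to index interrater reliability (a hypothetical illustration, not the only method) is to correlate two raters' judgments of the same people:

```python
import numpy as np

# hypothetical ratings of eight applicants by two interviewers
rater_1 = np.array([4, 3, 5, 2, 4, 3, 5, 1])
rater_2 = np.array([4, 2, 5, 3, 4, 3, 4, 2])

print(f"interrater reliability (r): {np.corrcoef(rater_1, rater_2)[0, 1]:.2f}")
```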
Validity
Validity is the degree to which inferences from scores on tests or assessments are justified by the evidence.
- Just because a test is reliable does not mean it is valid.
- The potential validity of a test is limited by its reliability: an unreliable test cannot correlate strongly with anything, including the criterion it is meant to predict.
1. Content Validity – the extent to which test items sample the content they are supposed to measure.
- The appropriate content for a test or test battery is determined by the job analysis.
- The readability of a test is a good example of how tricky content validity can be.
2. Criterion Validity – the extent to which a test score is related to some measure of job performance (the criterion).
- Commonly used criteria: supervisor ratings of performance, actual measures of performance, attendance, tenure, training performance, and discipline problems.
- Concurrent Validity – the test is given to a group of employees who are already on the job, and their scores are correlated with their current performance.
- Predictive Validity – the test is administered to a group of job applicants who are going to be hired, and their scores are later correlated with a measure of their job performance.
- Because only the higher scorers are hired, the restricted range of performance scores makes obtaining a significant validity coefficient more difficult (illustrated below).
- Validity Generalization – the extent to which a test found valid for a job in one location is valid for the same job in a different location; used only if a job analysis has been conducted.
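A quick simulation (not from the source) of why range restriction matters: the same test-criterion relationship looks weaker when only the hired applicants can be observed.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
test = rng.normal(size=n)
# build a criterion that correlates about .50 with the test
criterion = 0.5 * test + np.sqrt(1 - 0.5**2) * rng.normal(size=n)

r_all = np.corrcoef(test, criterion)[0, 1]
hired = test > np.quantile(test, 0.70)   # only the top 30% get hired
r_hired = np.corrcoef(test[hired], criterion[hired])[0, 1]

print(f"validity across all applicants: {r_all:.2f}")   # about .50
print(f"validity among hires only:      {r_hired:.2f}") # noticeably lower
```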
3. Construct Validity – the extent to which a test actually measures the construct it purports to measure.
- The most theoretical of the validity types; concerned with inferences about test scores (unlike content validity, which is concerned with test construction).
- Determined by correlating scores on a test with the scores from other tests.
- Known-Group Validity – the test is given to two groups of people who are known to differ on the trait in question.
- If you hear that a test is valid, you should obtain copies of the research reports.
- Which type of validity study should be used depends on the situation and the person conducting the validity study.
- If you conduct a criterion validity study and do not get significance, that failure could be deadly if you are taken to court.
- To get a significant validity coefficient, you need a good measure of performance, a good test, and a decent sample size.
4. Face Validity – the extent to which a test appears to be job related.
- If a test or its items do not appear valid, the test-takers and administrators will not have confidence in the results.
- Face validity motivates applicants to do well on tests.
- Barnum Statements – statements so general that they can be true of almost everyone.
- Seventeenth Mental Measurements Yearbook – the most common source of test information; contains information about thousands of different psychological tests as well as reviews by test experts.
- If two or more tests have similar validities, then cost should be considered.
- Computer-Adaptive Testing – a common form of computer testing in which the items presented depend on how the applicant answered earlier items (a toy version is sketched below):
- Fewer items are required
- Less time is needed to complete the test
- Finer distinctions in applicant ability can be made
- Test-takers can receive immediate feedback
- Test scores can be interpreted not only on the number of questions answered correctly but also on which questions were answered correctly
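A toy staircase sketch of the adaptive idea (operational computer-adaptive tests use item response theory; everything here is hypothetical):

```python
def run_adaptive_test(answers_correctly, n_items=10, levels=5):
    """Minimal staircase illustration of computer-adaptive testing.
    answers_correctly(difficulty) -> bool simulates an applicant's answer."""
    difficulty = (levels + 1) // 2          # start at a middle difficulty
    history = []
    for _ in range(n_items):
        correct = answers_correctly(difficulty)
        history.append((difficulty, correct))
        # correct answers lead to harder items, misses to easier ones
        difficulty = min(levels, difficulty + 1) if correct else max(1, difficulty - 1)
    solved = [d for d, ok in history if ok]
    # crude ability estimate: mean difficulty of correctly answered items
    return sum(solved) / len(solved) if solved else 1.0

# an applicant who can reliably solve items up to difficulty 3
print(run_adaptive_test(lambda d: d <= 3))  # converges near 3.0
```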
Establishing the Usefulness of a Selection Device
1. Taylor-Russell Tables – designed to estimate the percentage of future employees who will be successful on the job if an organization uses a particular test.
- The first piece of information needed is the test's criterion validity coefficient, obtained by conducting a criterion validity study in which test scores are correlated with some measure of job performance.
- The higher the validity coefficient, the greater the possibility that the test will be useful.
- The second piece of information needed is the selection ratio:

  selection ratio = number hired / number of applicants

- The final piece of information needed is the base rate – the percentage of employees currently on the job who are considered successful. It is typically determined in one of two ways:
 a. Employees are split into two equal groups based on their scores on some criterion such as tenure or performance.
 b. A criterion measure score is chosen above which all employees are considered successful.
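With those three inputs in hand, the Taylor-Russell tables themselves are looked up in published form; the helpers below (with hypothetical numbers) just compute the two ratios.

```python
def selection_ratio(number_hired, number_of_applicants):
    """Second input to the Taylor-Russell tables."""
    return number_hired / number_of_applicants

def base_rate(successful_employees, total_employees):
    """Proportion of current employees considered successful."""
    return successful_employees / total_employees

# hypothetical: 10 openings, 100 applicants; 60 of 120 employees successful
print(selection_ratio(10, 100))  # 0.1
print(base_rate(60, 120))        # 0.5
# these two values plus the validity coefficient are used to look up the
# expected proportion of successful hires in the published tables
```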
2. Proportion of Correct Decisions – the only information needed is employees' test scores and their scores on the criterion.
- Easier than the Taylor-Russell tables, but less accurate.
- Lines are drawn from the point on the y-axis (criterion score) that represents a successful applicant and from the point on the x-axis that represents the lowest test score of a hired applicant, dividing the scatterplot into four quadrants:
 Quadrant I: employees who scored poorly on the test but performed well on the job
 Quadrant II: employees who scored well on the test and were successful on the job
 Quadrant III: employees who scored well on the test but performed poorly on the job
 Quadrant IV: employees who scored low on the test and did poorly on the job
- The proportion of correct decisions is the number of employees in Quadrants II and IV divided by the total number of employees.
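A sketch of the quadrant count, with all cutoffs and scores hypothetical:

```python
def proportion_correct_decisions(test_scores, criterion_scores,
                                 test_cutoff, criterion_cutoff):
    """Quadrants II (high test, successful) and IV (low test,
    unsuccessful) count as correct predictions."""
    correct = sum(
        (t >= test_cutoff) == (c >= criterion_cutoff)
        for t, c in zip(test_scores, criterion_scores)
    )
    return correct / len(test_scores)

# hypothetical: test cutoff 70, a criterion rating of 3+ counts as successful
print(proportion_correct_decisions([80, 65, 90, 55], [4, 4, 2, 1], 70, 3))  # 0.5
```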
3. Lawshe Tables – used to estimate the probability that a particular applicant will be successful.
- The information needed: the test's validity coefficient, the base rate, and the applicant's test score.
4. Brogden-Cronbach-Gleser Utility Formula – used to compute the amount of money an organization would save if it used the test to select employees. The utility formula estimates the monetary savings to an organization from five pieces of information:
 a. Number of employees hired per year
 b. Average tenure – the average amount of time that employees in the position tend to stay with the company
 c. Test validity – the criterion validity coefficient
 d. Standard deviation of performance in dollars – estimated from the salaries of current employees in the position in question, which are averaged
 e. Mean standardized predictor score of selected applicants – obtained by comparing the average test score of the applicants who are hired with the average test score of all applicants (hired and not hired), expressed in standard deviation units
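Multiplying the five components gives the savings estimate; the figures below are hypothetical, and the cost of testing (often subtracted at the end) is omitted from this sketch.

```python
def utility_savings(n_hired_per_year, avg_tenure_years, validity,
                    sd_performance_dollars, mean_std_predictor_score):
    """Brogden-Cronbach-Gleser savings estimate: the product of the
    five components listed above. (Cost of testing is omitted here.)"""
    return (n_hired_per_year * avg_tenure_years * validity
            * sd_performance_dollars * mean_std_predictor_score)

# hypothetical: 10 hires/year staying 2 years, validity .40,
# SD of performance in dollars $10,000, mean standardized score 1.0
print(utility_savings(10, 2, 0.40, 10_000, 1.0))  # 80000.0
```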
Determining the Fairness of a Test
1. Bias – refers to the technical aspects of a test; a test is biased if there are group differences in test scores that are unrelated to the construct being measured.
2. Fairness – can include bias, but also includes political and social issues. A test is fair if people with an equal probability of success on a job have an equal chance of being hired. I/O psychologists agree that a test is fair if it can predict performance equally well for all races, genders, and national origins.
3. The first step in determining a test's potential bias is finding out whether it will result in adverse impact, which occurs if the selection rate for any group is less than 80% of the selection rate of the highest-scoring group and the difference is statistically significant. Adverse impact is assessed by comparing the hiring rates of two groups (see the sketch below).
- Three criteria for a minimum qualification: it must be needed to perform the job, it must be formally identified and communicated prior to the start of the selection process, and it must be consistently applied.
4. The organization might also determine whether the test has single-group validity – the test significantly predicts performance for one group and not for others.
- To test for single-group validity, separate correlations are computed between the test and the criterion for each group; if both are significant, the test passes this fairness hurdle.
- Single-group validity is very rare, is usually the result of small sample sizes, and typically occurs by chance.
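A sketch of the 80% (four-fifths) comparison, with hypothetical counts; as noted above, a statistical significance test is also required before concluding adverse impact.

```python
def adverse_impact(hired_a, applicants_a, hired_b, applicants_b):
    """Four-fifths (80%) rule: flag when the lower selection rate is
    less than 80% of the higher one."""
    rate_a = hired_a / applicants_a
    rate_b = hired_b / applicants_b
    low, high = sorted([rate_a, rate_b])
    return low / high < 0.80

# hypothetical: 50 of 100 group-A applicants hired vs. 30 of 100 in group B
print(adverse_impact(50, 100, 30, 100))  # True: 0.60 < 0.80
```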
5. Differential Validity – the test is valid for two groups, but it is more valid for one than for the other.
- Differential validity usually occurs in occupations dominated by a single sex.
- If differential validity occurs, the organization can (1) not use the test, or (2) use the test with separate regression equations for each group.
Making the Hiring Decision
If more than one criterion-valid test is used, the scores on the tests must be combined, usually through multiple regression.
A. Unadjusted Top-Down Selection – applicants are rank-ordered on the basis of their test scores.
- Selection is then made by starting with the highest score and moving down until all openings have been filled.
- The organization will gain the most utility from this approach.
- This approach can result in high levels of adverse impact and reduces an organization's flexibility to use nontest factors.
- Compensatory Approach – the assumption is that if multiple test scores are used, a low score on one test can be compensated for by a high score on another.
- Multiple regression is used to determine how much a score on one test can compensate for a score on another (a sketch follows).
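A minimal sketch of the compensatory, regression-weighted composite; all scores and ratings are hypothetical.

```python
import numpy as np

# hypothetical data: two test scores per employee and a performance rating
X = np.array([[70.0, 80.0], [60.0, 95.0], [90.0, 65.0],
              [85.0, 85.0], [55.0, 70.0]])
y = np.array([3.4, 3.6, 3.5, 4.2, 2.5])

# least-squares regression weights, with an intercept column
A = np.column_stack([np.ones(len(X)), X])
weights, *_ = np.linalg.lstsq(A, y, rcond=None)

# compensatory prediction: a strong score on test 2 can offset a weak
# score on test 1 in the weighted composite
applicant = np.array([1.0, 58.0, 97.0])  # [intercept, test 1, test 2]
print(applicant @ weights)
```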
B. Rule of Three – the names of the top three scorers are given to the person making the hiring decision, who then chooses among the three.
C. Passing Scores – a means of reducing adverse impact and increasing flexibility.
- The organization determines the lowest score on a test that is associated with acceptable performance on the job.
- Passing scores allow the organization to reach affirmative action goals.
- The most common method of determining a passing score is to require job experts to read each item on a test and estimate the percentage of minimally qualified employees who could answer the item correctly (a sketch follows this list).
- Multiple-Cutoff Approach – the applicants are administered all of the tests at one time; costly, because every applicant takes every test even if they fail one of them.
- Multiple-Hurdle Approach – used to reduce the costs associated with applicants failing one or more tests; the applicant is administered one test at a time. May bring unintended adverse impact, and affirmative action goals may not be met.
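A sketch of that expert-judgment method (an Angoff-style approach; the estimates below are hypothetical): each expert estimates, per item, the proportion of minimally qualified test takers who would answer correctly, and the averaged estimates are summed into the expected raw score used as the cutoff.

```python
import numpy as np

def passing_score(expert_estimates):
    """Each row: one expert's estimated proportion of minimally
    qualified employees answering each item correctly. The passing
    score is the summed average estimate, i.e. the expected raw
    score of a minimally qualified test taker."""
    return np.asarray(expert_estimates, dtype=float).mean(axis=0).sum()

# three experts rating a four-item test
print(passing_score([[0.9, 0.7, 0.5, 0.6],
                     [0.8, 0.6, 0.4, 0.7],
                     [0.9, 0.8, 0.5, 0.5]]))  # ~2.63 of 4 items
```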
D. Banding – attempts to hire the top scorers while still allowing some flexibility for affirmative action.
- Banding takes into account the degree of error associated with any test score.
- Standard Error – used to determine how many points apart two applicants' scores must be before the scores are considered significantly different (see the sketch below).
- Banding can result in lower utility and may not actually reduce adverse impact.
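A sketch of a common banding computation, assuming the band is built from the standard error of measurement (all numbers hypothetical):

```python
import math

def band_width(sd, reliability, z=1.96):
    """Width of a test-score band. SEM = SD * sqrt(1 - reliability);
    the standard error of the difference between two scores is
    SEM * sqrt(2), and z sets the confidence level (1.96 -> 95%)."""
    sem = sd * math.sqrt(1 - reliability)
    return z * sem * math.sqrt(2)

# e.g. SD = 10, reliability = .90 -> scores within ~8.8 points of the
# top score are treated as not significantly different
print(round(band_width(10, 0.90), 1))
```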