research methods for business, descriptive statistics

MonaHashim6 30 views 104 slides Jul 29, 2024

About This Presentation

research methods for business


Slide Content

Research Methods, Design, and Analysis Thirteenth Edition Chapter 15 Descriptive Statistics Copyright © 2020, 2014, 2011 Pearson Education, Inc. All Rights Reserved

Descriptive Statistics

Learning Objectives 15.1 Describe the purpose of descriptive statistics. 15.2 Explain the concept of a frequency distribution. 15.3 Differentiate among the types of graphic representations of data and when they should be used. 15.4 Calculate the mean, median, and mode of a data set. 15.5 Calculate the variance and standard deviation of a data set. 15.6 Summarize the techniques used to determine relationships among variables.

Field of Statistics (1 of 2) Two broad categories Descriptive statistics Inferential statistics Figure 15.1 Major divisions of the field of statistics.

Field of Statistics (2 of 2) Descriptive statistics The type of statistical analysis focused on describing, summarizing, or explaining a set of data Allows you to make sense of your set of data and to make the key characteristics easily understandable to others Inferential statistics The type of statistical analysis focused on making inferences about populations based on sample data Subdivided into Estimation Point estimation Interval estimation Hypothesis testing

Let's Begin... In this chapter, we will explain descriptive statistical analysis. In Chapter 16, we will explain inferential statistical analysis. We assume no prior knowledge of the material. Both chapters are written so that everyone can understand the material. Discussion requires very little mathematical background. Focus on showing you What statistical procedures to select to understand your data How to interpret and communicate your results Before moving to the next section please read Exhibit 15.1 to see why you must always conduct your statistical analyses intelligently.

Exhibit 15.1 (1 of 6) Simpson’s Paradox Demonstrate how statistical analysis, if not conducted properly, can deceive people. Example is based on a real case of purported gender discrimination at the University of California, Berkeley, several decades ago. Written up in Science (Bickel, 1975) Data shown below refer to men and women admitted to graduate school in the Department of Psychology at a hypothetical university.

Exhibit 15.1 (2 of 6) Simpson’s Paradox Combined or “Aggregated” Results
         Number Applied   Number Admitted   Percentage Admitted
Men      180              99                55
Women    100              45                45
55% of the men who applied to this department were admitted to graduate school. Only 45% of the women who applied were admitted. Assume that their qualifications were the same.

Exhibit 15.1 (3 of 6) Simpson’s Paradox If this were the case, you might conclude that gender discrimination has occurred because men had a much higher rate of acceptance than women. Assume that the 280 students applying to the Psychology Department applied to two different graduate programs. Doctoral program in clinical psychology Doctoral program in experimental psychology The researcher decides to break down the data separately for each program and obtains the two tables shown next.

Exhibit 15.1 (4 of 6) Simpson’s Paradox Results Separated by Program (“Disaggregated Results”)
Clinical Psychology Program
         Number Applied   Number Admitted   Percentage Admitted
Men      60               9                 15
Women    60               12                20
Experimental Psychology Program
         Number Applied   Number Admitted   Percentage Admitted
Men      120              90                75
Women    40               32                80
What do you see in these two program tables? Women (not men) had the higher acceptance rates in both degree programs! If there is any discrimination, it is in favor of the women applicants.

Exhibit 15.1 (5 of 6) Simpson’s Paradox What’s going on? The overall/combined data suggested one conclusion. When the data were more carefully analyzed (they were “disaggregated” in the clinical and experimental program tables), a completely different conclusion became apparent. How could it be that opposite conclusions are suggested in the two exhibits based on the same data? A statistical phenomenon known as Simpson’s paradox Women tended to apply to the program that was harder to get into Men tended to apply to the program that was easier to get into Aggregated data produced one conclusion Disaggregated data produced the opposite and more accurate conclusion.
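
The arithmetic behind the paradox can be checked with a short Python sketch. It uses only the counts from the two program tables; the variable and function names are illustrative. Note that summing the program tables gives 44 women admitted (44%), one fewer than the 45 shown in the aggregated table, so the aggregated women's rate printed here differs slightly from the exhibit. The reversal itself is unaffected.

```python
# Counts from the program tables: program -> group -> (applied, admitted)
applications = {
    "clinical": {"men": (60, 9), "women": (60, 12)},
    "experimental": {"men": (120, 90), "women": (40, 32)},
}

def rate(applied, admitted):
    """Percentage of applicants who were admitted."""
    return 100 * admitted / applied

# Disaggregated rates: one rate per program and group
by_program = {
    program: {group: rate(*counts) for group, counts in groups.items()}
    for program, groups in applications.items()
}

# Aggregated rates: add the counts across programs first, then compute the rate
aggregated = {}
for group in ("men", "women"):
    applied = sum(applications[p][group][0] for p in applications)
    admitted = sum(applications[p][group][1] for p in applications)
    aggregated[group] = rate(applied, admitted)

print(by_program)   # women have the higher rate within each program
print(aggregated)   # men have the higher rate once the programs are combined
```

Women lead in each program, yet men lead overall, because most men applied to the program with the high admission rate and most women applied to the program with the low one.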

Exhibit 15.1 (6 of 6) Simpson’s Paradox Moral of this story Be cautious when you examine and interpret descriptive data. Always look at the data critically and in multiple ways until you are able to draw the most warranted conclusion.

Descriptive Statistics (1 of 4) 15.1 Describe the Purpose of Descriptive Statistics Data set - a set of data, where the rows are “cases” and the columns are “variables” The researcher uses descriptive statistics to understand and summarize the key numerical characteristics of the data set. Example Calculate the averages of your treatment and control group scores in an experiment. If you conducted a survey, you might want to know the frequencies of the responses for each question. Want to use graphs to pictorially communicate some of your results

Descriptive Statistics (2 of 4) 15.1 Describe the Purpose of Descriptive Statistics In the next chapter (inferential statistics) Will learn how to determine if the difference between the treatment and control groups means is statistically significant If other observed results are statistically significant In this chapter Focus on taking whatever set of data you currently have and showing how to summarize the key characteristics of the data Key question in descriptive statistics How can I communicate the important characteristics of my data? One way would be to supply a printout of all of your data, but that would be very inefficient.

Descriptive Statistics (3 of 4) 15.1 Describe the Purpose of Descriptive Statistics Data set in Table 15.1 will be used in several places in this chapter. “College graduate data set.” Hypothetically say Data came from a survey research study Conducted with 25 recent college graduates You asked participants Starting salaries, undergraduate GPA, college major (you only surveyed three majors), gender, the SAT scores they had when they entered college, number of days they believe they missed during college Goal in this survey research study To determine what variables predicted the starting salaries of psychology, philosophy, and business majors

Table 15.1 (1 of 3) Hypothetical Data Set for Nonexperimental Research for 25 Recent College Graduates Four quantitative variables Salary GPA SAT scores Days of school missed Two categorical variables College major Gender Standard format Cases in rows Variables in columns

Table 15.1 (2 of 3) Hypothetical Data Set for Nonexperimental Research for 25 Recent College Graduates
Person   Salary   GPA   Major   Gender   SAT     Days Missed
1        24,000   2.5   1       0        1,110   36
2        25,000   2.5   1       0        1,100   26
3        27,500   3.0   1       0        1,300   31
4        28,500   2.4   2       1        1,100   18
5        30,500   3.0   2       0        1,150   26
6        30,500   2.9   2       1        1,130   18
7        31,000   3.1   1       0        1,180   16
8        31,000   3.3   1       0        1,160   11
9        31,500   2.9   2       0        1,170   25
10       32,000   3.6   1       0        1,250   12
11       32,000   2.6   1       1        1,230   26
12       32,500   3.1   2       0        1,130   21
13       32,500   3.2   2       1        1,200   17
14       32,500   3.0   3       1        1,150   14

Table 15.1 (3 of 3) Hypothetical Data Set for Nonexperimental Research for 25 Recent College Graduates
Person   Salary   GPA   Major   Gender   SAT     Days Missed
15       33,000   3.7   1       0        1,260   29
16       33,500   3.1   2       1        1,170   21
17       33,500   2.7   2       1        1,140   22
18       34,500   3.0   3       0        1,240   14
19       35,500   3.1   3       0        1,330   16
20       36,500   3.5   2       1        1,220   0
21       37,500   3.4   3       1        1,150   4
22       38,500   3.2   2       0        1,270   10
23       38,500   3.0   3       1        1,300   0
24       40,500   3.3   3       1        1,280   5
25       41,500   3.5   3       1        1,330   2
Note: For the categorical variable “major,” 1 = psychology, 2 = philosophy, and 3 = business. For the categorical variable “gender,” 0 = male and 1 = female.

Descriptive Statistics (4 of 4) 15.1 Describe the Purpose of Descriptive Statistics Enter data into a spreadsheet such as Excel (which can be used by a statistical program such as SPSS) SPSS We used the popular statistical program SPSS for most of the analyses in this and the next chapter. Most universities provide access to SPSS or another statistical program in their computer labs.

Frequency Distributions 15.2 Explain the Concept of a Frequency Distribution Frequency distribution - data arrangement in which the frequency of each unique data value is shown First column shows the unique data values for the variable. Second column shows the frequencies for each of these values. Third column shows the percentages. Example - Table 15.2 Variable - starting salary Lowest salary is $24,000. Highest is $41,500. Most frequently occurring salary is $32,500. Three of the 25 recent graduates had this starting salary. 4% of the 25 cases had a salary of $24,000. 8% of the cases had a salary of $32,000.
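
As a rough illustration, the frequency distribution described above can be built in a few lines of Python. This is a sketch rather than the SPSS output the chapter refers to: the salary values are copied from Table 15.1 and the variable names are illustrative.

```python
from collections import Counter

# Starting salaries of the 25 graduates, copied from Table 15.1
salaries = [
    24000, 25000, 27500, 28500, 30500, 30500, 31000, 31000, 31500,
    32000, 32000, 32500, 32500, 32500, 33000, 33500, 33500, 34500,
    35500, 36500, 37500, 38500, 38500, 40500, 41500,
]

counts = Counter(salaries)   # unique value -> frequency
n = len(salaries)

# One row per unique value: (value, frequency, percentage of all cases)
frequency_table = [
    (value, freq, 100 * freq / n) for value, freq in sorted(counts.items())
]

for value, freq, pct in frequency_table:
    print(f"{value:>7,}  {freq:>2}  {pct:5.1f}%")
```

The three columns printed correspond to the data values, their frequencies, and their percentages.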

Graphic Representations of Data (1 of 9) 15.3 Differentiate Among the Types of Graphic Representations of Data and When They Should Be Used Graphs Pictorial representations of data Can be used for one or more variables Used to help communicate the nature of data Example Program evaluators often include graphs in their reports because their clients often like to see graphic representations of the data.

Graphic Representations of Data (2 of 9) 15.3 Differentiate Among the Types of Graphic Representations of Data and When They Should Be Used Bar Graphs Graph that uses vertical bars to represent the data values of a categorical variable Figure 15.2 A bar graph of undergraduate major.

Graphic Representations of Data (3 of 9) 15.3 Differentiate Among the Types of Graphic Representations of Data and When They Should Be Used Figure 15.2 Bar graph of the categorical variable college major Horizontal axis shows the three categories in the variable. Frequencies of each category are shown on the vertical axis. Bars provide graphical representations of the frequencies of the three majors. 8 psychology majors 10 philosophy majors 7 business majors Can easily convert these numbers into percentages 32% were psychology majors (8 divided by 25). 40% were philosophy majors (10 divided by 25). 28% were business majors (7 divided by 25).

Graphic Representations of Data (4 of 9) 15.3 Differentiate Among the Types of Graphic Representations of Data and When They Should Be Used Histograms Graph depicting frequencies and distribution of a quantitative variable A presentation of a frequency distribution in bar format Advantage over a frequency distribution More clearly shows the shape of the distribution Histogram for starting salary in Figure 15.3 In contrast to bar graphs, the bars in histograms are placed next to each other with no space in between.

Figure 15.3 Histogram of Starting Salary

Graphic Representations of Data (5 of 9) 15.3 Differentiate Among the Types of Graphic Representations of Data and When They Should Be Used Line Graphs A graph relying on the drawing of one or more lines connecting data points A useful way to graphically depict the distribution of a quantitative variable Line graph of starting salary in Figure 15.4 Useful to visually show and aid in the interpretation of interaction effects Figure 15.4 Line graph of starting salary.

Graphic Representations of Data (6 of 9) 15.3 Differentiate Among the Types of Graphic Representations of Data and When They Should Be Used Line Graph interaction example Conduct an experiment to test a new social skills training program. Pretest–posttest control group design DV = the number of appropriate social interactions IV = social skills training (training versus no training) Data shown in Table 15.3 Some results of this hypothetical experiment are shown in Figure 15.5.

Table 15.3 (1 of 2) Hypothetical Data Set for Experimental Research Study Examining the Effectiveness of Social Skills Training
Person   Pretest Scores   Treatment Condition   Posttest Scores
1        3                1                     4
2        4                1                     4
3        2                1                     3
4        1                1                     2
5        1                1                     2
6        0                1                     0
7        2                1                     2
8        4                1                     4
9        4                1                     4
10       3                1                     4
11       2                1                     3
12       5                1                     5
13       3                1                     3
14       3                1                     3
15       2                2                     4
16       3                2                     5

Table 15.3 (2 of 2) Hypothetical Data Set for Experimental Research Study Examining the Effectiveness of Social Skills Training
Person   Pretest Scores   Treatment Condition   Posttest Scores
17       1                2                     2
18       2                2                     4
19       1                2                     2
20       2                2                     4
21       2                2                     3
22       3                2                     5
23       5                2                     6
24       2                2                     4
25       4                2                     2
26       4                2                     5
27       2                2                     4
28       5                2                     6
Note: Pretest = number of appropriate interactions at the beginning of the experiment; posttest = number of appropriate interactions after the experimental intervention; treatment condition = 1 for control group (did not receive social skills training) and 2 for treatment group (did receive social skills training).

Figure 15.5 Line Graph of Results from Pretest–Posttest Control Group Design Studying Effectiveness of Social Skills Treatment

Graphic Representations of Data (7 of 9) 15.3 Differentiate Among the Types of Graphic Representations of Data and When They Should Be Used Line Graph interaction example Both groups started low on the number of appropriate skills they exhibited. At the end of the study After the treatment group received social skills training The participants in the treatment group have higher scores than the participants in the control group. The number of appropriate social skills Increased for the treatment group No (or very little) increase for the control group Treatment seems to work. You must also determine if the result is statistically significant.

Graphic Representations of Data (8 of 9) 15.3 Differentiate Among the Types of Graphic Representations of Data and When They Should Be Used Scatterplots A graphical depiction of the relationship between two quantitative variables Dependent variable on the vertical axis Independent or predictor variable on the horizontal axis Dots within the graph represent the cases (i.e., participants) in the data set. Scatterplot of the two quantitative variables grade point average and starting salary in Figure 15.6 Appears to be a positive relationship between GPA and starting salary As GPA increases, starting salary also increases.

Figure 15.6 Scatterplot of Starting Salary by College GPA (Positive Relationship)

Graphic Representations of Data (9 of 9) 15.3 Differentiate Among the Types of Graphic Representations of Data and When They Should Be Used Scatterplots Positive relationship The data values tend to start at the bottom left side of the graph and end at the top right side. Scatterplot of days of school missed during college and starting salary is shown in Figure 15.7. Appears to be a negative relationship between days missed and starting salary As days missed increases, starting salary decreases. Negative relationship The data values tend to start at the top left side of the graph and end at the lower right side.

Figure 15.7 Scatterplot of Starting Salary by Days Missed (Negative Relationship)

Measures of Central Tendency (1 of 7) 15.4 Calculate the Mean, Median, and Mode of a Data Set Measure of central tendency Numerical value expressing what is typical of the values of a quantitative variable One of the most important ways to describe and understand data Example College G P A is the value expressing what is typical for your grades. Three most common measures of central tendency Mode The median The mean.

Measures of Central Tendency (2 of 7) 15.4 Calculate the Mean, Median, and Mode of a Data Set Mode The most frequently occurring number Most basic, and the crudest, measure of central tendency Example 0, 2, 3, 4, 5, 5, 5, 7, 8, 8, 9, 10 mode is 5 occurs three times If there is a tie for the most frequently occurring number Need to report both Point out that the data for the variable are bimodal.

Measures of Central Tendency (3 of 7) 15.4 Calculate the Mean, Median, and Mode of a Data Set Mode Practice Determine the mode for the following set of numbers 1, 2, 2, 5, 5, 7, 10, 10, 10 If you said 10, then you are right The mode in this case is not a very good indicator of the central tendency of the data. If the data are normally distributed Most people fall toward the center of the distribution of numbers. The mode works much better than in this case In practice, research psychologists rarely use the mode.

Measures of Central Tendency (4 of 7) 15.4 Calculate the Mean, Median, and Mode of a Data Set Median The center point in an ordered set of numbers Odd number of numbers the median is the middle number example 1, 2, 3, 4, 5 median is 3 Even number of numbers The median is the average of the two centermost numbers Example 1, 2, 3, 4 Median is 2.5 (i.e., the average of 2 and 3 is 2.5)

Measures of Central Tendency (5 of 7) 15.4 Calculate the Mean, Median, and Mode of a Data Set Median An interesting property of the median is that it is not affected by the size of the highest and lowest numbers Example The median of 1, 2, 3, 4, 5 is the same as the median of 1, 2, 3, 4, 500 In both cases the median is 3!

Measures of Central Tendency (6 of 7) 15.4 Calculate the Mean, Median, and Mode of a Data Set Mean The arithmetic average The average of 1, 2, and 3 = 2 Psychologists sometimes refer to the mean as $\bar{X}$ (called X bar) Our formula for getting the mean: $\bar{X} = \frac{\sum X}{n}$ X stands for the variable you are using n is the number of numbers you have $\sum$ is a sum sign (add up the numbers that follow it)

Measures of Central Tendency (7 of 7) 15.4 Calculate the Mean, Median, and Mode of a Data Set Mean Simple case where the three values of our variable are 1, 2, and 3: $\bar{X} = \frac{1 + 2 + 3}{3} = \frac{6}{3} = 2$ Psychologists frequently calculate the means for the groups that they want to compare, e.g., the mean performance level for treatment and control groups Figure 15.5 Each of the four points in the graph is a group mean Means for the treatment and the control groups at the pretest Means for these two groups at the posttest
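
The three measures of central tendency can be reproduced with Python's standard `statistics` module. The sketch below reuses the number sets from the slides; the bimodal list and the variable names are illustrative assumptions, not part of the chapter.

```python
import statistics

# Mode: the most frequently occurring value
scores = [0, 2, 3, 4, 5, 5, 5, 7, 8, 8, 9, 10]
mode_value = statistics.mode(scores)                     # 5 occurs three times

# Practice set from the slides: the mode sits at the extreme, not the center
practice_mode = statistics.mode([1, 2, 2, 5, 5, 7, 10, 10, 10])

# multimode reports every value tied for most frequent (hypothetical bimodal data)
tied_modes = statistics.multimode([1, 1, 2, 3, 3])

# Median: middle value, or the average of the two centermost values
median_odd = statistics.median([1, 2, 3, 4, 5])
median_even = statistics.median([1, 2, 3, 4])

# The median ignores the size of the extreme value; the mean does not
median_outlier = statistics.median([1, 2, 3, 4, 500])
mean_outlier = statistics.mean([1, 2, 3, 4, 500])

# Mean: sum of the values divided by how many there are
mean_small = statistics.mean([1, 2, 3])

print(mode_value, practice_mode, tied_modes)
print(median_odd, median_even, median_outlier, mean_outlier, mean_small)
```

Comparing `median_outlier` with `mean_outlier` shows why the median is often preferred when a few extreme values are present.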

Measures of Variability (1 of 11) 15.5 Calculate the Variance and Standard Deviation of a Data Set Also important to find out how much your data values are spread out Variability Numerical value expressing how spread out or how much variation is present in the values of a quantitative variable If all of the data values for a variable were the same, then there is no variability. example- 4, 4, 4, 4, 4, 4, 4, 4, 4, 4 Variability in these numbers 1, 2, 3, 3, 4, 4, 4, 6, 8, 10 The more different your numbers, the more variability you have.

Measures of Variability (2 of 11) 15.5 Calculate the Variance and Standard Deviation of a Data Set Which of the following sets of data have the most variability present? Group one: 44, 45, 45, 45, 46, 46, 47, 47, 48, 49 Group two: 34, 37, 45, 51, 58, 60, 77, 88, 90, 98 The data for group two have more variability than group one. Homogeneous - little variability in scores in a group Heterogeneous - a lot of variability in scores in a group Three of the types of variability Range Variance Standard deviation

Measures of Variability (3 of 11) 15.5 Calculate the Variance and Standard Deviation of a Data Set Range The highest number minus the lowest number The simplest measure of variability, but also the most crude Formula Range = H – L H is the highest number L is the lowest number Example Data for group one shown in the previous section Range is equal to 5 (49 − 44) Range for group two Range is 64 (98 − 34) Crude index of variability because it takes into account only two numbers
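
A minimal sketch of the range formula, Range = H – L, applied to the two groups of scores given earlier (the function name is illustrative):

```python
group_one = [44, 45, 45, 45, 46, 46, 47, 47, 48, 49]
group_two = [34, 37, 45, 51, 58, 60, 77, 88, 90, 98]

def value_range(values):
    """Highest value minus lowest value (Range = H - L)."""
    return max(values) - min(values)

range_one = value_range(group_one)
range_two = value_range(group_two)
print(range_one, range_two)
```

Only the two extreme values enter the calculation, which is why the range is described as a crude index.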

Measures of Variability (4 of 11) 15.5 Calculate the Variance and Standard Deviation of a Data Set Variance and Standard Deviation Two most popular measures of variability Superior to the range because they take into account all of the data values for a variable Both provide information about the dispersion or variation around the mean value of a variable Variance - the average deviation of data values from their mean in squared units Is popular because it has nice mathematical properties

Measures of Variability (5 of 11) 15.5 Calculate the Variance and Standard Deviation of a Data Set Variance and Standard Deviation Standard deviation - the square root of the variance Turns the variance into more meaningful units An approximate indicator of the average distance that your data values are from their mean Example if you have a mean of 5 a standard deviation of 2 data values tend to be approximately 2 units above or below 5 For the variance and the standard deviation The larger the value, the greater the data are spread out The smaller the value, the less the data are spread out How to calculate the variance and standard deviation in Table 15.4

Table 15.4 (1 of 3) Calculating the Variance and Standard Deviation
        (1)             (2)         (3)               (4)
        $X$             $\bar{X}$   $X - \bar{X}$     $(X - \bar{X})^2$
        2               6           −4                16
        4               6           −2                4
        6               6           0                 0
        8               6           2                 4
        10              6           4                 16
Sums    $\sum X = 30$                                 $\sum (X - \bar{X})^2 = 40$

Table 15.4 (2 of 3) Calculating the Variance and Standard Deviation Steps:
1. Insert your data values in the X column.
2. Calculate the mean of the values in column 1, and place this value in column 2. In our example, the mean is 6.
3. Subtract the values in column 2 from the values in column 1, and place these into column 3.
4. Square the numbers in column 3 (i.e., multiply the number by itself), and place these in column 4. (Note: You can ignore the minus signs in column 3 because a negative number multiplied by a negative number produces a positive number.)

Table 15.4 (3 of 3) Calculating the Variance and Standard Deviation Steps:
5. Insert the appropriate values into the following formula for the variance: Variance $= \frac{\sum (X - \bar{X})^2}{n}$, where $\sum (X - \bar{X})^2$ is the sum of the numbers in column 4, and n is the number of numbers. In this example, the variance $= \frac{40}{5} = 8$.
6. The standard deviation is the square root of the variance. In this example, the variance is 8 (see step 5), and the standard deviation is 2.83 (i.e., the square root of 8 = 2.83).
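
The steps in Table 15.4 can be followed column by column in Python. This is a sketch with illustrative variable names. It divides by n, as the worked example does; many statistical packages instead divide by n − 1 (the sample variance), which the last line shows for comparison.

```python
import math
import statistics

x = [2, 4, 6, 8, 10]                             # column 1: the data values
n = len(x)

mean = sum(x) / n                                # column 2: the mean (6)
deviations = [value - mean for value in x]       # column 3: X minus the mean
squared = [d ** 2 for d in deviations]           # column 4: squared deviations

variance = sum(squared) / n                      # step 5: 40 / 5
sd = math.sqrt(variance)                         # step 6: square root of the variance

print(sum(squared), variance, round(sd, 2))

# Cross-check against the standard library
population_variance = statistics.pvariance(x)   # divides by n, as in Table 15.4
sample_variance = statistics.variance(x)        # divides by n - 1 instead
print(population_variance, sample_variance)
```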

Measures of Variability (6 of 11) 15.5 Calculate the Variance and Standard Deviation of a Data Set Standard Deviation and the Normal Curve If the data were fully normally distributed, the standard deviation would have additional meaning. Examine the standard normal distribution in Figure 15.8 The normal curve or normal distribution has a bell shape. It is high in the middle and it tapers off to the left and the right.

Figure 15.8 Areas Under the Normal Distribution (z scores: −3, −2, −1, 0, 1, 2, 3; corresponding percentile ranks: 0.1, 2, 16, 50, 84, 98, 99.9)

Measures of Variability (7 of 11) 15.5 Calculate the Variance and Standard Deviation of a Data Set Standard Deviation and the Normal Curve If the data were fully normally distributed You would be able to apply the 68, 95, 99.7 percent rule 68% of the cases fall within one standard deviation from the mean. 95% fall within two standard deviations. 99.7% fall within three standard deviations. It is important to understand sample data are never fully normally distributed. Can be called the theoretical normal distribution. The normal distribution also has many applications in more advanced statistics courses.
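
The 68, 95, 99.7 percent rule can be checked against the theoretical normal distribution with `statistics.NormalDist` from the Python standard library. The helper name `share_within` is an illustrative assumption.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)   # mean 0, standard deviation 1

def share_within(k):
    """Proportion of cases within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {100 * share_within(k):.1f}%")

# Percentile rank for each whole-number z score from -3 to 3
percentile_ranks = {z: 100 * standard_normal.cdf(z) for z in range(-3, 4)}
print(percentile_ranks)
```

The printed shares are the exact theoretical values that the rule rounds to 68, 95, and 99.7. Real sample data will only approximate them.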

Measures of Variability (8 of 11) 15.5 Calculate the Variance and Standard Deviation of a Data Set z scores A score that has been transformed into standard deviation units Transformed from their original “raw scores” into a new “standardized” metric Mean of zero and a standard deviation of one Data values now can be interpreted in terms of how far they are from their mean If a data value is +1.00, we can say that this value falls one standard deviation above the mean. A value of +2.00 falls two standard deviations above the mean A value of -1.5 falls one and a half standard deviations below the mean “Standardized units” or “z scores” were used with the normal curve just shown in Figure 15.8.

Measures of Variability (9 of 11) 15.5 Calculate the Variance and Standard Deviation of a Data Set Formula $z = \frac{X - \bar{X}}{SD}$ To use this formula Convert raw scores to z scores Need to know the mean and standard deviation

Measures of Variability (10 of 11) 15.5 Calculate the Variance and Standard Deviation of a Data Set Example Set of scores - 2, 4, 6, 8, 10 Mean = 6 SD = 2.83 Convert 10 to a z score: $z = \frac{10 - 6}{2.83} = 1.413$ Convert 2 to a z score: $z = \frac{2 - 6}{2.83} = -1.413$

Measures of Variability (11 of 11) 15.5 Calculate the Variance and Standard Deviation of a Data Set Negative sign indicates that the number is below the mean. All of the z scores for our set of five numbers −1.413, −.707, 0, +.707, +1.413 The average of these numbers is zero. Key point You can take any set of numbers. Convert the numbers to z scores. They will always have a mean of zero and a standard deviation of one. Helps psychologists when They want to compare scores across different variables and different data sets. They want to know how far a data value falls above or below the mean.
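
A short sketch of the z-score conversion for the five scores used above. It uses the unrounded standard deviation (the square root of 8), so the extreme z scores come out as ±1.414 rather than the ±1.413 obtained with the rounded value 2.83. Variable names are illustrative.

```python
import statistics

scores = [2, 4, 6, 8, 10]
mean = statistics.mean(scores)       # 6
sd = statistics.pstdev(scores)       # square root of 8, about 2.83 (divides by n)

# z = (raw score - mean) / standard deviation
z_scores = [(x - mean) / sd for x in scores]
print([round(z, 3) for z in z_scores])

# The converted scores have a mean of zero and a standard deviation of one
print(statistics.mean(z_scores), statistics.pstdev(z_scores))
```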

Examining Relationships Among Variables (1 of 31) 15.6 Summarize the Techniques Used to Determine Relationships Among Variables Rarely is a psychologist interested in a single variable. Typically are interested in determining whether IVs and DVs are related Use IVs to “explain variance” in DVs Determining what IVs predict or cause changes in DVs is perhaps the primary goal of science. Practitioners can apply this knowledge to produce changes in the world. Use new psychotherapy techniques to reduce mental illness To determine how to predict who is “at risk” for future problems so that early interventions can be started

Examining Relationships Among Variables (2 of 31) 15.6 Summarize the Techniques Used to Determine Relationships Among Variables Describe several approaches used to examine relationships among two or more variables. Vast majority of the time DV in psychological research is a quantitative variable. e.g., Response time, performance level, level of stress Most of the indexes of relationship described here are used for quantitative DVs. Will explain one exception in which you have a categorical DV and a categorical IV

Examining Relationships Among Variables (3 of 31) 15.6 Summarize the Techniques Used to Determine Relationships Among Variables Unstandardized and Standardized Difference Between Group Means Example, in our college graduate data set Mean (i.e., the average) starting salary for males is $34,791.67 Mean starting salary for females is $31,269.23 Unstandardized difference between these two means “There appears to be a sizable relationship between gender and starting salary such that males have higher salaries than females” The difference between the means is often transformed into a standardized measure.

Examining Relationships Among Variables (4 of 31) 15.6 Summarize the Techniques Used to Determine Relationships Among Variables Cohen’s d The difference between two means in standard deviation units One of many effect size indicators Effect size indicator Index of magnitude or strength of a relationship or difference between means

Examining Relationships Among Variables (5 of 31) 15.6 Summarize the Techniques Used to Determine Relationships Among Variables Cohen’s d formula $d = \frac{M_1 - M_2}{SD}$ M1 is the mean for group 1 M2 is the mean for group 2 SD is the standard deviation of either group Traditionally it’s the control group’s standard deviation in an experiment. Some researchers prefer a pooled standard deviation. Rough starting point for interpreting d d = .2 as “small” d = .5 as “medium” d = .8 as “large”

Examining Relationships Among Variables (6 of 31) 15.6 Summarize the Techniques Used to Determine Relationships Among Variables Calculate Cohen’s d to compare the average male and female incomes Gender is the categorical IV Starting salary is the quantitative DV Mean starting salary Males = $34,791.67 Females = $31,269.23 Unstandardized difference between the means is $3,522.44 Standard deviation for females = $4,008.40 d = 3,522.44 / 4,008.40 = .88 Mean starting salary for males is .88 standard deviations above the mean for females By the criteria for interpretation, this is a “large” difference between the means
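
The salary comparison can be reproduced from the summary figures on this slide. In the sketch below, the function name `cohens_d` is illustrative, and the female standard deviation is used as the divisor, as the slide does.

```python
def cohens_d(mean_1, mean_2, sd):
    """Difference between two means expressed in standard deviation units."""
    return (mean_1 - mean_2) / sd

mean_males = 34791.67      # mean starting salary for males (from the slide)
mean_females = 31269.23    # mean starting salary for females (from the slide)
sd_females = 4008.40       # standard deviation for females (from the slide)

unstandardized = mean_males - mean_females
d = cohens_d(mean_males, mean_females, sd_females)
print(round(unstandardized, 2), round(d, 2))
```

A value above .8 falls in the "large" band of the rough interpretation guide.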

Exhibit 15.2 (1 of 4) Using Cohen’s d in a Pretest–Posttest Control-Group Experimental Research Design IV = treatment and control conditions Purpose - treatment to improve the social skills of the participants DV = number of appropriate interactions in a 1-hour observation session (pretest and posttest) Figure 15.5 Pretest and posttest means for the treatment and control groups It appears that the treatment worked After the intervention the social skills performance of the treatment group improved quite a bit more than that for the control group At the pretest, the two groups’ means were similar. Suggesting that random assignment to the groups worked well

Exhibit 15.2 (2 of 4) Using Cohen’s d in a Pretest–Posttest Control-Group Experimental Research Design Calculate Cohen’s d for pretest and posttest means Pretest mean for the treatment group (M1) = 2.71 Pretest mean for the control group (M2) = 2.64 Standard deviation (SD) for the control group = 1.39 Posttest mean for the treatment group (M1) = 4.00 Posttest mean for the control group (M2) = 3.07 Standard deviation (SD) for the control group = 1.27

Exhibit 15.2 (3 of 4) Using Cohen’s d in a Pretest–Posttest Control-Group Experimental Research Design Interpreting these data The difference between the means was very small at the pretest Standardized mean difference (Cohen’s d) = .05 The treatment group mean was only .05 of a standard deviation larger than the control group mean The posttest Cohen’s d was .73 Indicates the treatment group mean was .73 standard deviation units above the control group mean A moderately large difference
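
The pretest and posttest effect sizes can be recomputed from the raw scores in Table 15.3. The sketch below rests on two assumptions: person 6's blank pretest and posttest cells are read as 0 (which reproduces the reported control-group pretest mean of 2.64), and the standard deviations are sample standard deviations (dividing by n − 1), which is what matches the reported 1.39 and 1.27. Variable and function names are illustrative.

```python
import statistics

# Scores from Table 15.3; person 6 is assumed to have scored 0 on both tests
control_pre = [3, 4, 2, 1, 1, 0, 2, 4, 4, 3, 2, 5, 3, 3]
control_post = [4, 4, 3, 2, 2, 0, 2, 4, 4, 4, 3, 5, 3, 3]
treatment_pre = [2, 3, 1, 2, 1, 2, 2, 3, 5, 2, 4, 4, 2, 5]
treatment_post = [4, 5, 2, 4, 2, 4, 3, 5, 6, 4, 2, 5, 4, 6]

def cohens_d(treatment, control):
    """Mean difference divided by the control group's sample standard deviation."""
    difference = statistics.mean(treatment) - statistics.mean(control)
    return difference / statistics.stdev(control)

d_pre = cohens_d(treatment_pre, control_pre)
d_post = cohens_d(treatment_post, control_post)

print(round(statistics.mean(treatment_pre), 2), round(statistics.mean(control_pre), 2))
print(round(statistics.mean(treatment_post), 2), round(statistics.mean(control_post), 2))
print(round(d_pre, 2), round(d_post, 2))
```

The groups are nearly identical at the pretest and clearly separated at the posttest, which is the pattern plotted in Figure 15.5.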

Exhibit 15.2 (4 of 4) Using Cohen’s D in a Pretest–Posttest Control-Group Experimental Research Design Although the results just presented appear to support the efficacy of the social skills training, we still cannot trust this experimental finding. Problem - the observed differences between the means might represent nothing more than random or chance fluctuation in the data. In the next chapter on inferential statistics, we will check to see if this difference is statistically significant.

Examining Relationships Among Variables (7 of 31) 15.6 Summarize the Techniques Used to Determine Relationships Among Variables Correlation Coefficient When you have a quantitative DV and a quantitative IV, you need to obtain either A correlation coefficient Or a regression coefficient Correlation coefficient - index indicating the strength and direction of linear relationship between two quantitative variables A numerical index ranging from −1.00 to +1.00 Absolute size of the number indicates the strength Sign (positive or negative) indicates the direction of relationship Endpoints, −1.00 and +1.00, stand for “perfect” correlations Strongest possible correlations Zero indicates no correlation

Figure 15.9 Strength and Direction of a Correlation Coefficient

Examining Relationships Among Variables (8 of 31) 15.6 Summarize the Techniques Used to Determine Relationships Among Variables “Which correlation is stronger, +.20 or +.70?” The latter is stronger because +.70 is farther away from zero. “Which of these correlations is stronger, +.20 or −.70?” The latter, because −.70 is farther away from zero. “Which correlation is stronger, +.50 or −.70?” The latter, because −.70 is farther from zero. When judging the relative strength of two correlation coefficients, ignore the sign and determine which number is farther from zero.
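
The "ignore the sign" rule amounts to comparing absolute values. A minimal sketch, using a hypothetical helper named `stronger` and the three pairs from the slide:

```python
def stronger(r1, r2):
    """Return whichever correlation is farther from zero (sign ignored)."""
    return r1 if abs(r1) > abs(r2) else r2


# The three comparisons posed on the slide.
for pair in [(0.20, 0.70), (0.20, -0.70), (0.50, -0.70)]:
    print(pair, "-> stronger:", stronger(*pair))
```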

Examining Relationships Among Variables (9 of 31) 15.6 Summarize the Techniques Used to Determine Relationships Among Variables Negative correlation Correlation in which values of two variables tend to move in opposite directions Example The more hours students spend partying the night before an exam, the lower their test grades tend to be Positive correlation Correlation in which values of two variables tend to move in the same direction Example The more hours students spend studying for a test, the higher their test grades tend to be.

Examining Relationships Among Variables (10 of 31) 15.6 Summarize the Techniques Used to Determine Relationships Among Variables Checkpoint questions: “Is the correlation between education and income positive or negative?” Positive, because the two variables tend to move in the same direction. “Is the correlation between empathy and aggression positive or negative?” Negative, because people with more empathy tend to be less aggressive.

Examining Relationships Among Variables (11 of 31) 15.6 Summarize the Techniques Used to Determine Relationships Among Variables Scatterplots are used to visually determine the direction of correlations. Figure 15.6 - scatterplot of college GPA and starting salary: As college GPA increases, starting salary also tends to increase. The correlation coefficient is +.61, a moderately strong positive correlation. Figure 15.7 - scatterplot of days missed during college and starting salary: As the number of days missed during college increases, starting salary tends to decrease. The correlation coefficient is −.81, a strong negative correlation.

Figure 15.10 Correlations of Different Strengths and Directions

Examining Relationships Among Variables (12 of 31) 15.6 Summarize the Techniques Used to Determine Relationships Among Variables Pearson correlation coefficient: Works only if your data are linearly related. Curvilinear relationship - a nonlinear (curved) relationship between two quantitative variables. If you calculate the Pearson correlation coefficient on a curved relationship, it generally will tell you that your variables are not related when in fact they are related, and you would draw an incorrect conclusion about the relationship. Figure 15.11 A curvilinear relationship.
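
To see why the Pearson coefficient can miss a curved relationship, consider a small made-up data set in which Y is exactly the square of X. The data and the helper `pearson_r` below are illustrative assumptions, not taken from the slides. The helper uses z scores with the standard deviation computed by dividing by n.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson r as the mean cross product of z scores (SD divides by n)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return sum(
        ((x - mean_x) / sd_x) * ((y - mean_y) / sd_y) for x, y in zip(xs, ys)
    ) / n


# Hypothetical U-shaped data: Y is perfectly determined by X, but not linearly.
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]

r_curved = pearson_r(xs, ys)
print("r for the curved data is approximately zero:", abs(r_curved) < 1e-9)
```

Even though Y is fully predictable from X here, the linear index comes out at essentially zero, which is the incorrect conclusion the slide warns about.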

Exhibit 15.3 (1 of 7) How to Calculate the Pearson Correlation Coefficient Earlier we showed how to obtain z scores. A z score tells you how far a data value is from the mean of its variable, in standard deviation units. Example: A z score of +2.00 says that the score is two SDs above the mean; a z score of −2.00 says the score is two SDs below the mean. To use the following formula for calculating the correlation coefficient, first convert your IV (X) and DV (Y) data values to z scores: \( r = \frac{\sum z_X z_Y}{n} \), where \( z_X \) = z score of the value of the X or IV, \( z_Y \) = z score of the value of the Y or DV, and n = number of cases.

Exhibit 15.3 (2 of 7) How to Calculate the Pearson Correlation Coefficient Positive relationship Some cases have low X and low Y values. Some have high X and high Y values. Pattern provides a positive value for the numerator of the formula. (a) Positive correlation

Exhibit 15.3 (3 of 7) How to Calculate the Pearson Correlation Coefficient Negative relationship Some cases have low X and high Y values. Some have high X and low Y values. Pattern provides a negative value for the numerator of the formula. (b) Negative correlation

Exhibit 15.3 (4 of 7) How to Calculate the Pearson Correlation Coefficient Researchers do not calculate correlation coefficients by hand these days. It is helpful to calculate the correlation coefficient once to get a better feel for how the numerical value is produced. Table showing how to calculate the correlation between two variables At the end of the chapter, we list a practice exercise where you can apply this procedure to obtain your own correlation coefficient.

Exhibit 15.3 (5 of 7) How to Calculate the Pearson Correlation Coefficient Step 1. Convert the X and Y variable scores to z scores. We already obtained the z scores for the X variable when we introduced the concept of z scores. Here are those z scores: −1.413, −.707, 0, +.707, +1.413. Using that same procedure, here are the z scores for variable Y: −1.750, −.343, +.453, +.453, +1.187. Step 2. Calculate the sum of the cross products of the z scores. A three-column procedure works well for this step (z scores for X, z scores for Y, and their cross products). Step 3. Divide the sum of the third column by the number of cases (i.e., n).

Exhibit 15.3 (6 of 7) How to Calculate the Pearson Correlation Coefficient

Exhibit 15.3 (7 of 7) How to Calculate the Pearson Correlation Coefficient Correlation between hours spent studying (X) and test grades (Y) is +.943 The two variables are very strongly correlated. As the number of hours spent studying increases, so do test grades.
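
The three steps of Exhibit 15.3 can be sketched in a few lines, starting from the z scores listed in Step 1. One assumption: the second Y value is read as −.343, since the decimal point appears to have been lost in the listing. The variable names are illustrative.

```python
# z scores from Step 1 (the second Y value is assumed to be -.343).
z_x = [-1.413, -0.707, 0.0, 0.707, 1.413]
z_y = [-1.750, -0.343, 0.453, 0.453, 1.187]

# Step 2: third column = cross products of the paired z scores, then sum them.
cross_products = [zx * zy for zx, zy in zip(z_x, z_y)]
sum_of_cross_products = sum(cross_products)

# Step 3: divide the sum by the number of cases, n.
n = len(z_x)
r = sum_of_cross_products / n

print(f"sum of cross products = {sum_of_cross_products:.3f}")
print(f"r = {r:.3f}")
```

Rounded to three decimals, this matches the +.943 reported for hours spent studying and test grades.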

Examining Relationships Among Variables (13 of 31) 15.6 Summarize the Techniques Used to Determine Relationships Among Variables Partial Correlation Coefficient The correlation between two quantitative variables controlling for one or more variables Widely used in areas of psychology where the use of experiments for some research questions is difficult e.g., Personality, social, and developmental psychology Good, strong theory is required to use partial correlation analysis. The researcher must know the variable(s) that he or she needs to control for.

Examining Relationships Among Variables (14 of 31) 15.6 Summarize the Techniques Used to Determine Relationships Among Variables Partial correlation example: In applied social psychology, consider the relationship between the number of hours spent viewing or playing violence and the number of aggressive acts performed. You want to control for variables such as personality type, school grades, exposure to violence in the family, and exposure to violence in the neighborhood. Built on Bandura, Ross, & Ross (1963), classic experimental research showing that children act aggressively after being exposed to an adult model acting aggressively.

Examining Relationships Among Variables (15 of 31) 15.6 Summarize the Techniques Used to Determine Relationships Among Variables The value of the partial correlation coefficient indicates the strength and direction of the relationship between two variables after controlling for the influence of one or more other variables. Just like the Pearson correlation coefficient, the partial correlation coefficient has a range of −1.00 to +1.00, where zero indicates there is no relationship and the sign indicates the direction of the relationship. Key difference: The partial correlation coefficient indicates the linear relationship between two variables after controlling for another variable.

Examining Relationships Among Variables (16 of 31) 15.6 Summarize the Techniques Used to Determine Relationships Among Variables Researchers use statistical programs to calculate partial correlation coefficients. If you are curious how to calculate the partial correlation coefficient (or the regression coefficients discussed in the next section), we recommend Cohen, Cohen, West, and Aiken (2003) and Keith (2019). It is called the “partial” correlation coefficient because the technique statistically removes, or “partials out,” the influence of the other variables that are statistically controlled for.
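
For readers who want a feel for what "partialling out" does, the sketch below uses the standard first-order partial correlation formula, which controls for a single third variable Z. The slides do not present this formula, and the three input correlations are hypothetical values chosen only for illustration.

```python
from math import sqrt


def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of X and Y, controlling for Z.

    Standard textbook formula; not taken from the slides.
    """
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical correlations (assumed values, not data from the chapter).
r_xy = 0.50  # e.g., violent media exposure with aggressive acts
r_xz = 0.40  # exposure with the control variable
r_yz = 0.30  # aggressive acts with the control variable

r_xy_given_z = partial_correlation(r_xy, r_xz, r_yz)
print(f"zero-order r = {r_xy:.2f}, partial r = {r_xy_given_z:.2f}")
```

With these made-up inputs the partial correlation is somewhat smaller than the zero-order correlation, because part of the X and Y association is shared with Z.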

Examining Relationships Among Variables (17 of 31) 15.6 Summarize the Techniques Used to Determine Relationships Among Variables Regression Analysis: When all variables are quantitative, the technique called regression analysis is often appropriate. Regression analysis - use of one or more quantitative IVs to explain or predict the values of a single quantitative DV. Two main types of regression analysis: Simple regression - regression analysis with one DV and one IV. Multiple regression - regression analysis with one DV and two or more IVs. Regression equation - the equation that defines a regression line. Regression line - the line of “best fit” based on a regression equation. Regression analysis can be used with curvilinear data, but we only discuss linear relationships in this text.

Figure 15.12 Regression Line Showing the Relationship Between GPA and Starting Salary

Examining Relationships Among Variables (18 of 31) 15.6 Summarize the Techniques Used to Determine Relationships Among Variables Figure 15.12 - scatterplot of college GPA and starting salary with the regression line inserted. Two important characteristics of a line: Slope - tells you how steep the line is. Y-intercept - the point at which a regression line crosses the Y (vertical) axis. Regression equation: \( \hat{Y} = b_0 + b_1 X_1 \), where \( \hat{Y} \) (called Y-hat) is the predicted value of the DV, \( b_0 \) is the Y-intercept, \( b_1 \) is the slope (it’s called the regression coefficient), and \( X_1 \) is the single IV.

Examining Relationships Among Variables (19 of 31) 15.6 Summarize the Techniques Used to Determine Relationships Among Variables Regression equation for the regression line shown in Figure 15.12: \( \hat{Y} = 9{,}405.55 + 7{,}687.48(X_1) \). The DV (Y) is starting salary. The IV (\( X_1 \)) is GPA. Researchers rarely, if ever, calculate the regression equation by hand! The Y-intercept is $9,405.55; this is the predicted starting salary if a person had a GPA of 0.

Examining Relationships Among Variables (20 of 31) 15.6 Summarize the Techniques Used to Determine Relationships Among Variables Regression coefficient - the slope, or change in Y given a one-unit change in X. The regression coefficient or slope in our example is $7,687.48. Starting salary is expected to increase by $7,687.48 for every one-unit increase in GPA, or decrease by $7,687.48 for every one-unit decrease in GPA. Example: A student with a 3 on the GPA variable (i.e., a B) is predicted to start at a salary of $7,687 more than a student with a 2 (i.e., a C). We used the traditional grading scale (A = 4, B = 3, C = 2, D = 1, F = 0).

Examining Relationships Among Variables (21 of 31) 15.6 Summarize the Techniques Used to Determine Relationships Among Variables The regression equation can be used to obtain predicted values for the DV for specific values of the IV. Example: Let’s see what the predicted starting salary is for a student with a college GPA of 3 (i.e., a B average). Insert a 3 into the equation and solve it: 9,405.55 + 7,687.48(3) = 32,467.99, so the expected starting salary is $32,467.99. For someone with a C average (i.e., a GPA value of 2), insert a 2 into the equation and solve it: 9,405.55 + 7,687.48(2) = 24,780.51, so the predicted starting salary is $24,780.51. Notice that the difference between the starting salary for someone with a C and a B is equal to the value of the regression coefficient: $32,467.99 − $24,780.51 = $7,687.48.
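
The two predictions above can be reproduced by plugging GPA values into the simple regression equation, using the Y-intercept ($9,405.55) and slope ($7,687.48) stated in the slides. The function name `predict_salary` is a hypothetical helper for this sketch.

```python
INTERCEPT = 9405.55  # Y-intercept: predicted salary at a GPA of 0
SLOPE = 7687.48      # regression coefficient: change in salary per GPA point


def predict_salary(gpa):
    """Predicted starting salary (Y-hat) for a given GPA."""
    return INTERCEPT + SLOPE * gpa


salary_b = predict_salary(3)  # B average
salary_c = predict_salary(2)  # C average

print(f"B average: ${salary_b:,.2f}")
print(f"C average: ${salary_c:,.2f}")
print(f"Difference: ${salary_b - salary_c:,.2f}")
```

The difference between the two predictions equals the slope, which is what a one-unit change in the IV means in a simple regression.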

Examining Relationships Among Variables (22 of 31) 15.6 Summarize the Techniques Used to Determine Relationships Among Variables Multiple regression: The multiple regression equation includes one regression coefficient for each IV. Useful difference between simple and multiple regression: A multiple regression coefficient shows the relationship between the DV and the IV, controlling for the other IVs in the equation. This is analogous to the idea discussed earlier with partial correlation. A multiple regression coefficient is called the partial regression coefficient.

Examining Relationships Among Variables (23 of 31) 15.6 Summarize the Techniques Used to Determine Relationships Among Variables Useful difference between simple and multiple regression: Simple regression is analogous to a Pearson correlation, which does not control for any confounding variables. Multiple regression provides one way that you can control for one or more variables. Another difference lies in the actual values of the correlation and regular (unstandardized) regression coefficients: Correlation coefficients are in standardized units that vary from −1.00 to +1.00, whereas regular regression coefficients are in natural units.

Examining Relationships Among Variables (24 of 31) 15.6 Summarize the Techniques Used to Determine Relationships Among Variables Multiple regression example: The partial correlation coefficient expressing the relationship between starting salary and GPA, controlling for SAT scores, is .413. The partial regression coefficient is $4,788.90. Controlling for SAT scores, each unit change in GPA is predicted to lead to a $4,788.90 change in starting salary. Using the data from our hypothetical college student data set, we used SPSS to provide the following multiple regression equation, based on the DV of starting salary and the IVs of GPA and high school SAT: \( \hat{Y} = b_0 + 4{,}788.90(X_1) + 25.56(X_2) \), where \( b_0 \) is the Y-intercept, \( X_1 \) = GPA, and \( X_2 \) = high school SAT.

Examining Relationships Among Variables (25 of 31) 15.6 Summarize the Techniques Used to Determine Relationships Among Variables The first partial regression coefficient in the preceding regression equation is $4,788.90. After controlling for SAT scores, starting salary increases by $4,788.90 for each one-unit increase in GPA. The second partial regression coefficient is $25.56. After controlling for GPA, starting salary increases by $25.56 for each one-unit increase in SAT.

Examining Relationships Among Variables (26 of 31) 15.6 Summarize the Techniques Used to Determine Relationships Among Variables Obtain a predicted starting salary from our multiple regression equation by inserting the values for GPA and SAT and solving for Y-hat. For a B student (GPA of 3) with a score of 1100 on the SAT, the predicted starting salary is $30,047.11.
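
The multiple regression prediction can be sketched the same way. One caveat: the Y-intercept is not legible in the text above, so the code backs it out from the worked example (GPA of 3, SAT of 1100, predicted salary of $30,047.11). Treat that intercept as an inferred value rather than one quoted from the slides. The names `predict_salary_multiple` and `implied_intercept` are illustrative.

```python
B_GPA = 4788.90  # partial regression coefficient for GPA
B_SAT = 25.56    # partial regression coefficient for high school SAT

# Assumption: intercept inferred from the worked example, not quoted directly.
implied_intercept = 30047.11 - B_GPA * 3 - B_SAT * 1100


def predict_salary_multiple(gpa, sat):
    """Predicted starting salary (Y-hat) from GPA and SAT."""
    return implied_intercept + B_GPA * gpa + B_SAT * sat


b_student = predict_salary_multiple(gpa=3, sat=1100)
a_student = predict_salary_multiple(gpa=4, sat=1100)  # same SAT, one more GPA point

print(f"Implied intercept: {implied_intercept:,.2f}")
print(f"B student, SAT 1100: ${b_student:,.2f}")
print(f"Effect of one GPA point, SAT held constant: ${a_student - b_student:,.2f}")
```

Holding SAT fixed and raising GPA by one point changes the prediction by exactly the GPA coefficient, which is what "controlling for SAT" means for a partial regression coefficient.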

Examining Relationships Among Variables (27 of 31) 15.6 Summarize the Techniques Used to Determine Relationships Among Variables Contingency Tables: When you have a categorical DV and a categorical IV, construct a contingency table (also called a cross-tabulation). Contingency table - a table used to examine the relationship between categorical variables. A two-dimensional contingency table has two variables: rows represent the categories of one of the variables, and columns represent the categories of the other variable. Various types of information can be placed into the cells of a contingency table: cell frequencies, cell percentages, row percentages, and column percentages.

Table 15.5 Personality Type by Gender Contingency Tables

(a) Contingency Table Showing Cell Frequencies (hypothetical data)

Personality type    Female    Male
Type A               2,972   2,460
Type B               1,921     971
Total                4,893   3,431

(b) Contingency Table Showing Column Percentages (based on the data in part (a))

Personality type    Female    Male
Type A               60.7%   71.7%
Type B               39.3%   28.3%
Total                 100%    100%

Examining Relationships Among Variables (28 of 31) 15.6 Summarize the Techniques Used to Determine Relationships Among Variables Contingency Tables: The column variable is gender (i.e., female or male). The row variable is personality type. Type-A personality: likely to be impatient, competitive, irritable, high achieving, engage in multitasking, and feel a sense of urgency. Type-B personality: likely to be cooperative, less competitive, more relaxed, more patient, more satisfied, and easygoing. Research question: Is there a relationship between gender and personality type? Does gender seem to predict personality type? Do you think that women tend to be type A more than men tend to be type A? It is very difficult to determine how the two variables are related based on cell frequencies alone.

Examining Relationships Among Variables (29 of 31) 15.6 Summarize the Techniques Used to Determine Relationships Among Variables Table 15.5(b) Calculated what are called column percentages for females and males Type A personality column percentages 60.7% of females were type A. 71.7% of the men were type A. Men had a greater rate of type-A personality than women. Type-B personality column percentages 39.3% for females 28.3% for men Women have a higher rate of type-B personality than men.
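
The column percentages in Table 15.5(b) can be recomputed from the cell frequencies in part (a): each cell is divided by its column total. The dictionary layout and the helper `column_percentages` below are illustrative choices, not part of the slides.

```python
# Cell frequencies from Table 15.5(a), keyed by column (gender) then row (type).
counts = {
    "Female": {"Type A": 2972, "Type B": 1921},
    "Male": {"Type A": 2460, "Type B": 971},
}


def column_percentages(table):
    """Convert each column's cell frequencies to percentages of that column's total."""
    result = {}
    for column, cells in table.items():
        column_total = sum(cells.values())
        result[column] = {
            row: round(100 * n / column_total, 1) for row, n in cells.items()
        }
    return result


percentages = column_percentages(counts)
for row in ("Type A", "Type B"):
    # Percentages were calculated down the columns, so compare across the row.
    print(row, {col: f"{percentages[col][row]}%" for col in percentages})
```

Reading across the Type A row shows the higher rate for men, a comparison that the raw frequencies alone obscure because the two groups differ in size.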

Examining Relationships Among Variables (30 of 31) 15.6 Summarize the Techniques Used to Determine Relationships Among Variables We recommend that you make your predictor variable (IV) the column variable and your DV the row variable, then calculate column percentages and compare the rates across the rows. In order to correctly read a contingency table, you need to remember these two simple rules. If the percentages are calculated down the columns, then compare across the rows. If the percentages are calculated across the rows, then compare down the columns.

Examining Relationships Among Variables (31 of 31) 15.6 Summarize the Techniques Used to Determine Relationships Among Variables Rates are frequently reported in The news Some types of research (e.g., epidemiology) More advanced research Add another (a third) I V Construct the two-way table.

Copyright This work is protected by United States copyright laws and is provided solely for the use of instructors in teaching their courses and assessing student learning. Dissemination or sale of any part of this work (including on the World Wide Web) will destroy the integrity of the work and is not permitted. The work and materials from it should never be made available to students except by instructors using the accompanying text in their classes. All recipients of this work are expected to abide by these restrictions and to honor the intended pedagogical purposes and the needs of other instructors who rely on these materials.