Module 4-CORRELATION REGRESSION.pptx vvvb

manangupta10c, 46 slides, Oct 13, 2025


Slide 2: Correlation and Regression
This section is focused on correlation and regression.
What is Correlation?
Two or more variables are considered to be related, in a statistical context, if their values change together: as the value of one variable increases or decreases, so does the value of the other variable (although possibly in the opposite direction). For example, the two variables "hours worked" and "income earned" are related if an increase in hours worked is associated with an increase in income earned. Likewise, for the two variables "price" and "purchasing power", as the price of goods increases, a person's ability to buy these goods decreases (assuming a constant income).
Correlation is a statistical measure (expressed as a number) that describes the size and direction of a relationship between two or more variables. A correlation between variables, however, does not automatically mean that the change in one variable causes the change in the values of the other variable.

Slide 3: What is Correlation? (continued)
The word correlation is used in everyday life to denote some form of association. We might say that we have noticed a correlation between foggy days and attacks of wheeziness. In statistical terms, however, we use correlation to denote association between two quantitative variables. We also assume that the association is linear: one variable increases or decreases by a fixed amount for a unit increase or decrease in the other. The other technique often used in these circumstances is regression, which involves estimating the best straight line to summarise the association.

Slide 4: What is Correlation? (continued)
A positive (or direct) correlation refers to change in the values of the variables in the same direction. In other words, if the values of the variables vary (i.e., increase or decrease) in the same direction, the correlation is referred to as positive correlation. A negative (or inverse) correlation refers to change in the values of the variables in opposite directions: as one increases, the other decreases.

Slide 5: Scatter Diagram
The scatter diagram method is a quick, at-a-glance method of determining whether there is an apparent relationship between two variables. A scatter diagram (or graph) can be obtained on graph paper by plotting observed (or known) pairs of values of the variables x and y, taking the independent variable values on the x-axis and the dependent variable values on the y-axis.

Slide 6: Scatter Plot (X-Y Graph) (continued)
The scatter diagram graphs pairs of numerical data, with one variable on each axis, to look for a relationship between them. If the variables are correlated, the points will fall along a line or curve. The stronger the correlation, the more tightly the points will hug the line. This cause-analysis tool is considered one of the seven basic quality tools.
WHEN TO USE A SCATTER DIAGRAM
• When you have paired numerical data
• When your dependent variable may have multiple values for each value of your independent variable
• When trying to determine whether the two variables are related, for example when trying to identify potential root causes of problems
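To make the idea concrete without any plotting library, here is a minimal Python sketch that draws a scatter diagram as a grid of characters. The function name `text_scatter`, its `width`/`height` parameters and the hours/income figures are illustrative assumptions, not part of the slides; in practice a plotting package would normally be used. As on the slide, the independent variable runs along the horizontal axis and the dependent variable up the vertical axis.

```python
def text_scatter(xs, ys, width=40, height=12):
    """Return a character-grid scatter diagram of paired (x, y) data.

    The independent variable x runs left to right and the dependent
    variable y runs bottom to top; each observed pair is drawn as '*'.
    """
    if not xs or len(xs) != len(ys):
        raise ValueError("x and y must be non-empty and of equal length")
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    x_span = (x_max - x_min) or 1  # avoid dividing by zero for constant data
    y_span = (y_max - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y - y_min) / y_span * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 of the grid is the top edge
    return ["|" + "".join(cells) for cells in grid] + ["+" + "-" * width]


# Hypothetical paired data: hours worked (x) and income earned (y)
hours = [2, 4, 5, 7, 8, 10]
income = [150, 260, 300, 420, 470, 590]
for line in text_scatter(hours, income, width=30, height=8):
    print(line)
```

Points rising from bottom left to top right suggest a positive correlation; a pattern falling from top left to bottom right suggests a negative one.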

Slide 7: Correlation and Regression
[Figure: scatter diagrams labelled "Straight line regression line" and "Not a straight line regression line".]
This slide discusses the meaning of positive, negative, and no correlation.

Slide 8: Correlation and Regression
A linear correlation implies a constant change in one of the variable values with respect to a change in the corresponding values of the other variable. In a non-linear correlation there is no linear relationship: the change in one variable per unit change in the other is not constant. This slide discusses the meaning of non-linear correlation and positive correlation.

Slide 9: Correlation Coefficient
The degree of association is measured by a correlation coefficient, denoted by r. It is sometimes called Pearson's correlation coefficient after its originator, and it is a measure of linear association. Karl Pearson's coefficient of correlation is given by the following formula:
r = Σ(x − x̄)(y − ȳ) / √[Σ(x − x̄)² × Σ(y − ȳ)²]
where x̄ and ȳ are the means of x and y. The application of the formula is discussed in slides 11-16.
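A short Python sketch of Pearson's r, computed from deviations about the means: the sum of products of deviations divided by the square root of the product of the two sums of squared deviations. The function name `pearson_r` is illustrative, and the two small datasets in the usage lines are hypothetical, chosen only to show a perfect positive and a perfect negative correlation.

```python
import math


def pearson_r(xs, ys):
    """Karl Pearson's correlation coefficient r for paired data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of products and squares of deviations from the means
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0: both rise together (positive)
print(pearson_r([1, 2, 3], [3, 2, 1]))        # -1.0: one rises, other falls (negative)
```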


Slide 11: Correlation and Regression
Solution given on the next slide.

Slide 12: Correlation and Regression
Production is taken as the x variable, and the number of unemployed is taken as the y variable.


Slide 14: Karl Pearson's Coefficient of Correlation (Grouped and Ungrouped)
Solution given on the next slide.

Slide 15: Karl Pearson's Coefficient of Correlation (Grouped and Ungrouped)

Slide 16: Spearman's Rank Correlation Coefficient
This method of finding the correlation coefficient between two variables was developed by the British psychologist Charles Edward Spearman in 1904. It is applied to measure the association between two variables when only ordinal (or rank) data are available. In other words, it is used in situations where a quantitative measure of qualitative factors such as judgement, brand personalities, TV programmes, leadership, colour, or taste cannot be fixed, but the individual observations can be arranged in a definite order. The method involves ranking the values of each variable.

Slide 17: Spearman's Rank Correlation Coefficient
With the help of rank correlation, you can find an association between two distinguishing traits. The rank correlation coefficient assesses the strength of the relationship between two rankings by measuring how similar they are. There are two possible scenarios:
• Rank correlation when ranks are not repeated
• Rank correlation when ranks are repeated
When no ranks are repeated, ranking is straightforward. It is harder to assign ranks to two or more items with the same value (i.e., a tie). In these circumstances, each tied item is assigned the average of the ranks the items would otherwise have obtained. For example, if two people are ranked equal in seventh place, each is given the rank (7 + 8)/2 = 7.5, and the next rank will be 9. If three people are ranked equal in seventh place, each is given the rank (7 + 8 + 9)/3 = 8, and the next rank will be 10.
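The averaging rule for tied ranks can be sketched in Python as follows. The helper name `average_ranks` is hypothetical; it ranks in ascending order (smallest value gets rank 1) and gives every member of a tied group the mean of the positions the group occupies, exactly as in the seventh-place examples.

```python
def average_ranks(values):
    """Rank values in ascending order (smallest value gets rank 1).

    Tied values all receive the average of the positions they would
    otherwise occupy, e.g. two values tied for 7th place both get 7.5.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Grow the group while the next sorted value equals this one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + 1 + end + 1) / 2  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


# Hypothetical scores in which two people tie for seventh place
print(average_ranks([1, 2, 3, 4, 5, 6, 9, 9, 10]))
# [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.5, 7.5, 9.0]
```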


Slide 20: Spearman's Rank Correlation Coefficient
There are two cases: one in which ranks are repeated, and one in which ranks are not repeated. Accordingly, there are two formulas. The formula for the case when ranks are not repeated is:
R = 1 − 6Σd² / [n(n² − 1)]    (formula 1, when ranks are not repeated)
where d = R1 − R2 is the difference between the two ranks of each pair and n is the number of pairs.

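Slide 21: Spearman's Rank Correlation Coefficient
The formula for the case when ranks are repeated is:
R = 1 − 6[Σd² + (m1³ − m1)/12 + (m2³ − m2)/12 + …] / [n(n² − 1)]    (formula 2, when ranks are repeated)
where m1, m2, … are the numbers of times each repeated value occurs.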
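Both Spearman formulas fit in one small Python function, sketched below under the assumption that Σd² and the tie-group sizes have already been worked out by hand. With no ties it is R = 1 − 6Σd²/[n(n² − 1)]; with ties, each group of m equal values adds (m³ − m)/12 to Σd². The name `spearman_r` and its `tie_sizes` parameter are hypothetical.

```python
def spearman_r(sum_d2, n, tie_sizes=()):
    """Spearman's rank correlation from the sum of squared rank differences.

    With no repeated ranks (tie_sizes empty) this is formula 1:
        R = 1 - 6*sum_d2 / (n*(n**2 - 1))
    With repeated ranks (formula 2), each group of m equal values adds a
    correction term (m**3 - m)/12 to sum_d2 before the division.
    """
    correction = sum((m ** 3 - m) / 12 for m in tie_sizes)
    return 1 - 6 * (sum_d2 + correction) / (n * (n ** 2 - 1))


print(spearman_r(8, 5))                # no ties: sum d^2 = 8, n = 5
print(spearman_r(4.5, 7, (2, 3, 2)))   # ties m1 = 2, m2 = 3, m3 = 2, n = 7
```

The two calls use the Σd² values worked out in the two examples of this section.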

Slide 22: Spearman's Rank Correlation Coefficient
Find Spearman's rank correlation coefficient for the following data:
x:  12   17   22   27   31
y:  113  119  117  115  121
This example is based on formula 1 (when ranks are not repeated).

Slide 23: Spearman's Rank Correlation Coefficient
Find Spearman's rank correlation coefficient for the data given below. R1 and R2 are the ranks of x and y respectively.

x     y     R1   R2   d = R1 − R2   d²
12    113   1    1     0            0
17    119   2    4    −2            4
22    117   3    3     0            0
27    115   4    2     2            4
31    121   5    5     0            0
                              Σd² = 8

How do we calculate ranks? Either ascending or descending order may be used, provided the same order is used for both variables. Here we use ascending order for both x and y. For example, in the x column, 12 is the smallest number, so it has rank 1; 17, the next higher number, has rank 2. This process is followed until all the elements of x are ranked. Ranks for y (R2) are assigned in the same way.

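Slide 24: Spearman's Rank Correlation Coefficient
In the table on the previous slide none of the ranks are repeated, so we apply the formula for ranks not repeated (formula 1):
R = 1 − 6Σd² / [n(n² − 1)] = 1 − (6 × 8) / [5(5² − 1)] = 1 − 48/120 = 0.6

Interpretation of R:
0.1 < R < 0.29    Low correlation
0.3 < R < 0.49    Moderate correlation
0.5 < R < 0.99    High correlation
R = 1             Perfect correlation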
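The whole no-ties example can be reproduced from the raw data with a short Python sketch. The helper `ascending_ranks` is hypothetical and assumes there are no repeated values (it simply looks up each value's position in the sorted list).

```python
def ascending_ranks(values):
    """Ranks in ascending order (smallest = 1); assumes no repeated values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


x = [12, 17, 22, 27, 31]
y = [113, 119, 117, 115, 121]

r1 = ascending_ranks(x)          # R1
r2 = ascending_ranks(y)          # R2
d2 = [(a - b) ** 2 for a, b in zip(r1, r2)]
n = len(x)
R = 1 - 6 * sum(d2) / (n * (n ** 2 - 1))   # formula 1 (no repeated ranks)

print("R2 =", r2, " sum d^2 =", sum(d2), " R =", round(R, 3))
```

Running it gives R2 = [1, 4, 3, 2, 5] and Σd² = 8, matching the table, and R = 0.6, which falls in the "high correlation" band.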

Slide 25: Spearman's Rank Correlation Coefficient (Rank Repeated Case; example for formula 2)
Find Spearman's rank correlation coefficient for the data given below.

x    y    R1    R2    d = R1 − R2   d²
10   15   1     1      0            0
12   19   2     2      0            0
18   25   5.5   4      1.5          2.25
18   30   5.5   6.5   −1            1
15   25   3     4     −1            1
17   25   4     4      0            0
40   30   7     6.5    0.5          0.25
                             Σd² = 4.5

As discussed earlier, we begin by calculating rank R1. Positions 1, 2, 3, and 4 are assigned to the numbers 10, 12, 15, and 17 respectively. The two 18s would occupy positions 5 and 6, so each is given the average (5 + 6)/2 = 5.5. The value of m1 is 2, as the number 18 is repeated twice. The next available position, 7, is assigned to the number 40.
Now we calculate rank R2. Positions 1 and 2 are easily assigned to the numbers 15 and 19 respectively. However, 25 is repeated three times and would occupy the available positions 3, 4, and 5, so each 25 is assigned the average (3 + 4 + 5)/3 = 4. The value of m2 is 3, as the number 25 is repeated three times. Next, 30 is repeated twice, so each 30 is assigned the average of positions 6 and 7, (6 + 7)/2 = 6.5. The value of m3 is 2, as the number 30 is repeated twice.

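Slide 26: Spearman's Rank Correlation Coefficient
In formula 2, the numerator has a continuing term (…) because the number of repeated values is not known in advance; one correction term (m³ − m)/12 is added for each repeated value. Here:
R = 1 − 6[Σd² + (m1³ − m1)/12 + (m2³ − m2)/12 + (m3³ − m3)/12] / [n(n² − 1)]
  = 1 − 6[4.5 + (2³ − 2)/12 + (3³ − 3)/12 + (2³ − 2)/12] / [7(7² − 1)]
  = 1 − 6(4.5 + 0.5 + 2 + 0.5) / 336
  = 1 − 45/336 = 0.866
R = 0.866 means that x and y are strongly correlated.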
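The tied-ranks example can also be recomputed from the raw data. This Python sketch assigns average ranks to ties, derives the (m³ − m)/12 correction terms by counting how often each value repeats, and applies formula 2. The helper names `average_ranks` and `tie_correction` are hypothetical.

```python
from collections import Counter


def average_ranks(values):
    """Ascending ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end + 2) / 2  # mean of positions start+1 .. end+1
        start = end + 1
    return ranks


def tie_correction(values):
    """Sum of (m**3 - m)/12 over every value that occurs m > 1 times."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values() if m > 1)


x = [10, 12, 18, 18, 15, 17, 40]
y = [15, 19, 25, 30, 25, 25, 30]

r1, r2 = average_ranks(x), average_ranks(y)
sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
cf = tie_correction(x) + tie_correction(y)   # m1 = 2 (x); m2 = 3, m3 = 2 (y)
n = len(x)
R = 1 - 6 * (sum_d2 + cf) / (n * (n ** 2 - 1))   # formula 2

print("sum d^2 =", sum_d2, " correction =", cf, " R =", round(R, 3))
```

It reproduces Σd² = 4.5, a total correction of 0.5 + 2 + 0.5 = 3, and R ≈ 0.866.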


Slide 29: Regression
Regression analysis is the statistical technique that expresses the relationship between two or more variables in the form of an equation, in order to estimate the value of one variable from the given value of another. The variable whose value is estimated using the algebraic equation is called the dependent (or response) variable, and the variable whose value is used to make the estimate is called the independent (regressor or predictor) variable. The linear algebraic equation that expresses the dependent variable in terms of the independent variable is called the linear regression equation.

Slide 30: Regression
Formulating a regression analysis helps you predict the effect of the independent variable on the dependent one.
Example of regression (1): age and height can be described using a linear regression model. Since a person's height increases as age increases (during the growing years), the relationship between them can be approximated by a straight line.
Example of regression (2): advertisement spend and company sales can be described using a linear regression model. As a company's advertisement spend increases, its sales increase, and the relationship can be modelled as linear.
The figure shows how a regression line is fitted to the points on a graph of the dependent variable against the independent variable.

Slide 31: Introduction to Concept of Regression Line
The fundamental aim of regression analysis is to determine a regression equation (line).

Regression line: y on x
Regression equation: y = a + bx
Purpose: used for estimating the value of the dependent variable y for given values of the independent variable x. b = slope of the regression line; a = y-intercept (the value of y when x = 0).

Regression line: x on y
Regression equation: x = c + dy
Purpose: used for estimating the value of the dependent variable x for given values of the independent variable y. d = slope of the regression line; c = x-intercept (the value of x when y = 0).

Slide 32: Introduction to Concept of Regression Line
The regression coefficients are also denoted as:
• b_yx (regression coefficient of y on x) in the regression line y = a + bx
• b_xy (regression coefficient of x on y) in the regression line x = c + dy
In the equation for the regression line of y on x (y = a + bx), the regression coefficient b = b_yx. In the equation for the regression line of x on y (x = c + dy), the regression coefficient d = b_xy.
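As a sketch, assuming the ordinary least-squares method (the usual way of choosing the "best" straight line), both regression lines can be fitted from raw data as below: b_yx = Σ(x − x̄)(y − ȳ)/Σ(x − x̄)², b_xy = Σ(x − x̄)(y − ȳ)/Σ(y − ȳ)², and each intercept makes the line pass through (x̄, ȳ). The function name `fit_regression_lines` and the data are hypothetical.

```python
def fit_regression_lines(xs, ys):
    """Least-squares lines y = a + b*x (y on x) and x = c + d*y (x on y).

    b = b_yx = Sxy/Sxx and d = b_xy = Sxy/Syy, where Sxy, Sxx, Syy are sums of
    products/squares of deviations from the means. Both lines pass through
    the point of means (x_bar, y_bar).
    """
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    b = sxy / sxx            # b_yx, slope of y on x
    d = sxy / syy            # b_xy, slope of x on y
    a = y_bar - b * x_bar    # intercept of y on x
    c = x_bar - d * y_bar    # intercept of x on y
    return a, b, c, d


# Hypothetical data
a, b, c, d = fit_regression_lines([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"y on x: y = {a:.2f} + {b:.2f}x")
print(f"x on y: x = {c:.2f} + {d:.2f}y")
```

The product b × d equals r², which is why the two lines coincide only when r = ±1.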

Slide 33: Introduction to Concept of Regression Line


Slide 35: Regression
Assumed mean of the x variable = 60; assumed mean of the y variable = 50.

Slide 36: Regression
Calculating the regression coefficient.
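Slide 35 works with assumed means (60 for x, 50 for y) rather than actual means. Because a regression coefficient is unaffected by a change of origin, deviations dx = x − 60 and dy = y − 50 give the same b_yx as the actual means, provided the correction terms are kept: b_yx = [nΣdx·dy − (Σdx)(Σdy)] / [nΣdx² − (Σdx)²]. The slide's own data are not in the text, so the numbers below are hypothetical, as is the function name `byx_assumed_mean`.

```python
def byx_assumed_mean(xs, ys, ax, ay):
    """Regression coefficient of y on x from deviations about assumed means.

    dx = x - ax and dy = y - ay, where ax and ay need not be the actual means:
        b_yx = (n*Σdx*dy - Σdx*Σdy) / (n*Σdx² - (Σdx)²)
    """
    n = len(xs)
    dx = [x - ax for x in xs]
    dy = [y - ay for y in ys]
    num = n * sum(p * q for p, q in zip(dx, dy)) - sum(dx) * sum(dy)
    den = n * sum(p * p for p in dx) - sum(dx) ** 2
    return num / den


# Hypothetical data; assumed means 60 (x) and 50 (y) as on slide 35
xs = [45, 52, 58, 63, 72]
ys = [42, 47, 55, 58, 63]
print(round(byx_assumed_mean(xs, ys, 60, 50), 4))
print(round(byx_assumed_mean(xs, ys, 0, 0), 4))   # same value with origin 0
```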

Slide 37: Regression Coefficients in Terms of Correlation Coefficient
The regression coefficients b_xy and b_yx can also be calculated using the following formulas:
b_xy = r(σ_x / σ_y)   [x on y]
b_yx = r(σ_y / σ_x)   [y on x]
In these formulas, the regression coefficients b_xy and b_yx are related to the correlation coefficient r and to the standard deviations σ_x and σ_y of x and y respectively. x̄ and ȳ are the mean values of the variables x and y respectively.
Regression equation (y on x): y − ȳ = r(σ_y / σ_x)(x − x̄). Here y is the dependent variable and x is the independent variable.
Regression equation (x on y): x − x̄ = r(σ_x / σ_y)(y − ȳ). Here x is the dependent variable and y is the independent variable.
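When only summary figures (means, standard deviations, r) are known, the two regression lines follow directly from b_yx = r(σ_y/σ_x) and b_xy = r(σ_x/σ_y). A minimal Python sketch; the function name `regression_lines_from_summary` and the summary figures in the usage lines are hypothetical.

```python
def regression_lines_from_summary(x_bar, y_bar, sigma_x, sigma_y, r):
    """Return (a, b_yx) for y = a + b_yx*x and (c, b_xy) for x = c + b_xy*y.

    Uses b_yx = r*sigma_y/sigma_x and b_xy = r*sigma_x/sigma_y; each line is
    rearranged from y - y_bar = b_yx(x - x_bar) and x - x_bar = b_xy(y - y_bar).
    """
    b_yx = r * sigma_y / sigma_x
    b_xy = r * sigma_x / sigma_y
    a = y_bar - b_yx * x_bar
    c = x_bar - b_xy * y_bar
    return (a, b_yx), (c, b_xy)


# Hypothetical summary figures
(a, b_yx), (c, b_xy) = regression_lines_from_summary(20, 50, 4, 10, 0.5)
print(f"y = {a:g} + {b_yx:g}x")
print(f"x = {c:g} + {b_xy:g}y")
```

Since b_yx × b_xy = r², both coefficients always share the sign of r.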

Slide 38: Regression
Example: The General Sales Manager of Kiran Enterprises, an enterprise dealing in the sale of readymade men's wear, is toying with the idea of increasing his sales to Rs 80,000. On checking the sales records for the last 10 years, it was found that annual sales proceeds and advertisement expenditure were highly correlated, with a correlation coefficient of 0.8. It was further noted that average annual sales have been Rs 45,000 and average annual advertisement expenditure Rs 30,000, with variances of 1600 and 625 in sales and advertisement expenditure respectively. In view of the above, how much expenditure on advertisement would you suggest the General Sales Manager incur to meet his sales target?
Solution given on the next slide.

Slide 39: Regression
Solution: Here we fit a regression line between advertisement expenditure and annual sales. Take advertisement expenditure (y) as the dependent variable and sales (x) as the independent variable. The regression equation of advertisement expenditure on sales is then
y − ȳ = r(σ_y / σ_x)(x − x̄)
Given: correlation coefficient r = 0.8; σ_y = √625 = 25; σ_x = √1600 = 40; ȳ = 30,000; x̄ = 45,000; x = target sales = 80,000. Substituting these values:
y − 30,000 = 0.8(25/40)(80,000 − 45,000) = 0.5 × 35,000 = 17,500
y = Rs 47,500
So an advertisement expenditure of about Rs 47,500 would be suggested to meet the sales target of Rs 80,000.
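The Kiran Enterprises calculation is a single substitution into the regression line of y (advertisement) on x (sales); a few lines of Python confirm the arithmetic. The variable names are illustrative only.

```python
import math

r = 0.8
sigma_y = math.sqrt(625)    # advertisement expenditure (y), from its variance
sigma_x = math.sqrt(1600)   # sales (x), from its variance
y_bar, x_bar = 30_000, 45_000
target_sales = 80_000

b_yx = r * sigma_y / sigma_x                  # slope of the line of y on x
advert = y_bar + b_yx * (target_sales - x_bar)
print(f"b_yx = {b_yx}, suggested advertisement expenditure = Rs {advert:,.0f}")
```

It gives b_yx = 0.5 and an advertisement expenditure of Rs 47,500.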

Slide 40: Regression
Example: You are given the following information about advertising expenditure and sales:

                      Advertisement (x)   Sales (y)
                      (Rs in lakh)        (Rs in lakh)
Arithmetic mean       10                  90
Standard deviation    3                   12

Correlation coefficient = 0.8

(a) Obtain the two regression equations.
(b) Find the likely sales when the advertisement budget is Rs 15 lakh.
(c) What should the advertisement budget be if the company wants to attain a sales target of Rs 120 lakh?
Solution given on the next slide.

Slide 41: Regression
Solution: (a) The regression equation of x on y is given by
x − x̄ = r(σ_x / σ_y)(y − ȳ)
Given x̄ = 10, r = 0.8, σ_x = 3, σ_y = 12, ȳ = 90. Substituting these values in the above regression equation, we have
x − 10 = 0.8(3/12)(y − 90), or x = −8 + 0.2y
The regression equation of y on x is given by
y − ȳ = r(σ_y / σ_x)(x − x̄)
y − 90 = 0.8(12/3)(x − 10), or y = 58 + 3.2x
Solution continued on the next slide.

Slide 42: Regression
Solution: (b) Substituting x = 15 in the regression equation of y on x, the likely average sales volume would be
y = 58 + 3.2(15) = 58 + 48 = 106
Thus the likely sales for an advertisement budget of Rs 15 lakh are Rs 106 lakh.
(c) Substituting y = 120 in the regression equation of x on y, the likely advertisement budget needed to attain the desired sales target of Rs 120 lakh would be
x = −8 + 0.2y = −8 + 0.2(120) = 16
Hence an advertisement budget of about Rs 16 lakh should be sufficient to attain the sales target of Rs 120 lakh.
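The three parts of the advertisement/sales example can be checked together in Python: both regression lines are built from the given means, standard deviations and r, then used for the two predictions. Variable names are illustrative; all figures are in Rs lakh.

```python
x_bar, y_bar = 10, 90       # means: advertisement (x) and sales (y), Rs lakh
sigma_x, sigma_y = 3, 12    # standard deviations
r = 0.8                     # correlation coefficient

# (a) the two regression lines
b_yx = r * sigma_y / sigma_x   # slope of y on x
b_xy = r * sigma_x / sigma_y   # slope of x on y
a = y_bar - b_yx * x_bar       # y = a + b_yx * x
c = x_bar - b_xy * y_bar       # x = c + b_xy * y
print(f"y = {a:.1f} + {b_yx:.1f}x   and   x = {c:.1f} + {b_xy:.1f}y")

# (b) likely sales for an advertisement budget of Rs 15 lakh (use y on x)
likely_sales = a + b_yx * 15
# (c) budget needed for a sales target of Rs 120 lakh (use x on y)
budget_needed = c + b_xy * 120
print(f"likely sales = {likely_sales:.1f}, budget needed = {budget_needed:.1f}")
```

Note that (b) uses the line of y on x and (c) the line of x on y, because in each case the unknown quantity must be the dependent variable.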

Slide 43: Regression
Example: In a partially destroyed laboratory record of an analysis of regression data, only the following results are legible:
Variance of x = 9
Regression equations: 8x − 10y + 66 = 0 and 40x − 18y = 214
Find, on the basis of the above information:
(a) the mean values of x and y,
(b) the coefficient of correlation between x and y, and
(c) the standard deviation of y.
Solution given on the next slide.

Slide 44: Regression
Solution: (a) Since the two regression lines always intersect at the point (x̄, ȳ) representing the mean values of the variables, we solve the given regression equations to get the mean values x̄ and ȳ:
8x − 10y = −66
40x − 18y = 214
Multiplying the first equation by 5 and subtracting it from the second, we have 32y = 544, or y = 17, i.e. ȳ = 17. Substituting this value of y in the first equation, we get 8x − 10(17) = −66, or x = 13, that is, x̄ = 13.

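Slide 45: Regression
(b) To find the correlation coefficient r between x and y, we need to determine the regression coefficients b_xy and b_yx. Rewrite the given regression equations so that the product of the two regression coefficients does not exceed 1 (since b_yx × b_xy = r² ≤ 1); this means treating the first equation as the regression of y on x and the second as the regression of x on y:
8x − 10y = −66, or 10y = 66 + 8x, or y = (66/10) + (8/10)x, so b_yx = 8/10 = 0.8
40x − 18y = 214, or 40x = 214 + 18y, or x = (214/40) + (18/40)y, so b_xy = 18/40 = 0.45
Hence r = √(b_yx × b_xy) = √(0.8 × 0.45) = √0.36 = 0.6 (positive, since both regression coefficients are positive).
(c) To determine the standard deviation of y, use the formula b_yx = r(σ_y / σ_x). With σ_x = √9 = 3, we get 0.8 = 0.6 × σ_y / 3, so σ_y = (0.8 × 3)/0.6 = 4.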
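The "partially destroyed record" problem can be solved mechanically in Python: intersect the two lines to get the means, read off b_yx and b_xy, and recover r and σ_y. The function name `solve_regression_record` and the (p, q, s) encoding of a line p·x + q·y = s are hypothetical conventions for this sketch; `Fraction` keeps the means exact.

```python
import math
from fractions import Fraction


def solve_regression_record(line_yx, line_xy, var_x):
    """Recover the means, r and sigma_y from two regression lines and var(x).

    Each line is given as (p, q, s), meaning p*x + q*y = s. line_yx is read
    as the regression of y on x, line_xy as the regression of x on y.
    """
    (p1, q1, s1), (p2, q2, s2) = line_yx, line_xy
    det = p1 * q2 - p2 * q1
    # Both lines pass through (x_bar, y_bar): solve the pair by Cramer's rule
    x_bar = Fraction(s1 * q2 - s2 * q1, det)
    y_bar = Fraction(p1 * s2 - p2 * s1, det)
    b_yx = Fraction(-p1, q1)   # from y = s1/q1 - (p1/q1) x
    b_xy = Fraction(-q2, p2)   # from x = s2/p2 - (q2/p2) y
    if b_yx * b_xy > 1:
        raise ValueError("b_yx * b_xy > 1: swap which line is y on x")
    r = math.copysign(math.sqrt(float(b_yx * b_xy)), float(b_yx))  # sign of b_yx
    sigma_x = math.sqrt(var_x)
    sigma_y = float(b_yx) * sigma_x / r   # from b_yx = r * sigma_y / sigma_x
    return x_bar, y_bar, r, sigma_y


# 8x - 10y + 66 = 0 is written as 8x - 10y = -66; variance of x = 9
x_bar, y_bar, r, sigma_y = solve_regression_record((8, -10, -66), (40, -18, 214), 9)
print(x_bar, y_bar, round(r, 2), round(sigma_y, 2))
```

It recovers x̄ = 13, ȳ = 17, r = 0.6 and σ_y = 4; passing the lines the other way round raises an error, because the product of the coefficients would then exceed 1.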

Slide 46: Regression
The method of finding the regression coefficients b_xy and b_yx is slightly different from the method discussed earlier when the data set is grouped or classified into a frequency distribution of either variable x or y, or both. The values of b_xy and b_yx are then calculated using the following formulae:
where
h = width of the class interval of the sample data on the x variable
k = width of the class interval of the sample data on the y variable
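The slide's grouped-data formulae are not in the text. The sketch below assumes the common step-deviation approach: take u = (x − A)/h and v = (y − B)/k for assumed means A and B, compute the coefficient in u and v from frequency-weighted sums, then rescale by k/h (for b_yx) or h/k (for b_xy). The function name `grouped_regression_coefficients` and the frequency table are hypothetical.

```python
def grouped_regression_coefficients(cells, A, B, h, k):
    """Regression coefficients b_yx and b_xy from a bivariate frequency table.

    cells: (x_mid, y_mid, f) triples, one per cell of the table.
    Step deviations u = (x - A)/h and v = (y - B)/k use assumed means A, B
    and class widths h (for x) and k (for y). Rescaling by k/h and h/k
    converts the coefficients back to the original units.
    """
    N = sum(f for _, _, f in cells)
    su = sum(f * (x - A) / h for x, _, f in cells)
    sv = sum(f * (y - B) / k for _, y, f in cells)
    suu = sum(f * ((x - A) / h) ** 2 for x, _, f in cells)
    svv = sum(f * ((y - B) / k) ** 2 for _, y, f in cells)
    suv = sum(f * ((x - A) / h) * ((y - B) / k) for x, y, f in cells)
    num = N * suv - su * sv
    b_yx = (k / h) * num / (N * suu - su ** 2)
    b_xy = (h / k) * num / (N * svv - sv ** 2)
    return b_yx, b_xy


# Hypothetical table: class mid-points of x (width 10) and y (width 20), frequency f
cells = [(5, 10, 2), (15, 10, 1), (15, 30, 3), (25, 30, 2), (25, 50, 4), (35, 50, 1)]
print(grouped_regression_coefficients(cells, A=15, B=30, h=10, k=20))
```

Because regression coefficients change only by the ratio of the scale factors under a change of origin and scale, the result equals what the ungrouped formula gives when each mid-point pair is repeated f times.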