Correlation and Regression

9,605 views · 15 slides · Sep 02, 2017

About This Presentation

This PowerPoint presentation covers correlation, the correlation coefficient, and regression, with a live example.


Slide Content

SUBJECT – Maths | TOPIC – Correlation and Regression | BY – Keyur Tejani

INTRODUCTION TO CORRELATION In a study involving two variables, if a change in the value of one variable brings a simultaneous change in the value of the other, we say that the variables are correlated with each other. The measure of correlation is called the correlation coefficient. With the help of the correlation coefficient, we find: 1. the direction of the relation; 2. the degree of linear relation among the variables.

The term "correlation" refers to a mutual relationship or association between quantities. In almost any business, it is useful to express one quantity in terms of its relationship with others. For example, sales might increase when the marketing department spends more on TV advertisements, or a customer's average purchase amount on an e-commerce website might depend on a number of factors related to that customer. Often, correlation is the first step to understanding these relationships and subsequently building better business and statistical models. Correlation describes the relationship between groups of items, not between individual items, and the relationship between the two variables is statistical rather than exactly functional. The coefficient of correlation is denoted by r.
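The coefficient r can be computed directly with NumPy. The sketch below uses hypothetical ad-spend and sales figures (not from the slides) to illustrate a strong positive correlation:

```python
import numpy as np

# Hypothetical data: monthly TV ad spend and sales (both in lakh Rs.)
ad_spend = np.array([10, 12, 15, 18, 20, 24, 25, 30])
sales    = np.array([40, 44, 50, 55, 60, 68, 70, 79])

# np.corrcoef returns the correlation matrix; the off-diagonal entry is r
r = np.corrcoef(ad_spend, sales)[0, 1]
print(f"r = {r:.4f}")  # close to +1: strong positive linear correlation
```

Here r comes out very close to +1, reflecting that sales rise almost in lock-step with ad spend in this made-up data.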

DIFFERENT TYPES OF CORRELATION
1. SIMPLE CORRELATION: correlation between any two variables. E.g., income and expenditure.
2. MULTIPLE CORRELATION: correlation among three or more variables. E.g., production of rice with amount of rainfall and average daily temperature.
3. PARTIAL CORRELATION: correlation between two variables when three or more variables are involved and the effect of the others is removed. E.g., the correlation between production of rice and amount of rainfall after removing the effect of a third variable, average daily temperature.
4. LINEAR CORRELATION: if the relation between x and y can be expressed as y = a + bx, or the plotted values of x and y lie close to a straight line, the correlation is linear. E.g., y = 3x + 5.
5. NON-LINEAR CORRELATION: when the amount of change in one variable is not in a constant ratio to the change in the other variable, the correlation is non-linear.
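Partial correlation can be sketched with the standard first-order formula r_xy.z = (r_xy − r_xz·r_yz) / √((1 − r_xz²)(1 − r_yz²)). The data below is synthetic, generated to mimic the rice/rainfall/temperature example:

```python
import numpy as np

def partial_corr(x, y, z):
    """First-order partial correlation r_xy.z: correlation of x and y
    after removing the linear effect of a third variable z."""
    r_xy = np.corrcoef(x, y)[0, 1]
    r_xz = np.corrcoef(x, z)[0, 1]
    r_yz = np.corrcoef(y, z)[0, 1]
    return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Synthetic example: rice production, rainfall, average daily temperature
rng = np.random.default_rng(0)
temp = rng.normal(25, 3, 200)                       # temperature (°C)
rain = 50 + 2 * temp + rng.normal(0, 5, 200)        # rainfall depends on temp
rice = 10 + 0.5 * rain + 0.1 * temp + rng.normal(0, 2, 200)

print(partial_corr(rice, rain, temp))  # rice vs rainfall, temp held fixed
```

The value stays well above zero, since rainfall still influences rice production even after the temperature effect is removed.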

SCATTER DIAGRAM METHOD Types of scatter diagram: 1. Perfect positive correlation 2. Partial positive correlation 3. Perfect negative correlation 4. Partial negative correlation

Limitations of a Scatter Diagram The following are a few limitations of a scatter diagram:
- It cannot give the exact extent of correlation.
- It does not give a quantitative measure of the relationship between the variables; it shows only a qualitative expression of the quantitative change.
- It cannot show the relationship for more than two variables.
Benefits of a Scatter Diagram The following are a few advantages of a scatter diagram:
- It shows the relationship between two variables.
- It is the best method for revealing a non-linear pattern.
- The range of the data, i.e. the maximum and minimum values, can easily be determined.
- Observation and reading are straightforward.
- Plotting the diagram is relatively simple.

Merits and Demerits of Pearson's Method of Studying Correlation
Merits:
1. This method indicates the presence or absence of correlation between two variables and gives the exact degree of their correlation.
2. It also ascertains the direction of the correlation, positive or negative.
3. It has many algebraic properties that make the calculation of the coefficient of correlation, and other related quantities, easy.
Demerits:
1. It is more difficult to calculate than other methods.
2. It is much affected by the values of extreme items.
3. It is based on many assumptions, such as a linear relationship and a cause-and-effect relationship, which may not always hold good.
4. It is very likely to be misinterpreted in the case of homogeneous data.

PROPERTIES OF THE CORRELATION COEFFICIENT
- The correlation coefficient is symmetrical with respect to X and Y, i.e. rXY = rYX.
- The correlation coefficient is the geometric mean of the two regression coefficients.
- The correlation coefficient is independent of origin and unit of measurement, i.e. rXY = rUV.
- The correlation coefficient lies between −1 and +1, i.e. −1 ≤ r ≤ +1.
- The coefficient of correlation measures only linear correlation between X and Y.
- If two variables X and Y are independent, the coefficient of correlation between them will be zero.
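These properties can be checked numerically. The sketch below uses synthetic data and verifies symmetry, the −1 to +1 bound, invariance under a change of origin and scale, and the geometric-mean relation with the two regression coefficients:

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(size=500)
y = 2.0 * x + rng.normal(size=500)

r_xy = np.corrcoef(x, y)[0, 1]
r_yx = np.corrcoef(y, x)[0, 1]

# Symmetry: r_XY = r_YX
assert np.isclose(r_xy, r_yx)

# Bounded: -1 <= r <= +1
assert -1.0 <= r_xy <= 1.0

# Independent of origin and unit: r_XY = r_UV for U=(X-a)/h, V=(Y-b)/k, h,k>0
u = (x - 3.0) / 2.0
v = (y + 7.0) / 5.0
assert np.isclose(np.corrcoef(u, v)[0, 1], r_xy)

# Geometric mean of the two regression coefficients: |r| = sqrt(b_yx * b_xy)
b_yx = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)  # slope of Y on X
b_xy = np.cov(x, y, ddof=1)[0, 1] / np.var(y, ddof=1)  # slope of X on Y
assert np.isclose(abs(r_xy), np.sqrt(b_yx * b_xy))
print("all properties verified numerically")
```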

PROBABLE ERROR: The probable error is used to check the significance of the correlation in the population based on the sample correlation coefficient. It can be defined as "a measure of the error of estimate for a sample from a normal distribution; it is computed by multiplying the standard error by 0.6745".
- If r < PE, the correlation is not significant in the population.
- If r > 6(PE), the correlation is significant in the population, and its expected range is given by r ± PE.
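Using the standard formula PE = 0.6745 · (1 − r²)/√n, the significance check can be sketched as follows (the values r = 0.92, n = 61 are taken to match the area-vs-production regression later in the deck):

```python
import math

def probable_error(r, n):
    """Probable error of the sample correlation coefficient:
    PE = 0.6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1 - r**2) / math.sqrt(n)

r, n = 0.92, 61
pe = probable_error(r, n)
print(f"PE = {pe:.4f}")
if r > 6 * pe:
    print(f"r is significant; expected population range: {r - pe:.3f} to {r + pe:.3f}")
elif r < pe:
    print("r is not significant")
```

With r = 0.92 and n = 61, PE is tiny and r far exceeds 6·PE, so the correlation is significant in the population.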

INTRODUCTION TO REGRESSION Regression is a statistical technique for determining the linear relationship between two or more variables. Regression is primarily used for prediction and causal inference. It is important to recognize that regression analysis is fundamentally different from ascertaining the correlations among variables: correlation determines the strength of the relationship between variables, while regression attempts to describe that relationship in more detail. LINES OF REGRESSION A line of regression is a line that gives the best estimate of one variable for any given value of the other. There are two lines of regression: the line of X on Y and the line of Y on X.
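The two regression lines can be sketched from the covariance and variances; the small dataset below is hypothetical. Both lines pass through the point of means, but they differ unless the correlation is perfect:

```python
import numpy as np

x = np.array([2, 4, 6, 8, 10], dtype=float)
y = np.array([5, 9, 14, 16, 21], dtype=float)

cov_xy = np.cov(x, y, ddof=1)[0, 1]
b_yx = cov_xy / np.var(x, ddof=1)  # regression coefficient of Y on X
b_xy = cov_xy / np.var(y, ddof=1)  # regression coefficient of X on Y

# Line Y on X: estimates Y from X, passing through the means
a_yx = y.mean() - b_yx * x.mean()
# Line X on Y: estimates X from Y
a_xy = x.mean() - b_xy * y.mean()

print(f"Y on X: y = {a_yx:.3f} + {b_yx:.3f} x")
print(f"X on Y: x = {a_xy:.3f} + {b_xy:.3f} y")
```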

Properties of Regression Coefficients
- The correlation coefficient is the geometric mean of the two regression coefficients.
- The value of the coefficient of correlation cannot exceed unity, i.e. 1.
- Both regression coefficients have the same sign, i.e. they are either both positive or both negative.
- The coefficient of correlation has the same sign as the regression coefficients.
- The average value of the two regression coefficients is greater than (or equal to) the value of the correlation coefficient; symbolically, (bxy + byx)/2 ≥ r.
- The regression coefficients are independent of a change of origin, but not of scale. By origin, we mean that there is no effect on the regression coefficients if a constant is subtracted from the values of X and Y. By scale, we mean that if the values of X and Y are multiplied or divided by some constant, the regression coefficients change.
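These properties can likewise be verified numerically on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(10, 2, 300)
y = 4 + 1.5 * x + rng.normal(0, 1, 300)

def reg_coeffs(x, y):
    """Return (b_xy, b_yx), the two regression coefficients."""
    c = np.cov(x, y, ddof=1)[0, 1]
    return c / np.var(y, ddof=1), c / np.var(x, ddof=1)

b_xy, b_yx = reg_coeffs(x, y)
r = np.corrcoef(x, y)[0, 1]

# Both coefficients share the sign of r
assert np.sign(b_xy) == np.sign(b_yx) == np.sign(r)
# Their product cannot exceed 1 (since r^2 = b_xy * b_yx <= 1)
assert b_xy * b_yx <= 1.0
# AM >= GM: the average of the coefficients is at least |r|
assert (b_xy + b_yx) / 2 >= abs(r)
# Independent of a change of origin ...
b_xy2, b_yx2 = reg_coeffs(x - 100, y + 50)
assert np.isclose(b_xy, b_xy2) and np.isclose(b_yx, b_yx2)
# ... but not of a change of scale
b_xy3, b_yx3 = reg_coeffs(2 * x, y)
assert not np.isclose(b_yx, b_yx3)
print("regression coefficient properties hold")
```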

DIFFERENCE BETWEEN CORRELATION AND REGRESSION
- Correlation is a statistical measure that determines the co-relationship or association of two variables; regression describes how an independent variable is numerically related to the dependent variable.
- Correlation represents a linear relationship between two variables; regression is used to fit a best line and estimate one variable on the basis of the other.
- In correlation there is no distinction between the variables (rxy = ryx); in regression the two variables are distinct (bxy ≠ byx).
- Correlation is a relative measure; regression is an absolute measure.
- Correlation does not imply a cause-and-effect relationship; regression indicates a cause-and-effect relationship.
- Correlation has limited application; regression has wider applications.

DATA REGARDING AREA AND PRODUCTION OF FOOD GRAINS IN INDIA, 1950 TO 2010
Year | Area (Million Hectare) | Production (Million Tons) | Year | Area (Million Hectare) | Production (Million Tons)
1950-51 | 30.81 | 20.58 | 1980-81 | 40.15 | 53.63
1951-52 | 29.83 | 21.3 | 1981-82 | 40.71 | 53.25
1952-53 | 29.97 | 22.9 | 1982-83 | 38.26 | 47.12
1953-54 | 31.29 | 28.21 | 1983-84 | 41.24 | 60.1
1954-55 | 30.77 | 25.22 | 1984-85 | 41.16 | 58.34
1955-56 | 31.52 | 27.56 | 1985-86 | 41.14 | 63.83
1956-57 | 32.28 | 29.04 | 1986-87 | 41.17 | 60.56
1957-58 | 32.3 | 25.53 | 1987-88 | 38.81 | 56.86
1958-59 | 33.17 | 30.85 | 1988-89 | 41.73 | 70.49
1959-60 | 33.82 | 31.68 | 1989-90 | 42.17 | 73.57
1960-61 | 34.13 | 34.58 | 1990-91 | 42.69 | 74.29
1961-62 | 34.69 | 35.66 | 1991-92 | 42.65 | 74.68
1962-63 | 35.69 | 33.21 | 1992-93 | 41.78 | 72.86
1963-64 | 35.81 | 37 | 1993-94 | 42.54 | 80.3
1964-65 | 36.46 | 39.31 | 1994-95 | 42.81 | 81.81
1965-66 | 35.47 | 30.59 | 1995-96 | 42.84 | 76.98
1966-67 | 35.25 | 30.44 | 1996-97 | 43.43 | 81.74
1967-68 | 36.44 | 37.61 | 1997-98 | 43.45 | 82.53
1968-69 | 36.97 | 39.76 | 1998-99 | 44.8 | 86.08
1969-70 | 37.68 | 40.43 | 1999-00 | 45.16 | 89.68
1970-71 | 37.59 | 42.22 | 2000-01 | 44.71 | 84.98
1971-72 | 37.76 | 43.07 | 2001-02 | 44.9 | 93.34
1972-73 | 36.69 | 39.24 | 2002-03 | 41.18 | 71.82
1973-74 | 38.29 | 44.05 | 2003-04 | 42.59 | 88.53
1974-75 | 37.89 | 39.58 | 2004-05 | 41.91 | 83.13
1975-76 | 39.48 | 48.74 | 2005-06 | 43.66 | 91.79
1976-77 | 38.51 | 41.92 | 2006-07 | 43.81 | 93.36
1977-78 | 40.28 | 52.67 | 2007-08 | 43.91 | 96.69
1978-79 | 40.48 | 53.77 | 2008-09 | 45.54 | 99.18
1979-80 | 39.42 | 42.33 | 2009-10* | 41.85 | 89.13
2010-11** | 36.95 | 80.41

SUMMARY OUTPUT (regression of Production on Area)
Regression Statistics:
Multiple R: 0.919982193
R Square: 0.846367236
Adjusted R Square: 0.843763291
Standard Error: 9.351859842
Observations: 61
ANOVA:
Source | df | SS | MS | F | Significance F
Regression | 1 | 28426.47365 | 28426.47365 | 325.0326655 | 1.12431E-25
Residual | 59 | 5159.979668 | 87.4572825 | |
Total | 60 | 33586.45332 | | |
Coefficients:
Term | Coefficient | Standard Error | t Stat | P-value | Lower 95% | Upper 95%
Intercept | -138.0151497 | 10.84968124 | -12.72066402 | 1.48215E-18 | -159.7253117 | -116.3049877
Area (Million Hectare) | 5.002883065 | 0.277496077 | 18.02866233 | 1.12431E-25 | 4.447614698 | 5.558151432
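The headline numbers in such a summary (slope, intercept, R², standard error of estimate) can be recomputed with NumPy. The sketch below uses only six decade-spaced rows from the table above, so its figures are illustrative and will not match the full 61-observation output exactly:

```python
import numpy as np

# Six (area, production) pairs from the table: 1950-51, 1960-61,
# 1970-71, 1980-81, 1990-91, 2000-01
area = np.array([30.81, 34.13, 37.59, 40.15, 42.69, 44.71])
prod = np.array([20.58, 34.58, 42.22, 53.63, 74.29, 84.98])

n = len(area)
slope, intercept = np.polyfit(area, prod, 1)   # least-squares line
fitted = intercept + slope * area
ss_res = np.sum((prod - fitted) ** 2)          # residual sum of squares
ss_tot = np.sum((prod - prod.mean()) ** 2)     # total sum of squares
r_squared = 1 - ss_res / ss_tot
std_error = np.sqrt(ss_res / (n - 2))          # standard error of estimate

print(f"production ≈ {intercept:.2f} + {slope:.2f} * area")
print(f"R^2 = {r_squared:.3f}, standard error = {std_error:.3f}")
```

Even on this small subset, the slope is strongly positive and R² is high, consistent with the full-sample output above.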

THANK YOU