Biostats correlation vs regression: difference between correlation and regression equation (pptx)
Payaamvohra1
89 views
13 slides
Dec 31, 2023
About This Presentation
CORRELATION
REGRESSION
BIOSTATISTICS
SEMESTER 8
M PHARMACY
CORRELATION VS REGRESSION
REGRESSION ANALYSIS
LINEAR AND MULTIPLE REGRESSION
CORRELATION COEFFICIENT
Size: 118.86 MB
Language: en
Added: Dec 31, 2023
Slides: 13 pages
Slide Content
BIOSTATISTICS
CORRELATION AND REGRESSION
By Mr Payaam Vohra
NIPER AIR 11 Gold Medalist in MU MET AIR 07 ICT MTECH SCORE RANK 01 CUET-PG AIR 01 IIT BHU AIR 08 GATE AND BITS HD QUALIFIED GPAT AIR 43
IV B.PHARMACY (BIO STATISTICS)

Calculated Formula for Karl Pearson's Coefficient of Correlation (r):

$$ r = \frac{\sum XY - \frac{\sum X \sum Y}{n}}{\sqrt{\left(\sum X^2 - \frac{(\sum X)^2}{n}\right) \times \left(\sum Y^2 - \frac{(\sum Y)^2}{n}\right)}} = \frac{\sum d_X d_Y}{\sqrt{\sum d_X^2 \times \sum d_Y^2}} $$

where $\sum d_X^2 = \sum X^2 - \frac{(\sum X)^2}{n}$, $\sum d_Y^2 = \sum Y^2 - \frac{(\sum Y)^2}{n}$ and $\sum d_X d_Y = \sum XY - \frac{\sum X \sum Y}{n}$.

PROPERTIES OF CORRELATION:
1. The limits of Karl Pearson's coefficient of correlation are $\pm 1$, i.e., $-1 \le r \le 1$.
2. The coefficient of correlation is independent of change of origin and scale.
3. Two independent random variables are uncorrelated, but the converse is not true.

PROBLEMS ON CORRELATION

Problem-21: The following data relate to the pod length and the number of seeds per pod. Calculate the correlation coefficient for the following data.

| Pod's length (cms) | 4.5 | 4 | 5.2 | 4.6 | 5.2 | 5.2 | 4.3 | 4 | 4.5 | 5.5 |
| No. of seeds/plant | 5 | 5 | 6 | 6 | 6 | 7 | 4 | 4 | 5 | 6 |
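SOLUTION:

| Pod's length (X) | No. of seeds/plant (Y) | $X^2$ | $Y^2$ | $XY$ |
|---|---|---|---|---|
| 4.5 | 5 | 20.25 | 25 | 22.5 |
| 4 | 5 | 16 | 25 | 20 |
| 5.2 | 6 | 27.04 | 36 | 31.2 |
| 4.6 | 6 | 21.16 | 36 | 27.6 |
| 5.2 | 6 | 27.04 | 36 | 31.2 |
| 5.2 | 7 | 27.04 | 49 | 36.4 |
| 4.3 | 4 | 18.49 | 16 | 17.2 |
| 4 | 4 | 16 | 16 | 16 |
| 4.5 | 5 | 20.25 | 25 | 22.5 |
| 5.5 | 6 | 30.25 | 36 | 33 |
| $\sum X = 47$ | $\sum Y = 54$ | $\sum X^2 = 223.52$ | $\sum Y^2 = 300$ | $\sum XY = 257.6$ |

$$ \sum d_X^2 = \sum X^2 - \frac{(\sum X)^2}{n} = 223.52 - \frac{(47)^2}{10} = 2.62 $$

$$ \sum d_Y^2 = \sum Y^2 - \frac{(\sum Y)^2}{n} = 300 - \frac{(54)^2}{10} = 8.4 $$

$$ \sum d_X d_Y = \sum XY - \frac{\sum X \sum Y}{n} = 257.6 - \frac{47 \times 54}{10} = 3.8 $$

$$ r = \frac{\sum XY - \frac{\sum X \sum Y}{n}}{\sqrt{\left(\sum X^2 - \frac{(\sum X)^2}{n}\right) \times \left(\sum Y^2 - \frac{(\sum Y)^2}{n}\right)}} = \frac{\sum d_X d_Y}{\sqrt{\sum d_X^2 \times \sum d_Y^2}} = \frac{3.8}{\sqrt{2.62 \times 8.4}} = 0.81 $$

Since r > 0, the given data show positive correlation.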
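The hand calculation for Problem-21 can be cross-checked with a few lines of Python. This is a minimal sketch, not part of the original slides: the function name `pearson_r` and the variable names are illustrative choices, and only the standard library is used.

```python
import math

def pearson_r(xs, ys):
    """Karl Pearson's r from raw sums (the computational formula on the slide)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    # Sums of squared deviations and of cross deviations, as in the worked solution
    sum_dx2 = sum(x * x for x in xs) - sum_x ** 2 / n
    sum_dy2 = sum(y * y for y in ys) - sum_y ** 2 / n
    sum_dxdy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    return sum_dxdy / math.sqrt(sum_dx2 * sum_dy2)

# Problem-21 data: pod length (cm) and number of seeds
pod_length = [4.5, 4, 5.2, 4.6, 5.2, 5.2, 4.3, 4, 4.5, 5.5]
seeds = [5, 5, 6, 6, 6, 7, 4, 4, 5, 6]

r = pearson_r(pod_length, seeds)
print(round(r, 2))  # 0.81
```

Working from raw sums mirrors the slide's table, so each intermediate quantity (2.62, 8.4, 3.8) can be printed and compared with the hand values if needed.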
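Problem-22: Calculate the correlation coefficient between X and Y from the following data.

| X | 5 | 9 | 13 | 17 | 21 |
| Y | 12 | 20 | 25 | 33 | 35 |

SOLUTION:

| X | Y | $X^2$ | $Y^2$ | $XY$ |
|---|---|---|---|---|
| 5 | 12 | 25 | 144 | 60 |
| 9 | 20 | 81 | 400 | 180 |
| 13 | 25 | 169 | 625 | 325 |
| 17 | 33 | 289 | 1089 | 561 |
| 21 | 35 | 441 | 1225 | 735 |
| $\sum X = 65$ | $\sum Y = 125$ | $\sum X^2 = 1005$ | $\sum Y^2 = 3483$ | $\sum XY = 1861$ |

$$ \sum d_X^2 = \sum X^2 - \frac{(\sum X)^2}{n} = 1005 - \frac{(65)^2}{5} = 160 $$

$$ \sum d_Y^2 = \sum Y^2 - \frac{(\sum Y)^2}{n} = 3483 - \frac{(125)^2}{5} = 358 $$

$$ \sum d_X d_Y = \sum XY - \frac{\sum X \sum Y}{n} = 1861 - \frac{65 \times 125}{5} = 236 $$

$$ r = \frac{\sum XY - \frac{\sum X \sum Y}{n}}{\sqrt{\left(\sum X^2 - \frac{(\sum X)^2}{n}\right) \times \left(\sum Y^2 - \frac{(\sum Y)^2}{n}\right)}} = \frac{\sum d_X d_Y}{\sqrt{\sum d_X^2 \times \sum d_Y^2}} = \frac{236}{\sqrt{160 \times 358}} = 0.9861 $$

Since r > 0, the given data show positive correlation.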
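The deck states that the correlation coefficient is independent of change of origin and scale. A short sketch using the Problem-22 data can illustrate this. The coding constants (origin 13 and scale 4 for X, origin 25 and scale 5 for Y) are arbitrary choices made for the illustration, and `pearson_r_dev` is a hypothetical helper name.

```python
import math

def pearson_r_dev(xs, ys):
    """Pearson's r computed from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sum_dxdy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_dx2 = sum((x - mean_x) ** 2 for x in xs)
    sum_dy2 = sum((y - mean_y) ** 2 for y in ys)
    return sum_dxdy / math.sqrt(sum_dx2 * sum_dy2)

# Problem-22 data
x = [5, 9, 13, 17, 21]
y = [12, 20, 25, 33, 35]
r = pearson_r_dev(x, y)

# Change of origin and scale (constants chosen arbitrarily for this sketch)
u = [(xi - 13) / 4 for xi in x]
v = [(yi - 25) / 5 for yi in y]
r_coded = pearson_r_dev(u, v)

print(round(r, 4), round(r_coded, 4))  # both round to the same value
```

The deviation form used here is algebraically the same as the raw-sum formula, so it also serves as an independent check on the hand calculation.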
Problem-23: Calculate the correlation coefficient between the heights of fathers and daughters from the following Anuragian Family members. (Homework)

| Heights of father | 64 | 65 | 66 | 67 | 68 | 69 | 70 |
| Heights of daughters | 66 | 67 | 68 | 69 | 70 | 71 | 72 |

MULTIPLE CORRELATION

The study of quantitative assessment of the magnitude and direction of correlation between a given variable and the joint influence of two or more variables is called multiple correlation.

$$ R_{1.23} = \sqrt{\frac{r_{12}^2 + r_{13}^2 - 2\, r_{12}\, r_{13}\, r_{23}}{1 - r_{23}^2}} $$

The squared value of the multiple correlation coefficient ($R_{1.23}^2$) is called the coefficient of determination.

Problem-24: The product moment r score ($r_{12}$) between gill weight ($X_1$) and body weight ($X_2$) was found to be 0.80 in a sample of 33 fishes, the r score ($r_{13}$) between their gill weight ($X_1$) and body length ($X_3$) amounted to 0.20, while the r score ($r_{23}$) between their body weight ($X_2$) and body length ($X_3$) was found to be 0.30. Find if there is a significant multiple linear correlation of $X_1$ with $X_2$ and $X_3$ [$\alpha = 0.05$].

SOLUTION:
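The slides leave the solution to Problem-24 blank. The sketch below evaluates the multiple correlation formula for the given r values and then applies the usual F-test for a multiple correlation coefficient, $F = \frac{R^2 / k}{(1 - R^2)/(n - k - 1)}$ with $k = 2$ predictors. The F-test itself and the critical value 3.32 for (2, 30) degrees of freedom are assumptions taken from standard practice and standard tables, not from the deck, and the function name is illustrative.

```python
import math

def multiple_r(r12, r13, r23):
    """Multiple correlation coefficient R_1.23 from the three pairwise r values."""
    r_squared = (r12 ** 2 + r13 ** 2 - 2 * r12 * r13 * r23) / (1 - r23 ** 2)
    return math.sqrt(r_squared)

n = 33   # number of fishes in the sample
k = 2    # number of predictor variables (X2 and X3)

R = multiple_r(0.80, 0.20, 0.30)

# Assumed test: F statistic with (k, n - k - 1) degrees of freedom
F = (R ** 2 / k) / ((1 - R ** 2) / (n - k - 1))

# Assumed tabulated value of F(2, 30) at alpha = 0.05 (not given in the slides)
F_CRIT = 3.32

print(round(R, 4), round(F, 2), F > F_CRIT)
```

Under these assumptions the calculated F is far above the tabulated value, which would indicate a significant multiple correlation at the 5% level.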
UNIT-II

CURVE FITTING

Suppose that data are given in two variables x and y. The problem of finding an analytical expression of the form y = f(x) which fits the given data is called curve fitting. (Or) Curve fitting means expressing the relationship between two variables by an algebraic equation; this equation is the equation of the curve. Curve fitting means forming the equation of the curve from the given data.

Principle of Least Squares technique: "The sum of the squares of the differences between observed values and expected values should be minimum." This sum is called the Residual Error, i.e.,

$$ E = \sum \left[ y - f(x) \right]^2 \text{ is minimum.} $$

The method of least squares aims at minimizing the value of the error E.

REGRESSION

Regression is used to denote estimation or prediction of the average value of one variable for a specified value of the other variable. One of the variables is called the independent variable or the explaining variable, and the other is called the dependent variable or the explained variable.

Definition of Regression: Regression is the measure of the average relationship between two or more variables in terms of the original units of the data.
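The least-squares principle can be illustrated for the simplest case, a straight line y = a + bx. The sketch below reuses the Problem-22 data from the correlation section purely as sample input; the closed-form expressions for a and b are the standard solution of the normal equations, and the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares estimates (a, b) for the line y = a + b*x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    return a, b

def residual_error(a, b, xs, ys):
    """E = sum of squared differences between observed and fitted values."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Sample input: the Problem-22 data
x = [5, 9, 13, 17, 21]
y = [12, 20, 25, 33, 35]

a, b = fit_line(x, y)
e_min = residual_error(a, b, x, y)

# Moving the intercept away from the fitted value can only increase E
e_shifted = residual_error(a + 0.5, b, x, y)

print(a, b, e_min, e_shifted)
```

Comparing `e_min` with `e_shifted` shows the defining property of the fit: any other line through the same data gives a larger residual error.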
LINES OF REGRESSION:

Lines of regression are the lines which give the best estimate of the value of one variable for any given value of the other variable. In the case of two variables X and Y, we shall have two lines of regression:
1. Regression line Y on X
2. Regression line X on Y

Regression Line Y on X: The regression equation or form of the line Y on X is Y = a + bX, where 'Y' is the dependent variable, 'X' is the independent variable, and 'a' and 'b' are unknown constants.

Regression Line X on Y: The regression equation or form of the line X on Y is X = a + bY, where 'X' is the dependent variable, 'Y' is the independent variable, and 'a' and 'b' are unknown constants.

There are two types of regression equations:

1. Regression equation of X on Y is

$$ (x - \bar{x}) = b_{xy}\,(y - \bar{y}) $$

where:
x = value of x
$\bar{x}$ = mean of x
y = value of y
$\bar{y}$ = mean of y
$b_{xy} = r\,\dfrac{\sigma_x}{\sigma_y} = \dfrac{\sum d_x d_y}{\sum d_y^2}$
$\sigma_x$ = standard deviation of x series
$\sigma_y$ = standard deviation of y series
2. Regression equation of Y on X is

$$ (y - \bar{y}) = b_{yx}\,(x - \bar{x}) $$

where:
x = value of x
$\bar{x}$ = mean of x
y = value of y
$\bar{y}$ = mean of y
$b_{yx} = r\,\dfrac{\sigma_y}{\sigma_x} = \dfrac{\sum d_x d_y}{\sum d_x^2}$
$\sigma_x$ = standard deviation of x series
$\sigma_y$ = standard deviation of y series

PROPERTIES OF REGRESSION COEFFICIENT:
1. The correlation coefficient is the geometric mean of the two regression coefficients, i.e., $r = \pm\sqrt{b_{YX} \times b_{XY}}$.
2. The arithmetic mean of the regression coefficients is greater than or equal to the correlation coefficient, i.e., $\dfrac{b_{YX} + b_{XY}}{2} \ge r$.
3. Since the coefficient of correlation cannot exceed one, if one of the regression coefficients is greater than one, then the other must be less than one.
4. Both regression coefficients have the same sign, either positive or negative.
5. If one regression coefficient is positive, then the other is also positive, and vice versa.
6. Regression coefficients are independent of change of origin, but not of scale.

USES OF REGRESSION ANALYSIS:
1. The regression analysis technique is very useful in predicting the probable value of an unknown variable in response to some known related variable.
2. Regression is useful in establishing the nature of the relationship between two variables.
3. Regression analysis is extensively used for measuring and estimating the relationship among variables.
4. Regression analysis provides regression coefficients, which are generally used in the calculation of the correlation coefficient.
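Several of the listed properties of regression coefficients can be checked numerically. The sketch below uses the Problem-22 data as sample input (an arbitrary choice for illustration) and verifies the geometric-mean property, the arithmetic-mean property, and the behaviour under change of origin and scale. The helper name `deviation_sums` and the shift and scale constants are hypothetical.

```python
import math

def deviation_sums(xs, ys):
    """Return (sum d_x^2, sum d_y^2, sum d_x d_y) about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sum_dx2 = sum((x - mean_x) ** 2 for x in xs)
    sum_dy2 = sum((y - mean_y) ** 2 for y in ys)
    sum_dxdy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sum_dx2, sum_dy2, sum_dxdy

x = [5, 9, 13, 17, 21]
y = [12, 20, 25, 33, 35]

sum_dx2, sum_dy2, sum_dxdy = deviation_sums(x, y)
b_yx = sum_dxdy / sum_dx2          # regression coefficient of Y on X
b_xy = sum_dxdy / sum_dy2          # regression coefficient of X on Y
r = sum_dxdy / math.sqrt(sum_dx2 * sum_dy2)

# Property 1: r is the geometric mean of the two coefficients
geometric_mean = math.sqrt(b_yx * b_xy)

# Property 2: the arithmetic mean is at least r
arithmetic_mean = (b_yx + b_xy) / 2

# Property 6: shifting the origin of x leaves b_yx unchanged,
# while rescaling x by 10 divides b_yx by 10
sx2, _, sxy = deviation_sums([xi + 100 for xi in x], y)
b_yx_shifted = sxy / sx2
sx2, _, sxy = deviation_sums([xi * 10 for xi in x], y)
b_yx_scaled = sxy / sx2

print(b_yx, b_xy, r, geometric_mean, arithmetic_mean)
```

Here b_yx is greater than one and b_xy is less than one, which also matches property 3.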
CORRELATION ANALYSIS "V/S" REGRESSION ANALYSIS

| CORRELATION ANALYSIS | REGRESSION ANALYSIS |
|---|---|
| 1. Correlation analysis attempts to determine the degree of relationship between the two variables. | 1. Regression analysis is a mathematical measure of the average relationship between two or more variables in terms of the original units of the data. |
| 2. Correlation analysis tests the closeness of the variables. | 2. Regression analysis measures the extent of change in the dependent variable due to change in the independent variable. |
| 3. In correlation analysis, the casual relationship in variables moving in the same direction (or) opposite direction is studied. | 3. In regression analysis, the study is made by taking into consideration the cause-and-effect relationship between two variables. |
| 4. In correlation, there is a chance of nonsense correlation between the two variables. | 4. In regression, there is no chance of existence of such a type of relation between two variables. |
| 5. The correlation coefficient is independent of change of origin and scale. | 5. Regression coefficients are independent of change of origin but not of scale. |
Standard Error of Estimate (or) Regression:

Standard Error of Estimate of X is given by

$$ S_X = \sigma_X \sqrt{1 - r^2} $$

Standard Error of Estimate of Y is given by

$$ S_Y = \sigma_Y \sqrt{1 - r^2} $$

where

$$ \sigma_X = \sqrt{\frac{\sum d_x^2}{n}} = \sqrt{\frac{1}{n}\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)}, \qquad \sigma_Y = \sqrt{\frac{\sum d_y^2}{n}} = \sqrt{\frac{1}{n}\left(\sum y^2 - \frac{(\sum y)^2}{n}\right)} $$

PROBLEMS ON REGRESSION

PROBLEM-1: Height and weight are recorded for 10 students. The results are given below.
(i) Calculate the correlation coefficient and test the level of significance.
(ii) Obtain the regression equations for X on Y and Y on X, and also calculate the regression coefficient and test the level of significance.
(iii) Calculate the Standard Error of Estimate (or) Regression.

| HEIGHT | 62 | 72 | 78 | 58 | 65 | 70 | 66 | 63 | 60 | 72 |
| WEIGHT | 50 | 65 | 63 | 50 | 54 | 60 | 61 | 55 | 54 | 65 |

SOLUTION:

| Height (x) | Weight (y) | $x^2$ | $y^2$ | $xy$ |
|---|---|---|---|---|
| 62 | 50 | 3844 | 2500 | 3100 |
| 72 | 65 | 5184 | 4225 | 4680 |
| 78 | 63 | 6084 | 3969 | 4914 |
| 58 | 50 | 3364 | 2500 | 2900 |
| 65 | 54 | 4225 | 2916 | 3510 |
| 70 | 60 | 4900 | 3600 | 4200 |
| 66 | 61 | 4356 | 3721 | 4026 |
| 63 | 55 | 3969 | 3025 | 3465 |
| 60 | 54 | 3600 | 2916 | 3240 |
| 72 | 65 | 5184 | 4225 | 4680 |
| $\sum x = 666$ | $\sum y = 577$ | $\sum x^2 = 44710$ | $\sum y^2 = 33597$ | $\sum xy = 38715$ |
(i) Now we have to find

$$ \sum d_x^2 = \sum x^2 - \frac{(\sum x)^2}{n} = 44710 - \frac{(666)^2}{10} = 354.4 $$

$$ \sum d_y^2 = \sum y^2 - \frac{(\sum y)^2}{n} = 33597 - \frac{(577)^2}{10} = 304.1 $$

$$ \sum d_x d_y = \sum xy - \frac{\sum x \sum y}{n} = 38715 - \frac{666 \times 577}{10} = 286.8 $$

$$ r = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \frac{(\sum x)^2}{n}\right) \times \left(\sum y^2 - \frac{(\sum y)^2}{n}\right)}} = \frac{\sum d_x d_y}{\sqrt{\sum d_x^2 \times \sum d_y^2}} = \frac{286.8}{\sqrt{354.4 \times 304.1}} = 0.8736 $$

Since r > 0, the given data show positive correlation.

Significance Test:

$$ t_{Cal} = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}} \sim t_{(n-2)\ d.f.} = \frac{0.8736\sqrt{10 - 2}}{\sqrt{1 - (0.8736)^2}} = 5.0774 $$

The tabulated value at the 5% level of significance with (n - 2) = 8 d.f. is 2.31.

Inference: If $t_{Cal} > t_{Tab}$, then we reject the null hypothesis (accept the alternative hypothesis), i.e., 5.0774 > 2.31. There is a significant correlation between height and weight.
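The correlation coefficient and its t-test for the height and weight data can be reproduced as follows. This is a minimal sketch with illustrative names. It carries r at full precision rather than rounding to four decimals first, so the t value may differ from the hand calculation in the third decimal place. The tabulated value 2.31 is the one quoted in the solution.

```python
import math

height = [62, 72, 78, 58, 65, 70, 66, 63, 60, 72]
weight = [50, 65, 63, 50, 54, 60, 61, 55, 54, 65]
n = len(height)

# Sums of squared deviations and cross deviations from raw sums
sum_dx2 = sum(x * x for x in height) - sum(height) ** 2 / n
sum_dy2 = sum(y * y for y in weight) - sum(weight) ** 2 / n
sum_dxdy = sum(x * y for x, y in zip(height, weight)) - sum(height) * sum(weight) / n

r = sum_dxdy / math.sqrt(sum_dx2 * sum_dy2)

# t statistic for H0: no correlation, with n - 2 degrees of freedom
t_cal = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

T_TAB = 2.31  # tabulated t at the 5% level with 8 d.f., as quoted in the solution
significant = t_cal > T_TAB

print(round(r, 4), round(t_cal, 2), significant)
```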
(ii) Now we have to find the regression equations, where:

$$ \bar{x} = \text{Mean of } x = \frac{666}{10} = 66.6, \qquad \bar{y} = \text{Mean of } y = \frac{577}{10} = 57.7 $$

$$ b_{xy} = \frac{\sum d_x d_y}{\sum d_y^2} = \frac{286.8}{304.1} = 0.9431 $$

$$ b_{yx} = \frac{\sum d_x d_y}{\sum d_x^2} = \frac{286.8}{354.4} = 0.8093 $$

Regression equation of X on Y is $(x - \bar{x}) = b_{xy}\,(y - \bar{y})$:

$$ (x - 66.6) = 0.9431\,(y - 57.7) = 0.9431\,y - 54.4175 $$

$$ x = 0.9431\,y - 54.4175 + 66.6 $$

$$ x = 12.1825 + 0.9431\,y $$

Regression equation of Y on X is $(y - \bar{y}) = b_{yx}\,(x - \bar{x})$:

$$ (y - 57.7) = 0.8093\,(x - 66.6) = 0.8093\,x - 53.8994 $$

$$ y = 0.8093\,x - 53.8994 + 57.7 $$

$$ y = 3.8006 + 0.8093\,x $$
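Both regression lines for the height and weight data can be computed directly from the deviation sums. The sketch below is illustrative only: the variable names are assumptions, and the intercepts are obtained from unrounded slopes, so they can differ from hand-rounded values in the third decimal place.

```python
height = [62, 72, 78, 58, 65, 70, 66, 63, 60, 72]   # x
weight = [50, 65, 63, 50, 54, 60, 61, 55, 54, 65]   # y
n = len(height)

mean_x = sum(height) / n
mean_y = sum(weight) / n

sum_dx2 = sum((x - mean_x) ** 2 for x in height)
sum_dy2 = sum((y - mean_y) ** 2 for y in weight)
sum_dxdy = sum((x - mean_x) * (y - mean_y) for x, y in zip(height, weight))

# Regression of X on Y: x = a_xy + b_xy * y
b_xy = sum_dxdy / sum_dy2
a_xy = mean_x - b_xy * mean_y

# Regression of Y on X: y = a_yx + b_yx * x
b_yx = sum_dxdy / sum_dx2
a_yx = mean_y - b_yx * mean_x

print(f"x = {a_xy:.4f} + {b_xy:.4f} y")
print(f"y = {a_yx:.4f} + {b_yx:.4f} x")
```

Each line passes through the point of means (66.6, 57.7), which is a convenient check on any hand-derived intercept.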
Significance Test: Now we have to find the S.E. of $b_{yx}$:

$$ S.E.(b_{yx}) = \sqrt{\frac{\frac{1}{n-2}\left[\sum d_y^2 - \frac{(\sum d_x d_y)^2}{\sum d_x^2}\right]}{\sum d_x^2}} = \sqrt{\frac{\frac{1}{10-2}\left[304.1 - \frac{(286.8)^2}{354.4}\right]}{354.4}} = 0.1594 $$

$$ t_{Cal} = \frac{b_{yx}}{S.E.(b_{yx})} = \frac{0.8093}{0.1594} = 5.077 $$

The tabulated value at the 5% l.o.s. with (n - 2) = 8 d.f. is 2.31.

Inference: If $t_{Cal} > t_{Tab}$, then we reject the null hypothesis (accept the alternative hypothesis), i.e., 5.077 > 2.31. The regression coefficient is significant.

(iii) Standard Error of Estimate (or) Regression:

where

$$ \sigma_X = \sqrt{\frac{\sum d_x^2}{n}} = \sqrt{\frac{354.4}{10}} = 5.9531, \qquad \sigma_Y = \sqrt{\frac{\sum d_y^2}{n}} = \sqrt{\frac{304.1}{10}} = 5.5145 $$

(1) Standard Error of Estimate of X is given by

$$ S_X = \sigma_X \sqrt{1 - r^2} = 5.9531\sqrt{1 - (0.8736)^2} = 2.897 $$

(2) Standard Error of Estimate of Y is given by

$$ S_Y = \sigma_Y \sqrt{1 - r^2} = 5.5145\sqrt{1 - (0.8736)^2} = 2.684 $$
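The standard error of the regression coefficient and its t statistic can be checked from the three deviation sums already obtained in the solution (354.4, 304.1 and 286.8). The sketch below takes those sums as given inputs and uses illustrative variable names. It also compares the result with the t statistic for r, since for simple linear regression the two tests are algebraically the same.

```python
import math

n = 10
sum_dx2 = 354.4    # sum of squared deviations of height, from the solution
sum_dy2 = 304.1    # sum of squared deviations of weight, from the solution
sum_dxdy = 286.8   # sum of cross deviations, from the solution

b_yx = sum_dxdy / sum_dx2

# Standard error of b_yx: residual mean square divided by sum d_x^2
residual_ss = sum_dy2 - sum_dxdy ** 2 / sum_dx2
se_b_yx = math.sqrt(residual_ss / (n - 2) / sum_dx2)
t_slope = b_yx / se_b_yx

# t statistic for the correlation coefficient, for comparison
r = sum_dxdy / math.sqrt(sum_dx2 * sum_dy2)
t_corr = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

print(round(se_b_yx, 4), round(t_slope, 3), round(t_corr, 3))
```

Because the two t values coincide, a noticeable gap between them in a hand calculation usually points to rounding in an intermediate step.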