P G STAT 531 Lecture 10 Regression

667 views 27 slides Jan 16, 2021

About This Presentation

This presentation gives an overview of simple linear regression and multiple regression.


Slide Content

Lecture 10
Regression
Dr. Ashish. C. Patel
Assistant Professor,
Dept. of Animal Genetics & Breeding,
Veterinary College, Anand
STAT-531
Data Analysis using Statistical Packages

•It is sometimes very important to determine how a change in one variable brings about a change in another variable. For example:

Experiment | Dependent/response variable (Y) | Independent/predictor variable (X)
How does a change in air temperature affect feed intake? | Feed intake | Air temperature
How does increasing the protein level in feed affect daily gain? | Daily gain | Protein level
How does the quality of concentrate affect milk yield in cows? | Milk yield | Concentrate quality

•In all these examples the relationship between the variables can be described by a function: a function of temperature to describe feed intake, a function of protein level to describe daily gain, or a function of concentrate quality to describe milk yield.
•A function that explains such a relationship is called a regression function, and the analysis of such problems and the estimation of the regression function is called regression analysis.
•Thus regression analysis helps in predicting the response variable given some predictor (independent) variable.
•In other words, regression is the study of the functional relationship between two variables, of which one is dependent and the other is independent. The dependent variable is usually denoted by “Y” and the independent variable by “X”.
•Regression: It refers to the change in the dependent variable for a unit increase in the independent one.

•When the change of the dependent variable is described with just one independent variable and the relationship between the two is linear, the appropriate procedure is called simple linear regression.
•Multiple regression procedures are used when the change of a dependent variable is explained by changes of two or more independent variables.
•The term "regression" was introduced by Francis Galton when he studied the relationship between the heights of fathers and sons.
•We may be interested in estimating the value of one variable from given values of one or more other variables. This is done with the help of regression.
•Regression finds the equations of the lines (or curves) that describe the functional relationship between two variables, independent (x) and dependent (y).

•The simple linear regression is of the form y = a + bx, where ‘b’ represents the slope of the line (also called the regression coefficient) and 'a' the intercept of the line.
•In a study where data on age and weight of animals are available, age could be considered the independent variable and weight the dependent variable.
•It means that weight regresses on age.
•Unlike the two correlation coefficients (r_xy and r_yx), which have the same computational formula, the two regression coefficients (b_yx and b_xy) are obtained by different formulas.

Two main applications of regression analysis are:
•Estimation of the function of dependency between variables
•Prediction of values of the dependent variable using new measurements of the independent variables
The simple regression model
•A regression that explains the change of a dependent variable based on changes of one independent variable is called a simple linear regression.
•A simple linear regression model: y = a + bx + e,
where
y = dependent variable (or response variable)
x = independent variable (or predictor variable)
e = random error
a = constant, y-intercept of the line
b = regression coefficient, slope of the line
•In the regression line y = a + bx, a and b are calculated by
b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)²
a = ȳ − b·x̄
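The least-squares estimates of a and b can be sketched in a few lines of Python. This is a minimal illustration, not the statistical package used in the course; the function name fit_simple_regression and the data are hypothetical, chosen so that the points lie exactly on y = 1 + 2x.

```python
def fit_simple_regression(x, y):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Sum of cross-products and sum of squares of x, both about the means
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b = sxy / sxx            # slope (regression coefficient)
    a = y_mean - b * x_mean  # intercept
    return a, b


# Hypothetical data lying exactly on y = 1 + 2x
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]
a, b = fit_simple_regression(x, y)
print(a, b)
```

With these made-up values the slope works out to 2 and the intercept to 1, as expected for points on a perfect line.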

•Assumptions of the simple linear regression model:
•Linearity of the relationship between dependent and independent variables
•Independence of the errors (no serial correlation)
•Homoscedasticity (constant variance) of the errors
•Normality of the error distribution

•Test of significance of the regression coefficient (b_yx)
•We can test whether a sample regression coefficient is significantly different from zero either by a “t” test or by an F-test, with the hypotheses:
•H_0: b = 0
•H_a: b ≠ 0
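As a rough sketch of how the two test statistics are obtained: the t statistic is the slope divided by its standard error (with n − 2 degrees of freedom), and the F statistic is the regression mean square divided by the residual mean square (with 1 and n − 2 degrees of freedom). For simple linear regression F equals t squared, so both tests lead to the same conclusion. The function name slope_tests and the data below are hypothetical; the computed values would still have to be compared with tabulated t or F values.

```python
import math


def slope_tests(x, y):
    """Return (t, F) for testing H0: b = 0 in simple linear regression."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_mean - b * x_mean
    # Residual (error) sum of squares and mean square on n - 2 df
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)
    # t statistic: slope over its standard error
    se_b = math.sqrt(mse / sxx)
    t = b / se_b
    # F statistic: regression mean square (1 df) over residual mean square
    ssr = b ** 2 * sxx
    f = ssr / mse
    return t, f


# Hypothetical data with some scatter around a straight line
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
t_stat, f_stat = slope_tests(x, y)
print(t_stat, f_stat)
```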

Properties of regression coefficients
•The geometric mean of the two regression coefficients gives the coefficient of correlation. Thus √(b_yx · b_xy) = r
•Both regression coefficients (b_yx and b_xy) have the same sign.
•A regression coefficient ranges from −∞ to +∞.
•The unit of the regression coefficient b_yx is the unit of the dependent variable per unit of the independent variable.
•The sign of the regression coefficients is reflected in the sign of the coefficient of correlation. Thus, if the b’s are positive, r will be positive, and if the b’s are negative, r will be negative.
•If one regression coefficient is greater than 1 in absolute value, the other must be less than 1 in absolute value. This is because r = ±√(b_yx · b_xy) cannot lie outside the range −1 to +1.
•The arithmetic mean of the two regression coefficients is equal to or greater than the correlation coefficient in absolute value, i.e. |(b_yx + b_xy)/2| ≥ |r|.
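These properties can be checked numerically. The sketch below uses a small hypothetical data set and a hypothetical helper, regression_coefficients, that returns b_yx, b_xy and r computed from the same sums of squares and cross-products.

```python
import math


def regression_coefficients(x, y):
    """Return (b_yx, b_xy, r) for paired observations x and y."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    syy = sum((yi - y_mean) ** 2 for yi in y)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b_yx = sxy / sxx                # regression of y on x
    b_xy = sxy / syy                # regression of x on y
    r = sxy / math.sqrt(sxx * syy)  # correlation coefficient
    return b_yx, b_xy, r


# Hypothetical data
x = [2, 4, 6, 8, 10]
y = [1, 3, 4, 8, 9]
b_yx, b_xy, r = regression_coefficients(x, y)

print(b_yx, b_xy, r)
print(math.sqrt(b_yx * b_xy))  # geometric mean, equal to |r|
print((b_yx + b_xy) / 2)       # arithmetic mean, not smaller than |r|
```

Here b_yx is slightly above 1 and b_xy is below 1, which illustrates why both coefficients cannot exceed 1 at the same time.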

•Regression coefficients are independent of change of origin but not of scale. This means that adding a constant to, or subtracting a constant from, the original values does not change the regression coefficients, but multiplying or dividing the original values by a constant does change them.
•Unlike the correlation coefficients (r_xy = r_yx), the two regression coefficients are not equal except when the standard deviations of X and Y are the same: in general b_yx ≠ b_xy.
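The effect of a change of origin versus a change of scale can be illustrated with a short sketch. The helper name slope and the data are hypothetical; the shift constants and the scale factor of 10 are arbitrary choices for illustration.

```python
def slope(x, y):
    """Regression coefficient b_yx of y on x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical data
x = [2, 4, 6, 8, 10]
y = [1, 3, 4, 8, 9]
b_original = slope(x, y)

# Change of origin: add or subtract constants; the slope is unchanged
x_shifted = [xi + 100 for xi in x]
y_shifted = [yi - 50 for yi in y]
b_shifted = slope(x_shifted, y_shifted)

# Change of scale: multiply x by 10; the slope becomes one tenth
x_scaled = [xi * 10 for xi in x]
b_scaled = slope(x_scaled, y)

print(b_original, b_shifted, b_scaled)
```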
Uses of Regression
•It provides a functional relationship between two related variables, with the help of which we can easily estimate or predict the unknown value of one variable from known values of another variable.

•It provides a measure of the errors of estimates made through regression lines.
•It provides a measure of the coefficient of correlation: √(b_yx · b_xy) = r
•It provides a measure of the coefficient of determination (r²), which speaks of the effect of the independent variable (explanatory or regressing variable) on the dependent variable (explained or regressed variable) and gives us an idea of the predictive value of the regression analysis. The greater the value of r², the better the fit and the more useful the regression equations are as estimating devices.
•It provides a valuable tool for measuring and estimating the cause-and-effect relationship among economic variables.

•A regression that explains linear change of a dependent variable based on change of one independent variable is called a simple linear regression. For example, the weight of cows can be predicted by using measurement of heart girth. The aim is to determine a linear function that will explain changes in weight as heart girth changes. Heart girth is the independent variable and weight is the dependent variable.

Y = -514.14 + 5.41 X
•For X = 230
•Ŷ = -514.14 + 5.41 (230)
= -514.14+1244.3
=730.16 kg
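The same prediction can be reproduced in code. This sketch simply plugs the fitted intercept and slope from the slide into the regression equation; the function name predict_weight is hypothetical.

```python
def predict_weight(heart_girth, a=-514.14, b=5.41):
    """Predicted weight (kg) from heart girth using the fitted line from the slide."""
    return a + b * heart_girth


print(round(predict_weight(230), 2))  # 730.16
```

Predictions of this kind are only trustworthy for heart girth values within the range of the data used to fit the line.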

Regression Analysis: How Do I Interpret R-squared and Assess the Goodness-of-Fit?
•Linear regression calculates an equation that minimizes the distance between the fitted line and all of the data points.
•Technically, ordinary least squares (OLS) regression minimizes the sum of the squared residuals.
Residual = Observed value - Fitted value
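A small sketch can make the least-squares idea concrete: compute the residuals of the fitted line, then confirm that nudging the intercept or slope away from the fitted values only increases the sum of squared residuals. The data, the function names and the sizes of the nudges are all hypothetical.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b = sxy / sxx
    return y_mean - b * x_mean, b


def sum_squared_residuals(a, b, x, y):
    """Sum of (observed - fitted)^2 for the line y = a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical data
x = [2, 4, 6, 8, 10]
y = [1, 3, 4, 8, 9]
a_hat, b_hat = fit_line(x, y)

# Residual = observed value - fitted value
residuals = [yi - (a_hat + b_hat * xi) for xi, yi in zip(x, y)]
best = sum_squared_residuals(a_hat, b_hat, x, y)

# Any other line gives a larger sum of squared residuals
nudged = [
    sum_squared_residuals(a_hat + da, b_hat + db, x, y)
    for da in (-0.5, 0.5)
    for db in (-0.1, 0.1)
]
print(best < min(nudged))  # True
```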

What is R-squared value?
•R-squared is a statistical measure of how close the data are
to the fitted regression line. It is also known as the
coefficient of determination, or the coefficient of multiple
determination for multiple regression.
•R-squared = Explained variation / Total variation
•R-squared is always between 0 and 100%.
•The higher the R-squared, the better the model fits your data.
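The ratio of explained to total variation can be computed directly, as in the sketch below. The function name r_squared and both data sets are hypothetical; the second data set lies exactly on a line, so its R-squared is 1 (100%).

```python
def r_squared(x, y):
    """R-squared = explained variation / total variation for a fitted line."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_mean - b * x_mean
    fitted = [a + b * xi for xi in x]
    # Explained variation: fitted values about the mean of y
    explained = sum((fi - y_mean) ** 2 for fi in fitted)
    # Total variation: observed values about the mean of y
    total = sum((yi - y_mean) ** 2 for yi in y)
    return explained / total


# Hypothetical data with some scatter
r2_scatter = r_squared([2, 4, 6, 8, 10], [1, 3, 4, 8, 9])
# Hypothetical data lying exactly on a line
r2_perfect = r_squared([1, 2, 3], [2, 4, 6])
print(r2_scatter, r2_perfect)
```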

•The regression model on the left explained 38.0% of the total variance, while the one on the right explained 87.4% of the total variance.
•The more variance the regression model explains, the closer the observed data points lie to the values predicted by the regression line.
•Theoretically, if a model could explain 100% of the variance, the fitted values would always equal the observed values.