Function lm() in R and its Basic Parameters
Jiong Xun
Learning Objectives
- Understand linear regression
- Understand the purpose of the lm() function
- Use lm() to fit regression models
- Interpret the output of the lm() function
Ever Wondered How…
- Maps are able to estimate your travelling time
- Taxi surcharges are priced to meet demand
- HDB resale prices are forecasted
What is Linear Regression?
- Studies the relationship between a dependent variable (y) and one or more independent variables (x)
- Models relationships between variables
- Simple Linear Regression: 1 independent variable
- Multiple Linear Regression: 2 or more independent variables
- Finds the best-fit line, which minimises the distance between observed data values and predicted values
How do we obtain the best-fit line?
Ordinary Least Squares
- Finding the best-fit line ⇒ we want the line to be as close to the data points as possible
- Minimise the vertical distance between each point and the line (the residual)
- Residual Sum of Squares (RSS) ⇒ sum of the squared residuals over all data points
- Residuals are squared so that positive and negative residuals do not cancel each other out
- Minimise RSS ⇒ minimum total squared distance between line and points ⇒ best-fit line
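
The least-squares idea above can be sketched in a few lines of R. The toy vectors `x` and `y` and the helper `rss()` are invented for illustration and are not part of the original slides. The sketch computes the closed-form estimates for simple linear regression and checks that shifting the line increases the RSS.

```r
# Toy data, invented purely for illustration.
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Residual sum of squares for a candidate line y = b0 + b1 * x.
rss <- function(b0, b1) sum((y - (b0 + b1 * x))^2)

# Closed-form least-squares estimates for simple linear regression.
b1_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0_hat <- mean(y) - b1_hat * mean(x)

rss(b0_hat, b1_hat)        # RSS at the least-squares line
rss(b0_hat + 0.5, b1_hat)  # a shifted line has a larger RSS
coef(lm(y ~ x))            # lm() arrives at the same two coefficients
```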
Visualised on a Simple Plot
[Figure: scatter plot labelled with Actual Data Points, Residuals, and Regression Line]
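
A plot of this kind can be reproduced with base R graphics. This is a sketch only: the original figure is not preserved, so the choice of the built-in `trees` data and of `Volume ~ Girth` as the model is an assumption.

```r
# Assumed model for illustration: Volume explained by Girth.
fit <- lm(Volume ~ Girth, data = trees)

# Actual data points.
plot(Volume ~ Girth, data = trees, pch = 19,
     main = "Volume vs Girth with fitted line")

# Regression line.
abline(fit, col = "blue", lwd = 2)

# Residuals: vertical segments from each observed point to the line.
segments(x0 = trees$Girth, y0 = trees$Volume,
         x1 = trees$Girth, y1 = fitted(fit),
         col = "red", lty = 2)
```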
How Do We Use R to Fit a Linear Regression Model?
Syntax of lm() Function
- lm() ⇒ stands for linear model
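
The syntax shown on the original slide is not preserved, so the following is a minimal sketch of the most common call, `lm(formula, data)`. The data frame `df` and its columns `x` and `y` are hypothetical.

```r
# General shape of the call:
#   lm(formula, data)
#   formula: response ~ predictor(s), e.g. y ~ x or y ~ x1 + x2
#   data:    data frame that contains the variables named in the formula

# Hypothetical data frame, for illustration only.
df <- data.frame(x = c(1, 2, 3, 4, 5, 6),
                 y = c(1.2, 1.9, 3.2, 3.8, 5.1, 6.0))

model <- lm(y ~ x, data = df)

model         # prints the call and the fitted coefficients
class(model)  # the fitted object has class "lm"
```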
Example Dataset “trees”
- Girth: inches
- Height: ft
- Volume: cubic ft
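
The `trees` data frame ships with R in the `datasets` package, so it can be inspected directly before any model is fitted:

```r
# trees is available in a standard R session without extra packages.
data(trees)

str(trees)      # structure: three numeric columns
head(trees)     # first six rows
summary(trees)  # range and quartiles of each column
```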
Example Dataset “trees”
- Left of ~ in the formula: Y variable (response)
- Right of ~ in the formula: X variable (explanatory)
- data: name of the data frame that the model is using
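
The exact call on the original slide is not preserved. As an assumption for illustration, the sketch below takes `Volume` as the response and `Girth` as the single explanatory variable.

```r
# Assumed formula: Volume (Y, response) ~ Girth (X, explanatory).
# data = trees names the data frame that holds both columns.
fit <- lm(Volume ~ Girth, data = trees)

# summary() prints the residuals, the coefficient table, and the fit statistics.
summary(fit)
```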
Example Dataset “trees”
- Residuals: difference between observed and predicted values
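
As a sketch, the residuals can be recomputed by hand and compared with what R stores. The model `Volume ~ Girth` is an assumed example, not necessarily the one on the original slide.

```r
fit <- lm(Volume ~ Girth, data = trees)  # assumed example model

# Residual = observed value minus predicted (fitted) value.
manual_resid <- trees$Volume - fitted(fit)

# Compare with the residuals stored in the fitted model.
all.equal(unname(manual_resid), unname(residuals(fit)))

# Min, quartiles, and max of the residuals,
# the quantities summarised in the Residuals block of summary(fit).
quantile(residuals(fit))
```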
Example Dataset “trees”
- Estimate: the fitted coefficients (intercept and slope), used to predict the value of the response variable
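
A short sketch of how the estimates turn into a prediction. The model `Volume ~ Girth` and the girth of 12 inches are assumptions chosen for illustration.

```r
fit <- lm(Volume ~ Girth, data = trees)  # assumed example model
b <- coef(fit)                           # named vector: (Intercept), Girth

# Hypothetical new tree with a girth of 12 inches.
new_tree <- data.frame(Girth = 12)

# Prediction by hand: intercept + slope * girth.
by_hand <- b[["(Intercept)"]] + b[["Girth"]] * 12

# predict() applies the same equation to the new data.
from_predict <- predict(fit, newdata = new_tree)

c(by_hand = by_hand, from_predict = unname(from_predict))
```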
Example Dataset “trees”
- Std. Error: average amount that a coefficient estimate varies from the actual value of that coefficient
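
One way to see where these numbers come from: the standard errors are the square roots of the diagonal of the estimated covariance matrix of the coefficients. The model `Volume ~ Girth` is again an assumed example.

```r
fit <- lm(Volume ~ Girth, data = trees)  # assumed example model

# Standard errors from the coefficient covariance matrix.
se_from_vcov <- sqrt(diag(vcov(fit)))

# Standard errors as reported in the summary table.
se_from_table <- coef(summary(fit))[, "Std. Error"]

cbind(se_from_vcov, se_from_table)
```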
Example Dataset “trees”
- t value = Estimate / Std. Error
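
The ratio can be checked directly against the coefficient table. This is a sketch that assumes `Volume ~ Girth` as the example model.

```r
fit <- lm(Volume ~ Girth, data = trees)  # assumed example model
tab <- coef(summary(fit))                # coefficient table as a matrix

# t value recomputed as Estimate divided by Std. Error.
t_by_hand <- tab[, "Estimate"] / tab[, "Std. Error"]

cbind(t_by_hand, t_reported = tab[, "t value"])
```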
Example Dataset “trees”
- Pr(>|t|): p-value for the t-test that determines whether the coefficient is significantly different from zero
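
As a sketch, the two-sided p-value can be recomputed from the t value and the residual degrees of freedom using the t distribution function `pt()`. The model `Volume ~ Girth` is an assumed example.

```r
fit <- lm(Volume ~ Girth, data = trees)  # assumed example model
tab <- coef(summary(fit))

# Two-sided p-value: probability of a |t| at least this large
# under the null hypothesis that the coefficient is zero.
p_by_hand <- 2 * pt(abs(tab[, "t value"]),
                    df = df.residual(fit),
                    lower.tail = FALSE)

cbind(p_by_hand, p_reported = tab[, "Pr(>|t|)"])
```

A small p-value (commonly below 0.05) is taken as evidence that the coefficient differs from zero.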
Example Dataset “trees”
- Residual standard error: standard deviation of the residuals
- Degrees of freedom: number of data points that went into the estimation minus the number of estimated coefficients
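
Both quantities can be recomputed by hand. The sketch below assumes the example model `Volume ~ Girth`, which estimates two coefficients (intercept and slope).

```r
fit <- lm(Volume ~ Girth, data = trees)  # assumed example model

n <- nrow(trees)        # number of data points
k <- length(coef(fit))  # number of estimated coefficients
df_by_hand <- n - k     # residual degrees of freedom

# Residual standard error: sqrt(RSS / degrees of freedom).
rss <- sum(residuals(fit)^2)
rse_by_hand <- sqrt(rss / df_by_hand)

c(df_by_hand = df_by_hand, df_reported = df.residual(fit))
c(rse_by_hand = rse_by_hand, rse_reported = summary(fit)$sigma)
```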
Example Dataset “trees”
- Multiple R-squared: the percentage of variance in the response that can be explained by the regression
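
A sketch of the calculation behind this figure, using the assumed example model `Volume ~ Girth`: R-squared is one minus the ratio of the residual sum of squares to the total sum of squares.

```r
fit <- lm(Volume ~ Girth, data = trees)  # assumed example model

rss <- sum(residuals(fit)^2)                        # unexplained variation
tss <- sum((trees$Volume - mean(trees$Volume))^2)   # total variation

r2_by_hand <- 1 - rss / tss

c(r2_by_hand = r2_by_hand, r2_reported = summary(fit)$r.squared)
```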
Example Dataset “trees”
- F-statistic and its p-value: indicate whether the model as a whole is statistically significant
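
The summary object stores the F-statistic together with its two degrees of freedom, so the overall p-value can be recomputed with `pf()`. This is a sketch under the assumed example model `Volume ~ Girth`.

```r
fit <- lm(Volume ~ Girth, data = trees)  # assumed example model

# Named vector with elements value, numdf, dendf.
f <- summary(fit)$fstatistic

# Upper-tail probability of the F distribution gives the overall p-value.
p_value <- pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)

f
p_value
```

With a single explanatory variable, the F-statistic equals the square of the slope's t value, so the two tests agree.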
Example Dataset “trees”
- Can we predict the Volume of a tree based on its Girth and Height?
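
One possible approach to this question is a multiple linear regression with both variables on the right-hand side of the formula. The measurements of the new tree below are hypothetical and chosen only to demonstrate `predict()`.

```r
# Multiple linear regression: two explanatory variables joined with +.
fit2 <- lm(Volume ~ Girth + Height, data = trees)

summary(fit2)  # same layout as before, with one more coefficient row

# Hypothetical new tree: girth 12 inches, height 75 ft.
new_tree <- data.frame(Girth = 12, Height = 75)

predict(fit2, newdata = new_tree)
```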