Statistical Metrics in Regression (R-sq vs Adj. R-sq vs Pred R-sq)
sadikus
6 slides
Jun 17, 2024
About This Presentation
Regression metrics
Slide Content
R-sq vs R-sq(adj) vs R-sq(pred)
Original Model: What is R-sq? What is R-sq(adj)? What is R-sq(pred)?
Reduced Model: What is R-sq? What is R-sq(adj)? What is R-sq(pred)?
R-squared
R-sq is the measure in ANOVA that gives the percentage of the total variation explained by all the factors included in the study/analysis.
It is calculated as R-sq = SS(model) / SS(total). Here:
Original model (8 DF): 1.13778 / 1.29778 = 87.67%
Reduced model (4 DF): 1.06889 / 1.29778 = 82.36%
R-sq is calculated regardless of whether a factor is significant or not.
This means that the more factors you add (even wrong ones), the higher R-sq goes; it never decreases when a term is added.
So the real fit of your regression or ANOVA model should be judged on Adj. R-sq!
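The ratio above can be checked with a few lines of Python. This is only a sketch: the function and variable names are made up for illustration, and the model sum of squares for the original model is taken as 1.13778, which is an assumption chosen because it is the value that reproduces the stated 87.67%.

```python
def r_squared(ss_model, ss_total):
    """R-sq as the share of total variation explained by the model terms."""
    return ss_model / ss_total

SS_TOTAL = 1.29778            # total sum of squares from the slides
SS_MODEL_ORIGINAL = 1.13778   # assumption: value consistent with the stated 87.67% (8 DF)
SS_MODEL_REDUCED = 1.06889    # from the slides (4 DF)

r_sq_original = r_squared(SS_MODEL_ORIGINAL, SS_TOTAL)
r_sq_reduced = r_squared(SS_MODEL_REDUCED, SS_TOTAL)

print(f"Original model R-sq: {r_sq_original:.2%}")
print(f"Reduced model R-sq:  {r_sq_reduced:.2%}")
```

Dropping terms lowers the model sum of squares while the total stays fixed, which is why R-sq can only go down when the model is reduced.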
Adjusted R-sq
Adjusted R-sq helps show whether a factor you add is actually contributing to the model.
Once you know certain factors are not significant (high p-values), you can remove them from your model. This always causes R-sq to go down!
So when you keep reducing your model, Adjusted R-sq is the better measure of the fit.
For our Reduced Model in Slide 3, which measure represents the fit of the model? The Adjusted R-sq of 76.94%.
This is because Adjusted R-sq accounts for the degrees of freedom used by the model, including the 4 DF of the non-significant Temp*Press term that was removed.
And so it improved from 76.71% to 76.94%, even though R-sq decreased by about 5 percentage points!
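The adjusted values can be reproduced with the standard degrees-of-freedom correction, Adj R-sq = 1 - (1 - R-sq) * (n - 1) / (n - 1 - model DF). The sketch below rests on two assumptions that are not stated in the slides: n = 18 observations (total DF of 17), chosen because it reproduces both quoted percentages, and a model sum of squares of 1.13778 for the original model. The names are illustrative.

```python
def adjusted_r_squared(r_sq, n_obs, model_df):
    """Penalise R-sq for the degrees of freedom spent on model terms."""
    total_df = n_obs - 1
    error_df = total_df - model_df
    return 1 - (1 - r_sq) * total_df / error_df

SS_TOTAL = 1.29778
N_OBS = 18  # assumption: not given in the slides, inferred from the quoted results

# Original model: 8 DF (assumed model SS of 1.13778, consistent with 87.67%)
adj_original = adjusted_r_squared(1.13778 / SS_TOTAL, N_OBS, model_df=8)
# Reduced model: 4 DF, after dropping the 4 DF Temp*Press term
adj_reduced = adjusted_r_squared(1.06889 / SS_TOTAL, N_OBS, model_df=4)

print(f"Original model Adj R-sq: {adj_original:.2%}")
print(f"Reduced model Adj R-sq:  {adj_reduced:.2%}")
```

The reduced model gives up a little explained variation but gets 4 error degrees of freedom back, so its adjusted value ends up slightly higher.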
Predicted R-sq
R-sq tells you about the overall fit of the model.
Adjusted R-sq tells you about the actual fit of the model.
Neither tells you about the predictive fit of the model!
If you want to know how well the regression equation you generated predicts new values, evaluate R-sq(pred).
If R-sq(pred) is much lower than R-sq or R-sq(adj), your model is only good enough to explain the current data and says little about values predicted by the regression.
So if you want to optimize or predict, make sure R-sq(pred) is high, or close to R-sq and Adjusted R-sq.
Here, in Model 1, R-sq(pred) was only 51%. In Model 2, R-sq(pred) improved to 67% (a gain of 16 percentage points) just by removing the non-significant factors.
This could probably be improved further by adding a third factor!
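The slides do not show how R-sq(pred) is computed. A common definition is 1 - PRESS / SS(total), where PRESS sums the squared errors made when each observation is predicted by a model fitted without it. The sketch below applies that definition to a one-predictor straight-line fit. The data are made up purely for illustration and are not the data behind the slides, and all function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def total_ss(ys):
    """Total sum of squares around the mean response."""
    mean_y = sum(ys) / len(ys)
    return sum((y - mean_y) ** 2 for y in ys)

def ordinary_r_squared(xs, ys):
    """R-sq from the residuals of a fit on all observations."""
    intercept, slope = fit_line(xs, ys)
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - sse / total_ss(ys)

def predicted_r_squared(xs, ys):
    """R-sq(pred) = 1 - PRESS / SS(total), with PRESS from leave-one-out fits."""
    press = 0.0
    for i in range(len(xs)):
        # Refit without observation i, then predict the held-out point.
        intercept, slope = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (intercept + slope * xs[i])) ** 2
    return 1 - press / total_ss(ys)

# Made-up example data (assumption, for illustration only)
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.1, 3.9, 6.4, 7.8, 10.5, 11.6, 14.3, 15.9]

r_sq = ordinary_r_squared(xs, ys)
r_sq_pred = predicted_r_squared(xs, ys)
print(f"R-sq:       {r_sq:.2%}")
print(f"R-sq(pred): {r_sq_pred:.2%}")
```

Because every held-out point is predicted by a model that never saw it, R-sq(pred) cannot exceed R-sq for a least squares fit, and a large gap between the two is the warning sign described above.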