Understanding GBM and XGBoost in Scikit-Learn


About This Presentation

Understanding GBM and XGBoost in Scikit-Learn


Slide Content

1 Ensemble An ensemble method creates many learners (classifiers or regressors) and learns a new hypothesis by combining them. Combining many learners can lead to more reliable predictions than a single learner. (Most ensemble methods mainly use many learners built with the same algorithm.)
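To make the combining idea concrete, here is a minimal scikit-learn sketch (not from the slides; the dataset and the three base learners are arbitrary illustrative choices, and here learners with different algorithms are combined by voting):

```python
# Minimal sketch of the ensemble idea: several learners are combined and
# the final prediction is decided by majority (hard) voting.
# Dataset and base learners are illustrative choices only.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

ensemble = VotingClassifier(estimators=[
    ('lr', LogisticRegression(max_iter=5000)),
    ('dt', DecisionTreeClassifier(random_state=0)),
    ('knn', KNeighborsClassifier()),
])
ensemble.fit(X_train, y_train)
print('voting ensemble accuracy:', ensemble.score(X_test, y_test))
```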

Bagging and Boosting 2 Bagging assigns bootstrap-sampled training data to several classifiers and predicts by voting or averaging their results (e.g. RandomForest). Boosting uses many weak learners trained one after another, each repeatedly correcting the previous errors by updating weights (e.g. AdaBoost, Gradient Boosting, eXtreme Gradient Boosting (XGBoost)). Generally speaking, boosting gives better prediction performance, but it takes more time and has a somewhat higher chance of overfitting.
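As a quick illustration of the two families (a sketch, not part of the slides; dataset and settings are arbitrary), scikit-learn's RandomForestClassifier (bagging) and AdaBoostClassifier (boosting) can be trained on the same data:

```python
# Bagging vs. boosting on the same data (illustrative sketch).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Bagging: each tree sees a bootstrap sample; predictions are voted/averaged.
bagging = RandomForestClassifier(n_estimators=100, random_state=0)

# Boosting: weak learners are trained sequentially, each correcting the last.
boosting = AdaBoostClassifier(n_estimators=100, random_state=0)

for name, model in [('RandomForest (bagging)', bagging),
                    ('AdaBoost (boosting)', boosting)]:
    model.fit(X_train, y_train)
    print(name, model.score(X_test, y_test))
```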

3 AdaBoost [Diagram: Classification 1, Classification 2, and Classification 3 are fitted on the feature dataset over Steps 1-5, and the final prediction is made by combining classifications 1, 2, and 3.]

4 AdaBoost Each weak learner is combined by updating its weight. For example, the first learner gets weight 0.3, the second 0.5, and the third 0.8; they are then put together to make the prediction.
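In scikit-learn's AdaBoostClassifier, the per-learner weights described here are available after fitting as estimator_weights_ (a minimal sketch; the actual fitted weights will differ from the 0.3 / 0.5 / 0.8 example above):

```python
# After fitting, AdaBoostClassifier exposes one weight per weak learner in
# estimator_weights_; the weighted vote of the weak learners is the prediction.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier

X, y = load_breast_cancer(return_X_y=True)
ada = AdaBoostClassifier(n_estimators=3, random_state=0).fit(X, y)

print(ada.estimator_weights_)   # one weight per weak learner
```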

5 GBM (Gradient Boosting Machine) GBM is similar to AdaBoost. The major difference is the way the weights are updated: GBM does this more rigorously, using gradient descent. However, it takes a lot of time because each weak learner's weights are updated serially.
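A minimal GBM sketch with scikit-learn's GradientBoostingClassifier (hyperparameter values are illustrative, not recommendations):

```python
# Gradient boosting in scikit-learn: weak learners are added one after
# another (serially), each fitted against the gradient of the loss,
# which is why training can be slow.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

gbm = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                 random_state=0)
gbm.fit(X_train, y_train)
print('GBM accuracy:', gbm.score(X_test, y_test))
```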

Advantages of XGBoost 6 XGBoost (eXtreme Gradient Boosting)
eXcellent prediction performance
Faster execution time compared with GBM
CPU parallel processing enabled
Various enhancements: regularization, tree pruning
Various utilities: early stopping, embedded cross validation, embedded handling of null (missing) values
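As an example of one of the built-in utilities, the sketch below uses XGBoost's embedded cross validation (xgb.cv) together with early stopping; the parameter values are illustrative assumptions only:

```python
# Sketch of XGBoost's embedded cross validation: xgb.cv() runs k-fold CV
# internally and can stop early; parameter values are illustrative.
import xgboost as xgb
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True)
dtrain = xgb.DMatrix(data=X, label=y)

params = {'objective': 'binary:logistic', 'eta': 0.1, 'max_depth': 3}

cv_results = xgb.cv(params, dtrain, num_boost_round=200, nfold=5,
                    metrics='logloss', early_stopping_rounds=20, seed=0)
print(cv_results.tail())   # per-round train/test logloss as a DataFrame
```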

XGBoost implementation in Python 7
C/C++ native module: XGBoost was initially written in C/C++.
Python wrapper: the Python package calls the native C/C++ code; it has its own Python API and library.
Scikit-learn wrapper: integrated with the scikit-learn framework. Training and prediction use the fit() and predict() methods like any other classifier in scikit-learn, and there is no problem using other scikit-learn modules such as GridSearchCV thanks to the seamless integration.
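A short sketch of that seamless integration: XGBClassifier plugged directly into GridSearchCV (the grid values are arbitrary illustrative choices):

```python
# XGBClassifier behaves like any scikit-learn estimator, so it can be used
# with GridSearchCV, pipelines, cross_val_score, etc.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from xgboost import XGBClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

param_grid = {'max_depth': [3, 5], 'learning_rate': [0.05, 0.1]}
grid = GridSearchCV(XGBClassifier(n_estimators=100), param_grid, cv=3)
grid.fit(X_train, y_train)

print('best params:', grid.best_params_)
print('test accuracy:', grid.score(X_test, y_test))
```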

XGBoost Python Wrapper vs XGBoost Scikit-learn Wrapper 8
Modules: Python wrapper: import xgboost as xgb / Scikit-learn wrapper: from xgboost import XGBClassifier
Training and test datasets: Python wrapper: the DMatrix class is needed, e.g. train = xgb.DMatrix(data=X_train, label=y_train); the feature dataset and label dataset are provided as parameters to create the DMatrix object / Scikit-learn wrapper: uses numpy or pandas directly
Training API: Python wrapper: xgb_model = xgb.train(); xgb_model is the trained model returned by calling xgb.train() / Scikit-learn wrapper: XGBClassifier.fit()
Prediction API: Python wrapper: xgb_model.predict(); predict() is called on the object trained by xgb.train(), and the returned value is not a direct prediction result but the probability of the predicted class / Scikit-learn wrapper: XGBClassifier.predict(), which returns the prediction results directly
Feature importance visualization: plot_importance() in both wrappers
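The two APIs from the table, side by side (a sketch under illustrative settings; note how the native predict() returns probabilities that must be thresholded into class labels):

```python
# Python wrapper vs. scikit-learn wrapper, trained on the same data.
import xgboost as xgb
from xgboost import XGBClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Python wrapper: DMatrix + xgb.train(); predict() returns probabilities.
dtrain = xgb.DMatrix(data=X_train, label=y_train)
dtest = xgb.DMatrix(data=X_test, label=y_test)
params = {'objective': 'binary:logistic', 'eta': 0.1, 'max_depth': 3}
xgb_model = xgb.train(params, dtrain, num_boost_round=100)
probs = xgb_model.predict(dtest)              # probabilities, not labels
labels_native = (probs > 0.5).astype(int)     # threshold into class labels

# Scikit-learn wrapper: fit()/predict() like any other estimator;
# predict() returns the class labels directly.
clf = XGBClassifier(n_estimators=100, learning_rate=0.1, max_depth=3)
clf.fit(X_train, y_train)
labels_sklearn = clf.predict(X_test)
```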

Hyperparameters of the Python wrapper and the scikit-learn wrapper 9 (listed as Python wrapper name / scikit-learn wrapper name, followed by the description)
eta / learning_rate: same as GBM's learning_rate; the learning rate used when updating weights at each boosting iteration. Normally set between 0 and 1. The default is 0.3 in the Python wrapper and 0.1 in the scikit-learn wrapper.
num_boost_rounds / n_estimators: same as n_estimators in scikit-learn ensembles; the number of weak learners (iteration count).
min_child_weight / min_child_weight: similar to min_samples_leaf in a decision tree; used against overfitting.
max_depth / max_depth: same as max_depth in a decision tree; the maximum tree depth.
sub_sample / subsample: same as subsample in GBM; sets the sampling percentage to keep the tree from growing too large and overfitting. If you set sub_sample=0.5, half of the total data is used to build each tree. Values from 0 to 1 can be used, but 0.5 to 1 is typical.
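A sketch of how these names map in practice (values are arbitrary examples, not recommendations; in the native library the subsampling parameter is spelled subsample):

```python
# Parameter name mapping between the two wrappers (illustrative values).
import xgboost as xgb
from xgboost import XGBClassifier

# Python wrapper: parameters go in a dict; boosting rounds go to xgb.train().
params = {
    'eta': 0.1,               # learning_rate in the scikit-learn wrapper
    'min_child_weight': 1,
    'max_depth': 5,
    'subsample': 0.8,
    'objective': 'binary:logistic',
}
# xgb_model = xgb.train(params, dtrain, num_boost_round=200)  # dtrain: a DMatrix

# Scikit-learn wrapper: the same settings as constructor arguments.
clf = XGBClassifier(
    learning_rate=0.1,        # eta
    n_estimators=200,         # num_boost_rounds
    min_child_weight=1,
    max_depth=5,
    subsample=0.8,
)
```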

10 Comparison of XGBoost Python wrapper hyperparameters and scikit-learn wrapper hyperparameters (continued)
lambda / reg_lambda: L2 regularization value. Default is 1. The bigger the value, the stronger the regularization; used against overfitting.
alpha / reg_alpha: L1 regularization value. Default is 0. The bigger the value, the stronger the regularization; used against overfitting.
colsample_bytree / colsample_bytree: similar to max_features in GBM; used to sample the features when building each tree. When there are very many features, it helps against overfitting.
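The regularization-related parameters in the scikit-learn wrapper, as a small illustrative sketch:

```python
# Regularization-related settings (illustrative values only).
from xgboost import XGBClassifier

clf = XGBClassifier(
    n_estimators=200,
    reg_lambda=1.0,         # lambda in the Python wrapper: L2 regularization
    reg_alpha=0.1,          # alpha in the Python wrapper: L1 regularization
    colsample_bytree=0.8,   # fraction of features sampled for each tree
)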

11 XGBoost Early Stopping XGBoost can stop its iterations before reaching the designated count if the cost is not reduced during the specified early stopping interval. This can be used wisely in the hyperparameter tuning process to reduce tuning time. If you set the early stopping value too small, training can finish without proper optimization.
Main parameters for early stopping:
early_stopping_rounds: the maximum number of iterations during which the loss metric may fail to improve before training stops
eval_metric: the cost evaluation metric
eval_set: the evaluation dataset used to evaluate cost reduction
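A hedged sketch of early stopping with the scikit-learn wrapper: in xgboost versions contemporary with this presentation these arguments are passed to fit(), while recent releases move early_stopping_rounds and eval_metric to the constructor; the values below are illustrative:

```python
# Early stopping: training stops if the validation metric does not improve
# for early_stopping_rounds consecutive boosting rounds.
# (In xgboost >= 2.0, pass early_stopping_rounds/eval_metric to XGBClassifier
#  instead of fit().)
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

clf = XGBClassifier(n_estimators=1000, learning_rate=0.1)
clf.fit(
    X_train, y_train,
    early_stopping_rounds=50,        # stop if no improvement for 50 rounds
    eval_metric='logloss',           # cost evaluation metric
    eval_set=[(X_val, y_val)],       # dataset used to evaluate cost reduction
)
print(clf.best_iteration)            # best boosting round found before stopping
```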

12 XGBoost Wrap-up XGBoost (and LightGBM) is among the most used ensemble methods, especially among Kagglers. It can improve prediction performance compared with GBM, but not by a dramatic, rocket-boosted margin. Execution time is faster than GBM, and it can run with parallel processing across multiple CPU cores. Hyperparameter tuning is difficult because there are so many parameters, but you do not have to obsess over it, since drastic performance improvements from tuning are a rare case with XGBoost. XGBoost is not a golden compass, but it is widely used in various applications, especially classification and regression.