Variants of Gradient Boosting

LightGBM

CatBoost

XGBoost

Approaches to finding the best split

Think about Big-O complexity!

Approach 1 - Presort algorithms

Cost: O(n_features × n_data_points)

Traditional implementation, found in scikit-learn and also used in XGBoost
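As a rough illustration of the presort idea, the sketch below finds the best split on a single feature for a squared-error loss: sort the feature once, then scan every boundary while keeping running sums. The function name `best_split_presorted` and the toy data are assumptions made for this example, not code from scikit-learn or XGBoost.

```python
import numpy as np

def best_split_presorted(x, y):
    """Exact best split on one feature for squared-error loss.

    Sort once (O(n log n)), then scan all n - 1 boundaries in O(n).
    """
    order = np.argsort(x, kind="stable")
    xs, ys = x[order], y[order]
    n = len(ys)
    total_sum = ys.sum()
    left_sum = 0.0
    best_gain, best_threshold = 0.0, None
    for i in range(1, n):
        left_sum += ys[i - 1]
        if xs[i] == xs[i - 1]:
            continue  # no threshold fits between identical values
        right_sum = total_sum - left_sum
        # Reduction in the sum of squared errors obtained by this split.
        gain = left_sum**2 / i + right_sum**2 / (n - i) - total_sum**2 / n
        if gain > best_gain:
            best_gain = gain
            best_threshold = (xs[i - 1] + xs[i]) / 2
    return best_threshold, best_gain

# Toy data (hypothetical): the target jumps between x = 3 and x = 10.
x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([1.0, 1.0, 1.0, 5.0, 5.0, 5.0])
threshold, gain = best_split_presorted(x, y)
print(threshold, gain)
```

Repeating this scan for every feature gives the features × data points cost quoted above.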

How can we reduce this time complexity when working with big datasets?


Approach 2 - Histogram-based algorithms

IDEA 1

Use histograms to group the values of each feature into bins

Use these bins to find the best split (splitting on bin boundaries instead of raw values makes little difference in accuracy)

Note: using bins may also help prevent overfitting
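The binning idea can be sketched as follows: each feature value is mapped to one of a few bins, per-bin sums and counts are accumulated in one pass, and only the bin boundaries are tried as thresholds. The quantile-based bin edges, the squared-error gain and the name `best_split_histogram` are assumptions for this sketch; real libraries choose bins and gains in their own ways.

```python
import numpy as np

def best_split_histogram(x, y, n_bins=16):
    """Approximate best split on one feature using n_bins histogram bins."""
    # Interior bin edges from quantiles (assumption: equal-frequency bins).
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    # Bin b holds values in (edges[b-1], edges[b]].
    bins = np.searchsorted(edges, x, side="left")
    # One pass over the data builds the histogram of target sums and counts.
    sum_hist = np.bincount(bins, weights=y, minlength=n_bins)
    cnt_hist = np.bincount(bins, minlength=n_bins)
    total_sum, n = sum_hist.sum(), cnt_hist.sum()
    # Candidate split after bin b means "x <= edges[b] goes left".
    left_sum = np.cumsum(sum_hist)[:-1]
    left_cnt = np.cumsum(cnt_hist)[:-1]
    right_sum = total_sum - left_sum
    right_cnt = n - left_cnt
    valid = (left_cnt > 0) & (right_cnt > 0)
    gain = np.full(n_bins - 1, -np.inf)
    gain[valid] = (left_sum[valid] ** 2 / left_cnt[valid]
                   + right_sum[valid] ** 2 / right_cnt[valid]
                   - total_sum ** 2 / n)
    b = int(np.argmax(gain))
    return edges[b], gain[b]

# Toy data (hypothetical): a step in the target at x = 5.
x = np.linspace(0, 10, 1001)
y = np.where(x > 5, 3.0, 1.0)
threshold, gain = best_split_histogram(x, y)
print(threshold, gain)
```

After the histogram is built, the split search only looks at `n_bins - 1` candidates per feature instead of one candidate per distinct value.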

[Figure: histogram]

Approach 3 - Gradient-based strategy

IDEA 2

Use gradients to find the best split!

But how?

What do large and small gradients with respect to the loss function tell you?

Gradient-based One-Side Sampling (GOSS)
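Points with large gradients are the ones the current model fits poorly, so GOSS keeps all of them and only samples from the points with small gradients, re-weighting the sampled ones so the overall gradient statistics stay roughly unbiased. The sketch below follows that description; the function name `goss_sample` and the default rates are assumptions for illustration and do not mirror LightGBM's internal code.

```python
import numpy as np

def goss_sample(gradients, top_rate=0.2, other_rate=0.1, seed=0):
    """Return indices and weights of the rows kept by a GOSS-style sampler."""
    rng = np.random.default_rng(seed)
    n = len(gradients)
    n_top = int(top_rate * n)
    n_other = int(other_rate * n)
    # Rank rows by absolute gradient, largest first.
    order = np.argsort(-np.abs(gradients))
    top_idx = order[:n_top]  # keep every large-gradient row
    # Randomly keep a fraction of the remaining small-gradient rows.
    other_idx = rng.choice(order[n_top:], size=n_other, replace=False)
    idx = np.concatenate([top_idx, other_idx])
    weights = np.ones(len(idx))
    # Amplify the sampled small-gradient rows to compensate for the sampling.
    weights[n_top:] = (1.0 - top_rate) / other_rate
    return idx, weights

# Hypothetical gradients for 1000 rows.
gradients = np.random.default_rng(1).normal(size=1000)
idx, weights = goss_sample(gradients)
print(len(idx), weights[0], weights[-1])
```

With these rates only 30% of the rows are used to evaluate splits, while the rows that matter most for the loss are all retained.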

Taking advantage of data sparsity

Ignoring sparse inputs (XGBoost and LightGBM)

XGBoost proposes to ignore the zero-valued entries when computing the split, treating them as missing, and then to assign all rows with missing values to whichever side of the split reduces the loss more. This reduces the number of samples that must be visited when evaluating each split, which speeds up training.
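A minimal sketch of this default-direction idea, assuming missing values are encoded as NaN and using a squared-error gain: only the present entries are partitioned by the threshold, the statistics of the missing rows are obtained by subtraction, and both directions are scored. The helper names `score` and `best_default_direction` are hypothetical.

```python
import numpy as np

def score(s, c):
    """Squared-error score of a leaf with target sum s and row count c."""
    return s * s / c if c > 0 else 0.0

def best_default_direction(x, y, threshold):
    """Pick the side that rows with missing x should go to for this split."""
    present = ~np.isnan(x)
    total_sum, total_cnt = y.sum(), len(y)
    # Only rows with a present value are visited and compared to the threshold.
    goes_left = x[present] <= threshold
    y_present = y[present]
    left_sum, left_cnt = y_present[goes_left].sum(), int(goes_left.sum())
    right_sum, right_cnt = y_present[~goes_left].sum(), int((~goes_left).sum())
    # Statistics of the missing rows follow by subtraction from the totals.
    miss_sum = total_sum - left_sum - right_sum
    miss_cnt = total_cnt - left_cnt - right_cnt
    base = score(total_sum, total_cnt)
    gain_left = (score(left_sum + miss_sum, left_cnt + miss_cnt)
                 + score(right_sum, right_cnt) - base)
    gain_right = (score(left_sum, left_cnt)
                  + score(right_sum + miss_sum, right_cnt + miss_cnt) - base)
    if gain_left >= gain_right:
        return "left", gain_left
    return "right", gain_right

# Toy data (hypothetical): the rows with missing x look like the high-x rows.
x = np.array([1.0, 2.0, np.nan, 10.0, 11.0, np.nan])
y = np.array([1.0, 1.0, 5.0, 5.0, 5.0, 5.0])
direction, gain = best_default_direction(x, y, threshold=5.0)
print(direction, gain)
```

On sparse data most entries are missing, so the work per split is proportional to the number of present entries rather than the number of rows.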

[Figure: GOSS explanation]

Approach 4 - Exclusive Feature Bundling (LightGBM)

* Some features are never non-zero together, so they can be bundled into a single feature
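One way to picture the bundling, assuming the features are already binned and bin 0 stands for a zero value: the second feature's bins are shifted past the first feature's bin range, so one column can carry both without ambiguity. The helpers `are_exclusive` and `bundle` are hypothetical names for this sketch; LightGBM's own bundling also tolerates a small number of conflicts, which is not shown here.

```python
import numpy as np

def are_exclusive(a, b):
    """True if the two features are never non-zero on the same row."""
    return not np.any((a != 0) & (b != 0))

def bundle(a_bins, b_bins, n_bins_a):
    """Merge two exclusive binned features into one column.

    Bins 1..n_bins_a keep feature a; feature b is shifted to start
    at n_bins_a + 1. Bin 0 means both features are zero.
    """
    if not are_exclusive(a_bins, b_bins):
        raise ValueError("features overlap and cannot be bundled losslessly")
    bundled = a_bins.copy()
    nonzero_b = b_bins != 0
    bundled[nonzero_b] = b_bins[nonzero_b] + n_bins_a
    return bundled

# Toy binned features (hypothetical): a uses bins 1-2, b uses bins 1-3.
a_bins = np.array([1, 0, 2, 0, 0])
b_bins = np.array([0, 3, 0, 1, 0])
bundled = bundle(a_bins, b_bins, n_bins_a=2)
print(bundled)
```

A split on the bundled column at a bin above `n_bins_a` separates values of the second feature, and a split below it separates values of the first, so fewer columns need histograms.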

In [3]:

# Note: load_boston was removed in scikit-learn 1.2; this cell needs an older release.
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor
from sklearn.datasets import load_boston, load_wine
import pandas as pd
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from sklearn import metrics
import math
import numpy as np


def rmse(x, y):
    """Root mean squared error between two arrays."""
    return math.sqrt(((x - y) ** 2).mean())


def print_score(m):
    """Print [train RMSE, test RMSE, train score, test score] (+ OOB score if present).

    Relies on the globals X_train, X_test, y_train and y_test being defined.
    """
    res = [rmse(m.predict(X_train), y_train), rmse(m.predict(X_test), y_test),
           m.score(X_train, y_train), m.score(X_test, y_test)]
    if hasattr(m, 'oob_score_'):
        res.append(m.oob_score_)
    print(res)


# Load the Boston housing data into a feature DataFrame and a target array.
house_price = load_boston(return_X_y=False)
#house_price['data']
#house_price['feature_names']
#house_price['target']
X_df = pd.DataFrame(data=house_price['data'], columns=house_price['feature_names'])
y = house_price['target']
X_df.head(10)

Out[3]:

	CRIM	ZN	INDUS	NOX	RM	AGE	DIS	RAD	TAX	PTRATIO	B	LSTAT
0	0.00632	18.0	2.31	0.538	6.575	65.2	4.0900	1.0	296.0	15.3	396.90	4.98
1	0.02731	0.0	7.07	0.469	6.421	78.9	4.9671	2.0	242.0	17.8	396.90	9.14
2	0.02729	0.0	7.07	0.469	7.185	61.1	4.9671	2.0	242.0	17.8	392.83	4.03
3	0.03237	0.0	2.18	0.458	6.998	45.8	6.0622	3.0	222.0	18.7	394.63	2.94
4	0.06905	0.0	2.18	0.458	7.147	54.2	6.0622	3.0	222.0	18.7	396.90	5.33
5	0.02985	0.0	2.18	0.458	6.430	58.7	6.0622	3.0	222.0	18.7	394.12	5.21
6	0.08829	12.5	7.87	0.524	6.012	66.6	5.5605	5.0	311.0	15.2	395.60	12.43
7	0.14455	12.5	7.87	0.524	6.172	96.1	5.9505	5.0	311.0	15.2	396.90	19.15
8	0.21124	12.5	7.87	0.524	5.631	100.0	6.0821	5.0	311.0	15.2	386.63	29.93
9	0.17004	12.5	7.87	0.524	6.004	85.9	6.5921	5.0	311.0	15.2	386.71	17.10

In [12]:

X_df.nunique()

Out[12]:

CRIM       504
ZN          26
INDUS       76
CHAS         2
NOX         81
RM         446
AGE        356
DIS        412
RAD          9
TAX         66
PTRATIO     46
B          357
LSTAT      455
dtype: int64

CatBoost DEMO

In [45]:

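# y_test and preds1 come from cells not shown here
# (assumed: test-set targets and the corresponding model predictions).
plt.plot(y_test, label='orig')
plt.plot(preds1, label='pred')
plt.legend()
plt.show()
rmse(y_test, preds1)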

Out[45]:

4.501594069582915

In [28]:

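# y_test and preds come from cells not shown here
# (assumed: test-set targets and the corresponding model predictions).
plt.plot(y_test, label='orig')
plt.plot(preds, label='pred')
plt.legend()
plt.show()
rmse(y_test, preds)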

Out[28]:

5.893656062165955

