
XGBoost, CatBoost, LightGBM (Boosting)

XGBoost

What is XGBoost

  • XGBoost is essentially still a GBDT, but it pushes speed and efficiency to the extreme, hence the "X" for Extreme.

  • It is formulated from the loss-function side: a regularization penalty term is added to the loss, and XGBoost argues that taking only the first derivative is not enough. Using just the gradient corresponds to a first-order Taylor expansion, i.e., first-order optimization, whose convergence is relatively slow, so XGBoost instead expands the loss function to second order (a minimal usage sketch follows below).
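As a quick orientation before the math, here is a minimal sketch of training XGBoost with the scikit-learn style API. The synthetic data and hyperparameter values are purely illustrative; `n_estimators`, `reg_lambda`, and `gamma` are the library's knobs for the number of trees K, the L2 leaf-weight penalty λ, and the split penalty γ that are derived later in this page.

```python
# Minimal sketch: fitting an XGBoost regressor (synthetic data, illustrative settings only).
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = X[:, 0] ** 2 + X[:, 1] + rng.normal(scale=0.1, size=500)

model = xgb.XGBRegressor(
    n_estimators=200,   # K: number of trees in the additive ensemble
    max_depth=3,        # depth of each CART
    learning_rate=0.1,  # shrinkage applied to each newly added tree
    reg_lambda=1.0,     # lambda: L2 penalty on leaf weights (see Omega below)
    gamma=0.0,          # gamma: minimum gain required to make a split
)
model.fit(X, y)
print(model.predict(X[:3]))
```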

Definition of the XGBoost Tree

The Math Behind XGBoost

Review CART

Here’s a simple example of a CART that classifies whether someone will like a hypothetical computer game X. The prediction scores of the individual decision trees are summed to get the final prediction; looking at the example, an important fact is that the trees try to complement each other. Mathematically, we can write our model in the form

$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in \mathcal{F}$$

where $K$ is the number of trees, each $f_k$ is a function in the functional space $\mathcal{F}$, and $\mathcal{F}$ is the set of all possible CARTs.

The objective function for the above model is given by:

$$obj(\theta) = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k)$$

where the first term is the training loss and the second is the regularization term.

XGBoost Additive Training

Now, instead of learning all the trees at once, which makes optimization harder, we apply an additive strategy: fix what we have already learned and add one new tree at a time, which can be summarized as follows:

$$\hat{y}_i^{(0)} = 0$$

$$\hat{y}_i^{(1)} = f_1(x_i) = \hat{y}_i^{(0)} + f_1(x_i)$$

$$\hat{y}_i^{(2)} = f_1(x_i) + f_2(x_i) = \hat{y}_i^{(1)} + f_2(x_i)$$

$$\dots$$

$$\hat{y}_i^{(t)} = \sum_{k=1}^{t} f_k(x_i) = \hat{y}_i^{(t-1)} + f_t(x_i)$$
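To make the additive strategy concrete, here is a hedged, from-scratch sketch of stagewise boosting with squared error, where each new tree is fit to the current residuals. The function names are mine, the base learner is scikit-learn's DecisionTreeRegressor, and this is illustrative only; real XGBoost uses second-order gradients and regularized trees as derived below.

```python
# Stagewise additive boosting sketch (squared-error loss, residual fitting).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_additive_ensemble(X, y, n_trees=50, learning_rate=0.1, max_depth=3):
    trees = []
    y_hat = np.zeros_like(y, dtype=float)            # y_hat^(0) = 0
    for _ in range(n_trees):
        residual = y - y_hat                          # what the new tree f_t should predict
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
        y_hat = y_hat + learning_rate * tree.predict(X)   # y_hat^(t) = y_hat^(t-1) + eta * f_t(x)
        trees.append(tree)
    return trees

def predict_additive_ensemble(trees, X, learning_rate=0.1):
    # The ensemble prediction is the sum of all (shrunken) tree outputs.
    return sum(learning_rate * tree.predict(X) for tree in trees)
```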

The objective function at step $t$ of this additive model can be written as:

$$obj^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t)}\right) + \sum_{k=1}^{t} \Omega(f_k) = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t) + \text{constant}$$

If we take mean squared error as the loss function, this becomes:

$$obj^{(t)} = \sum_{i=1}^{n} \left(y_i - \left(\hat{y}_i^{(t-1)} + f_t(x_i)\right)\right)^2 + \sum_{k=1}^{t} \Omega(f_k) = \sum_{i=1}^{n} \left[ 2\left(\hat{y}_i^{(t-1)} - y_i\right) f_t(x_i) + f_t^2(x_i) \right] + \Omega(f_t) + \text{constant}$$

For a general loss function, we instead apply the Taylor series expansion of the loss up to the second order:

$$obj^{(t)} = \sum_{i=1}^{n} \left[ l\left(y_i, \hat{y}_i^{(t-1)}\right) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t) + \text{constant}$$
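For reference, this is just the generic second-order Taylor expansion of the loss in its prediction argument, with the new tree's output $f_t(x_i)$ playing the role of the increment:

$$l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) \approx l\left(y_i, \hat{y}_i^{(t-1)}\right) + \frac{\partial\, l\left(y_i, \hat{y}_i^{(t-1)}\right)}{\partial\, \hat{y}_i^{(t-1)}}\, f_t(x_i) + \frac{1}{2}\, \frac{\partial^2 l\left(y_i, \hat{y}_i^{(t-1)}\right)}{\partial \left(\hat{y}_i^{(t-1)}\right)^2}\, f_t^2(x_i)$$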

where $g_i$ and $h_i$ are defined as:

$$g_i = \frac{\partial\, l\left(y_i, \hat{y}_i^{(t-1)}\right)}{\partial\, \hat{y}_i^{(t-1)}}$$

$$h_i = \frac{\partial^2\, l\left(y_i, \hat{y}_i^{(t-1)}\right)}{\partial \left(\hat{y}_i^{(t-1)}\right)^2}$$
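To connect $g_i$ and $h_i$ to practice, here is a hedged sketch of a custom objective passed to the xgboost training API through its `obj` argument: the callback returns the per-sample gradient and Hessian, which for the loss $\frac{1}{2}(y_i - \hat{y}_i)^2$ are simply $\hat{y}_i - y_i$ and $1$. The data construction and parameter values are illustrative only.

```python
# Sketch: custom objective for xgb.train that supplies g_i (gradient) and h_i (Hessian).
import numpy as np
import xgboost as xgb

def squared_error_obj(preds, dtrain):
    """Per-sample g_i and h_i for the loss 0.5 * (y - y_hat)^2."""
    labels = dtrain.get_label()
    grad = preds - labels           # g_i = d l / d y_hat
    hess = np.ones_like(preds)      # h_i = d^2 l / d y_hat^2
    return grad, hess

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=200)

dtrain = xgb.DMatrix(X, label=y)
booster = xgb.train({"max_depth": 3, "eta": 0.1}, dtrain,
                    num_boost_round=50, obj=squared_error_obj)
```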

Simplifying and removing the constant terms, the objective at step $t$ becomes:

$$\sum_{i=1}^{n} \left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t)$$

Regularization

Now we define the regularization term, but first we need to define the tree $f_t(x)$ itself:

$$f_t(x) = w_{q(x)}, \quad w \in \mathbb{R}^T, \quad q: \mathbb{R}^d \rightarrow \{1, 2, \dots, T\}$$

Here, $w$ is the vector of scores on the leaves of the tree, $q$ is the function assigning each data point to its corresponding leaf, and $T$ is the number of leaves. The regularization term is then defined by:

$$\Omega(f) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^2$$

Now, our objective function becomes:

$$obj^{(t)} \approx \sum_{i=1}^{n} \left[ g_i w_{q(x_i)} + \frac{1}{2} h_i w_{q(x_i)}^2 \right] + \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^2 = \sum_{j=1}^{T} \left[ \left( \sum_{i \in I_j} g_i \right) w_j + \frac{1}{2} \left( \sum_{i \in I_j} h_i + \lambda \right) w_j^2 \right] + \gamma T$$

where $I_j = \{ i \mid q(x_i) = j \}$ is the set of indices of the data points assigned to the $j$-th leaf.

Now, we simplify the above expression:

$$obj^{(t)} = \sum_{j=1}^{T} \left[ G_j w_j + \frac{1}{2} (H_j + \lambda) w_j^2 \right] + \gamma T$$

where,

$$G_j = \sum_{i \in I_j} g_i$$

$$H_j = \sum_{i \in I_j} h_i$$

In this expression the $w_j$ are independent of each other, and $G_j w_j + \frac{1}{2}(H_j + \lambda) w_j^2$ is quadratic in $w_j$, so for a given tree structure $q(x)$ the best leaf weights and the best objective reduction we can get are:

$$w_j^* = -\frac{G_j}{H_j + \lambda}$$

$$obj^* = -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda} + \gamma T$$

where $\gamma$ is the pruning parameter, i.e., the minimum gain required to perform a split.

Now we can measure how good a tree structure is. Since we cannot enumerate and optimize over all possible trees directly, we optimize one level of the tree at a time: we try to split a leaf into two leaves, and the score gained by that split is:

$$Gain = \frac{1}{2} \left[ \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda} \right] - \gamma$$
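As a small check on the two formulas above, here is a hedged pure-Python helper (the function names are mine, not part of the XGBoost library) that computes the optimal leaf weight and the split gain from the summed gradients and Hessians:

```python
# Sketch: optimal leaf weight and split gain from summed gradients/Hessians.
def leaf_weight(G, H, lam):
    """w* = -G / (H + lambda) for one leaf."""
    return -G / (H + lam)

def split_gain(G_left, H_left, G_right, H_right, lam, gamma):
    """Gain of splitting a leaf into left/right children; a negative gain means the split is pruned."""
    def score(G, H):
        return G * G / (H + lam)
    return 0.5 * (score(G_left, H_left) + score(G_right, H_right)
                  - score(G_left + G_right, H_left + H_right)) - gamma

# Example: a candidate split with mostly negative gradients going to the left child.
print(leaf_weight(G=-12.0, H=10.0, lam=1.0))
print(split_gain(G_left=-12.0, H_left=10.0, G_right=4.0, H_right=8.0, lam=1.0, gamma=0.5))
```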

The Core Algorithm of XGBoost

  • Keep adding trees, growing each one by repeatedly splitting on features, one tree at a time. Adding a tree is really learning a new function that predicts the residual of the current ensemble.

  • After training we have $K$ trees. To predict the score of a sample, the sample's features send it to one leaf in each tree, and every leaf carries a score (see the sketch after this list).

  • The prediction for the sample is simply the sum of the corresponding leaf scores over all the trees.
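To see this leaf-per-tree view directly, the booster trained in the custom-objective sketch above can report which leaf each sample lands in for every tree; `pred_leaf=True` is the xgboost predict option I believe exposes this, so treat the exact call as an assumption.

```python
# Sketch: inspect which leaf each sample lands in for every tree, reusing the
# `booster` and `dtrain` objects from the custom-objective example above.
leaf_ids = booster.predict(dtrain, pred_leaf=True)   # shape: (n_samples, n_trees)
print(leaf_ids.shape)
print(leaf_ids[0])   # the leaf index this sample reaches in each of the K trees

# The final prediction is the sum of the leaf scores along each row (plus the
# global base score), which is what booster.predict(dtrain) returns.
```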


CatBoost

LightGBM

Reference



https://www.geeksforgeeks.org/xgboost/
https://mayuanucas.github.io/xgboost-lightgbm/
https://github.com/NLP-LOVE/ML-NLP/tree/master/Machine%20Learning/3.3%20XGBoost
https://blog.csdn.net/v_JULY_v/article/details/81410574