vault backup: 2025-01-18 00:00:19
parent 779b4c8fc4, commit 6608588f7a
1 changed file with 16 additions and 16 deletions
@@ -23,9 +23,9 @@ XGBoost (**eXtreme Gradient Boosting**) is an optimized and scalable implementat
XGBoost allows users to define a custom loss function, but it relies on a second-order Taylor expansion of that loss (both gradient and Hessian) to optimize the objective, so a custom loss must supply both derivatives. The general loss function in XGBoost consists of two components:

$$L(\Theta) = \sum_{i=1}^n l(y_i, \hat{y}_i) + \sum_{k=1}^T \Omega(h_k)$$

1. **First Term: Training Loss $l(y_i, \hat{y}_i)$**

- Measures how well the predictions $\hat{y}_i$ match the true labels $y_i$.
- Common choices:
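
Because the objective is optimized through its gradient and Hessian, a custom loss only has to supply those two derivatives. Below is a minimal sketch of a custom squared-error objective, assuming the `(preds, dtrain) -> (grad, hess)` callback signature accepted by the `obj` argument of `xgb.train`; the data, parameter values, and round count are placeholders.

```python
# Minimal sketch of a custom squared-error objective for xgb.train.
# Assumes the (preds, dtrain) -> (grad, hess) callback signature; the data
# and hyperparameter values below are made up for illustration.
import numpy as np
import xgboost as xgb

def squared_error_obj(preds, dtrain):
    """l(y, yhat) = 0.5 * (yhat - y)^2 -> grad = yhat - y, hess = 1."""
    y = dtrain.get_label()
    grad = preds - y              # first-order term of the Taylor expansion
    hess = np.ones_like(preds)    # second-order term (constant for squared error)
    return grad, hess

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=200)
dtrain = xgb.DMatrix(X, label=y)

# "lambda" and "gamma" feed the Omega(h_k) regularization term.
params = {"max_depth": 3, "eta": 0.1, "lambda": 1.0, "gamma": 0.0}
booster = xgb.train(params, dtrain, num_boost_round=50, obj=squared_error_obj)
```

Swapping in a different loss only changes how `grad` and `hess` are computed; the tree-building machinery stays the same.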
@@ -100,10 +100,10 @@ $$\text{Gain} = \frac{1}{2} \left[ \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_
Where:

- $G_L$, $G_R$: Sum of gradients for the left and right child nodes.
- $H_L$, $H_R$: Sum of Hessians for the left and right child nodes.
- $\lambda$: L2 regularization parameter (smooths the model).
- $\gamma$: Minimum loss reduction required to make a split (controls tree complexity).

The algorithm selects the split that maximizes the gain.
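
A small NumPy sketch of this split search, assuming the bracket in the (truncated) gain formula above is the usual $\frac{G_L^2}{H_L+\lambda} + \frac{G_R^2}{H_R+\lambda} - \frac{(G_L+G_R)^2}{H_L+H_R+\lambda}$ term; the gradients, Hessians, and candidate splits are made up.

```python
# Sketch of split-gain evaluation; g and h are hypothetical per-sample
# gradients/Hessians, and left_mask marks samples routed to the left child.
import numpy as np

def split_gain(g, h, left_mask, lam=1.0, gamma=0.0):
    G_L, H_L = g[left_mask].sum(), h[left_mask].sum()
    G_R, H_R = g[~left_mask].sum(), h[~left_mask].sum()
    score = (G_L**2 / (H_L + lam)
             + G_R**2 / (H_R + lam)
             - (G_L + G_R)**2 / (H_L + H_R + lam))
    return 0.5 * score - gamma

g = np.array([0.5, -1.2, 0.3, 0.8, -0.4])   # gradients, sorted by the feature
h = np.ones_like(g)                         # Hessians (1 for squared error)

# Try every split point along the sorted feature and keep the best one.
gains = [split_gain(g, h, np.arange(len(g)) < k) for k in range(1, len(g))]
best_split = int(np.argmax(gains)) + 1
```

When $\gamma > 0$ the gain can be negative, in which case the split is not worth making.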
@@ -113,13 +113,13 @@ The algorithm selects the split that maximizes the gain.
Once a tree structure is determined, the weight of each leaf is optimized using both the gradients and Hessians. The optimal weight $w_j$ for a leaf $j$ is calculated as:

$$w_j = -\frac{G_j}{H_j + \lambda}$$

Where:

- $G_j$: Sum of gradients for all examples in the leaf.
- $H_j$: Sum of Hessians for all examples in the leaf.
- $\lambda$: L2 regularization parameter.

This weight minimizes the loss for that leaf, balancing model complexity and predictive accuracy.
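
A minimal sketch of the leaf-weight formula, using hypothetical gradients and Hessians for the samples routed to a single leaf:

```python
# Sketch of the optimal leaf weight w_j = -G_j / (H_j + lambda);
# the per-sample gradients/Hessians below are hypothetical.
import numpy as np

def leaf_weight(g_leaf, h_leaf, lam=1.0):
    G_j = np.sum(g_leaf)              # sum of gradients in leaf j
    H_j = np.sum(h_leaf)              # sum of Hessians in leaf j
    return -G_j / (H_j + lam)

g_leaf = np.array([0.4, -0.9, 0.2])   # gradients of the samples in this leaf
h_leaf = np.ones_like(g_leaf)         # Hessians (1 for squared error)
w_j = leaf_weight(g_leaf, h_leaf)     # larger lambda shrinks w_j toward 0
```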
@@ -129,13 +129,13 @@ This weight minimizes the loss for that leaf, balancing model complexity and pre
After computing the optimal splits and leaf weights, the predictions for the dataset are updated:

$$\hat{y}_i^{(t+1)} = \hat{y}_i^{(t)} + \eta \cdot w(x_i)$$

Where:

- $\hat{y}_i^{(t)}$: Prediction for sample $i$ at iteration $t$.
- $\eta$: Learning rate (controls step size).
- $w(x_i)$: Weight of the leaf to which $x_i$ belongs in the new tree.

This iterative process improves the model's predictions by reducing the residual errors at each step.
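
The update itself is a single additive step per tree; in the toy sketch below, the leaf assignments and leaf weights are hypothetical stand-ins for the new tree's output.

```python
# Sketch of the additive update y_hat^(t+1) = y_hat^(t) + eta * w(x_i);
# leaf_index and leaf_weights are hypothetical outputs of the new tree.
import numpy as np

eta = 0.1                                   # learning rate
y_hat = np.zeros(6)                         # predictions at iteration t
leaf_index = np.array([0, 0, 1, 1, 2, 2])   # leaf each sample x_i falls into
leaf_weights = np.array([-0.8, 0.3, 1.1])   # w_j for each leaf of the new tree

y_hat_next = y_hat + eta * leaf_weights[leaf_index]   # predictions at t+1
```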
@@ -143,8 +143,8 @@ This iterative process improves the model's predictions by reducing the residual
### **Why Use Gradient and Hessian?**

1. **Gradient ($g$):** Indicates the direction and magnitude of adjustments needed to reduce the loss.
2. **Hessian ($h$):** Helps adjust for the curvature of the loss function, leading to more precise updates (second-order optimization); a small sketch of both follows below.
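
As a concrete illustration (a sketch, not XGBoost's internal code), the logistic loss makes the two roles explicit: the gradient sets the direction of the step, and dividing by the Hessian rescales it according to curvature, which is exactly the shape of the leaf-weight formula $-G_j/(H_j+\lambda)$.

```python
# Sketch: gradient and Hessian of the logistic loss for one sample, and how
# the Hessian turns a raw gradient step into a curvature-aware (Newton) step.
import numpy as np

def logistic_grad_hess(margin, y):
    p = 1.0 / (1.0 + np.exp(-margin))   # predicted probability
    grad = p - y                        # direction/magnitude of the needed change
    hess = p * (1.0 - p)                # curvature of the loss at this margin
    return grad, hess

grad, hess = logistic_grad_hess(margin=0.0, y=1.0)
gradient_step = -grad                   # first-order step (gradient only)
newton_step = -grad / (hess + 1.0)      # second-order step, with lambda = 1.0
```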
By leveraging both, XGBoost: