vault backup: 2025-01-18 00:00:19

XGBoost (**eXtreme Gradient Boosting**) is an optimized and scalable implementation of gradient boosting.
XGBoost allows users to define a custom loss function, but it relies on second-order Taylor expansion (both gradient and Hessian) to optimize the objective. The general loss function in XGBoost consists of two components:
$$L(\Theta) = \sum_{i=1}^n l(y_i, \hat{y}_i) + \sum_{k=1}^T \Omega(h_k)$$
1. **First Term: Training Loss $l(y_i, \hat{y}_i)$**
    - Measures how well the predictions $\hat{y}_i$ match the true labels $y_i$.
- Common choices:
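As an illustration of the first term, here is a minimal sketch (hypothetical names, synthetic data) of a squared-error loss supplied as a custom objective through XGBoost's Python API. The API expects the per-example gradient and Hessian of $l$ with respect to the current predictions; the regularization term $\Omega$ is controlled separately via parameters such as `lambda` and `gamma`.

```python
import numpy as np
import xgboost as xgb

def squared_error_obj(preds: np.ndarray, dtrain: xgb.DMatrix):
    """Custom objective l(y, y_hat) = 1/2 (y_hat - y)^2.

    Returns the per-example gradient g_i and Hessian h_i
    with respect to the current prediction.
    """
    y = dtrain.get_label()
    grad = preds - y              # g_i = dl / dy_hat
    hess = np.ones_like(preds)    # h_i = d^2 l / dy_hat^2
    return grad, hess

# Synthetic data, only to show how the objective is plugged in.
rng = np.random.default_rng(0)
X = rng.random((200, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.standard_normal(200)
dtrain = xgb.DMatrix(X, label=y)

booster = xgb.train({"max_depth": 3, "eta": 0.1, "lambda": 1.0}, dtrain,
                    num_boost_round=50, obj=squared_error_obj)
```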
The gain of a candidate split is:
$$\text{Gain} = \frac{1}{2} \left[ \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda} \right] - \gamma$$
Where:
- $G_L$, $G_R$: Sum of gradients for the left and right child nodes.
- $H_L$, $H_R$: Sum of Hessians for the left and right child nodes.
- $\lambda$: L2 regularization parameter (smooths the model).
- $\gamma$: Minimum loss reduction required to make a split (controls tree complexity).
The algorithm selects the split that maximizes the gain.
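A minimal NumPy sketch of this split search (illustrative helper names, not XGBoost's internal code): it scans the sorted values of one feature, accumulates $G_L$ and $H_L$, and keeps the threshold with the largest gain.

```python
import numpy as np

def split_gain(G_L, H_L, G_R, H_R, lam=1.0, gamma=0.0):
    """Gain of splitting a node into (left, right), following the formula above."""
    def score(G, H):
        return G * G / (H + lam)
    return 0.5 * (score(G_L, H_L) + score(G_R, H_R)
                  - score(G_L + G_R, H_L + H_R)) - gamma

def best_split(x, grad, hess, lam=1.0, gamma=0.0):
    """Exact greedy search over one feature: try every threshold, keep the best gain."""
    order = np.argsort(x)
    x_s, g, h = x[order], grad[order], hess[order]
    G, H = g.sum(), h.sum()
    G_L = H_L = 0.0
    best_thr, best_gain = None, -np.inf
    for i in range(len(x_s) - 1):        # left child = sorted examples 0..i
        G_L += g[i]
        H_L += h[i]
        gain = split_gain(G_L, H_L, G - G_L, H - H_L, lam, gamma)
        if gain > best_gain:
            best_thr, best_gain = 0.5 * (x_s[i] + x_s[i + 1]), gain
    return best_thr, best_gain
```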
Once a tree structure is determined, the weight of each leaf is optimized using both the gradients and Hessians. The optimal weight $w_j$ for a leaf $j$ is calculated as:
$$w_j = -\frac{G_j}{H_j + \lambda}$$
Where:
- $G_j$: Sum of gradients for all examples in the leaf.
- $H_j$: Sum of Hessians for all examples in the leaf.
- $\lambda$: L2 regularization parameter.
This weight minimizes the loss for that leaf, balancing model complexity and predictive accuracy.
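As a sketch (assuming NumPy arrays holding the per-example gradients and Hessians of the examples that fall into the leaf), the weight computation is a one-liner:

```python
import numpy as np

def leaf_weight(grad, hess, lam=1.0):
    """Optimal leaf weight w_j = -G_j / (H_j + lambda) for the examples in leaf j."""
    return -np.sum(grad) / (np.sum(hess) + lam)
```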
After computing the optimal splits and leaf weights, the predictions for the dataset are updated:
$$\hat{y}_i^{(t+1)} = \hat{y}_i^{(t)} + \eta \cdot w(x_i)$$
Where:
- $\hat{y}_i^{(t)}$: Prediction for sample $i$ at iteration $t$.
- $\eta$: Learning rate (controls step size).
- $w(x_i)$: Weight of the leaf to which $x_i$ belongs in the new tree.
This iterative process improves the model's predictions by reducing the residual errors at each step.
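A minimal sketch of one such update, assuming the new tree's leaf assignments and leaf weights have already been computed (all names here are illustrative):

```python
import numpy as np

def apply_tree(y_hat, leaf_index, leaf_weights, eta=0.1):
    """One boosting step: y_hat_i <- y_hat_i + eta * w(x_i).

    leaf_index[i]  : index of the leaf that example i falls into in the new tree
    leaf_weights[j]: optimal weight w_j of leaf j
    """
    return y_hat + eta * np.asarray(leaf_weights)[leaf_index]

# Toy example: 4 samples, a 2-leaf tree.
y_hat = np.array([0.5, 0.5, 0.5, 0.5])
print(apply_tree(y_hat, leaf_index=np.array([0, 1, 1, 0]),
                 leaf_weights=[0.2, -0.3]))
```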
### **Why Use Gradient and Hessian?**
1. **Gradient ($g$):** Indicates the direction and magnitude of adjustments needed to reduce the loss.
2. **Hessian ($h$):** Helps adjust for the curvature of the loss function, leading to more precise updates (second-order optimization).
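Concretely, for a fixed tree structure the second-order Taylor expansion of the loss around the current predictions reduces to a sum of per-leaf quadratics (this is the standard derivation behind the leaf-weight formula above):
$$L^{(t)} \approx \sum_{j=1}^{T} \left[ G_j w_j + \frac{1}{2}(H_j + \lambda) w_j^2 \right] + \gamma T$$
Setting the derivative of each leaf's term to zero gives $w_j = -\frac{G_j}{H_j + \lambda}$, and substituting this back in produces the $\frac{G^2}{H + \lambda}$ terms that appear in the split gain.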
By leveraging both, XGBoost: