vault backup: 2025-01-18 00:00:19
parent 779b4c8fc4, commit 6608588f7a
1 changed file with 16 additions and 16 deletions
@@ -23,9 +23,9 @@ XGBoost (**eXtreme Gradient Boosting**) is an optimized and scalable implementat
XGBoost allows users to define a custom loss function, but it relies on a second-order Taylor expansion of that loss (both gradient and Hessian) to optimize the objective, so a custom loss must supply both derivatives. The general loss function in XGBoost consists of two components:

$$L(\Theta) = \sum_{i=1}^n l(y_i, \hat{y}_i) + \sum_{k=1}^T \Omega(h_k)$$

1. **First Term: Training Loss $l(y_i, \hat{y}_i)$**

- Measures how well the predictions $\hat{y}_i$ match the true labels $y_i$.
- Common choices:
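
Because the objective is optimized through its gradient and Hessian, a custom loss only has to supply those two derivatives. Below is a minimal sketch of a custom squared-error objective, assuming the `(preds, dtrain) -> (grad, hess)` callback signature accepted by the `obj` argument of `xgb.train`; the data, parameter values, and round count are placeholders.

```python
# Minimal sketch of a custom squared-error objective for xgb.train.
# Assumes the (preds, dtrain) -> (grad, hess) callback signature; the data
# and hyperparameter values below are made up for illustration.
import numpy as np
import xgboost as xgb

def squared_error_obj(preds, dtrain):
    """l(y, yhat) = 0.5 * (yhat - y)^2 -> grad = yhat - y, hess = 1."""
    y = dtrain.get_label()
    grad = preds - y              # first-order term of the Taylor expansion
    hess = np.ones_like(preds)    # second-order term (constant for squared error)
    return grad, hess

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=200)
dtrain = xgb.DMatrix(X, label=y)

# "lambda" and "gamma" feed the Omega(h_k) regularization term.
params = {"max_depth": 3, "eta": 0.1, "lambda": 1.0, "gamma": 0.0}
booster = xgb.train(params, dtrain, num_boost_round=50, obj=squared_error_obj)
```

Swapping in a different loss only changes how `grad` and `hess` are computed; the tree-building machinery stays the same.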
@@ -100,10 +100,10 @@ $$\text{Gain} = \frac{1}{2} \left[ \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_
Where:

- $G_L$, $G_R$: Sum of gradients for the left and right child nodes.
- $H_L$, $H_R$: Sum of Hessians for the left and right child nodes.
- $\lambda$: L2 regularization parameter (smooths the model).
- $\gamma$: Minimum loss reduction required to make a split (controls tree complexity).

The algorithm selects the split that maximizes the gain.
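
A small NumPy sketch of this split search, assuming the bracket in the (truncated) gain formula above is the usual $\frac{G_L^2}{H_L+\lambda} + \frac{G_R^2}{H_R+\lambda} - \frac{(G_L+G_R)^2}{H_L+H_R+\lambda}$ term; the gradients, Hessians, and candidate splits are made up.

```python
# Sketch of split-gain evaluation; g and h are hypothetical per-sample
# gradients/Hessians, and left_mask marks samples routed to the left child.
import numpy as np

def split_gain(g, h, left_mask, lam=1.0, gamma=0.0):
    G_L, H_L = g[left_mask].sum(), h[left_mask].sum()
    G_R, H_R = g[~left_mask].sum(), h[~left_mask].sum()
    score = (G_L**2 / (H_L + lam)
             + G_R**2 / (H_R + lam)
             - (G_L + G_R)**2 / (H_L + H_R + lam))
    return 0.5 * score - gamma

g = np.array([0.5, -1.2, 0.3, 0.8, -0.4])   # gradients, sorted by the feature
h = np.ones_like(g)                         # Hessians (1 for squared error)

# Try every split point along the sorted feature and keep the best one.
gains = [split_gain(g, h, np.arange(len(g)) < k) for k in range(1, len(g))]
best_split = int(np.argmax(gains)) + 1
```

When $\gamma > 0$ the gain can be negative, in which case the split is not worth making.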
@@ -113,13 +113,13 @@ The algorithm selects the split that maximizes the gain.
Once a tree structure is determined, the weight of each leaf is optimized using both the gradients and Hessians. The optimal weight $w_j$ for a leaf $j$ is calculated as:

$$w_j = -\frac{G_j}{H_j + \lambda}$$

Where:

- $G_j$: Sum of gradients for all examples in the leaf.
- $H_j$: Sum of Hessians for all examples in the leaf.
- $\lambda$: L2 regularization parameter.

This weight minimizes the loss for that leaf, balancing model complexity and predictive accuracy.
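
A minimal sketch of the leaf-weight formula, using hypothetical gradients and Hessians for the samples routed to a single leaf:

```python
# Sketch of the optimal leaf weight w_j = -G_j / (H_j + lambda);
# the per-sample gradients/Hessians below are hypothetical.
import numpy as np

def leaf_weight(g_leaf, h_leaf, lam=1.0):
    G_j = np.sum(g_leaf)              # sum of gradients in leaf j
    H_j = np.sum(h_leaf)              # sum of Hessians in leaf j
    return -G_j / (H_j + lam)

g_leaf = np.array([0.4, -0.9, 0.2])   # gradients of the samples in this leaf
h_leaf = np.ones_like(g_leaf)         # Hessians (1 for squared error)
w_j = leaf_weight(g_leaf, h_leaf)     # larger lambda shrinks w_j toward 0
```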
@@ -129,13 +129,13 @@ This weight minimizes the loss for that leaf, balancing model complexity and pre
After computing the optimal splits and leaf weights, the predictions for the dataset are updated:

$$\hat{y}_i^{(t+1)} = \hat{y}_i^{(t)} + \eta \cdot w(x_i)$$

Where:

- $\hat{y}_i^{(t)}$: Prediction for sample $i$ at iteration $t$.
- $\eta$: Learning rate (controls step size).
- $w(x_i)$: Weight of the leaf to which $x_i$ belongs in the new tree.

This iterative process improves the model's predictions by reducing the residual errors at each step.
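
The update itself is a single additive step per tree; in the toy sketch below, the leaf assignments and leaf weights are hypothetical stand-ins for the new tree's output.

```python
# Sketch of the additive update y_hat^(t+1) = y_hat^(t) + eta * w(x_i);
# leaf_index and leaf_weights are hypothetical outputs of the new tree.
import numpy as np

eta = 0.1                                   # learning rate
y_hat = np.zeros(6)                         # predictions at iteration t
leaf_index = np.array([0, 0, 1, 1, 2, 2])   # leaf each sample x_i falls into
leaf_weights = np.array([-0.8, 0.3, 1.1])   # w_j for each leaf of the new tree

y_hat_next = y_hat + eta * leaf_weights[leaf_index]   # predictions at t+1
```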
@@ -143,8 +143,8 @@ This iterative process improves the model's predictions by reducing the residual
### **Why Use Gradient and Hessian?**

1. **Gradient ($g$):** Indicates the direction and magnitude of adjustments needed to reduce the loss.
2. **Hessian ($h$):** Helps adjust for the curvature of the loss function, leading to more precise updates (second-order optimization); a small sketch of both follows below.
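
As a concrete illustration (a sketch, not XGBoost's internal code), the logistic loss makes the two roles explicit: the gradient sets the direction of the step, and dividing by the Hessian rescales it according to curvature, which is exactly the shape of the leaf-weight formula $-G_j/(H_j+\lambda)$.

```python
# Sketch: gradient and Hessian of the logistic loss for one sample, and how
# the Hessian turns a raw gradient step into a curvature-aware (Newton) step.
import numpy as np

def logistic_grad_hess(margin, y):
    p = 1.0 / (1.0 + np.exp(-margin))   # predicted probability
    grad = p - y                        # direction/magnitude of the needed change
    hess = p * (1.0 - p)                # curvature of the loss at this margin
    return grad, hess

grad, hess = logistic_grad_hess(margin=0.0, y=1.0)
gradient_step = -grad                   # first-order step (gradient only)
newton_step = -grad / (hess + 1.0)      # second-order step, with lambda = 1.0
```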
By leveraging both, XGBoost: