From 6608588f7a011c05291363e3db6e9bcd893518c0 Mon Sep 17 00:00:00 2001
From: Marco Realacci
Date: Sat, 18 Jan 2025 00:00:19 +0100
Subject: [PATCH] vault backup: 2025-01-18 00:00:19

---
 Foundation of data science/notes/9 XGBoost.md | 32 +++++++++----------
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/Foundation of data science/notes/9 XGBoost.md b/Foundation of data science/notes/9 XGBoost.md
index 3dfaa49..2046107 100644
--- a/Foundation of data science/notes/9 XGBoost.md
+++ b/Foundation of data science/notes/9 XGBoost.md
@@ -23,9 +23,9 @@ XGBoost (**eXtreme Gradient Boosting**) is an optimized and scalable implementat
 XGBoost allows users to define a custom loss function, but it relies on second-order Taylor expansion (both gradient and Hessian) to optimize the objective.
 
 The general loss function in XGBoost consists of two components:
-L(Θ)=∑i=1nl(yi,y^i)+∑k=1TΩ(hk)L(\Theta) = \sum_{i=1}^n l(y_i, \hat{y}_i) + \sum_{k=1}^T \Omega(h_k)
+$$L(\Theta) = \sum_{i=1}^n l(y_i, \hat{y}_i) + \sum_{k=1}^T \Omega(h_k)$$
 
-1. **First Term: Training Loss (l(yi,y^i)l(y_i, \hat{y}_i))**
+1. **First Term: Training Loss $l(y_i, \hat{y}_i)$**
 
     - Measures how well the predictions y^i\hat{y}_i match the true labels yiy_i.
    - Common choices:
@@ -100,10 +100,10 @@ $$\text{Gain} = \frac{1}{2} \left[ \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_
 
 Where:
 
-- GLG_L, GRG_R: Sum of gradients for the left and right child nodes.
-- HLH_L, HRH_R: Sum of Hessians for the left and right child nodes.
-- λ\lambda: L2 regularization parameter (smooths the model).
-- γ\gamma: Minimum loss reduction required to make a split (controls tree complexity).
+- $G_L$, $G_R$: Sum of gradients for the left and right child nodes.
+- $H_L$, $H_R$: Sum of Hessians for the left and right child nodes.
+- $\lambda$: L2 regularization parameter (smooths the model).
+- $\gamma$: Minimum loss reduction required to make a split (controls tree complexity).
 
 The algorithm selects the split that maximizes the gain.
 
@@ -113,13 +113,13 @@ The algorithm selects the split that maximizes the gain.
 
 Once a tree structure is determined, the weight of each leaf is optimized using both the gradients and Hessians. The optimal weight wjw_j for a leaf jj is calculated as:
 
-wj=−GjHj+λw_j = -\frac{G_j}{H_j + \lambda}
+$$w_j = -\frac{G_j}{H_j + \lambda}$$
 
 Where:
 
-- GjG_j: Sum of gradients for all examples in the leaf.
-- HjH_j: Sum of Hessians for all examples in the leaf.
-- λ\lambda: L2 regularization parameter.
+- $G_j$: Sum of gradients for all examples in the leaf.
+- $H_j$: Sum of Hessians for all examples in the leaf.
+- $\lambda$: L2 regularization parameter.
 
 This weight minimizes the loss for that leaf, balancing model complexity and predictive accuracy.
 
@@ -129,13 +129,13 @@ This weight minimizes the loss for that leaf, balancing model complexity and pre
 
 After computing the optimal splits and leaf weights, the predictions for the dataset are updated:
 
-y^i(t+1)=y^i(t)+η⋅w(xi)\hat{y}_i^{(t+1)} = \hat{y}_i^{(t)} + \eta \cdot w(x_i)
+$$\hat{y}_i^{(t+1)} = \hat{y}_i^{(t)} + \eta \cdot w(x_i)$$
 
 Where:
 
-- y^i(t)\hat{y}_i^{(t)}: Prediction for sample ii at iteration tt.
-- η\eta: Learning rate (controls step size).
-- w(xi)w(x_i): Weight of the leaf to which xix_i belongs in the new tree.
+- $\hat{y}_i^{(t)}$: Prediction for sample $i$ at iteration $t$.
+- $\eta$: Learning rate (controls step size).
+- $w(x_i)$: Weight of the leaf to which $x_i$ belongs in the new tree.
 
 This iterative process improves the model's predictions by reducing the residual errors at each step.
 
@@ -143,8 +143,8 @@
 
 ### **Why Use Gradient and Hessian?**
 
-1. **Gradient (gg):** Indicates the direction and magnitude of adjustments needed to reduce the loss.
-2. **Hessian (hh):** Helps adjust for the curvature of the loss function, leading to more precise updates (second-order optimization).
+1. **Gradient ($g$):** Indicates the direction and magnitude of adjustments needed to reduce the loss.
+2. **Hessian ($h$):** Helps adjust for the curvature of the loss function, leading to more precise updates (second-order optimization).
 
 By leveraging both, XGBoost:
 
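A minimal Python sketch of the formulas touched by these hunks, i.e. the split gain, the optimal leaf weight $w_j = -\frac{G_j}{H_j + \lambda}$, and the shrunken update $\hat{y}_i^{(t+1)} = \hat{y}_i^{(t)} + \eta \cdot w(x_i)$. It is illustrative only: the helper names (`split_gain`, `leaf_weight`), the toy data, the candidate split, and the values of `lam`, `gamma`, and `eta` are assumptions, and squared-error loss is assumed so that $g_i = \hat{y}_i - y_i$ and $h_i = 1$.

```python
import numpy as np

# Illustrative sketch of the XGBoost quantities above; names and values are
# assumptions for this example, not part of the patched note.

def leaf_weight(G, H, lam):
    # Optimal leaf weight: w_j = -G_j / (H_j + lambda)
    return -G / (H + lam)

def split_gain(G_L, H_L, G_R, H_R, lam, gamma):
    # Gain = 1/2 * [G_L^2/(H_L+lam) + G_R^2/(H_R+lam) - (G_L+G_R)^2/(H_L+H_R+lam)] - gamma
    def score(G, H):
        return G * G / (H + lam)
    return 0.5 * (score(G_L, H_L) + score(G_R, H_R) - score(G_L + G_R, H_L + H_R)) - gamma

# Toy data with squared-error loss: g_i = y_hat_i - y_i, h_i = 1.
y = np.array([1.0, 1.2, 3.0, 3.5])
y_hat = np.zeros_like(y)                 # current predictions at iteration t
g, h = y_hat - y, np.ones_like(y)

lam, gamma, eta = 1.0, 0.0, 0.3          # L2 penalty, min split gain, learning rate

# Candidate split: samples 0-1 go to the left child, samples 2-3 to the right.
left, right = [0, 1], [2, 3]
gain = split_gain(g[left].sum(), h[left].sum(), g[right].sum(), h[right].sum(), lam, gamma)

# Leaf weights and the shrunken update y_hat^(t+1) = y_hat^(t) + eta * w(x_i).
w_L = leaf_weight(g[left].sum(), h[left].sum(), lam)
w_R = leaf_weight(g[right].sum(), h[right].sum(), lam)
y_hat[left] += eta * w_L
y_hat[right] += eta * w_R

print(f"gain={gain:.3f}, w_L={w_L:.3f}, w_R={w_R:.3f}, y_hat={y_hat}")
```

With these numbers the candidate split has positive gain, and both leaf weights move the initial predictions toward the targets; a larger `lam` shrinks the weights, while a larger `gamma` raises the bar a split must clear.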