$$L(\Theta) = \sum_{i=1}^n l(y_i, \hat{y}_i) + \sum_{k=1}^T \Omega(h_k)$$
1. **First Term: Training Loss $l(y_i, \hat{y}_i)$**
- Measures how well the predictions $\hat{y}_i$ match the true labels $y_i$.
- Common choices:
- Mean Squared Error (MSE) for regression: $l(y_i, \hat{y}_i) = (y_i - \hat{y}_i)^2$
- Log Loss for binary classification: $l(y_i, \hat{y}_i) = - \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right]$
- Multiclass Log Loss for multiclass classification.
2. **Second Term: Regularization Term ($\Omega(h_k)$)**
- Adds penalties for model complexity to avoid overfitting: $\Omega(h_k) = \gamma T + \frac{1}{2} \lambda \sum_j w_j^2$
- $T$: Number of leaves in the tree $h_k$ (see the numerical sketch after this list).
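To make the two terms concrete, here is a minimal numerical sketch of the objective, using MSE as the training loss and the penalty above. All numbers (labels, predictions, leaf weights, and the $\gamma$/$\lambda$ values) are made up for illustration:

```python
import numpy as np

# Toy labels and current ensemble predictions (illustrative values).
y = np.array([3.0, 2.5, 4.0])
y_hat = np.array([2.8, 2.9, 3.5])

# First term: training loss, here squared error summed over examples.
train_loss = np.sum((y - y_hat) ** 2)

# Second term: complexity penalty for a single tree h_k with T leaves
# and leaf weights w_j:  Omega = gamma * T + 0.5 * lambda * sum(w_j^2)
gamma, lam = 1.0, 1.0            # hypothetical penalty strengths
w = np.array([0.4, -0.2, 0.1])   # hypothetical leaf weights, so T = 3
omega = gamma * len(w) + 0.5 * lam * np.sum(w ** 2)

objective = train_loss + omega
print(f"training loss = {train_loss:.3f}, Omega = {omega:.3f}, L = {objective:.3f}")
```

The loss term rewards fitting the data, while $\Omega$ grows with every extra leaf and with large leaf weights, so the two terms pull in opposite directions.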
---
XGBoost uses a **second-order Taylor approximation** to expand the loss function around the current prediction (a small numerical sketch follows the definitions of $g_i$ and $h_i$ below):
$$L(\Theta) \approx \sum_{i=1}^n \left[ g_i h_k(x_i) + \frac{1}{2} h_i h_k(x_i)^2 \right] + \Omega(h_k)$$
- **Gradient ($g_i$):** First derivative of the loss function with respect to predictions.
- **Hessian ($h_i$):** Second derivative of the loss function with respect to predictions.
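As a sketch of how this expansion is evaluated, the snippet below uses the squared-error loss from above, for which $g_i = 2(\hat{y}_i - y_i)$ and $h_i = 2$, and scores a hypothetical new tree's outputs $h_k(x_i)$ (all values are illustrative):

```python
import numpy as np

# Labels and current predictions (same toy values as before).
y = np.array([3.0, 2.5, 4.0])
y_hat = np.array([2.8, 2.9, 3.5])

# For squared error l = (y - y_hat)^2:
g = 2.0 * (y_hat - y)        # gradient g_i
h = np.full_like(y, 2.0)     # Hessian h_i (constant for this loss)

# Hypothetical outputs h_k(x_i) of the candidate tree on each example.
f = np.array([0.15, -0.30, 0.45])

# Second-order approximation of the loss change, with Omega(h_k) left out.
approx = np.sum(g * f + 0.5 * h * f ** 2)
print(approx)
```

Because the approximation depends on the candidate tree only through $g_i$ and $h_i$, the same split-finding machinery works for any twice-differentiable loss.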
Here’s a breakdown of how they are used:
For a given loss function $l(y, \hat{y})$, the **gradient** ($g$) and **Hessian** ($h$) are computed for each training example (a minimal code sketch follows these bullets):
- **Gradient ($g_i$)**: Measures the direction and magnitude of the steepest ascent in the loss function with respect to the model's prediction:
$$g_i = \frac{\partial l(y_i, \hat{y}_i)}{\partial \hat{y}_i}$$
- **Hessian ($h_i$)**: Measures the curvature (second derivative) of the loss function with respect to the model's prediction:
$$h_i = \frac{\partial^2 l(y_i, \hat{y}_i)}{\partial \hat{y}_i^2}$$
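For binary log loss, taking these derivatives with respect to the raw (pre-sigmoid) score gives the familiar closed forms $g_i = p_i - y_i$ and $h_i = p_i(1 - p_i)$, where $p_i = \sigma(\hat{y}_i)$. Below is a minimal sketch of those formulas plugged into the `xgboost` Python package's custom-objective hook; the tiny dataset and parameter values are illustrative only:

```python
import numpy as np
import xgboost as xgb

def logloss_obj(preds, dtrain):
    """Per-example gradient and Hessian of binary log loss,
    taken w.r.t. the raw (pre-sigmoid) score in preds."""
    y = dtrain.get_label()
    p = 1.0 / (1.0 + np.exp(-preds))  # predicted probability
    grad = p - y                      # g_i
    hess = p * (1.0 - p)              # h_i
    return grad, hess

# Tiny illustrative dataset: one feature, binary labels.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
dtrain = xgb.DMatrix(X, label=y)

booster = xgb.train({"max_depth": 2, "eta": 0.3}, dtrain,
                    num_boost_round=10, obj=logloss_obj)
```

Supplying `obj=` this way computes roughly what the built-in `binary:logistic` objective does internally, which is why a custom loss only needs to return these two vectors.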
---