vault backup: 2025-01-18 00:03:39
This commit is contained in:
parent 6608588f7a
commit 21bd52ea7b
1 changed file with 9 additions and 11 deletions
@@ -27,12 +27,12 @@ $$L(\Theta) = \sum_{i=1}^n l(y_i, \hat{y}_i) + \sum_{k=1}^T \Omega(h_k)$$
1. **First Term: Training Loss ($l(y_i, \hat{y}_i)$)**
    - Measures how well the predictions $\hat{y}_i$ match the true labels $y_i$.
    - Common choices (both terms of the objective are sketched in code after this list):
        - Mean Squared Error (MSE) for regression: $l(y_i, \hat{y}_i) = (y_i - \hat{y}_i)^2$
        - Log Loss for binary classification: $l(y_i, \hat{y}_i) = - \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right]$
        - Multiclass Log Loss for multiclass classification.
2. **Second Term: Regularization Term ($\Omega(h_k)$)**
    - Adds penalties for model complexity to avoid overfitting: $\Omega(h_k) = \gamma T + \frac{1}{2} \lambda \sum_j w_j^2$
    - $T$: Number of leaves in the tree.
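A minimal NumPy sketch of the two terms above, assuming a single tree whose leaf weights are already known; the helper names (`squared_error`, `log_loss`, `omega`) and the toy values for `gamma` and `lam` are illustrative, not part of the XGBoost API.

```python
import numpy as np

def squared_error(y, y_hat):
    # Per-example regression loss from the MSE bullet: (y_i - y_hat_i)^2
    return (y - y_hat) ** 2

def log_loss(y, y_hat, eps=1e-15):
    # Per-example binary log loss; eps keeps log() away from 0
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def omega(leaf_weights, gamma=1.0, lam=1.0):
    # Complexity penalty for one tree: gamma * T + 0.5 * lambda * sum_j w_j^2
    # (gamma and lam are placeholder values, not tuned settings)
    T = len(leaf_weights)
    return gamma * T + 0.5 * lam * np.sum(np.square(leaf_weights))

# Toy objective for a regression setup with one tree of three leaves
y = np.array([1.2, 0.4, 2.0])
y_hat = np.array([1.0, 0.5, 1.8])
w = np.array([0.3, -0.1, 0.2])          # leaf weights w_j of the tree
L = squared_error(y, y_hat).sum() + omega(w)
print(L)
```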
@@ -46,7 +46,7 @@ $$L(\Theta) = \sum_{i=1}^n l(y_i, \hat{y}_i) + \sum_{k=1}^T \Omega(h_k)$$
XGBoost uses a **second-order Taylor approximation** to expand the loss function around the current prediction (a small numeric sketch follows the two bullets below):

$$L(\Theta) \approx \sum_{i=1}^n \left[ g_i h_k(x_i) + \frac{1}{2} h_i h_k(x_i)^2 \right] + \Omega(h_k)$$
- **Gradient ($g_i$):** First derivative of the loss function with respect to predictions.
- **Hessian ($h_i$):** Second derivative of the loss function with respect to predictions.
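A minimal sketch of evaluating this second-order approximation for one candidate tree, assuming its per-example gradients `g`, Hessians `h`, outputs `f` (standing in for $h_k(x_i)$), and penalty `omega` are already available; all names and numbers are illustrative.

```python
import numpy as np

def taylor_objective(g, h, f, omega):
    # sum_i [ g_i * f_i + 0.5 * h_i * f_i^2 ] + Omega(h_k)
    return np.sum(g * f + 0.5 * h * f ** 2) + omega

g = np.array([-0.2, 0.3, -0.1])    # per-example gradients g_i
h = np.array([0.16, 0.21, 0.09])   # per-example Hessians h_i
f = np.array([0.1, -0.15, 0.05])   # candidate tree outputs h_k(x_i)
print(taylor_objective(g, h, f, omega=0.5))  # omega is a toy penalty value
```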
@@ -79,12 +79,10 @@ Here’s a breakdown of how they are used:
For a given loss function $l(y, \hat{y})$, the **gradient** ($g$) and **Hessian** ($h$) are computed for each training example (a worked example for the squared-error loss follows these formulas):
- **Gradient ($g_i$)**: Measures the direction and magnitude of the steepest ascent in the loss function with respect to the model's prediction:
$$g_i = \frac{\partial l(y_i, \hat{y}_i)}{\partial \hat{y}_i}$$
- **Hessian ($h_i$)**: Measures the curvature (second derivative) of the loss function with respect to the model's prediction:
$$h_i = \frac{\partial^2 l(y_i, \hat{y}_i)}{\partial \hat{y}_i^2}$$
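As a concrete instance, here is a small sketch that writes out $g_i$ and $h_i$ in closed form for the squared-error loss $l(y, \hat{y}) = (y - \hat{y})^2$ from earlier, with a finite-difference check on the gradient; the helper names are illustrative.

```python
import numpy as np

def loss(y, y_hat):
    # Squared-error loss from the MSE bullet above: (y - y_hat)^2
    return (y - y_hat) ** 2

def grad(y, y_hat):
    # g_i = d l / d y_hat = 2 * (y_hat - y)
    return 2.0 * (y_hat - y)

def hess(y, y_hat):
    # h_i = d^2 l / d y_hat^2 = 2 (constant for squared error)
    return np.full_like(y_hat, 2.0)

y = np.array([1.0, 0.0, 2.5])
y_hat = np.array([0.8, 0.3, 2.0])

# Finite-difference check of the analytic gradient
eps = 1e-6
fd_g = (loss(y, y_hat + eps) - loss(y, y_hat - eps)) / (2 * eps)
assert np.allclose(fd_g, grad(y, y_hat), atol=1e-4)
print(grad(y, y_hat), hess(y, y_hat))
```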
---