
Ridge Regression
- Aryan

- Feb 10
- 2 min read
What is Ridge Regression?
Ridge Regression is a type of linear regression that includes an L2 regularization term to prevent overfitting and handle multicollinearity. It modifies the ordinary least squares (OLS) cost function by adding a penalty term that shrinks large coefficients, making the model more stable.
Ridge Regression Cost Function
In standard linear regression, the cost function is the residual sum of squares:

J(w) = ∑ᵢ (yᵢ − ŷᵢ)²
In Ridge Regression, an L2 penalty term is added:

J(w) = ∑ᵢ (yᵢ − ŷᵢ)² + λ ∑ⱼ wⱼ²
where:
- λ (lambda) is the regularization parameter, controlling the penalty strength.
- ∑ⱼ wⱼ² is the sum of squared coefficients (the squared L2 norm of w).
- If λ = 0, Ridge Regression reduces to OLS.
- If λ is large, coefficients shrink significantly, preventing overfitting.
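To make the objective concrete, here is a minimal NumPy sketch of the Ridge cost (the names `ridge_cost`, `X`, `y`, `w`, and `lam` are illustrative choices, not from the post):

```python
import numpy as np

def ridge_cost(X, y, w, lam):
    """Ridge objective: squared residuals plus lambda times the squared L2 norm of w."""
    residuals = y - X @ w
    return np.sum(residuals ** 2) + lam * np.sum(w ** 2)

# Tiny illustrative example
X = np.array([[1.0, 2.0], [2.0, 0.5], [3.0, 1.5]])
y = np.array([3.0, 2.5, 4.5])
w = np.array([0.8, 0.6])

print(ridge_cost(X, y, w, lam=0.0))  # lambda = 0 -> plain OLS cost
print(ridge_cost(X, y, w, lam=1.0))  # lambda > 0 -> penalized cost
```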
Ridge Regression and Multicollinearity
- When independent variables are highly correlated (multicollinearity), OLS produces unstable and large coefficients.
- Ridge Regression stabilizes coefficients by shrinking them, making them less sensitive to correlated predictors.
- This prevents large swings in predictions when the data changes slightly.
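As a quick, hedged illustration of this behavior (the synthetic data and the λ value of 1.0 are arbitrary choices of mine), scikit-learn's LinearRegression and Ridge can be compared on two nearly identical predictors:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)   # almost perfectly correlated with x1
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(scale=0.5, size=n)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

print("OLS coefficients:  ", ols.coef_)    # typically large, offsetting values
print("Ridge coefficients:", ridge.coef_)  # shrunk, roughly split between x1 and x2
```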
Ridge Regression and Variance Reduction
- Overfitting occurs when high-variance models (like high-degree polynomial regression) fit training data well but fail on test data.
- Ridge Regression reduces variance by constraining model complexity, leading to better generalization.
Ridge Regression Formula (Closed-Form Solution)
Unlike OLS, Ridge Regression modifies the normal equation:

ŵ = (XᵀX + λI)⁻¹ Xᵀy
where I is the identity matrix. The λI term ensures the matrix is invertible, even when XᵀX is singular due to multicollinearity.
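A minimal NumPy sketch of this closed-form solution (the function name `ridge_closed_form` is my own; `np.linalg.solve` is used instead of an explicit inverse for numerical stability):

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Solve (X^T X + lambda * I) w = X^T y for the Ridge coefficients."""
    n_features = X.shape[1]
    A = X.T @ X + lam * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)

# Usage on random data
rng = np.random.default_rng(42)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)
print(ridge_closed_form(X, y, lam=0.1))
```

Note that this sketch omits the intercept and feature standardization, both of which are usually handled before applying the penalty.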
When to Use Ridge Regression?
- When predictors are highly correlated (multicollinearity).
- When the model suffers from high variance (overfitting).
- When you want to regularize coefficients but not remove features completely (unlike Lasso).
Understanding Ridge Regression – 5 Key Points
How are the coefficients affected?
- Ridge Regression shrinks the coefficients towards zero but does not make them exactly zero (unlike Lasso).
- This means all variables are kept in the model, just with smaller weights.
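To see the difference (a hedged sketch; the dataset and the α values are arbitrary choices of mine, not from the post), compare Ridge and Lasso coefficients fitted on the same data:

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
# Only the first two features actually matter here
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

ridge = Ridge(alpha=10.0).fit(X, y)
lasso = Lasso(alpha=0.5).fit(X, y)

print("Ridge:", np.round(ridge.coef_, 3))  # all features kept, weights shrunk
print("Lasso:", np.round(lasso.coef_, 3))  # irrelevant features driven exactly to zero
```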
How are large coefficient values affected?
- Large coefficients are penalized heavily because Ridge Regression adds their squared values to the loss function.
- This prevents any one feature from dominating the model, leading to better generalization.
Bias-Variance Trade-off
- Low λ (lambda) → Low bias, high variance (flexible but may overfit).
- Moderate λ → Balanced bias and variance (best generalization).
- High λ → High bias, low variance (may underfit, losing predictive power).
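A small sketch of this trade-off (synthetic data and an arbitrary λ grid of my choosing) that sweeps λ and reports training and test error:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(7)
X = rng.normal(size=(80, 10))
y = X @ rng.normal(size=10) + rng.normal(scale=1.0, size=80)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

for lam in [0.01, 0.1, 1.0, 10.0, 100.0]:
    model = Ridge(alpha=lam).fit(X_tr, y_tr)
    print(f"lambda={lam:6.2f}  "
          f"train MSE={mean_squared_error(y_tr, model.predict(X_tr)):.3f}  "
          f"test MSE={mean_squared_error(y_te, model.predict(X_te)):.3f}")
```

Training error typically rises monotonically with λ, while test error first falls and then rises again, which is the bias-variance trade-off described above.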
Impact on Loss Function
- The loss function in Ridge Regression includes an L2 penalty term, which adds the sum of squared coefficients:

Loss = ∑ᵢ (yᵢ − ŷᵢ)² + λ ∑ⱼ wⱼ²
Why is it called Ridge Regression?
- The name "Ridge" comes from Ridge Estimators, which prevent the coefficients from becoming unstable (large values).
- It helps maintain a stable "ridge" of values even when features are highly correlated.
Controlling the Trade-off with λ (Lambda)
The choice of λ (lambda) determines the balance between bias and variance:
- Small λ → low bias, high variance (a flexible fit, but with a risk of overfitting).
- Moderate λ → balanced bias and variance (best generalization).
- Large λ → high bias, low variance (risk of underfitting).
In short, Ridge Regression:
- Reduces variance by shrinking coefficients, making the model more stable.
- Increases bias slightly because it restricts flexibility, which may cause underfitting if λ is too large.
- Requires λ to be chosen carefully, typically via cross-validation, to balance bias and variance (see the sketch below).
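In practice, λ is usually selected by cross-validation. A hedged sketch using scikit-learn's RidgeCV (the candidate grid is an arbitrary choice of mine; scikit-learn calls the parameter `alpha`):

```python
import numpy as np
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(3)
X = rng.normal(size=(120, 8))
y = X @ rng.normal(size=8) + rng.normal(scale=1.0, size=120)

# Try a log-spaced grid of candidate lambdas and pick the best by 5-fold CV
model = RidgeCV(alphas=np.logspace(-3, 3, 13), cv=5).fit(X, y)
print("Chosen lambda:", model.alpha_)
print("Coefficients: ", np.round(model.coef_, 3))
```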
MATHEMATICAL FORMULATION
In matrix form, let X be the n × p design matrix, y the target vector, and w the coefficient vector. Ridge Regression solves:

minimize over w:  ‖y − Xw‖² + λ‖w‖²

Setting the gradient to zero gives the closed-form solution shown earlier:

ŵ = (XᵀX + λI)⁻¹ Xᵀy

For any λ > 0, the matrix XᵀX + λI is positive definite and therefore always invertible, which is why Ridge Regression remains well-defined even when the predictors are highly correlated.