
Ridge Regression
- Aryan

- Feb 10
- 2 min read
What is Ridge Regression?
Ridge Regression is a type of linear regression that includes an L2 regularization term to prevent overfitting and handle multicollinearity. It modifies the ordinary least squares (OLS) cost function by adding a penalty term that shrinks large coefficients, making the model more stable.
Ridge Regression Cost Function
In standard linear regression, the cost function is the residual sum of squares:

J(w) = ∑ᵢ (yᵢ − ŷᵢ)²
In Ridge Regression, an L2 penalty term is added:

J(w) = ∑ᵢ (yᵢ − ŷᵢ)² + λ ∑ⱼ wⱼ²
where:
- λ (lambda) is the regularization parameter, controlling the penalty strength.
- ∑ⱼ wⱼ² is the sum of squared coefficients (the squared L2 norm of w).
- If λ = 0, Ridge Regression reduces to OLS.
- If λ is large, coefficients shrink significantly, preventing overfitting.
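To make the objective concrete, here is a minimal NumPy sketch of the Ridge cost (the names `ridge_cost`, `X`, `y`, `w`, and `lam` are illustrative choices, not from the post):

```python
import numpy as np

def ridge_cost(X, y, w, lam):
    """Ridge objective: squared residuals plus lambda times the squared L2 norm of w."""
    residuals = y - X @ w
    return np.sum(residuals ** 2) + lam * np.sum(w ** 2)

# Tiny illustrative example
X = np.array([[1.0, 2.0], [2.0, 0.5], [3.0, 1.5]])
y = np.array([3.0, 2.5, 4.5])
w = np.array([0.8, 0.6])

print(ridge_cost(X, y, w, lam=0.0))  # lambda = 0 -> plain OLS cost
print(ridge_cost(X, y, w, lam=1.0))  # lambda > 0 -> penalized cost
```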
Ridge Regression and Multicollinearity
- When independent variables are highly correlated (multicollinearity), OLS produces unstable and large coefficients.
- Ridge Regression stabilizes coefficients by shrinking them, making them less sensitive to correlated predictors.
- This prevents large swings in predictions when the data changes slightly.
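As a quick, hedged illustration of this behavior (the synthetic data and the λ value of 1.0 are arbitrary choices of mine), scikit-learn's LinearRegression and Ridge can be compared on two nearly identical predictors:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)   # almost perfectly correlated with x1
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(scale=0.5, size=n)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

print("OLS coefficients:  ", ols.coef_)    # typically large, offsetting values
print("Ridge coefficients:", ridge.coef_)  # shrunk, roughly split between x1 and x2
```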
Ridge Regression and Variance Reduction
- Overfitting occurs when high-variance models (like high-degree polynomial regression) fit training data well but fail on test data.
- Ridge Regression reduces variance by constraining model complexity, leading to better generalization.
Ridge Regression Formula (Closed-Form Solution)
Unlike OLS, Ridge Regression modifies the normal equation:

ŵ = (XᵀX + λI)⁻¹ Xᵀy
where I is the identity matrix. The λI term ensures the matrix is invertible, even when XᵀX is singular due to multicollinearity.
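A minimal NumPy sketch of this closed-form solution (the function name `ridge_closed_form` is my own; `np.linalg.solve` is used instead of an explicit inverse for numerical stability):

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Solve (X^T X + lambda * I) w = X^T y for the Ridge coefficients."""
    n_features = X.shape[1]
    A = X.T @ X + lam * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)

# Usage on random data
rng = np.random.default_rng(42)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)
print(ridge_closed_form(X, y, lam=0.1))
```

Note that this sketch omits the intercept and feature standardization, both of which are usually handled before applying the penalty.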
When to Use Ridge Regression?
- When predictors are highly correlated (multicollinearity).
- When the model suffers from high variance (overfitting).
- When you want to regularize coefficients but not remove features completely (unlike Lasso).
Understanding Ridge Regression – 5 Key Points
How are the coefficients affected?
- Ridge Regression shrinks the coefficients towards zero but does not make them exactly zero (unlike Lasso).
- This means all variables are kept in the model, just with smaller weights.
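To see the difference (a hedged sketch; the dataset and the α values are arbitrary choices of mine, not from the post), compare Ridge and Lasso coefficients fitted on the same data:

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
# Only the first two features actually matter here
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

ridge = Ridge(alpha=10.0).fit(X, y)
lasso = Lasso(alpha=0.5).fit(X, y)

print("Ridge:", np.round(ridge.coef_, 3))  # all features kept, weights shrunk
print("Lasso:", np.round(lasso.coef_, 3))  # irrelevant features driven exactly to zero
```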
How are large coefficient values affected?
- Large coefficients are penalized heavily because Ridge Regression adds their squared values to the loss function.
- This prevents any one feature from dominating the model, leading to better generalization.
Bias-Variance Trade-off
- Low λ (lambda) → Low bias, high variance (flexible but may overfit).
- Moderate λ → Balanced bias and variance (best generalization).
- High λ → High bias, low variance (may underfit, losing predictive power).
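A small sketch of this trade-off (synthetic data and an arbitrary λ grid of my choosing) that sweeps λ and reports training and test error:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(7)
X = rng.normal(size=(80, 10))
y = X @ rng.normal(size=10) + rng.normal(scale=1.0, size=80)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

for lam in [0.01, 0.1, 1.0, 10.0, 100.0]:
    model = Ridge(alpha=lam).fit(X_tr, y_tr)
    print(f"lambda={lam:6.2f}  "
          f"train MSE={mean_squared_error(y_tr, model.predict(X_tr)):.3f}  "
          f"test MSE={mean_squared_error(y_te, model.predict(X_te)):.3f}")
```

Training error typically rises monotonically with λ, while test error first falls and then rises again, which is the bias-variance trade-off described above.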
Impact on Loss Function
- The loss function in Ridge Regression includes an L2 penalty term, which adds the sum of squared coefficients:

Loss = ∑ᵢ (yᵢ − ŷᵢ)² + λ ∑ⱼ wⱼ²
Why is it called Ridge Regression?
- The name "Ridge" comes from Ridge Estimators, which prevent the coefficients from becoming unstable (large values).
- It helps maintain a stable "ridge" of values even when features are highly correlated.
Controlling the Trade-off with λ (Lambda)
The choice of λ (lambda) determines the balance between bias and variance:
- Small λ → low bias, high variance (a flexible fit, but with a risk of overfitting).
- Moderate λ → balanced bias and variance (best generalization).
- Large λ → high bias, low variance (risk of underfitting).
In short, Ridge Regression:
- Reduces variance by shrinking coefficients, making the model more stable.
- Increases bias slightly because it restricts flexibility, which may cause underfitting if λ is too large.
- Requires λ to be chosen carefully, typically via cross-validation, to balance bias and variance (see the sketch below).
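In practice, λ is usually selected by cross-validation. A hedged sketch using scikit-learn's RidgeCV (the candidate grid is an arbitrary choice of mine; scikit-learn calls the parameter `alpha`):

```python
import numpy as np
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(3)
X = rng.normal(size=(120, 8))
y = X @ rng.normal(size=8) + rng.normal(scale=1.0, size=120)

# Try a log-spaced grid of candidate lambdas and pick the best by 5-fold CV
model = RidgeCV(alphas=np.logspace(-3, 3, 13), cv=5).fit(X, y)
print("Chosen lambda:", model.alpha_)
print("Coefficients: ", np.round(model.coef_, 3))
```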
MATHEMATICAL FORMULATION
In matrix form, let X be the n × p design matrix, y the target vector, and w the coefficient vector. Ridge Regression solves:

minimize over w:  ‖y − Xw‖² + λ‖w‖²

Setting the gradient to zero gives the closed-form solution shown earlier:

ŵ = (XᵀX + λI)⁻¹ Xᵀy

For any λ > 0, the matrix XᵀX + λI is positive definite and therefore always invertible, which is why Ridge Regression remains well-defined even when the predictors are highly correlated.