
Ridge Regression

  • Writer: Aryan
  • Feb 10
  • 2 min read

What is Ridge Regression?

 

Ridge Regression is a type of linear regression that includes an L2 regularization term to prevent overfitting and handle multicollinearity. It modifies the ordinary least squares (OLS) cost function by adding a penalty term that shrinks large coefficients, making the model more stable.
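To make this concrete, here is a minimal sketch of fitting Ridge Regression with scikit-learn on synthetic data (the library's `alpha` argument plays the role of the λ parameter discussed below; the data and values here are only illustrative):

```python
# Minimal sketch: fitting Ridge Regression with scikit-learn on toy data.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                          # 100 samples, 3 features
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=100)

model = Ridge(alpha=1.0)                               # alpha corresponds to lambda
model.fit(X, y)
print(model.coef_, model.intercept_)
```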


Ridge Regression Cost Function

 

In standard linear regression, the cost function is:

J(w) = ∑ (yᵢ − ŷᵢ)²

In Ridge Regression, an L2 penalty term is added:

J(w) = ∑ (yᵢ − ŷᵢ)² + λ ∑ w²

where:

  - λ (lambda) is the regularization parameter, controlling the penalty strength.

  - ∑ w² is the sum of squared coefficients (L2 norm).

  - If λ = 0, Ridge Regression behaves like OLS.

  - If λ is large, coefficients shrink significantly, preventing overfitting.
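As a rough sketch of the two cost functions above, the Ridge cost is simply the OLS cost plus λ times the sum of squared weights (names like `X`, `y`, `w`, and `lam` are placeholders for this example):

```python
# Illustrative sketch: OLS cost versus Ridge cost for a given weight vector w.
import numpy as np

def ols_cost(X, y, w):
    residuals = y - X @ w
    return np.sum(residuals ** 2)                      # sum of squared errors

def ridge_cost(X, y, w, lam):
    return ols_cost(X, y, w) + lam * np.sum(w ** 2)    # add the L2 penalty on w
```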


Ridge Regression and Multicollinearity

 

- When independent variables are highly correlated (multicollinearity), OLS produces unstable and large coefficients.

- Ridge Regression stabilizes coefficients by shrinking them, making them less sensitive to correlated predictors.

- This prevents large swings in predictions when the data changes slightly.
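A toy sketch of this effect (assuming scikit-learn is available): with two nearly identical predictors, OLS tends to produce large, offsetting coefficients, while Ridge keeps them small and stable:

```python
# Sketch: multicollinearity makes OLS coefficients unstable; Ridge shrinks them.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=1e-3, size=200)             # almost perfectly correlated with x1
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(scale=0.1, size=200)

print(LinearRegression().fit(X, y).coef_)              # typically large, offsetting values
print(Ridge(alpha=1.0).fit(X, y).coef_)                # small, similar, stable values
```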

 

Ridge Regression and Variance Reduction

 

- Overfitting occurs when high-variance models (like polynomial regression) fit training data well but fail on test data.

- Ridge Regression reduces variance by constraining model complexity, leading to better generalization.
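As an illustrative sketch, here is a high-degree polynomial fit with and without a Ridge penalty on a small noisy sample (the degree, noise level, and alpha are arbitrary choices for demonstration):

```python
# Sketch: a degree-15 polynomial fit by plain least squares tends to overfit a
# small noisy sample, while the same polynomial with a Ridge penalty generalizes better.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(2)
X_train = np.sort(rng.uniform(-1, 1, size=20)).reshape(-1, 1)
y_train = np.sin(3 * X_train).ravel() + rng.normal(scale=0.2, size=20)
X_test = np.linspace(-1, 1, 200).reshape(-1, 1)
y_test = np.sin(3 * X_test).ravel()

for reg in (LinearRegression(), Ridge(alpha=1.0)):
    model = make_pipeline(PolynomialFeatures(degree=15), StandardScaler(), reg)
    model.fit(X_train, y_train)
    print(type(reg).__name__, model.score(X_test, y_test))   # R^2 on held-out data
```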


Ridge Regression Formula (Closed-Form Solution)

 

Unlike OLS, Ridge Regression modifies the normal equation:

w = (XᵀX + λI)⁻¹ Xᵀy

where I is the identity matrix. The λI term ensures the matrix is invertible, even when XᵀX is singular due to multicollinearity.
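A small sketch of this closed-form solution in NumPy (assuming the data has already been centered, so the intercept can be ignored):

```python
# Sketch of the closed-form Ridge solution w = (XᵀX + λI)⁻¹ Xᵀy.
import numpy as np

def ridge_closed_form(X, y, lam):
    n_features = X.shape[1]
    A = X.T @ X + lam * np.eye(n_features)     # XᵀX + λI is invertible for any λ > 0
    return np.linalg.solve(A, X.T @ y)         # solve the system rather than inverting
```

For numerical stability, the sketch uses `np.linalg.solve` instead of forming the matrix inverse explicitly.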


When to Use Ridge Regression?

 

- When predictors are highly correlated (multicollinearity).

- When the model suffers from high variance (overfitting).

- When you want to regularize coefficients but not remove features completely (unlike Lasso).

 

Understanding Ridge Regression – 5 Key Points

 

  1. How are the coefficients affected?


    - Ridge Regression shrinks the coefficients towards zero but does not make them exactly zero (unlike Lasso); see the sketch after this list.

    - This means all variables are kept in the model, just with smaller weights.

 

  2. How are large coefficient values affected?


    - Large coefficients are penalized heavily because Ridge Regression adds their squared values to the loss function.

    - This prevents any one feature from dominating the model, leading to better generalization.

 

  3. Bias-Variance Trade-off


    - Low λ (lambda) → Low bias, high variance (flexible but may overfit).

    - Moderate λ → Balanced bias and variance (best generalization).

    - High λ → High bias, low variance (may underfit, losing predictive power).


  4. Impact on Loss Function

    - The loss function in Ridge Regression includes an L2 penalty term, which adds the sum of squared coefficients:

    Loss = ∑ (yᵢ − ŷᵢ)² + λ ∑ w²

  5. Why is it called Ridge Regression?


    - The name "Ridge" comes from Ridge Estimators, which prevent the coefficients from becoming unstable (large values).

    - It helps maintain a stable "ridge" of values even when features are highly correlated.
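The sketch below (referenced in point 1) illustrates both the shrinking behaviour and the bias-variance trade-off: as `alpha` grows, the fitted coefficients move toward zero but never reach exactly zero:

```python
# Sketch: coefficient shrinkage as the regularization strength grows.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 4))
y = X @ np.array([4.0, -2.0, 1.0, 0.5]) + rng.normal(scale=0.5, size=100)

for alpha in [0.01, 1, 100, 10000]:
    coefs = Ridge(alpha=alpha).fit(X, y).coef_
    print(f"alpha={alpha:>7}: {np.round(coefs, 4)}")   # shrink toward, but never reach, zero
```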


Controlling the Trade-off with λ (Lambda)

 

The choice of λ (lambda) determines the balance between bias and variance:


Optimal Choice:

 - Small λ → Low bias, high variance (flexible fit, but risk of overfitting).

 - Moderate λ → Balanced bias and variance (best generalization).

 - Large λ → High bias, low variance (risk of underfitting).

 

-> Reduces variance by shrinking coefficients → makes the model more stable.

-> Increases bias slightly because it restricts flexibility → may cause underfitting if too large.

-> λ must be chosen carefully to balance bias and variance.
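In practice, λ is usually selected by cross-validation. Here is a minimal sketch using scikit-learn's `RidgeCV` (the grid of candidate alphas is an arbitrary choice for this example):

```python
# Sketch: choosing lambda (alpha) by cross-validation with RidgeCV.
import numpy as np
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, 0.0, -2.0, 3.0, 0.5]) + rng.normal(scale=1.0, size=200)

model = RidgeCV(alphas=np.logspace(-3, 3, 13), cv=5)   # score a grid of candidate alphas
model.fit(X, y)
print("best alpha:", model.alpha_)
```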


Mathematical Formulation

Starting from the Ridge cost function in matrix form:

J(w) = (y − Xw)ᵀ(y − Xw) + λ wᵀw

Setting the gradient with respect to w equal to zero:

∂J/∂w = −2Xᵀ(y − Xw) + 2λw = 0

(XᵀX + λI) w = Xᵀy

Solving for w gives the closed-form Ridge solution used above:

w = (XᵀX + λI)⁻¹ Xᵀy
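As a sanity check on this derivation, a short sketch: gradient descent on the Ridge objective converges to the same weights as the closed-form solution (the learning rate and iteration count are arbitrary choices for this toy data):

```python
# Sketch: gradient descent on the Ridge objective reaches the closed-form solution.
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=100)
lam = 1.0

w_closed = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)   # closed form

w = np.zeros(3)
lr = 0.001
for _ in range(5000):
    grad = -2 * X.T @ (y - X @ w) + 2 * lam * w    # ∂J/∂w from the derivation above
    w -= lr * grad

print(np.allclose(w, w_closed, atol=1e-4))         # True: both reach the same weights
```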
