
Support Vector Machine (SVM) – Part 4

  • Writer: Aryan
  • May 2
  • 8 min read

Transition to SVM

 

This toy example builds the foundation for how SVM uses constrained optimization :

  • In SVM, we try to maximize the margin between two classes (analogous to maximizing z)

  • While ensuring all data points lie on the correct side of the margin (like satisfying the constraint x² + y² = 1)

The underlying geometry — finding the best solution on a constrained path — is exactly what makes SVMs powerful.

 

Understanding Gradient Vector Fields in the Context of Constrained Optimization

 

In the previous part, we saw how to maximize a function under a constraint using the geometric visualization of 3D surfaces and contour plots. Now we dive deeper into what the arrows in contour plots mean and how they help us find the optimum — this is where gradient vector fields come in.


What Does the New Plot Show ?

 

Let’s now examine a new graph :

[Figure: contour plot of f(x, y) = x²y overlaid with its gradient vector field]

This plot includes :

  • Contour lines of the function — each line represents a set of (x,y) points where the function value (z) is constant.

  • Arrows that form the gradient vector field.


What Are These Arrows ?

 

These arrows represent the gradient of the function f(x, y) = x²y at different points.

The gradient of a multivariable function is a vector formed by the partial derivatives :

                                           ∇f(x, y) = (∂f/∂x, ∂f/∂y)

For our specific function f(x, y) = x²y , the gradient is :

                                           ∇f(x, y) = (2xy, x²)

At every point (x , y), this gradient vector points in the direction in which f(x , y) increases the fastest — this is known as the direction of steepest ascent.


Gradient = Direction of Maximum Increase

 

Just like in single-variable calculus where the derivative tells you the slope of the function, in multivariable calculus the gradient tells you the slope and direction in which the function increases most rapidly.

  • The direction of the gradient vector points where z = f(x,y) increases fastest.

  • The length (magnitude) of the gradient vector tells how steep the increase is.
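The steepest-ascent claim can be checked numerically. The sketch below is only an illustration: the sample point (1, 0.5) and the 0.1-degree scan are arbitrary choices, not part of the original notes. It measures the rate of increase of f in many directions and compares the best one with the gradient.

```python
import math

def f(x, y):
    return x**2 * y

def grad_f(x, y):
    # Analytic gradient of f(x, y) = x^2 * y
    return (2 * x * y, x**2)

def directional_derivative(x, y, angle, h=1e-6):
    # Central finite difference of f along the unit vector (cos angle, sin angle)
    ux, uy = math.cos(angle), math.sin(angle)
    return (f(x + h * ux, y + h * uy) - f(x - h * ux, y - h * uy)) / (2 * h)

x0, y0 = 1.0, 0.5  # arbitrary sample point (an assumption for illustration)
gx, gy = grad_f(x0, y0)
grad_angle = math.atan2(gy, gx)
grad_norm = math.hypot(gx, gy)

# Scan 3600 directions and keep the one with the largest rate of increase
angles = [2 * math.pi * k / 3600 for k in range(3600)]
best_angle = max(angles, key=lambda a: directional_derivative(x0, y0, a))
best_rate = directional_derivative(x0, y0, best_angle)

print(f"gradient angle : {math.degrees(grad_angle):.1f} deg")
print(f"best scan angle: {math.degrees(best_angle):.1f} deg")
print(f"|grad f| = {grad_norm:.4f}, best rate = {best_rate:.4f}")
```

If the claim holds, the best scanned direction should coincide with the gradient direction, and the best rate should match the gradient's length.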


Gradient and Contour Lines : Orthogonality

 

There’s a beautiful geometric relationship between gradient vectors and contour lines :

The gradient vector at a point is always perpendicular to the contour line passing through that point.

Here’s why :

  • A contour line connects all points where the function has the same value (i.e., no change in z).

  • So, walking along a contour line means there is no change in function value.

  • But the gradient points in the direction where the function does change — the fastest.

  • Therefore, the direction of no change (contour) and the direction of maximum change (gradient) must be orthogonal.
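The orthogonality argument can also be tested on our function. This is a minimal sketch under stated assumptions: the point (1.5, 0.8) is arbitrary, and the contour is written as y = c / x², which only works away from x = 0. It approximates the tangent to the contour with a short secant and takes its dot product with the gradient.

```python
def f(x, y):
    return x**2 * y

def grad_f(x, y):
    # Analytic gradient of f(x, y) = x^2 * y
    return (2 * x * y, x**2)

def contour_y(x, c):
    # Solve x^2 * y = c for y (valid for x != 0)
    return c / x**2

x0, y0 = 1.5, 0.8   # arbitrary point with x0 != 0 (illustrative assumption)
c = f(x0, y0)       # the contour level passing through that point
h = 1e-5

# Approximate the tangent to the contour by a secant through two nearby contour points
tx = (x0 + h) - (x0 - h)
ty = contour_y(x0 + h, c) - contour_y(x0 - h, c)

gx, gy = grad_f(x0, y0)
dot = gx * tx + gy * ty
cosine = dot / ((gx**2 + gy**2) ** 0.5 * (tx**2 + ty**2) ** 0.5)
print(f"cosine of angle between gradient and tangent: {cosine:.2e}")
```

A cosine very close to zero means the two directions are perpendicular, up to finite-difference error.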


Mathematical Summary of Gradient Properties

 

Let’s formalize this with key properties :

 

1. Direction of Maximum Change

The gradient ∇f points in the direction in which the function increases most rapidly.

 

2. Perpendicular to Contour Lines

If C is a contour line of f(x, y), and you are standing at a point on that line, the gradient vector will be perpendicular to the tangent vector of C at that point.

 

3. Magnitude = Steepness

  • If the gradient vector is long, the function is changing rapidly (contour lines are close).

  • If the gradient is small, the function is changing slowly (contour lines are far apart).
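The link between gradient length and contour spacing can be made concrete: to raise f by a small amount Δ, you need to walk roughly Δ / |∇f| along the gradient direction. The sketch below checks this at two hand-picked points (both are assumptions chosen for illustration) by stepping along the gradient direction until f has risen by Δ = 0.01.

```python
import math

DELTA = 0.01  # gap between two neighbouring contour levels (illustrative value)

def f(x, y):
    return x**2 * y

def grad_f(x, y):
    return (2 * x * y, x**2)

def distance_to_next_contour(x, y, delta=DELTA, step=1e-5):
    # Walk in a straight line along the unit gradient direction until f has risen by delta
    gx, gy = grad_f(x, y)
    norm = math.hypot(gx, gy)
    ux, uy = gx / norm, gy / norm
    start, travelled = f(x, y), 0.0
    while f(x + travelled * ux, y + travelled * uy) - start < delta:
        travelled += step
    return travelled, norm

for point in [(2.0, 1.0), (0.5, 0.2)]:
    dist, norm = distance_to_next_contour(*point)
    print(f"point={point}  |grad f|={norm:.3f}  spacing={dist:.4f}  "
          f"delta/|grad f|={DELTA / norm:.4f}")
```

The point with the longer gradient should reach the next contour after a much shorter walk, and the measured spacing should be close to the first-order estimate Δ / |∇f|.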


Intuition in Optimization

 

In optimization, understanding the gradient helps us :

  • Navigate the function surface.

  • Identify how to increase or decrease the function value.

  • Visualize constraints as boundaries that limit how far we can follow the gradient.

In our earlier constrained optimization problem, the solution occurred where the gradient of the function and the constraint interacted perfectly — geometrically, this is the point where the contour line of the objective function just touches the constraint curve, and the gradients of both are aligned.

 

Visual Takeaway

 

When we overlay :

  • Contour lines of a function,

  • And gradient vectors,

…we get a full geometric map of how the function behaves, where it increases most, and how constraints influence possible solutions.

This understanding of gradient behavior and constraints is exactly what underpins the mathematical foundation of SVMs, especially when we formulate them as a constrained optimization problem.


Gradient Alignment and Lagrange Multipliers — The Formal Condition for Constrained Optimization

 

We’ve seen how to visualize a function f(x,y) and a constraint like x² + y² = 1 using 3D plots, contours, and gradient fields. Now, we’ll take this one step further : how do we mathematically express the condition at the point where our function achieves its maximum (or minimum) under the constraint ?


Revisiting the Constraint

 

We’ve been working with a constraint :

x² + y² = 1

This is a single circle of radius 1 centered at the origin. It is one member of a family of circles x² + y² = c, one for each value of c > 0.

 

Let’s now define this family more generally :

                                           g(x, y) = x² + y²

This defines a scalar field : for each point (x, y), it gives the squared distance from the origin. Plotted in 3D, z = g(x, y) is a bowl-shaped surface (a paraboloid).

Contour Plot of g(x, y) = x² + y²

 

The contour lines of this function are concentric circles centered at the origin, each corresponding to a fixed value of g(x, y). These are just slices through the bowl.


These circles represent level sets (points where the function has the same value), just like in topographic maps.


Gradient Vector Field of the Constraint Function

 

Because g(x, y) = x² + y² is a differentiable function of two variables (its graph is a surface in 3D), it too has a gradient vector field. For this function :

                                           ∇g(x,y) = (2x,2y)

These vectors point directly outward from the origin, perpendicular to each contour circle.


Notice that :

  • The vectors are perpendicular to the contour circles.

  • The farther from the origin, the larger the gradient vector.
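Both observations are easy to confirm with a few sample points. In this sketch the radii and angles are arbitrary illustrative choices. It checks that ∇g lies along the radius (the 2D cross product with the position vector is zero) and that its length equals twice the distance from the origin.

```python
import math

def grad_g(x, y):
    # Analytic gradient of g(x, y) = x^2 + y^2
    return (2 * x, 2 * y)

def radial_check(x, y):
    # Returns (cross, norm): the 2D cross product of position and gradient
    # (zero means the gradient lies along the radius) and the gradient's length.
    gx, gy = grad_g(x, y)
    return x * gy - y * gx, math.hypot(gx, gy)

for radius in (0.5, 1.0, 2.0):
    for angle_deg in (0, 60, 135, 250):
        t = math.radians(angle_deg)
        x, y = radius * math.cos(t), radius * math.sin(t)
        cross, norm = radial_check(x, y)
        print(f"r={radius}  angle={angle_deg:3d}  cross={cross:.1e}  |grad g|={norm:.3f}")
```

Since a circle's tangent is perpendicular to its radius, a radial gradient is automatically perpendicular to the contour circle.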


The Key Insight : Tangency of Contour Lines


Each function has its own set of contour lines. Our task is to find the point where a contour line of f(x,y) just touches the constraint circle defined by g(x,y) = 1.


At this point of tangency :

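  • The two curves touch without crossing, so they share the same tangent line at that point.

  • Each gradient is perpendicular to its own contour line, and therefore to that shared tangent line.

  • So, both gradients lie along the same line : they are parallel (pointing the same way or in exactly opposite ways), even if their magnitudes differ.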


Gradient Alignment = Lagrange Condition

 

This leads us to the mathematical condition for an optimum under constraint :

                                         ∇f(x, y) = λ ∇g(x, y)

Here :

  • ∇f(x,y) is the gradient of the function we are optimizing.

  • ∇g(x,y) is the gradient of the constraint function.

  • λ (lambda) is a scalar multiplier, known as the Lagrange multiplier.

This equation says :

At the optimal point, the direction in which the function increases fastest (gradient of f) is exactly aligned with the direction in which the constraint changes (gradient of g).


Even if their magnitudes differ, they point along the same line, so one is a scalar multiple of the other.
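One way to test whether two 2D vectors are scalar multiples of each other is their cross product, which vanishes when they are parallel. The sketch below applies that test to f(x, y) = x²y and g(x, y) = x² + y² at two points on the unit circle. The point (√(2/3), 1/√3) is taken here as an assumption: it satisfies x² + y² = 1 and is the candidate the worked example in this post arrives at.

```python
import math

def grad_f(x, y):
    return (2 * x * y, x**2)

def grad_g(x, y):
    return (2 * x, 2 * y)

def alignment(x, y):
    # 2D cross product of the two gradients: zero when they are parallel
    fx, fy = grad_f(x, y)
    gx, gy = grad_g(x, y)
    return fx * gy - fy * gx

# Two points on the unit circle: an arbitrary one, and the assumed tangency point
arbitrary = (math.cos(1.0), math.sin(1.0))
tangency = (math.sqrt(2 / 3), 1 / math.sqrt(3))

print("arbitrary point:", alignment(*arbitrary))
print("tangency point :", alignment(*tangency))

# At the tangency point the ratio of matching components recovers lambda
fx, fy = grad_f(*tangency)
gx, gy = grad_g(*tangency)
print("lambda from x-components:", fx / gx)
print("lambda from y-components:", fy / gy)
```

At a generic point on the circle the cross product is clearly non-zero, while at the tangency point it vanishes and both component ratios give the same λ.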

 

Why This Matters

 

This condition, ∇f = λ∇g, is the core of Lagrange multipliers, a powerful method in optimization.

It lets us convert a constrained problem into a system of equations, which can be solved for the optimal point.

This is exactly what happens in Support Vector Machines :

  • SVMs optimize a function (maximize margin)

  • Subject to constraints (correct classification of data)

  • Using Lagrange multipliers to solve the problem


Final Summary of the Concept

 

Let’s wrap up this 3-part idea :

  1. Optimization on a surface :

    • You want to maximize a function f(x, y) subject to a constraint g(x,y) = c.

  2. Visualization :

    • The function’s surface is plotted in 3D.

    • The constraint is a curve (e.g., circle) lying on this surface.

    • The optimal point is where the surface is highest (or lowest) along the constraint.

  3. Gradient and Contour Interplay :

    • Contours = level curves of constant value.

    • Gradients = vectors pointing in the direction of steepest ascent.

    • At the optimal point, the contours of f and g touch (are tangent).

    • So, their gradients are aligned : ∇f = λ∇g.


Transition to SVM

 

What we’ve built here is the full geometric and mathematical foundation for how SVMs perform optimization under constraints.

In the next step (outside these notes), we would define the specific SVM objective and constraints, then use Lagrange multipliers to derive the solution — just like we did here.


Using Lagrange Multipliers to Maximize f(x,y) = x²y 

 

Subject to the constraint : x² + y² = 1

We are given :

  • Objective function : f(x,y) = x²y

  • Constraint : g(x,y) = x² + y² = 1

We use Lagrange multipliers, where the condition is :

                                         ∇f(x,y) = λ ∇g(x,y)

Here, λ is the Lagrange multiplier, and we compute gradients of f and g.


Step 1 : Compute the Gradients

 

Gradient of f(x,y) = x²y :

                                         ∇f(x, y) = (∂f/∂x, ∂f/∂y) = (2xy, x²)

Gradient of g(x, y) = x² + y² :

                                         ∇g(x, y) = (∂g/∂x, ∂g/∂y) = (2x, 2y)

Now applying the Lagrange condition :

                                         (2xy, x²) = λ (2x, 2y)

This gives us two equations :

  1. 2xy = λ ⋅ 2x

  2. x² = λ ⋅ 2y


Step 2 : Solve the Equations

 

From equation (1), assuming x ≠ 0 (the case x = 0 forces y = ±1 on the constraint, where f = 0, so it cannot give the maximum), divide both sides by 2x :

                                            y = λ                               … (A)

Substitute (A) into equation (2) :

                                            x² = 2λy = 2y²                  … (B)

Now substitute (B) into the constraint x² + y² = 1 :

                                            2y² + y² = 1  ⟹  3y² = 1  ⟹  y = ± 1/√3

                                            x² = 2y² = 2/3  ⟹  x = ± √(2/3)

Step 3 : Candidate Points

 

Possible values of (x,y) are :

          (√(2/3), 1/√3),   (−√(2/3), 1/√3),   (√(2/3), −1/√3),   (−√(2/3), −1/√3)

Step 4 : Evaluate f (x,y) = x²y at These Points


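  • At (±√(2/3), 1/√3) :   f = (2/3) · (1/√3) = 2/(3√3) ≈ 0.385

  • At (±√(2/3), −1/√3) :   f = (2/3) · (−1/√3) = −2/(3√3) ≈ −0.385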

Constraint Verification


Verify that each point satisfies x² + y² = 1 :

                                            x² + y² = 2/3 + 1/3 = 1

So all candidate points satisfy the constraint.
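The evaluation and the constraint check can be repeated in a few lines of Python. This sketch builds the candidates from y = ±1/√3 and x = ±√(2/3), which follow from x² = 2y² and x² + y² = 1. It also adds the two points from the x = 0 branch that the division by 2x skipped, so every stationary point on the circle is covered.

```python
import math
from itertools import product

def f(x, y):
    return x**2 * y

def g(x, y):
    return x**2 + y**2

# From x^2 = 2y^2 and x^2 + y^2 = 1: y = +/- 1/sqrt(3), x = +/- sqrt(2/3)
x_values = (math.sqrt(2 / 3), -math.sqrt(2 / 3))
y_values = (1 / math.sqrt(3), -1 / math.sqrt(3))
candidates = list(product(x_values, y_values))

# The x = 0 branch excluded by dividing by 2x gives two more stationary points
candidates += [(0.0, 1.0), (0.0, -1.0)]

for x, y in candidates:
    print(f"x={x:+.4f}  y={y:+.4f}  g={g(x, y):.4f}  f={f(x, y):+.4f}")

best = max(candidates, key=lambda p: f(*p))
print("largest f among candidates:", f(*best))
```

Every row should report g = 1, and the largest value of f should appear at the candidates with positive y.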


Visualization of the Lagrange Multiplier Solution


[Figure: visualization of the Lagrange multiplier solution on the constraint circle x² + y² = 1]

As we discussed earlier, there are two points that maximize the function f(x,y) =  x²y under the constraint x² + y² = 1 :

          (√(2/3), 1/√3)   and   (−√(2/3), 1/√3),   where f = 2/(3√3) ≈ 0.385

This is the graphical representation of the solution. We’ve successfully solved a constrained optimization problem—we found the values of x and y that maximize z = f(x,y), while still satisfying the constraint x² + y² = 1 .
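As an independent sanity check that does not use Lagrange multipliers at all, we can walk around the circle and look for the highest value of f. The sketch below parametrises the circle as (cos t, sin t). That parametrisation and the grid size are choices made here for illustration, not part of the original notes.

```python
import math

def f(x, y):
    return x**2 * y

# Sample the unit circle on a fine grid of angles
n = 100_000
points = [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n)) for k in range(n)]

# f is symmetric in x, so look for the best point in each half of the circle
right = max((p for p in points if p[0] > 0), key=lambda p: f(*p))
left = max((p for p in points if p[0] < 0), key=lambda p: f(*p))

print("best point with x > 0:", right, "f =", f(*right))
print("best point with x < 0:", left, "f =", f(*left))
print("closed form: x = +/-", math.sqrt(2 / 3), " y =", 1 / math.sqrt(3),
      " f =", 2 / (3 * math.sqrt(3)))
```

The brute-force search should land, up to grid resolution, on the same two points and the same maximum value as the analytic solution.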


Why This Matters : Connecting to SVM

 

This simple example introduces the intuition behind solving constrained optimization problems, which is exactly what Support Vector Machines (SVM) require.

While our demo example involved maximizing a function subject to a geometric constraint (a circle), in SVM, the optimization problem is more complex and involves inequalities, slack variables, and regularization.


SVM Optimization Objective (Soft Margin SVM) :

 

We solve the following constrained optimization problem in soft-margin SVM :

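          minimize over w, b, ξ :   (1/2)‖w‖² + C Σᵢ ξᵢ

          subject to :   yᵢ (w · xᵢ + b) ≥ 1 − ξᵢ   and   ξᵢ ≥ 0   for every training point i

This is the standard form of the problem : w and b define the hyperplane, ξᵢ are the slack variables, and C is the regularization parameter.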
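To make the objective less abstract, here is a small sketch that evaluates it for a fixed hyperplane. It assumes the standard textbook soft-margin form, (1/2)‖w‖² + C Σ ξᵢ subject to yᵢ(w · xᵢ + b) ≥ 1 − ξᵢ and ξᵢ ≥ 0. The function name, the toy data, and the values of w, b and C are all hypothetical, chosen only for illustration. The code does not solve the SVM problem; it only shows how the slack variables and the two terms of the objective fit together.

```python
def soft_margin_objective(w, b, X, y, C):
    """Evaluate (1/2)||w||^2 + C * sum(xi_i) using the smallest feasible slacks."""
    # For fixed (w, b), the smallest slack satisfying y_i (w . x_i + b) >= 1 - xi_i
    # and xi_i >= 0 is xi_i = max(0, 1 - y_i (w . x_i + b)).
    slacks = []
    for x_i, y_i in zip(X, y):
        margin = y_i * (sum(w_j * x_j for w_j, x_j in zip(w, x_i)) + b)
        slacks.append(max(0.0, 1.0 - margin))
    regulariser = 0.5 * sum(w_j**2 for w_j in w)
    return regulariser + C * sum(slacks), slacks

# Toy data and a hand-picked hyperplane (all values are illustrative assumptions)
X = [(2.0, 2.0), (1.0, 3.0), (-1.0, -1.0), (0.5, 0.0)]
y = [1, 1, -1, -1]
w, b, C = (0.5, 0.5), -0.5, 1.0

value, slacks = soft_margin_objective(w, b, X, y, C)
print("slacks   :", slacks)
print("objective:", value)
```

Points that sit safely outside the margin get zero slack, while a point inside the margin contributes a positive slack that C then penalises.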

Final Intuition :

 

The SVM optimization problem is far more difficult than our example—but solving the simpler case first helps build the intuition.

Just like our toy problem required gradients, constraints, and optimality conditions, SVM also uses these ideas—along with Lagrange multipliers and dual formulations—to find the optimal separating hyperplane.

Understanding this foundation gives us the conceptual clarity we need before diving into the technical derivation of SVM.


Understanding Lagrange Multipliers

 

Lagrange multipliers are very useful in Support Vector Machines (SVM).

We are given a constrained optimization problem :

                    maximize   f(x, y) = x²y   subject to   x² + y² = 1

This is a problem where we want to maximize a function under a constraint.


Rewriting with Lagrange Multipliers

 

We can convert this constrained problem into a system of equations using Lagrange multipliers.

Let:

  • f(x, y) = x²y

  • g(x, y) = x² + y²

Then the condition becomes :

                    ∇f(x, y) = λ ∇g(x, y),   together with   g(x, y) = 1

This gives us a system of equations whose solutions are the candidate points; the maximum of f(x, y) on the constraint curve g(x, y) = 1 is among them.


Alternate Formulation — Lagrangian Function

 

Another way to approach this is to turn the constrained problem into an unconstrained optimization problem by incorporating the constraint into the objective function using a multiplier λ.

 

We define the Lagrangian function :

                    L(x, y, λ) = f(x, y) − λ (g(x, y) − 1)

That is :

                    L(x, y, λ) = x²y − λ (x² + y² − 1)

Now, to find the optimal solution, we solve :

                    ∇L(x, y, λ) = 0,   that is,   ∂L/∂x = 0,   ∂L/∂y = 0,   ∂L/∂λ = 0

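This approach transforms a constrained problem into an unconstrained one, where the constraint becomes part of the function itself. Strictly speaking, we look for stationary points of L (where all its partial derivatives vanish) rather than maximizing L directly.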
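A quick numerical check of this idea: the sketch below writes the Lagrangian as L(x, y, λ) = x²y − λ(x² + y² − 1), where the minus sign is a convention assumed here, and estimates its partial derivatives by finite differences. It uses the point x = √(2/3), y = λ = 1/√3, plus one other point on the circle that is picked arbitrarily for comparison.

```python
import math

def lagrangian(x, y, lam):
    # L(x, y, lambda) = f - lambda * (g - 1); the minus sign is a convention assumed here
    return x**2 * y - lam * (x**2 + y**2 - 1)

def numerical_gradient(func, point, h=1e-6):
    # Central differences in each coordinate
    grad = []
    for i in range(len(point)):
        up = list(point)
        down = list(point)
        up[i] += h
        down[i] -= h
        grad.append((func(*up) - func(*down)) / (2 * h))
    return grad

y_star = 1 / math.sqrt(3)
x_star = math.sqrt(2 / 3)
stationary = (x_star, y_star, y_star)   # lambda equals y at this solution
off_solution = (0.6, 0.8, 0.5)          # on the circle, but not a stationary point

grad_at_solution = numerical_gradient(lagrangian, stationary)
grad_elsewhere = numerical_gradient(lagrangian, off_solution)
print("gradient of L at the stationary point:", grad_at_solution)
print("gradient of L at another point       :", grad_elsewhere)
```

All three partial derivatives should be numerically zero at the stationary point. At the comparison point the λ-derivative is still zero (the point lies on the circle) but the x and y derivatives are not.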


Why This Matters for SVM

 

This technique is foundational in SVM because we also transform a constrained optimization (maximize margin under classification constraints) into an unconstrained one using Lagrange multipliers—then solve it using calculus and duality.

 

How the Lagrangian Was Formed

 

Let’s now understand how the Lagrangian method works and why it gives us the same results as before.

 

Recap of Our Problem


We are solving :

                    maximize   f(x, y) = x²y   subject to   x² + y² = 1

We introduced the Lagrangian to turn this into an unconstrained problem :

                    L(x, y, λ) = x²y − λ (x² + y² − 1)

To find the stationary points, we set the partial derivatives to zero :

                    ∂L/∂x = 0,   ∂L/∂y = 0,   ∂L/∂λ = 0

Step-by-Step Derivation

 

1. Partial Derivative w.r.t. x :

                    ∂L/∂x = 2xy − 2λx = 0   ⟹   2xy = λ · 2x

2. Partial Derivative w.r.t. y :

                    ∂L/∂y = x² − 2λy = 0   ⟹   x² = λ · 2y

3. Partial Derivative w.r.t. λ :

                    ∂L/∂λ = −(x² + y² − 1) = 0   ⟹   x² + y² = 1

This is just our original constraint.


Summary of the System

 

We now have a system of 3 equations :

  1. 2xy = λ · 2x

  2. x² = λ · 2y

  3. x² + y² = 1

These are exactly the same equations we would have obtained using the gradient method :

                    ∇f(x, y) = λ ∇g(x, y)   together with   g(x, y) = 1
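Once the problem is a system of three equations in x, y and λ, it can also be handed to a generic root finder. The sketch below uses Newton's method with a hand-written Jacobian and Cramer's rule for the 3×3 linear solve. This is a technique swapped in here for illustration, not the method used in these notes, and the starting guess (0.8, 0.6, 0.6) is an arbitrary point chosen near the expected solution.

```python
import math

def residual(x, y, lam):
    # The three equations, written as expressions that should all equal zero
    return [2 * x * y - 2 * lam * x, x**2 - 2 * lam * y, x**2 + y**2 - 1]

def jacobian(x, y, lam):
    # Partial derivatives of each residual with respect to (x, y, lambda)
    return [[2 * y - 2 * lam, 2 * x, -2 * x],
            [2 * x, -2 * lam, -2 * y],
            [2 * x, 2 * y, 0.0]]

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve3(m, rhs):
    # Cramer's rule: replace one column at a time with the right-hand side
    d = det3(m)
    solution = []
    for col in range(3):
        replaced = [[rhs[r] if c == col else m[r][c] for c in range(3)] for r in range(3)]
        solution.append(det3(replaced) / d)
    return solution

def newton(start, steps=20, tol=1e-12):
    point = list(start)
    for _ in range(steps):
        r = residual(*point)
        if max(abs(v) for v in r) < tol:
            break
        delta = solve3(jacobian(*point), [-v for v in r])
        point = [p + d for p, d in zip(point, delta)]
    return point

x, y, lam = newton((0.8, 0.6, 0.6))
print("x =", x, " y =", y, " lambda =", lam)
print("expected x =", math.sqrt(2 / 3), " expected y = lambda =", 1 / math.sqrt(3))
```

Newton's method only finds the root nearest to its starting guess, so other starting points would be needed to recover the remaining candidates.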

Final Thoughts

 

We have discovered a second, powerful way to handle constrained optimization. Rather than solving the constraint separately, we:

  • Embed the constraint into the objective function using the Lagrangian.

  • Convert the constrained problem into a standard optimization problem.

  • Solve using calculus by setting partial derivatives to zero.

This approach is central to many optimization problems, especially in Support Vector Machines (SVMs) and machine learning in general.





