
Exponential Weighted Moving Average (EWMA): Theory, Formula, Example & Intuition

  • Writer: Aryan
  • Dec 22, 2025
  • 4 min read

What is Exponential Weighted Moving Average (EWMA)?


[Figure: daily city temperature (red) with its EWMA trend line (black)]

Let's look at the time-series data in the graph above. On the x-axis, we have dates, and on the y-axis, we have the temperature of a city.

You will notice two distinct patterns:

  • The Red Line: This represents the actual daily temperature. It is noisy and fluctuates heavily.

  • The Black Line: This is the Exponential Weighted Moving Average (EWMA). It smooths out the noise to reveal the underlying trend.

Why do we use EWMA?

EWMA is a powerful technique used to identify trends in time-series data, such as city temperatures or stock market prices. It effectively separates the "signal" from the "noise." Beyond basic analysis, it is widely used in:

  • Time series forecasting

  • Financial modeling

  • Signal processing

  • Deep Learning optimizers (like Adam or RMSprop)

The Core Logic

When calculating a simple average, every data point is treated equally. EWMA is different because it is "weighted." It operates on two fundamental principles:

  1. Recency Matters: New data points are given higher "weight" (importance) than older data points. We assume the present gives us better information about the immediate future than the past does.

  2. Exponential Decay: As time progresses, the importance of a specific data point doesn't just stop; it reduces gradually. The older the data becomes, the less influence it has on the current trend line.


Mathematical Formulation

 

To understand how the curve is generated, we look at the recursive formula:

Vₜ = β Vₜ₋₁ + (1 − β)θₜ

Where:

  • Vₜ : The weighted average at the current time t.

  • Vₜ₋₁ : The weighted average from the previous step (the history).

  • θₜ : The actual data point at the current time (e.g., today's temperature).

  • β : A hyperparameter between 0 and 1 that controls the smoothing.
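In code, the recursion is a one-liner. Here is a minimal sketch (the function and variable names are my own, not from the post):

```python
def ewma_update(v_prev, theta, beta=0.9):
    """One EWMA step: V_t = beta * V_{t-1} + (1 - beta) * theta_t."""
    return beta * v_prev + (1 - beta) * theta
```

Feeding each new observation through `ewma_update` and carrying the result forward reproduces the smoothed (black) trend line.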


Example Calculation

 

Let's apply this to a small dataset of city temperatures. Suppose we set β = 0.9 and initialize V₀ = 0.

Day (Index) | Temperature (θₜ)
----------- | ----------------
D1          | 25
D2          | 13
D3          | 17
D4          | 31

Step 1 (Day 1):

V₁ = 0.9(V₀) + 0.1(25) = 0 + 2.5 = 2.5

Step 2 (Day 2):

V₂ = 0.9(V₁) + 0.1(13) = 0.9(2.5) + 1.3 = 3.55

Step 3 (Day 3):

V₃ = 0.9(V₂) + 0.1(17) = 0.9(3.55) + 1.7 = 4.895

Step 4 (Day 4):

V₄ = 0.9(V₃) + 0.1(31) = 0.9(4.895) + 3.1 = 7.5055

Notice that the early averages sit well below the actual temperatures: initializing V₀ = 0 biases the start of the series toward zero. (Deep Learning optimizers such as Adam apply a "bias correction" term to fix exactly this.)
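The hand calculation above can be checked with a short script (a minimal sketch; variable names are illustrative):

```python
# Reproduce the worked example: beta = 0.9, V0 = 0,
# temperatures for days D1..D4.
beta = 0.9
temps = [25, 13, 17, 31]

v = 0.0  # V0
history = []
for theta in temps:
    v = beta * v + (1 - beta) * theta  # V_t = beta*V_{t-1} + (1-beta)*theta_t
    history.append(v)

print(history)  # V1..V4: approximately [2.5, 3.55, 4.895, 7.5055]
```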


The Impact of Beta (β)

[Figure: EWMA curves for different values of β]

The value of β dictates how far back into the past the model effectively looks. A simple rule of thumb is that EWMA averages over roughly the last 1 / (1 − β) data points:
  • If β = 0.9 : We are averaging over roughly the last 10 days (1 / 0.1).

  • If β = 0.5 : We are averaging over the last 2 days (1 / 0.5).

  • If β = 0.98 : We are averaging over the last 50 days (1 / 0.02).
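The 1 / (1 − β) rule of thumb can be justified by summing the weights: they form a geometric series, so the most recent n points carry a fraction 1 − βⁿ of the total weight. A quick check (the numbers below are straightforward arithmetic, not from the post):

```python
for beta in (0.5, 0.9, 0.98):
    n = round(1 / (1 - beta))   # effective window size
    mass = 1 - beta ** n        # share of total weight held by the last n points
    print(f"beta={beta}: window ~{n} points, carrying {mass:.0%} of the weight")
```

In each case the "window" holds roughly 63-75% of the total weight (approaching 1 − 1/e ≈ 63% as β → 1), which is why 1 / (1 − β) is a reasonable definition of the averaging horizon.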


Visualizing the Trade-off

 

Looking at the comparison graphs, we can observe distinct behaviors:

  1. High Beta (e.g., 0.98): The curve is very smooth but reacts slowly to changes. By giving significant weight to the history (old points), the line adapts sluggishly to new trends.

  2. Low Beta (e.g., 0.5): The curve is spiky and noisy. It closely hugs the raw data because it prioritizes the current value (θₜ) over the history.

  3. The Sweet Spot (0.9): For most optimization algorithms and general trend analysis, 0.9 is a widely used default. It provides enough smoothing to remove noise while remaining responsive enough to capture recent trends.
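This trade-off can be quantified by measuring how "jumpy" each smoothed series is. The sketch below uses synthetic data (a slow trend plus heavy noise, illustrative only) and compares the average step-to-step change for several β values:

```python
import random

random.seed(0)
# Synthetic "temperature": a slow upward trend plus heavy noise.
raw = [20 + 0.1 * t + random.gauss(0, 3) for t in range(200)]

def ewma(data, beta):
    v, out = data[0], []  # seed with the first observation to reduce start-up bias
    for x in data:
        v = beta * v + (1 - beta) * x
        out.append(v)
    return out

def roughness(series):
    """Mean absolute change between consecutive points (higher = spikier)."""
    return sum(abs(b - a) for a, b in zip(series, series[1:])) / (len(series) - 1)

for beta in (0.5, 0.9, 0.98):
    print(f"beta={beta}: roughness={roughness(ewma(raw, beta)):.3f}")
```

Higher β produces a lower roughness score (a smoother curve) at the cost of slower reaction to genuine shifts in the trend.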


Mathematical Intuition: The Proof

 

We previously stated that EWMA gives more importance to new data and less to old data. But can we prove this mathematically?

Let's expand the recursive formula step-by-step to see exactly how the weights are assigned.

The Formula:

Vₜ = β Vₜ₋₁ + (1 − β)θₜ

 

The Expansion:

Let's assume we start from scratch (V₀ = 0 ) and observe how the equation evolves over 4 time steps.

  1. Time t = 1:

    V₁ = (1 − β)θ₁

  2. Time t = 2: (Substitute V₁ into the equation)

    V₂ = βV₁ + (1 − β)θ₂ = β[(1 − β)θ₁] + (1 − β)θ₂

    V₂ = (1 − β)[βθ₁ + θ₂]

  3. Time t = 3:

    V₃ = βV₂ + (1 − β)θ₃ = β²(1 − β)θ₁ + β(1 − β)θ₂ + (1 − β)θ₃

    V₃ = (1 − β)[β²θ₁ + βθ₂ + θ₃]

  4. Time t = 4:

    V₄ = βV₃ + (1 − β)θ₄ = β³(1 − β)θ₁ + β²(1 − β)θ₂ + β(1 − β)θ₃ + (1 − β)θ₄

    V₄ = (1 − β)[β³θ₁ + β²θ₂ + βθ₃ + θ₄]
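The expanded form can be verified numerically against the recursion, using the example temperatures from earlier (a quick sanity check, not from the original post):

```python
beta = 0.9
thetas = [25, 13, 17, 31]  # theta_1 .. theta_4

# Recursive form: V_t = beta * V_{t-1} + (1 - beta) * theta_t, with V_0 = 0.
v = 0.0
for theta in thetas:
    v = beta * v + (1 - beta) * theta

# Expanded form: V_4 = (1 - beta) * (beta^3*th_1 + beta^2*th_2 + beta*th_3 + th_4).
expanded = (1 - beta) * sum(
    beta ** (len(thetas) - 1 - i) * theta for i, theta in enumerate(thetas)
)

print(v, expanded)  # the two agree (up to floating-point rounding)
```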


Look at the coefficients multiplying our data points (θ) inside the bracket of the final equation for V₄ (each is further scaled by the common factor (1 − β)):

  • Current Data (θ₄): Multiplied by 1 (or β⁰).

  • Old Data (θ₃): Multiplied by β .

  • Older Data (θ₂): Multiplied by β² .

  • Oldest Data (θ₁): Multiplied by β³ .


Why does this matter ?

Remember that β is a fraction between 0 and 1 (e.g., 0.9).

When you take a fraction to a higher power, it becomes smaller:

β³ < β² < β < 1

This mathematically proves that as data points get older, their "weight" (coefficient) decays exponentially. The formula is naturally designed to prioritize the "now" while strictly reducing the influence of the "past."


