
What Is LSTM? Long Short-Term Memory Explained Clearly

  • Writer: Aryan
  • Feb 2
  • 5 min read

LSTM — Core Idea (Explained Through a Story)

 

To understand the intuition behind LSTMs, let’s walk through a simple story.

There was once a brave and kind king who deeply loved his people. His state was peaceful and everyone admired him. One day, the neighbouring king attacked, a war broke out, and although our king fought bravely and won, he died at the end. The people were heartbroken.

After some years, his younger son became the king. He was brave like his father, even more powerful, and very devoted to his people. But he lived with one obsession: to take revenge for his father’s death. He attacked the neighbour but was defeated and eventually died. He, too, had a son, and this grandson was not physically strong but extremely intelligent, and he thought differently. He also wanted revenge for the deaths of his father and grandfather. He attacked the neighbour, lost at first, but eventually outsmarted him and succeeded.

Now pause for a moment.

How did you decide whether the story was good or bad?

Which parts did you remember the most?

When our brain receives sequential information, it processes it step by step. First you build context about the king, his sons, and the generations, and then the revenge arc. While doing this, your brain maintains two types of context:

 

1. Short-term context

What is happening right now in the story — current events, the sequence of kings, who is fighting whom, etc.

 

2. Long-term context

Important facts carried throughout the story:

  • it’s an ancient story

  • no guns or modern concepts

  • the geography of the kingdoms

  • major characters (king → son → grandson)

Your brain automatically decides what should be stored long-term and what should be ignored. For example:

  • Initially, the king seems like the hero → added to long-term context

  • Then he dies → replaced

  • His son seems important → added

  • Then he dies → replaced

  • Finally the grandson wins → kept as the final long-term memory

So when asked, “How was the story?”, your judgment comes from the long-term context, not the short-term events.

 

How This Relates to RNNs

 

Traditional RNNs only have one kind of state: the hidden state.

This creates a problem:

They are forced to store both short-term and long-term information in a single space. Because of a mathematical limitation (the vanishing-gradient problem, where gradients shrink as they are propagated back through many time steps), the short-term context dominates and long-term information gets lost.

RNNs behave like someone who remembers only the last parts of the story — for example, remembering only the grandson and forgetting the original king entirely.
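This forgetting can be made concrete with a toy calculation (a sketch of the effect, not real backpropagation code): training an RNN multiplies one derivative factor per time step, and when each factor is below 1 the product shrinks exponentially, so early steps barely influence learning.

```python
# Toy illustration of the vanishing-gradient problem: backpropagating through
# an RNN multiplies many per-step derivatives together. When each factor is
# below 1, the product shrinks exponentially with sequence length.

def gradient_through_time(per_step_derivative: float, num_steps: int) -> float:
    """Product of identical per-step derivatives over num_steps."""
    grad = 1.0
    for _ in range(num_steps):
        grad *= per_step_derivative
    return grad

# With a per-step factor of 0.9, the signal from step 1 after 100 steps
# has all but vanished:
early_signal = gradient_through_time(0.9, 100)
print(f"{early_signal:.2e}")  # roughly 2.66e-05
```

The 0.9 here is an arbitrary stand-in for a derivative factor; the point is only that any factor below 1, repeated over a long sequence, drives the gradient toward zero.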

This made scientists realise:

We need a mechanism to preserve long-term information separately from short-term details.

So they created an architecture that maintains two paths instead of one:

  • Lower path → short-term memory

  • Upper path → long-term memory (cell state)

Because of this design, if something is important in the early steps, it can still survive until the end.

For example:

  • “Ankita is a great girl.”

  • Next sentence: “She is a state topper.”

Your brain uses long-term memory (“Ankita → she”) to choose the correct pronoun.

If later you read:

“MrBeast is a YouTuber… He made different videos.”

Now the long-term memory updates, and “he” resolves to MrBeast instead.

This is exactly how an LSTM chooses what to remember, what to forget, and what to update.

 

The Core Idea of LSTMs

 

LSTMs introduce an additional long-term memory path that traditional RNNs do not have.

Important information is stored here and remains until explicitly removed.

Key differences from RNNs:

 

1. Two states instead of one

  • Short-term memory (hidden state)

  • Long-term memory (cell state)

 

2. More complex architecture

Because LSTMs must:

  • maintain both memories

  • and communicate between them

To handle this communication, LSTMs use gates (forget, input, output), which results in a more complex but much more powerful architecture.

This ability to keep long-term information while still processing short-term changes is what makes LSTMs so effective for sequential data.

 

ARCHITECTURAL DIFFERENCE

When we compare an RNN to an LSTM, the first major difference is that an RNN maintains only one state, while an LSTM maintains two states:

  • Short-term memory (hidden state), shown as the lower path

  • Long-term memory (cell state), shown as the upper path

You can clearly see this in the diagram at t = 1: the LSTM has two separate lines representing two memory flows, while the RNN has only one.

The second major difference is the architecture.

RNNs have a simple structure because they only update a single memory.

LSTMs, on the other hand, have a more complex structure because they need to communicate between short-term and long-term memory. This internal circuit manages what to keep, what to forget, and what to update.

To simplify this: the internal components you see inside the LSTM cell are gates — mechanisms that control the flow of information.

 

The Three Gates in an LSTM

If we break an LSTM cell into parts, it contains three gates:

  1. Forget Gate

  2. Input Gate

  3. Output Gate

Let’s understand them with simple intuition and connect them back to the ancient story example from earlier.

 

1. Forget Gate

  • Takes the current input and short-term memory as its inputs.

  • Decides what to remove from long-term memory.

Example:

In our story, the first king was important at the beginning, so he stayed in long-term memory. After he died, he was no longer relevant. The forget gate removes him from long-term memory.

In simple words:

Forget gate decides what should no longer stay in long-term memory.

 

2. Input Gate

  • Decides what new information should be added to long-term memory.

Example:

When the king’s son becomes the new important character, the input gate adds him to the long-term memory.

In simple words:

Input gate decides what new information should be stored long-term.

 

3. Output Gate

  • Determines what the cell should output at the current time step.

  • Uses both current input and long-term memory to decide the output.

  • Also generates the short-term memory that will be passed to the next state.

Example:

When the story continues, the output gate decides what part of the long-term memory (e.g., “who is the current hero?”) should influence the next step.

In simple words:

Output gate decides the current output and produces the next short-term memory.
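The three gates described above can be sketched in code. The article explains them informally, so the equations and names below are the standard textbook ones, not taken verbatim from the post; the weights here are random placeholders, not a trained model.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One time step: inputs x_t, h_{t-1}, c_{t-1} -> outputs h_t, c_t.

    Each weight matrix in W maps the concatenated [h_prev, x] to one of the
    four internal signals; b holds the matching biases.
    """
    z = np.concatenate([h_prev, x])
    f = sigmoid(W["f"] @ z + b["f"])        # forget gate: what to erase from c
    i = sigmoid(W["i"] @ z + b["i"])        # input gate: what to write to c
    c_tilde = np.tanh(W["c"] @ z + b["c"])  # candidate new long-term content
    o = sigmoid(W["o"] @ z + b["o"])        # output gate: what to reveal as h
    c = f * c_prev + i * c_tilde            # updated long-term memory (cell state)
    h = o * np.tanh(c)                      # updated short-term memory (hidden state)
    return h, c

# Tiny usage example (hidden size 3, input size 2, random weights).
rng = np.random.default_rng(0)
hidden, inp = 3, 2
W = {k: rng.normal(size=(hidden, hidden + inp)) for k in "fico"}
b = {k: np.zeros(hidden) for k in "fico"}
h, c = lstm_step(rng.normal(size=inp), np.zeros(hidden), np.zeros(hidden), W, b)
print(h.shape, c.shape)  # (3,) (3,)
```

Notice that the cell-state update `c = f * c_prev + i * c_tilde` is just the story in one line: the forget gate scales down what is no longer relevant, and the input gate writes in the new important character.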

 

Why This Architecture Is Powerful

The combination of:

  • a long-term memory path

  • a short-term memory path

  • and three gates controlling their interaction

…gives LSTMs the ability to remember important information for a long duration, while still adapting to new inputs.

This solves the biggest limitation of RNNs, which often forget essential long-term dependencies.

 

SUMMARY

 

An LSTM works much like a computer: it takes inputs, performs some processing, and produces outputs. At every time step t, we give the LSTM a set of inputs, it processes them internally, and it produces two outputs.

At time t, the inputs are:

  • the long-term memory from the previous step Cₜ₋₁

  • the short-term memory from the previous step hₜ₋₁

  • the current input word xₜ

So the LSTM receives the previous timestamp’s memories (both short-term and long-term) along with the current input.

After internal processing, the LSTM produces two outputs:

  1. Updated long-term memory Cₜ for the next state

  2. Updated short-term memory hₜ for the next state

Inside the cell, two major updates happen:

  • the long-term memory is modified

  • a new short-term memory is created for the next stage

This is the overall summary of how an LSTM operates.
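To make the “important facts survive” idea concrete, here is a toy trace with hand-set gate values (a sketch, not a trained model, and with the candidate memory simplified to the raw input): once something is written into the cell state, a forget gate of 1 and an input gate of 0 carry it unchanged through every later step.

```python
# Toy trace: the two memories (h, c) are threaded from step to step, exactly
# as in the summary above. With hand-set gates, the value written at step 1
# survives to the end, like the grandson staying in long-term memory.

def lstm_step_scalar(x, h_prev, c_prev, f, i, o):
    """One scalar LSTM step with externally supplied gate values in [0, 1]."""
    c = f * c_prev + i * x   # long-term memory: keep a share of the old, add the new
    h = o * c                # short-term memory: what the cell reveals this step
    return h, c

sequence = [5.0, -1.0, 2.0, 0.5]
h, c = 0.0, 0.0
for t, x in enumerate(sequence):
    if t == 0:
        h, c = lstm_step_scalar(x, h, c, f=0.0, i=1.0, o=1.0)  # write the first fact
    else:
        h, c = lstm_step_scalar(x, h, c, f=1.0, i=0.0, o=1.0)  # keep it stored
print(c)  # 5.0 — the first input survived every later step
```

In a real LSTM the gate values are computed from hₜ₋₁ and xₜ by learned weights, as in the gate equations, rather than set by hand; this trace only shows why the separate cell-state path lets early information reach the end of the sequence.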

