Exploring Opportunities in AI & Machine Learning


The Transformer Decoder Explained: Architecture, Math & Operations
A complete, step-by-step explanation of the Transformer decoder architecture, covering masked self-attention, cross-attention, feed-forward networks, and the final softmax output, illustrated with an English-to-Hindi translation example.

Aryan
Mar 15


Masked Self-Attention Explained: Why Transformers Are Autoregressive Only at Inference
Transformer decoders behave autoregressively during inference but allow parallel computation during training. This post explains why naive parallel self-attention leaks future tokens during training and how masked self-attention prevents this leakage while preserving autoregressive behavior.

Aryan
Mar 10


Introduction to Transformers: The Neural Network Architecture Revolutionizing AI
Transformers are the foundation of modern AI systems like ChatGPT, BERT, and Vision Transformers. This article explains what Transformers are, how self-attention works, their historical evolution, their impact on NLP and generative AI, and their advantages, limitations, and future directions, all from first principles.

Aryan
Feb 14


How LSTMs Work: A Deep Dive into Gates and Information Flow
Long Short-Term Memory (LSTM) networks overcome the limitations of traditional RNNs through a powerful gating mechanism. This article explains how the Forget, Input, and Output gates work internally, breaking down the math, vector dimensions, and intuition behind cell states and hidden states. A deep, implementation-level guide for serious deep learning practitioners.

Aryan
Feb 4


Backpropagation Through Time (BPTT) Explained Step-by-Step with a Simple RNN Example
Backpropagation in RNNs is often confusing because a single weight affects the loss through multiple time-dependent paths. In this post, I break down Backpropagation Through Time step by step using a small toy dataset, showing clearly how gradients flow across timesteps and why unfolding the network is necessary.

Aryan
Jan 28