Cross Attention in Transformers Explained: Self vs Cross Attention Step by Step
Cross attention is a key mechanism in transformer encoder–decoder models that allows the decoder to focus on relevant parts of the input sequence. This guide explains cross attention step by step, compares it with self-attention, and shows how output representations are formed using input context.

Aryan
3 days ago
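
The mechanism the post describes can be sketched in a few lines: in cross attention, queries come from the decoder while keys and values come from the encoder output, so each decoder position forms its representation from input context. This is a minimal single-head sketch with identity projections (real models learn separate `W_q`, `W_k`, `W_v` matrices, which are omitted here for clarity):

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(decoder_states, encoder_states):
    """Single-head cross attention: queries from the decoder,
    keys and values from the encoder output."""
    d_k = decoder_states.shape[-1]
    Q = decoder_states              # (T_dec, d) -- what the decoder asks for
    K = encoder_states              # (T_enc, d) -- what the input offers
    V = encoder_states              # (T_enc, d) -- what is actually mixed in
    scores = Q @ K.T / np.sqrt(d_k)       # (T_dec, T_enc)
    weights = softmax(scores, axis=-1)    # each decoder step attends over all inputs
    return weights @ V                    # (T_dec, d): input-contextualized outputs

rng = np.random.default_rng(0)
dec = rng.normal(size=(3, 8))   # 3 decoder positions
enc = rng.normal(size=(5, 8))   # 5 encoder positions
out = cross_attention(dec, enc)
print(out.shape)  # (3, 8): one output vector per decoder position
```

Note the attention matrix is rectangular, `(T_dec, T_enc)`: decoder and input lengths need not match, which is what distinguishes this from self-attention's square `(T, T)` pattern.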


Masked Self Attention Explained: Why Transformers Are Autoregressive Only at Inference
Transformer decoders behave autoregressively during inference but are trained with parallel computation over the whole sequence. This post explains why naive parallel self-attention leaks information from future tokens and how masked self-attention prevents that leakage while preserving autoregressive behavior.

Aryan
5 days ago
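
The fix described above can be sketched concretely: a causal mask sets attention scores above the diagonal to negative infinity before the softmax, so position `i` can only attend to positions `0..i` even when all positions are computed in parallel. A minimal NumPy sketch (identity projections for brevity):

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax; exp(-inf) becomes exactly 0
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def masked_self_attention(x):
    """Causal self-attention: position i may only attend to positions <= i,
    so parallel training cannot leak future tokens."""
    T, d = x.shape
    scores = x @ x.T / np.sqrt(d)                       # (T, T)
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)    # True above the diagonal
    scores = np.where(mask, -np.inf, scores)            # block attention to the future
    weights = softmax(scores, axis=-1)
    return weights @ x, weights

rng = np.random.default_rng(1)
x = rng.normal(size=(4, 8))     # 4 token positions
out, w = masked_self_attention(x)
print(np.allclose(np.triu(w, k=1), 0))  # True: zero weight on future positions
```

Because the masked positions receive exactly zero weight, the parallel training computation produces the same per-position outputs as running the decoder one token at a time, which is why masking preserves autoregressive behavior.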