Exploring Opportunities in AI & Machine Learning


Cross Attention in Transformers Explained: Self vs Cross Attention Step by Step
Cross attention is the mechanism in transformer encoder–decoder models that lets the decoder focus on the relevant parts of the input sequence. This guide explains cross attention step by step, compares it with self-attention, and shows how each output representation is built from the encoder's view of the input.

Aryan
Mar 12
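
As a rough illustration of the idea in the teaser above, here is a minimal pure-Python sketch of cross attention. It assumes scaled dot-product attention and, to stay short, leaves out the learned query/key/value projections (they are treated as identity). The names `cross_attention`, `decoder_states`, and `encoder_states` are hypothetical and chosen only for this example; they are not taken from the post.

```python
import math

def softmax(row):
    # Numerically stable softmax over one list of scores.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(decoder_states, encoder_states):
    """Each decoder position attends over all encoder positions.

    Assumption: the learned W_Q, W_K, W_V projections are omitted
    (treated as identity) to keep the sketch short.
    """
    d_k = len(encoder_states[0])
    outputs, weights = [], []
    for q in decoder_states:  # queries come from the decoder
        # Keys come from the encoder: one score per input token.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in encoder_states
        ]
        w = softmax(scores)
        # Output is a weighted sum of the encoder (value) vectors.
        out = [
            sum(wj * v[i] for wj, v in zip(w, encoder_states))
            for i in range(d_k)
        ]
        weights.append(w)
        outputs.append(out)
    return outputs, weights

encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 input tokens
decoder_states = [[1.0, 0.0], [0.0, 2.0]]              # 2 output tokens
outputs, weights = cross_attention(decoder_states, encoder_states)
```

In self-attention the queries, keys, and values would all come from the same sequence; here only the queries come from the decoder, so `weights` has one row per output token and one column per input token.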


Masked Self Attention Explained: Why Transformers Are Autoregressive Only at Inference
Transformer decoders generate tokens autoregressively at inference time but can process all positions in parallel during training. This post explains why naive parallel self-attention leaks data by letting each position see future tokens, and how masked self-attention fixes this while preserving autoregressive behavior.

Aryan
Mar 10
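
The masking idea in the teaser above can be sketched in a few lines of pure Python. This is an illustrative sketch under stated assumptions, not the post's implementation: it uses scaled dot-product attention, omits the learned projections, and applies the causal mask by simply restricting each position to itself and earlier positions (which has the same effect as setting future scores to negative infinity before the softmax). The name `masked_self_attention` is hypothetical.

```python
import math

def softmax(row):
    # Numerically stable softmax over one list of scores.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def masked_self_attention(x):
    """Causal self-attention: position t sees only positions 0..t.

    Assumption: learned projections are omitted (treated as identity).
    """
    d_k = len(x[0])
    outputs = []
    for t, q in enumerate(x):
        visible = x[: t + 1]  # causal mask: drop all future positions
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in visible
        ]
        w = softmax(scores)
        outputs.append([
            sum(wj * v[i] for wj, v in zip(w, visible))
            for i in range(d_k)
        ])
    return outputs

# Two sequences that differ only in their last token.
seq_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
seq_b = [[1.0, 0.0], [0.0, 1.0], [5.0, -3.0]]
out_a = masked_self_attention(seq_a)
out_b = masked_self_attention(seq_b)

# Earlier positions are unaffected by the changed future token.
no_leak = out_a[:2] == out_b[:2]
```

Because every position is computed in one pass over the full sequence yet depends only on its own prefix, training can run in parallel while each output still matches what step-by-step generation would produce.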