Sequence-to-sequence models form the foundation of modern neural machine translation. In this article, I explain the encoder–decoder architecture from first principles, covering how it handles variable-length sequences, training with teacher forcing, backpropagation through time, the prediction flow, and key improvements such as embeddings and deep LSTMs, using intuitive explanations and clear diagrams.