

Transformer Inference Explained: A Step-by-Step Guide to Autoregressive Decoding
A detailed, step-by-step explanation of how Transformer inference works, covering encoder outputs, autoregressive decoding, masked self-attention, cross-attention, and token-by-token generation with clear mathematical intuition.

Aryan
4 days ago · 9 min read


The Transformer Decoder Explained: Architecture, Math & Operations
A complete, step-by-step explanation of the Transformer decoder architecture, covering masked self-attention, cross-attention, feed-forward networks, and the final softmax output using an English-to-Hindi translation example.

Aryan
Mar 15 · 7 min read
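The masked self-attention covered in this post comes down to one structural trick. Here is a minimal NumPy sketch (toy shapes, not from the article) of the causal mask applied before the softmax:

```python
import numpy as np

# Causal mask used in masked self-attention: position i may only attend to
# positions <= i, enforced by setting future scores to -inf before softmax.
def causal_softmax(scores):
    T = scores.shape[-1]
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)  # True above the diagonal
    masked = np.where(mask, -np.inf, scores)
    e = np.exp(masked - masked.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

w = causal_softmax(np.random.randn(4, 4))
print(np.round(w, 2))  # upper triangle is exactly 0; each row sums to 1
```

Because exp(-inf) is 0, future positions get zero attention weight and each row still normalizes to 1 over the visible positions.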


Cross Attention in Transformers Explained: Self vs Cross Attention Step by Step
Cross attention is a key mechanism in transformer encoder–decoder models that allows the decoder to focus on relevant parts of the input sequence. This guide explains cross attention step by step, compares it with self-attention, and shows how output representations are formed using input context.

Aryan
Mar 12 · 6 min read
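The self- vs cross-attention comparison in this post boils down to where Q, K, and V come from. A minimal NumPy sketch (random toy tensors, projection matrices omitted for brevity):

```python
import numpy as np

d = 8
enc = np.random.randn(5, d)   # encoder output: 5 source tokens
dec = np.random.randn(3, d)   # decoder states: 3 target tokens

def attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    w = np.exp(scores - scores.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)
    return w @ V

self_out  = attention(dec, dec, dec)  # self-attention: Q, K, V all from decoder
cross_out = attention(dec, enc, enc)  # cross-attention: K, V from the encoder

print(self_out.shape, cross_out.shape)  # (3, 8) (3, 8)
```

Note the output length always follows Q: cross-attention produces one row per decoder position, each a mixture of encoder (input-context) vectors.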


Transformer Encoder Architecture Explained Step by Step (With Intuition)
A clear, step-by-step explanation of the Transformer encoder architecture, covering tokenization, positional encoding, self-attention, feed-forward networks, residual connections, and why multiple encoder blocks are used.

Aryan
Mar 8 · 8 min read
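Of the encoder components this post covers, positional encoding is the easiest to preview in a few lines. A sketch of the standard sinusoidal variant (toy dimensions, assuming an even `d_model`):

```python
import numpy as np

# Sinusoidal positional encoding: even dimensions use sin, odd use cos,
# at geometrically spaced frequencies, so each position gets a unique code.
def positional_encoding(seq_len, d_model):
    pos = np.arange(seq_len)[:, None]         # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]      # (1, d_model // 2)
    angles = pos / (10000 ** (2 * i / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

pe = positional_encoding(50, 16)
print(pe.shape)  # (50, 16)
```

These vectors are added to the token embeddings before the first encoder block, giving self-attention a notion of token order.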


R-CNN Explained: A Comprehensive Guide to Object Detection Architecture
Unlock the mechanics of Object Detection with our deep dive into R-CNN. Moving beyond simple image classification, this guide explores how machines localize objects using Bounding Boxes, Selective Search, and Support Vector Machines. Whether you are calculating IoU or understanding the transition from sliding windows to smart proposals, this article covers the complete R-CNN architecture and evaluation metrics.

Aryan
Feb 24 · 16 min read
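The IoU calculation mentioned above fits in a few lines. A minimal sketch for axis-aligned boxes in `(x1, y1, x2, y2)` form, the overlap metric used to score region proposals against ground truth:

```python
# Intersection over Union (IoU) between two axis-aligned boxes,
# each given as (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])   # intersection top-left
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])   # intersection bottom-right
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7 ≈ 0.1429
```

The `max(0, ...)` terms make disjoint boxes score 0 instead of producing a negative intersection area.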


Scaled Dot-Product Attention Explained: Why We Divide by √dₖ in Transformers
Scaled dot-product attention is a core component of Transformer models, but why do we divide by √dₖ before applying softmax? This article explains the variance growth problem in high-dimensional dot products, the role of scaling in stabilizing softmax, and the mathematical intuition that makes attention training reliable and effective.

Aryan
Feb 21 · 5 min read
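The variance-growth claim this post analyzes is easy to verify empirically. A quick NumPy check (sample sizes and seed are arbitrary): for q and k with i.i.d. unit-variance entries, Var(q·k) grows like dₖ, and dividing by √dₖ restores unit variance:

```python
import numpy as np

# Empirical check: variance of raw dot products grows ~ d_k,
# while the sqrt(d_k)-scaled dot products keep variance ~ 1.
rng = np.random.default_rng(0)
for d_k in (16, 256):
    q = rng.standard_normal((100_000, d_k))
    k = rng.standard_normal((100_000, d_k))
    dots = (q * k).sum(axis=1)                      # 100k sample dot products
    print(d_k, dots.var().round(1), (dots / np.sqrt(d_k)).var().round(2))
```

Without the scaling, large dₖ pushes scores far apart, so the softmax saturates toward one-hot weights and gradients vanish — the instability the article unpacks.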
