Exploring Opportunities in AI & Machine Learning


Layer Normalization Explained: Why Transformers Prefer It Over Batch Norm
Layer Normalization is a core component of modern Transformer architectures. This article explains normalization fundamentals, internal covariate shift, why batch normalization fails in self-attention, and how layer normalization works mathematically inside Transformers, step by step with clear examples.
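The article covers the mathematics in full; as a taste of the idea, here is a minimal NumPy sketch (not from the article itself) of the core operation, normalizing each token's feature vector over its own dimensions rather than over the batch:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each token's feature vector to zero mean and unit variance.
    # x: (seq_len, d_model); statistics are computed per row (per token),
    # independently of the batch -- this is why it suits Transformers,
    # where batch statistics are unreliable for variable-length sequences.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

# Two "tokens" with very different scales end up on the same footing.
tokens = np.array([[1.0, 2.0, 3.0],
                   [10.0, 20.0, 30.0]])
normed = layer_norm(tokens)
print(normed.round(3))
```

In a real Transformer layer this is followed by learnable gain and bias parameters; the sketch omits them for brevity.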

Aryan
Mar 6


The Vanishing Gradient Problem & How to Optimize Neural Network Performance
This blog post explains the Vanishing Gradient Problem in deep neural networks: why gradients shrink, how shrinking gradients stall learning, and proven fixes such as ReLU, BatchNorm, and Residual Networks. It also covers essential strategies for improving neural network performance, including hyperparameter tuning, architecture optimization, and troubleshooting common training issues.
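To illustrate the core effect the post describes (this numeric sketch is mine, not the article's): a sigmoid's derivative never exceeds 0.25, so chaining it through many layers multiplies the gradient toward zero, while ReLU's derivative of 1 on positive inputs leaves the chain intact:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

depth = 10  # number of stacked layers in this toy chain
z = 0.0     # sigmoid'(0) = 0.25 is its maximum value

# Backprop multiplies per-layer derivatives; with sigmoids the product
# shrinks geometrically: 0.25**10 is roughly 1e-6.
sig_grad_chain = (sigmoid(z) * (1.0 - sigmoid(z))) ** depth

# ReLU's derivative is 1 for positive activations, so the same chain
# passes the gradient through unchanged.
relu_grad_chain = 1.0 ** depth

print(sig_grad_chain, relu_grad_chain)
```

This is why swapping saturating activations for ReLU (alongside BatchNorm and residual connections) is the standard remedy the post walks through.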

Aryan
Nov 28, 2025