From RMSProp to AdamW: The Optimizer Evolution Story

Tracing the evolution of modern neural network optimizers through the lens of the problem each was designed to fix: gradient scale heterogeneity, mini-batch noise, and the interference between adaptive updates and weight decay regularization.