From RMSProp to AdamW: The Optimizer Evolution Story
Tracing the evolution of modern neural network optimizers through the lens of what each was designed to fix: gradient scale heterogeneity, mini-batch noise, and regularization interference.
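To make the three fixes concrete, here is a minimal sketch (not the article's own code) contrasting the standard update rules on a single parameter array with NumPy; the function names and hyperparameter defaults (lr, rho, beta1, beta2, eps, weight_decay) are illustrative, not prescriptions.

```python
import numpy as np

def rmsprop_step(theta, g, v, lr=1e-3, rho=0.9, eps=1e-8):
    # Per-coordinate scaling by a running average of squared gradients
    # addresses gradient scale heterogeneity across parameters.
    v = rho * v + (1 - rho) * g**2
    theta = theta - lr * g / (np.sqrt(v) + eps)
    return theta, v

def adam_step(theta, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # First-moment averaging smooths mini-batch noise; bias correction
    # keeps early steps from being systematically under-scaled.
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g**2
    m_hat = m / (1 - beta1**t)
    v_hat = v / (1 - beta2**t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

def adamw_step(theta, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    # Decoupling weight decay from the adaptive gradient step removes the
    # interference between L2 regularization and the v-based rescaling.
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g**2
    m_hat = m / (1 - beta1**t)
    v_hat = v / (1 - beta2**t)
    theta = theta - lr * (m_hat / (np.sqrt(v_hat) + eps) + weight_decay * theta)
    return theta, m, v
```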