What Shopify's production generative recommender reveals about building on HSTU: time encoding for seasonality, negative sampling as the primary scaling lever, and training for incremental recall within an ensemble.
How production retrieval systems learn to rank a billion items, tracing the evolution of negative sampling from random batches through hard mining, bias correction, and ANCE.
A mechanistic deep dive into how generative recommender systems work: from Semantic IDs and RQ-VAE to HSTU, M-FALCON, and production deployment at Meta, Kuaishou, and beyond.
Tracing the evolution of modern neural network optimizers through the lens of what each was designed to fix: gradient scale heterogeneity, mini-batch noise, and regularization interference.
Coordinate translations, scaling, and state transitions: a unified approach to linear algebra decompositions.