Unpacking Manifold-Constrained Hyper-Connections: A Deep Dive into DeepSeek's Architecture
A technical deep dive into DeepSeek’s Manifold-Constrained Hyper-Connections (mHC), exploring how doubly stochastic matrices and the Birkhoff polytope solve gradient instability while expanding network capacity.