Tensor Programs IV:
Feature Learning in Infinite-Width Neural Networks
Greg Yang 1 Edward J. Hu 2 3
Abstract
As its width tends to infinity, a deep neural
network’s behavior under gradient descent can
become simplified and predictable (e.g. given
by the Neural Tangent Kernel (NTK)), if it
is parametrized appropriately (e.g. the NTK
parametrization). However, we sh ...

[Figure 1 caption: PCA of Word2Vec embeddings of top US cities and ...]