Understanding and Improving Layer Normalization
Jingjing Xu1, Xu Sun1,2, Zhiyuan Zhang1, Guangxiang Zhao2, Junyang Lin1
1 MOE Key Lab of Computational Linguistics, School of EECS, Peking University
2 Center for Data Science, Peking University
{jingjingxu,xusun,zzy1210,zhaoguangxiang,linjunyang}@pku.edu.cn
Abstract
Layer normalization (LayerNorm) is a technique to normalize the distributions
of in ...









