Attention is not all you need:
pure attention loses rank doubly exponentially with depth
Yihe Dong 1 Jean-Baptiste Cordonnier 2 Andreas Loukas 3
Abstract

Attention-based architectures have become ubiquitous in machine learning ...

... attention layers. Surprisingly, we find that pure self-attention networks (SANs), i.e., transformers with skip connections and multi-layer perceptrons (MLPs) disabled, lose rank doubly exponentially with depth.
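To make the setup concrete, the following is a minimal numerical sketch, not the authors' code: it stacks pure single-head self-attention layers with random Gaussian weights (no skip connections, MLPs, or layer normalization) and tracks the relative Frobenius distance of the output from the nearest rank-1, token-uniform matrix. The toy dimensions, weight scaling, and choice of residual norm are assumptions made here for illustration and may differ from the measure used in the paper's formal statements.

# Illustrative sketch: pure self-attention network (SAN) with random weights,
# measuring how fast the output collapses toward a rank-1 matrix.
import numpy as np

rng = np.random.default_rng(0)
n_tokens, d_model, d_qk, depth = 32, 64, 64, 12   # assumed toy sizes

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def pure_attention_layer(X):
    # One SAN layer: row-stochastic attention applied to values, nothing else
    # (no skip connection, no MLP, no layer normalization).
    Wq = rng.standard_normal((d_model, d_qk)) / np.sqrt(d_model)
    Wk = rng.standard_normal((d_model, d_qk)) / np.sqrt(d_model)
    Wv = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
    A = softmax((X @ Wq) @ (X @ Wk).T / np.sqrt(d_qk))
    return A @ X @ Wv

def relative_rank1_residual(X):
    # Frobenius distance to the closest matrix with all rows identical
    # (the best approximation of the form 1 x^T), relative to ||X||.
    return np.linalg.norm(X - X.mean(axis=0, keepdims=True)) / np.linalg.norm(X)

X = rng.standard_normal((n_tokens, d_model))
for layer in range(1, depth + 1):
    X = pure_attention_layer(X)
    print(f"layer {layer:2d}: relative residual to rank-1 = {relative_rank1_residual(X):.3e}")

In this toy run the residual shrinks rapidly toward machine precision as depth grows, which is the qualitative behavior the title describes; the exact decay rate here depends on the random weights and is only illustrative.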

