Rethinking Spatial Dimensions of Vision Transformers
Byeongho Heo1 Sangdoo Yun1 Dongyoon Han1 Sanghyuk Chun1 Junsuk Choe2 * Seong Joon Oh1
1 NAVER AI Lab  2 Department of Computer Science and Engineering, Sogang University
Abstract
Vision Transformer (ViT) extends ...

eration on ImageNet [8]. As a result, a new direction of network architectures based on self-attention mechanism, not









