Multiscale Vision Transformers
Haoqi Fan*,1  Bo Xiong*,1  Karttikeya Mangalam*,1,2
Yanghao Li*,1  Zhicheng Yan1  Jitendra Malik1,2  Christoph Feichtenhofer*,1
1 Facebook AI Research  2 UC Berkeley
Abstract
We present Multiscale Vision Transformers (MViT) for
video and image recognition, by connecting the seminal idea
of multiscale feature hierarchies with transformer models. ...









