ViViT: A Video Vision Transformer
Anurag Arnab, Mostafa Dehghani, Georg Heigold, Chen Sun, Mario Lucic, Cordelia Schmid
Google Research
{aarnab, dehghani, heigold, chensun, lucic, cordelias}@google.com
Abstract
We present pure-transformer based models for video ...

... that a pure-transformer based architecture has outperformed its convolutional counterparts in image classification. Doso-

