Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius¹  Heng Wang¹  Lorenzo Torresani¹ ²
Abstract
We present a convolution-free approach to video classification built exclusively ...

Video understanding shares several high-level similarities with NLP. First of all, videos and sentences are both sequential. Furthermore, precisely as the meaning of a word can









