VidTr: Video Transformer Without Convolutions
Yanyi Zhang 1,2 *, Xinyu Li 1 *, Chunhui Liu 1 , Bing Shuai 1 , Yi Zhu 1 ,
Biagio Brattoli 1 , Hao Chen 1 , Ivan Marsic 2 and Joseph Tighe 1
1 Amazon Web Service; 2 Rutgers University
{xxnl,chunhliu,bshuai,yzaws,biagib,hxen,tighej}@amazon.com; {yz593,marsic}@rutgers.edu
Abstract
videos but still rely on convolutional backbones [31, 55].
...









