STVGBert: A Visual-linguistic Transformer based Framework
for Spatio-temporal Video Grounding
Rui Su
Platform & Content Group, Tencent
rayruisu@tencent.com
Qian Yu
College of Software, Beihang University
qianyu@buaa.edu.cn
Dong Xu*
School of Electrical and Information Engineering, The University of Sydney
dong.xu@sydney.edu.au
...









