ViLBERT: Pretraining Task-Agnostic Visiolinguistic
Representations for Vision-and-Language Tasks
Jiasen Lu1 , Dhruv Batra1,3 , Devi Parikh1,3 , Stefan Lee1,2
1 Georgia Institute of Technology, 2 Oregon State University, 3 Facebook AI Research
Abstract
We present ViLBERT (short for Vision-and-Language BERT), a model for learning
task-agnostic joint representations of image content and natural language. We
extend the popular ...

