Unifying Vision-Language Representation Space
with Single-tower Transformer
Jiho Jang1, Chaerin Kong1, Donghyeon Jeon2, Seonhoon Kim3, Nojun Kwak1
1 Seoul National University, 2 NAVER, 3 Coupang
arXiv:2211.11153v1 [cs.LG] 21 Nov 2022
Figure 1: A truly unified vision-language representation s ...









