I. Visual Understanding
PDFVQA A New Dataset for Real-World VQA on PDF Documents.pdf
Cream Visually-Situated Natural Language Understanding with Contrastive Reading Model and Frozen Large Language Models.pdf
TouchStone Evaluating Vision-Language Models by Language Models.pdf
UReader Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model.pdf
On the Performance of Multimodal Language Models.pdf
LLaVAR Enhanced Visual Instruction Tuning for Text-Rich Image Understanding.pdf
Multimodal Transformer for Multimodal Machine Translation.pdf
mPLUG-DocOwl Modularized Multimodal Large Language Model for Document Understanding.pdf
M3IT A Large-Scale Dataset towards Multi-Modal Multilingual Instruction Tuning.pdf
DocFormerv2 Local Features for Document Understanding.pdf
II. Unified Vision Models
VLMO Unified Vision-Language Pre-Training with.pdf
You Need Multiple Exiting Dynamic Early Exiting for.pdf
Unified Vision-Language Pre-Training for Image Captioning and VQA.pdf
BLIP Bootstrapping Language-Image Pre-training for.pdf
Pro-tuning Unified Prompt Tuning for Vision Tasks.pdf
UNIFIED VISION AND LANGUAGE PROMPT LEARNING.pdf
III. Visual Generation
TextPainter Multimodal Text Image Generation with Visual-harmony and Text-comprehension for Poster Design.pdf
Multimodal Prompt Retrieval for Generative Visual Question Answering.pdf
Opal Multimodal Image Generation for News Illustration.pdf
Multimodal Incremental Transformer with Visual Grounding for Visual Dialogue Generation.pdf
Coordinated Joint Multimodal Embeddings for Generalized Audio-Visual Zeroshot Classification and Retrieval of Videos.pdf
KM-BART Knowledge Enhanced Multimodal BART for Visual Commonsense Generation.pdf
Multimodal Differential Network for Visual Question Generation.pdf
Enabling Robots to Draw and Tell Towards Visually Grounded Multimodal Description Generation.pdf
Generation of Multimodal Justification Using Visual Word Constraint Model for Explainable Computer-Aided Diagnosis.pdf
IV. Multimodal Agents
The Importance of Multimodal Emotion Conditioning and Affect Consistency for Embodied Conversational Agents.pdf
Guide Your Agent with Adaptive Multimodal Rewards.pdf
You Only Look at Screens Multimodal Chain-of-Action Agents.pdf
Improving Multimodal Interactive Agents with Reinforcement Learning from Human Feedback.pdf
SPRING Situated Conversation Agent Pretrained with Multimodal Questions from Incremental Layout Graph.pdf
Multimodal Speech Recognition for Language-Guided Embodied Agents.pdf
Clinically-Inspired Multi-Agent Transformers for Disease Trajectory Forecasting from Multimodal Data.pdf
Instruction-Following Agents with Multimodal Transformer.pdf
A Contextualized Real-Time Multimodal Emotion Recognition for Conversational Agents using Graph Convolutional Networks in Reinforcement Learning.pdf
V. LLM-Powered Large Multimodal Models
MM-Vet Evaluating Large Multimodal Models.pdf
X-LLM Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages.pdf
Contextual Object Detection with Multimodal Large Language Models.pdf
Fine-grained Audio-Visual Joint Representations for Multimodal Large Language Models.pdf
MME A Comprehensive Evaluation Benchmark for Multimodal Large Language Models.pdf
SCITUNE Aligning Large Language Models with Scientific Multimodal.pdf
多模态大模型研究方向学习资料.part1.rar
(98 MB, price: RMB 19)
多模态大模型研究方向学习资料.part2.rar
(89.21 MB, price: RMB 10)









