Original poster: Mama-2022

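[Other] Multimodal large models: visual understanding, unified vision models, visual generation, multimodal agents, and LLM-augmented multimodal large models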

Mama-2022 posted on 2024-7-1 21:27:30

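Study materials for the multimodal large model research direction: visual understanding, unified vision models, visual generation, multimodal agents, and LLM-augmented multimodal large models
1. Visual Understanding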
PDFVQA A New Dataset for Real-World VQA on PDF Documents.pdf
Cream Visually-Situated Natural Language Understanding with Contrastive Reading Model and Frozen Large Language Models.pdf
TouchStone Evaluating Vision-Language Models by Language Models.pdf
UReader Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model.pdf
On the Performance of Multimodal Language Models.pdf
LLaVAR Enhanced Visual Instruction Tuning for Text-Rich Image Understanding.pdf
Multimodal Transformer for Multimodal Machine Translation.pdf
mPLUG-DocOwl Modularized Multimodal Large Language Model for Document Understanding.pdf
M3IT A Large-Scale Dataset towards Multi-Modal Multilingual Instruction Tuning.pdf
DocFormerv2 Local Features for Document Understanding.pdf

2. Unified Vision Models
VLMO Unified Vision-Language Pre-Training with.pdf
You Need Multiple Exiting Dynamic Early Exiting for.pdf
Unified Vision-Language Pre-Training for Image Captioning and VQA.pdf
BLIP Bootstrapping Language-Image Pre-training for.pdf
Pro-tuning Unified Prompt Tuning for Vision Tasks.pdf
UNIFIED VISION AND LANGUAGE PROMPT LEARNING.pdf

3. Visual Generation
TextPainter Multimodal Text Image Generation with Visual-harmony and Text-comprehension for Poster Design.pdf
Multimodal Prompt Retrieval for Generative Visual Question Answering.pdf
Opal Multimodal Image Generation for News Illustration.pdf
Multimodal Incremental Transformer with Visual Grounding for Visual Dialogue Generation.pdf
Coordinated Joint Multimodal Embeddings for Generalized Audio-Visual Zeroshot Classification and Retrieval of Videos.pdf
KM-BART Knowledge Enhanced Multimodal BART for Visual Commonsense Generation.pdf
Multimodal Differential Network for Visual Question Generation.pdf
Enabling Robots to Draw and Tell Towards Visually Grounded Multimodal Description Generation.pdf
Generation of Multimodal Justification Using Visual Word Constraint Model for Explainable Computer-Aided Diagnosis.pdf

4. Multimodal Agents
The Importance of Multimodal Emotion Conditioning and Affect Consistency for Embodied Conversational Agents.pdf
Guide Your Agent with Adaptive Multimodal Rewards.pdf
You Only Look at Screens Multimodal Chain-of-Action Agents.pdf
Improving Multimodal Interactive Agents with Reinforcement Learning from Human Feedback.pdf
SPRING Situated Conversation Agent Pretrained with Multimodal Questions from Incremental Layout Graph.pdf
Multimodal Speech Recognition for Language-Guided Embodied Agents.pdf
Clinically-Inspired Multi-Agent Transformers for Disease Trajectory Forecasting from Multimodal Data.pdf
Instruction-Following Agents with Multimodal Transformer.pdf
A Contextualized Real-Time Multimodal Emotion Recognition for Conversational Agents using Graph Convolutional Networks in Reinforcement Learning.pdf

5. LLM-Augmented Multimodal Large Models

MM-Vet Evaluating Large Multimodal Models.pdf
X-LLM Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages.pdf
Contextual Object Detection with Multimodal Large Language Models.pdf
Fine-grained Audio-Visual Joint Representations for Multimodal Large Language Models.pdf
MME A Comprehensive Evaluation Benchmark for Multimodal Large Language Models.pdf
SCITUNE Aligning Large Language Models with Scientific Multimodal.pdf


多模态大模型研究方向学习资料.part1.rar (98 MB, price: RMB 19)
多模态大模型研究方向学习资料.part2.rar (89.21 MB, price: RMB 10)




Keywords: agent, multimodal, Age, LLM, Conversation
