| Topic: | |
| File name: 多模态大模型关键技术学习资料.part5.rar | |
| Download link: https://bbs.pinggu.org/a-4175953.html | |
| Attachment size: | |
|
Study materials on key techniques for multimodal large models: multimodal instruction tuning, multimodal chain of thought, LLM-aided visual reasoning, and multimodal in-context learning

1. Multimodal Instruction Tuning
Visual Instruction Tuning.pdf
Visual Instruction Tuning with Polite Flamingo.pdf
X-LLM Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages.pdf
DetGPT Detect What You Need via Reasoning.pdf
Video-ChatGPT Towards Detailed Video Understanding via Large Vision and Language Models.pdf
VisionLLM Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks.pdf
VideoChat Chat-Centric Video Understanding.pdf
Video-LLaMA An Instruction-tuned Audio-Visual Language Model for Video Understanding.pdf
Shikra Unleashing Multimodal LLM's Referential Dialogue Magic.pdf
PMC-VQA Visual Instruction Tuning for Medical Visual Question Answering.pdf
PandaGPT One Model To Instruction-Follow Them All.pdf
mPLUG-Owl Modularization Empowers Large Language Models with Multimodality.pdf
MultiInstruct Improving Multi-Modal Zero-Shot Learning via Instruction Tuning.pdf
MultiModal-GPT A Vision and Language Model for Dialogue with Humans.pdf
LMEye An Interactive Perception Network for Large Language Models.pdf
MiniGPT-4 Enhancing Vision-Language Understanding with Advanced Large Language Models.pdf
LLaVAR Enhanced Visual Instruction Tuning for Text-Rich Image Understanding.pdf
MIMIC-IT Multi-Modal In-Context Instruction Tuning.pdf
Macaw-LLM Multi-Modal Language Modeling with Image, Audio, Video, and Text Integration.pdf
M3IT A Large-Scale Dataset towards Multi-Modal Multilingual Instruction Tuning.pdf
LLaVA-Med Training a Large Language-and-Vision Assistant for Biomedicine in One Day.pdf
Listen, Think, and Understand.pdf
LLaMA-Adapter Efficient Fine-tuning of Language Models with Zero-init Attention.pdf
LLaMA-Adapter V2 Parameter-Efficient Visual Instruction Model.pdf
InstructBLIP Towards General-purpose Vision-Language Models with Instruction Tuning.pdf
LAMM Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark.pdf
......

2. Multimodal Chain of Thought
Visual Programming Compositional visual reasoning without training.pdf
MM-REACT Prompting ChatGPT for Multimodal Reasoning and Action.pdf
Learn to Explain Multimodal Reasoning via Thought Chains for Science Question Answering.pdf
Visual Chain of Thought Bridging Logical Gaps with Multimodal Infillings.pdf
Visual ChatGPT Talking, Drawing and Editing with Visual Foundation Models.pdf
Let's Think Frame by Frame Evaluating Video Chain of Thought with Video Infilling and Prediction.pdf
Multimodal Chain-of-Thought Reasoning in Language Models.pdf
Chain of Thought Prompt Tuning in Vision Language Models.pdf
EmbodiedGPT Vision-Language Pre-Training via Embodied Chain of Thought.pdf
Caption Anything Interactive Image Description with Diverse Multimodal Controls.pdf
Chameleon Plug-and-Play Compositional Reasoning with Large Language Models.pdf
Explainable Multimodal Emotion Reasoning.pdf

3. LLM-Aided Visual Reasoning
ViperGPT Visual Inference via Python Execution for Reasoning.pdf
Visual Programming Compositional visual reasoning without training.pdf
SuS-X Training-Free Name-Only Transfer of Vision-Language Models.pdf
Mindstorms in Natural Language-Based Societies of Mind.pdf
Visual ChatGPT Talking, Drawing and Editing with Visual Foundation Models.pdf
LayoutGPT Compositional Visual Planning and Generation with Large Language Models.pdf
Socratic Models Composing Zero-Shot Multimodal Reasoning with Language.pdf
MM-REACT Prompting ChatGPT for Multimodal Reasoning and Action.pdf
Retrieving-to-Answer Zero-Shot Video Question Answering with Frozen Large Language Models.pdf
Prompt, Generate, then Cache Cascade of Foundation Models makes Strong Few-shot Learners.pdf
PointCLIP V2 Adapting CLIP for Powerful 3D Open-world Learning.pdf
Accountable Textual-Visual Chat Learns to Reject Human Instructions in Image Re-creation.pdf
ChatGPT Asks BLIP-2 Answers Automatic Questioning Towards Enriched Visual Descriptions.pdf
GPT4Tools Teaching Large Language Model to Use Tools via Self-instruction.pdf
HuggingGPT Solving AI Tasks with ChatGPT and its Friends in HuggingFace.pdf
IdealGPT Iteratively Decomposing Vision and Language Reasoning via Large Language Models.pdf
Caption Anything Interactive Image Description with Diverse Multimodal Controls.pdf
AssistGPT A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn.pdf
Chameleon Plug-and-Play Compositional Reasoning with Large Language Models.pdf

4. Multimodal In-Context Learning
Revisiting Disentanglement and Fusion on Modality and Context in Conversational Multimodal Emotion Recognition.pdf
Link-Context Learning for Multimodal LLMs.pdf
Multimodal Foundation Models For Echocardiogram Interpretation.pdf
Proactive Human-Robot Interaction using Visuo-Lingual Transformers.pdf
Lightweight In-Context Tuning for Multimodal Unified Models.pdf
MMHQA-ICL Multimodal In-context Learning for Hybrid Question Answering over Text, Tables and Images.pdf
HowToCaption Prompting LLMs to Transform Video Annotations at Scale.pdf
Large Language Models are Visual Reasoning Coordinators.pdf
Language as the Medium Multimodal Video Classification through text only.pdf