Renda Economics Forum (人大经济论坛) Attachment Download

File name: CVPR2023多模态.part6.rar
Download link: https://bbs.pinggu.org/a-7199761.html
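Extended reading material for Renmin University graduate thesis research: CVPR 2023 papers on multimodal learning, audio-visual language learning, and vision-language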

+Multimodal Learning (多模态学习) 181.0 MB
| Align and Attend:Multimodal Summarization with Dual Contrastive Losses.pdf 7.6 MB
| BiCro:Noisy Correspondence Rectification for Multi-modality Data via Bi-directional Cross-modal Similarity Consistency.pdf 10.6 MB
| CLIP2Scene:Towards Label-Efficient 3D Scene Understanding by CLIP.pdf 9.1 MB
| Decoupled Multimodal Distilling for Emotion Recognition.pdf 7.4 MB
| Detecting and Grounding Multi-Modal Media Manipulation.pdf 11.9 MB
| Emotional Reaction Intensity Estimation Based on Multimodal Data.pdf 6.7 MB
| Learning Instance-Level Representation for Large-Scale Multi-Modal Pretraining in E-commerce.pdf 22.9 MB
| MaPLe:Multi-modal Prompt Learning.pdf 14.5 MB
| MM-Diffusion:Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation.pdf 12.5 MB
| Multimodal Feature Extraction and Fusion for Emotional Reaction Intensity Estimation and Expression Classification in Videos with Transformers.pdf 6.8 MB
| Multimodal Prompting with Missing Modalities for Visual Recognition.pdf 14.8 MB
| Mutilmodal Feature Extraction and Attention-based Fusion for Emotion Estimation in Videos.pdf 6.7 MB
| Quantum Multi-Model Fitting.pdf 10.8 MB
| Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information.pdf 7.5 MB
| Towards Flexible Multi-modal Document Models.pdf 8.9 MB
| Understanding and Constructing Latent Modality Structures in Multi-modal Representation Learning.pdf 7.5 MB
| Uni-Perceiver v2:A Generalist Model for Large-Scale Vision and Vision-Language Tasks.pdf 7.6 MB
| Vita-CLIP:Video and text adaptive CLIP via Multimodal Prompting.pdf 7.3 MB
+Vision-Language (视觉-语言) 367.0 MB
| Accelerating Vision-Language Pretraining with Free Language Modeling.pdf 7.3 MB
| Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models.pdf 10.7 MB
| Blind Image Quality Assessment via Vision-Language Correspondence:A Multitask Learning Perspective.pdf 7.7 MB
| Connecting Vision and Language with Video Localized Narratives.pdf 22.0 MB
| CrowdCLIP:Unsupervised Crowd Counting via Vision-Language Model.pdf 12.1 MB
| FAME-ViL:Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks.pdf 16.8 MB
| GIVL:Improving Geographical Inclusivity of Vision-Language Models with Pre-Training Methods.pdf 18.5 MB
| HOICLIP:Efficient Knowledge Transfer for HOI Detection with Vision-Language Models.pdf 8.3 MB
| IFSeg:Image-free Semantic Segmentation via Vision-Language Model.pdf 11.0 MB
| Improving Vision-and-Language Navigation by Generating Future-View Image Semantics.pdf 8.0 MB
| Is BERT Blind?Exploring the Effect of Vision-and-Language Pretraining on Visual Language Understanding.pdf 14.9 MB
| KERM:Knowledge Enhanced Reasoning for Vision-and-Language Navigation.pdf 8.0 MB
| Lana:A Language-Capable Navigator for Instruction Following and Generation.pdf 11.2 MB
| Learning to Exploit Temporal Structure for Biomedical Vision-Language Processing.pdf 11.4 MB
| Learning to Name Classes for Vision and Language Models.pdf 15.1 MB
| MAGVLT:Masked Generative Vision-and-Language Transformer.pdf 24.1 MB
| MAP:Multimodal Uncertainty-Aware Vision-Language Pre-training Model.pdf 9.5 MB
| Meta-Explore:Exploratory Hierarchical Vision-and-Language Navigation Using Scene Object Spectrum Grounding.pdf 9.0 MB
| Open-vocabulary Attribute Detection.pdf 35.1 MB
| Policy Adaptation from Foundation Model Feedback.pdf 10.3 MB
| PosterLayout:A New Benchmark and Approach for Content-aware Visual-Textual Presentation Layout.pdf 12.1 MB
| Seeing What You Miss:Vision-Language Pre-training with Semantic Completion Learning.pdf 8.7 MB
| SynthVSR:Scaling Up Visual Speech Recognition With Synthetic Supervision.pdf 7.5 MB
| Task Residual for Tuning Vision-Language Models.pdf 7.3 MB
| Test of Time:Instilling Video-Language Models with a Sense of Time.pdf 10.9 MB
| Towards Generalisable Video Moment Retrieval:Visual-Dynamic Injection to Image-Text Pre-Training.pdf 7.3 MB
| Turning a CLIP Model into a Scene Text Detector.pdf 8.8 MB
| Video-Text as Game Players:Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning.pdf 9.7 MB
| VILA:Learning Image Aesthetics from User Comments with Vision-Language Pretraining.pdf 15.4 MB
| VLPD:Context-Aware Pedestrian Detection via Vision-Language Semantic Self-Supervision.pdf 8.3 MB
+Audio-Visual Language Learning (视听语言学习) 159.0 MB
| A Light Weight Model for Active Speaker Detection.pdf 10.0 MB
| Audio-Visual Grouping Network for Sound Localization from Mixtures.pdf 8.1 MB
| CASP-Net:Rethinking Video Saliency Prediction from an Audio-VisualConsistency Perceptual Perspective.pdf 7.9 MB
| Dense-Localizing Audio-Visual Events in Untrimmed Videos:A Large-Scale Benchmark and Baseline.pdf 15.5 MB
| Egocentric Audio-Visual Object Localization.pdf 35.8 MB
| Fine-grained Audible Video Description.pdf 13.4 MB
| Language-Guided Audio-Visual Source Separation via Trimodal Consistency.pdf 8.8 MB
| Learning Audio-Visual Source Localization via False Negative Aware Contrastive Learning.pdf 10.7 MB
| Physics-Driven Diffusion Models for Impact Sound Synthesis from Videos.pdf 23.3 MB
| Sound to Visual Scene Generation by Audio-to-Visual Latent Alignment.pdf 18.1 MB
| Watch or Listen:Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scoring.pdf 7.3 MB
CVPR'23多模态学习论文及代码检索目录.pdf (index of CVPR'23 multimodal learning papers and code) 290.0 KB








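Download notes
1. The forum supports multithreaded download tools such as Xunlei (Thunder) and FlashGet; select a download channel above and right-click it to download.
2. The forum automatically refreshes download addresses in bulk at regular intervals, so do not waste time hotlinking forum resources; hotlinked addresses expire quickly.
3. This site is a non-profit academic exchange website. It encourages and protects original works and rejects uploads made without the copyright holder's permission. On receiving a valid infringement notice from a copyright holder, the site will actively take the necessary measures; it will also fulfil its duty of care for copyright protection within its technical means and capabilities.
(Reports of infringement are welcome.)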

GMT+8, 2026-2-1 16:11