| Topic: | |
| File name: CVPR2023多模态.part4.rar | |
| Download link: https://bbs.pinggu.org/a-7199753.html | |
| Attachment size: | |
Extended reading material for Renmin University graduate thesis resources: CVPR2023 multimodal learning, audio-visual-language learning, and vision-language

+Multimodal learning (多模态学习) 181.0 MB
| Align and Attend:Multimodal Summarization with Dual Contrastive Losses.pdf 7.6 MB
| BiCro:Noisy Correspondence Rectification for Multi-modality Data via Bi-directional Cross-modal Similarity Consistency.pdf 10.6 MB
| CLIP2Scene:Towards Label-Efficient 3D Scene Understanding by CLIP.pdf 9.1 MB
| Decoupled Multimodal Distilling for Emotion Recognition.pdf 7.4 MB
| Detecting and Grounding Multi-Modal Media Manipulation.pdf 11.9 MB
| Emotional Reaction Intensity Estimation Based on Multimodal Data.pdf 6.7 MB
| Learning Instance-Level Representation for Large-Scale Multi-Modal Pretraining in E-commerce.pdf 22.9 MB
| MaPLe:Multi-modal Prompt Learning.pdf 14.5 MB
| MM-Diffusion:Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation.pdf 12.5 MB
| Multimodal Feature Extraction and Fusion for Emotional Reaction Intensity Estimation and Expression Classification in Videos with Transformers.pdf 6.8 MB
| Multimodal Prompting with Missing Modalities for Visual Recognition.pdf 14.8 MB
| Mutilmodal Feature Extraction and Attention-based Fusion for Emotion Estimation in Videos.pdf 6.7 MB
| Quantum Multi-Model Fitting.pdf 10.8 MB
| Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information.pdf 7.5 MB
| Towards Flexible Multi-modal Document Models.pdf 8.9 MB
| Understanding and Constructing Latent Modality Structures in Multi-modal Representation Learning.pdf 7.5 MB
| Uni-Perceiver v2:A Generalist Model for Large-Scale Vision and Vision-Language Tasks.pdf 7.6 MB
| Vita-CLIP:Video and text adaptive CLIP via Multimodal Prompting.pdf 7.3 MB

+Vision-language (视觉-语言) 367.0 MB
| Accelerating Vision-Language Pretraining with Free Language Modeling.pdf 7.3 MB
| Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models.pdf 10.7 MB
| Blind Image Quality Assessment via Vision-Language Correspondence:A Multitask Learning Perspective.pdf 7.7 MB
| Connecting Vision and Language with Video Localized Narratives.pdf 22.0 MB
| CrowdCLIP:Unsupervised Crowd Counting via Vision-Language Model.pdf 12.1 MB
| FAME-ViL:Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks.pdf 16.8 MB
| GIVL:Improving Geographical Inclusivity of Vision-Language Models with Pre-Training Methods.pdf 18.5 MB
| HOICLIP:Efficient Knowledge Transfer for HOI Detection with Vision-Language Models.pdf 8.3 MB
| IFSeg:Image-free Semantic Segmentation via Vision-Language Model.pdf 11.0 MB
| Improving Vision-and-Language Navigation by Generating Future-View Image Semantics.pdf 8.0 MB
| Is BERT Blind?Exploring the Effect of Vision-and-Language Pretraining on Visual Language Understanding.pdf 14.9 MB
| KERM:Knowledge Enhanced Reasoning for Vision-and-Language Navigation.pdf 8.0 MB
| Lana:A Language-Capable Navigator for Instruction Following and Generation.pdf 11.2 MB
| Learning to Exploit Temporal Structure for Biomedical Vision-Language Processing.pdf 11.4 MB
| Learning to Name Classes for Vision and Language Models.pdf 15.1 MB
| MAGVLT:Masked Generative Vision-and-Language Transformer.pdf 24.1 MB
| MAP:Multimodal Uncertainty-Aware Vision-Language Pre-training Model.pdf 9.5 MB
| Meta-Explore:Exploratory Hierarchical Vision-and-Language Navigation Using Scene Object Spectrum Grounding.pdf 9.0 MB
| Open-vocabulary Attribute Detection.pdf 35.1 MB
| Policy Adaptation from Foundation Model Feedback.pdf 10.3 MB
| PosterLayout:A New Benchmark and Approach for Content-aware Visual-Textual Presentation Layout.pdf 12.1 MB
| Seeing What You Miss:Vision-Language Pre-training with Semantic Completion Learning.pdf 8.7 MB
| SynthVSR:Scaling Up Visual Speech Recognition With Synthetic Supervision.pdf 7.5 MB
| Task Residual for Tuning Vision-Language Models.pdf 7.3 MB
| Test of Time:Instilling Video-Language Models with a Sense of Time.pdf 10.9 MB
| Towards Generalisable Video Moment Retrieval:Visual-Dynamic Injection to Image-Text Pre-Training.pdf 7.3 MB
| Turning a CLIP Model into a Scene Text Detector.pdf 8.8 MB
| Video-Text as Game Players:Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning.pdf 9.7 MB
| VILA:Learning Image Aesthetics from User Comments with Vision-Language Pretraining.pdf 15.4 MB
| VLPD:Context-Aware Pedestrian Detection via Vision-Language Semantic Self-Supervision.pdf 8.3 MB

+Audio-visual-language learning (视听语言学习) 159.0 MB
| A Light Weight Model for Active Speaker Detection.pdf 10.0 MB
| Audio-Visual Grouping Network for Sound Localization from Mixtures.pdf 8.1 MB
| CASP-Net:Rethinking Video Saliency Prediction from an Audio-VisualConsistency Perceptual Perspective.pdf 7.9 MB
| Dense-Localizing Audio-Visual Events in Untrimmed Videos:A Large-Scale Benchmark and Baseline.pdf 15.5 MB
| Egocentric Audio-Visual Object Localization.pdf 35.8 MB
| Fine-grained Audible Video Description.pdf 13.4 MB
| Language-Guided Audio-Visual Source Separation via Trimodal Consistency.pdf 8.8 MB
| Learning Audio-Visual Source Localization via False Negative Aware Contrastive Learning.pdf 10.7 MB
| Physics-Driven Diffusion Models for Impact Sound Synthesis from Videos.pdf 23.3 MB
| Sound to Visual Scene Generation by Audio-to-Visual Latent Alignment.pdf 18.1 MB
| Watch or Listen:Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scoring.pdf 7.3 MB

CVPR'23多模态学习论文及代码检索目录.pdf (index of CVPR'23 multimodal learning papers and code) 290.0 KB
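| Download notes | |

1. The forum supports downloads with multi-threaded P2P software such as Xunlei (Thunder) and FlashGet. Select a download channel above and right-click it to download.
2. The forum periodically batch-updates download addresses automatically, so please do not waste time hotlinking forum resources; hotlinked addresses expire quickly.
3. This site is a non-profit academic exchange website. It encourages and protects original works and rejects uploads made without the copyright holder's permission. If the site receives a valid infringement notice from a copyright holder, it will actively take the necessary measures; it will also fulfil its duty of care for copyright protection within its technical means and capabilities. (If you find any infringement, please report it.)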