Original poster: 2023Hua

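[Other] Extended reading materials for Renmin University graduate thesis resources: CVPR 2023 multimodal learning, audio-visual language learning, and vision-language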

Original post
2023Hua (employment verified), posted 2025-4-18 10:56:57

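Extended reading materials for Renmin University graduate thesis resources: CVPR 2023 multimodal learning, audio-visual language learning, and vision-language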

+Multimodal learning            181.0 MB
| Align and Attend: Multimodal Summarization with Dual Contrastive Losses.pdf             7.6 MB
| BiCro: Noisy Correspondence Rectification for Multi-modality Data via Bi-directional Cross-modal Similarity Consistency.pdf             10.6 MB
| CLIP2Scene: Towards Label-Efficient 3D Scene Understanding by CLIP.pdf             9.1 MB
| Decoupled Multimodal Distilling for Emotion Recognition.pdf             7.4 MB
| Detecting and Grounding Multi-Modal Media Manipulation.pdf             11.9 MB
| Emotional Reaction Intensity Estimation Based on Multimodal Data.pdf             6.7 MB
| Learning Instance-Level Representation for Large-Scale Multi-Modal Pretraining in E-commerce.pdf             22.9 MB
| MaPLe: Multi-modal Prompt Learning.pdf             14.5 MB
| MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation.pdf             12.5 MB
| Multimodal Feature Extraction and Fusion for Emotional Reaction Intensity Estimation and Expression Classification in Videos with Transformers.pdf             6.8 MB
| Multimodal Prompting with Missing Modalities for Visual Recognition.pdf             14.8 MB
| Mutilmodal Feature Extraction and Attention-based Fusion for Emotion Estimation in Videos.pdf             6.7 MB
| Quantum Multi-Model Fitting.pdf             10.8 MB
| Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information.pdf             7.5 MB
| Towards Flexible Multi-modal Document Models.pdf             8.9 MB
| Understanding and Constructing Latent Modality Structures in Multi-modal Representation Learning.pdf             7.5 MB
| Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks.pdf             7.6 MB
| Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting.pdf             7.3 MB
+Vision-language            367.0 MB
| Accelerating Vision-Language Pretraining with Free Language Modeling.pdf             7.3 MB
| Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models.pdf             10.7 MB
| Blind Image Quality Assessment via Vision-Language Correspondence: A Multitask Learning Perspective.pdf             7.7 MB
| Connecting Vision and Language with Video Localized Narratives.pdf             22.0 MB
| CrowdCLIP: Unsupervised Crowd Counting via Vision-Language Model.pdf             12.1 MB
| FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks.pdf             16.8 MB
| GIVL: Improving Geographical Inclusivity of Vision-Language Models with Pre-Training Methods.pdf             18.5 MB
| HOICLIP: Efficient Knowledge Transfer for HOI Detection with Vision-Language Models.pdf             8.3 MB
| IFSeg: Image-free Semantic Segmentation via Vision-Language Model.pdf             11.0 MB
| Improving Vision-and-Language Navigation by Generating Future-View Image Semantics.pdf             8.0 MB
| Is BERT Blind? Exploring the Effect of Vision-and-Language Pretraining on Visual Language Understanding.pdf             14.9 MB
| KERM: Knowledge Enhanced Reasoning for Vision-and-Language Navigation.pdf             8.0 MB
| Lana: A Language-Capable Navigator for Instruction Following and Generation.pdf             11.2 MB
| Learning to Exploit Temporal Structure for Biomedical Vision-Language Processing.pdf             11.4 MB
| Learning to Name Classes for Vision and Language Models.pdf             15.1 MB
| MAGVLT: Masked Generative Vision-and-Language Transformer.pdf             24.1 MB
| MAP: Multimodal Uncertainty-Aware Vision-Language Pre-training Model.pdf             9.5 MB
| Meta-Explore: Exploratory Hierarchical Vision-and-Language Navigation Using Scene Object Spectrum Grounding.pdf             9.0 MB
| Open-vocabulary Attribute Detection.pdf             35.1 MB
| Policy Adaptation from Foundation Model Feedback.pdf             10.3 MB
| PosterLayout: A New Benchmark and Approach for Content-aware Visual-Textual Presentation Layout.pdf             12.1 MB
| Seeing What You Miss: Vision-Language Pre-training with Semantic Completion Learning.pdf             8.7 MB
| SynthVSR: Scaling Up Visual Speech Recognition With Synthetic Supervision.pdf             7.5 MB
| Task Residual for Tuning Vision-Language Models.pdf             7.3 MB
| Test of Time: Instilling Video-Language Models with a Sense of Time.pdf             10.9 MB
| Towards Generalisable Video Moment Retrieval: Visual-Dynamic Injection to Image-Text Pre-Training.pdf             7.3 MB
| Turning a CLIP Model into a Scene Text Detector.pdf             8.8 MB
| Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning.pdf             9.7 MB
| VILA: Learning Image Aesthetics from User Comments with Vision-Language Pretraining.pdf             15.4 MB
| VLPD: Context-Aware Pedestrian Detection via Vision-Language Semantic Self-Supervision.pdf             8.3 MB
+Audio-visual language learning            159.0 MB
| A Light Weight Model for Active Speaker Detection.pdf             10.0 MB
| Audio-Visual Grouping Network for Sound Localization from Mixtures.pdf             8.1 MB
| CASP-Net: Rethinking Video Saliency Prediction from an Audio-Visual Consistency Perceptual Perspective.pdf             7.9 MB
| Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline.pdf             15.5 MB
| Egocentric Audio-Visual Object Localization.pdf             35.8 MB
| Fine-grained Audible Video Description.pdf             13.4 MB
| Language-Guided Audio-Visual Source Separation via Trimodal Consistency.pdf             8.8 MB
| Learning Audio-Visual Source Localization via False Negative Aware Contrastive Learning.pdf             10.7 MB
| Physics-Driven Diffusion Models for Impact Sound Synthesis from Videos.pdf             23.3 MB
| Sound to Visual Scene Generation by Audio-to-Visual Latent Alignment.pdf             18.1 MB
| Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scoring.pdf             7.3 MB
CVPR'23多模态学习论文及代码检索目录.pdf (CVPR'23 multimodal learning papers and code index)            290.0 KB



CVPR2023多模态.part1.rar (100 MB, requires RMB 29)
CVPR2023多模态.part2.rar (100 MB)
CVPR2023多模态.part3.rar (100 MB)
CVPR2023多模态.part4.rar (100 MB)
CVPR2023多模态.part5.rar (100 MB, requires RMB 1)
CVPR2023多模态.part6.rar (100 MB)
CVPR2023多模态.part7.rar (81.22 MB)

Keywords: Renmin University graduate students, graduate theses, thesis resources, reading materials, linguistics

First reply
Kaka-2030 (verified buyer), posted 2025-4-20 15:27:08
Thanks, OP. I need exactly this kind of material to fill the gaps in my research.
