Title:
Invertible Autoencoder for domain adaptation
---
Authors:
Yunfei Teng, Anna Choromanska, Mariusz Bojarski
---
Latest submission year:
2018
---
Classification:
Primary: Electrical Engineering and Systems Science
Secondary: Image and Video Processing
Description: Theory, algorithms, and architectures for the formation, capture, processing, communication, analysis, and display of images, video, and multidimensional signals in a wide variety of applications. Topics of interest include: mathematical, statistical, and perceptual image and video modeling and representation; linear and nonlinear filtering, de-blurring, enhancement, restoration, and reconstruction from degraded, low-resolution or tomographic data; lossless and lossy compression and coding; segmentation, alignment, and recognition; image rendering, visualization, and printing; computational imaging, including ultrasound, tomographic and magnetic resonance imaging; and image and video analysis, synthesis, storage, search and retrieval.
---
Primary: Computer Science
Secondary: Computer Vision and Pattern Recognition
Description: Covers image processing, computer vision, pattern recognition, and scene understanding. Roughly includes material in ACM Subject Classes I.2.10, I.4, and I.5.
---
Abstract:
Unsupervised image-to-image translation aims at finding a mapping between the source ($A$) and target ($B$) image domains, where in many applications aligned image pairs are not available at training. This is an ill-posed learning problem since it requires inferring the joint probability distribution from marginals. Joint learning of coupled mappings $F_{AB}: A \rightarrow B$ and $F_{BA}: B \rightarrow A$ is commonly used by the state-of-the-art methods, like CycleGAN [Zhu et al., 2017], to learn this translation by introducing a cycle consistency requirement to the learning problem, i.e. $F_{AB}(F_{BA}(B)) \approx B$ and $F_{BA}(F_{AB}(A)) \approx A$. Cycle consistency enforces the preservation of the mutual information between input and translated images. However, it does not explicitly enforce $F_{BA}$ to be an inverse operation to $F_{AB}$. We propose a new deep architecture that we call invertible autoencoder (InvAuto) to explicitly enforce this relation. This is done by forcing an encoder to be an inverted version of the decoder, where corresponding layers perform opposite mappings and share parameters. The mappings are constrained to be orthonormal. The resulting architecture leads to a reduction of the number of trainable parameters (up to $2$ times). We present image translation results on benchmark data sets and demonstrate state-of-the-art performance of our approach. Finally, we test the proposed domain adaptation method on the task of road video conversion. We demonstrate that the videos converted with InvAuto have high quality and show that the NVIDIA neural-network-based end-to-end learning system for autonomous driving, known as PilotNet, trained on real road videos performs well when tested on the converted ones.
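The core idea in the abstract, that the decoder shares parameters with the encoder and each layer performs the opposite, orthonormal mapping, can be illustrated with a toy linear sketch. This is illustrative only: the paper's InvAuto is a deep convolutional architecture, not this single-layer NumPy version.

```python
import numpy as np

# Toy sketch of the weight-tying idea behind an invertible autoencoder:
# the decoder reuses the transpose of the encoder weight matrix, so both
# directions share one set of parameters. If the matrix is orthonormal
# (W @ W.T = I), the transpose is an exact inverse, which is the relation
# the orthonormality constraint enforces.
rng = np.random.default_rng(0)

# Build an orthonormal weight matrix via QR decomposition.
Q, _ = np.linalg.qr(rng.standard_normal((8, 8)))
W = Q

def encode(x):
    return W @ x          # forward mapping

def decode(z):
    return W.T @ z        # shared parameters: the transpose inverts encode

x = rng.standard_normal(8)
x_rec = decode(encode(x))
print(np.allclose(x, x_rec))  # True: decoding exactly inverts encoding
```

Because the decoder holds no weights of its own, the parameter count is roughly halved relative to an untied encoder-decoder pair, matching the "up to 2 times" reduction claimed in the abstract.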
---
PDF link:
https://arxiv.org/pdf/1802.06869

