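Deep generative models (DGMs) are data-hungry. Essentially, this is because learning a complex model on limited data suffers from large variance and easily overfits. Inspired by the \emph{bias-variance dilemma}, we propose the \emph{regularized deep generative model} (Reg-DGM), which leverages a nontransferable pre-trained model to reduce the variance of generative modeling with limited data. Formally, Reg-DGM optimizes a weighted sum of a certain divergence between the data distribution and the DGM and the expectation of an energy function defined by the pre-trained model w.r.t. the DGM. Theoretically, we characterize the existence and uniqueness of the global minimum of Reg-DGM in the nonparametric setting and rigorously prove the statistical benefits of Reg-DGM w.r.t. the mean squared error and the expected risk in a simple yet representative Gaussian-fitting example. Empirically, it is quite flexible to specify the DGM and the pre-trained model in Reg-DGM. In particular, with a ResNet-18 classifier pre-trained on ImageNet and a data-dependent energy function, Reg-DGM consistently improves the generation performance of strong DGMs, including StyleGAN2 and ADA, on several benchmarks with limited data and achieves results competitive with state-of-the-art methods.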
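The objective described in this abstract can be written compactly. The following is only a notational sketch: the symbols $p_{\mathrm{data}}$ (data distribution), $p_{\theta}$ (the DGM), $\mathcal{D}$ (the divergence), $f$ (the energy function defined by the pre-trained model), and $\lambda \ge 0$ (the weight) are assumed here for illustration and do not appear in the abstract itself.

```latex
\min_{\theta}\; \mathcal{D}\left(p_{\mathrm{data}} \,\|\, p_{\theta}\right)
  + \lambda\, \mathbb{E}_{x \sim p_{\theta}}\left[ f(x) \right]
```

Under this reading, setting $\lambda = 0$ recovers ordinary divergence minimization, while a larger $\lambda$ pulls the model toward samples to which the pre-trained model assigns low energy, trading some bias for lower variance.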
Translated by Google Translate
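Non-convex relaxation methods have been widely used in tensor recovery problems and, compared with convex relaxation methods, can achieve better recovery results. In this paper, a new non-convex function, the Minimax Logarithmic Concave Penalty (MLCP) function, is proposed, and some of its intrinsic properties are analyzed; interestingly, the logarithmic function is found to be an upper bound of the MLCP function. The proposed function is generalized to the tensor case, yielding the tensor MLCP and the weighted tensor $L\gamma$-norm. Because an explicit solution cannot be obtained when these are applied directly to the tensor recovery problem, corresponding equivalence theorems are given to solve it, namely the tensor equivalent MLCP theorem and the equivalent weighted tensor $L\gamma$-norm theorem. In addition, we propose two EMLCP-based models for classic tensor recovery problems, namely low-rank tensor completion (LRTC) and tensor robust principal component analysis (TRPCA), and design proximal alternating linearized minimization (PALM) algorithms to solve them individually. Furthermore, based on the Kurdyka-{\L}ojasiewicz property, it is proved that the solution sequence of the proposed algorithm has finite length and converges globally to a critical point. Finally, extensive experiments show that the proposed algorithm achieves good results and confirm that the MLCP function is indeed better than the logarithmic function in the minimization problem, which is consistent with the analysis of the theoretical properties.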
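Tensor recovery is an important problem in computer vision and machine learning. Such problems are usually solved with convex relaxations of the tensor rank and the $l_{0}$ norm, i.e., the nuclear norm and the $l_{1}$ norm, respectively. Convex approximations are known to produce biased estimators. To overcome this problem, corresponding non-convex regularizers are adopted and designed. Inspired by the recently developed matrix equivalent minimax-concave penalty (EMCP) theorem, this paper establishes the tensor equivalent minimax-concave penalty (TEMCP) theorem. The tensor equivalent MCP (TEMCP) is constructed as the non-convex regularizer part and the equivalent weighted tensor $\gamma$ norm (EWTGN) as the low-rank part, both of which can achieve weight adaptivity. Meanwhile, we propose two corresponding adaptive models for two classical tensor recovery problems, namely low-rank tensor completion (LRTC) and tensor robust principal component analysis (TRPCA), where the optimization algorithm is based on the alternating direction method of multipliers (ADMM). This novel iterative adaptive algorithm is designed to produce more accurate tensor recovery. For the tensor completion model, multispectral image (MSI), magnetic resonance imaging (MRI), and color video (CV) data are considered, while for the tensor robust principal component analysis model, hyperspectral image (HSI) data under Gaussian noise and salt-and-pepper noise are considered. The proposed algorithm outperforms state-of-the-art methods, and its reduction and convergence are confirmed by experiments.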
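To illustrate why a minimax-concave penalty reduces the bias of the $l_{1}$ norm, here is a minimal sketch of the standard scalar MCP and its proximal operator (firm thresholding) in NumPy. It is not the TEMCP or EWTGN construction of the paper; the function names `mcp_penalty` and `mcp_prox` and the parameters `lam` and `gamma` are assumptions introduced for this example only.

```python
import numpy as np


def mcp_penalty(x, lam, gamma):
    """Scalar minimax-concave penalty, applied elementwise.

    Equals lam*|x| - x^2/(2*gamma) for |x| <= gamma*lam and the
    constant gamma*lam^2/2 beyond that point.
    """
    a = np.abs(np.asarray(x, dtype=float))
    inner = lam * a - a**2 / (2.0 * gamma)
    outer = 0.5 * gamma * lam**2
    return np.where(a <= gamma * lam, inner, outer)


def mcp_prox(x, lam, gamma):
    """Proximal operator of the MCP with unit step size (firm thresholding).

    Requires gamma > 1 so that the scalar subproblem is strictly convex.
    """
    if gamma <= 1.0:
        raise ValueError("gamma must be greater than 1")
    x = np.asarray(x, dtype=float)
    a = np.abs(x)
    # Middle zone: shrink, then rescale so the map is continuous at gamma*lam.
    shrunk = np.sign(x) * (a - lam) / (1.0 - 1.0 / gamma)
    return np.where(a <= lam, 0.0, np.where(a <= gamma * lam, shrunk, x))
```

Entries with magnitude above `gamma * lam` pass through `mcp_prox` unchanged, whereas soft thresholding (the proximal operator of the $l_{1}$ norm) would shrink every entry by `lam`; that constant shrinkage is the source of the bias mentioned above.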
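Tensor sparse modeling is a promising approach that has achieved great success throughout science and engineering. As is well known, data in practical applications are often generated by multiple factors, so tensors are used to represent data that contain the internal structure of multiple factors. However, unlike the matrix case, constructing a reasonable tensor sparsity measure is a relatively difficult and very important task. Therefore, in this paper, we propose a new tensor sparsity measure called the Tensor Full Feature Measure (FFM). It can simultaneously describe the feature information of each dimension of the tensor and the related features between two dimensions, and it connects the Tucker rank with the tensor tube rank. This measure can describe the sparse features of a tensor more comprehensively. On this basis, we establish its non-convex relaxation and apply FFM to low-rank tensor completion (LRTC) and tensor robust principal component analysis (TRPCA). FFM-based LRTC and TRPCA models are proposed, and two efficient alternating direction method of multipliers (ADMM) algorithms are developed to solve them. A variety of real numerical experiments confirm the superiority of the proposed methods over state-of-the-art methods.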
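For readers unfamiliar with the two rank notions that FFM is said to connect, the sketch below computes the Tucker rank (ranks of the mode unfoldings) and the tube (tubal) rank (largest rank among frontal slices after an FFT along the third mode) of a small three-way array. It does not implement FFM itself; the helper names `unfold`, `tucker_rank`, and `tubal_rank` are assumptions made for this illustration.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-`mode` unfolding: move that axis first and flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def tucker_rank(tensor):
    """Tuple of matrix ranks of every mode unfolding."""
    return tuple(
        int(np.linalg.matrix_rank(unfold(tensor, m))) for m in range(tensor.ndim)
    )


def tubal_rank(tensor):
    """Tubal rank of a 3-way tensor: max rank over Fourier-domain frontal slices."""
    fhat = np.fft.fft(tensor, axis=2)
    return max(
        int(np.linalg.matrix_rank(fhat[:, :, k])) for k in range(tensor.shape[2])
    )


# A rank-one tensor built as an outer product of three vectors.
a = np.array([1.0, 2.0])
b = np.array([1.0, 0.0, 1.0])
c = np.array([1.0, 2.0, 3.0, 4.0])
rank_one = np.einsum("i,j,k->ijk", a, b, c)
```

The Tucker rank reports one number per mode, while the tubal rank reports a single number tied to the third mode, which is one way to see why a measure combining both could capture more structure than either alone.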
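The low-rank tensor completion (LRTC) problem has attracted great attention in computer vision and signal processing. Obtaining high-quality image recovery remains an urgent task. This paper proposes a new tensor $L_{2,1}$ norm minimization model (TLNM) that, unlike the classical tensor nuclear norm (TNN)-based tensor completion method, integrates the sum of nuclear norm (SNN) method with the $L_{2,1}$ norm and QR decomposition to solve the LRTC problem. To improve the utilization of the local prior information of the image, a total variation (TV) regularization term is introduced, resulting in a new class of tensor $L_{2,1}$ norm minimization with total variation model (TLNMTV). Both proposed models are convex and therefore have global optimal solutions. Moreover, we adopt the alternating direction method of multipliers (ADMM) to obtain the closed-form solution of each variable, thereby ensuring the feasibility of the algorithm. Numerical experiments show that the two proposed algorithms are convergent and outperform the compared methods. In particular, when the sampling rate of hyperspectral images is 2.5\%, our method significantly outperforms the compared methods.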
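The closed-form updates mentioned above typically rely on the fact that the $L_{2,1}$ norm has a simple proximal operator. The sketch below shows that building block for a matrix, assuming the column-wise convention (sum of the Euclidean norms of the columns); the paper may use a different convention, and the full TLNM model with SNN and QR decomposition is not reproduced here. The names `l21_norm` and `l21_prox` are hypothetical.

```python
import numpy as np


def l21_norm(X):
    """Sum of the Euclidean norms of the columns of X (assumed convention)."""
    return float(np.sum(np.linalg.norm(X, axis=0)))


def l21_prox(X, tau):
    """Proximal operator of tau * ||X||_{2,1}: column-wise shrinkage.

    Each column is scaled by max(1 - tau / ||column||_2, 0), so columns
    with norm at most tau are set to zero.
    """
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=0)
    # Guard against division by zero for all-zero columns.
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return X * scale


X = np.array([[3.0, 0.0],
              [4.0, 0.5]])
shrunk = l21_prox(X, tau=1.0)
```

In this example the first column (norm 5) is scaled by 0.8 and the second column (norm 0.5) is zeroed out, which is the kind of per-variable closed-form step an ADMM iteration can use.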
The detection of the human body and its related parts (e.g., face, head, or hands) has been intensively studied and greatly improved since the breakthrough of deep CNNs. However, most of these detectors are trained independently, making it challenging to associate detected body parts with people. This paper focuses on the problem of joint detection of the human body and its corresponding parts. Specifically, we propose a novel extended object representation that integrates the center location offsets of the body or its parts, and construct a dense single-stage anchor-based Body-Part Joint Detector (BPJDet). Body-part associations in BPJDet are embedded into the unified representation, which contains both semantic and geometric information. Therefore, BPJDet does not suffer from error-prone association post-matching and has a better accuracy-speed trade-off. Furthermore, BPJDet can be seamlessly generalized to jointly detect any body part. To verify the effectiveness and superiority of our method, we conduct extensive experiments on the CityPersons, CrowdHuman, and BodyHands datasets. The proposed BPJDet detector achieves state-of-the-art association performance on these three benchmarks while maintaining high detection accuracy. Code will be released to facilitate further studies.
Most recent head pose estimation (HPE) methods are dominated by the Euler angle representation. To avoid its inherent ambiguity in rotation labels, alternative quaternion-based and vector-based representations have been introduced. However, neither is visually intuitive, and both are often derived from equivocal Euler angle labels. In this paper, we present a novel single-stage keypoint-based method via an {\it intuitive} and {\it unconstrained} 2D cube representation for joint head detection and pose estimation. The 2D cube is an orthogonal projection of the 3D regular hexahedron label roughly surrounding one head, and it itself contains the head location. It can reflect the head orientation straightforwardly and unambiguously at any rotation angle. Unlike general 6-DoF object pose estimation, our 2D cube ignores the 3-DoF of head size but retains the 3-DoF of head pose. Based on the prior of equal side length, we can obtain the closed-form solution of the Euler angles from the predicted 2D head cube instead of applying the error-prone PnP algorithm. In experiments, our proposed method achieves results comparable with other representative methods on the public AFLW2000 and BIWI datasets. In addition, a novel test on the CMU Panoptic dataset shows that our method can be seamlessly adapted to the unconstrained full-view HPE task without modification.
Scene text editing (STE) aims to replace text with the desired text while preserving the background and styles of the original text. However, due to complicated background textures and varied text styles, existing methods fall short in generating clear and legible edited text images. In this study, we attribute the poor editing performance to two problems: 1) Implicit decoupling structure. Previous methods that edit the whole image have to learn different translation rules for background and text regions simultaneously. 2) Domain gap. Due to the lack of edited real scene text images, the network can only be well trained on synthetic pairs and performs poorly on real-world images. To handle these problems, we propose a novel network by MOdifying Scene Text image at strokE Level (MOSTEL). Firstly, we generate stroke guidance maps to explicitly indicate the regions to be edited. In contrast to the implicit approach of directly modifying all pixels at the image level, such explicit instructions filter out distractions from the background and guide the network to focus on the editing rules of text regions. Secondly, we propose Semi-supervised Hybrid Learning to train the network with both labeled synthetic images and unpaired real scene text images. Thus, the STE model is adapted to real-world dataset distributions. Moreover, two new datasets (Tamper-Syn2k and Tamper-Scene) are proposed to fill the gap in public evaluation datasets. Extensive experiments demonstrate that MOSTEL outperforms previous methods both qualitatively and quantitatively. Datasets and code will be available at https://github.com/qqqyd/MOSTEL.
As an effective method to deliver external materials into biological cells, microinjection has been widely applied in the biomedical field. However, the understanding of cell mechanical properties is still inadequate, which greatly limits the efficiency and success rate of injection. Thus, a new rate-dependent mechanical model based on membrane theory is proposed for the first time. In this model, an analytical equilibrium equation between the injection force and cell deformation is established by considering the speed effect of microinjection. Unlike the traditional membrane-theory-based model, the elastic coefficient of the constitutive material in the proposed model is modified as a function of the injection velocity and acceleration, effectively simulating the influence of speed on the mechanical responses and providing a more generalized and practical model. Using this model, other mechanical responses at different speeds can also be accurately predicted, including the distribution of membrane tension and stress and the deformed shape. To verify the validity of the model, numerical simulations and experiments are carried out. The results show that the proposed model matches the real mechanical responses well at different injection speeds.
Scene text spotting is of great importance to the computer vision community due to its wide variety of applications. Recent methods attempt to introduce linguistic knowledge for challenging recognition rather than relying on pure visual classification. However, how to effectively model linguistic rules in end-to-end deep networks remains a research challenge. In this paper, we argue that the limited capacity of language models comes from 1) implicit language modeling; 2) unidirectional feature representation; and 3) language models with noisy input. Correspondingly, we propose an autonomous, bidirectional, and iterative ABINet++ for scene text spotting. Firstly, the autonomous principle enforces explicit language modeling by decoupling the recognizer into a vision model and a language model and blocking gradient flow between the two models. Secondly, a novel bidirectional cloze network (BCN) is proposed as the language model, based on bidirectional feature representation. Thirdly, we propose an iterative correction scheme for the language model, which can effectively alleviate the impact of noisy input. Finally, to improve ABINet++ on long text recognition, we propose to aggregate horizontal features by embedding Transformer units inside a U-Net, and design a position and content attention module that integrates character order and content to attend to character features precisely. ABINet++ achieves state-of-the-art performance on both scene text recognition and scene text spotting benchmarks, which consistently demonstrates the superiority of our method in various environments, especially on low-quality images. In addition, extensive experiments in English and Chinese also show that a text spotter incorporating our language modeling method can significantly improve both accuracy and speed compared with commonly used attention-based recognizers.