Deep learning has made great progress in multi-view 3D reconstruction. Most mainstream solutions establish the mapping between views and object shape with a basic structure of a 2D encoder and a 3D decoder, and differ in how they aggregate features from several views. Among them, methods using attention-based fusion perform better and more stably than the others. However, they still have an obvious shortcoming: each view predicts its merging weights largely independently, so the weights adapt poorly to the global state. In this paper, we propose a global-aware attention-based fusion approach that builds the correlation between each branch and the global state to provide a comprehensive foundation for weight inference. To enhance the ability of the network, we introduce a novel loss function to supervise the overall shape and propose a dynamic two-stage training strategy that can effectively adapt to all reconstructors with attention-based fusion. Experiments on ShapeNet verify that our method outperforms existing SOTA methods while using far fewer parameters than Pix2Vox++, an algorithm of the same type. Furthermore, we propose a view-reduction method based on maximizing diversity and discuss the cost-performance tradeoff of our model to achieve better performance under a large number of input views and a limited computational budget.
Translated by Google Translate
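The global-aware fusion idea in the multi-view reconstruction abstract above can be illustrated with a minimal sketch. It is not the authors' network: the mean feature used as the "global state", the scaled dot-product score, and the function name `global_aware_fusion` are all assumptions made for illustration. The point is only that each view's merging weight depends on a summary of all views rather than on that view alone.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def global_aware_fusion(view_features):
    """Fuse per-view feature vectors with weights conditioned on a global summary.

    Assumption: the global state is the mean feature over all views, and each
    view is scored by its scaled dot product with that mean (hypothetical choice).
    """
    n = len(view_features)
    dim = len(view_features[0])
    # Global state: mean over views, one value per feature dimension.
    global_state = [sum(f[d] for f in view_features) / n for d in range(dim)]
    # Each view's score depends on the global state, not only on the view itself.
    scores = [dot(f, global_state) / math.sqrt(dim) for f in view_features]
    weights = softmax(scores)
    # Weighted sum of view features gives the fused representation.
    fused = [sum(w * f[d] for w, f in zip(weights, view_features)) for d in range(dim)]
    return fused, weights
```

In this toy version, a view that agrees with the majority of views receives a larger weight than an outlier view; a learned model would replace the fixed scoring rule with trained layers.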
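When existing text generation datasets are used directly for controllable generation, the lack of domain knowledge limits the aspects that can be controlled. A typical example is the CNN/Daily Mail dataset: when it is used for controllable text summarization, there is no guiding information about the emphasis of the summary sentences. A more useful text generator should leverage both the input text and a control signal to guide generation, and such a generator can only be built with a deep understanding of the domain knowledge. Motivated by this vision, our paper introduces a new text generation dataset named MReD. The dataset consists of 7,089 meta-reviews, and all of its 45k meta-review sentences are manually annotated with one of 9 carefully defined categories, including abstract, strength, decision, etc. We present experimental results on state-of-the-art summarization models and propose methods for structure-controlled generation using our annotated data. By exploring various settings and analyzing model behavior with respect to the control signal, we demonstrate the challenges of our proposed task and the value of the MReD dataset. MReD also allows us to better understand the meta-review domain.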
Among current person re-identification (ReID) methods, most domain generalization work focuses on style differences between domains while largely ignoring unpredictable camera view changes, which we identify as another major factor behind the poor generalization of ReID methods. To tackle viewpoint change, this work proposes to use a 3D dense pose estimation model and a texture mapping module to map pedestrian images to canonical view images. Because the texture mapping module is imperfect, the canonical view images may lose discriminative detail cues from the original images, so using them directly for ReID inevitably results in poor performance. To handle this issue, we propose to fuse the original image and the canonical view image via a transformer-based module. The key insight of this design is that the cross-attention mechanism in the transformer could be an ideal way to align the discriminative texture cues of the original image with the canonical view image, which could compensate for the low-quality texture information of the canonical view image. Through extensive experiments, we show that our method achieves superior performance over existing approaches in various evaluation settings.
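Training deep graph neural networks (GNNs) is a challenging task, because the performance of GNNs may suffer as the number of hidden message-passing layers grows. The literature has focused on over-smoothing and under-reaching as explanations for the performance deterioration of deep GNNs. In this paper, we propose a new explanation for this deterioration, namely mis-simplification: mistakenly simplifying graphs by preventing self-loops and forcing edges to be unweighted. We show that such simplification can reduce the potential of message-passing layers to capture the structural information of graphs. In view of this, we propose a new framework, the edge enhanced graph neural network (EEGNN). EEGNN uses structural information extracted from the proposed Dirichlet mixture Poisson graph model, a Bayesian nonparametric model, to improve the performance of various deep message-passing GNNs. Experiments on different datasets show that our method achieves a considerable performance increase compared with the baselines.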
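Traditionally, a debate requires a manual preparation process that includes reading many articles, selecting claims, identifying the stances of the claims, seeking evidence for the claims, and so on. As AI debate attracts more attention, it is worth exploring methods to automate the tedious process involved in a debating system. In this work, we introduce a comprehensive and large dataset named IAM, which can be applied to a series of argument mining tasks, including claim extraction, stance classification, evidence extraction, etc. Our dataset is collected from 1k articles related to 123 topics. Nearly 70k sentences in the dataset are fully annotated according to their argument properties (e.g., claims, stances, evidence, etc.). We further propose two new integrated argument mining tasks associated with the debate preparation process: (1) claim extraction with stance classification (CESC) and (2) claim-evidence pair extraction (CEPE). We adopt a pipeline approach and an end-to-end method for each integrated task separately. Promising experimental results are reported to show the value and challenges of our proposed tasks and to motivate future research on argument mining.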
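Long-range temporal alignment is critical yet challenging for video restoration tasks. Recently, some works have attempted to divide the long-range alignment into several sub-alignments and handle them progressively. Although this helps in modeling distant correspondences, error accumulation is inevitable because of the propagation mechanism. In this work, we present a novel, generic iterative alignment module that applies a gradual refinement scheme to the sub-alignments, yielding more accurate motion compensation. To further improve alignment accuracy and temporal consistency, we develop a non-parametric re-weighting method in which the importance of each neighboring frame is adaptively evaluated in a spatial-wise manner for aggregation. With the proposed strategies, our model achieves state-of-the-art performance on multiple benchmarks across a range of video restoration tasks, including video super-resolution, denoising, and deblurring. Our project is available at https://github.com/redrock303/revisiting-temporal-alignment-for-video-Restion.git.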
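The spatial-wise, non-parametric re-weighting described in the video restoration abstract above can be illustrated with a small sketch. The abstract does not give the weighting rule, so everything specific here is an assumption: frames are flattened to lists of pixel values, the similarity is an exponential of the absolute difference to the reference frame, and `temperature` is a hypothetical parameter. The sketch only shows the general pattern of per-pixel weights with no learned parameters.

```python
import math


def reweight_and_aggregate(reference, aligned_neighbors, temperature=0.1):
    """Aggregate aligned neighbor frames with per-pixel, parameter-free weights.

    Assumption: a neighbor pixel that is closer to the reference pixel is
    better aligned and therefore receives a larger weight at that position.
    """
    fused = []
    for p, ref_val in enumerate(reference):
        # Unnormalized importance of each neighbor frame at pixel position p.
        raw = [math.exp(-abs(frame[p] - ref_val) / temperature)
               for frame in aligned_neighbors]
        total = sum(raw)
        weights = [r / total for r in raw]
        # Weighted average of the neighbor pixel values at position p.
        fused.append(sum(w * frame[p] for w, frame in zip(weights, aligned_neighbors)))
    return fused
```

Because the weights are computed independently at every pixel, a neighbor frame can dominate in well-aligned regions and be suppressed where its alignment failed.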
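State-of-the-art named entity recognition (NER) models rely heavily on fully annotated training data. However, accessible data are often incompletely annotated, since annotators usually lack comprehensive knowledge of the target domain. Unannotated tokens are normally regarded as non-entities by default, whereas we emphasize that these tokens could be either non-entities or part of any entity. Here, we study NER modeling with incompletely annotated data, where only a fraction of the named entities are labeled and the unlabeled tokens are equivalently multi-labeled with every possible label. The resulting paths can distract the model being trained from the gold path (the ground-truth label sequence) and thus hinder its learning ability. In this paper, we propose AdaK-NER, the adaptive top-K approach, which helps the model focus on a smaller feasible region where the gold path is more likely to be located. We demonstrate the superiority of our approach through extensive experiments on English and Chinese datasets, improving the F-score by 2% on average on CoNLL-2003 and by more than 10% on two Chinese datasets compared with prior state-of-the-art works.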
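We consider the single image super-resolution (SISR) problem, in which a high-resolution (HR) image is generated from a low-resolution (LR) input. Recently, generative adversarial networks (GANs) have become popular for hallucinating details. Most methods along this line rely on a predefined single-LR-single-HR mapping, which is not flexible enough for the SISR task. In addition, GAN-generated fake details may often undermine the realism of the whole image. We address these issues by proposing best-buddy GANs (Beby-GAN) for rich-detail SISR. Relaxing the immutable one-to-one constraint, we allow the estimated patches to dynamically seek the best supervision during training, which is beneficial for producing more reasonable details. Moreover, we propose a region-aware adversarial learning strategy that guides our model to focus adaptively on generating details for textured regions. Extensive experiments demonstrate the effectiveness of our method. An ultra-high-resolution 4K dataset is also constructed to facilitate future super-resolution research.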
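The idea of letting each estimated patch seek its own supervision, rather than being tied to one fixed HR patch, can be sketched as follows. The abstract does not state the selection criterion, so this sketch assumes the simplest one: each estimated patch is supervised by its nearest candidate under L1 distance. The names `best_buddy_loss` and `candidate_patches` are hypothetical, and patches are flattened to lists of floats.

```python
def l1_distance(a, b):
    """Sum of absolute differences between two flattened patches."""
    return sum(abs(x - y) for x, y in zip(a, b))


def best_buddy_loss(estimated_patches, candidate_patches):
    """Average L1 loss where each estimate picks its own supervision patch.

    Assumption: the 'best' supervision for an estimated patch is the candidate
    closest to it under L1 distance; the paper's actual criterion may differ.
    """
    total = 0.0
    chosen = []
    for est in estimated_patches:
        # Index of the candidate patch nearest to this estimate.
        idx = min(range(len(candidate_patches)),
                  key=lambda i: l1_distance(est, candidate_patches[i]))
        chosen.append(idx)
        total += l1_distance(est, candidate_patches[idx])
    return total / len(estimated_patches), chosen
```

Since the selection is recomputed from the current estimates, the supervision target for a patch can change from one training step to the next, which is the dynamic behavior the abstract describes.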
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
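The simpler of the two attacks, adding a trigger to the raw data before distillation starts, can be illustrated with a toy sketch. This is not the authors' implementation: the square corner patch, the relabeling of triggered samples to a target class, the poisoning pattern controlled by `every`, and the function names are assumptions made for illustration. Images are plain 2D lists of grayscale values.

```python
def stamp_trigger(image, trigger_value=1.0, size=2):
    """Return a copy of a 2D grayscale image with a square patch in the bottom-right corner."""
    poisoned = [row[:] for row in image]  # copy rows so the input stays unchanged
    height, width = len(poisoned), len(poisoned[0])
    for r in range(height - size, height):
        for c in range(width - size, width):
            poisoned[r][c] = trigger_value
    return poisoned


def poison_dataset(samples, target_label, every=2):
    """Stamp the trigger on every `every`-th sample and relabel it to target_label.

    Assumption: relabeling triggered samples to the attacker's target class,
    as is common in backdoor attacks; the abstract does not spell this out.
    """
    poisoned = []
    for i, (image, label) in enumerate(samples):
        if i % every == 0:
            poisoned.append((stamp_trigger(image), target_label))
        else:
            poisoned.append((image, label))
    return poisoned
```

In the setting the abstract describes, the poisoned raw data would then be passed to the distillation procedure, so the trigger ends up embedded in the synthetic dataset rather than being injected at model training time.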
Blind image quality assessment (BIQA) remains challenging because of the diversity of distortions and the variation in image content, which complicate the distortion patterns across different scales and make the regression problem for BIQA harder. However, existing BIQA methods often fail to consider multi-scale distortion patterns and image content, and little research has been done on learning strategies that help the regression model perform better. In this paper, we propose a simple yet effective Progressive Multi-Task Image Quality Assessment (PMT-IQA) model, which contains a multi-scale feature extraction module (MS) and a progressive multi-task learning module (PMT). These modules help the model learn complex distortion patterns and better optimize the regression problem, following the easy-to-hard progression of human learning. To verify the effectiveness of the proposed PMT-IQA model, we conduct experiments on four widely used public datasets. The experimental results indicate that PMT-IQA outperforms the compared approaches and that both the MS and PMT modules improve the model's performance.