In recent years, with the development of deep neural networks, end-to-end optimized image compression has made great progress and surpassed classical methods in rate-distortion performance. However, most learning-based image compression methods are label-free and do not consider image semantics or content when optimizing the model. In practice, the human eye has different sensitivities to different content, so image content also needs to be taken into account. In this paper, we propose a content-oriented image compression method, which handles different kinds of image content with different strategies. Extensive experiments show that, compared with state-of-the-art end-to-end learned image compression methods and classical methods, the proposed method achieves competitive subjective results.
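One simple way to realize such content-dependent strategies, reflecting the eye's uneven sensitivity, is to weight the distortion term of the rate-distortion objective by a per-pixel content map. The sketch below is a hypothetical illustration of that idea (the function, content types, and weights are all assumed), not the paper's actual method:

```python
import torch

def content_weighted_distortion(x, x_hat, content_map, sensitivity):
    """Per-pixel MSE weighted by a content-type map.

    x, x_hat: (B, C, H, W) original and reconstructed images.
    content_map: (B, H, W) integer map of content types (e.g., 0 = background, 1 = text).
    sensitivity: (num_types,) weight per content type; higher = more perceptually sensitive.
    """
    per_pixel = ((x - x_hat) ** 2).mean(dim=1)   # (B, H, W) channel-averaged error
    w = sensitivity[content_map]                 # index lookup broadcasts to (B, H, W)
    return (w * per_pixel).mean()

# Toy usage: penalize errors in text regions 4x more than in background
x, x_hat = torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64)
content_map = torch.randint(0, 2, (2, 64, 64))
loss = content_weighted_distortion(x, x_hat, content_map, torch.tensor([1.0, 4.0]))
```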
Although supervised deep learning has achieved promising performance in medical image segmentation, many methods cannot generalize well on unseen data, which limits their real-world applicability. To address this problem, we propose a deep learning based Bayesian framework that jointly models image and label statistics and exploits the domain-irrelevant contour of a medical image for segmentation. Specifically, we first decompose an image into contour and basis components. We then model the expected label as a variable related only to the contour. Finally, we develop a variational Bayesian framework to infer the posterior distributions of these variables, including the contour, the basis, and the label. The framework is implemented with neural networks and is therefore referred to as deep Bayesian segmentation. Results on the task of cross-sequence cardiac MRI segmentation show that our method sets a new state of the art for model generalizability. In particular, the Bayesian model trained on LGE MRI generalizes well to T2 images and outperforms other models by over 0.47 in terms of average Dice. Our code is available at https://zmiclab.github.io/projects.html.
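As a rough illustration of the contour/basis decomposition, here is a toy variational model in which the image is reconstructed from sampled contour and basis codes while the label head sees only the contour, mirroring the assumption that the label depends on the domain-irrelevant contour alone. This is a schematic sketch under assumed layer shapes, not the paper's actual architecture or its full variational inference:

```python
import torch
import torch.nn as nn

class ToyBayesSeg(nn.Module):
    """Toy model: image ~ decode(contour, basis); label depends on contour only."""
    def __init__(self, ch=16):
        super().__init__()
        self.enc_contour = nn.Conv2d(1, 2 * ch, 3, padding=1)  # predicts mean & log-var
        self.enc_basis = nn.Conv2d(1, 2 * ch, 3, padding=1)
        self.dec_image = nn.Conv2d(2 * ch, 1, 3, padding=1)
        self.dec_label = nn.Conv2d(ch, 2, 3, padding=1)        # label head: contour only

    @staticmethod
    def sample(stats):
        mu, logvar = stats.chunk(2, dim=1)                     # reparameterization trick
        return mu + torch.randn_like(mu) * (0.5 * logvar).exp(), mu, logvar

    def forward(self, x):
        c, c_mu, c_lv = self.sample(self.enc_contour(x))
        b, b_mu, b_lv = self.sample(self.enc_basis(x))
        x_hat = self.dec_image(torch.cat([c, b], dim=1))       # reconstruct from both
        y_logits = self.dec_label(c)                           # segment from contour alone
        kl = lambda mu, lv: -0.5 * (1 + lv - mu.pow(2) - lv.exp()).mean()
        return x_hat, y_logits, kl(c_mu, c_lv) + kl(b_mu, b_lv)

x_hat, y_logits, kl_term = ToyBayesSeg()(torch.rand(1, 1, 64, 64))
```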
Neural architecture search (NAS) algorithms save tremendous human expert labor. Recent advances have further reduced the computational overhead to an affordable level. However, it remains cumbersome to deploy NAS techniques in real-world applications due to the fussy procedure and the supervised learning paradigm. In this work, we propose self-supervised and weight-preserving neural architecture search (SSWP-NAS) as an extension of the current NAS framework, which allows self-supervision and preserves the concomitant weights discovered during the search stage. In this way, we simplify the NAS workflow to a single-stage, proxy-free procedure. Experiments show that the architectures searched by the proposed framework achieve state-of-the-art accuracy on the CIFAR-10, CIFAR-100, and ImageNet datasets without using manual labels. Moreover, we show that using the concomitant weights as initialization consistently outperforms both random initialization and the two-stage weight pre-training method by clear margins under semi-supervised learning scenarios. Code is publicly available at https://github.com/lzvv123456/sswp-nas.
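The weight-preserving idea, initializing the searched architecture from the concomitant weights found during search rather than from scratch, can be sketched as a simple state-dict transfer. The helper below is hypothetical, not taken from the SSWP-NAS codebase:

```python
import torch

def init_from_search(model, searched_state_dict):
    """Load weights discovered during the search stage into the final model,
    keeping the default (random) init for any parameter that does not match."""
    own = model.state_dict()
    matched = {k: v for k, v in searched_state_dict.items()
               if k in own and v.shape == own[k].shape}
    own.update(matched)
    model.load_state_dict(own)
    return len(matched), len(own)   # how much of the model was warm-started

# Usage sketch: matched, total = init_from_search(final_model, search_checkpoint)
```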
Graph neural architecture search has received much attention as graph neural networks (GNNs) have recently been applied successfully to non-Euclidean data. However, exploring all possible GNN architectures in the huge search space is too time-consuming, or even infeasible, for big graph data. In this paper, we propose a parallel graph architecture search (GraphPAS) framework for graph neural networks. In GraphPAS, we explore the search space in parallel by designing sharing-based evolutionary learning, which improves search efficiency without losing accuracy. In addition, architecture information entropy is dynamically adopted as the mutation selection probability, which reduces space exploration. Experimental results show that GraphPAS outperforms state-of-the-art models in both efficiency and accuracy.
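One plausible reading of the entropy-guided mutation is sketched below: compute the information entropy of each architecture component's distribution across the population and use the normalized entropies as mutation selection probabilities, so settled components stop being explored. The direction of the weighting and the exact definition are assumptions; the paper may define them differently:

```python
import math
from collections import Counter

def mutation_probabilities(population):
    """population: list of architectures, each a list of component choices
    (one 'gene' per position). Positions whose choices are still diverse across
    the population (higher entropy) are mutated more often."""
    n_genes = len(population[0])
    entropies = []
    for i in range(n_genes):
        counts = Counter(arch[i] for arch in population)
        total = len(population)
        h = -sum(c / total * math.log(c / total) for c in counts.values())
        entropies.append(h)
    z = sum(entropies) or 1.0          # avoid division by zero if all positions settled
    return [h / z for h in entropies]

# 4 candidate GNN architectures with 3 gene positions (aggregator, pooling, dropout)
pop = [["gcn", "sum", 0.5], ["gat", "sum", 0.5], ["gcn", "mean", 0.5], ["sage", "sum", 0.5]]
print(mutation_probabilities(pop))     # position 2 is settled -> probability 0
```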
A recent study has shown a phenomenon called neural collapse: at the terminal phase of training for classification, the within-class means of features and the classifier weight vectors converge to the vertices of a simplex equiangular tight frame. In this paper, we explore the corresponding structures of the last-layer feature centers and classifiers in semantic segmentation. Based on our empirical and theoretical analysis, we point out that semantic segmentation naturally brings contextual correlation and imbalanced distribution among classes, which breaks the equiangular and maximally separated structure of neural collapse for both feature centers and classifiers. However, such a symmetric structure is beneficial for discriminating the minor classes. To preserve these advantages, we introduce a regularizer on feature centers to encourage the network to learn features closer to the appealing structure in imbalanced semantic segmentation. Experimental results show that our method can bring significant improvements on both 2D and 3D semantic segmentation benchmarks. Moreover, our method ranks 1st and sets a new record (+6.8% mIoU) on the ScanNet200 test leaderboard. Code will be available at https://github.com/dvlab-research/Imbalanced-Learning.
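The simplex equiangular tight frame is concrete: K unit-norm class centers whose pairwise cosine similarity is exactly -1/(K-1). A minimal sketch of a regularizer pulling feature centers toward that structure follows; it illustrates the target geometry, not necessarily the paper's exact loss (the function name and shapes are assumed):

```python
import torch
import torch.nn.functional as F

def etf_center_regularizer(centers):
    """Penalize deviation of class feature centers from a simplex ETF:
    unit-norm centers with pairwise cosine similarity -1/(K-1)."""
    K = centers.shape[0]
    c = F.normalize(centers, dim=1)              # project centers onto the unit sphere
    cos = c @ c.t()                              # (K, K) pairwise cosine similarities
    target = torch.full_like(cos, -1.0 / (K - 1))
    target.fill_diagonal_(1.0)                   # self-similarity stays 1
    return ((cos - target) ** 2).mean()

# Example: 200 classes (as on ScanNet200) with 64-d feature centers
centers = torch.randn(200, 64, requires_grad=True)
etf_center_regularizer(centers).backward()
```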
In this paper, we study the problem of knowledge-intensive text-to-SQL, in which domain knowledge is necessary to parse expert questions into SQL queries over domain-specific tables. We formalize this scenario by building a new Chinese benchmark KnowSQL consisting of domain-specific questions covering various domains. We then address this problem by presenting formulaic knowledge, rather than by annotating additional data examples. More concretely, we construct a formulaic knowledge bank as a domain knowledge base and propose a framework (ReGrouP) to leverage this formulaic knowledge during parsing. Experiments using ReGrouP demonstrate a significant 28.2% improvement overall on KnowSQL.
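To make "formulaic knowledge" concrete: such a bank maps domain terms to the formulas needed to compute them in SQL, and the parser grounds question jargon against it before generating a query. Everything below (entries, matching rule, threshold) is a hypothetical miniature, far simpler than KnowSQL's bank or the ReGrouP framework:

```python
from difflib import SequenceMatcher

# Hypothetical miniature formulaic knowledge bank
KNOWLEDGE_BANK = {
    "gross margin": "(revenue - cost) / revenue",
    "year-over-year growth": "(value_this_year - value_last_year) / value_last_year",
}

def retrieve_formulas(question, bank, threshold=0.6):
    """Attach formulas whose terms appear in (or fuzzily match) the question,
    so domain jargon is grounded before SQL generation."""
    q = question.lower()
    return [(term, formula) for term, formula in bank.items()
            if term in q or SequenceMatcher(None, q, term).ratio() > threshold]

print(retrieve_formulas("What is the gross margin of each product line?", KNOWLEDGE_BANK))
# -> [('gross margin', '(revenue - cost) / revenue')]
```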
Weakly-supervised object localization aims to indicate the category as well as the scope of an object in an image given only the image-level labels. Most of the existing works are based on Class Activation Mapping (CAM) and endeavor to enlarge the discriminative area inside the activation map to perceive the whole object, yet ignore the co-occurrence confounder of the object and context (e.g., fish and water), which makes it hard for the model to distinguish object boundaries. Besides, the use of CAM also brings a dilemma: classification and localization always suffer from a performance gap and cannot reach their highest accuracy simultaneously. In this paper, we propose a causal knowledge distillation method, dubbed KD-CI-CAM, to address these two under-explored issues in one go. More specifically, we tackle the co-occurrence context confounder problem via causal intervention (CI), which explores the causalities among image features, contexts, and categories to eliminate the biased object-context entanglement in the class activation maps. Based on the de-biased object feature, we additionally propose a multi-teacher causal distillation framework to balance the absorption of classification knowledge and localization knowledge during model training. Extensive experiments on several benchmarks demonstrate the effectiveness of KD-CI-CAM in learning clear object boundaries from confounding contexts and in addressing the dilemma between classification and localization performance.
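The multi-teacher causal distillation framework is specific to KD-CI-CAM, but the underlying mechanics of blending a classification-oriented teacher with a localization-oriented teacher can be sketched with standard soft-target distillation. The loss below, with its assumed names, temperature, and mixing weight, is a generic two-teacher sketch rather than the paper's exact scheme:

```python
import torch
import torch.nn.functional as F

def multi_teacher_distill_loss(student_logits, cls_teacher_logits, loc_teacher_logits,
                               alpha=0.5, temperature=4.0):
    """Blend soft targets from two teachers; alpha trades classification
    knowledge against localization knowledge."""
    T = temperature
    def kd(teacher_logits):
        return F.kl_div(F.log_softmax(student_logits / T, dim=1),
                        F.softmax(teacher_logits / T, dim=1),
                        reduction="batchmean") * (T * T)   # standard KD temperature scaling
    return alpha * kd(cls_teacher_logits) + (1 - alpha) * kd(loc_teacher_logits)

# Toy usage with random logits over 10 classes
s, t1, t2 = (torch.randn(8, 10) for _ in range(3))
loss = multi_teacher_distill_loss(s, t1, t2)
```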
Panoptic Part Segmentation (PPS) unifies panoptic segmentation and part segmentation into one task. Previous works use separate approaches to handle thing, stuff, and part predictions, without shared computation or task association. We aim to unify these tasks at the architectural level, designing the first end-to-end unified framework, named Panoptic-PartFormer. Moreover, we find that the previous metric PartPQ is biased toward PQ. To handle both issues, we make the following contributions: Firstly, we design a meta-architecture that decouples the part features from the things/stuff features. We model things, stuff, and parts as object queries and directly learn to optimize all three forms of prediction as a unified mask prediction and classification problem. We term this model Panoptic-PartFormer. Secondly, we propose a new metric, Part-Whole Quality (PWQ), to better measure this task from both pixel-region and part-whole perspectives. It can also decouple the errors of part segmentation and panoptic segmentation. Thirdly, inspired by Mask2Former and based on our meta-architecture, we propose Panoptic-PartFormer++ with a new part-whole cross attention scheme, implemented via masked cross attention, to further boost part segmentation quality. Finally, extensive ablation studies and analysis demonstrate the effectiveness of both Panoptic-PartFormer and Panoptic-PartFormer++. Compared with the previous Panoptic-PartFormer, our Panoptic-PartFormer++ achieves 2% PartPQ and 3% PWQ improvements on the Cityscapes PPS dataset and 5% PartPQ on the Pascal Context PPS dataset. On both datasets, Panoptic-PartFormer++ achieves new state-of-the-art results with a significant cost drop of 70% in GFLOPs and 50% in parameters. Our models can serve as a strong baseline and aid future research in PPS. Code will be available.
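Since Panoptic-PartFormer++ builds on Mask2Former-style masked cross attention, a minimal sketch of that primitive may help: each query attends only to the spatial positions inside its current coarse mask. Shapes and the empty-mask fallback below are assumptions; the paper's part-whole variant adds structure beyond this:

```python
import torch

def masked_cross_attention(queries, keys, values, mask):
    """queries: (Q, D); keys/values: (N, D) flattened pixel features;
    mask: (Q, N) boolean, True where a query may attend."""
    d = queries.shape[-1]
    scores = queries @ keys.t() / d ** 0.5                 # (Q, N) scaled dot products
    scores = scores.masked_fill(~mask, float("-inf"))
    # fall back to unmasked (uniform) attention for queries with an empty mask
    scores = torch.where(mask.any(dim=1, keepdim=True), scores,
                         torch.zeros_like(scores))
    return scores.softmax(dim=-1) @ values                 # (Q, D) attended features

out = masked_cross_attention(torch.randn(5, 32), torch.randn(100, 32),
                             torch.randn(100, 32), torch.rand(5, 100) > 0.5)
```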
Dynamic treatment regimes assign personalized treatments to patients sequentially over time based on their baseline information and time-varying covariates. In mobile health applications, these covariates are typically collected at different frequencies over a long time horizon. In this paper, we propose a deep spectral Q-learning algorithm, which integrates principal component analysis (PCA) with deep Q-learning to handle the mixed frequency data. In theory, we prove that the mean return under the estimated optimal policy converges to that under the optimal one and establish its rate of convergence. The usefulness of our proposal is further illustrated via simulations and an application to a diabetes dataset.
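The state-construction step, compressing the high-frequency covariates with PCA and concatenating the leading components with the slower covariates before feeding the result to a Q-network, can be sketched directly. The shapes below are invented (e.g., 288 minute-level glucose readings per decision point); the paper's estimator and convergence analysis go well beyond this:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
slow = rng.normal(size=(500, 3))      # 500 decision points, 3 low-frequency covariates
fast = rng.normal(size=(500, 288))    # a high-frequency stream per decision point

pca = PCA(n_components=5).fit(fast)   # keep the leading spectral components
state = np.hstack([slow, pca.transform(fast)])   # (500, 8) Q-network input
print(state.shape)
```

A standard deep Q-network would then map `state` (together with a candidate treatment) to an estimated action value.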
Nowadays, time-stamped web documents related to a general news query flood the Internet, and timeline summarization targets concisely summarizing the evolution trajectory of events along the timeline. Unlike traditional document summarization, timeline summarization needs to model the time-series information of the input events and summarize important events in chronological order. To tackle this challenge, in this paper we propose a Unified Timeline Summarizer (UTS) that can generate abstractive and extractive timeline summaries in time order. Concretely, in the encoder part, we propose a graph-based event encoder that relates multiple events according to their content dependency and learns a global representation of each event. In the decoder part, to ensure the chronological order of the abstractive summary, we propose to extract the event-level attention feature from the generation process, with sequential information retained, and use it to simulate the evolutionary attention of the ground-truth summary. The event-level attention can also be used to assist extractive summarization, where the extracted summary likewise comes in time order. We augment the previous Chinese large-scale timeline summarization dataset and collect a new English timeline dataset. Extensive experiments conducted on these datasets and on the out-of-domain Timeline17 dataset show that UTS achieves state-of-the-art performance in terms of both automatic and human evaluations.
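The graph-based event encoder can be illustrated with a couple of rounds of message passing over content-dependency edges; the adjacency construction and update rule below are illustrative assumptions, not UTS's actual encoder:

```python
import torch

def event_graph_encode(event_feats, adj, steps=2):
    """event_feats: (E, D) per-event features; adj: (E, E) with 1 where two
    events share content dependency. Events mix neighbor information to
    obtain globally aware representations."""
    deg = adj.sum(dim=1, keepdim=True).clamp(min=1)   # avoid dividing by zero degree
    h = event_feats
    for _ in range(steps):
        h = torch.relu(adj @ h / deg + h)             # mean-aggregate neighbors + residual
    return h

events = torch.randn(6, 32)                           # 6 events along a timeline
adj = (torch.rand(6, 6) > 0.5).float()
global_reprs = event_graph_encode(events, adj)
```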