To facilitate research on text generation, this paper presents a comprehensive and unified library, TextBox 2.0, focusing on the use of pre-trained language models (PLMs). To be comprehensive, our library covers $13$ common text generation tasks and their corresponding $83$ datasets and further incorporates $45$ PLMs covering general, translation, Chinese, dialogue, controllable, distilled, prompting, and lightweight PLMs. We also implement $4$ efficient training strategies and provide $4$ generation objectives for pre-training new PLMs from scratch. To be unified, we design the interfaces to support the entire research pipeline (from data loading to training and evaluation), ensuring that each step can be fulfilled in a unified way. Despite the rich functionality, it is easy to use our library, either through the friendly Python API or command line. To validate the effectiveness of our library, we conduct extensive experiments and exemplify four types of research scenarios. The project is released at the link: https://github.com/RUCAIBox/TextBox.
translated by 谷歌翻译
当测试图像提出看不见的分布时,深层分割模型通常会面临故障风险。改善模型鲁棒性针对这些风险的鲁棒性对于深层模型的大规模临床应用至关重要。在这项研究中,受到人类学习周期的启发,我们提出了一个新颖的在线反思学习框架(REFSEG),以改善细分鲁棒性。基于启用概念的反射概念,我们的refseg首先驱动了深层模型以采取行动以获得语义分割。然后,refseg触发模型以反映自身。因为使深层模型在测试过程中意识到他们的细分失败是具有挑战性的,所以RefSeg合成了从语义面具中综合的逼真的代理图像,以帮助深层模型构建直观有效的反射。该代理翻译并强调了分割缺陷。通过最大程度地提高原始输入和代理之间的结构相似性,可以改善分割鲁棒性的反射循环。 REFSEG在测试阶段运行,并且是分割模型的一般性。通过公共心脏MR数据集和两个内部大型超声数据集对三个医疗图像细分任务进行了广泛的验证,这表明我们的refseg显着提高了模型的鲁棒性,并报告了与强大竞争对手有关的最先进的表现。
translated by 谷歌翻译
扩散概率模型(DPM)是新兴的强大生成模型。尽管具有高质量的生成性能,但DPM仍然遭受缓慢采样的苦难,因为它们通常需要数百或数千个大型神经网络的顺序函数评估(步骤)来绘制样本。可以将来自DPM的采样视为求解相应的扩散普通微分方程(ODE)。在这项工作中,我们提出了扩散ODE的溶液的精确表述。该公式通过分析计算解决方案的线性部分,而不是将所有术语留给先前工作中采用的黑盒ode求解器。通过应用可变化的更改,可以将解决方案等效地简化为神经网络的指数加权积分。根据我们的公式,我们提出了DPM-Solver,这是一种通过收敛顺序保证的快速专用高阶求解器。 DPM溶剂适用于离散时间和连续时间DPM,而无需进行任何进一步的培训。实验结果表明,DPM-Solver可以在各种数据集上的10至20个功能评估中生成高质量的样本。我们在10个功能评估中实现了4.70 FID,在CIFAR10数据集上进行20个功能评估中的2.87 FID,与以前的各种数据集中的先前最先进的无培训样本器相比,$ 4 \ sim 16 \ times $速度。
translated by 谷歌翻译
在缺少标签(MLML)的情况下,多标签学习是一个具有挑战性的问题。现有方法主要关注网络结构或培训方案的设计,这提高了实现的复杂性。这项工作旨在满足MLML中的损失函数的潜力,而不增加程序和复杂性。为此,我们通过鲁棒损失设计提出了两种简单但有效的方法,基于观察到模型可以在高精度训练期间识别丢失的标签。首先是对底层的良好损失,即山损,重量底部以山的形状重量否定,以减轻虚假底片的效果。第二个是自定步损耗校正(SPLC)方法,其利用缺失标签的近似分布下的最大似然标准导出的丢失。在各种多标签图像分类数据集上的综合实验表明,我们的方法可以显着提高MLML的性能,并在MLML中实现新的最先进的损失函数。
translated by 谷歌翻译
Panoptic Part Segmentation (PPS) unifies panoptic segmentation and part segmentation into one task. Previous works utilize separated approaches to handle thing, stuff, and part predictions without shared computation and task association. We aim to unify these tasks at the architectural level, designing the first end-to-end unified framework named Panoptic-PartFormer. Moreover, we find the previous metric PartPQ biases to PQ. To handle both issues, we make the following contributions: Firstly, we design a meta-architecture that decouples part feature and things/stuff feature, respectively. We model things, stuff, and parts as object queries and directly learn to optimize all three forms of prediction as a unified mask prediction and classification problem. We term our model as Panoptic-PartFormer. Secondly, we propose a new metric Part-Whole Quality (PWQ) to better measure such task from both pixel-region and part-whole perspectives. It can also decouple the error for part segmentation and panoptic segmentation. Thirdly, inspired by Mask2Former, based on our meta-architecture, we propose Panoptic-PartFormer++ and design a new part-whole cross attention scheme to further boost part segmentation qualities. We design a new part-whole interaction method using masked cross attention. Finally, the extensive ablation studies and analysis demonstrate the effectiveness of both Panoptic-PartFormer and Panoptic-PartFormer++. Compared with previous Panoptic-PartFormer, our Panoptic-PartFormer++ achieves 2% PartPQ and 3% PWQ improvements on the Cityscapes PPS dataset and 5% PartPQ on the Pascal Context PPS dataset. On both datasets, Panoptic-PartFormer++ achieves new state-of-the-art results with a significant cost drop of 70% on GFlops and 50% on parameters. Our models can serve as a strong baseline and aid future research in PPS. Code will be available.
translated by 谷歌翻译
Rankings are widely collected in various real-life scenarios, leading to the leakage of personal information such as users' preferences on videos or news. To protect rankings, existing works mainly develop privacy protection on a single ranking within a set of ranking or pairwise comparisons of a ranking under the $\epsilon$-differential privacy. This paper proposes a novel notion called $\epsilon$-ranking differential privacy for protecting ranks. We establish the connection between the Mallows model (Mallows, 1957) and the proposed $\epsilon$-ranking differential privacy. This allows us to develop a multistage ranking algorithm to generate synthetic rankings while satisfying the developed $\epsilon$-ranking differential privacy. Theoretical results regarding the utility of synthetic rankings in the downstream tasks, including the inference attack and the personalized ranking tasks, are established. For the inference attack, we quantify how $\epsilon$ affects the estimation of the true ranking based on synthetic rankings. For the personalized ranking task, we consider varying privacy preferences among users and quantify how their privacy preferences affect the consistency in estimating the optimal ranking function. Extensive numerical experiments are carried out to verify the theoretical results and demonstrate the effectiveness of the proposed synthetic ranking algorithm.
translated by 谷歌翻译
In this work, we focus on instance-level open vocabulary segmentation, intending to expand a segmenter for instance-wise novel categories without mask annotations. We investigate a simple yet effective framework with the help of image captions, focusing on exploiting thousands of object nouns in captions to discover instances of novel classes. Rather than adopting pretrained caption models or using massive caption datasets with complex pipelines, we propose an end-to-end solution from two aspects: caption grounding and caption generation. In particular, we devise a joint Caption Grounding and Generation (CGG) framework based on a Mask Transformer baseline. The framework has a novel grounding loss that performs explicit and implicit multi-modal feature alignments. We further design a lightweight caption generation head to allow for additional caption supervision. We find that grounding and generation complement each other, significantly enhancing the segmentation performance for novel categories. We conduct extensive experiments on the COCO dataset with two settings: Open Vocabulary Instance Segmentation (OVIS) and Open Set Panoptic Segmentation (OSPS). The results demonstrate the superiority of our CGG framework over previous OVIS methods, achieving a large improvement of 6.8% mAP on novel classes without extra caption data. Our method also achieves over 15% PQ improvements for novel classes on the OSPS benchmark under various settings.
translated by 谷歌翻译
Temporal sentence grounding (TSG) aims to identify the temporal boundary of a specific segment from an untrimmed video by a sentence query. All existing works first utilize a sparse sampling strategy to extract a fixed number of video frames and then conduct multi-modal interactions with query sentence for reasoning. However, we argue that these methods have overlooked two indispensable issues: 1) Boundary-bias: The annotated target segment generally refers to two specific frames as corresponding start and end timestamps. The video downsampling process may lose these two frames and take the adjacent irrelevant frames as new boundaries. 2) Reasoning-bias: Such incorrect new boundary frames also lead to the reasoning bias during frame-query interaction, reducing the generalization ability of model. To alleviate above limitations, in this paper, we propose a novel Siamese Sampling and Reasoning Network (SSRN) for TSG, which introduces a siamese sampling mechanism to generate additional contextual frames to enrich and refine the new boundaries. Specifically, a reasoning strategy is developed to learn the inter-relationship among these frames and generate soft labels on boundaries for more accurate frame-query reasoning. Such mechanism is also able to supplement the absent consecutive visual semantics to the sampled sparse frames for fine-grained activity understanding. Extensive experiments demonstrate the effectiveness of SSRN on three challenging datasets.
translated by 谷歌翻译
Brain midline shift (MLS) is one of the most critical factors to be considered for clinical diagnosis and treatment decision-making for intracranial hemorrhage. Existing computational methods on MLS quantification not only require intensive labeling in millimeter-level measurement but also suffer from poor performance due to their dependence on specific landmarks or simplified anatomical assumptions. In this paper, we propose a novel semi-supervised framework to accurately measure the scale of MLS from head CT scans. We formulate the MLS measurement task as a deformation estimation problem and solve it using a few MLS slices with sparse labels. Meanwhile, with the help of diffusion models, we are able to use a great number of unlabeled MLS data and 2793 non-MLS cases for representation learning and regularization. The extracted representation reflects how the image is different from a non-MLS image and regularization serves an important role in the sparse-to-dense refinement of the deformation field. Our experiment on a real clinical brain hemorrhage dataset has achieved state-of-the-art performance and can generate interpretable deformation fields.
translated by 谷歌翻译
It is crucial to evaluate the quality and determine the optimal number of clusters in cluster analysis. In this paper, the multi-granularity characterization of the data set is carried out to obtain the hyper-balls. The cluster internal evaluation index based on hyper-balls(HCVI) is defined. Moreover, a general method for determining the optimal number of clusters based on HCVI is proposed. The proposed methods can evaluate the clustering results produced by the several classic methods and determine the optimal cluster number for data sets containing noises and clusters with arbitrary shapes. The experimental results on synthetic and real data sets indicate that the new index outperforms existing ones.
translated by 谷歌翻译