智能论文笔记

CLIP-Driven Universal Model for Organ Segmentation and Tumor Detection

Jie Liu , Yixiao Zhang , Jie-Neng Chen , Junfei Xiao , Yongyi Lu , Bennett A. Landman , Yixuan Yuan , Alan Yuille , Yucheng Tang , Zongwei Zhou

分类：计算机视觉 | 机器学习

2023-01-02

An increasing number of public datasets have shown a marked clinical impact on assessing anatomical structures. However, each of the datasets is small, partially labeled, and rarely investigates severe tumor subjects. Moreover, current models are limited to segmenting specific organs/tumors, which can not be extended to novel domains and classes. To tackle these limitations, we introduce embedding learned from Contrastive Language-Image Pre-training (CLIP) to segmentation models, dubbed the CLIP-Driven Universal Model. The Universal Model can better segment 25 organs and 6 types of tumors by exploiting the semantic relationship between abdominal structures. The model is developed from an assembly of 14 datasets with 3,410 CT scans and evaluated on 6,162 external CT scans from 3 datasets. We rank first on the public leaderboard of the Medical Segmentation Decathlon (MSD) and achieve the state-of-the-art results on Beyond The Cranial Vault (BTCV). Compared with dataset-specific models, the Universal Model is computationally more efficient (6x faster), generalizes better to CT scans from varying sites, and shows stronger transfer learning performance on novel tasks. The design of CLIP embedding enables the Universal Model to be easily extended to new classes without catastrophically forgetting the previously learned classes.

translated by 谷歌翻译

Large Language Models are reasoners with Self-Verification

Yixuan Weng , Minjun Zhu , Shizhu He , Kang Liu , Jun Zhao

分类：人工智能 | 自然语言处理

2022-12-19

When a large language model (LLM) performs complex reasoning by chain of thought (CoT), it can be highly sensitive to individual mistakes. We have had to train verifiers to address this issue. As we all know, after human inferring a conclusion, they often check it by re-verifying it, which can avoid some mistakes. We propose a new method called self-verification that uses the conclusion of the CoT as a condition to build a new sample and asks the LLM to re-predict the original conditions which be masked. We calculate an explainable verification score based on the accuracy. This method can improve the accuracy of multiple arithmetics and logical reasoning datasets when using few-shot learning. we have demonstrated that LLMs can conduct explainable self-verification of their own conclusions and achieve competitive reasoning performance. Extensive experimentals have demonstrated that our method can help multiple large language models with self-verification can avoid interference from incorrect CoT. Code is available at \url{https://github.com/WENGSYX/Self-Verification}

translated by 谷歌翻译

Biomedical image analysis competitions: The state of current participation practice

Matthias Eisenmann , Annika Reinke , Vivienn Weru , Minu Dietlinde Tizabi , Fabian Isensee , Tim J. Adler , Patrick Godau , Veronika Cheplygina , Michal Kozubek , Sharib Ali

分类：计算机视觉 | 机器学习

2022-12-16

The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.

translated by 谷歌翻译

Plug-and-Play Recipe Generation with Content Planning

Yinhong Liu , Yixuan Su , Ehsan Shareghi , Nigel Collier

分类：自然语言处理

2022-12-09

Recent pre-trained language models have shown promising capabilities in generating fluent and realistic natural language text. However, generating multi-sentence text with global content planning has been a long-existing research question. Current approaches for controlled text generation can hardly address this issue, as they usually condition on single known control attributes. In this study, we propose a low-cost yet effective framework which explicitly models the global content plan of the generated text. Specifically, it optimizes the joint distribution of the natural language sequence and the global content plan in a plug-and-play manner. We conduct extensive experiments on the well-established Recipe1M+ benchmark. Both automatic and human evaluations verify that our model achieves the state-of-the-art performance on the task of recipe generation

translated by 谷歌翻译

Towards human-compatible autonomous car: A study of non-verbal Turing test in automated driving with affective transition modelling

Zhaoning Li , Qiaoli Jiang , Zhengming Wu , Anqi Liu , Haiyan Wu , Miner Huang , Kai Huang , Yixuan Ku

分类：人工智能 | 自然语言处理

2022-12-06

Autonomous cars are indispensable when humans go further down the hands-free route. Although existing literature highlights that the acceptance of the autonomous car will increase if it drives in a human-like manner, sparse research offers the naturalistic experience from a passenger's seat perspective to examine the human likeness of current autonomous cars. The present study tested whether the AI driver could create a human-like ride experience for passengers based on 69 participants' feedback in a real-road scenario. We designed a ride experience-based version of the non-verbal Turing test for automated driving. Participants rode in autonomous cars (driven by either human or AI drivers) as a passenger and judged whether the driver was human or AI. The AI driver failed to pass our test because passengers detected the AI driver above chance. In contrast, when the human driver drove the car, the passengers' judgement was around chance. We further investigated how human passengers ascribe humanness in our test. Based on Lewin's field theory, we advanced a computational model combining signal detection theory with pre-trained language models to predict passengers' humanness rating behaviour. We employed affective transition between pre-study baseline emotions and corresponding post-stage emotions as the signal strength of our model. Results showed that the passengers' ascription of humanness would increase with the greater affective transition. Our study suggested an important role of affective transition in passengers' ascription of humanness, which might become a future direction for autonomous driving.

translated by 谷歌翻译

Momentum Decoding: Open-ended Text Generation As Graph Exploration

Tian Lan , Yixuan Su , Shuhang Liu , Heyan Huang , Xian-Ling Mao

分类：自然语言处理

2022-12-05

Open-ended text generation with autoregressive language models (LMs) is one of the core tasks in natural language processing. However, maximization-based decoding methods (e.g., greedy/beam search) often lead to the degeneration problem, i.e., the generated text is unnatural and contains undesirable repetitions. Existing solutions to this problem either introduce randomness prone to incoherence or require a look-ahead mechanism that demands extra computational overhead. In this study, we formulate open-ended text generation from a new perspective, i.e., we view it as an exploration process within a directed graph. Thereby, we understand the phenomenon of degeneration as circular loops within the directed graph. Based on our formulation, we propose a novel decoding method -- \textit{momentum decoding} -- which encourages the LM to \textit{greedily} explore new nodes outside the current graph. Meanwhile, it also allows the LM to return to the existing nodes with a momentum downgraded by a pre-defined resistance function. We extensively test our approach on three benchmarks from different domains through automatic and human evaluations. The results show that momentum decoding performs comparably with the current state of the art while enjoying notably improved inference speed and computation FLOPs. Furthermore, we conduct a detailed analysis to reveal the merits and inner workings of our approach. Our codes and other related resources are publicly available at https://github.com/gmftbyGMFTBY/MomentumDecoding.

translated by 谷歌翻译

Exploring Discrete Diffusion Models for Image Captioning

Zixin Zhu , Yixuan Wei , Jianfeng Wang , Zhe Gan , Zheng Zhang , Le Wang , Gang Hua , Lijuan Wang , Zicheng Liu , Han Hu

分类：计算机视觉

2022-11-21

The image captioning task is typically realized by an auto-regressive method that decodes the text tokens one by one. We present a diffusion-based captioning model, dubbed the name DDCap, to allow more decoding flexibility. Unlike image generation, where the output is continuous and redundant with a fixed length, texts in image captions are categorical and short with varied lengths. Therefore, naively applying the discrete diffusion model to text decoding does not work well, as shown in our experiments. To address the performance gap, we propose several key techniques including best-first inference, concentrated attention mask, text length prediction, and image-free training. On COCO without additional caption pre-training, it achieves a CIDEr score of 117.8, which is +5.0 higher than the auto-regressive baseline with the same architecture in the controlled setting. It also performs +26.8 higher CIDEr score than the auto-regressive baseline (230.3 v.s.203.5) on a caption infilling task. With 4M vision-language pre-training images and the base-sized model, we reach a CIDEr score of 125.1 on COCO, which is competitive to the best well-developed auto-regressive frameworks. The code is available at https://github.com/buxiangzhiren/DDCap.

translated by 谷歌翻译

Is Out-of-Distribution Detection Learnable?

Zhen Fang , Yixuan Li , Jie Lu , Jiahua Dong , Bo Han , Feng Liu

分类：机器学习 | (统计)机器学习

2022-10-26

Supervised learning aims to train a classifier under the assumption that training and test data are from the same distribution. To ease the above assumption, researchers have studied a more realistic setting: out-of-distribution (OOD) detection, where test data may come from classes that are unknown during training (i.e., OOD data). Due to the unavailability and diversity of OOD data, good generalization ability is crucial for effective OOD detection algorithms. To study the generalization of OOD detection, in this paper, we investigate the probably approximately correct (PAC) learning theory of OOD detection, which is proposed by researchers as an open problem. First, we find a necessary condition for the learnability of OOD detection. Then, using this condition, we prove several impossibility theorems for the learnability of OOD detection under some scenarios. Although the impossibility theorems are frustrating, we find that some conditions of these impossibility theorems may not hold in some practical scenarios. Based on this observation, we next give several necessary and sufficient conditions to characterize the learnability of OOD detection in some practical scenarios. Lastly, we also offer theoretical supports for several representative OOD detection works based on our OOD theory.

translated by 谷歌翻译

Second Order Ensemble Langevin Method for Sampling and Inverse Problems

Ziming Liu , Andrew M. Stuart , Yixuan Wang

分类：机器学习

2022-08-09

我们根据二阶Langevin动力学的集合近似提出了一种采样方法。对数目标密度的附加辅助动量变量中附加了二次项，并引入了阻尼驱动的汉密尔顿动力学。所得的随机微分方程对于Gibbs度量不变，而目标坐标的边际坐标。根据动力学定律，基于协方差的预处理不会改变此不变性属性，并且被引入以加速融合到吉布斯度量。可以通过合奏方法近似产生的平均场动力学。这导致无梯度和仿射不变的随机动力学系统。数值结果证明了其作为贝叶斯反问题中数值采样器的基础的潜力。

translated by 谷歌翻译

Cooperative Actor-Critic via TD Error Aggregation

Martin Figura , Yixuan Lin , Ji Liu , Vijay Gupta

分类：机器学习

2022-07-25

在分散的合作多机构增强学习中，代理可以彼此汇总信息，以学习最大化团队平均目标功能的政策。尽管愿意与他人合作，但各个代理商可能会直接分享有关其当地状态，奖励和价值功能的信息，这是由于隐私问题而不受欢迎的。在这项工作中，我们引入了一种带有TD错误聚合的分散的参与者批判算法，该算法不违反隐私问题，并假设沟通渠道会受到时间延迟和数据包的删除。通过传输数据的维度来衡量，我们为做出如此薄弱的假设所支付的成本是增加的沟通负担。有趣的是，通信负担仅在图形大小上是二次的，这使得适用于大型网络的算法。我们在减小的步进大小下提供收敛分析，以验证代理最大化团队平均目标函数。

translated by 谷歌翻译