In this paper, we propose a method that fuses CNN and Transformer structures to improve image classification performance. CNNs extract information about local regions of an image well, but are limited in capturing global information. Transformers, on the other hand, have an advantage in extracting relatively global information, but have the drawback of requiring a large amount of memory for local feature extraction. In our approach, the image is converted into feature maps by a CNN, and each pixel of the feature maps is treated as a token. At the same time, the image is divided into patch regions and processed by the Transformer approach, which treats each patch as a token; the two token streams are then fused. For fusing tokens drawn from these two different features, we propose three methods: (1) late token fusion with a parallel structure, (2) early token fusion, and (3) layer-wise token fusion. In experiments on ImageNet 1K, the proposed methods show the best classification performance.
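The late-fusion variant can be illustrated with a short sketch. The following is a minimal, hypothetical PyTorch rendering, not the paper's actual network: the CNN branch turns feature-map pixels into tokens, the ViT branch turns patches into tokens, the two sequences are encoded in parallel and fused before the classifier. Layer sizes, token dimensions, and the pooling head are chosen for illustration.

```python
# Hypothetical sketch of late token fusion: CNN feature-map pixels and ViT
# patch embeddings become two token sequences, processed in parallel and
# fused before the classifier. Shapes and layer sizes are illustrative.
import torch
import torch.nn as nn

class LateTokenFusion(nn.Module):
    def __init__(self, num_classes=1000, dim=256):
        super().__init__()
        # CNN branch: each pixel of the final feature map becomes a token.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, dim, kernel_size=7, stride=4, padding=3),
            nn.ReLU(),
            nn.Conv2d(dim, dim, kernel_size=3, stride=2, padding=1),
        )
        # Transformer branch: non-overlapping 16x16 patches become tokens.
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=16, stride=16)
        enc_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.cnn_encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.vit_encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):
        cnn_tokens = self.cnn(x).flatten(2).transpose(1, 2)          # (B, H*W, dim)
        patch_tokens = self.patch_embed(x).flatten(2).transpose(1, 2)
        cnn_tokens = self.cnn_encoder(cnn_tokens)
        patch_tokens = self.vit_encoder(patch_tokens)
        # Late fusion: concatenate the two token sets and pool.
        fused = torch.cat([cnn_tokens, patch_tokens], dim=1).mean(dim=1)
        return self.head(fused)

logits = LateTokenFusion()(torch.randn(2, 3, 224, 224))  # (2, 1000)
```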
Recent advances in vision-language pre-training have shown remarkable performance across diverse vision-language tasks, shedding light on the long-standing problem of building a comprehensive understanding of both visual and textual concepts in artificial intelligence research. In the medical domain, however, the application of vision-language pre-training has been hindered by the limited amount and diversity of data, impeding successful learning of joint vision-language concepts. In this study, we introduce MAX-VL, a model tailored for effective vision-language pre-training in the medical domain. We experimentally demonstrate that the pre-trained MAX-VL model outperforms current state-of-the-art vision-language models on various vision-language tasks. We also suggest its clinical utility for diagnosing newly emerging diseases and detecting human error, and show the model's broad applicability to data from different domains.
Gastric endoscopic screening is an effective way to decide appropriate gastric cancer (GC) treatment at an early stage, reducing GC-associated mortality. Although artificial intelligence (AI) holds great promise for assisting pathologists in screening digitized whole slide images, existing AI systems are limited to coarse-grained cancer classification and have little usability in planning cancer treatment. We propose a practical AI system that enables five-class subclassification of GC pathology, which can be directly matched to general GC treatment guidance. The AI system is designed to efficiently differentiate multi-class GC through a multi-scale self-attention mechanism using a two-stage hybrid vision transformer (ViT) network, mimicking the way human pathologists understand histology. The AI system demonstrated reliable diagnostic performance by achieving class-average sensitivity above 0.85 on 1,212 slides from multicentric cohorts. Furthermore, compared with human pathologists, AI-assisted pathologists showed a significantly improved diagnostic sensitivity of 12%. Our results demonstrate that AI-assisted gastric endoscopic screening has great potential for providing presumptive pathologic opinion and appropriate gastric cancer treatment in real clinical settings.
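As a rough illustration of multi-scale tokenization for a ViT (this is not the authors' two-stage network; patch sizes, dimensions, and the classifier are placeholders), one can embed the same tissue tile at two patch scales so that self-attention can relate fine cellular detail to coarser glandular context:

```python
# Illustrative (not the paper's) multi-scale ViT tokenization for 5-class GC subtyping.
import torch
import torch.nn as nn

class MultiScaleViT(nn.Module):
    def __init__(self, num_classes=5, dim=192):
        super().__init__()
        self.embed_fine = nn.Conv2d(3, dim, kernel_size=16, stride=16)    # fine scale
        self.embed_coarse = nn.Conv2d(3, dim, kernel_size=32, stride=32)  # coarse scale
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=6, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))
        self.head = nn.Linear(dim, num_classes)   # five GC pathology subclasses

    def forward(self, tile):                       # tile: (B, 3, 256, 256)
        fine = self.embed_fine(tile).flatten(2).transpose(1, 2)
        coarse = self.embed_coarse(tile).flatten(2).transpose(1, 2)
        cls = self.cls.expand(tile.size(0), -1, -1)
        tokens = torch.cat([cls, fine, coarse], dim=1)  # both scales in one sequence
        return self.head(self.encoder(tokens)[:, 0])

print(MultiScaleViT()(torch.randn(1, 3, 256, 256)).shape)  # torch.Size([1, 5])
```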
Compared with 2-D ultrasound (US) imaging of a single axial plane, 3-D US imaging systems can visualize a volume along three axial planes. This allows a complete view of the anatomy, which is useful for gynecological (GYN) and obstetrical (OB) applications. Unfortunately, 3-D US has an inherent limitation in resolution compared to 2-D US. In the case of 3-D US with a 3-D mechanical probe, for example, the image quality is comparable along the beam direction, but significant image quality degradation is often observed in the other two axial image planes. To address this, we propose a novel unsupervised deep learning approach to improve 3-D US image quality. In particular, using {\em unmatched} high-quality 2-D US images as a reference, we train a recently proposed switchable CycleGAN architecture so that every mapped plane of the 3-D volume can learn the image quality of 2-D US images. Thanks to the switchable architecture, our network can also provide real-time control of the image enhancement level based on user preference, which is ideal for a user-centric scanner setup. Extensive experiments with clinical evaluation confirm that our method provides significantly improved image quality as well as user-friendly flexibility.
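One way to picture the "switchable" control (an assumption for illustration, not the paper's exact design) is a generator whose AdaIN-style scale/shift parameters are blended by a user-chosen scalar, so alpha = 0 approximates identity and alpha = 1 full enhancement:

```python
# Minimal sketch (assumed design) of a switchable generator: a scalar alpha
# blends instance statistics with a learned enhancement style, giving a
# continuous, real-time enhancement level.
import torch
import torch.nn as nn

class SwitchableGenerator(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(1, ch, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU())
        self.dec = nn.Conv2d(ch, 1, 3, padding=1)
        # One learned (scale, shift) pair per channel for the "enhanced" style.
        self.gamma = nn.Parameter(torch.ones(1, ch, 1, 1))
        self.beta = nn.Parameter(torch.zeros(1, ch, 1, 1))

    def forward(self, x, alpha):
        h = self.enc(x)
        # Instance-normalize, then blend original stats with the learned style by alpha.
        mu = h.mean(dim=(2, 3), keepdim=True)
        sigma = h.std(dim=(2, 3), keepdim=True) + 1e-5
        h_norm = (h - mu) / sigma
        gamma = (1 - alpha) * sigma + alpha * self.gamma
        beta = (1 - alpha) * mu + alpha * self.beta
        return self.dec(gamma * h_norm + beta) + x    # residual output

g = SwitchableGenerator()
slice_2d = torch.randn(1, 1, 128, 128)                # one plane of the 3-D volume
enhanced = g(slice_2d, alpha=0.5)                     # user-chosen enhancement level
```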
Configurable software systems are employed in many important application domains. Understanding the performance of the systems under all configurations is critical to prevent potential performance issues caused by misconfiguration. However, as the number of configurations can be prohibitively large, it is not possible to measure the system performance under all configurations. Thus, a common approach is to build a prediction model from limited measurement data to predict the performance of all configurations as scalar values. However, it has been pointed out that there are different sources of uncertainty coming from the data collection or the modeling process, which means the scalar predictions are not guaranteed to be accurate. To address this problem, we propose a Bayesian deep learning based method, namely BDLPerf, that can incorporate uncertainty into the prediction model. BDLPerf can provide both scalar predictions for configurations' performance and the corresponding confidence intervals of these scalar predictions. We also develop a novel uncertainty calibration technique to ensure the reliability of the confidence intervals generated by a Bayesian prediction model. Finally, we suggest an efficient hyperparameter tuning technique so as to train the prediction model within a reasonable amount of time whilst achieving high accuracy. Our experimental results on 10 real-world systems show that BDLPerf achieves higher accuracy than existing approaches, in both scalar performance prediction and confidence interval estimation.
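BDLPerf's exact model is not reproduced here; the sketch below uses Monte Carlo dropout, a common Bayesian deep learning approximation, simply to show how one network can emit both a scalar performance prediction and a confidence interval for a configuration. The layer sizes, dropout rate, and 95% interval are illustrative assumptions.

```python
# MC-dropout stand-in for a Bayesian performance predictor: repeated stochastic
# forward passes yield a predictive mean (scalar prediction) and a spread
# (confidence interval) for each configuration.
import torch
import torch.nn as nn

class MCDropoutRegressor(nn.Module):
    def __init__(self, n_options):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_options, 128), nn.ReLU(), nn.Dropout(p=0.1),
            nn.Linear(128, 128), nn.ReLU(), nn.Dropout(p=0.1),
            nn.Linear(128, 1),
        )

    def forward(self, x):
        return self.net(x)

def predict_with_interval(model, config, n_samples=100, z=1.96):
    model.train()  # keep dropout active at inference time (MC dropout)
    with torch.no_grad():
        draws = torch.stack([model(config) for _ in range(n_samples)])
    mean, std = draws.mean(dim=0), draws.std(dim=0)
    return mean, (mean - z * std, mean + z * std)   # scalar prediction + ~95% CI

model = MCDropoutRegressor(n_options=20)
config = torch.rand(1, 20)               # one configuration, 20 encoded options
pred, (lo, hi) = predict_with_interval(model, config)
```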
Self-Supervised Learning (SSL) is crucial for real-world applications, especially in data-hungry domains such as healthcare and self-driving cars. In addition to a lack of labeled data, these applications also suffer from distributional shifts. Therefore, an SSL method should provide robust generalization and uncertainty estimation on the test dataset to be considered a reliable model in such high-stakes domains. However, existing approaches often focus on generalization without evaluating the model's uncertainty. The ability to compare SSL techniques for improving these estimates is therefore critical for research on the reliability of self-supervision models. In this paper, we explore variants of SSL methods, including Jigsaw Puzzles, Context, Rotation, and Geometric Transformations Prediction for vision, as well as BERT and GPT for language tasks. We train SSL as auxiliary learning for vision and as pre-training for language models, then evaluate the generalization (in-out classification accuracy) and uncertainty (expected calibration error) across different distribution covariate shift datasets, including MNIST-C, CIFAR-10-C, CIFAR-10.1, and MNLI. Our goal is to create a benchmark with outputs from experiments, providing a starting point for new SSL methods in Reliable Machine Learning. All source code to reproduce results is available at https://github.com/hamanhbui/reliable_ssl_baselines.
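The uncertainty metric used here, expected calibration error (ECE), has a standard binned definition: predictions are grouped by confidence, and the gap between average confidence and average accuracy is reported, weighted by bin size. A small sketch of that computation (bin count is an assumption):

```python
# Expected calibration error (ECE) over confidence bins.
import numpy as np

def expected_calibration_error(probs, labels, n_bins=15):
    """probs: (N, C) predicted class probabilities, labels: (N,) true classes."""
    confidences = probs.max(axis=1)
    predictions = probs.argmax(axis=1)
    accuracies = (predictions == labels).astype(float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(accuracies[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap   # bin weight = fraction of samples in bin
    return ece

probs = np.array([[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]])
labels = np.array([0, 1, 1])
print(expected_calibration_error(probs, labels))
```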
Neural fields, also known as coordinate-based or implicit neural representations, have shown a remarkable capability of representing, generating, and manipulating various forms of signals. For video representations, however, mapping pixel-wise coordinates to RGB colors has shown relatively low compression performance and slow convergence and inference speed. Frame-wise video representation, which maps a temporal coordinate to its entire frame, has recently emerged as an alternative method to represent videos, improving compression rates and encoding speed. While promising, it has still failed to reach the performance of state-of-the-art video compression algorithms. In this work, we propose FFNeRV, a novel method for incorporating flow information into frame-wise representations to exploit the temporal redundancy across the frames in videos inspired by the standard video codecs. Furthermore, we introduce a fully convolutional architecture, enabled by one-dimensional temporal grids, improving the continuity of spatial features. Experimental results show that FFNeRV yields the best performance for video compression and frame interpolation among the methods using frame-wise representations or neural fields. To reduce the model size even further, we devise a more compact convolutional architecture using the group and pointwise convolutions. With model compression techniques, including quantization-aware training and entropy coding, FFNeRV outperforms widely-used standard video codecs (H.264 and HEVC) and performs on par with state-of-the-art video compression algorithms.
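The one-dimensional temporal grid idea can be sketched roughly as follows (this is a simplified stand-in, not FFNeRV itself; grid size, latent shape, and decoder are illustrative, and the flow-guided aggregation is omitted): a continuous time index interpolates a learnable 1-D grid of latents, and a convolutional decoder maps the latent to a full frame.

```python
# Rough sketch of a frame-wise representation driven by a learnable 1-D temporal grid.
import torch
import torch.nn as nn

class TemporalGridDecoder(nn.Module):
    def __init__(self, grid_size=128, latent_ch=64):
        super().__init__()
        # 1-D grid over time: grid_size entries, each a (latent_ch x 4 x 4) latent.
        self.grid = nn.Parameter(torch.randn(grid_size, latent_ch, 4, 4) * 0.01)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(latent_ch, 64, 4, stride=4), nn.GELU(),   # 4 -> 16
            nn.ConvTranspose2d(64, 32, 4, stride=4), nn.GELU(),          # 16 -> 64
            nn.ConvTranspose2d(32, 3, 4, stride=4), nn.Sigmoid(),        # 64 -> 256
        )

    def forward(self, t):                               # t in [0, 1]
        pos = t * (self.grid.shape[0] - 1)
        i0 = int(pos)
        i1 = min(i0 + 1, self.grid.shape[0] - 1)
        w = pos - i0
        latent = (1 - w) * self.grid[i0] + w * self.grid[i1]  # linear interpolation
        return self.decoder(latent.unsqueeze(0))              # (1, 3, 256, 256)

frame = TemporalGridDecoder()(t=0.37)
```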
Data compression is becoming critical for storing scientific data because many scientific applications need to store large amounts of data and post process this data for scientific discovery. Unlike image and video compression algorithms that limit errors to primary data, scientists require compression techniques that accurately preserve derived quantities of interest (QoIs). This paper presents a physics-informed compression technique implemented as an end-to-end, scalable, GPU-based pipeline for data compression that addresses this requirement. Our hybrid compression technique combines machine learning techniques and standard compression methods. Specifically, we combine an autoencoder, an error-bounded lossy compressor to provide guarantees on raw data error, and a constraint satisfaction post-processing step to preserve the QoIs within a minimal error (generally less than floating point error). The effectiveness of the data compression pipeline is demonstrated by compressing nuclear fusion simulation data generated by a large-scale fusion code, XGC, which produces hundreds of terabytes of data in a single day. Our approach works within the ADIOS framework and results in compression by a factor of more than 150 while requiring only a few percent of the computational resources necessary for generating the data, making the overall approach highly effective for practical scenarios.
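A conceptual sketch of the three-stage pipeline follows; the encode/decode functions, the residual quantizer standing in for an error-bounded compressor, and the choice of total mass as the quantity of interest (QoI) are all placeholders for illustration, not the XGC/ADIOS implementation.

```python
# Conceptual stages: learned lossy compression, error-bounded residual,
# then constraint-satisfaction post-processing that restores the QoI.
import numpy as np

def compress(field, encode, decode, error_bound=1e-3):
    latent = encode(field)                        # learned, lossy stage
    recon = decode(latent)
    residual = field - recon
    # Stand-in for an error-bounded lossy compressor: quantize the residual
    # so the pointwise error never exceeds error_bound.
    q_residual = np.round(residual / error_bound) * error_bound
    return latent, q_residual

def decompress_with_qoi(latent, q_residual, decode, qoi_target):
    recon = decode(latent) + q_residual
    # Constraint-satisfaction post-processing: rescale so the QoI (here, total
    # mass) matches the original to within floating-point error.
    recon *= qoi_target / recon.sum()
    return recon

encode = lambda f: f[::4, ::4]                    # toy 4x downsampling "autoencoder"
decode = lambda z: np.kron(z, np.ones((4, 4)))
field = np.random.rand(64, 64)
latent, qres = compress(field, encode, decode)
recon = decompress_with_qoi(latent, qres, decode, qoi_target=field.sum())
print(abs(recon.sum() - field.sum()))             # ~0 after the constraint step
```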
Neural radiance fields (NeRF) have demonstrated the potential of coordinate-based neural representation (neural fields or implicit neural representation) in neural rendering. However, using a multi-layer perceptron (MLP) to represent a 3D scene or object requires enormous computational resources and time. There have been recent studies on how to reduce these computational inefficiencies by using additional data structures, such as grids or trees. Despite the promising performance, the explicit data structure necessitates a substantial amount of memory. In this work, we present a method to reduce the size without compromising the advantages of having additional data structures. In detail, we propose using the wavelet transform on grid-based neural fields. Grid-based neural fields are for fast convergence, and the wavelet transform, whose efficiency has been demonstrated in high-performance standard codecs, is to improve the parameter efficiency of grids. Furthermore, in order to achieve a higher sparsity of grid coefficients while maintaining reconstruction quality, we present a novel trainable masking approach. Experimental results demonstrate that non-spatial grid coefficients, such as wavelet coefficients, are capable of attaining a higher level of sparsity than spatial grid coefficients, resulting in a more compact representation. With our proposed mask and compression pipeline, we achieved state-of-the-art performance within a memory budget of 2 MB. Our code is available at https://github.com/daniel03c1/masked_wavelet_nerf.
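One common way to make such a mask trainable (an assumption here; the paper's exact masking scheme may differ) is a straight-through estimator: the forward pass applies a hard 0/1 mask to the grid or wavelet coefficients, while gradients flow through an underlying real-valued mask parameter, and a sparsity penalty pushes coefficients toward being masked out.

```python
# Hedged sketch of a trainable binary mask over grid/wavelet coefficients.
import torch
import torch.nn as nn

class MaskedCoefficients(nn.Module):
    def __init__(self, shape=(64, 64)):
        super().__init__()
        self.coeff = nn.Parameter(torch.randn(shape) * 0.1)   # e.g. wavelet coefficients
        self.mask_logit = nn.Parameter(torch.ones(shape))

    def forward(self):
        soft = torch.sigmoid(self.mask_logit)
        hard = (soft > 0.5).float()
        mask = hard + soft - soft.detach()        # straight-through estimator
        return self.coeff * mask                  # masked coefficients fed to the field

    def sparsity_loss(self):
        # Encourages more coefficients to be masked out, shrinking the model.
        return torch.sigmoid(self.mask_logit).mean()

grid = MaskedCoefficients()
coeffs = grid()
loss = (coeffs ** 2).mean() + 1e-3 * grid.sparsity_loss()
loss.backward()
```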
Real-world autonomous missions often require rich interaction with nearby objects, such as doors or switches, along with effective navigation. However, such complex behaviors are difficult to learn because they involve both high-level planning and low-level motor control. We present a novel framework, Cascaded Compositional Residual Learning (CCRL), which learns composite skills by recursively leveraging a library of previously learned control policies. Our framework learns multiplicative policy composition, task-specific residual actions, and synthetic goal information simultaneously while freezing the prerequisite policies. We further explicitly control the style of the motion by regularizing residual actions. We show that our framework learns joint-level control policies for a diverse set of motor skills ranging from basic locomotion to complex interactive navigation, including navigating around obstacles, pushing objects, crawling under a table, pushing a door open with its leg, and holding it open while walking through it. The proposed CCRL framework leads to policies with consistent styles and lower joint torques, which we successfully transfer to a real Unitree A1 robot without any additional fine-tuning.
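A loose sketch of multiplicative policy composition with a residual action is given below; it assumes (for illustration only) that each frozen base policy outputs a Gaussian action distribution, that composition is a gate-weighted product of Gaussians, and that the high-level network supplies both the gating weights and a small residual, which is how the abstract's "multiplicative composition plus task-specific residual actions" can be read.

```python
# Composing frozen low-level policies multiplicatively plus a learned residual.
import torch
import torch.nn as nn

class CompositePolicy(nn.Module):
    def __init__(self, base_policies, obs_dim, act_dim):
        super().__init__()
        self.base = base_policies                 # frozen prerequisite policies
        for p in self.base:
            p.requires_grad_(False)
        self.gate = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(),
                                  nn.Linear(64, len(base_policies)))
        self.residual = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(),
                                      nn.Linear(64, act_dim))

    def forward(self, obs):
        w = torch.softmax(self.gate(obs), dim=-1)              # (B, K) gating weights
        # Each base policy is assumed to return (mean, std) of its action Gaussian.
        mus, sigmas = zip(*[p(obs) for p in self.base])
        mus, sigmas = torch.stack(mus, 1), torch.stack(sigmas, 1)
        prec = w.unsqueeze(-1) / sigmas.pow(2)                 # gated precisions
        mu = (prec * mus).sum(1) / prec.sum(1)                 # product-of-Gaussians mean
        # Small regularized residual keeps the composed motion style consistent.
        return mu + 0.1 * torch.tanh(self.residual(obs))
```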