Smart Paper Notes

Frailty Care Robot for Elderly and Its Application for Physical and Psychological Support

Yoichi Yamazaki, Masayuki Ishii, Takahiro Ito, Takuya Hashimoto

Category: Robotics

2021-11-20

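To achieve continuous frailty care in the daily lives of the elderly, we propose AHOBO, a frailty care robot for elderly people living at home. Two types of support systems were implemented with AHOBO to support the elderly in both physical and psychological aspects. For physical frailty care, we focused on blood pressure and developed a support system for blood pressure measurement with AHOBO. For psychological frailty care, we implemented reminiscent coloring with AHOBO as a recreational activity with the robot. The usability of the systems was evaluated on the assumption of continuous use in daily life. For the blood pressure measurement support system, we performed a qualitative evaluation using a questionnaire with 16 subjects, including elderly people, whose blood pressure was measured with the system. The results confirmed that the proposed robot does not affect blood pressure readings and is acceptable in terms of ease of use based on subjective evaluation. For the reminiscent coloring interaction, a subjective evaluation was conducted with two elderly people under a verbal fluency task, and it was confirmed that the interaction can be used continuously in daily life. Widespread use of the proposed robot as an interface for AI that supports daily life will lead to a society in which AI robots support people from the cradle to the grave.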


Fine-grained Image Editing by Pixel-wise Guidance Using Diffusion Models

Naoki Matsunaga, Masato Ishii, Akio Hayakawa, Kenji Suzuki, Takuya Narihira

Category: Computer Vision | Machine Learning

2022-12-05

Generative models, particularly GANs, have been utilized for image editing. Although GAN-based methods perform well on generating reasonable contents aligned with the user's intentions, they struggle to strictly preserve the contents outside the editing region. To address this issue, we use diffusion models instead of GANs and propose a novel image-editing method, based on pixel-wise guidance. Specifically, we first train pixel-classifiers with few annotated data and then estimate the semantic segmentation map of a target image. Users then manipulate the map to instruct how the image is to be edited. The diffusion model generates an edited image via guidance by pixel-wise classifiers, such that the resultant image aligns with the manipulated map. As the guidance is conducted pixel-wise, the proposed method can create reasonable contents in the editing region while preserving the contents outside this region. The experimental results validate the advantages of the proposed method both quantitatively and qualitatively.
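As a rough illustration of the guidance step described in this abstract, the formula below writes standard classifier guidance with the classifier term summed over pixels. It is a sketch under assumptions, not the paper's exact formulation: the symbols $s$ (guidance scale), $m_i$ (label of pixel $i$ in the manipulated map), $p_\phi$ (pixel-wise classifier), and $\bar{\alpha}_t$ (cumulative noise schedule) are notation introduced here.

```latex
% Assumed form: noise prediction shifted by the gradient of the
% summed per-pixel log-likelihood of the manipulated map m.
\tilde{\epsilon}_\theta(x_t, t)
  = \epsilon_\theta(x_t, t)
  - s \sqrt{1 - \bar{\alpha}_t}\;
    \nabla_{x_t} \sum_{i} \log p_\phi\!\left(m_i \mid x_t, t\right)
```

Because each pixel contributes its own term, pixels whose label is unchanged pull the sample toward the original content, while edited pixels pull it toward the new label.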


Fashion-Specific Attributes Interpretation via Dual Gaussian Visual-Semantic Embedding

Ryotaro Shimizu, Masanari Kimura, Masayuki Goto

Category: Computer Vision | Machine Learning

2022-10-28

Several techniques to map various types of components, such as words, attributes, and images, into an embedded space have been studied. Most of them estimate the embedded representation of a target entity as a point in the projective space. Some models, such as Word2Gauss, assume a probability distribution behind the embedded representation, which enables the spread or variance of the meaning of embedded target components to be captured and considered in more detail. We examine the method of estimating embedded representations as probability distributions for the interpretation of fashion-specific abstract and difficult-to-understand terms. Terms such as "casual," "adult-casual," "beauty-casual," and "formal" are extremely subjective and abstract and are difficult for both experts and non-experts to understand, which discourages users from trying new fashion. We propose an end-to-end model called dual Gaussian visual-semantic embedding, which maps images and attributes into the same projective space and enables the interpretation of the meaning of these terms through its broad applications. We demonstrate the effectiveness of the proposed method through multifaceted experiments involving image and attribute mapping, image retrieval and re-ordering techniques, and a detailed theoretical/analytical discussion of the distance measure included in the loss function.
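To make the idea of comparing embeddings that are distributions rather than points concrete, here is a minimal sketch of the KL divergence between two Gaussians with diagonal covariance, one common distance in Gaussian embedding models. The abstract does not state which distance the paper's loss uses, so this choice, the function name, and the example vectors are all assumptions.

```python
import math


def kl_diag_gaussians(mu_p, var_p, mu_q, var_q):
    """KL(P || Q) for two Gaussians with diagonal covariance.

    mu_* are mean vectors and var_* are per-dimension variances (> 0).
    Illustrative helper; not taken from the paper.
    """
    if not (len(mu_p) == len(var_p) == len(mu_q) == len(var_q)):
        raise ValueError("all inputs must have the same dimension")
    total = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        total += math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
    return 0.5 * total


# Hypothetical 2-D embeddings: a narrow term and a broad term.
narrow = ([0.0, 0.0], [0.1, 0.1])
broad = ([0.0, 0.0], [2.0, 2.0])
print(kl_diag_gaussians(*narrow, *broad))  # narrow measured against broad
print(kl_diag_gaussians(*broad, *narrow))  # broad measured against narrow
```

The divergence is asymmetric, which is one reason distribution-valued embeddings can express that a broad term covers a narrow one but not the reverse.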


Two-Step Color-Polarization Demosaicking Network

Vy Nguyen, Masayuki Tanaka, Yusuke Monno, Masatoshi Okutomi

Category: Computer Vision

2022-09-13

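Polarization information of the light in a scene is valuable for various image processing and computer vision tasks. A division-of-focal-plane polarimeter is a promising approach for capturing polarization images of different orientations in one shot, but it requires color-polarization demosaicking. In this paper, we propose a two-step color-polarization demosaicking network (TCPDNet), which consists of two sub-tasks: color demosaicking and polarization demosaicking. We also introduce a reconstruction loss in the YCbCr color space to improve the performance of TCPDNet. Experimental comparisons demonstrate that TCPDNet outperforms existing methods in terms of the image quality of the polarization images and the accuracy of the Stokes parameters.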
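For readers unfamiliar with the Stokes parameters used as an evaluation target here, the sketch below computes the linear Stokes parameters from intensities at four polarizer angles (0, 45, 90, and 135 degrees), the set typically captured by such sensors. It uses the textbook relations for a single pixel; the function names are illustrative and nothing here is taken from TCPDNet itself.

```python
import math


def stokes_from_intensities(i0, i45, i90, i135):
    """Linear Stokes parameters (S0, S1, S2) for one pixel."""
    s0 = 0.5 * (i0 + i45 + i90 + i135)  # total intensity
    s1 = i0 - i90                       # horizontal vs. vertical
    s2 = i45 - i135                     # +45 vs. -45 degrees
    return s0, s1, s2


def dolp_aolp(s0, s1, s2):
    """Degree and angle (radians) of linear polarization."""
    dolp = math.hypot(s1, s2) / s0 if s0 > 0 else 0.0
    aolp = 0.5 * math.atan2(s2, s1)
    return dolp, aolp


# Fully horizontally polarized light of unit intensity.
s0, s1, s2 = stokes_from_intensities(1.0, 0.5, 0.0, 0.5)
print(s0, s1, s2, dolp_aolp(s0, s1, s2))
```

Errors in any of the four demosaicked images propagate directly into S1 and S2, which is why Stokes accuracy is a stricter test than per-image quality alone.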


Data Augmentation by Selecting Mixed Classes Considering Distance Between Classes

Shungo Fujii, Yasunori Ishii, Kazuki Kozuka, Tsubasa Hirakawa, Takayoshi Yamashita, Hironobu Fujiyoshi

Category: Computer Vision | Machine Learning (Statistics)

2022-09-12

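Data augmentation is an essential technique for improving recognition accuracy in object recognition using deep learning. Methods that generate mixed data from multiple data samples, such as mixup, can acquire new diversity that is not included in the training data and thus contribute to improved accuracy. However, since the data to be mixed are selected randomly throughout the training process, there are cases in which appropriate classes or data are not selected. In this study, we propose a data augmentation method that calculates the distance between classes based on class probabilities and can select data from suitable classes to be mixed during training. The mixed data are dynamically adjusted according to the training trend of each class to facilitate training. The proposed method is used in combination with conventional methods for generating mixed data. Evaluation experiments show that the proposed method improves recognition performance on general and long-tailed image recognition datasets.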
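The two ingredients mentioned above, mixing two samples and choosing the partner class by inter-class distance rather than uniformly, can be sketched as follows. This is only an illustration under assumptions: the distance matrix is hypothetical, and the rule that closer classes are sampled more often is chosen arbitrarily here, since the abstract does not say how distance maps to selection probability.

```python
import random


def mixup(x_a, y_a, x_b, y_b, lam):
    """Standard mixup: convex combination of inputs and one-hot labels."""
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y


def partner_weights(distances, anchor_class):
    """Sampling weights over partner classes (assumed rule: 1 / distance).

    The anchor class itself gets weight 0 so it is never chosen.
    """
    weights = []
    for cls, dist in enumerate(distances[anchor_class]):
        weights.append(0.0 if cls == anchor_class else 1.0 / (dist + 1e-8))
    return weights


def sample_partner_class(distances, anchor_class, rng):
    """Draw a partner class index according to partner_weights."""
    weights = partner_weights(distances, anchor_class)
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


# Hypothetical 3-class distance matrix (symmetric, zero diagonal).
distances = [[0.0, 1.0, 4.0], [1.0, 0.0, 2.0], [4.0, 2.0, 0.0]]
rng = random.Random(0)
partner = sample_partner_class(distances, anchor_class=0, rng=rng)
print(partner, mixup([1.0, 0.0], [1, 0, 0], [0.0, 1.0], [0, 1, 0], lam=0.25))
```

In a training loop the distance matrix would be refreshed from the model's current class probabilities, which is what makes the selection adapt to each class's training trend.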


Few-shot Adaptive Object Detection with Cross-Domain CutMix

Yuzuru Nakamura, Yasunori Ishii, Yuki Maruyama, Takayoshi Yamashita

Category: Computer Vision | Machine Learning (Statistics)

2022-08-31

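In object detection, data amount and cost are a trade-off, and collecting a large amount of data in a specific domain is labor-intensive. Therefore, existing large-scale datasets are used for pre-training. However, conventional transfer learning and domain adaptation cannot bridge the domain gap when the target domain differs significantly from the source domain. We propose a data synthesis method that can solve the large domain gap problem. In this method, a part of the target image is pasted onto the source image, and the position of the pasted region is aligned by utilizing the information of the object bounding boxes. In addition, we introduce adversarial learning to discriminate between original and pasted regions. The proposed method is trained on a large number of source images and a few target-domain images. It achieves higher accuracy than conventional methods in a very different domain problem setting, where RGB images are the source domain and thermal infrared images are the target domain. Similarly, the proposed method achieves higher accuracy in the simulation-to-real setting.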
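The pasting step can be pictured with a small sketch that copies a rectangular region of a target-domain image onto a source-domain image and returns a binary mask of the pasted pixels. It is a simplification under assumptions: images are plain nested lists of equal size, the box is given directly instead of being derived from bounding-box alignment, and using the mask as the original-versus-pasted label for a discriminator is only one plausible reading of the abstract.

```python
def paste_region(source, target, box):
    """Paste target pixels inside `box` onto a copy of `source`.

    box = (top, left, bottom, right), with bottom/right exclusive.
    Returns (pasted_image, mask), where mask is 1 on pasted pixels.
    """
    top, left, bottom, right = box
    height, width = len(source), len(source[0])
    if len(target) != height or len(target[0]) != width:
        raise ValueError("source and target must have the same size")
    if not (0 <= top < bottom <= height and 0 <= left < right <= width):
        raise ValueError("box must lie inside the image")
    pasted = [row[:] for row in source]           # leave source untouched
    mask = [[0] * width for _ in range(height)]
    for r in range(top, bottom):
        for c in range(left, right):
            pasted[r][c] = target[r][c]
            mask[r][c] = 1
    return pasted, mask


# Toy 3x3 example: 0 = source-domain pixel, 9 = target-domain pixel.
source = [[0, 0, 0] for _ in range(3)]
target = [[9, 9, 9] for _ in range(3)]
pasted, mask = paste_region(source, target, (0, 1, 2, 3))
print(pasted)
print(mask)
```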


PoF: Post-Training of Feature Extractor for Improving Generalization

Ikuro Sato, Ryota Yamada, Masayuki Tanaka, Nakamasa Inoue, Rei Kawakami

Category: Machine Learning

2022-07-05

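It has been intensively investigated that the local shape, especially the flatness, of the loss landscape near a minimum plays an important role in the generalization of deep models. We developed a training algorithm called PoF: Post-Training of Feature Extractor, which updates the feature extractor part of an already-trained deep model to search for a flatter minimum. Its characteristics are two-fold: 1) the feature extractor is trained under parameter perturbations in the higher-layer parameter space, based on observations that suggest flattening the higher-layer parameter space, and 2) the perturbation range is determined in a data-driven manner, aiming to reduce the part of the test loss caused by positive loss curvature. We provide a theoretical analysis showing that the proposed algorithm implicitly reduces the target Hessian components as well as the loss. Experimental results show that PoF improved model performance over baseline methods on the CIFAR-10 and CIFAR-100 datasets with only 10 epochs of post-training, and on the SVHN dataset with 50 epochs of post-training. Source code is available at: https://github.com/densoitlab/pof-v1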


Noise-aware Physics-informed Machine Learning for Robust PDE Discovery

Pongpisit Thanasutives, Takeshi Morita, Masayuki Numao, Ken-ichi Fukui

Category: Artificial Intelligence | Machine Learning

2022-06-26

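This work is concerned with discovering the governing partial differential equation (PDE) of a physical system. Existing methods have demonstrated PDE identification from finite observations but fail to maintain satisfactory performance on noisy data, partly owing to suboptimal estimated derivatives and discovered PDE coefficients. We address these issues by introducing a noise-aware physics-informed machine learning (nPIML) framework to discover the governing PDE from data following arbitrary distributions. Our proposals are twofold. First, we propose a pair of neural networks, namely a solver and a preselector, which yield an interpretable neural representation of the hidden physical constraint. After they are jointly trained, the solver network approximates potential candidates, e.g., partial derivatives, which are then fed to a sparse regression algorithm that initially unveils the most likely parsimonious PDE, decided according to an information criterion. Second, we propose denoising physics-informed neural networks (dPINNs), based on the Discrete Fourier Transform (DFT), to deliver a set of optimal fine-tuned PDE coefficients with respect to the noise-reduced variables. The structure of the denoising PINNs is compartmentalized into forefront projection networks and a PINN, initialized by the previously learned solver. Our extensive experiments on five canonical PDEs affirm that the proposed framework presents a robust and interpretable approach to PDE discovery, applicable to a wide range of systems, possibly complicated by noise.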


Effect and Analysis of Large-scale Language Model Rescoring on Competitive ASR Systems

Takuma Udagawa, Masayuki Suzuki, Gakuto Kurata, Nobuyasu Itoh, George Saon

Category: Natural Language Processing

2022-04-01

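Large-scale language models (LLMs) such as GPT-2, BERT, and RoBERTa have been successfully applied to ASR N-best rescoring. However, whether or how they can benefit competitive, near state-of-the-art ASR systems remains unexplored. In this study, we incorporate LLM rescoring into one of the most competitive ASR baselines: the Conformer-Transducer model. We demonstrate that consistent improvement is achieved by the LLM's bidirectionality, pretraining, in-domain finetuning, and context augmentation. Furthermore, our lexical analysis sheds light on how each of these components may contribute to ASR performance.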
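N-best rescoring, in its simplest form, re-ranks the recognizer's candidate transcripts by adding a weighted language model score to each candidate's ASR score. The sketch below shows that log-linear combination only; the scores, the weight, and the function name are made up for illustration, and it does not reproduce how the paper obtains or normalizes LLM scores.

```python
def rescore_nbest(hypotheses, lm_weight=0.5):
    """Re-rank N-best hypotheses by ASR score + lm_weight * LM score.

    hypotheses: list of (text, asr_log_score, lm_log_score) tuples.
    Returns a list of (text, combined_score), best first.
    """
    scored = [(asr + lm_weight * lm, text) for text, asr, lm in hypotheses]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(text, score) for score, text in scored]


# Hypothetical 2-best list: the acoustically preferred hypothesis is
# implausible to the language model.
nbest = [
    ("wreck a nice beach", -3.8, -12.0),
    ("recognize speech", -4.0, -6.0),
]
for text, score in rescore_nbest(nbest, lm_weight=0.5):
    print(f"{score:.2f}  {text}")
```

With `lm_weight=0` the original ASR ranking is returned unchanged, so the weight directly controls how much the language model is allowed to overrule the recognizer.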


Survey of Hallucination in Natural Language Generation

Ziwei Ji, Nayeon Lee, Rita Frieske, Tiezheng Yu, Dan Su, Yan Xu, Etsuko Ishii, Yejin Bang, Wenliang Dai, Andrea Madotto

Category: Natural Language Processing

2022-02-08

Natural Language Generation (NLG) has improved exponentially in recent years thanks to the development of sequence-to-sequence deep learning technologies such as Transformer-based language models. This advancement has led to more fluent and coherent NLG, leading to improved development in downstream tasks such as abstractive summarization, dialogue generation and data-to-text generation. However, it is also apparent that deep learning based generation is prone to hallucinate unintended text, which degrades the system performance and fails to meet user expectations in many real-world scenarios. To address this issue, many studies have been presented in measuring and mitigating hallucinated texts, but these have never been reviewed in a comprehensive manner before. In this survey, we thus provide a broad overview of the research progress and challenges in the hallucination problem in NLG. The survey is organized into two parts: (1) a general overview of metrics, mitigation methods, and future directions; and (2) an overview of task-specific research progress on hallucinations in the following downstream tasks, namely abstractive summarization, dialogue generation, generative question answering, data-to-text generation, machine translation, and visual-language generation. This survey serves to facilitate collaborative efforts among researchers in tackling the challenge of hallucinated texts in NLG.
