智能论文笔记

Multimodal Machine Learning for Automated ICD Coding

Keyang Xu , Mike Lam , Jingzhi Pang , Xin Gao , Charlotte Band , Piyush Mathur , Frank Papay , Ashish K. Khanna , Jacek B. Cywinski , Kamal Maheshwari

分类：机器学习 | (统计)机器学习

2018-10-31

这项研究提出了一个多模式的机器学习模型，以预测ICD-10诊断代码。我们开发了单独的机器学习模型，可以处理来自不同模式的数据，包括非结构化文本，半结构化文本和结构化表格数据。我们进一步采用了合奏方法来集成所有模式特异性模型以生成ICD-10代码。还提取了主要证据，以使我们的预测更具说服力和可解释。我们使用医学信息集市进行重症监护III（模拟-III）数据集来验证我们的方法。对于ICD代码预测，我们的表现最佳模型（Micro-F1 = 0.7633，Micro-AUC = 0.9541）显着超过其他基线模型，包括TF-IDF（Micro-F1 = 0.6721，Micro-AUC = 0.7879）和Text-CNN模型（Micro-F1 = 0.6569，Micro-AUC = 0.9235）。为了解释性，我们的方法在文本数据上实现了JACCARD相似性系数（JSC）为0.1806，在表格数据上分别获得了0.3105，训练有素的医生分别达到0.2780和0.5002。

translated by 谷歌翻译

Learning from Mixed Datasets: A Monotonic Image Quality Assessment Model

Zhaopeng Feng , Keyang Zhang , Baoliang Chen , Shiqi Wang

分类：计算机视觉 | 机器学习

2022-09-21

基于深度学习的图像质量评估（IQA）模型通常会学会从单个数据集中预测图像质量，从而导致该模型过度适合特定的场景。为此，混合的数据集培训可以是增强模型概括能力的有效方法。但是，将不同的iQA数据集组合在一起是无聊的，因为它们的质量评估标准，评分范围，视图条件以及在图像质量注释期间通常不共享主题。在本文中，我们没有对注释对准注释，而是为IQA模型学习提供了一个单调的神经网络，其中包括不同的数据集。特别是，我们的模型由数据集共享的质量回归器和几个特定于数据集的质量变压器组成。质量回归器旨在获得每个数据集的感知质量，而每个质量变压器则将感知质量映射到相应的数据集注释及其单调性。实验结果验证了提出的学习策略的有效性，我们的代码可在https://github.com/fzp0424/monotoniciqa上获得。

translated by 谷歌翻译

Intellectual Property Evaluation Utilizing Machine Learning

Jinxin Ding , Yuxin Huang , Keyang Ni , Xueyao Wang , Yinxiao Wang , Yucheng Wang

分类：人工智能

2022-08-18

智力特性在经济发展中越来越重要。为了通过IP评估中的传统方法来解决疼痛点，我们正在以机器学习为核心开发一项新技术。我们已经建立了一个在线平台，并将在大湾地区扩展我们的业务。

translated by 谷歌翻译

Grasping Core Rules of Time Series through Pure Models

Gedi Liu , Yifeng Jiang , Yi Ouyang , Keyang Zhong , Yang Wang

分类：机器学习 | 人工智能 | (统计)机器学习

2022-08-15

时间序列与许多其他机器学习领域一样，从统计学到深度学习进行了过渡。尽管随着模型在许多公开可用的数据集中的更新时，似乎精度一直在提高，但通常只会将比例尺增加几倍，以换取准确性的略有差异。通过该实验，我们指出了不同的思维方式，时间序列，尤其是长期预测，可能与其他领域有所不同。不必使用广泛而复杂的模型来掌握时间序列的所有方面，而是使用纯模型来掌握时间序列的核心规则。有了这个简单但有效的想法，我们创建了Purets，这是一个具有三个纯线性层的网络，在80％的长序列预测任务中实现了最新的，同时几乎是最轻的模型，并且运行速度最快。在此基础上，我们讨论了纯线性层在现象和本质中的潜力。理解核心法律的能力有助于长距离预测的高精度，并且合理的波动可以防止其扭曲多步预测中的曲线，例如主流深度学习模型，该模型总结为纯粹的线性神经网络，避免了范围 - 覆盖。最后，我们建议轻巧长时间时间序列任务的基本设计标准：输入和输出应尝试具有相同的维度，并且结构避免了碎片化和复杂的操作。

translated by 谷歌翻译

TOCH: Spatio-Temporal Object-to-Hand Correspondence for Motion Refinement

Keyang Zhou , Bharat Lal Bhatnagar , Jan Eric Lenssen , Gerard Pons-Moll

分类：计算机视觉

2022-05-16

我们提出了TOCH，这是一种使用数据先验来完善不正确的3D手对象交互序列的方法。现有的手动跟踪器，尤其是那些依靠很少相机的手动跟踪器，通常会通过手动相交或缺失的触点产生视觉上不切实际的结果。尽管纠正此类错误需要有关交互的时间方面的推理，但大多数以前的作品都集中在静态抓取和触点上。我们方法的核心是Toch Fields，这是一种新颖的时空表示，用于在交互过程中建模手和物体之间的对应关系。 Toch字段是一个以对象为中心的表示，它相对于对象编码手的位置。利用这种新颖的表示，我们学习了具有暂时性的自动编码器的合理象征领域的潜在流形。实验表明，Toch优于最先进的3D手动相互作用模型，这些模型仅限于静态抓取和触点。更重要的是，我们的方法甚至在接触之前和之后都会产生平滑的相互作用。使用单个训练有素的TOCH模型，我们定量和定性地证明了其有用性，可用于纠正现成的RGB/RGB/RGB-D手动重建方法，并跨对象传输grasps。

translated by 谷歌翻译

Look, Cast and Mold: Learning 3D Shape Manifold from Single-view Synthetic Data

Qianyu Feng , Yawei Luo , Keyang Luo , Yi Yang

分类：计算机视觉

2021-03-08

推断现实世界中物体的立体结构是一项具有挑战性但实用的任务。为了配备深层模型，通常需要大量的3D监督，这很难获得。有希望的是，我们可以简单地从合成数据中受益，其中成对地面真相很容易访问。然而，考虑到变体的纹理，形状和上下文，域间隙并非平凡。为了克服这些困难，我们提出了一个称为VPAN的单视3D重建的粘性感知自适应网络。为了将模型概括为真实的场景，我们建议实现几个方面：（1）外观：视觉上从单个视图中纳入空间结构，以增强表示表示的表现力；（2）铸造：在感知上将2D图像特征与具有跨模式语义对比度映射的3D形状先验对齐；（3）模具：通过将嵌入到所需的歧管中来重建目标的立体形状。对几个基准测试的广泛实验证明了拟议方法通过单视图从合成数据学习3D形状的歧管的有效性和鲁棒性。所提出的方法优于iOU 0.292和cd 0.108上的Pix3D数据集上的最先进的方法，并在Pascal 3D+上达到0.329和CD 0.104。

translated by 谷歌翻译

Reference Twice: A Simple and Unified Baseline for Few-Shot Instance Segmentation

Yue Han , Jiangning Zhang , Zhucun Xue , Chao Xu , Xintian Shen , Yabiao Wang , Chengjie Wang , Yong Liu , Xiangtai Li

分类：计算机视觉

2023-01-03

Few Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes with limited several support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support/query features based on a Transformer-like framework. Our key insights are two folds: Firstly, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Secondly, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice from two aspects, i.e., feature-level and instance-level. In particular, we firstly design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. After the above steps, the novel classes can be improved significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modification. When benchmarking results on the COCO dataset for FSIS, gFSIS, and iFSIS settings, our method achieves a competitive performance compared to existing approaches across different shots, e.g., we boost nAP by noticeable +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few Shot Object Detection. Code and model will be available.

translated by 谷歌翻译

AI in HCI Design and User Experience

Wei Xu

分类：人工智能

2023-01-03

In this chapter, we review and discuss the transformation of AI technology in HCI/UX work and assess how AI technology will change how we do the work. We first discuss how AI can be used to enhance the result of user research and design evaluation. We then discuss how AI technology can be used to enhance HCI/UX design. Finally, we discuss how AI-enabled capabilities can improve UX when users interact with computing systems, applications, and services.

translated by 谷歌翻译

More is Better: A Database for Spontaneous Micro-Expression with High Frame Rates

Sirui Zhao , Huaying Tang , Xinglong Mao , Shifeng Liu , Hanqing Tao , Hao Wang , Tong Xu , Enhong Chen

分类：计算机视觉

2023-01-03

As one of the most important psychic stress reactions, micro-expressions (MEs), are spontaneous and transient facial expressions that can reveal the genuine emotions of human beings. Thus, recognizing MEs (MER) automatically is becoming increasingly crucial in the field of affective computing, and provides essential technical support in lie detection, psychological analysis and other areas. However, the lack of abundant ME data seriously restricts the development of cutting-edge data-driven MER models. Despite the recent efforts of several spontaneous ME datasets to alleviate this problem, it is still a tiny amount of work. To solve the problem of ME data hunger, we construct a dynamic spontaneous ME dataset with the largest current ME data scale, called DFME (Dynamic Facial Micro-expressions), which includes 7,526 well-labeled ME videos induced by 671 participants and annotated by more than 20 annotators throughout three years. Afterwards, we adopt four classical spatiotemporal feature learning models on DFME to perform MER experiments to objectively verify the validity of DFME dataset. In addition, we explore different solutions to the class imbalance and key-frame sequence sampling problems in dynamic MER respectively on DFME, so as to provide a valuable reference for future research. The comprehensive experimental results show that our DFME dataset can facilitate the research of automatic MER, and provide a new benchmark for MER. DFME will be published via https://mea-lab-421.github.io.

translated by 谷歌翻译

Surveillance Face Anti-spoofing

Hao Fang , Ajian Liu , Jun Wan , Sergio Escalera , Chenxu Zhao , Xu Zhang , Stan Z. Li , Zhen Lei

分类：计算机视觉

2023-01-03

Face Anti-spoofing (FAS) is essential to secure face recognition systems from various physical attacks. However, recent research generally focuses on short-distance applications (i.e., phone unlocking) while lacking consideration of long-distance scenes (i.e., surveillance security checks). In order to promote relevant research and fill this gap in the community, we collect a large-scale Surveillance High-Fidelity Mask (SuHiFiMask) dataset captured under 40 surveillance scenes, which has 101 subjects from different age groups with 232 3D attacks (high-fidelity masks), 200 2D attacks (posters, portraits, and screens), and 2 adversarial attacks. In this scene, low image resolution and noise interference are new challenges faced in surveillance FAS. Together with the SuHiFiMask dataset, we propose a Contrastive Quality-Invariance Learning (CQIL) network to alleviate the performance degradation caused by image quality from three aspects: (1) An Image Quality Variable module (IQV) is introduced to recover image information associated with discrimination by combining the super-resolution network. (2) Using generated sample pairs to simulate quality variance distributions to help contrastive learning strategies obtain robust feature representation under quality variation. (3) A Separate Quality Network (SQN) is designed to learn discriminative features independent of image quality. Finally, a large number of experiments verify the quality of the SuHiFiMask dataset and the superiority of the proposed CQIL.

translated by 谷歌翻译