Smart Paper Notes

TabText: a Systematic Approach to Aggregate Knowledge Across Tabular Data Structures

Dimitris Bertsimas , Kimberly Villalobos Carballo , Yu Ma , Liangyuan Na , Léonard Boussioux , Cynthia Zeng , Luis R. Soenksen , Ignacio Fuentes

Category: Machine Learning

2022-06-21

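Processing and analyzing tabular data in a productive and efficient way is essential for building successful applications in fields such as healthcare. However, the lack of a unified framework for representing and standardizing tabular information poses a significant challenge to researchers and professionals alike. In this work, we present TabText, a methodology that leverages the unstructured data format of language to encode tabular data from different table structures and time periods efficiently and accurately. Using two healthcare datasets and four prediction tasks, we show that features extracted via TabText outperform those extracted with traditional processing methods by 2-5%. In addition, we analyze the sensitivity of our framework to different choices for the sentence representation of missing values, meta information, and language descriptiveness, and provide insights into winning strategies that improve performance.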
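A minimal sketch of the table-to-text idea, assuming a simple "The {column} is {value}." sentence template. The function row_to_text, the template, the optional table-name prefix (standing in for meta information), and the two missing-value options (skip the column, or write a placeholder) are hypothetical choices for illustration, not the authors' implementation. In TabText a language model would then turn text like this into features, which is not shown here.

```python
from typing import Any, Dict, Optional


def row_to_text(row: Dict[str, Any], missing: Optional[str] = None,
                table: Optional[str] = None) -> str:
    """Serialize one table row as descriptive sentences (hypothetical template).

    missing=None drops columns whose value is None; any other string is
    written in place of the missing value. `table` optionally prefixes the
    text with meta information about where the row came from.
    """
    sentences = []
    for column, value in row.items():
        if value is None:
            if missing is None:
                continue  # skip the column entirely
            value = missing
        name = column.replace("_", " ")  # make column names read like words
        sentences.append(f"The {name} is {value}.")
    text = " ".join(sentences)
    return f"{table}: {text}" if table else text


patient = {"age": 67, "heart_rate": None, "admission_type": "emergency"}
print(row_to_text(patient))
print(row_to_text(patient, missing="unknown", table="Admissions"))
```

Comparing downstream results for the two calls above is one way to probe the missing-value and meta-information choices the abstract mentions.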

translated by Google Translate

Integrated multimodal artificial intelligence framework for healthcare applications

Luis R. Soenksen , Yu Ma , Cynthia Zeng , Leonard D. J. Boussioux , Kimberly Villalobos Carballo , Liangyuan Na , Holly M. Wiberg , Michael L. Li , Ignacio Fuentes , Dimitris Bertsimas

Category: Machine Learning | Artificial Intelligence

2022-02-25

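Artificial intelligence (AI) systems hold great promise to improve healthcare over the next decades. Specifically, AI systems leveraging multiple data sources and input modalities are poised to become a viable method to deliver more accurate results and deployable pipelines across a wide range of applications. In this work, we propose and evaluate a unified Holistic AI in Medicine (HAIM) framework to facilitate the generation and testing of AI systems that leverage multimodal inputs. Our approach uses generalizable data pre-processing and machine learning modeling stages that can be readily adapted for research and deployment in healthcare environments. We evaluate our HAIM framework by training and characterizing 14,324 independent models based on MIMIC-IV-MM, a multimodal clinical database (n = 34,537 samples) containing 7,279 unique hospitalizations and 6,485 patients, spanning all possible input combinations of 4 data modalities (i.e., tabular, time-series, text, and images), 11 unique data sources, and 12 predictive tasks. We show that this framework can consistently produce models that outperform similar single-source approaches across various healthcare demonstrations (by 6-33%), including 10 distinct chest pathology diagnoses, along with length-of-stay and 48-hour mortality predictions. We also quantify the contribution of each modality and data source using Shapley values, which demonstrates the heterogeneity in data type importance and the necessity of multimodal inputs across different healthcare-relevant tasks. The generalizable properties and flexibility of our Holistic AI in Medicine (HAIM) framework could offer a promising pathway for future multimodal predictive systems in clinical and operational healthcare settings.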
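With 4 modalities there are 2^4 - 1 = 15 non-empty input combinations, few enough that Shapley values can be computed exactly by enumerating every subset. The sketch below does that for a made-up scoring function: toy_auc, its per-modality gains, and the text+image bonus are hypothetical; in HAIM the value of a subset would presumably come from the performance of a model trained on that subset of inputs.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, Sequence


def shapley_values(players: Sequence[str],
                   value: Callable[[FrozenSet[str]], float]) -> Dict[str, float]:
    """Exact Shapley values by enumerating all coalitions (fine for a few players)."""
    n = len(players)
    phi: Dict[str, float] = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):  # size of the coalition that p joins
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


# Hypothetical per-modality gains over a chance-level AUC of 0.5.
MODALITIES = ["tabular", "time_series", "text", "image"]
GAIN = {"tabular": 0.05, "time_series": 0.03, "text": 0.02, "image": 0.08}


def toy_auc(subset: FrozenSet[str]) -> float:
    score = 0.5 + sum(GAIN[m] for m in subset)
    if {"text", "image"} <= subset:  # hypothetical synergy between text and images
        score += 0.02
    return score


phi = shapley_values(MODALITIES, toy_auc)
# The text+image bonus is split evenly: text 0.03, image 0.09.
print({m: round(v, 3) for m, v in phi.items()})
```

Because Shapley values are efficient, the four contributions sum to toy_auc(all modalities) - toy_auc(no modalities) = 0.20, which makes them easy to compare across tasks.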

Financial Risk Management on a Neutral Atom Quantum Processor

Lucas Leclerc , Luis Ortiz-Guitierrez , Sebastian Grijalva , Boris Albrecht , Julia R. K. Cline , Vincent E. Elfving , Adrien Signoles , Loïc Henriet , Gianni Del Bimbo , Usman Ayub Sheikh

Category: Machine Learning

2022-12-06

Machine learning models capable of handling the large datasets collected in the financial world can often become black boxes that are expensive to run. The quantum computing paradigm suggests new optimization techniques that, combined with classical algorithms, may deliver competitive, faster, and more interpretable models. In this work, we propose a quantum-enhanced machine learning solution for the prediction of credit rating downgrades, also known as fallen-angels forecasting in the financial risk management field. We implement this solution on a neutral atom Quantum Processing Unit with up to 60 qubits on a real-life dataset. We report competitive performance against the state-of-the-art Random Forest benchmark, while our model achieves better interpretability and comparable training times. We examine how to improve performance in the near term, validating our ideas with tensor-network-based numerical simulations.

Automated segmentation of microvessels in intravascular OCT images using deep learning

Juhwan Lee , Justin N. Kim , Lia Gomez-Perez , Yazan Gharaibeh , Issam Motairek , Gabriel T. R. Pereira , Vladislav N. Zimin , Luis A. P. Dallan , Ammar Hoori , Sadeer Al-Kindi

Category: Computer Vision | Machine Learning

2022-10-01

To analyze microvessels, a characteristic of plaque vulnerability, we developed an automated deep learning method for detecting microvessels in intravascular optical coherence tomography (IVOCT) images. A total of 8,403 IVOCT image frames from 85 lesions and 37 normal segments were analyzed. Manual annotation was performed using dedicated software (OCTOPUS) previously developed by our group. Data augmentation in the polar (r, θ) domain was applied to raw IVOCT images to ensure that microvessels appear at all possible angles. Pre-processing included guidewire/shadow detection, lumen segmentation, pixel shifting, and noise reduction. DeepLab v3+ was used to segment microvessel candidates. A bounding box around each candidate was then classified as either microvessel or non-microvessel using a shallow convolutional neural network. To improve classification, we applied data augmentation (i.e., angle rotation) to bounding boxes containing a microvessel during network training. Data augmentation and pre-processing significantly improved microvessel segmentation performance, yielding a Dice coefficient of 0.71±0.10 and pixel-wise sensitivity/specificity of 87.7±6.6%/99.8±0.1%. The network for classifying microvessels from candidates performed exceptionally well, with a sensitivity of 99.5±0.3%, specificity of 98.8±1.0%, and accuracy of 99.1±0.5%. The classification step eliminated the majority of residual false positives, and the Dice coefficient increased from 0.71 to 0.73. In addition, our method identified 698 image frames with microvessels present, compared with 730 from manual analysis, a 4.4% difference. Compared with the manual method, the automated method improved microvessel continuity, implying improved segmentation performance. The method will be useful for research purposes as well as potential future treatment planning.
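The reported metrics and the polar-domain augmentation can be illustrated with a small pure-Python sketch. It assumes a binary mask stored as a list of A-lines (one row per angle θ, one column per depth r); the helper names are hypothetical. Under that layout, rotating a frame about the catheter is just a cyclic shift of rows, and the pixel-wise metrics are Dice = 2TP / (2TP + FP + FN), sensitivity = TP / (TP + FN), and specificity = TN / (TN + FP). The last line reproduces the 4.4% frame-count difference, assuming it is taken relative to the 730 manually identified frames.

```python
import math
from typing import List, Tuple

# Hypothetical layout: one row per A-line (angle theta), one column per depth r.
Mask = List[List[int]]


def rotate_polar(frame: Mask, shift: int) -> Mask:
    """Rotate a polar frame by `shift` A-lines (a cyclic shift along theta)."""
    shift %= len(frame)
    return [row[:] for row in frame[-shift:] + frame[:-shift]]


def confusion(pred: Mask, truth: Mask) -> Tuple[int, int, int, int]:
    """Pixel-wise (TP, FP, FN, TN) counts for two binary masks of equal shape."""
    tp = fp = fn = tn = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            if p and t:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def _ratio(num: int, den: int) -> float:
    return num / den if den else math.nan  # undefined when the denominator is empty


def dice(pred: Mask, truth: Mask) -> float:
    tp, fp, fn, _ = confusion(pred, truth)
    return _ratio(2 * tp, 2 * tp + fp + fn)


def sensitivity(pred: Mask, truth: Mask) -> float:
    tp, _, fn, _ = confusion(pred, truth)
    return _ratio(tp, tp + fn)


def specificity(pred: Mask, truth: Mask) -> float:
    _, fp, _, tn = confusion(pred, truth)
    return _ratio(tn, tn + fp)


truth = [[0, 1, 1], [0, 0, 0], [0, 0, 0]]
pred = [[0, 1, 0], [0, 0, 0], [0, 0, 1]]
print(dice(pred, truth), sensitivity(pred, truth), specificity(pred, truth))
# Rotating both masks by the same angle leaves the overlap metrics unchanged.
print(dice(rotate_polar(pred, 1), rotate_polar(truth, 1)))
# Frame-level difference from the abstract: 698 automated vs. 730 manual frames.
print(round(abs(698 - 730) / 730 * 100, 1))  # 4.4
```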

Don't Take it Personally: Analyzing Gender and Age Differences in Ratings of Online Humor

J. A. Meaney , Steven R. Wilson , Luis Chiruzzo , Walid Magdy

Category: Natural Language Processing

2022-08-23

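Computational humor detection systems rarely model the subjectivity of humor responses, or consider alternative reactions to humor, namely offense. We analyze a large dataset of humor and offense ratings by male and female annotators of different age groups. We find that women link these two concepts more strongly than men, and they tend to give lower humor ratings and higher offense scores. We also find that the correlation between humor and offense increases with age. Although there were no gender or age differences in humor detection, women and older annotators indicated that they understood the joke texts more often than men. We discuss implications for computational humor detection and downstream tasks.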

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Aarohi Srivastava , Abhinav Rastogi , Abhishek Rao , Abu Awal Md Shoeb , Abubakar Abid , Adam Fisch , Adam R. Brown , Adam Santoro , Aditya Gupta , Adrià Garriga-Alonso

Category: Natural Language Processing | Artificial Intelligence | Machine Learning | Machine Learning (Statistics)

2022-06-09

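Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 442 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but remain poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with some benefit from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.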

Automated analysis of fibrous cap in intravascular optical coherence tomography images of coronary arteries

Juhwan Lee , Gabriel T. R. Pereira , Yazan Gharaibeh , Chaitanya Kolluru , Vladislav N. Zimin , Luis A. P. Dallan , Justin N. Kim , Ammar Hoori , Sadeer G. Al-Kindi , Giulio Guagliumi

Category: Machine Learning | Computer Vision

2022-04-21

Thin-cap fibroatheroma (TCFA) and plaque rupture have been recognized as the most frequent risk factors for thrombosis and acute coronary syndrome. Intravascular optical coherence tomography (IVOCT) can identify TCFA and assess cap thickness, which provides an opportunity to assess plaque vulnerability. We developed an automated method that can detect lipidous plaque and assess fibrous cap thickness in IVOCT images. This study analyzed a total of 4,360 IVOCT image frames of 77 lesions among 41 patients. To improve segmentation performance, preprocessing included lumen segmentation, pixel shifting, and noise filtering on the raw polar (r, θ) IVOCT images. We used the DeepLab v3+ deep learning model to classify lipidous plaque pixels. After lipid detection, we automatically detected the outer border of the fibrous cap using a special dynamic programming algorithm and assessed the cap thickness. Our method provided excellent discriminability of lipid plaque, with a sensitivity of 85.8% and an A-line Dice coefficient of 0.837. By comparing lipid angle measurements between two analysts following editing of our automated software's output, we found good agreement by Bland-Altman analysis (difference 6.7±17°; mean 196°). Our method accurately detected the fibrous cap from the detected lipid plaque. Automated analysis required significant modification for only 5.5% of frames. Furthermore, our method showed good agreement of fibrous cap thickness between two analysts by Bland-Altman analysis (4.2±14.6 µm; mean 175 µm), indicating little bias between users and good reproducibility of the measurement. We developed a fully automated method for fibrous cap quantification in IVOCT images, resulting in good agreement with determinations by analysts. The method has great potential to enable highly automated, repeatable, and comprehensive evaluations of TCFAs.
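The Bland-Altman agreement figures in this abstract (for example, 6.7±17° with a mean of 196°) appear to be the mean ± SD of the paired differences and the mean of the measurements. A minimal sketch of that computation, with the conventional 95% limits of agreement (bias ± 1.96 SD), follows; the analyst readings are made-up numbers and bland_altman is a hypothetical helper, not the authors' code.

```python
from statistics import mean, stdev
from typing import Dict, Sequence


def bland_altman(a: Sequence[float], b: Sequence[float]) -> Dict[str, object]:
    """Bland-Altman summary for paired measurements of the same quantity."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [x - y for x, y in zip(a, b)]          # analyst A minus analyst B
    pair_means = [(x + y) / 2 for x, y in zip(a, b)]
    bias = mean(diffs)
    sd = stdev(diffs)                              # sample SD of the differences
    return {
        "mean": mean(pair_means),
        "bias": bias,
        "sd": sd,
        "loa": (bias - 1.96 * sd, bias + 1.96 * sd),  # 95% limits of agreement
    }


# Hypothetical lipid-angle readings (degrees) from two analysts on four frames.
analyst_1 = [180, 200, 210, 190]
analyst_2 = [175, 190, 212, 185]
summary = bland_altman(analyst_1, analyst_2)
print(f"difference {summary['bias']:.1f}+/-{summary['sd']:.1f} deg; "
      f"mean {summary['mean']:.1f} deg; limits of agreement {summary['loa']}")
```

A small bias with narrow limits of agreement is what "little bias between users and good reproducibility" means in practice; the same function applies unchanged to cap thickness in µm.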

The CAMELS project: public data release

Francisco Villaescusa-Navarro , Shy Genel , Daniel Anglés-Alcázar , Lucia A. Perez , Pablo Villanueva-Domingo , Digvijay Wadekar , Helen Shao , Faizan G. Mohammad , Sultan Hassan , Emily Moser

Category: Artificial Intelligence | Machine Learning

2022-01-04

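The Cosmology and Astrophysics with MachinE Learning Simulations (CAMELS) project was developed to combine cosmology with astrophysics through thousands of cosmological hydrodynamic simulations and machine learning. CAMELS contains 4,233 cosmological simulations, 2,049 N-body and 2,184 state-of-the-art hydrodynamic simulations, sampling a vast volume in parameter space. In this paper, we present the CAMELS public data release, describing the characteristics of the CAMELS simulations and a variety of data products generated from them, including halo, subhalo, galaxy, and void catalogues, power spectra, bispectra, Lyman-α spectra, probability distribution functions, halo radial profiles, and X-ray photon lists. We also release catalogues containing billions of galaxies from CAMELS-SAM: a large collection of N-body simulations that have been combined with the Santa Cruz semi-analytic model. We release all the data, comprising more than 350 terabytes and containing 143,922 snapshots, millions of halos, galaxies, and summary statistics. We provide further technical details on how to access, download, read, and process the data at https://camels.readthedocs.io.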

Conservation Tools: The Next Generation of Engineering–Biology Collaborations

Andrew Schulz , Cassie Shriver , Suzanne Stathatos , Benjamin Seleb , Emily Weigel , Young-Hui Chang , M. Saad Bhamla , David Hu , Joseph R. Mendelson III

Category: Machine Learning

2023-01-03

The recent increase in public and academic interest in preserving biodiversity has led to the growth of the field of conservation technology. This field involves designing and constructing tools that use technology to aid in the conservation of wildlife. In this article, we use case studies to demonstrate the importance of designing conservation tools with human-wildlife interaction in mind, and we provide a framework for creating successful tools. These case studies span a range of complexity, from simple cat collars to machine learning and game theory methodologies. Our goal is to introduce current and future researchers to the field of conservation technology and to provide references for educating the next generation of conservation technologists. Conservation technology not only has the potential to benefit biodiversity but also has broader impacts on fields such as sustainability and environmental protection. By using innovative technologies to address conservation challenges, we can find more effective and efficient solutions to protect and preserve our planet's resources.

Through-life Monitoring of Resource-constrained Systems and Fleets

Felipe Montana , Adam Hartwell , Will Jacobs , Visakan Kadirkamanathan , Andrew R Mills , Tom Clark

Category: Machine Learning

2023-01-03

A Digital Twin (DT) is a simulation of a physical system that provides information to make decisions that add economic, social, or commercial value. The behaviour of a physical system changes over time; a DT must therefore be continually updated with data from the physical system to reflect its changing behaviour. For resource-constrained systems, updating a DT is non-trivial because of challenges such as on-board learning and off-board data transfer. This paper presents a framework for updating data-driven DTs of resource-constrained systems, geared towards system health monitoring. The proposed solution consists of: (1) an on-board system running a lightweight DT that allows prioritisation and parsimonious transfer of the data generated by the physical system; and (2) robust off-board updating of the DT and detection of anomalous behaviours. Two case studies using a production gas turbine engine system demonstrate the accuracy of the digital representation for real-world, time-varying physical systems.
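The abstract does not say which model runs on board or how data are prioritised, so the following is only a hypothetical illustration of the two-part idea: a lightweight linear twin ranks samples by how badly it predicts them and sends only the worst-explained ones within a transfer budget, and an off-board step applies a robust (median-based) offset update before flagging residuals above a tolerance as anomalies. LinearTwin, prioritise, update_offset, anomalies, the budget, and the tolerance are invented names and parameters; for simplicity the off-board step here sees every sample.

```python
from dataclasses import dataclass
from statistics import median
from typing import List, Sequence


@dataclass
class LinearTwin:
    """Hypothetical light-weight twin: predicts an output y from an input x as a*x + b."""
    a: float
    b: float

    def predict(self, x: float) -> float:
        return self.a * x + self.b


def prioritise(twin: LinearTwin, xs: Sequence[float], ys: Sequence[float],
               budget: int) -> List[int]:
    """On-board: indices of the `budget` samples the twin explains worst."""
    residuals = [abs(y - twin.predict(x)) for x, y in zip(xs, ys)]
    ranked = sorted(range(len(residuals)), key=lambda i: residuals[i], reverse=True)
    return sorted(ranked[:budget])  # send the selected samples in time order


def update_offset(twin: LinearTwin, xs: Sequence[float], ys: Sequence[float]) -> LinearTwin:
    """Off-board: shift the offset by the median residual, so a few outliers barely move it."""
    shift = median(y - twin.predict(x) for x, y in zip(xs, ys))
    return LinearTwin(twin.a, twin.b + shift)


def anomalies(twin: LinearTwin, xs: Sequence[float], ys: Sequence[float],
              tol: float) -> List[int]:
    """Off-board: indices whose residual under the updated twin exceeds `tol`."""
    return [i for i, (x, y) in enumerate(zip(xs, ys)) if abs(y - twin.predict(x)) > tol]


twin = LinearTwin(a=2.0, b=0.0)
xs = [1, 2, 3, 4, 5]
ys = [3.0, 5.5, 7.0, 9.2, 30.0]  # made-up readings; the last one is a fault
print(prioritise(twin, xs, ys, budget=2))
updated = update_offset(twin, xs, ys)
print(updated, anomalies(updated, xs, ys, tol=3.0))
```

Using the median rather than the mean for the update keeps the single faulty reading from dragging the twin away from the healthy behaviour, so it still stands out as an anomaly afterwards.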
