Smart Paper Notes

Decision Support Models for Predicting and Explaining Airport Passenger Connectivity from Data

Marta Guimaraes, Claudia Soares, Rodrigo Ventura

Category: Machine Learning

2021-11-02

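Predicting whether passengers on a connecting flight will miss their connection is paramount for airline profitability. We present novel machine-learning-based decision support models for the different stages of connecting-flight management, namely strategic, pre-tactical, tactical, and post-operations. We predict missed flight connections at an airline's hub airport using historical data on flights and passengers, and we analyze the factors that contribute to the predicted outcome for each decision horizon. Our data is high-dimensional, heterogeneous, imbalanced, and noisy, and it carries no information about passenger arrival/departure transit times. We employ probabilistic encoding of categorical classes, data balancing with Gaussian mixture models, and boosting. For all planning horizons, our models attain an AUC of the ROC higher than 0.93. SHAP value explanations of our models indicate that scheduled/perceived connection times contribute the most to the prediction, followed by passenger age and whether border controls are required.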

translated by Google Translate

A machine learning model to identify corruption in México's public procurement contracts

Andrés Aldana, Andrea Falcón-Cortés, Hernán Larralde

Category: Machine Learning

2022-10-25

The costs and impacts of government corruption range from impairing a country's economic growth to affecting its citizens' well-being and safety. Public contracting between government dependencies and private sector instances, referred to as public procurement, is a fertile land of opportunity for corrupt practices, generating substantial monetary losses worldwide. Thus, identifying and deterring corrupt activities between the government and the private sector is paramount. However, due to several factors, corruption in public procurement is challenging to identify and track, leading to corrupt practices going unnoticed. This paper proposes a machine learning model based on an ensemble of random forest classifiers, which we call hyper-forest, to identify and predict corrupt contracts in México's public procurement data. This method's results correctly detect most of the corrupt and non-corrupt contracts evaluated in the dataset. Furthermore, we found that the most critical predictors considered in the model are those related to the relationship between buyers and suppliers rather than those related to features of individual contracts. Also, the method proposed here is general enough to be trained with data from other countries. Overall, our work presents a tool that can help in the decision-making process to identify, predict and analyze corruption in public procurement contracts.

Shapley value-based approaches to explain the robustness of classifiers in machine learning

Guilherme Dean Pelegrina, Sajid Siraj

Category: Machine Learning | Artificial Intelligence

2022-09-09

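In machine learning, the use of algorithm-agnostic approaches is an emerging area of research for explaining the contribution of individual features to the predicted outcome. While there is a focus on explaining the prediction itself, little has been done to explain the robustness of these models, that is, how each feature contributes to achieving that robustness. In this paper, we propose using Shapley values to explain the contribution of each feature to the model's robustness, measured in terms of the receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC). With the help of an illustrative example, we demonstrate the proposed idea of explaining the ROC curve and visualizing the uncertainties in these curves. For imbalanced datasets, the precision-recall curve (PRC) is considered more appropriate, so we also demonstrate how to explain PRCs with the help of Shapley values.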
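
As a rough illustration of attributing a performance metric (rather than a single prediction) to features, the sketch below computes exact Shapley values where the "payout" of a feature coalition is an AUC. The value function used here (the AUC of a score that simply sums the selected features, with 0.5 for the empty coalition) and the names `coalition_auc` and `shapley_auc` are assumptions made for this example; the abstract does not describe the paper's actual construction.

```python
from itertools import combinations
from math import factorial


def auc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def coalition_auc(X, labels, subset):
    """Assumed value function: AUC of a score that sums the features in `subset`."""
    if not subset:
        return 0.5  # no features: chance-level AUC
    scores = [sum(row[j] for j in subset) for row in X]
    return auc(scores, labels)


def shapley_auc(X, labels):
    """Exact Shapley value of each feature with respect to the AUC value function."""
    n = len(X[0])
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Standard Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for S in combinations(others, size):
                gain = coalition_auc(X, labels, S + (i,)) - coalition_auc(X, labels, S)
                phi[i] += weight * gain
    return phi


# Toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0, 1], [1, 0], [2, 1], [3, 0]]
y = [0, 0, 1, 1]
phi = shapley_auc(X, y)
print(phi)
```

By the efficiency property of Shapley values, the contributions sum to the full-model AUC minus the chance-level baseline. Exact enumeration is exponential in the number of features, so it is only practical for small feature sets.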

A Generic Methodology for the Statistically Uniform & Comparable Evaluation of Automated Trading Platform Components

Artur Sokolovsky, Luca Arnaboldi

Category: Machine Learning

2020-09-21

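Although machine learning approaches have been widely used in finance, to very successful degrees, they remain bespoke to specific investigations and opaque in terms of explainability, comparability, and reproducibility. The primary objective of this research was to shed light on this field by providing a generic methodology that is investigation-agnostic and interpretable to financial markets practitioners, thus enhancing their efficiency, lowering barriers to entry, and increasing the reproducibility of experiments. The proposed methodology is showcased on two automated trading platform components: price levels, a well-known trading pattern, and a novel 2-step feature extraction method. The methodology relies on hypothesis testing, which is widely applied in other social and scientific disciplines, to evaluate concrete results beyond simple classification accuracy. The main hypothesis was formulated to evaluate whether the selected trading pattern is suitable for use in a machine learning setting. Throughout the experiments, we found that the use of the considered trading pattern in a machine learning setting is only partially supported by statistics, resulting in insignificant effect sizes (Rebound 7: $0.64 \pm 1.02$, Rebound 11: $0.38 \pm 0.98$, and Rebound 15: $1.05 \pm 1.16$), but allowing the rejection of the null hypothesis. We showcase the generic methodology on a US futures market instrument and provide evidence that it readily yields informative metrics beyond the traditional performance and profitability metrics. This work is one of the first to apply such a rigorous, statistically backed approach to the field of financial markets, and we hope it may be a springboard for more research.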

Improving debris flow evacuation alerts in Taiwan using machine learning

Yi-Lin Tsai, Jeremy Irvin, Suhas Chundi, João Estacio Gaspar Araujo, Andrew Y. Ng, Christopher B. Field, Peter K. Kitanidis

Category: Machine Learning | Artificial Intelligence

2022-08-27

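Taiwan has the highest susceptibility to, and the most fatalities from, debris flows worldwide. The existing debris flow warning system in Taiwan uses a time-weighted measure of rainfall and issues an alert when that measure exceeds a predefined threshold. However, this system generates many false alarms and misses a substantial fraction of actual debris flows. To improve the system, we implemented five machine learning models that take historical rainfall data as input and predict whether a debris flow will occur within a selected time window. We found that a random forest model performed best among the five models and outperformed Taiwan's existing system. Furthermore, we identified the rainfall trajectories strongly related to debris flow occurrence and explored the trade-off between the risk of missing debris flows and frequent false alerts. These results suggest the potential for machine learning models trained on hourly rainfall data alone to save lives while reducing false alerts.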
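
The baseline rule described above (alert when a time-weighted rainfall measure exceeds a threshold) can be sketched as follows. The exponential weighting, the `decay` value, and the threshold are hypothetical choices for illustration only; the abstract does not give the formula or parameters of Taiwan's operational system.

```python
def time_weighted_rainfall(hourly_rain_mm, decay):
    """Running sum in which rain from k hours ago is weighted by decay**k (assumed form)."""
    weighted, total = [], 0.0
    for rain in hourly_rain_mm:
        total = decay * total + rain
        weighted.append(total)
    return weighted


def threshold_alerts(hourly_rain_mm, decay, threshold_mm):
    """Alert in every hour whose weighted rainfall exceeds the threshold."""
    return [w > threshold_mm for w in time_weighted_rainfall(hourly_rain_mm, decay)]


# Hypothetical hourly rainfall (mm): one burst followed by dry hours.
rain = [0.0, 10.0, 0.0, 0.0]
alerts = threshold_alerts(rain, decay=0.5, threshold_mm=4.0)
print(alerts)
```

A single scalar threshold like this cannot distinguish between rainfall trajectories that reach the same weighted total in different ways, which is one plausible reason a learned model over the hourly series could do better.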

User-click Modelling for Predicting Purchase Intent

Simone Borg Bruun

Category: Machine Learning

2021-12-03

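This work addresses the open actuarial mathematics problem of modelling user behaviour with machine learning methods in order to predict purchase intent for non-life insurance products. It is valuable for a company to understand how users interact with its website, as this provides rich and individualized insight into consumer behaviour. Most existing research on user behaviour modelling aims to explain or predict clicks on a search engine result page, or to estimate click-through rates in sponsored search. These models are based on concepts about users' examination patterns of a web page and the web page's representation of items. Investigating the problem of modelling user behaviour to predict purchase intent on a business website, we observe that a user's intent depends strongly on how the user navigates the website: how many different web pages the user visited, what kinds of web pages the user interacted with, and how much time the user spent on each page. Inspired by these findings, we propose two different ways of representing the features of a user session, leading to two models for click-based purchase prediction: one based on a feed-forward neural network and another based on a recurrent neural network. We examine how discriminative user clicks are for predicting purchase intent by comparing these two models with a model that uses the user's demographic features. Our experimental results show that our click-based models significantly outperform the demographic model in terms of standard classification evaluation metrics, and that a model based on a sequential representation of user clicks yields slightly better performance than a model based on feature engineering of clicks.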

Deep Learning based Urban Vehicle Trajectory Analytics

Seongjin Choi

Category: Machine Learning

2021-11-15

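A "trajectory" is a trace generated by a moving object in geographical space, usually represented by a series of chronologically ordered points, where each point consists of a geospatial coordinate set and a timestamp. Rapid advances in location sensing and wireless communication technology have enabled us to collect and store massive amounts of trajectory data. As a result, many researchers use trajectory data to analyze the mobility of various moving objects. In this work, we focus on "urban vehicle trajectories", meaning the trajectories of vehicles in urban traffic networks, and on "urban vehicle trajectory analytics". Urban vehicle trajectory analytics offers unprecedented opportunities to understand vehicle movement patterns in urban traffic networks, including both user-centric travel experiences and system-wide spatiotemporal patterns. The spatiotemporal features of urban vehicle trajectory data are structurally correlated with each other, and many previous researchers have used various methods to understand this structure. In particular, deep learning models are attracting the attention of many researchers because of their powerful function approximation and feature representation abilities. The objective of this work is therefore to develop deep-learning-based models for urban vehicle trajectory analytics to better understand the mobility patterns of urban traffic networks. Specifically, it focuses on two research topics of high necessity, importance, and applicability: next location prediction and synthetic trajectory generation. In this study, we propose various novel deep learning models for urban vehicle trajectory analytics.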

Comparison and Evaluation of Methods for a Predict+Optimize Problem in Renewable Energy

Christoph Bergmeir, Frits de Nijs, Abishek Sriramulu, Mahdi Abolghasemi, Richard Bean, John Betts, Quang Bui, Nam Trong Dinh, Nils Einecke, Rasul Esmaeilbeigi

Category: Artificial Intelligence

2022-12-21

Algorithms that involve both forecasting and optimization are at the core of solutions to many difficult real-world problems, such as in supply chains (inventory optimization), traffic, and in the transition towards carbon-free energy generation in battery/load/production scheduling in sustainable energy systems. Typically, in these scenarios we want to solve an optimization problem that depends on unknown future values, which therefore need to be forecast. As both forecasting and optimization are difficult problems in their own right, relatively little research has been done in this area. This paper presents the findings of the "IEEE-CIS Technical Challenge on Predict+Optimize for Renewable Energy Scheduling," held in 2021. We present a comparison and evaluation of the seven highest-ranked solutions in the competition, to provide researchers with a benchmark problem and to establish the state of the art for this benchmark, with the aim of fostering and facilitating research in this area. The competition used data from the Monash Microgrid, as well as weather data and energy market data. It then focused on two main challenges: forecasting renewable energy production and demand, and obtaining an optimal schedule for the activities (lectures) and on-site batteries that leads to the lowest cost of energy. The most accurate forecasts were obtained by gradient-boosted tree and random forest models, and optimization was mostly performed using mixed integer linear and quadratic programming. The winning method predicted different scenarios and optimized over all scenarios jointly using a sample average approximation method.
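
To make the idea of sample average approximation concrete, here is a minimal sketch on a toy single-decision problem: pick the quantity that minimizes the average cost over a set of forecast scenarios. The cost function, the cost parameters, and the scenario values are all invented for illustration; the competition's actual problem (scheduling lectures and batteries via mixed integer programming) is far richer and is not reproduced here.

```python
def scenario_cost(quantity, demand, over_cost=1.0, under_cost=3.0):
    """Toy cost: pay for surplus and (more heavily) for shortfall. Hypothetical parameters."""
    surplus = max(quantity - demand, 0)
    shortfall = max(demand - quantity, 0)
    return over_cost * surplus + under_cost * shortfall


def saa_decision(candidates, scenarios):
    """Choose the candidate with the lowest cost averaged over all sampled scenarios."""
    def average_cost(quantity):
        return sum(scenario_cost(quantity, d) for d in scenarios) / len(scenarios)

    best = min(candidates, key=average_cost)
    return best, average_cost(best)


# Three hypothetical demand scenarios produced by some forecaster.
scenarios = [8, 10, 12]
best, cost = saa_decision(range(0, 21), scenarios)
print(best, cost)
```

Because shortfall is penalized three times as heavily as surplus, the jointly optimal decision sits at the upper end of the scenarios rather than at their mean, which is the kind of effect that optimizing against a single point forecast would miss.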

Multi-Objective Hyperparameter Optimization -- An Overview

Florian Karl, Tobias Pielok, Julia Moosbauer, Florian Pfisterer, Stefan Coors, Martin Binder, Lennart Schneider, Janek Thomas, Jakob Richter, Michel Lang

Category: Machine Learning | Machine Learning (Statistics)

2022-06-15

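Hyperparameter optimization constitutes a large part of typical modern machine learning workflows. This arises from the fact that machine learning methods and their corresponding preprocessing steps often yield optimal performance only when hyperparameters are properly tuned. In many applications, however, we are not interested in optimizing ML pipelines solely for predictive accuracy; additional metrics or constraints must be considered when determining an optimal configuration, resulting in a multi-objective optimization problem. This is often neglected in practice, owing to a lack of knowledge and of readily available software implementations for multi-objective hyperparameter optimization. In this work, we introduce the reader to the basics of multi-objective hyperparameter optimization and motivate its usefulness in applied ML. Furthermore, we provide an extensive survey of existing optimization strategies from the domains of both evolutionary algorithms and Bayesian optimization. We illustrate the utility of MOO in several specific ML applications, considering objectives such as operating conditions, prediction time, sparseness, fairness, interpretability, and robustness.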
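
With more than one objective there is usually no single best configuration, only a set of non-dominated trade-offs (the Pareto front). The sketch below filters a handful of configurations down to that set. The configuration names and their (validation error, prediction time) values are made up for the example, and both objectives are assumed to be minimized.

```python
def dominates(a, b):
    """True if `a` is no worse than `b` on every objective and strictly better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(configs):
    """Names of configurations that no other configuration dominates."""
    return [
        name
        for name, objectives in configs.items()
        if not any(
            dominates(other, objectives)
            for other_name, other in configs.items()
            if other_name != name
        )
    ]


# Hypothetical results: (validation error, prediction time in ms), both minimized.
configs = {
    "small": (0.20, 1.0),
    "medium": (0.12, 5.0),
    "large": (0.10, 20.0),
    "bloated": (0.13, 25.0),
}
front = pareto_front(configs)
print(front)
```

The pairwise check is quadratic in the number of configurations, which is fine for a few hundred evaluated configurations. Evolutionary and Bayesian multi-objective optimizers differ mainly in how they propose new configurations to push this front outward.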

Integrating Machine Learning with Discrete Event Simulation for Improving Health Referral Processing in a Care Management Setting

Mohammed Mahyoub

Category: Machine Learning

2022-06-25

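Post-discharge care management coordinates patients' referrals to improve their health after they are discharged from hospital, especially for elderly and chronically ill patients. In a care management setting, health referrals are processed by a specialized unit of the managed care organization (MCO), which interacts with many other entities, including inpatient hospitals, insurance companies, and post-discharge care providers. This paper proposes a machine-learning-guided discrete event simulation framework to improve health referral processing. Random-forest-based prediction models are developed to predict length of stay (LOS) and referral type. Two simulation models are constructed to represent, respectively, the as-is configuration of the referral processing system and the intelligent system after incorporating the prediction functionality. Incorporating a prediction module into the referral processing system to plan and prioritize referrals improved overall performance by reducing the average referral creation delay. This research emphasizes the role of post-discharge care management in improving health quality and reducing associated costs. The paper also demonstrates how integrated systems engineering methods can be used to improve the processes of complex healthcare systems.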

Applying Machine Learning to Life Insurance: some knowledge sharing to master it

Antoine Chancel, Laura Bradier, Antoine Ly, Razvan Ionescu, Laurene Martin

Category: Machine Learning (Statistics) | Machine Learning

2022-09-05

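Machine learning permeates many industries, bringing companies new sources of benefit. In the life insurance industry, however, machine learning is not widely used in practice, because statistical models have proven their efficiency for risk assessment over the past years. Insurers may therefore find it difficult to assess the value of artificial intelligence. Focusing on how the life insurance industry has changed over time highlights what is at stake for insurers in using machine learning and the benefits it can bring by unlocking the value of data. This paper reviews traditional methodologies for survival modeling and extends them with machine learning techniques. It points out the differences from regular machine learning models and emphasizes the importance of specific implementations for handling censored data within the machine learning model family. As a complement to this article, a Python library has been developed. Different open-source machine learning algorithms have been adjusted to fit the specificities of life insurance data, namely censoring and truncation. Such models can easily be applied from this SCOR library to accurately model life insurance risks.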

Understanding transit ridership in an equity context through a comparison of statistical and machine learning algorithms

Elnaz Yousefzadeh Barri, Steven Farber, Hadi Jahanshahi, Eda Beyazit

Category: Machine Learning

2022-11-30

Building an accurate model of travel behaviour based on individuals' characteristics and built environment attributes is of importance for policy-making and transportation planning. Recent experiments with big data and Machine Learning (ML) algorithms toward a better travel behaviour analysis have mainly overlooked socially disadvantaged groups. Accordingly, in this study, we explore the travel behaviour responses of low-income individuals to transit investments in the Greater Toronto and Hamilton Area, Canada, using statistical and ML models. We first investigate how the model choice affects the prediction of transit use by the low-income group. This step includes comparing the predictive performance of traditional and ML algorithms and then evaluating a transit investment policy by contrasting the predicted activities and the spatial distribution of transit trips generated by vulnerable households after improving accessibility. We also empirically investigate the proposed transit investment by each algorithm and compare it with the city of Brampton's future transportation plan. While, unsurprisingly, the ML algorithms outperform classical models, there are still doubts about using them due to interpretability concerns. Hence, we adopt recent local and global model-agnostic interpretation tools to interpret how the model arrives at its predictions. Our findings reveal the great potential of ML algorithms for enhanced travel behaviour predictions for low-income strata without considerably sacrificing interpretability.

Augmented cross-selling through explainable AI -- a case from energy retailing

Felix Haag, Konstantin Hopf, Pedro Menelau Vasconcelos, Thorsten Staake

Category: Machine Learning | Artificial Intelligence

2022-08-24

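Advances in machine learning (ML) have led to strong interest in using this technology to support decision making. While complex ML models provide predictions that are often more accurate than those of traditional tools, such models often hide the reasoning behind a prediction from their users, which can lead to lower adoption and a lack of insight. Motivated by this tension, research has put forth explainable artificial intelligence (XAI) techniques that uncover the patterns discovered by ML. Despite the high hopes for both ML and XAI, there is little empirical evidence of their benefits to traditional businesses. To this end, we analyze data on 220,185 customers of an energy retailer, predict cross-purchases with up to 86% correctness (AUC), and show that the XAI method SHAP provides explanations that hold for actual buyers. We further outline implications for research in information systems, XAI, and relationship marketing.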

A Comprehensive Review of Digital Twin -- Part 2: Roles of Uncertainty Quantification and Optimization, a Battery Digital Twin, and Perspectives

Adam Thelen, Xiaoge Zhang, Olga Fink, Yan Lu, Sayan Ghosh, Byeng D. Youn, Michael D. Todd, Sankaran Mahadevan, Chao Hu, Zhen Hu

Category: Machine Learning

2022-08-27

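As an emerging technology in the era of Industry 4.0, the digital twin is gaining unprecedented attention because of its promise to further optimize process design, quality control, health monitoring, decision and policy making, and more, by comprehensively modeling the physical world as a group of interconnected digital models. In a two-part series of papers, we examine the fundamental role of different modeling techniques, twinning enabling technologies, and the uncertainty quantification and optimization methods commonly used in digital twins. This second paper presents a literature review of key enabling technologies of digital twins, with an emphasis on uncertainty quantification, optimization methods, open-source datasets and tools, major findings, challenges, and future directions. The discussion focuses on current methods of uncertainty quantification and optimization and on how they are applied in the different dimensions of a digital twin. In addition, this paper presents a case study in which a battery digital twin is constructed and tested to illustrate some of the modeling and twinning methods reviewed in this two-part review. The code and preprocessed data used to generate all results and figures in the case study are available on GitHub.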

AcME -- Accelerated Model-agnostic Explanations: Fast Whitening of the Machine-Learning Black Box

David Dandolo, Chiara Masiero, Mattia Carletti, Davide Dalle Pezze, Gian Antonio Susto

Category: Machine Learning

2021-12-23

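In the context of human-in-the-loop machine learning applications, such as decision support systems, interpretability approaches should provide actionable insights without making users wait. In this paper, we propose Accelerated Model-agnostic Explanations (AcME), an interpretability approach that quickly provides feature importance scores at both the global and the local level. AcME can be applied a posteriori to any regression or classification model. AcME not only computes a feature ranking, but also provides a what-if analysis tool for assessing how changes in feature values would affect model predictions. We evaluated the proposed approach on synthetic and real-world datasets, and also compared it with SHapley Additive exPlanations (SHAP), the approach we drew inspiration from, which is currently one of the state-of-the-art model-agnostic interpretability approaches. We achieved comparable results in terms of the quality of the explanations produced, while dramatically reducing computation time and providing consistent visualizations for global and local interpretations. To foster research in this field, and for the sake of reproducibility, we also provide a repository with the code used for the experiments.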

Analyzing Machine Learning Models for Credit Scoring with Explainable AI and Optimizing Investment Decisions

Swati Tyagi

Category: Machine Learning | Machine Learning (Statistics)

2022-09-19

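This paper examines two different yet related questions concerning explainable AI (XAI) practices. Machine learning (ML) is increasingly important in financial services, such as pre-approval, credit underwriting, investment, and various front-end and back-end activities. Machine learning can automatically detect non-linearities and interactions in training data, facilitating faster and more accurate credit decisions. However, machine learning models are opaque and hard to explain, and explainability is a critical element in establishing a reliable technology. The study compares various machine learning models, including single classifiers (logistic regression, decision trees, LDA, QDA), heterogeneous ensembles (AdaBoost, random forest), and sequential neural networks. The results indicate that the ensemble classifiers and neural networks outperform the single classifiers. In addition, two advanced post-hoc model-agnostic explainability techniques, LIME and SHAP, are used to assess ML-based credit scoring models on the open-access datasets offered by the US-based P2P lending platform Lending Club. For this study, we also use machine learning algorithms to develop new investment models and explore portfolio strategies that can maximize profitability while minimizing risk.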

Automatic Identification and Classification of Share Buybacks and their Effect on Short-, Mid- and Long-Term Returns

Thilo Reintjes

Category: Artificial Intelligence | Machine Learning

2022-09-26

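This thesis investigates share buybacks, specifically share buyback announcements. It addresses how to recognize such announcements, the excess return of share buybacks, and the prediction of returns after a share buyback announcement. We illustrate two NLP approaches for the automated detection of share buyback announcements. Even with very small amounts of training data, we achieve an accuracy of up to 90%. The thesis uses these NLP methods to generate a large dataset of 57,155 share buyback announcements. By analyzing this dataset, the thesis aims to show that most companies that announce a share buyback underperform the MSCI World. A minority of companies, however, significantly outperform the MSCI World, and this significant outperformance leads to a net gain when averaging over all companies. If the benchmark index is adjusted for the size of the respective companies, the average outperformance disappears and the majority underperforms by an even greater margin. However, companies that announce a share buyback with a volume of at least 1% of their market capitalization deliver, on average, significant outperformance even against the adjusted benchmark. Companies that announce share buybacks in times of crisis were also found to fare better than the overall market. In addition, the generated dataset was used to train 72 machine learning models. This made it possible to find many strategies that achieve an accuracy of up to 77% and generate large excess returns. A variety of performance indicators could be improved across six different time frames, and significant outperformance was identified. This was achieved by training several models for different tasks and time frames and by combining these models, fusing weak learners into one strong learner to obtain a significant improvement.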

IoT Data Analytics in Dynamic Environments: From An Automated Machine Learning Perspective

Li Yang, Abdallah Shami

Category: Machine Learning

2022-09-16

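With the wide spread of sensors and smart devices in recent years, the data generation speed of Internet of Things (IoT) systems has increased dramatically. In IoT systems, massive volumes of data must be processed, transformed, and analyzed frequently to enable various IoT services and functionalities. Machine learning (ML) approaches have shown their capacity for IoT data analytics. However, applying ML models to IoT data analytics tasks still faces many difficulties and challenges, specifically effective model selection, design/tuning, and updating, which have created massive demand for experienced data scientists. In addition, the dynamic nature of IoT data may introduce concept drift, causing model performance to degrade. To reduce human effort, automated machine learning (AutoML) has become a popular field that aims to automatically select, construct, tune, and update machine learning models to achieve the best performance on specified tasks. In this paper, we review existing methods for model selection, tuning, and updating in the area of AutoML, in order to identify and summarize the best solutions for every step of applying ML algorithms to IoT data analytics. To justify our findings and help industrial users and researchers better implement AutoML approaches, a case study applying AutoML to IoT anomaly detection problems is conducted in this work. Lastly, we discuss and classify the challenges and research directions for this domain.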

Explainable Performance

Hué Sullivan, Hurlin Christophe, Pérignon Christophe, Saurin Sébastien

Category: Machine Learning (Statistics) | Machine Learning

2022-12-12

We introduce the XPER (eXplainable PERformance) methodology to measure the specific contribution of the input features to the predictive or economic performance of a model. Our methodology offers several advantages. First, it is both model-agnostic and performance metric-agnostic. Second, XPER is theoretically founded as it is based on Shapley values. Third, the interpretation of the benchmark, which is inherent in any Shapley value decomposition, is meaningful in our context. Fourth, XPER is not plagued by model specification error, as it does not require re-estimating the model. Fifth, it can be implemented either at the model level or at the individual level. In an application based on auto loans, we find that performance can be explained by a surprisingly small number of features. XPER decompositions are rather stable across metrics, yet some feature contributions switch sign across metrics. Our analysis also shows that explaining model forecasts and model performance are two distinct tasks.

A survey on concept drift adaptation

Category:

Concept drift primarily refers to an online supervised learning scenario in which the relation between the input data and the target variable changes over time. Assuming a general knowledge of supervised learning, in this paper we characterize the adaptive learning process, categorize existing strategies for handling concept drift, give an overview of the most representative, distinct, and popular techniques and algorithms, discuss the evaluation methodology of adaptive algorithms, and present a set of illustrative applications. The survey covers the different facets of concept drift in an integrated way to reflect on the existing scattered state of the art. Thus, it aims to provide a comprehensive introduction to concept drift adaptation for researchers, industry analysts, and practitioners.
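
As one concrete instance of detecting such a change, the sketch below monitors the running error rate of an online classifier and signals drift when it rises well above the best level seen so far, in the spirit of the Drift Detection Method (DDM). The abstract itself names no specific algorithm, so this choice, the function name `first_drift`, the 30-sample warm-up, and the synthetic error stream are all assumptions made for illustration.

```python
from math import sqrt


def first_drift(errors, min_samples=30):
    """Return the index at which drift is first signalled, or None.

    `errors` is a stream of 0/1 flags (1 = the online model misclassified).
    """
    n = wrong = 0
    best_p = best_s = float("inf")
    for i, e in enumerate(errors):
        n += 1
        wrong += e
        p = wrong / n                 # running error rate
        s = sqrt(p * (1 - p) / n)     # its binomial standard deviation
        if n < min_samples:
            continue                  # warm-up: estimates are too noisy
        if p + s < best_p + best_s:
            best_p, best_s = p, s     # remember the best level seen so far
        elif p + s >= best_p + 3 * best_s:
            return i                  # error rate rose by about 3 sigma: drift
    return None


# Synthetic stream: 10% errors for 200 samples, then 50% errors.
stable = [1 if i % 10 == 9 else 0 for i in range(200)]
drifted = stable + [1 if i % 2 == 0 else 0 for i in range(200)]
print(first_drift(stable), first_drift(drifted))
```

In an adaptive learner, a signal like this would typically trigger retraining or replacement of the model on recent data; here the function only reports where the change was noticed.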
