我们使用基于模型的k均值算法的概括来提出一种聚类时间序列数据的方法,我们称之为k模型。我们证明了该一般算法的收敛性,并将其与用于混合模型的硬EM算法相关联。然后,我们首先使用AR($ p $)聚类示例应用我们的方法,并展示如何使用最小值的偏置偏差标准使群集算法变得可靠。然后,我们为ARMA($ P,Q $)构建了聚类算法,并将其扩展到Arima($ P,D,Q $)。我们针对基于Ljung-Box统计量拟合的模型开发了拟合统计量的优点。我们使用模拟数据执行实验,以说明如何将算法用于离群检测,检测分布漂移以及讨论初始化方法对空簇的影响。我们还对真实数据进行实验,该实验表明我们的方法与其他现有方法竞争类似的时间序列聚类任务。
translated by 谷歌翻译
我们提出了贝叶斯溢出图(BSG),这是一种学习时间关系,识别关键节点的新方法,并量化了动态系统中多种植体溢出效应的不确定性。BSG通过预测误差差异分解(FEVD)和通过贝叶斯时间序列模型的全面不确定性定量来利用可解释的框架,以根据系统性风险和预测可变性来使时间关系与时间关系进行环境化。预测地平线超参数$ h $允许学习短期和均衡状态网络行为。在各种图和误差规格下识别源和下沉节点的实验显示,针对最先进的贝叶斯网络和深度学习基线的实验表现出显着的性能增长。对现实世界系统的应用还展示了BSG作为探索性分析工具,用于发现间接溢出和量化系统性风险。
translated by 谷歌翻译
我们提出了一种基于生成的对冲网络(GANS)的扩展缺失数据载体方法的条件载荷GaN。激励用例是学习 - 排名,现代搜索,推荐系统和信息检索应用的基石。经验排名数据集并不总是遵循标准高斯分布或完全缺少随机(MCAR)机制,这是经典缺失数据载销方法的标准假设。我们的方法提供了一种简单的解决方案,可提供兼容的估算保证,同时放松缺失机制的假设和近似顽固的分布以提高估算质量。我们证明,对于随机(EMAR)的延伸缺失,实现了最佳的GaN载荷,并且在无随机(OAMAR)机制之外,延伸总是缺少的,超出天真MCAR。我们的方法展示了与最先进的基准和各种特征分布相比的开源Microsoft研究排名(MSR)数据集和合成排名数据集的最高估算质量。使用专有的Amazon搜索排名数据集,我们还展示了与地面真实数据相比训练的对GaN illuted数据训练的排名模型的可比排名质量指标。
translated by 谷歌翻译
Recent years have seen a proliferation of research on adversarial machine learning. Numerous papers demonstrate powerful algorithmic attacks against a wide variety of machine learning (ML) models, and numerous other papers propose defenses that can withstand most attacks. However, abundant real-world evidence suggests that actual attackers use simple tactics to subvert ML-driven systems, and as a result security practitioners have not prioritized adversarial ML defenses. Motivated by the apparent gap between researchers and practitioners, this position paper aims to bridge the two domains. We first present three real-world case studies from which we can glean practical insights unknown or neglected in research. Next we analyze all adversarial ML papers recently published in top security conferences, highlighting positive trends and blind spots. Finally, we state positions on precise and cost-driven threat modeling, collaboration between industry and academia, and reproducible research. We believe that our positions, if adopted, will increase the real-world impact of future endeavours in adversarial ML, bringing both researchers and practitioners closer to their shared goal of improving the security of ML systems.
translated by 谷歌翻译
Deep learning classifiers provide the most accurate means of automatically diagnosing diabetic retinopathy (DR) based on optical coherence tomography (OCT) and its angiography (OCTA). The power of these models is attributable in part to the inclusion of hidden layers that provide the complexity required to achieve a desired task. However, hidden layers also render algorithm outputs difficult to interpret. Here we introduce a novel biomarker activation map (BAM) framework based on generative adversarial learning that allows clinicians to verify and understand classifiers decision-making. A data set including 456 macular scans were graded as non-referable or referable DR based on current clinical standards. A DR classifier that was used to evaluate our BAM was first trained based on this data set. The BAM generation framework was designed by combing two U-shaped generators to provide meaningful interpretability to this classifier. The main generator was trained to take referable scans as input and produce an output that would be classified by the classifier as non-referable. The BAM is then constructed as the difference image between the output and input of the main generator. To ensure that the BAM only highlights classifier-utilized biomarkers an assistant generator was trained to do the opposite, producing scans that would be classified as referable by the classifier from non-referable scans. The generated BAMs highlighted known pathologic features including nonperfusion area and retinal fluid. A fully interpretable classifier based on these highlights could help clinicians better utilize and verify automated DR diagnosis.
translated by 谷歌翻译
Recent advancements in sensing and communication facilitate obtaining high-frequency real-time data from various physical systems like power networks, climate systems, biological networks, etc. However, since the data are recorded by physical sensors, it is natural that the obtained data is corrupted by measurement noise. In this paper, we present a novel algorithm for online real-time learning of dynamical systems from noisy time-series data, which employs the Robust Koopman operator framework to mitigate the effect of measurement noise. The proposed algorithm has three main advantages: a) it allows for online real-time monitoring of a dynamical system; b) it obtains a linear representation of the underlying dynamical system, thus enabling the user to use linear systems theory for analysis and control of the system; c) it is computationally fast and less intensive than the popular Extended Dynamic Mode Decomposition (EDMD) algorithm. We illustrate the efficiency of the proposed algorithm by applying it to identify the Van der Pol oscillator, the IEEE 68 bus system, and a ring network of Van der Pol oscillators.
translated by 谷歌翻译
Pairwise compatibility measure (CM) is a key component in solving the jigsaw puzzle problem (JPP) and many of its recently proposed variants. With the rapid rise of deep neural networks (DNNs), a trade-off between performance (i.e., accuracy) and computational efficiency has become a very significant issue. Whereas an end-to-end DNN-based CM model exhibits high performance, it becomes virtually infeasible on very large puzzles, due to its highly intensive computation. On the other hand, exploiting the concept of embeddings to alleviate significantly the computational efficiency, has resulted in degraded performance, according to recent studies. This paper derives an advanced CM model (based on modified embeddings and a new loss function, called hard batch triplet loss) for closing the above gap between speed and accuracy; namely a CM model that achieves SOTA results in terms of performance and efficiency combined. We evaluated our newly derived CM on three commonly used datasets, and obtained a reconstruction improvement of 5.8% and 19.5% for so-called Type-1 and Type-2 problem variants, respectively, compared to best known results due to previous CMs.
translated by 谷歌翻译
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
translated by 谷歌翻译
Artificial Intelligence (AI) is having a tremendous impact across most areas of science. Applications of AI in healthcare have the potential to improve our ability to detect, diagnose, prognose, and intervene on human disease. For AI models to be used clinically, they need to be made safe, reproducible and robust, and the underlying software framework must be aware of the particularities (e.g. geometry, physiology, physics) of medical data being processed. This work introduces MONAI, a freely available, community-supported, and consortium-led PyTorch-based framework for deep learning in healthcare. MONAI extends PyTorch to support medical data, with a particular focus on imaging, and provide purpose-specific AI model architectures, transformations and utilities that streamline the development and deployment of medical AI models. MONAI follows best practices for software-development, providing an easy-to-use, robust, well-documented, and well-tested software framework. MONAI preserves the simple, additive, and compositional approach of its underlying PyTorch libraries. MONAI is being used by and receiving contributions from research, clinical and industrial teams from around the world, who are pursuing applications spanning nearly every aspect of healthcare.
translated by 谷歌翻译
了解强化学习(RL)代理的新兴行为可能很困难,因为这种代理通常使用高度复杂的决策程序在复杂的环境中进行训练。这引起了RL中解释性的多种方法,旨在调和可能在主体行为与观察者预期的行为之间产生的差异。最近的方法取决于域知识,这可能并非总是可用的,分析代理商的策略,或者是对基础环境的特定要素的分析,通常被建模为马尔可夫决策过程(MDP)。我们的主要主张是,即使基本的MDP尚不完全了解(例如,尚未准确地了解过渡概率),也没有由代理商维护(即,在使用无模型方法时),但仍可以利用它为自动生成解释。为此,我们建议使用以前在文献中使用的正式MDP抽象和转换来加快寻找最佳策略的搜索,以自动产生解释。由于这种转换通常基于环境的符号表示,因此它们可能代表了预期和实际代理行为之间差距的有意义的解释。我们正式定义了这个问题,建议一类可用于解释新兴行为的转换,并提出了有效搜索解释的方法。我们演示了一组标准基准测试的方法。
translated by 谷歌翻译