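We consider online reinforcement learning in mean-field games. In contrast to existing works, we alleviate the need for a mean-field oracle by developing an algorithm that estimates the mean field and the optimal policy using a single sample path of the generic agent. We call this sandbox learning, as it can be used as a warm start for any agent operating in a multi-agent non-cooperative setting. We adopt a two-timescale approach in which an online fixed-point recursion for the mean field operates on a slower timescale, in tandem with a control policy update on a faster timescale for the generic agent. Under a sufficient exploration condition, we provide finite-sample convergence guarantees in terms of convergence of the mean field and control policy to the mean-field equilibrium. The sample complexity of the sandbox learning algorithm is $\mathcal{O}(\epsilon^{-4})$. Finally, we empirically demonstrate the effectiveness of the sandbox learning algorithm in a traffic congestion game.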
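The two-timescale idea described above can be illustrated with a toy stochastic-approximation loop. This is only a sketch under strong assumptions, not the paper's algorithm: the scalar maps `best_response` and `induced_mean_field`, the step-size exponents, and the noise level are all hypothetical stand-ins chosen so that the coupled recursion has a known fixed point.

```python
import random


def best_response(mu):
    # Hypothetical stand-in for the generic agent's optimal control
    # given the current mean-field estimate mu.
    return 0.5 * mu + 1.0


def induced_mean_field(theta):
    # Hypothetical stand-in for the mean field generated when the
    # population follows the policy parameter theta.
    return 0.5 * theta


def two_timescale(num_steps=20000, noise=0.1, seed=0):
    """Run noisy coupled updates: fast policy, slow mean field."""
    rng = random.Random(seed)
    theta, mu = 0.0, 0.0
    for k in range(1, num_steps + 1):
        fast = k ** -0.6  # larger step size: control policy update
        slow = k ** -0.9  # smaller step size: mean-field update (slow / fast -> 0)
        theta += fast * (best_response(mu) - theta + rng.gauss(0.0, noise))
        mu += slow * (induced_mean_field(theta) - mu + rng.gauss(0.0, noise))
    return theta, mu


if __name__ == "__main__":
    theta, mu = two_timescale()
    # For these toy maps the fixed point is theta = 4/3, mu = 2/3.
    print(theta, mu)
```

Because the mean-field step size shrinks faster than the policy step size, the policy iterate sees an almost frozen mean field and tracks its best response, while the mean field drifts slowly toward the fixed point of the composed map.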
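In this paper, we provide an extension of confidence sequences to settings where the variance of the data-generating distribution does not exist or is infinite. Confidence sequences furnish confidence intervals that are valid at arbitrary data-dependent stopping times and therefore have a wide range of applications. We first establish a lower bound on the width of Catoni-style confidence sequences in the finite-variance case to highlight the looseness of existing results. Next, we derive tight Catoni-style confidence sequences for data distributions with a relaxed bounded $p^{th}$ moment, where $p \in (1,2]$, and strengthen the results for the finite-variance case $p = 2$. The resulting confidence sequences are shown to be better than those obtained using the Dubins-Savage inequality.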
Deep latent variable models have achieved significant empirical successes in model-based reinforcement learning (RL) due to their expressiveness in modeling complex transition dynamics. On the other hand, it remains unclear, both theoretically and empirically, how latent variable models may facilitate learning, planning, and exploration to improve the sample efficiency of RL. In this paper, we provide a representation view of latent variable models for state-action value functions, which allows both a tractable variational learning algorithm and an effective implementation of the optimism/pessimism principle in the face of uncertainty for exploration. In particular, we propose a computationally efficient planning algorithm with UCB exploration by incorporating kernel embeddings of latent variable models. Theoretically, we establish the sample complexity of the proposed approach in the online and offline settings. Empirically, we demonstrate superior performance over current state-of-the-art algorithms across various benchmarks.
We present explainable Artificial Intelligence (AI) in the form of an interpretable and semiautomatic approach to stage grading of ocular pathologies such as diabetic retinopathy, hypertensive retinopathy, and other retinopathies against the backdrop of major systemic diseases. The experimental study aims to evaluate an explainable staged grading process without using deep Convolutional Neural Networks (CNNs) directly. Many current CNN-based deep neural networks used for diagnosing retinal disorders might have appreciable performance but fail to pinpoint the basis driving their decisions. To improve the transparency of these decisions, we propose a clinician-in-the-loop assisted intelligent workflow that performs a retinal vascular assessment on fundus images to derive quantifiable and descriptive parameters. The retinal vessel parameter meta-data serve as hyper-parameters for better interpretation and explainability of decisions. The semiautomatic methodology aims at a federated approach to AI in healthcare applications, with more inputs and interpretations from clinicians. The baseline process in the machine learning pipeline involves image processing techniques for optic disc detection, vessel segmentation, and arteriole/venule identification.
Soft actuators have attracted a great deal of interest in the context of rehabilitative and assistive robots for increasing safety and lowering costs as compared to rigid-body robotic systems. During actuation, soft actuators experience high levels of deformation, which can lead to microscale fractures in their elastomeric structure. These fractures fatigue the system over time and eventually lead to macroscale damage and failure. This paper reports finite element modeling (FEM) of pneu-nets at high angles, along with repetitive experimentation at high deformation rates, in order to study the effect and behavior of fatigue in soft robotic actuators, which results in deviation from the ideal behavior. Comparing the FEM model and experimental data, we show that FEM can model the performance of the actuator before fatigue up to a bending angle of 167 degrees with ~96% accuracy. We also show that the accuracy of the FEM model drops to 80% due to fatigue after repetitive high-angle bending. The results of this paper objectively highlight the emergence of fatigue over cyclic activation of the system and the resulting deviation from the computational FEM model. Such behavior can be considered in future controllers to adapt the system to the time-variable and non-autonomous response dynamics of soft robots.
Collecting sufficient labeled data for spoken language understanding (SLU) is expensive and time-consuming. Recent studies achieved promising results by using pre-trained models in low-resource scenarios. Inspired by this, we ask: which (if any) pre-training strategies can improve performance across SLU benchmarks? To answer this question, we employ four types of pre-trained models and their combinations for SLU. We leverage self-supervised speech and language models (LM) pre-trained on large quantities of unpaired data to extract strong speech and text representations. We also explore using supervised models pre-trained on larger external automatic speech recognition (ASR) or SLU corpora. We conduct extensive experiments on the SLU Evaluation (SLUE) benchmark and observe self-supervised pre-trained models to be more powerful, with pre-trained LM and speech models being most beneficial for the Sentiment Analysis and Named Entity Recognition tasks, respectively.
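Recent research has revealed undesirable biases in NLP data and models. However, these efforts focus on social disparities in the West and are not directly portable to other geo-cultural contexts. In this paper, we focus on NLP fairness in the context of India. We start with a brief account of the prominent axes of social disparity in India. We build resources for fairness evaluation in the Indian context and use them to demonstrate prediction biases along some of these axes. We then delve deeper into social stereotypes for region and religion, demonstrating their prevalence in corpora and models. Finally, we outline a holistic research agenda to re-contextualize NLP fairness research for the Indian context, accounting for the Indian societal context, bridging technological gaps in capability and resources, and adapting to Indian cultural values. While we focus here on "India", this framework can be generalized for recontextualization in other geo-cultural contexts.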
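Epidemic models are powerful tools for understanding infectious disease. However, as they increase in size and complexity, they can quickly become computationally intractable. Recent progress in modelling methodology has shown that surrogate models can be used to emulate complex epidemic models with a high-dimensional parameter space. We show that deep sequence-to-sequence (seq2seq) models can serve as accurate surrogates for complex epidemic models with sequence-based model parameters, effectively replicating seasonal and long-term transmission dynamics. Once trained, our surrogate can predict scenarios several thousand times faster than the original model, making it well suited to policy exploration. We demonstrate that replacing a traditional epidemic model with a learned simulator facilitates robust Bayesian inference.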
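In learning-to-rank problems, a privileged feature is one that is available during model training but not at test time. Such features arise naturally in merchandise recommendation systems; for instance, "user clicked this item" as a feature is predictive of "user purchased this item" in offline data, but is clearly not available during online serving. Another source of privileged features is those that are too expensive to compute online but feasible to add offline. Privileged features distillation (PFD) refers to a natural idea: train a "teacher" model using all features (including privileged ones) and then use it to train a "student" model that does not use the privileged features. In this paper, we first study PFD empirically on three public ranking datasets and an industrial-scale ranking problem derived from Amazon's logs. We show that PFD outperforms several baselines (no-distillation, pretraining-finetuning, self-distillation, and generalized distillation) on all of these datasets. Next, we analyze why and when PFD performs well via both empirical ablation studies and theoretical analysis for linear models. Both investigations uncover an interesting non-monotone behavior: as the predictive power of a privileged feature increases, the performance of the resulting student model initially increases but then decreases. We show that the reason for the later decrease is that a very predictive privileged teacher produces predictions with high variance, which lead to high-variance student estimates and inferior test performance.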
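The teacher/student recipe described above can be sketched on a toy linear regression problem. This is a simplified illustration under stated assumptions, not the paper's ranking setup: the synthetic data, the mixing weight `alpha`, and the helper names `fit_teacher`, `fit_student`, and `distill` are all hypothetical.

```python
import random


def fit_teacher(xs, zs, ys):
    # Least squares for y ~ a*x + b*z via the 2x2 normal equations.
    # z plays the role of the privileged feature.
    sxx = sum(x * x for x in xs)
    szz = sum(z * z for z in zs)
    sxz = sum(x * z for x, z in zip(xs, zs))
    sxy = sum(x * y for x, y in zip(xs, ys))
    szy = sum(z * y for z, y in zip(zs, ys))
    det = sxx * szz - sxz * sxz
    a = (sxy * szz - sxz * szy) / det
    b = (sxx * szy - sxz * sxy) / det
    return a, b


def fit_student(xs, targets):
    # Least squares for target ~ w*x; the student never sees z.
    return sum(x * t for x, t in zip(xs, targets)) / sum(x * x for x in xs)


def distill(xs, zs, ys, alpha=0.5):
    # Train the teacher on all features, then train the student on a
    # mix of true labels and teacher predictions (alpha is an assumed knob).
    a, b = fit_teacher(xs, zs, ys)
    soft = [a * x + b * z for x, z in zip(xs, zs)]
    targets = [alpha * y + (1.0 - alpha) * s for y, s in zip(ys, soft)]
    return fit_student(xs, targets)


def make_data(n=2000, seed=0):
    # Synthetic data: y depends on a regular feature x and a privileged z.
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    zs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [x + z + 0.1 * rng.gauss(0.0, 1.0) for x, z in zip(xs, zs)]
    return xs, zs, ys


if __name__ == "__main__":
    xs, zs, ys = make_data()
    print(distill(xs, zs, ys))  # student weight on x, close to 1 for this data
```

At serving time only `w * x` is evaluated, so the privileged feature is needed during training alone.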
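Physical systems whose dynamics are governed by partial differential equations (PDEs) find applications in numerous fields, from engineering design to weather forecasting. Obtaining solutions of such PDEs can be computationally expensive for large-scale and parameterized problems. In this work, deep learning techniques developed for time-series forecasting, such as LSTM and TCN, or for spatial feature extraction, such as CNN, are employed to model the system dynamics of advection-dominated problems. These models take as input a sequence of high-fidelity vector solutions at consecutive time steps obtained from the PDEs and forecast the solutions at subsequent time steps using auto-regression, thereby reducing the computation time and power needed to obtain such high-fidelity solutions. The models are tested on numerical benchmarks (the 1D Burgers' equation and Stoker's dam-break problem) to assess long-term prediction accuracy, even outside the training domain (extrapolation). Non-intrusive reduced-order modelling techniques such as deep auto-encoder networks are used to compress the high-fidelity snapshots before feeding them to the forecasting models, in order to reduce the complexity and the required computations in the online and offline stages. Deep ensembles are employed to perform uncertainty quantification of the forecasting models, which provides information about the variance of the predictions resulting from epistemic uncertainty.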
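The auto-regressive forecasting loop mentioned above can be sketched generically: a one-step model maps a window of recent states to the next state, and each prediction is fed back as input. This is only a sketch. The names `rollout` and `linear_extrapolation` are hypothetical, and the toy one-step model stands in for a trained LSTM, TCN, or CNN.

```python
def rollout(one_step_model, history, steps, window=2):
    """Forecast `steps` future states by feeding predictions back as inputs."""
    states = list(history)
    for _ in range(steps):
        # The model only ever sees the most recent `window` states,
        # which after the first step include its own predictions.
        states.append(one_step_model(states[-window:]))
    return states[len(history):]


def linear_extrapolation(window):
    # Hypothetical stand-in for a trained network: continue the last trend.
    return 2 * window[-1] - window[-2]


if __name__ == "__main__":
    print(rollout(linear_extrapolation, [0.0, 1.0], steps=3))  # [2.0, 3.0, 4.0]
```

Because errors in early predictions are fed back into later inputs, long rollouts are where accuracy typically degrades, which is why long-term and extrapolation accuracy are evaluated separately.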