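The level of autonomy is increasing in systems spanning multiple domains, but these systems still experience failures. One way to mitigate the risk of failure is to integrate human oversight of autonomous systems and rely on the human to take control when the autonomy fails. In this work, we formulate a method of collaborative decision making through action suggestions that improves action selection without taking control of the system. Our approach uses each suggestion efficiently: it incorporates the implicit information shared through suggestions to modify the agent's belief, and it achieves better performance with fewer suggestions than naively following the suggested actions. We assume that collaborative agents share the same objective and communicate through valid actions. By assuming that the suggested action depends only on the state, we can incorporate the suggested action as an independent observation of the environment. The assumption of a collaborative environment enables us to leverage the agent's policy to estimate the distribution over action suggestions. We propose two methods that use suggested actions and demonstrate the approach through simulated experiments. The proposed methodology increases performance while also being robust to suboptimal suggestions.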
translated by Google Translate
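The sketch below illustrates how a suggested action can be treated as an independent observation of the state, using a discrete Bayes-filter update. It makes several assumptions that are not taken from the paper. It uses a small discrete state space. As one possible way of "using the agent's policy to estimate the distribution over action suggestions", it models the probability of a suggestion as a softmax over the agent's own Q-values. The Q-table, the temperature, and the function names are all hypothetical.

```python
import math


def suggestion_likelihood(q_values, action, temperature=1.0):
    """P(suggested action | state), modeled (as an assumption) as a softmax over Q(s, .)."""
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - m) / temperature) for q in q_values]
    return exps[action] / sum(exps)


def update_belief(belief, q_table, suggested_action, temperature=1.0):
    """Treat the suggestion as an independent observation and apply Bayes' rule."""
    posterior = [b * suggestion_likelihood(q_table[s], suggested_action, temperature)
                 for s, b in enumerate(belief)]
    z = sum(posterior)
    return [p / z for p in posterior]


# Hypothetical example: two hidden states, two actions.
# Action 0 is best in state 0 and action 1 is best in state 1.
q_table = [[1.0, 0.0], [0.0, 1.0]]
print(update_belief([0.5, 0.5], q_table, suggested_action=1))
```

Here, the suggestion of action 1 (optimal only in state 1) shifts a uniform belief to roughly 0.27/0.73 in favor of state 1. The agent still chooses its own action from the updated belief.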
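We generalize the derivation of model predictive path integral control (MPPI) to allow for a single joint distribution across the controls in the control sequence. This reformulation allows adaptive importance sampling (AIS) algorithms to be used in the original importance-sampling step while retaining the benefits of MPPI, such as support for arbitrary system dynamics and cost functions. The benefit of optimizing the proposal distribution by integrating AIS at each control step is demonstrated in simulated environments, including controlling multiple cars around a track. The new algorithm is more sample-efficient than MPPI, achieving better performance with fewer samples, and this performance gap grows as the dimension of the action space increases. Simulation results suggest that the new algorithm can be used as an anytime algorithm, increasing the value of the control at each iteration rather than relying on a large set of samples.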
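The central idea above is to treat the whole control sequence as one joint sampling distribution and to adapt that proposal over several importance-sampling rounds within a single control step. The sketch below is a simplified illustration under stated assumptions, not the authors' algorithm:

- The proposal is a diagonal Gaussian over the full sequence.
- The adaptation is a population-Monte-Carlo-style moment match with self-normalized weights proportional to exp(-S/λ)·p(U)/q(U). Here p is the MPPI base distribution and q is the current proposal.
- The 1-D single-integrator dynamics and quadratic cost in the hypothetical `rollout_cost` are a toy problem.
- `ais_mppi_step` and its parameters are illustrative names.

```python
import math
import random


def rollout_cost(x0, controls, target=1.0, dt=0.1, r=0.01):
    """Hypothetical toy system: 1-D single integrator x_{t+1} = x_t + u_t * dt."""
    x, cost = x0, 0.0
    for u in controls:
        x += u * dt
        cost += (x - target) ** 2 + r * u ** 2  # tracking error plus control effort
    return cost


def log_gauss(u, mean, std):
    """Log-density of a diagonal Gaussian over the whole sequence (2*pi constant dropped)."""
    return sum(-0.5 * ((ui - mi) / si) ** 2 - math.log(si)
               for ui, mi, si in zip(u, mean, std))


def ais_mppi_step(x0, base_mean, base_std=1.0, lam=1.0,
                  n_samples=64, n_iters=4, min_std=0.1, seed=0):
    """One control step: adapt a joint proposal q over the sequence, return its mean."""
    rng = random.Random(seed)
    horizon = len(base_mean)
    base_sd = [base_std] * horizon
    mean, std = list(base_mean), list(base_sd)  # first round is plain MPPI sampling
    for _ in range(n_iters):
        samples = [[rng.gauss(m, s) for m, s in zip(mean, std)]
                   for _ in range(n_samples)]
        # Self-normalized weights for the target exp(-S/lam) * p(U), sampled from q.
        log_w = [-rollout_cost(x0, u) / lam
                 + log_gauss(u, base_mean, base_sd) - log_gauss(u, mean, std)
                 for u in samples]
        top = max(log_w)
        w = [math.exp(lw - top) for lw in log_w]
        total = sum(w)
        w = [wi / total for wi in w]
        # Moment-match the proposal to the weighted samples (the adaptive step).
        mean = [sum(wi * u[t] for wi, u in zip(w, samples)) for t in range(horizon)]
        std = [max(min_std, math.sqrt(sum(wi * (u[t] - mean[t]) ** 2
                                          for wi, u in zip(w, samples))))
               for t in range(horizon)]
    return mean


plan = ais_mppi_step(x0=0.0, base_mean=[0.0] * 10)
print("first control to apply:", plan[0])
```

In receding-horizon use, one would apply `plan[0]`, shift the plan by one step to warm-start `base_mean`, and repeat. Stopping after fewer AIS rounds still returns a usable plan, which loosely mirrors the anytime behavior described in the abstract.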
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about common practice and the bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical image analysis, we designed an international survey that was issued to all participants of challenges held in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, and algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while receiving prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants (32%) stated that they did not have enough time for method development. 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based either on multiple identical models (61%) or on heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
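In this paper, we contribute a multi-faceted study of Pavlovian signalling: a process in which learned, temporally extended predictions made by one agent inform decision making by another agent. Signalling is intimately connected to time and timing. In service of generating and receiving signals, humans and other animals are known to represent time, determine the time since past events, predict the time until a future stimulus, and both recognize and generate patterns that unfold in time. We investigate how different temporal processes affect coordination and signalling between learning agents by introducing a partially observable decision-making domain we call the Frost Hollow. In this domain, a prediction-learning agent and a reinforcement learning agent are coupled into a two-part decision-making system that works to acquire sparse reward while avoiding time-conditional hazards. We evaluate two domain variants: machine agents interacting in a seven-state linear walk, and human-machine interaction in a virtual-reality environment. Our results showcase the speed of learning for Pavlovian signalling, the impact that different temporal representations do (and do not) have on agent-agent coordination, and how temporal aliasing affects agent-agent and human-agent interactions differently. As a main contribution, we establish Pavlovian signalling as a natural bridge between fixed signalling paradigms and fully adaptive communication learning between two agents. We further show how this adaptive signalling process can be built computationally out of a fixed signalling process, characterized by fast continual prediction learning and minimal constraints on the nature of the agent receiving signals. Our results therefore suggest an actionable, constructivist path toward communication learning between reinforcement learning agents.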
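The sketch below is a minimal version of the signalling pipeline described above. A prediction-learning agent learns, with tabular TD(0), a temporally extended prediction of an upcoming hazard. A fixed threshold then turns that prediction into a token for the control agent. The following choices are all hypothetical, for illustration only, and are not the Frost Hollow specification:

- the periodic hazard;
- the phase-clock state representation (only one of the kinds of temporal representation the study compares);
- the discount and step size;
- the threshold.

```python
def learn_hazard_predictions(period=10, gamma=0.8, alpha=0.1, steps=5000):
    """Tabular TD(0) prediction of a discounted hazard cumulant.

    Hypothetical setup: the hazard fires every `period` steps and the state is
    the current phase of that cycle (a perfect clock representation).
    """
    v = [0.0] * period
    for t in range(steps):
        s, s_next = t % period, (t + 1) % period
        cumulant = 1.0 if s_next == period - 1 else 0.0  # hazard on entering the last phase
        v[s] += alpha * (cumulant + gamma * v[s_next] - v[s])  # TD(0) update
    return v


def pavlovian_signal(prediction, threshold=0.5):
    """Fixed, hand-designed mapping from the learned prediction to a binary token."""
    return 1 if prediction > threshold else 0


predictions = learn_hazard_predictions()
print([pavlovian_signal(p) for p in predictions])
```

With these settings, the learned prediction rises over the steps leading up to the hazard, so the signal switches on a few steps before the hazard fires. The receiving agent then only has to learn what the token means, not the timing itself.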
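A fundamental challenge in interactive learning and decision making, ranging from bandit problems to reinforcement learning, is to provide sample-efficient, adaptive learning algorithms that achieve near-optimal regret. This question is analogous to the classical problem of optimal (supervised) statistical learning, where well-known complexity measures (e.g., VC dimension and Rademacher complexity) govern the statistical complexity of learning. However, characterizing the statistical complexity of interactive learning is substantially more challenging because of the adaptive nature of the problem. The main result of this work is a complexity measure, the Decision-Estimation Coefficient, that is proven to be both necessary and sufficient for sample-efficient interactive learning. In particular, we provide: (1) a lower bound on the optimal regret for any interactive decision-making problem, establishing the Decision-Estimation Coefficient as a fundamental limit; and (2) a unified algorithm design principle, Estimation-to-Decisions (E2D), which transforms any algorithm for supervised estimation into an online algorithm for decision making. E2D attains a regret bound that matches our lower bound, thereby achieving optimal sample-efficient learning as characterized by the Decision-Estimation Coefficient. Taken together, these results constitute a theory of learnability for interactive decision making. When applied to reinforcement learning settings, the Decision-Estimation Coefficient recovers essentially all existing hardness results and lower bounds. More broadly, the approach can be viewed as a decision-theoretic analogue of the classical Le Cam theory of statistical estimation; it also unifies a number of existing approaches, both Bayesian and frequentist.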
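For reference, the Decision-Estimation Coefficient is commonly stated in roughly the following form. This is reproduced from our understanding of the paper's definition rather than from the abstract, so the notation should be checked against the paper. The symbols are:

- $\mathcal{M}$: the model class;
- $\bar{M}$: a reference (estimated) model;
- $\Pi$: the decision space;
- $f^{M}(\pi)$: the expected reward of decision $\pi$ under model $M$;
- $\pi_{M}$: the optimal decision for $M$;
- $D^{2}_{\mathrm{H}}$: the squared Hellinger distance between observation distributions;
- $\gamma > 0$: a scale parameter.

```latex
\operatorname{dec}_{\gamma}(\mathcal{M}, \bar{M})
  = \inf_{p \in \Delta(\Pi)} \; \sup_{M \in \mathcal{M}} \;
    \mathbb{E}_{\pi \sim p}\Big[
      f^{M}(\pi_{M}) - f^{M}(\pi)
      - \gamma \, D^{2}_{\mathrm{H}}\big(M(\pi), \bar{M}(\pi)\big)
    \Big]
```

Informally, a small value means there is always a randomized decision $p$ whose regret under any plausible model is offset by how much it reveals about that model relative to $\bar{M}$. E2D exploits this trade-off by playing such a $p$ around the current estimate.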
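Artificial intelligence systems increasingly involve continual learning, which provides flexibility in general situations not encountered during system training. Human interaction with autonomous systems is broadly studied, but research has so far under-explored interactions that occur while the system is actively learning and can noticeably change its behavior within minutes. In this pilot study, we investigate how the interaction between a human and a continually learning prediction agent develops as the agent develops competency. Additionally, we compare two different agent architectures to assess how representational choices in agent design affect human-agent interaction. We develop a virtual-reality environment and a time-based prediction task in which predictions learned by a reinforcement learning (RL) algorithm augment human predictions. Using both quantitative and qualitative analyses, we assess how participants' performance and behavior in this task differ across agent types. Our findings suggest that human trust in the system may be influenced by early interactions with the agent, and that trust in turn affects strategic behavior, but the limitations of the pilot study rule out any conclusive statement. We identify trust as a key feature of interaction to focus on when considering RL-based technologies, and we make several recommendations for modifying this study in preparation for a larger-scale investigation. A video summary of this paper can be found at https://youtu.be/ovyjdnbqtwq.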
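Suppose an agent asserts that it will move through an environment in some way. When the agent executes its actions, how can the claim be verified? The problem arises in a range of contexts, including validating safety claims about robot behavior, applications in security and surveillance, and both the conception and the (physical) design and logistics of scientific experiments. Given a set of feasible sensors to choose from, we ask how to select sensors optimally so as to ensure that the agent's execution does indeed fit its pre-disclosed itinerary. Our treatment differs from prior work on sensor selection in two respects: the form the itinerary takes (a regular language of transitions), and the fact that families of sensor choices can be grouped into a single choice. The two aspects are closely related and permit the construction of a product automaton, since the same physical sensor (i.e., the same choice) may appear multiple times. This paper establishes the hardness of sensor selection within this treatment and proposes an exact algorithm, based on an integer linear programming (ILP) formulation, that can solve moderately sized problem instances. We demonstrate its efficacy on small-scale case studies, including one motivated by wildlife tracking.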
The recent increase in public and academic interest in preserving biodiversity has led to the growth of the field of conservation technology. This field involves designing and constructing tools that utilize technology to aid in the conservation of wildlife. In this article, we will use case studies to demonstrate the importance of designing conservation tools with human-wildlife interaction in mind and provide a framework for creating successful tools. These case studies include a range of complexities, from simple cat collars to machine learning and game theory methodologies. Our goal is to introduce and inform current and future researchers in the field of conservation technology and provide references for educating the next generation of conservation technologists. Conservation technology not only has the potential to benefit biodiversity but also has broader impacts on fields such as sustainability and environmental protection. By using innovative technologies to address conservation challenges, we can find more effective and efficient solutions to protect and preserve our planet's resources.
We present the interpretable meta neural ordinary differential equation (iMODE) method to rapidly learn generalizable (i.e., not parameter-specific) dynamics from trajectories of multiple dynamical systems that vary in their physical parameters. Without knowing the physical parameters, the iMODE method learns meta-knowledge, namely the functional variations of the force field across dynamical system instances. It does so by adopting a bi-level optimization framework: an outer level captures the common force-field form shared by the studied dynamical system instances, and an inner level adapts to individual system instances. A priori physical knowledge, such as a conservative force field and Euclidean symmetry, can be conveniently embedded in the neural network architecture as an inductive bias. With the learned meta-knowledge, iMODE can model an unseen system within seconds and inversely reveal knowledge about the physical parameters of a system. It can also serve as a Neural Gauge that "measures" the physical parameters of an unseen system from observed trajectories. We test the validity of the iMODE method on bistable, double pendulum, Van der Pol, Slinky, and reaction-diffusion systems.
While the brain connectivity network can inform the understanding and diagnosis of developmental dyslexia, its cause-effect relationships have not yet been sufficiently examined. Using electroencephalography (EEG) signals and a band-limited white-noise stimulus at 4.8 Hz (the prosodic-syllabic frequency), we measure phase Granger causalities among channels to identify differences between dyslexic learners and controls, thereby proposing a method to calculate directional connectivity. Because causal relationships run in both directions, we explore three scenarios: channels' activity as sources, as sinks, and in total. Our proposed method can be used for both classification and exploratory analysis. In all scenarios, we find confirmation of the established right-lateralized Theta sampling network anomaly, in line with the temporal sampling framework's assumption of oscillatory differences in the Theta and Gamma bands. Further, we show that this anomaly primarily occurs in the causal relationships of channels acting as sinks, where it is significantly more pronounced than when only total activity is observed. In the sink scenario, our classifier obtains accuracies of 0.84 and 0.88 and AUCs of 0.87 and 0.93 for the Theta and Gamma bands, respectively.
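The three scenarios (channels as sources, as sinks, and in total) can be read as different aggregations of a directed, channel-by-channel connectivity matrix. The sketch below assumes the phase Granger causality values have already been estimated into a matrix `causality[i][j]`, meaning the influence of channel i on channel j. It summarizes each channel by its summed outgoing, incoming, and combined influence. Summation is an assumption for illustration: the abstract does not specify how per-channel features are formed for the classifier, and the 3-channel matrix is hypothetical.

```python
def node_strengths(causality):
    """Summarize a directed connectivity matrix per channel.

    causality[i][j] is assumed to hold the (precomputed) phase Granger
    causality from channel i to channel j; the diagonal is ignored.
    """
    n = len(causality)
    # Source scenario: how strongly each channel drives the others (row sums).
    source = [sum(causality[i][j] for j in range(n) if j != i) for i in range(n)]
    # Sink scenario: how strongly each channel is driven by the others (column sums).
    sink = [sum(causality[i][j] for i in range(n) if i != j) for j in range(n)]
    # Total scenario: both directions combined.
    total = [s + k for s, k in zip(source, sink)]
    return source, sink, total


# Hypothetical 3-channel example.
C = [[0.0, 0.5, 0.1],
     [0.2, 0.0, 0.3],
     [0.0, 0.4, 0.0]]
source, sink, total = node_strengths(C)
print(source, sink, total)
```

Separating the sink view from the total view matters here. Summing both directions can dilute an effect that is concentrated in incoming influence, which is consistent with the abstract's finding that the anomaly is more pronounced for sinks.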