We consider the problem of sequentially optimizing a time-varying objective function using time-varying Bayesian optimization (TVBO). Here, the key challenge is coping with old data. Current approaches to TVBO require prior knowledge of a constant rate of change. However, the rate of change is usually neither known nor constant. We propose an event-triggered algorithm, ET-GP-UCB, that detects changes in the objective function online. The event trigger is based on probabilistic uniform error bounds used in Gaussian process regression. The trigger automatically detects when significant changes in the objective function occur. The algorithm then adapts to the temporal change by resetting the accumulated dataset. We provide regret bounds for ET-GP-UCB and show in numerical experiments that it is competitive with state-of-the-art algorithms even though it requires no knowledge about the temporal changes. Furthermore, ET-GP-UCB outperforms these competitive baselines if the rate of change is misspecified, and we demonstrate that it is readily applicable to various settings without tuning hyperparameters.
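As a rough illustration of the trigger idea (not the paper's exact bound: the constant `beta`, the noise level, and the reset-to-last-point rule below are illustrative assumptions), such a test compares each new observation against the GP posterior confidence band and resets the dataset on violation:

```python
import numpy as np

def rbf(A, B, ls=1.0):
    # Squared-exponential kernel with unit signal variance
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls**2)

def gp_posterior(X, y, Xq, noise=0.01):
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xq)
    sol = np.linalg.solve(K, Ks)
    mu = sol.T @ y
    var = 1.0 - np.einsum("ij,ij->j", Ks, sol)  # prior variance is 1
    return mu, np.sqrt(np.maximum(var, 1e-12))

def change_detected(X, y, x_new, y_new, beta=3.0, eps=0.1):
    """Event trigger: flag a change when a new observation leaves the
    GP's probabilistic uniform error band around the posterior mean."""
    mu, sd = gp_posterior(X, y, x_new[None, :])
    return abs(y_new - mu[0]) > beta * sd[0] + eps

# On a trigger, the TVBO loop would reset the accumulated dataset:
X, y = np.random.rand(20, 1), np.random.rand(20)
if change_detected(X, y, np.array([0.5]), 5.0):
    X, y = X[-1:], y[-1:]  # keep only the most recent point (illustrative)
```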
Robust controllers ensure stability of feedback loops designed under uncertainty, but at the cost of performance. Recently proposed learning-based methods reduce the model uncertainty of time-invariant systems and thus improve the performance of robust controllers using data. In practice, however, many systems exhibit uncertainty that varies over time, for instance due to weight shifts or wear and tear, leading to degraded performance or even instability of learning-based controllers. We propose an event-triggered learning algorithm that decides when to learn in LQR problems with rare or slow changes. Our key idea is to switch between a robust controller and a learned controller. For learning, we first approximate the optimal length of the learning phase via Monte Carlo estimation using a probabilistic model. We then design a statistical test for uncertain systems based on the moment-generating function of the LQR cost. The test detects changes in the system under control and triggers re-learning when the control performance deteriorates due to system changes. We demonstrate improved performance over a robust controller baseline in a numerical example.
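A toy sketch of the detection logic (the paper derives its test from the LQR cost's moment-generating function; the sampled-quantile threshold, system matrices, and gain below are stand-in assumptions) calibrates a cost threshold on the nominal model and triggers learning when measured costs exceed it:

```python
import numpy as np

rng = np.random.default_rng(0)
A, B = np.array([[1.0, 0.1], [0.0, 1.0]]), np.array([[0.0], [0.1]])
Q, R = np.eye(2), np.eye(1)
K = np.array([[2.0, 3.0]])  # a stabilizing gain (illustrative)

def rollout_cost(A, B, K, T=200, sigma=0.05):
    x, J = np.zeros(2), 0.0
    for _ in range(T):
        u = -K @ x
        J += x @ Q @ x + u @ R @ u
        x = A @ x + B @ u + sigma * rng.standard_normal(2)
    return J

# Calibrate a rejection threshold from the *nominal* model via Monte Carlo
# (a sampled quantile is a cruder stand-in playing the same role as the
# moment-generating-function bound in the paper).
nominal = np.array([rollout_cost(A, B, K) for _ in range(500)])
threshold = np.quantile(nominal, 0.99)

# Runtime test: costs measured on the real (drifted) system trigger learning
A_drift = A + np.array([[0.0, 0.05], [0.0, 0.05]])
if rollout_cost(A_drift, B, K) > threshold:
    print("change detected -> trigger learning phase")
```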
Changing conditions or environments can cause system dynamics to vary over time. To ensure optimal control performance, controllers should adapt to these changes. When the underlying cause and time of the change are unknown, we need to rely on online data for adaptation. In this paper, we use time-varying Bayesian optimization (TVBO) to tune controllers online in changing environments, using appropriate prior knowledge about the control objective and its changes. Two properties are characteristic of many online controller tuning problems: First, they exhibit incremental and lasting changes in the objective due to changes in the system dynamics, e.g., through wear and tear. Second, the optimization problem is convex in the tuning parameters. Current TVBO methods do not explicitly account for these properties, resulting in poor tuning performance and many unstable controllers through over-exploration of the parameter space. We propose a novel TVBO forgetting strategy using uncertainty injection (UI), which incorporates the assumption of incremental and lasting changes. The control objective is modeled as a spatio-temporal Gaussian process (GP) with UI via a Wiener process in the temporal domain. Furthermore, we explicitly model the convexity assumption in the spatial dimension through GP models with linear inequality constraints. In numerical experiments, we show that our model outperforms the state of the art in TVBO, exhibiting reduced regret and fewer unstable parameter configurations.
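One way to realize such a model, sketched below with illustrative constants (the convexity constraints are omitted): multiplying a spatial RBF kernel by a Wiener-process term in time makes posterior uncertainty grow where no fresh data arrives, instead of old data being hard-discarded:

```python
import numpy as np

def k_spatial(X1, X2, ls=0.3):
    return np.exp(-0.5 * (X1[:, None] - X2[None, :]) ** 2 / ls**2)

def k_ui(X1, t1, X2, t2, sigma_w2=0.05):
    """Spatio-temporal kernel: spatial RBF times a Wiener-process term in
    time -- the 'uncertainty injection' forgetting idea."""
    return k_spatial(X1, X2) * (1.0 + sigma_w2 * np.minimum(t1[:, None], t2[None, :]))

# Posterior variance at a fixed query grows as time passes without new data
X = np.array([0.2, 0.5, 0.8]); t = np.array([0.0, 1.0, 2.0]); y = np.array([0.1, 0.4, 0.3])
xq = np.array([0.5])
for t_q in [2.0, 10.0, 50.0]:
    Kxx = k_ui(X, t, X, t) + 1e-4 * np.eye(3)
    kq = k_ui(xq, np.array([t_q]), X, t)
    var = k_ui(xq, np.array([t_q]), xq, np.array([t_q])) - kq @ np.linalg.solve(Kxx, kq.T)
    print(t_q, var.item())  # variance increases with elapsed time
```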
Reinforcement learning (RL) aims to find an optimal policy through interaction with an environment. Consequently, learning complex behavior requires a vast number of samples, which can be prohibitive in practice. Yet instead of systematically reasoning about and actively choosing informative samples, policy gradients for local search are often obtained from random perturbations. These random samples yield high-variance estimates and hence are sub-optimal in terms of sample complexity. Actively selecting informative samples is at the core of Bayesian optimization, which constructs a probabilistic surrogate of the objective from past samples to reason about informative subsequent ones. In this paper, we propose to join both worlds. We develop an algorithm utilizing a probabilistic model of the objective function and its gradient. Based on the model, the algorithm decides where to query a noisy zeroth-order oracle to improve the gradient estimates. The resulting algorithm is a novel type of policy search method, which we compare to existing black-box algorithms. The comparison reveals improved sample complexity and reduced variance in extensive empirical evaluations on synthetic objectives. Furthermore, we highlight the benefits of active sampling on popular RL benchmarks.
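A minimal 1-D sketch of the core mechanism (the paper's acquisition rule and policy-search integration are richer; the kernel, length scale, and candidate grid below are assumptions): derivatives of a GP are jointly Gaussian with its function values, so the model yields a posterior over the gradient, and oracle queries can be placed where that posterior is most uncertain:

```python
import numpy as np

ls, noise = 0.5, 1e-3

def k(a, b):  # RBF kernel, unit signal variance
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls**2)

def grad_posterior(X, y, xq):
    """Posterior over df/dx at xq implied by a GP on f."""
    K = k(X, X) + noise * np.eye(len(X))
    dk = -(xq[:, None] - X[None, :]) / ls**2 * k(xq, X)  # cov(f'(xq), f(X))
    mean = dk @ np.linalg.solve(K, y)
    var = 1.0 / ls**2 - np.einsum("ij,ij->i", dk, np.linalg.solve(K, dk.T).T)
    return mean, var

f = lambda x: np.sin(3 * x)            # stand-in for a noisy policy return
X = np.array([0.0, 0.4]); y = f(X)
cand = np.linspace(-0.5, 0.9, 50)
for _ in range(5):
    _, v = grad_posterior(X, y, cand)
    x_new = cand[np.argmax(v)]         # query where the gradient is least certain
    X, y = np.append(X, x_new), np.append(y, f(x_new))
g, _ = grad_posterior(X, y, np.array([0.2]))
print("estimated df/dx at 0.2:", g[0], "true:", 3 * np.cos(0.6))
```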
Probabilistic models such as Gaussian processes (GPs) are powerful tools for learning unknown dynamical systems from data for subsequent use in control design. While learning-based control has the potential to yield superior performance in demanding applications, robustness to uncertainty remains an important challenge. Since Bayesian methods quantify the uncertainty of the learning results, it is natural to incorporate these uncertainties into a robust design. In contrast to most state-of-the-art approaches that consider worst-case estimates, we leverage the learning method's posterior distribution in the controller synthesis. The result is a more informed and, thus, more efficient trade-off between performance and robustness. We present a novel controller synthesis for linearized GP dynamics that yields robust controllers with respect to a probabilistic stability margin. The formulation is based on a recently proposed algorithm for linear quadratic control synthesis, which we extend by providing probabilistic robustness guarantees in the form of credibility bounds on the system's stability. Comparisons with existing methods based on worst-case estimates and certainty-equivalent designs demonstrate the performance and robustness of the proposed approach.
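A Monte Carlo sketch of the credibility idea (the paper's synthesis is analytical; the entrywise Gaussian posterior, matrices, and gain below are illustrative assumptions): sample dynamics from the posterior and measure the fraction whose closed loop meets a spectral-radius margin:

```python
import numpy as np

rng = np.random.default_rng(1)

# Posterior over linearized dynamics: mean plus entrywise std
# (illustrative numbers; a real GP posterior would supply these).
A_mean = np.array([[1.0, 0.1], [0.0, 0.9]]); A_std = 0.03 * np.ones((2, 2))
B = np.array([[0.0], [0.1]])
K = np.array([[1.5, 2.0]])  # candidate controller gain

def stability_credibility(K, n=5000, margin=1.0):
    """Fraction of posterior samples whose closed loop has spectral
    radius below `margin` -- a probabilistic stability margin."""
    stable = 0
    for _ in range(n):
        A = A_mean + A_std * rng.standard_normal((2, 2))
        rho = max(abs(np.linalg.eigvals(A - B @ K)))
        stable += rho < margin
    return stable / n

print("P(stable | data) ~", stability_credibility(K))
```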
Transformers have become the state-of-the-art neural network architecture across numerous domains of machine learning. This is partly due to their celebrated ability to transfer and to learn in-context based on few examples. Nevertheless, the mechanisms by which Transformers become in-context learners are not well understood and remain mostly an intuition. Here, we argue that training Transformers on auto-regressive tasks can be closely related to well-known gradient-based meta-learning formulations. We start by providing a simple weight construction that shows the equivalence of data transformations induced by 1) a single linear self-attention layer and by 2) gradient-descent (GD) on a regression loss. Motivated by that construction, we show empirically that when training self-attention-only Transformers on simple regression tasks either the models learned by GD and Transformers show great similarity or, remarkably, the weights found by optimization match the construction. Thus we show how trained Transformers implement gradient descent in their forward pass. This allows us, at least in the domain of regression problems, to mechanistically understand the inner workings of optimized Transformers that learn in-context. Furthermore, we identify how Transformers surpass plain gradient descent by an iterative curvature correction and learn linear models on deep data representations to solve non-linear regression tasks. Finally, we discuss intriguing parallels to a mechanism identified to be crucial for in-context learning termed induction-head (Olsson et al., 2022) and show how it could be understood as a specific case of in-context learning by gradient descent learning within Transformers.
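The stated equivalence can be checked numerically in a few lines. The sketch below specializes the construction to scalar targets and a zero-initialized regression weight (the paper's construction is more general): tokens are (x_j, y_j) pairs, keys and queries project onto the input part, and values write a scaled target into the output channel:

```python
import numpy as np

rng = np.random.default_rng(0)
d, N, eta = 3, 10, 0.1

# In-context linear regression task and a query input
X = rng.standard_normal((N, d))
y = X @ rng.standard_normal(d)
x_q = rng.standard_normal(d)

# One GD step on L(W) = 1/(2N) sum_j (W x_j - y_j)^2, starting from W = 0
W = np.zeros(d)
W = W - (eta / N) * sum((W @ X[j] - y[j]) * X[j] for j in range(N))
pred_gd = W @ x_q

# The same prediction from one *linear* self-attention layer with
# constructed weights; attention scores are raw dot products (no softmax).
E = np.hstack([X, y[:, None]])           # context tokens, shape (N, d+1)
e_q = np.append(x_q, 0.0)                # query token with empty y-channel
W_kq = np.eye(d + 1); W_kq[d, d] = 0.0   # keys/queries: zero out y-channel
W_v = np.zeros((d + 1, d + 1)); W_v[d, d] = eta / N  # values: scaled target
keys, values = E @ W_kq.T, E @ W_v.T
delta = (keys @ (W_kq @ e_q)) @ values   # linear self-attention update
pred_attn = (e_q + delta)[d]             # y-channel of the updated query token

print(np.allclose(pred_gd, pred_attn))   # -> True
```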
We consider a variant of the target defense problem where a single defender is tasked to capture a sequence of incoming intruders. The intruders' objective is to breach the target boundary without being captured by the defender. As soon as the current intruder breaches the target or gets captured by the defender, the next intruder appears at a random location on a fixed circle surrounding the target. Therefore, the defender's final location at the end of the current game becomes its initial location for the next game. Thus, the players pick strategies that are advantageous for the current as well as for the future games. Depending on the information available to the players, each game is divided into two phases: partial information and full information phase. Under some assumptions on the sensing and speed capabilities, we analyze the agents' strategies in both phases. We derive equilibrium strategies for both the players to optimize the capture percentage using the notions of engagement surface and capture circle. We quantify the percentage of capture for both finite and infinite sequences of incoming intruders.
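A toy simulation of the sequential setup (the equilibrium strategies, engagement surface, and capture circle from the paper are not implemented; straight-line intruders and a pure-pursuit defender are crude stand-ins) illustrates how the carried-over defender position couples successive games:

```python
import numpy as np

rng = np.random.default_rng(2)
R_spawn, R_target, v_d, v_i, dt, eps = 5.0, 1.0, 1.2, 1.0, 0.01, 0.2

def play(defender, n_games=1000):
    """Sequential engagements: each intruder heads straight for the target,
    the defender pursues; the defender's final position is its initial
    position for the next game, as in the paper's setup."""
    captures = 0
    for _ in range(n_games):
        theta = rng.uniform(0, 2 * np.pi)
        intruder = R_spawn * np.array([np.cos(theta), np.sin(theta)])
        while np.linalg.norm(intruder) > R_target:  # until breach
            if np.linalg.norm(intruder - defender) < eps:
                captures += 1
                break
            intruder -= dt * v_i * intruder / np.linalg.norm(intruder)
            to_i = intruder - defender
            defender = defender + dt * v_d * to_i / np.linalg.norm(to_i)
    return captures / n_games

print("capture fraction:", play(np.array([R_target, 0.0])))
```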
We analyze the problem of detecting tree rings in microscopy images of shrub cross sections. This can be regarded as a special case of the instance segmentation task with several particularities, such as the concentric circular shape of the ring objects and high precision requirements, due to which existing methods do not perform sufficiently well. We propose a new iterative method, which we term Iterative Next Boundary Detection (INBD). It intuitively models the natural growth direction, starting from the center of the shrub cross section and detecting the next ring boundary in each iteration step. In our experiments, INBD shows superior performance to generic instance segmentation methods and is the only one with a built-in notion of chronological order. Our dataset and source code are available at http://github.com/alexander-g/INBD.
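The iteration itself is simple to sketch (the learned boundary detector is replaced here by a fixed radial offset; the image size, step size, and stopping rule are illustrative assumptions):

```python
import numpy as np

def detect_next_boundary(image, current_boundary):
    """Placeholder for the learned detection step: given the current ring
    boundary (polygon of points), predict the next one further out.  A
    fixed radial offset stands in for INBD's neural network."""
    center = current_boundary.mean(0)
    rays = current_boundary - center
    return center + rays * (1.0 + 20.0 / np.linalg.norm(rays, axis=1, keepdims=True))

def inbd_style_loop(image, center, n_angles=64, max_rings=50):
    # Start at the center and detect one ring boundary per iteration,
    # moving outward -> boundaries come out in chronological growth order.
    angles = np.linspace(0, 2 * np.pi, n_angles, endpoint=False)
    boundary = center + np.stack([np.cos(angles), np.sin(angles)], 1)
    rings = []
    for _ in range(max_rings):
        boundary = detect_next_boundary(image, boundary)
        if (boundary < 0).any() or (boundary >= image.shape[0]).any():
            break  # boundary left the image: stop
        rings.append(boundary)
    return rings  # innermost (oldest) ring first

rings = inbd_style_loop(np.zeros((512, 512)), center=np.array([256.0, 256.0]))
print(len(rings), "rings detected")
```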
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
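Since the models are publicly released, they can be used through the Hugging Face transformers library. The sketch below uses the small bigscience/bloom-560m variant so it runs on modest hardware; the full model is published as bigscience/bloom and exposes the same interface:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-560m")
model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")

prompt = "Translate to French: I have a dog."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```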
In recent years, artificial neural networks (ANNs) have become the de-facto standard for solving tasks in communications engineering that are difficult to solve with traditional methods. In parallel, the artificial intelligence community is driving its research toward biology-inspired, brain-like spiking neural networks (SNNs), which promise extremely energy-efficient computing. In this paper, we investigate the use of SNNs in the context of channel equalization for ultra-low-complexity receivers. We propose an SNN-based equalizer with a feedback structure akin to the decision feedback equalizer (DFE). For the conversion of real-world data into spike signals, we introduce a novel ternary encoding and compare it with traditional log-scale encoding. We show that our approach clearly outperforms conventional linear equalizers for three different exemplary channels. We highlight that it is mainly the conversion of the channel output to spikes that introduces a small performance penalty. The proposed SNN with a decision feedback structure enables the path toward competitive energy-efficient transceivers.
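The abstract does not spell out the ternary scheme; one plausible reading, sketched below with an assumed threshold, maps each real-valued sample to a spike in {-1, 0, +1}:

```python
import numpy as np

def ternary_encode(samples, theta=0.5):
    """Map real-valued channel-output samples to spikes in {-1, 0, +1} by
    thresholding (an illustrative scheme; the paper's encoding may differ)."""
    spikes = np.zeros(samples.shape, dtype=int)
    spikes[samples > theta] = 1
    spikes[samples < -theta] = -1
    return spikes

rx = np.array([0.9, -0.1, -1.3, 0.4])  # noisy received samples
print(ternary_encode(rx))               # -> [ 1  0 -1  0]
```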