Model based reinforcement learning (MBRL) uses an imperfect model of the world to imagine trajectories of future states and plan the best actions that maximize a given reward. These trajectories are imperfect and MBRL attempts to overcome this by relying on model predictive control (MPC) to continuously re-imagine trajectories from scratch. Such re-generation of imagined trajectories carries the major computational cost and increasing complexity in tasks with longer receding horizon. We investigate how far in the future the imagined trajectories can be relied upon while still maintaining acceptable reward. After taking each action, information becomes available about its immediate effect and its impact on outcomes expected of future actions. Hereby, we propose four methods for deciding whether to trust and act upon imagined trajectories: i) looking at recent errors with respect to expectations, ii) comparing the confidence in an action imagined against its execution, iii) observing the deviation in projected future states iv) observing the deviation in projected future rewards. An experiment analyzing the effects of acting upon imagination shows that our methods reduce computation by at least 20\% and up to 80\%, depending on the environment, while retaining acceptable reward.
translated by 谷歌翻译
本文探讨了强化学习(RL)模型用于自动赛车的使用。与安全车是头等大事的乘用车相反,赛车的目的是最大程度地减少单圈时间。我们将问题视为一项强化学习任务,其中包括由车辆遥测组成的多维输入和连续的动作空间。为了找出哪种RL方法更好地解决了问题,以及获得的模型是否推广到未知轨道上,我们将10种深层确定性策略梯度(DDPG)变体进行了两个实验:i)〜研究RL方法如何学习驱动驱动赛车和ii)研究学习方案如何影响模型的推广能力。我们的研究表明,接受RL训练的模型不仅能够比基线开源手工机器人更快地驾驶,而且还可以推广到未知轨道。
translated by 谷歌翻译
The Elo algorithm, due to its simplicity, is widely used for rating in sports competitions as well as in other applications where the rating/ranking is a useful tool for predicting future results. However, despite its widespread use, a detailed understanding of the convergence properties of the Elo algorithm is still lacking. Aiming to fill this gap, this paper presents a comprehensive (stochastic) analysis of the Elo algorithm, considering round-robin (one-on-one) competitions. Specifically, analytical expressions are derived characterizing the behavior/evolution of the skills and of important performance metrics. Then, taking into account the relationship between the behavior of the algorithm and the step-size value, which is a hyperparameter that can be controlled, some design guidelines as well as discussions about the performance of the algorithm are provided. To illustrate the applicability of the theoretical findings, experimental results are shown, corroborating the very good match between analytical predictions and those obtained from the algorithm using real-world data (from the Italian SuperLega, Volleyball League).
translated by 谷歌翻译
We describe a Physics-Informed Neural Network (PINN) that simulates the flow induced by the astronomical tide in a synthetic port channel, with dimensions based on the Santos - S\~ao Vicente - Bertioga Estuarine System. PINN models aim to combine the knowledge of physical systems and data-driven machine learning models. This is done by training a neural network to minimize the residuals of the governing equations in sample points. In this work, our flow is governed by the Navier-Stokes equations with some approximations. There are two main novelties in this paper. First, we design our model to assume that the flow is periodic in time, which is not feasible in conventional simulation methods. Second, we evaluate the benefit of resampling the function evaluation points during training, which has a near zero computational cost and has been verified to improve the final model, especially for small batch sizes. Finally, we discuss some limitations of the approximations used in the Navier-Stokes equations regarding the modeling of turbulence and how it interacts with PINNs.
translated by 谷歌翻译
Can we leverage the audiovisual information already present in video to improve self-supervised representation learning? To answer this question, we study various pretraining architectures and objectives within the masked autoencoding framework, motivated by the success of similar methods in natural language and image understanding. We show that we can achieve significant improvements on audiovisual downstream classification tasks, surpassing the state-of-the-art on VGGSound and AudioSet. Furthermore, we can leverage our audiovisual pretraining scheme for multiple unimodal downstream tasks using a single audiovisual pretrained model. We additionally demonstrate the transferability of our representations, achieving state-of-the-art audiovisual results on Epic Kitchens without pretraining specifically for this dataset.
translated by 谷歌翻译
Integrated information theory (IIT) is a theoretical framework that provides a quantitative measure to estimate when a physical system is conscious, its degree of consciousness, and the complexity of the qualia space that the system is experiencing. Formally, IIT rests on the assumption that if a surrogate physical system can fully embed the phenomenological properties of consciousness, then the system properties must be constrained by the properties of the qualia being experienced. Following this assumption, IIT represents the physical system as a network of interconnected elements that can be thought of as a probabilistic causal graph, $\mathcal{G}$, where each node has an input-output function and all the graph is encoded in a transition probability matrix. Consequently, IIT's quantitative measure of consciousness, $\Phi$, is computed with respect to the transition probability matrix and the present state of the graph. In this paper, we provide a random search algorithm that is able to optimize $\Phi$ in order to investigate, as the number of nodes increases, the structure of the graphs that have higher $\Phi$. We also provide arguments that show the difficulties of applying more complex black-box search algorithms, such as Bayesian optimization or metaheuristics, in this particular problem. Additionally, we suggest specific research lines for these techniques to enhance the search algorithm that guarantees maximal $\Phi$.
translated by 谷歌翻译
In this paper, we consider the problem where a drone has to collect semantic information to classify multiple moving targets. In particular, we address the challenge of computing control inputs that move the drone to informative viewpoints, position and orientation, when the information is extracted using a "black-box" classifier, e.g., a deep learning neural network. These algorithms typically lack of analytical relationships between the viewpoints and their associated outputs, preventing their use in information-gathering schemes. To fill this gap, we propose a novel attention-based architecture, trained via Reinforcement Learning (RL), that outputs the next viewpoint for the drone favoring the acquisition of evidence from as many unclassified targets as possible while reasoning about their movement, orientation, and occlusions. Then, we use a low-level MPC controller to move the drone to the desired viewpoint taking into account its actual dynamics. We show that our approach not only outperforms a variety of baselines but also generalizes to scenarios unseen during training. Additionally, we show that the network scales to large numbers of targets and generalizes well to different movement dynamics of the targets.
translated by 谷歌翻译
Atrial Fibrillation (AF) is characterized by disorganised electrical activity in the atria and is known to be sustained by the presence of regions of fibrosis (scars) or functional cellular remodeling, both of which may lead to areas of slow conduction. Estimating the effective conductivity of the myocardium and identifying regions of abnormal propagation is therefore crucial for the effective treatment of AF. We hypothesise that the spatial distribution of tissue conductivity can be directly inferred from an array of concurrently acquired contact electrograms (EGMs). We generate a dataset of simulated cardiac AP propagation using randomised scar distributions and a phenomenological cardiac model and calculate contact electrograms at various positions on the field. A deep neural network, based on a modified U-net architecture, is trained to estimate the location of the scar and quantify conductivity of the tissue with a Jaccard index of $91$%. We adapt a wavelet-based surrogate testing analysis to confirm that the inferred conductivity distribution is an accurate representation of the ground truth input to the model. We find that the root mean square error (RMSE) between the ground truth and our predictions is significantly smaller ($p_{val}=0.007$) than the RMSE between the ground truth and surrogate samples.
translated by 谷歌翻译
Echo State Networks (ESN) are a type of Recurrent Neural Networks that yields promising results in representing time series and nonlinear dynamic systems. Although they are equipped with a very efficient training procedure, Reservoir Computing strategies, such as the ESN, require the use of high order networks, i.e. large number of layers, resulting in number of states that is magnitudes higher than the number of model inputs and outputs. This not only makes the computation of a time step more costly, but also may pose robustness issues when applying ESNs to problems such as Model Predictive Control (MPC) and other optimal control problems. One such way to circumvent this is through Model Order Reduction strategies such as the Proper Orthogonal Decomposition (POD) and its variants (POD-DEIM), whereby we find an equivalent lower order representation to an already trained high dimension ESN. The objective of this work is to investigate and analyze the performance of POD methods in Echo State Networks, evaluating their effectiveness. To this end, we evaluate the Memory Capacity (MC) of the POD-reduced network in comparison to the original (full order) ENS. We also perform experiments on two different numerical case studies: a NARMA10 difference equation and an oil platform containing two wells and one riser. The results show that there is little loss of performance comparing the original ESN to a POD-reduced counterpart, and also that the performance of a POD-reduced ESN tend to be superior to a normal ESN of the same size. Also we attain speedups of around $80\%$ in comparison to the original ESN.
translated by 谷歌翻译
Measuring and monitoring soil organic carbon is critical for agricultural productivity and for addressing critical environmental problems. Soil organic carbon not only enriches nutrition in soil, but also has a gamut of co-benefits such as improving water storage and limiting physical erosion. Despite a litany of work in soil organic carbon estimation, current approaches do not generalize well across soil conditions and management practices. We empirically show that explicit modeling of cause-and-effect relationships among the soil processes improves the out-of-distribution generalizability of prediction models. We provide a comparative analysis of soil organic carbon estimation models where the skeleton is estimated using causal discovery methods. Our framework provide an average improvement of 81% in test mean squared error and 52% in test mean absolute error.
translated by 谷歌翻译