Training reinforcement learning (RL) agents with scalar reward signals is often infeasible when the environment has sparse and non-Markovian rewards. Moreover, hand-crafting these reward functions before training is prone to misspecification, especially when the environment's dynamics are only partially known. This paper proposes a novel pipeline for learning non-Markovian task specifications as succinct finite-state "task automata" from episodes of agent experience in an unknown environment. We leverage two key algorithmic insights. First, we learn the product MDP, a model composed of the specification's automaton and the environment's MDP, by treating it as a partially observable MDP and using off-the-shelf algorithms for hidden Markov models. Second, we propose a novel method for distilling the task automaton (assumed to be a deterministic finite automaton) from the learned product MDP. Our learned task automaton enables the decomposition of a task into its constituent sub-tasks, which improves the rate at which an RL agent can subsequently synthesise an optimal policy. It also provides an interpretable encoding of high-level environmental and task features, so a human can readily verify that the agent has learned a coherent task with no misconceptions. In addition, we take steps towards ensuring that the learned automaton is environment-agnostic, making it well-suited for use in transfer learning. Finally, we provide experimental results to illustrate our algorithm's performance in different environments and tasks, as well as its ability to incorporate prior domain knowledge to facilitate more efficient learning.
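As a rough illustration of the first algorithmic insight only, the sketch below fits an off-the-shelf hidden Markov model (Baum-Welch) to episodes of high-level observation labels; the number of hidden (product) states, the toy label alphabet, and the use of hmmlearn are assumptions, and the paper's DFA-extraction step is not reproduced here.

```python
import numpy as np
from hmmlearn import hmm  # assumes a recent hmmlearn (>= 0.3) providing CategoricalHMM

# Toy episodes of high-level labels, e.g. 0 = nothing, 1 = got key, 2 = opened door.
episodes = [
    [0, 0, 1, 2],
    [0, 1, 1, 2],
    [0, 0, 0, 1, 2],
]

# hmmlearn expects all sequences concatenated into one column plus their lengths.
X = np.concatenate(episodes).reshape(-1, 1)
lengths = [len(e) for e in episodes]

# Fit an HMM whose hidden states play the role of product (automaton x environment) states.
model = hmm.CategoricalHMM(n_components=4, n_iter=100, random_state=0)
model.fit(X, lengths)

# The learned transition structure over hidden states is the raw material from
# which a deterministic task automaton could then be distilled.
print(np.round(model.transmat_, 2))
```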
Rational verification refers to the problem of checking which temporal logic properties hold of a system, under the assumption that agents in the system choose strategies that form a game-theoretic equilibrium. Rational verification can be understood as a counterpart to model checking for multi-agent systems, but whereas classical model checking can be done in polynomial time for some temporal logic specification languages such as CTL, and in polynomial space for LTL specifications, rational verification is much harder: the key decision problems for rational verification are 2EXPTIME-complete with LTL specifications, even when using explicit-state system representations. Against this background, our contributions in this paper are threefold. First, we show that the complexity of rational verification can be greatly reduced by restricting specifications to GR(1), a fragment of LTL that can represent a broad and practically useful class of response properties of reactive systems. In particular, we show that for many relevant settings, rational verification can be done in polynomial space, and even in polynomial time. Second, we provide improved complexity results for rational verification when considering players' goals given by mean-payoff utility functions, arguably the most widely used approach to quantitative objectives in concurrent systems. Finally, we consider the problem of computing outcomes that satisfy social welfare constraints. To this end, we consider both utilitarian and egalitarian social welfare, and show that computing such outcomes is either PSPACE-complete or NP-complete.
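For reference, a GR(1) specification takes the assume-guarantee form below (standard notation, not taken from this abstract), where the theta formulas are initial conditions, the rho formulas are safety (transition) constraints, and the Box-Diamond conjuncts are fairness assumptions on the environment (subscript e) and fairness guarantees on the system (subscript s):

```latex
\left(\theta_e \wedge \Box\rho_e \wedge \bigwedge_{i=1}^{m}\Box\Diamond\psi_i\right)
\;\rightarrow\;
\left(\theta_s \wedge \Box\rho_s \wedge \bigwedge_{j=1}^{n}\Box\Diamond\varphi_j\right)
```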
Deep reinforcement learning (RL) has recently shown great promise in robotic continuous control tasks. Nevertheless, research in this vein has centered around centralized learning settings, which largely rely on the availability of communication among all components of the robot. However, agents in the real world often operate in a decentralized fashion, due to latency requirements, limited power budgets, and safety concerns. By formulating robotic components as a system of decentralized agents, this work presents a decentralized multi-agent reinforcement learning framework for continuous control. To this end, we first develop a cooperative multi-agent PPO framework that allows centralized optimization during training and decentralized operation during execution. However, the system only receives a global reward signal that is not attributed to each agent. To address this challenge, we further propose a generic game-theoretic credit assignment framework that computes agent-specific reward signals. Last but not least, we also incorporate a model-based RL module into our credit assignment framework, which leads to a significant improvement in sample efficiency. We demonstrate the effectiveness of our framework with experimental results on MuJoCo robotic control tasks. For a demo video, please visit: https://youtu.be/gfyvpm4svey.
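A minimal sketch of one plausible instantiation of game-theoretic credit assignment (Shapley-style average marginal contributions over a learned global value estimate); the function names and the toy value model below are assumptions for illustration, not the paper's exact estimator.

```python
import itertools

def shapley_credit(agents, value_fn):
    """Agent-specific credit as the average marginal contribution of each agent,
    taken over all join orders; value_fn maps a set of 'active' agents to a scalar."""
    credit = {a: 0.0 for a in agents}
    perms = list(itertools.permutations(agents))
    for order in perms:
        coalition = set()
        for a in order:
            before = value_fn(frozenset(coalition))
            coalition.add(a)
            credit[a] += (value_fn(frozenset(coalition)) - before) / len(perms)
    return credit

# Toy stand-in for a learned global value model with diminishing returns.
def toy_value(coalition):
    return 1.0 - 0.5 ** len(coalition)

print(shapley_credit(["hip", "knee", "ankle"], toy_value))
```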
Submodular functions have been a powerful mathematical model for a wide range of real-world applications. Recently, submodular functions have become increasingly important in machine learning (ML) for modelling notions such as information and redundancy among entities such as data and features. Among these applications, a key question is payoff allocation: how should the importance of each entity towards the collective objective be evaluated? To this end, classic solution concepts from cooperative game theory offer principled approaches to payoff allocation. However, despite the extensive body of game-theoretic literature, payoff allocation in submodular games is relatively under-researched. In particular, an important notion that arises in emerging submodular applications is redundancy, which may come from various sources such as abundant data or malicious manipulation, where a player replicates its resources and acts under multiple identities. Although many game-theoretic solution concepts can be directly applied to submodular games, naively using them for payoff allocation in these settings may incur robustness issues against replication. In this paper, we systematically study replication manipulation in submodular games and investigate replication robustness, a metric that quantitatively measures how robust a solution concept is against replication. Using this metric, we present conditions that theoretically characterise the robustness of semivalues, a wide family of solution concepts that includes the Shapley and Banzhaf values. Moreover, we empirically validate our theoretical results on an emerging submodular ML application, namely the ML data market.
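The sketch below makes the replication issue concrete on a tiny submodular (coverage) game: it computes exact Shapley values by enumerating permutations, then duplicates one player's resources under a new identity so the replicated player's combined payoff can be compared with its original payoff. The coverage sets are illustrative assumptions, not data from the paper.

```python
import itertools

# Each player covers a set of items; coverage functions are submodular.
coverage = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5}}

def value(coalition):
    return len(set().union(*(coverage[p] for p in coalition))) if coalition else 0

def shapley(players):
    """Exact Shapley values by enumerating all join orders (fine for small games)."""
    perms = list(itertools.permutations(players))
    phi = {p: 0.0 for p in players}
    for order in perms:
        prefix = []
        for p in order:
            phi[p] += (value(prefix + [p]) - value(prefix)) / len(perms)
            prefix.append(p)
    return phi

print("original:  ", shapley(["a", "b", "c"]))
coverage["a2"] = coverage["a"]          # player "a" replicates its resources
print("replicated:", shapley(["a", "a2", "b", "c"]))
```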
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
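An illustrative sketch of the NAIVEATTACK idea described above: stamp a fixed trigger patch onto a fraction of the raw images and relabel them to the attacker's target class before distillation starts. The patch size, poison rate, and target label are assumptions, and DOORPING's iterative trigger optimisation during distillation is not shown.

```python
import numpy as np

def poison(images, labels, target_label=0, poison_rate=0.1, patch=3, rng=None):
    """images: (N, H, W, C) float array in [0, 1]; labels: (N,) int array.
    Returns poisoned copies with a white square trigger in the bottom-right corner."""
    rng = np.random.default_rng(0) if rng is None else rng
    images, labels = images.copy(), labels.copy()
    idx = rng.choice(len(images), size=int(poison_rate * len(images)), replace=False)
    images[idx, -patch:, -patch:, :] = 1.0   # fixed trigger patch
    labels[idx] = target_label               # flip poisoned samples to the target class
    return images, labels

imgs = np.random.rand(100, 32, 32, 3).astype(np.float32)
lbls = np.random.randint(0, 10, size=100)
p_imgs, p_lbls = poison(imgs, lbls)
print(int((p_lbls == 0).sum()), "samples now carry the target label")
```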
We present a dynamic path planning algorithm to navigate an amphibious rotor craft through a concave time-invariant obstacle field while attempting to minimize energy usage. We create a nonlinear quaternion state model that represents the rotor craft dynamics above and below the water. The 6-degree-of-freedom dynamics are used within a layered architecture to generate motion paths for the vehicle to follow, together with the required control inputs. The rotor craft has a 3-dimensional map of its surroundings that is updated via limited-range onboard sensor readings within the current medium (air or water). Path planning is done via PRM and D* Lite.
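For context, the attitude part of such a nonlinear quaternion state model typically rests on the standard quaternion kinematics below (standard notation, not taken from this abstract), with q the unit attitude quaternion, omega the body angular rate, and the circled-times symbol denoting quaternion multiplication:

```latex
\dot{q} \;=\; \tfrac{1}{2}\, q \otimes \begin{bmatrix} 0 \\ \boldsymbol{\omega} \end{bmatrix}
```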
While the capabilities of autonomous systems have been steadily improving in recent years, these systems still struggle to rapidly explore previously unknown environments without the aid of GPS-assisted navigation. The DARPA Subterranean (SubT) Challenge aimed to fast-track the development of autonomous exploration systems by evaluating their performance in real-world underground search-and-rescue scenarios. Subterranean environments present a plethora of challenges for robotic systems, such as limited communications, complex topology, visually-degraded sensing, and harsh terrain. The presented solution enables long-term autonomy with minimal human supervision by combining a powerful and independent single-agent autonomy stack with higher-level mission management operating over a flexible mesh network. The autonomy suite deployed on quadruped and wheeled robots was fully independent, freeing the human supervisor to loosely oversee the mission and make high-impact strategic decisions. We also discuss lessons learned from fielding our system at the SubT Final Event, relating to vehicle versatility, system adaptability, and re-configurable communications.
We present Muse, a text-to-image Transformer model that achieves state-of-the-art image generation performance while being significantly more efficient than diffusion or autoregressive models. Muse is trained on a masked modeling task in discrete token space: given the text embedding extracted from a pre-trained large language model (LLM), Muse is trained to predict randomly masked image tokens. Compared to pixel-space diffusion models, such as Imagen and DALL-E 2, Muse is significantly more efficient due to the use of discrete tokens and requiring fewer sampling iterations; compared to autoregressive models, such as Parti, Muse is more efficient due to the use of parallel decoding. The use of a pre-trained LLM enables fine-grained language understanding, translating to high-fidelity image generation and the understanding of visual concepts such as objects, their spatial relationships, pose, cardinality, etc. Our 900M parameter model achieves a new SOTA on CC3M, with an FID score of 6.06. The Muse 3B parameter model achieves an FID of 7.88 on zero-shot COCO evaluation, along with a CLIP score of 0.32. Muse also directly enables a number of image editing applications without the need to fine-tune or invert the model: inpainting, outpainting, and mask-free editing. More results are available at https://muse-model.github.io
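A minimal sketch of the masked token modeling objective described above, written with PyTorch; the module sizes, the way text conditioning is injected, and the mask rate are all illustrative assumptions and are not Muse's actual architecture.

```python
import torch
import torch.nn as nn

VOCAB, MASK_ID, SEQ_LEN, D = 1024, 1024, 256, 512   # MASK_ID is an extra [MASK] token

token_emb = nn.Embedding(VOCAB + 1, D)               # +1 for the [MASK] token
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=D, nhead=8, batch_first=True), num_layers=4)
to_logits = nn.Linear(D, VOCAB)

def masked_modeling_loss(image_tokens, text_embedding, mask_rate=0.5):
    """image_tokens: (B, SEQ_LEN) discrete VQ token ids; text_embedding: (B, D)."""
    mask = torch.rand_like(image_tokens, dtype=torch.float) < mask_rate
    inputs = image_tokens.masked_fill(mask, MASK_ID)
    # Condition on text by prepending the text embedding as an extra token position.
    h = torch.cat([text_embedding.unsqueeze(1), token_emb(inputs)], dim=1)
    logits = to_logits(encoder(h)[:, 1:])             # drop the text position
    # Cross-entropy only on the masked positions, as in masked modeling.
    return nn.functional.cross_entropy(logits[mask], image_tokens[mask])

tokens = torch.randint(0, VOCAB, (2, SEQ_LEN))
text = torch.randn(2, D)
print(masked_modeling_loss(tokens, text).item())
```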
The visual dimension of cities has been a fundamental subject in urban studies, since the pioneering work of scholars such as Sitte, Lynch, Arnheim, and Jacobs. Several decades later, big data and artificial intelligence (AI) are revolutionizing how people move, sense, and interact with cities. This paper reviews the literature on the appearance and function of cities to illustrate how visual information has been used to understand them. A conceptual framework, Urban Visual Intelligence, is introduced to systematically elaborate on how new image data sources and AI techniques are reshaping the way researchers perceive and measure cities, enabling the study of the physical environment and its interactions with socioeconomic environments at various scales. The paper argues that these new approaches enable researchers to revisit the classic urban theories and themes, and potentially help cities create environments that are more in line with human behaviors and aspirations in the digital age.
Logic Mill is a scalable and openly accessible software system that identifies semantically similar documents within either one domain-specific corpus or multi-domain corpora. It uses advanced Natural Language Processing (NLP) techniques to generate numerical representations of documents. Currently it leverages a large pre-trained language model to generate these document representations. The system focuses on scientific publications and patent documents and contains more than 200 million documents. It is easily accessible via a simple Application Programming Interface (API) or via a web interface. Moreover, it is continuously being updated and can be extended to text corpora from other domains. We see this system as a general-purpose tool for future research applications in the social sciences and other domains.
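The sketch below illustrates the underlying technique (embedding documents with a pre-trained language model and ranking by cosine similarity); it is not the Logic Mill API, and the model name and toy corpus are assumptions chosen for illustration.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")   # small pre-trained encoder for the demo

corpus = [
    "A method for learning task automata from reinforcement-learning episodes.",
    "Rational verification of equilibria against GR(1) specifications.",
    "A text-to-image transformer trained with masked token modeling.",
]
query = "Learning finite-state task specifications from agent experience."

corpus_emb = model.encode(corpus, normalize_embeddings=True)
query_emb = model.encode([query], normalize_embeddings=True)[0]

scores = corpus_emb @ query_emb                    # cosine similarity (embeddings are unit-norm)
for i in np.argsort(-scores):
    print(f"{scores[i]:.3f}  {corpus[i]}")
```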