Projection operations are a typical computational bottleneck in online learning. In this paper, we enable projection-free online learning within the framework of Online Convex Optimization with Memory (OCO-M). OCO-M captures how the history of decisions affects the current outcome by allowing the online learning loss functions to depend on both current and past decisions. In particular, we introduce the first projection-free meta-base learning algorithm with memory that minimizes dynamic regret, i.e., that minimizes the suboptimality against any sequence of time-varying decisions. We are motivated by artificial intelligence applications where autonomous agents need to adapt to time-varying environments in real time, accounting for how past decisions affect the present. Examples of such applications are online control of dynamical systems, statistical arbitrage, and time series prediction. The algorithm builds on the Online Frank-Wolfe (OFW) and Hedge algorithms. We demonstrate how our algorithm can be applied to the online control of linear time-varying systems in the presence of unpredictable process noise. To this end, we develop the first controller with memory and bounded dynamic regret against any optimal time-varying linear feedback control policy. We validate our algorithm in simulated scenarios of online control of linear time-invariant systems.
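The meta-base structure can be illustrated with a minimal sketch. It makes two assumptions: decisions live on the probability simplex, and the memory (past-decision) dependence of the losses is ignored.

- Each hypothetical `OFWBase` learner takes a projection-free Frank-Wolfe step toward the vertex returned by a linear minimization oracle.
- A hypothetical Hedge meta learner (`HedgeMeta`) mixes base learners that use different step sizes.

This is not the paper's algorithm. It is only an instance of the two building blocks the abstract names.

```python
import numpy as np


def lmo_simplex(grad):
    """Linear minimization oracle over the probability simplex: returns the vertex e_i
    with the smallest gradient entry, i.e. argmin_v <grad, v>."""
    v = np.zeros_like(grad, dtype=float)
    v[np.argmin(grad)] = 1.0
    return v


class OFWBase:
    """Hypothetical base learner: online Frank-Wolfe steps with a fixed step size eta."""

    def __init__(self, dim, eta):
        self.x = np.full(dim, 1.0 / dim)  # start at the center of the simplex
        self.eta = eta

    def update(self, grad):
        v = lmo_simplex(grad)
        # A convex combination of feasible points stays feasible: no projection needed.
        self.x = (1.0 - self.eta) * self.x + self.eta * v


class HedgeMeta:
    """Hypothetical meta learner: Hedge (exponential weights) over base learners."""

    def __init__(self, dim, etas, lr):
        self.bases = [OFWBase(dim, eta) for eta in etas]
        self.log_w = np.zeros(len(etas))
        self.lr = lr

    def weights(self):
        w = np.exp(self.log_w - self.log_w.max())  # numerically stable normalization
        return w / w.sum()

    def decide(self):
        # Play the Hedge-weighted mixture of the base learners' decisions.
        return sum(w * b.x for w, b in zip(self.weights(), self.bases))

    def update(self, loss_fn, grad_fn):
        # Hedge step: down-weight each base learner by the loss of its own decision.
        losses = np.array([loss_fn(b.x) for b in self.bases])
        self.log_w -= self.lr * losses
        # Each base learner then takes its own Frank-Wolfe step.
        for b in self.bases:
            b.update(grad_fn(b.x))
```

Every update is a convex combination of feasible points, so the iterates stay feasible without a projection. The grid of step sizes stands in for the unknown, environment-dependent best step size, which the meta learner adapts to.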
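3D reconstruction from images has wide applications in virtual reality and autonomous driving, where precision requirements are very high. Groundbreaking research on Neural Radiance Fields (NeRF), which leverages multi-layer perceptrons, has dramatically improved the representation quality of 3D objects. Some later studies improved NeRF by building truncated signed distance fields (TSDFs), but they still suffer from blurry surfaces in 3D reconstruction. In this work, we address this surface ambiguity by proposing a novel 3D shape representation, OmniNeRF. It is based on training a hybrid implicit field of an Omni-directional Distance Field (ODF) and a neural radiance field, replacing the apparent density in NeRF with omnidirectional information. Moreover, we introduce additional supervision on the depth map to further improve reconstruction quality. The proposed method is shown to effectively handle NeRF's defects at the edges of surface reconstruction, providing higher-quality 3D scene reconstruction results.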
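OmniNeRF learns its omni-directional distance field with a network. To show what such a field encodes, here is a minimal analytic sketch for a single sphere. It assumes the ODF value at a point x and direction d is the distance travelled along d before first hitting the surface, and infinite if the ray misses. The function name `sphere_odf` is hypothetical. Unlike a signed distance field, the value depends on the query direction, not only on the position.

```python
import numpy as np


def sphere_odf(x, d, center, radius):
    """Analytic omni-directional distance for one sphere: distance from point x along
    direction d to the first surface hit, or inf if the ray never hits the surface."""
    d = d / np.linalg.norm(d)
    oc = x - center
    # Solve |oc + t d|^2 = r^2, i.e. t^2 + 2 b t + c = 0.
    b = np.dot(oc, d)
    c = np.dot(oc, oc) - radius ** 2
    disc = b * b - c
    if disc < 0:
        return np.inf  # the ray's line misses the sphere
    sqrt_disc = np.sqrt(disc)
    t_near = -b - sqrt_disc
    t_far = -b + sqrt_disc
    if t_near >= 0:
        return t_near  # x is outside the sphere, looking toward it
    if t_far >= 0:
        return t_far  # x is inside the sphere
    return np.inf  # the sphere lies behind the ray
```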
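We enable efficient and effective coordination in unpredictable environments, i.e., in environments whose future evolution is unknown a priori and even adversarial. We are motivated by a future of autonomy that involves multiple robots coordinating in dynamic, unstructured, and adversarial environments to complete complex tasks such as target tracking, image covering, and area monitoring. Such tasks are often modeled as submodular maximization coordination problems. We thus introduce the first submodular coordination algorithm with bounded tracking regret, i.e., with bounded suboptimality with respect to optimal time-varying actions that know the future a priori. The bound gracefully degrades with the environments' capacity to change adversarially. It also quantifies how often the robots must re-select actions to "learn" to coordinate as if they knew the future a priori. Our algorithm generalizes the seminal Sequential Greedy algorithm by Fisher et al. to unpredictable environments, leveraging submodularity and algorithms for the problem of tracking the best expert. We validate our algorithm in simulated scenarios of target tracking.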
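For a single time step, the Sequential Greedy algorithm of Fisher et al., which the method generalizes, can be sketched as follows. The sketch assumes a toy area-coverage objective where each action covers a set of grid cells. Robots decide in a fixed order, and each picks the action with the largest marginal gain given its predecessors' choices. The name `sequential_greedy` is hypothetical, and the paper's online extension (tracking the best expert over time) is not shown.

```python
def sequential_greedy(action_sets, order):
    """Sequential Greedy for a coverage objective: robots decide in `order`, each choosing
    the action that adds the most not-yet-covered cells given all predecessors' choices.

    action_sets: {robot: {action_name: set_of_cells}}
    Returns (chosen actions per robot, number of distinct cells covered).
    """
    chosen = {}
    covered = set()
    for robot in order:
        actions = action_sets[robot]
        # Marginal gain of an action = number of new cells it covers.
        best = max(actions, key=lambda a: len(actions[a] - covered))
        chosen[robot] = best
        covered |= actions[best]
    return chosen, len(covered)
```

For monotone submodular objectives with one action per robot, this greedy rule is known to achieve at least half of the optimal value.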
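We present the Pathways Autoregressive Text-to-Image (Parti) model, which generates high-fidelity photorealistic images and supports content-rich synthesis involving complex compositions and world knowledge. Parti treats text-to-image generation as a sequence-to-sequence modeling problem, akin to machine translation, with sequences of image tokens as the target outputs rather than text tokens in another language. This strategy can naturally tap into the rich body of prior work on large language models, which have seen continued advances in capabilities and performance through scaling data and model sizes. Our approach is simple. First, Parti uses a Transformer-based image tokenizer, ViT-VQGAN, to encode images as sequences of discrete tokens. Second, we achieve consistent quality improvements by scaling the encoder-decoder Transformer model up to 20B parameters, with a new state-of-the-art zero-shot FID score of 7.23 and a finetuned FID score of 3.22 on MS-COCO. Our detailed analysis on Localized Narratives as well as PartiPrompts (P2), a new holistic benchmark of over 1600 English prompts, demonstrates the effectiveness of Parti across a wide variety of categories and difficulty aspects. We also explore and highlight limitations of our models in order to define and exemplify key areas of focus for further improvements. See https://parti.research.google/ for high-resolution images.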
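Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 442 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, mathematics, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline. Findings include:

- model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance);
- performance is remarkably similar across model classes, though with benefits from sparsity;
- tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics;
- social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.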
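Multi-robot decision-making is the process by which multiple robots coordinate their actions. In this paper, we aim for scalable and reliable multi-robot decision-making despite the robots' limited on-board resources and the resource-demanding complexity of their tasks. We introduce the first algorithm that enables robots to choose which other robots to coordinate with, thereby balancing the trade-off between centralized and decentralized coordination. In particular, centralization favors globally near-optimal decision-making, but at the cost of increased on-board resource requirements; decentralization favors minimal resource requirements, but at the cost of global suboptimality. All robots can thus afford our algorithm, irrespective of their resources. We are motivated by a future of autonomy that involves multiple robots coordinating actions to complete resource-demanding tasks such as target tracking and area coverage. To provide closed-form characterizations, we focus on maximization problems involving monotone and "doubly" submodular functions. To capture the cost of decentralization, we introduce the notion of Centralization Of Information among non-Neighbors (COIN). We validate our algorithm in simulated scenarios of image covering.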
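To make the centralization/decentralization trade-off concrete, here is a hypothetical sketch using a toy coverage objective. Each robot maximizes its marginal gain given only the choices of the robots listed as its neighbors in `neighbors`.

- With no neighbors, the robots act fully decentralized.
- With every predecessor listed as a neighbor, the rule reduces to Sequential Greedy.

The paper's algorithm additionally chooses those neighbors and provides guarantees via COIN. Neither is modeled here, and both function names are hypothetical.

```python
def neighbor_greedy(action_sets, order, neighbors):
    """Each robot maximizes marginal coverage using only the choices of its listed
    neighbors that have already decided (a hypothetical coordination rule).

    action_sets: {robot: {action_name: set_of_cells}}
    neighbors:   {robot: [robots whose choices it can observe]}
    """
    chosen = {}
    for robot in order:
        # Cells already covered by the neighbors this robot can observe.
        seen = [action_sets[n][chosen[n]] for n in neighbors.get(robot, []) if n in chosen]
        covered = set().union(*seen)

        def gain(action):
            return len(action_sets[robot][action] - covered)

        chosen[robot] = max(action_sets[robot], key=gain)
    return chosen


def total_coverage(action_sets, chosen):
    """Number of distinct cells covered by the joint action."""
    return len(set().union(*(action_sets[r][a] for r, a in chosen.items())))
```

If two robots can cover the same cells, fully decentralized robots may pick overlapping actions. A robot that observes its neighbor's choice avoids the overlap, at the price of the extra communication.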
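Federated learning learns from decentralized data by fusing collaborative models from local nodes. However, the conventional coordinate-based model averaging of FedAvg ignores the random information encoded in each parameter and may suffer from structural feature misalignment. In this work, we propose Fed2, a feature-aligned federated learning framework that resolves this issue by establishing firm structural feature alignment across the collaborative models. Fed2 is composed of two major designs. First, we design a feature-oriented model structure adaptation method to ensure explicit feature allocation in different neural network structures. Applying the structure adaptation to collaborative models, matchable structures with similar feature information can be initialized at the very early training stage. Second, during the federated learning process, we propose a feature-paired averaging scheme to guarantee aligned feature distribution and avoid feature fusion conflicts under either IID or non-IID scenarios. As a result, Fed2 can effectively enhance federated learning convergence performance under extensive homogeneous and heterogeneous settings, providing excellent convergence speed, accuracy, and computation/communication efficiency.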
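The difference between coordinate-wise averaging and feature-paired averaging can be sketched for one weight matrix. The sketch assumes each client's hidden units are a known permutation of a shared feature order. In Fed2 that correspondence comes from the feature-oriented structure adaptation; here the permutations `perms` are simply given, and both function names are hypothetical.

```python
import numpy as np


def fedavg(weights, sizes):
    """Coordinate-wise FedAvg: average each parameter position, weighted by client data size."""
    sizes = np.asarray(sizes, dtype=float)
    return sum(w * n for w, n in zip(weights, sizes)) / sizes.sum()


def paired_average(weights, sizes, perms):
    """Average after re-indexing each client's hidden units (rows) into a shared feature order.

    perms[k][j] is the row of client k that holds shared feature j (an assumed alignment).
    In a full network, the next layer's input columns must be permuted the same way.
    """
    aligned = [w[p] for w, p in zip(weights, perms)]
    return fedavg(aligned, sizes)
```

When two clients hold the same features in different row orders, plain FedAvg mixes unrelated features. Averaging after alignment keeps each feature intact.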
Developing autonomous vehicles (AVs) helps improve road safety and traffic efficiency in intelligent transportation systems (ITS). Accurately predicting the trajectories of traffic participants is essential to the decision-making and motion planning of AVs in interactive scenarios. Recently, learning-based trajectory predictors have shown state-of-the-art performance in highway and urban areas. However, most existing learning-based models are trained on fixed datasets and may perform poorly in continuously changing scenarios. Specifically, after learning a new scenario, they may perform poorly in previously learned ones. This phenomenon is called "catastrophic forgetting". Few studies investigate trajectory prediction in continuous scenarios, where catastrophic forgetting may occur. To handle this problem, we first propose a novel continual learning (CL) approach for vehicle trajectory prediction. Then, inspired by brain science, we develop a dynamic memory mechanism that uses a measure of traffic divergence between scenarios to balance the performance and training efficiency of the proposed CL approach. Finally, datasets collected from different locations are used to design continual training and testing methods in the experiments. Experimental results show that the proposed approach achieves consistently high prediction accuracy in continuous scenarios without re-training, mitigating catastrophic forgetting compared to non-CL approaches. The implementation of the proposed approach is publicly available at https://github.com/BIT-Jack/D-GSM
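The abstract does not specify how traffic divergence is measured or how it sets the memory. The sketch below is a purely hypothetical instance built on three assumptions:

- each scenario is summarized by a histogram of some traffic feature;
- divergence is the KL divergence between those histograms;
- a fixed replay budget is split across past scenarios in proportion to their divergence from the new one.

Both function names are hypothetical.

```python
import numpy as np


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete histograms (normalized inside; eps avoids log(0))."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))


def allocate_memory(new_hist, old_hists, budget):
    """Hypothetical policy: give each past scenario replay slots in proportion to how far it
    diverges from the new scenario, so dissimilar scenarios are rehearsed more."""
    div = np.array([kl_divergence(h, new_hist) for h in old_hists])
    if div.sum() == 0:
        # All past scenarios look identical to the new one: split the budget evenly.
        return np.full(len(old_hists), budget // len(old_hists))
    share = div / div.sum()
    return np.floor(share * budget).astype(int)
```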
Training a Neural Radiance Field (NeRF) without pre-computed camera poses is challenging. Recent advances in this direction demonstrate the possibility of jointly optimising a NeRF and camera poses in forward-facing scenes. However, these methods still face difficulties during dramatic camera movement. We tackle this challenging problem by incorporating undistorted monocular depth priors. These priors are generated by correcting scale and shift parameters during training, with which we are then able to constrain the relative poses between consecutive frames. This constraint is achieved using our proposed novel loss functions. Experiments on real-world indoor and outdoor scenes show that our method can handle challenging camera trajectories and outperforms existing methods in terms of novel view rendering quality and pose estimation accuracy.
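Monocular depth is typically only correct up to an unknown scale and shift. The abstract says these parameters are corrected during training. As a simpler stand-in, the sketch below solves in closed form for a single scale s and shift t that minimize Σᵢ (s·mᵢ + t − rᵢ)², where m is the monocular depth and r is a reference depth, for example depth rendered by the NeRF. The function names are hypothetical.

```python
import numpy as np


def fit_scale_shift(mono_depth, ref_depth, mask=None):
    """Least-squares scale s and shift t such that s * mono_depth + t approximates ref_depth."""
    m = mono_depth.ravel()
    r = ref_depth.ravel()
    if mask is not None:
        keep = mask.ravel()
        m, r = m[keep], r[keep]
    # Design matrix [m, 1]: solve for the two unknowns (s, t) jointly.
    A = np.stack([m, np.ones_like(m)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, r, rcond=None)
    return float(s), float(t)


def undistort(mono_depth, s, t):
    """Apply the fitted scale and shift to obtain an undistorted depth prior."""
    return s * mono_depth + t
```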
In this paper, we propose an end-to-end Retrieval-Augmented Visual Language Model (REVEAL) that learns to encode world knowledge into a large-scale memory, and to retrieve from it to answer knowledge-intensive queries. REVEAL consists of four key components: the memory, the encoder, the retriever, and the generator. The large-scale memory encodes various sources of multimodal world knowledge (e.g., image-text pairs, question answering pairs, knowledge graph triplets, etc.) via a unified encoder. The retriever finds the most relevant knowledge entries in the memory, and the generator fuses the retrieved knowledge with the input query to produce the output. A key novelty in our approach is that the memory, encoder, retriever, and generator are all pre-trained end-to-end on a massive amount of data. Furthermore, our approach can use a diverse set of multimodal knowledge sources, which is shown to result in significant gains. We show that REVEAL achieves state-of-the-art results on visual question answering and image captioning.
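The retrieve-then-fuse flow can be sketched with brute-force cosine-similarity search over a small in-memory key matrix. This assumes memory entries are already encoded as vectors. The softmax-weighted `fuse` step is a hypothetical toy standing in for REVEAL's learned generator. Large memories would normally use approximate nearest-neighbor search instead of a full scan.

```python
import numpy as np


def retrieve_top_k(query, memory_keys, k):
    """Return indices and scores of the k memory entries most cosine-similar to query."""
    q = query / np.linalg.norm(query)
    keys = memory_keys / np.linalg.norm(memory_keys, axis=1, keepdims=True)
    scores = keys @ q
    top = np.argsort(-scores)[:k]  # highest scores first
    return top, scores[top]


def fuse(query, memory_values, top, scores):
    """Hypothetical toy fusion: softmax-weight the retrieved values by their retrieval
    scores and append the result to the query vector."""
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return np.concatenate([query, w @ memory_values[top]])
```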