Foveated imaging provides a better tradeoff between situational awareness (field of view) and resolution and is critical in long-wavelength infrared regimes because of the size, weight, power, and cost of thermal sensors. We demonstrate computational foveated imaging by exploiting the ability of a meta-optical frontend to discriminate between different polarization states and a computational backend to reconstruct the captured image/video. The frontend is a three-element optic: the first element which we call the "foveal" element is a metalens that focuses s-polarized light at a distance of $f_1$ without affecting the p-polarized light; the second element which we call the "perifoveal" element is another metalens that focuses p-polarized light at a distance of $f_2$ without affecting the s-polarized light. The third element is a freely rotating polarizer that dynamically changes the mixing ratios between the two polarization states. Both the foveal element (focal length = 150mm; diameter = 75mm), and the perifoveal element (focal length = 25mm; diameter = 25mm) were fabricated as polarization-sensitive, all-silicon, meta surfaces resulting in a large-aperture, 1:6 foveal expansion, thermal imaging capability. A computational backend then utilizes a deep image prior to separate the resultant multiplexed image or video into a foveated image consisting of a high-resolution center and a lower-resolution large field of view context. We build a first-of-its-kind prototype system and demonstrate 12 frames per second real-time, thermal, foveated image, and video capture in the wild.
translated by 谷歌翻译
互动主义模型引入了一种动态的语言,交流和认知方法。在这项工作中,我们在对话对话系统(SDS)的对话建模的背景下探讨了这一基本理论。为了扩展这样的理论框架,我们提出了一组设计原则,这些设计原则遵守中央心理语言和交流理论,以实现SDS中的互动主义。通过这些,关键思想可以构成我们提出的设计原则的基础。
translated by 谷歌翻译
我们研究(选定的)宽,狭窄,深而浅,较浅,懒惰和非懒惰的训练环境中(选定的)深度神经网络中的平均鲁棒性概念。我们证明,在参数不足的环境中,宽度具有负面影响,而在过度参数化的环境中提高了鲁棒性。深度的影响紧密取决于初始化和训练模式。特别是,当用LeCun初始化初始化时,深度有助于通过懒惰训练制度进行稳健性。相反,当用神经切线核(NTK)初始化并进行初始化时,深度会损害稳健性。此外,在非懒惰培训制度下,我们演示了两层relu网络的宽度如何使鲁棒性受益。我们的理论发展改善了Huang等人的结果。[2021],Wu等。[2021]与Bubeck and Sellke [2021],Bubeck等人一致。[2021]。
translated by 谷歌翻译
本文介绍了一种数据驱动的形状完成方法,该方法着重于完成3D形状缺失区域的几何细节。我们观察到,现有的生成方法缺乏训练数据和表示能力,可以通过复杂的几何形状和拓扑合成合理的,细粒度的细节。我们的关键见解是从部分输入复制和变形补丁以完成缺失区域。这使我们能够保留本地几何特征的风格,即使它与培训数据有很大不同。我们的全自动方法分为两个阶段。首先,我们学会从输入形状检索候选补丁。其次,我们选择并变形了一些检索到的候选者,以无缝将它们融合到完整的形状中。该方法结合了两种最常见的完成方法的优点:基于相似性的单稳定性完成,以及通过学习形状空间来完成。我们通过从部分输入中检索贴片来利用重复模式,并通过使用神经网络来指导检索和变形步骤来学习全球结构先验。实验结果表明,我们的方法在多个数据集和形状类别上的表现非常优于基线。代码和数据可在https://github.com/gitbosun/patchrd上找到。
translated by 谷歌翻译
医疗AI通过支持基于证据的医学实践,个性化患者治疗,降低成本以及改善提供者和患者体验,推进医疗保健的巨大潜力。我们认为解锁此潜力需要一种系统的方法来衡量在大规模异构数据上的医疗AI模型的性能。为了满足这种需求,我们正在建立Medperf,这是一个开放的框架,用于在医疗领域的基准测试机器学习。 Medperf将使联合评估能够将模型安全地分配给不同的评估设施,从而赋予医疗组织在高效和人类监督过程中评估和验证AI模型的性能,同时优先考虑隐私。我们描述了当前的挑战医疗保健和AI社区面临,需要开放平台,Medperf的设计理念,其目前的实施状态和我们的路线图。我们呼吁研究人员和组织加入我们创建Medperf开放基准平台。
translated by 谷歌翻译
在视频数据中,来自移动区域的忙碌运动细节在频域中的特定频率带宽内传送。同时,视频数据的其余频率是用具有实质冗余的安静信息编码,这导致现有视频模型中的低处理效率作为输入原始RGB帧。在本文中,我们考虑为处理重要忙碌信息的处理和对安静信息的计算的处理分配。我们设计可训练的运动带通量模块(MBPM),用于将繁忙信息从RAW视频数据中的安静信息分开。通过将MBPM嵌入到两个路径CNN架构中,我们定义了一个繁忙的网络(BQN)。 BQN的效率是通过避免由两个路径处理的特征空间中的冗余来确定:一个在低分辨率的安静特征上运行,而另一个处理繁忙功能。所提出的BQN在某物V1,Kinetics400,UCF101和HMDB51数据集中略高于最近最近的视频处理模型。
translated by 谷歌翻译
在本文中,我们提出了一种新的视频表示学习方法,名为时间挤压(TS)池,这可以从长期的视频帧中提取基本移动信息,并将其映射到一组名为挤压图像的几个图像中。通过将时间挤压池作为层嵌入到现成的卷积神经网络(CNN)中,我们设计了一个名为Temporal Squeeze网络(TESNet)的新视频分类模型。由此产生的挤压图像包含来自视频帧的基本移动信息,对应于视频分类任务的优化。我们在两个视频分类基准上评估我们的架构,并与最先进的结果进行了比较。
translated by 谷歌翻译
While the capabilities of autonomous systems have been steadily improving in recent years, these systems still struggle to rapidly explore previously unknown environments without the aid of GPS-assisted navigation. The DARPA Subterranean (SubT) Challenge aimed to fast track the development of autonomous exploration systems by evaluating their performance in real-world underground search-and-rescue scenarios. Subterranean environments present a plethora of challenges for robotic systems, such as limited communications, complex topology, visually-degraded sensing, and harsh terrain. The presented solution enables long-term autonomy with minimal human supervision by combining a powerful and independent single-agent autonomy stack, with higher level mission management operating over a flexible mesh network. The autonomy suite deployed on quadruped and wheeled robots was fully independent, freeing the human supervision to loosely supervise the mission and make high-impact strategic decisions. We also discuss lessons learned from fielding our system at the SubT Final Event, relating to vehicle versatility, system adaptability, and re-configurable communications.
translated by 谷歌翻译
Attention mechanisms form a core component of several successful deep learning architectures, and are based on one key idea: ''The output depends only on a small (but unknown) segment of the input.'' In several practical applications like image captioning and language translation, this is mostly true. In trained models with an attention mechanism, the outputs of an intermediate module that encodes the segment of input responsible for the output is often used as a way to peek into the `reasoning` of the network. We make such a notion more precise for a variant of the classification problem that we term selective dependence classification (SDC) when used with attention model architectures. Under such a setting, we demonstrate various error modes where an attention model can be accurate but fail to be interpretable, and show that such models do occur as a result of training. We illustrate various situations that can accentuate and mitigate this behaviour. Finally, we use our objective definition of interpretability for SDC tasks to evaluate a few attention model learning algorithms designed to encourage sparsity and demonstrate that these algorithms help improve interpretability.
translated by 谷歌翻译
Recent advances in deep learning have enabled us to address the curse of dimensionality (COD) by solving problems in higher dimensions. A subset of such approaches of addressing the COD has led us to solving high-dimensional PDEs. This has resulted in opening doors to solving a variety of real-world problems ranging from mathematical finance to stochastic control for industrial applications. Although feasible, these deep learning methods are still constrained by training time and memory. Tackling these shortcomings, Tensor Neural Networks (TNN) demonstrate that they can provide significant parameter savings while attaining the same accuracy as compared to the classical Dense Neural Network (DNN). In addition, we also show how TNN can be trained faster than DNN for the same accuracy. Besides TNN, we also introduce Tensor Network Initializer (TNN Init), a weight initialization scheme that leads to faster convergence with smaller variance for an equivalent parameter count as compared to a DNN. We benchmark TNN and TNN Init by applying them to solve the parabolic PDE associated with the Heston model, which is widely used in financial pricing theory.
translated by 谷歌翻译