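Detecting and tracking fast-moving objects is widely useful in many fields. However, because of complex computation and limited data-processing capacity, image-based techniques struggle to meet the demand for fast and efficient detection and tracking. To address this problem, we propose an image-free method for real-time detection and tracking of fast-moving objects. It uses a spatial light modulator to illuminate the fast-moving object with Hadamard patterns, and a single-pixel detector collects the resulting light signal. The single-pixel measurements are used directly to recover position information without image reconstruction. Furthermore, a new sampling method is used to optimize the pattern projection scheme and achieve an ultra-low sampling rate. Compared with state-of-the-art methods, our approach not only handles real-time detection and tracking but also requires little computation and is highly efficient. We demonstrate experimentally that the proposed method, using a 22 kHz digital micromirror device, achieves a frame rate of 105 fps at a sampling rate of 1.28% when tracking. Our method breaks with conventional tracking approaches and achieves real-time object tracking without image reconstruction.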
translated by Google Translate
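To make the single-pixel measurement model in the abstract above concrete, the following Python sketch simulates Hadamard-pattern illumination of a toy scene and works out the pattern budget implied by the quoted figures. It is an illustration only, not the authors' tracking algorithm: the function names, the 4 x 4 scene, and the use of signed (+1/-1) patterns are assumptions. A real micromirror device displays binary patterns, so signed patterns are usually obtained from differential measurements.

```python
def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n must be a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = [[1]]
    while len(h) < n:
        # Block rule: H_2n = [[H, H], [H, -H]]
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def single_pixel_measurements(scene, patterns):
    """One detector reading per pattern: the inner product of pattern and scene."""
    return [sum(p * s for p, s in zip(pattern, scene)) for pattern in patterns]


def patterns_per_frame(modulator_rate_hz, frame_rate_fps):
    """How many patterns can be displayed within one tracking frame."""
    return modulator_rate_hz / frame_rate_fps


H = hadamard(16)          # 16 patterns for a flattened 4 x 4 toy scene
scene = [0.0] * 16
scene[5] = 1.0            # a single bright pixel standing in for the object

# Keep only 4 of the 16 patterns, i.e. a 25% sampling rate (toy value).
y = single_pixel_measurements(scene, H[:4])

# Figures quoted in the abstract: 22 kHz modulator, 105 fps tracking.
budget = patterns_per_frame(22_000, 105)
```

At 22 kHz and 105 fps the budget is roughly 209 patterns per frame. If one assumes a 128 x 128 pattern grid, which the abstract does not state, 1.28% of 16384 pixels is about 210 patterns, which is consistent with that budget.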
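As an emerging technology that has attracted great attention, non-line-of-sight (NLOS) imaging, which reconstructs hidden objects by analyzing the diffuse reflection on a relay surface, has broad application prospects in autonomous driving, medical imaging, and defense. Despite the challenges of low signal-to-noise ratio (SNR) and severe ill-posedness, NLOS imaging has developed rapidly in recent years. Most current NLOS imaging techniques use conventional physical models, constructing imaging models through active or passive illumination and using reconstruction algorithms to recover hidden scenes. Deep learning algorithms for NLOS imaging have also received much attention recently. This paper presents a comprehensive overview of both conventional and deep learning-based NLOS imaging techniques. In addition, we survey newly proposed NLOS scenarios and discuss the challenges and prospects of existing technologies. Such a survey can give readers an overview of the different types of NLOS imaging and thereby accelerate the development of seeing around corners.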
Non-line-of-sight (NLOS) imaging aims to reconstruct three-dimensional hidden scenes from data measured in the line of sight, using photon time-of-flight information encoded in light after multiple diffuse reflections. Under-sampled scanning data can facilitate fast imaging. However, the resulting reconstruction problem becomes a severely ill-posed inverse problem whose solution is likely to be degraded by noise and distortion. In this paper, we propose two novel NLOS reconstruction models based on curvature regularization, i.e., the object-domain curvature regularization model and the dual (i.e., signal and object)-domain curvature regularization model. Fast numerical optimization algorithms are developed based on the alternating direction method of multipliers (ADMM) with a backtracking stepsize rule, and are further accelerated by a GPU implementation. We evaluate the proposed algorithms on both synthetic and real datasets, where they achieve state-of-the-art performance, especially in the compressed sensing setting. All our code and data are available at https://github.com/Duanlab123/CurvNLOS.
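Using single-pixel detection, end-to-end neural networks that jointly optimize encoding and decoding enable high-precision imaging and high-level semantic sensing. However, for different sampling rates, large-scale networks need to be retrained, which is laborious and computationally expensive. In this letter, we report a weighted optimization technique for dynamic rate-adaptive single-pixel imaging and sensing, which requires training the network only once for it to be usable at any sampling rate. Specifically, we introduce a novel weighting scheme in the encoding process to characterize the modulation efficiency of different patterns. While the network is trained at a high sampling rate, the modulation patterns and their corresponding weights are updated iteratively, which yields an optimally ranked encoding series at convergence. In the experimental implementation, the optimal pattern series with the highest weights is used for light modulation, enabling highly efficient imaging and sensing. The reported strategy saves the additional training of a separate low-rate network that existing dynamic single-pixel networks require, which further doubles training efficiency. Experiments on the MNIST dataset validated that, with the network trained once at a sampling rate of 1, the average imaging PSNR reaches 23.50 dB at a sampling rate of 0.1, and the image classification accuracy reaches up to 95.00% at a sampling rate of 0.03 and 97.91% at a sampling rate of 0.1.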
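The rate-adaptive idea described above, training once and then keeping only the highest-weight patterns for a given sampling rate, can be sketched in a few lines of Python. This is a hedged illustration rather than the authors' implementation: the function name `patterns_for_rate`, the toy weights, and the rounding rule for the pattern count are all assumptions.

```python
def patterns_for_rate(weights, sampling_rate, n_pixels):
    """Return the indices of the highest-weight patterns for a sampling rate.

    weights       -- one (hypothetical) learned weight per modulation pattern
    sampling_rate -- number of patterns divided by number of pixels, in (0, 1]
    n_pixels      -- number of pixels in the target image
    """
    if not 0.0 < sampling_rate <= 1.0:
        raise ValueError("sampling_rate must be in (0, 1]")
    k = max(1, round(sampling_rate * n_pixels))   # assumed rounding rule
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return ranked[:k]


# Toy weights for an 8-pixel problem with one candidate pattern per pixel.
weights = [0.30, 0.95, 0.10, 0.70, 0.05, 0.85, 0.20, 0.50]

low = patterns_for_rate(weights, 0.25, n_pixels=8)    # 2 patterns
high = patterns_for_rate(weights, 0.50, n_pixels=8)   # 4 patterns
```

Because the patterns are ranked once, the set used at a lower rate is always a prefix of the set used at a higher rate, so no retraining is needed when the rate changes. For 28 x 28 MNIST images (784 pixels, assuming no resizing), this rounding rule gives 78 patterns at a rate of 0.1 and 24 patterns at a rate of 0.03.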
We present a novel single-shot interferometric ToF camera targeted at precise 3D measurements of dynamic objects. The camera concept is based on Synthetic Wavelength Interferometry, a technique that allows retrieval of depth maps of objects with optically rough surfaces at submillimeter depth precision. In contrast to conventional ToF cameras, our device uses only off-the-shelf CCD/CMOS detectors and works at their native chip resolution (as of today, theoretically up to 20 Mp and beyond). Moreover, we can obtain a full 3D model of the object in a single shot, meaning that no temporal sequence of exposures or temporal illumination modulation (such as amplitude or frequency modulation) is necessary, which makes our camera robust against object motion. In this paper, we introduce the novel camera concept and show first measurements that demonstrate the capabilities of our system. We present 3D measurements of small (cm-sized) objects with > 2 Mp point cloud resolution (the resolution of the detector we used) and up to sub-mm depth precision. We also report a "single-shot 3D video" acquisition and a first single-shot "Non-Line-of-Sight" measurement. Our technique has great potential for high-precision applications with dynamic object movement, e.g., in AR/VR, industrial inspection, medical imaging, and imaging through scattering media like fog or human tissue.
Visual perception plays an important role in autonomous driving. One of its primary tasks is object detection and identification. Because vision sensors capture rich color and texture information, they can identify various kinds of road information quickly and accurately. The commonly used technique is based on extracting and calculating various features of the image. Recently developed deep learning-based methods offer better reliability and processing speed and have a greater advantage in recognizing complex elements. For depth estimation, vision sensors are also used for ranging because of their small size and low cost. A monocular camera uses image data from a single viewpoint as input to estimate object depth. In contrast, stereo vision is based on parallax and on matching feature points across different views, and the application of deep learning further improves its accuracy. In addition, Simultaneous Localization and Mapping (SLAM) can establish a model of the road environment, thus helping the vehicle perceive its surroundings and complete its tasks. In this paper, we introduce and compare various methods of object detection and identification, then explain the development of depth estimation and compare methods based on monocular, stereo, and RGB-D sensors, next review and compare various SLAM methods, and finally summarize current problems and present future development trends of vision technologies.
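We consider using the system's optical imaging process together with convolutional neural networks (CNNs) to solve the snapshot hyperspectral imaging reconstruction problem, in which a dual-camera system captures three-dimensional hyperspectral images (HSIs) in a compressed manner. Various CNN-based methods have been developed in recent years to reconstruct HSIs, but most supervised deep learning methods aim to fit a brute-force mapping between the captured compressed image and standard HSIs. The learned mapping therefore becomes invalid when the observed data deviate from the training data. In particular, ground truth is usually unavailable in real-world scenarios. In this paper, we present a self-supervised dual-camera system with an untrained, physics-informed CNN framework. Extensive simulation and experimental results show that our training-free method adapts to a wide range of imaging environments with good performance. Moreover, compared with training-based methods, our system can be continually fine-tuned and self-improved in real-world scenarios.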
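Interest point detection is one of the most fundamental and critical problems in computer vision and image processing. In this paper, we present a comprehensive review of image feature information (IFI) extraction techniques for interest point detection. To systematically introduce how existing interest point detection methods extract IFI from an input image, we propose a taxonomy of IFI extraction techniques for interest point detection. According to this taxonomy, we discuss the different types of IFI extraction techniques for interest point detection. Furthermore, we identify the main unresolved issues related to existing IFI extraction techniques, as well as any interest point detection methods that have not been discussed before. Existing popular datasets and evaluation standards are provided, and the performance of 18 state-of-the-art approaches is evaluated and discussed. Future research directions for IFI extraction techniques are also elaborated.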
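Mask-based lensless cameras can be flat, thin, and lightweight, which makes them suitable for novel designs of computational imaging systems with large surface areas and arbitrary shapes. Despite recent progress in lensless cameras, the quality of images recovered from them is often poor because the underlying measurement system is ill-conditioned. In this paper, we propose to use coded illumination to improve the quality of images reconstructed with lensless cameras. In our imaging model, the scene or object is illuminated by multiple coded illumination patterns while the lensless camera records sensor measurements. We designed and tested a number of illumination patterns and observed that shifting dots (and related orthogonal) patterns provide the best overall performance. We propose a fast, low-complexity recovery algorithm that exploits the separability and block-diagonal structure of our system. We present simulation results and hardware experiment results to demonstrate that our proposed method can significantly improve reconstruction quality.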
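There is growing interest in deploying non-line-of-sight (NLOS) imaging systems to recover objects behind an obstacle. Existing solutions generally pre-calibrate the system before scanning the hidden object. On-site adjustments of the occluder, the object, and the scanning pattern then require recalibration. We present an online calibration technique that directly decouples the acquired transients into line-of-sight (LOS) and hidden components. We use the former to directly (re)calibrate the system upon changes in the scene/obstacle configuration, scanning region, and scanning pattern, whereas the latter is used to recover the hidden object via spatial, frequency, or learning-based techniques. Our technique avoids auxiliary calibration apparatus such as mirrors or checkerboards, and supports both laboratory validation and real-world deployment.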
Lensless cameras are a class of imaging devices that shrink the physical dimensions to the very close vicinity of the image sensor by replacing conventional compound lenses with integrated flat optics and computational algorithms. Here we report a diffractive lensless camera with spatially-coded Voronoi-Fresnel phase to achieve superior image quality. We propose a design principle of maximizing the acquired information in optics to facilitate the computational reconstruction. By introducing an easy-to-optimize Fourier domain metric, Modulation Transfer Function volume (MTFv), which is related to the Strehl ratio, we devise an optimization framework to guide the optimization of the diffractive optical element. The resulting Voronoi-Fresnel phase features an irregular array of quasi-Centroidal Voronoi cells containing a base first-order Fresnel phase function. We demonstrate and verify the imaging performance for photography applications with a prototype Voronoi-Fresnel lensless camera on a 1.6-megapixel image sensor in various illumination conditions. Results show that the proposed design outperforms existing lensless cameras, and could benefit the development of compact imaging systems that work in extreme physical conditions.
Foveated imaging provides a better tradeoff between situational awareness (field of view) and resolution and is critical in long-wavelength infrared regimes because of the size, weight, power, and cost of thermal sensors. We demonstrate computational foveated imaging by exploiting the ability of a meta-optical frontend to discriminate between different polarization states and a computational backend to reconstruct the captured image/video. The frontend is a three-element optic: the first element which we call the "foveal" element is a metalens that focuses s-polarized light at a distance of $f_1$ without affecting the p-polarized light; the second element which we call the "perifoveal" element is another metalens that focuses p-polarized light at a distance of $f_2$ without affecting the s-polarized light. The third element is a freely rotating polarizer that dynamically changes the mixing ratios between the two polarization states. Both the foveal element (focal length = 150mm; diameter = 75mm), and the perifoveal element (focal length = 25mm; diameter = 25mm) were fabricated as polarization-sensitive, all-silicon, meta surfaces resulting in a large-aperture, 1:6 foveal expansion, thermal imaging capability. A computational backend then utilizes a deep image prior to separate the resultant multiplexed image or video into a foveated image consisting of a high-resolution center and a lower-resolution large field of view context. We build a first-of-its-kind prototype system and demonstrate 12 frames per second real-time, thermal, foveated image, and video capture in the wild.
Most Deep Learning (DL) based Compressed Sensing (DCS) algorithms adopt a single neural network for signal reconstruction and fail to jointly consider the influence of the sampling operation on reconstruction. In this paper, we propose a unified framework that jointly considers the sampling and reconstruction processes for image compressive sensing, based on well-designed cascade neural networks. The proposed framework includes two sub-networks: the sampling sub-network and the reconstruction sub-network. In the sampling sub-network, an adaptive fully connected layer, instead of the traditional random matrix, is used to mimic the sampling operator. In the reconstruction sub-network, a cascade network combining a stacked denoising autoencoder (SDA) and a convolutional neural network (CNN) is designed to reconstruct signals. The SDA is used to solve the signal mapping problem and produce an initial reconstruction. The CNN is then used to fully recover the structure and texture features of the image to obtain better reconstruction performance. Extensive experiments show that this framework outperforms many other state-of-the-art methods, especially at low sampling rates.
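The sampling step described above is a linear map from an n-dimensional signal to m measurements, with sampling rate m / n. The following Python sketch shows that map under stated assumptions: the function names are hypothetical, and the Gaussian matrix stands in for the traditional random sampling operator. In the framework described in the abstract, the same shape of matrix would be the weights of a bias-free fully connected layer and would be learned rather than drawn at random.

```python
import random


def make_sampling_matrix(n, rate, seed=0):
    """Build an m x n Gaussian sampling matrix with m = round(rate * n) rows.

    In a learned sampling sub-network these entries would be trainable
    weights; here they are random values used purely for illustration.
    """
    m = max(1, round(rate * n))
    rng = random.Random(seed)
    scale = 1.0 / m ** 0.5
    return [[rng.gauss(0.0, scale) for _ in range(n)] for _ in range(m)]


def sample(phi, x):
    """Compressive measurement y = phi @ x for a flattened signal x."""
    return [sum(w * v for w, v in zip(row, x)) for row in phi]


x = [float(i) for i in range(16)]               # flattened 4 x 4 toy image block
phi = make_sampling_matrix(n=16, rate=0.25)     # 4 measurements from 16 pixels
y = sample(phi, x)
```

Recovering `x` from the much shorter `y` is the job of the reconstruction sub-network, which this sketch deliberately leaves out.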
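In deep-sea exploration, sonar is currently the only effective long-range sensing device. Complex underwater conditions such as noise interference, low target intensity, and background dynamics have many negative effects on sonar imaging. Among them, the problem of nonlinear intensity is extremely common. It is also known as the anisotropy of acoustic sensor imaging: when an autonomous underwater vehicle (AUV) carrying sonar detects the same target from different angles, the intensity variation between the image pair is sometimes very large, which makes traditional matching algorithms almost ineffective. However, image matching is the basis of comprehensive tasks such as navigation, positioning, and mapping. Obtaining robust and accurate matching results is therefore very valuable. This paper proposes a combined matching method based on phase information and deep convolutional features. It has two outstanding advantages: first, the deep convolutional features can be used to measure the similarity of local and global positions in sonar images; second, local feature matching can be performed at the key target positions of the sonar image. The method requires no complex manual design and completes the matching of nonlinear-intensity sonar images in a nearly end-to-end manner. Feature matching experiments were carried out on deep-sea sonar images captured by AUVs, and the results show that our proposal has excellent matching accuracy and robustness.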
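Event-based cameras (ECs) are bio-inspired sensors that asynchronously report brightness changes for each pixel. Owing to their high dynamic range, pixel bandwidth, temporal resolution, low power consumption, and computational simplicity, they are beneficial for vision-based projects in challenging lighting conditions, and they can detect fast movements with their microsecond response time. The first generation of ECs is monochrome, but color data is very useful and sometimes essential for certain vision-based applications. The latest technology enables manufacturers to build color ECs, trading off the size of the sensor and substantially reducing the resolution compared with monochrome models, despite having the same bandwidth. In addition, ECs only detect changes in light and do not show static or slowly moving objects. We introduce a method to detect full RGB events using a monochrome EC aided by a structured light projector. The projector rapidly emits RGB patterns of light beams onto the scene, and their reflections are captured by the EC. We combine the benefits of ECs and projection-based techniques and allow depth and color detection of static or moving objects with a commercial TI LightCrafter 4500 projector and a monocular monochrome EC, paving the way for frameless RGB-D sensing applications.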
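Following the success of deep learning methods in many image processing tasks, deep learning approaches have recently been introduced to the phase retrieval problem as well. These approaches differ from traditional iterative optimization methods in that they usually require only one intensity measurement and can reconstruct phase images in real time. However, because of the large domain discrepancy, the quality of the images reconstructed by these approaches still has much room to improve before it meets general application requirements. In this paper, we design a novel deep neural network structure named SiSPRNet for phase retrieval based on a single Fourier intensity measurement. To effectively utilize the spectral information of the measurement, we propose a new feature extraction unit that uses a multi-layer perceptron (MLP) as the front end. It allows all pixels of the input intensity image to be considered together to explore their global representation. The size of the MLP is carefully designed to facilitate the extraction of representative features while reducing noise and outliers. A dropout layer is also used to mitigate overfitting when training the MLP. To promote global correlation in the reconstructed images, a self-attention mechanism is introduced into the Up-sampling and Reconstruction (UR) blocks of the proposed SiSPRNet. These UR blocks are inserted into a residual learning structure to prevent the weak information flow and vanishing gradient problems caused by their complex layer structure. The proposed model is extensively evaluated on different test datasets of phase-only images and of images with linearly related magnitude and phase. Experiments were conducted on an optical experimentation platform to understand the performance of different deep learning methods when working in a practical environment.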
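A short Python sketch can show why recovering a signal from a single Fourier intensity measurement is hard: the intensity discards the Fourier phase, so distinct signals can produce identical measurements. The example below is a one-dimensional toy case with a naive DFT, not part of the network described above; the function names and the test signal are assumptions made for illustration.

```python
import cmath


def dft(x):
    """Naive O(n^2) discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def fourier_intensity(x):
    """Squared magnitude of each Fourier coefficient; the phase is discarded."""
    return [abs(c) ** 2 for c in dft(x)]


signal = [0.0, 1.0, 3.0, 2.0, 0.0, 0.0, 0.0, 0.0]
shifted = signal[-2:] + signal[:-2]     # the same signal, circularly shifted by two

intensity_a = fourier_intensity(signal)
intensity_b = fourier_intensity(shifted)

# Largest difference between the two intensity measurements (numerically zero).
max_gap = max(abs(a - b) for a, b in zip(intensity_a, intensity_b))
```

The two signals differ, yet their Fourier intensities agree to within floating-point error, because a circular shift changes only the phase of each coefficient. Any phase retrieval method, learned or iterative, has to resolve ambiguities of this kind from prior knowledge about the signal.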
Spatially varying spectral modulation can be implemented using a liquid crystal spatial light modulator (SLM) since it provides an array of liquid crystal cells, each of which can be purposed to act as a programmable spectral filter array. However, such an optical setup suffers from strong optical aberrations due to the unintended phase modulation, precluding spectral modulation at high spatial resolutions. In this work, we propose a novel computational approach for the practical implementation of phase SLMs for implementing spatially varying spectral filters. We provide a careful and systematic analysis of the aberrations arising out of phase SLMs for the purposes of spatially varying spectral modulation. The analysis naturally leads us to a set of "good patterns" that minimize the optical aberrations. We then train a deep network that overcomes any residual aberrations, thereby achieving ideal spectral modulation at high spatial resolution. We show a number of unique operating points with our prototype including dynamic spectral filtering, material classification, and single- and multi-image hyperspectral imaging.
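We study a target tracking algorithm based on millimeter-wave (MMW) radar in autonomous driving environments. For cluster matching in the target tracking stage, a new weighted feature similarity algorithm is proposed, which increases the matching rate of the same target in adjacent frames under strong environmental noise and multiple interfering targets. For autonomous driving scenarios, we construct a method that uses a moving target's motion parameters to extract and correct its trajectory, which solves the problem of moving target detection and trajectory correction while the vehicle is in motion. Finally, the feasibility of the proposed method is verified through a series of experiments in autonomous driving environments. The results confirm the high recognition accuracy and low position error of the method.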
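Sensors are devices that convert a physical parameter or environmental characteristic (for example, temperature, distance, or speed) into a signal that can be digitally measured and processed to perform specific tasks. Mobile robots need sensors to measure properties of their environment, allowing safe navigation, complex perception and corresponding actions, and effective interaction with the other agents that populate it. The sensors used by mobile robots range from simple tactile sensors, such as bumpers, to complex vision-based sensors such as structured light cameras. All of them provide a digital output (for example, a string, a set of values, or a matrix) that can be processed by the robot's computer. Such output is typically obtained by discretizing one or more analog electrical signals using an analog-to-digital converter (ADC) included in the sensor. In this chapter, we present the most common sensors used in mobile robotics and provide an introduction to their taxonomy, basic features, and specifications. The description of functionality and types of application follows a bottom-up approach: the basic principles and components on which sensors are based are presented before real-world sensors, which are generally based on multiple technologies and basic devices, are described.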
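The discretization step mentioned above can be illustrated with a minimal model of an ideal ADC in Python. This is a sketch under assumptions, not a description of any particular sensor: the function names, the 3.3 V reference, the 10-bit resolution, and the truncating (floor) quantizer are all illustrative choices.

```python
def adc_quantize(voltage, v_ref=3.3, bits=10):
    """Map an analog voltage in [0, v_ref] to an integer code in [0, 2**bits - 1].

    Models an ideal, unipolar ADC that truncates to the code below the input;
    inputs outside the range are clamped.
    """
    levels = 2 ** bits
    clamped = min(max(voltage, 0.0), v_ref)
    code = int(clamped / v_ref * levels)
    return min(code, levels - 1)        # full-scale input maps to the top code


def code_to_voltage(code, v_ref=3.3, bits=10):
    """Lower edge of the voltage interval represented by a code."""
    return code * v_ref / 2 ** bits


reading = adc_quantize(1.0)             # digital value the robot's computer sees
approx = code_to_voltage(reading)       # voltage recovered from that value
step = 3.3 / 2 ** 10                    # one least significant bit, in volts
```

The recovered voltage differs from the true input by less than one step, which is the quantization error that a sensor's resolution specification describes.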
Dynamic magnetic resonance image reconstruction from incomplete k-space data has generated great research interest due to its capability to reduce scan time. Nevertheless, the reconstruction problem is still challenging due to its ill-posed nature. Recently, diffusion models, especially score-based generative models, have exhibited great potential in algorithm robustness and usage flexibility. Moreover, the unified framework through the variance exploding stochastic differential equation (VE-SDE) is proposed to enable new sampling methods and further extend the capabilities of score-based generative models. Therefore, by taking advantage of the unified framework, we proposed a k-space and image Dual-Domain collaborative Universal Generative Model (DD-UGM) which combines the score-based prior with low-rank regularization penalty to reconstruct highly under-sampled measurements. More precisely, we extract prior components from both image and k-space domains via a universal generative model and adaptively handle these prior components for faster processing while maintaining good generation quality. Experimental comparisons demonstrated the noise reduction and detail preservation abilities of the proposed method. Much more than that, DD-UGM can reconstruct data of different frames by only training a single frame image, which reflects the flexibility of the proposed model.