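Depth estimation is a key technology in fields such as autonomous driving and robot navigation. However, traditional methods that use a single sensor are inevitably limited by that sensor's performance. We therefore propose an accurate and robust method that fuses LiDAR and stereo cameras. The method fully combines the advantages of the two sensors, retaining both the high precision of LiDAR and the high resolution of images. Compared with traditional stereo matching methods, the algorithm is less affected by object texture and lighting conditions. First, the depth of the LiDAR data is converted to the disparity of the stereo camera. Because the LiDAR data are relatively sparse along the y-axis, the converted disparity map is upsampled by interpolation. Second, to make full use of the precise disparity map, the disparity map and stereo matching are fused to propagate the accurate disparity. Finally, the disparity map is converted to a depth map. Moreover, the converted disparity map can also increase the speed of the algorithm. We evaluate the proposed pipeline on the KITTI benchmark. The experiments demonstrate that our algorithm achieves higher accuracy than several classic methods.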
translated by 谷歌翻译
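The first step described in the abstract above (converting LiDAR depth to stereo disparity, then upsampling along the sparse vertical axis) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a rectified stereo pair, so that disparity is `focal_px * baseline_m / depth`, and it assumes plain linear interpolation down each image column, since the abstract does not say which interpolation is used. All function and parameter names are hypothetical.

```python
from typing import List, Optional


def depth_to_disparity(depth_m: float, focal_px: float, baseline_m: float) -> float:
    """Rectified-stereo relation d = f * B / Z, with d in pixels."""
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    return focal_px * baseline_m / depth_m


def interpolate_column(column: List[Optional[float]]) -> List[Optional[float]]:
    """Linearly fill missing entries (None) that lie between two valid samples.

    Entries before the first or after the last valid sample stay None.
    """
    filled = list(column)
    valid = [i for i, value in enumerate(column) if value is not None]
    for lo, hi in zip(valid, valid[1:]):
        for i in range(lo + 1, hi):
            t = (i - lo) / (hi - lo)
            filled[i] = (1 - t) * column[lo] + t * column[hi]
    return filled


# Example: one image column where only two LiDAR scan lines hit.
sparse_column = [None, 10.0, None, None, 16.0, None]
dense_column = interpolate_column(sparse_column)
```

Linear interpolation blurs disparity across object boundaries, so a practical pipeline would likely need an edge-aware variant; the sketch only shows the data flow.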
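Stereo matching is an important task in computer vision that has attracted tremendous research attention for decades. However, public stereo datasets struggle to meet the requirements of models in terms of disparity accuracy, density, and data size. In this paper, we aim to address this gap between datasets and models and propose a large-scale stereo dataset with high-accuracy disparity ground truth, named PlantStereo. We construct the dataset in a semi-automatic way: after camera calibration and image registration, high-accuracy disparity images are obtained from depth images. In total, the dataset contains 812 image pairs covering a diverse set of plants: spinach, tomato, pepper, and pumpkin. We first evaluate the PlantStereo dataset on four different stereo matching methods. Extensive experiments on different models and plants show that, compared with integer-accuracy ground truth, the high-accuracy disparity images provided by PlantStereo can remarkably improve the training of deep learning models. This paper provides a feasible and reliable method for dense reconstruction of plant surfaces. The PlantStereo dataset and related code are available at: https://www.github.com/wangqingyu985/plantstereo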
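3D scene flow characterizes how points at the current time flow to the next time in 3D Euclidean space, which provides the ability to autonomously infer the non-rigid motion of all objects in the scene. Previous methods for estimating scene flow from images have a limitation: they split the holistic nature of 3D scene flow by estimating optical flow and disparity separately. Learning 3D scene flow from point clouds also faces difficulties, namely the gap between synthetic and real data and the sparsity of LiDAR point clouds. In this paper, generated dense depth maps are used to obtain explicit 3D coordinates, which enables learning 3D scene flow directly from 2D images. Introducing the dense nature of 2D pixels into 3D space improves the stability of the predicted scene flow. Outliers in the generated 3D point cloud are removed by statistical methods to weaken the impact of noisy points on the 3D scene flow estimation task. A disparity consistency loss is proposed to achieve more effective unsupervised learning of 3D scene flow. The proposed self-supervised method for learning 3D scene flow on real-world images is compared with a variety of methods learned on synthetic datasets and on LiDAR point clouds. Comparisons on multiple scene flow metrics demonstrate the effectiveness and superiority of introducing pseudo-LiDAR point clouds into scene flow estimation.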
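The abstract above says only that outliers are removed "by statistical methods". One common choice, shown here purely as an assumed example and not as the paper's method, is the k-nearest-neighbour filter: a point is dropped when its mean distance to its `k` nearest neighbours exceeds the global mean by more than `std_ratio` standard deviations. The function name, parameters, and defaults are hypothetical, and the brute-force neighbour search is only suitable for small point sets.

```python
import math
from statistics import mean, pstdev
from typing import List, Tuple

Point = Tuple[float, float, float]


def remove_statistical_outliers(
    points: List[Point], k: int = 2, std_ratio: float = 1.0
) -> List[Point]:
    """Drop points whose mean k-NN distance is unusually large."""
    if len(points) <= k:
        return list(points)
    mean_knn = []
    for i, p in enumerate(points):
        # Brute-force distances to every other point, O(n^2) overall.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        mean_knn.append(mean(dists[:k]))
    threshold = mean(mean_knn) + std_ratio * pstdev(mean_knn)
    return [p for p, d in zip(points, mean_knn) if d <= threshold]


# Example: a tight cluster plus one far-away noise point.
cloud = [
    (0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0),
    (0.1, 0.1, 0.0), (0.0, 0.0, 0.1), (10.0, 10.0, 10.0),
]
cleaned = remove_statistical_outliers(cloud)
```

For point clouds of realistic size, a spatial index such as a k-d tree would replace the quadratic search.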
This technical report evaluates the performance of depth estimation based on LiDAR data and stereo images (front left and front right). The LiDAR 3D point cloud data and the stereo images are provided by Ford. In addition, the report explains some details of optimizing depth estimation performance and gives some reasons for performing stereo depth estimation with pure mathematics rather than machine learning. The report is structured as follows: (1) Performance: discusses and evaluates the depth maps created from stereo images and 3D point clouds, and analyzes the relationship between alignment and errors; (2) Depth estimation from stereo images: explains how stereo images are used to estimate depth; (3) Depth estimation from LiDAR: explains how 3D point cloud data are used to estimate depth. In summary, this report mainly presents the performance of the depth maps, the approaches used to produce them, and an analysis of both.
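Among existing methods, LiDAR odometry shows superior performance, but visual odometry is still widely used for its price advantage. Conventionally, the task of visual odometry mainly relies on the input of continuous images. However, it is very complicated for an odometry network to learn the epipolar geometry information provided by images. In this paper, the concept of pseudo-LiDAR is introduced into odometry to solve this problem. The pseudo-LiDAR point cloud back-projects the depth map generated from images into a 3D point cloud, which changes the way the image is represented. Compared with stereo images, the pseudo-LiDAR point cloud generated by a stereo matching network provides explicit 3D coordinates. Since the six-degrees-of-freedom (6-DoF) pose transformation occurs in 3D space, the 3D structure information provided by the pseudo-LiDAR point cloud is more direct than that of images. Compared with sparse LiDAR, pseudo-LiDAR has a denser point cloud. To make full use of the rich point cloud information provided by pseudo-LiDAR, a projection-aware odometry pipeline is adopted. Most previous LiDAR-based algorithms sample 8192 points from the point cloud as input to the odometry network. The projection-aware dense odometry pipeline takes all pseudo-LiDAR points generated from the images, except for erroneous points, as input to the network. While fully exploiting the 3D geometric information in images, the semantic information in images is also used in the odometry task. The fusion of 2D and 3D is thus achieved in image-only odometry. Experiments on the KITTI dataset demonstrate the effectiveness of our method. To the best of our knowledge, this is the first visual odometry method using pseudo-LiDAR.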
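The back-projection that turns a depth map into a pseudo-LiDAR point cloud can be sketched with the standard pinhole camera model. This is a minimal illustration under stated assumptions: the intrinsics `fx`, `fy`, `cx`, `cy` and the `max_depth` cut-off are hypothetical parameters, and discarding non-positive or very distant depths is only a stand-in for the abstract's unspecified handling of erroneous points.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def backproject_depth(
    depth: List[List[float]],
    fx: float, fy: float, cx: float, cy: float,
    max_depth: float = 80.0,
) -> List[Point]:
    """Map every valid pixel (u, v, Z) to a camera-frame point (X, Y, Z)."""
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0 or z > max_depth:
                continue  # skip invalid or implausibly distant depths
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Example: a 2x2 depth map with one invalid and one too-distant pixel.
depth_map = [[0.0, 2.0], [4.0, 100.0]]
cloud = backproject_depth(depth_map, fx=2.0, fy=2.0, cx=0.5, cy=0.5)
```

Because every valid pixel yields a point, the resulting cloud is as dense as the image, which is the property the abstract contrasts with sparse LiDAR.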
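Many mobile manufacturers have recently adopted dual-pixel (DP) sensors in their flagship models for faster autofocus and aesthetic image capture. Despite their advantages, research on their use for 3D facial understanding has been limited by the lack of datasets and of algorithmic designs that exploit the parallax in DP images. This is because the baseline of the sub-aperture images is extremely narrow and the parallax exists in defocus-blurred regions. In this paper, we introduce a DP-oriented depth/normal network that reconstructs 3D facial geometry. For this purpose, we collect a DP facial dataset containing more than 135K images of 101 people, captured with our multi-camera structured-light system. It contains the corresponding ground-truth 3D models, including metric-scale depth maps and surface normals. Our dataset allows the proposed matching network to generalize to 3D facial depth/normal estimation. The proposed network consists of two novel modules, an adaptive sampling module and an adaptive normal module, which are specialized for handling the defocus blur in DP images. Finally, the proposed method achieves state-of-the-art performance compared with recent DP-based depth/normal estimation methods. We also demonstrate the applicability of the estimated depth/normal to face spoofing and relighting.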
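We present a novel, high-resolution, and challenging stereo dataset framing indoor scenes annotated with dense and accurate ground-truth disparities. Peculiar to our dataset is the presence of several specular and transparent surfaces, which are the main causes of failure for state-of-the-art stereo networks. Our acquisition pipeline leverages a novel deep space-time stereo framework, which allows for easy and accurate labeling with sub-pixel precision. We release a total of 419 samples collected in 64 different scenes and annotated with dense ground-truth disparities. Each sample includes a high-resolution pair (12 Mpx) as well as an unbalanced pair (left: 12 Mpx, right: 1.1 Mpx). Additionally, we provide manually annotated material segmentation masks and 15K unlabeled samples. We evaluate state-of-the-art deep networks on our dataset, highlighting their limitations in addressing the open challenges in stereo and drawing hints for future research.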
Recently, over-height vehicle strikes have occurred frequently, causing great economic cost and serious safety problems. Hence, an alert system that can accurately discover any possible height limiting devices in advance needs to be employed in modern large or medium-sized vehicles, such as touring cars. Detecting height limiting devices and estimating their height is the key to a successful height limit alert system. Although some works have studied height limit estimation, existing methods are either too computationally expensive or not accurate enough. In this paper, we propose a novel stereo-based pipeline named SHLE for height limit estimation. Our SHLE pipeline consists of two stages. In stage 1, a novel device detection and tracking scheme is introduced, which accurately locates the height limit devices in the left or right image. Then, in stage 2, the depth is temporally measured, extracted, and filtered to calculate the height limit of the device. To benchmark the height limit estimation task, we build a large-scale dataset named "Disparity Height", which provides stereo images, pre-computed disparities, and ground-truth height limit annotations. We conducted extensive experiments on "Disparity Height", and the results show that SHLE achieves an average error below 10 cm even when the car is 70 m away from the devices. Our method also outperforms all compared baselines and achieves state-of-the-art performance. Code is available at https://github.com/Yang-Kaixing/SHLE.
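Depth completion aims to predict dense pixel-wise depth from an extremely sparse map captured by a depth sensor such as LiDAR. It plays an essential role in various applications, such as autonomous driving, 3D reconstruction, augmented reality, and robot navigation. Deep learning-based solutions have recently achieved success on this task. In this paper, we provide, for the first time, a comprehensive literature review that helps readers better grasp the research trends and clearly understand the current advances. We survey related studies from the design aspects of network architectures, loss functions, benchmark datasets, and learning strategies, and propose a novel taxonomy that categorizes existing methods. In addition, we present a quantitative comparison of model performance on three widely used benchmarks, including indoor and outdoor datasets. Finally, we discuss the challenges of prior work and provide readers with some insights into future research directions.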
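Deep stereo matching has made significant progress in recent years. However, state-of-the-art methods are based on expensive 4D cost volumes, which limits their use in real-world applications. To address this issue, 3D correlation maps and iterative disparity updates have been proposed. On real-world platforms, such as self-driving cars and robots, LiDAR is usually installed. We therefore further introduce sparse LiDAR points into the iterative updates, which alleviates the burden on the network of updating the disparity from a zero state. Furthermore, we propose training the network in a self-supervised way so that it can be trained on any captured data for better generalization ability. Experiments and comparisons show that the presented method is effective and achieves results comparable to those of related methods.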
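As a fundamental component of many autonomous driving and robotic activities, such as ego-motion estimation, obstacle avoidance, and scene understanding, monocular depth estimation (MDE) has attracted great attention from the computer vision and robotics communities. Over the past decades, a large number of methods have been developed. To the best of our knowledge, however, there is no comprehensive survey of MDE. This paper aims to bridge this gap by reviewing 197 relevant articles published between 1970 and 2021. In particular, we provide a comprehensive survey of MDE covering various methods, introduce the popular performance evaluation metrics, and summarize publicly available datasets. We also summarize the available open-source implementations of some representative methods and compare their performance. Furthermore, we review the application of MDE in some important robotic tasks. Finally, we conclude the paper by presenting some promising directions for future research. This survey is expected to help readers navigate this research field.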
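Display technologies have evolved over the years. It is critical to develop practical HDR capture, processing, and display solutions to bring 3D technologies to the next level. Depth estimation from multi-exposure stereo image sequences is an essential task in the development of cost-effective 3D HDR video content. In this paper, we develop a novel deep architecture for multi-exposure stereo depth estimation. The proposed architecture has two novel components. First, the stereo matching technique used in traditional stereo depth estimation is revamped. For the stereo depth estimation component of our architecture, a mono-to-stereo transfer learning approach is deployed. The proposed formulation circumvents the cost volume construction requirement, which is replaced by a ResNet-based dual-encoder single-decoder CNN with different weights for feature fusion. EfficientNet-based blocks are used to learn the disparity. Second, we combine the disparity maps obtained from the stereo images at different exposure levels using a robust disparity feature fusion approach. The disparity maps obtained at different exposures are merged using weight maps calculated for different quality measures. The final predicted disparity map is more robust and retains the best features that preserve depth discontinuities. The proposed CNN offers the flexibility to train using standard dynamic range stereo data or multi-exposure low dynamic range stereo sequences. In terms of performance, the proposed model surpasses state-of-the-art monocular and stereo depth estimation methods, both quantitatively and qualitatively, on the challenging Scene Flow and differently exposed Middlebury stereo datasets. The architecture performs exceedingly well on complex natural scenes, demonstrating its usefulness for diverse 3D HDR applications.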
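The merging of per-exposure disparity maps with weight maps can be illustrated as a per-pixel weighted average. This is only a sketch of that general idea: the abstract does not say how the weight maps are derived from the quality measures or how they are normalized, so the weights are taken as given inputs here, and the function name and the zero fallback for unweighted pixels are assumptions.

```python
from typing import List

Map = List[List[float]]


def fuse_disparities(disparities: List[Map], weights: List[Map]) -> Map:
    """Per-pixel weighted average of K disparity maps of equal size."""
    height, width = len(disparities[0]), len(disparities[0][0])
    fused = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = sum(w[y][x] for w in weights)
            if total > 0:
                weighted = sum(w[y][x] * d[y][x] for d, w in zip(disparities, weights))
                fused[y][x] = weighted / total
            # Pixels with zero total weight keep the assumed fallback 0.0.
    return fused


# Example: two exposures, one row of two pixels.
low_exposure = [[10.0, 20.0]]
high_exposure = [[14.0, 0.0]]
fused = fuse_disparities(
    [low_exposure, high_exposure],
    [[[1.0, 1.0]], [[3.0, 0.0]]],
)
```

In the example, the second pixel has zero weight in the high-exposure map, so the fused value there comes entirely from the low-exposure estimate.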
We present an end-to-end deep learning architecture for depth map inference from multi-view images. In the network, we first extract deep visual image features, and then build the 3D cost volume upon the reference camera frustum via the differentiable homography warping. Next, we apply 3D convolutions to regularize and regress the initial depth map, which is then refined with the reference image to generate the final output. Our framework flexibly adapts arbitrary N-view inputs using a variance-based cost metric that maps multiple features into one cost feature. The proposed MVSNet is demonstrated on the large-scale indoor DTU dataset. With simple post-processing, our method not only significantly outperforms previous state-of-the-arts, but also is several times faster in runtime. We also evaluate MVSNet on the complex outdoor Tanks and Temples dataset, where our method ranks first before April 18, 2018 without any fine-tuning, showing the strong generalization ability of MVSNet.
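The variance-based cost metric mentioned above, which maps features from an arbitrary number of views into a single cost feature, can be sketched as an element-wise variance across views. This is a toy illustration on plain feature vectors rather than the warped feature volumes the network operates on; the function name and the list-based representation are assumptions made for readability.

```python
from typing import List


def variance_cost(features: List[List[float]]) -> List[float]:
    """Element-wise variance across N per-view feature vectors.

    The output has the same length as one feature vector, regardless of N,
    which is what lets the metric accept any number of views.
    """
    n_views = len(features)
    dim = len(features[0])
    means = [sum(f[c] for f in features) / n_views for c in range(dim)]
    return [
        sum((f[c] - means[c]) ** 2 for f in features) / n_views
        for c in range(dim)
    ]


# Example: views agree on channel 1 (low cost) and disagree on channel 0.
two_views = variance_cost([[1.0, 2.0], [3.0, 2.0]])
three_views = variance_cost([[0.0, 2.0], [3.0, 2.0], [6.0, 2.0]])
```

A low variance means the views agree at that depth hypothesis, so the same function serves two, three, or more input views without any change.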
3D object detection is vital because it enables us to capture objects' sizes, orientations, and positions in the world. As a result, 3D detection can be used in real-world applications such as augmented reality (AR), self-driving cars, and robotics, which perceive the world the same way we humans do. Monocular 3D object detection is the task of drawing 3D bounding boxes around objects in a single 2D RGB image. It is a localization task performed without any extra information such as depth, other sensors, or multiple images. Monocular 3D object detection is an important yet challenging task. Beyond the significant progress in image-based 2D object detection, 3D understanding of real-world objects is an open challenge that has not been explored extensively thus far. In addition to the most closely related studies.
Visual perception plays an important role in autonomous driving. One of the primary tasks is object detection and identification. Since vision sensors are rich in color and texture information, they can quickly and accurately identify various road information. The commonly used technique is based on extracting and calculating various image features. Recently developed deep learning-based methods have better reliability and processing speed and a greater advantage in recognizing complex elements. For depth estimation, vision sensors are also used for ranging because of their small size and low cost. A monocular camera uses image data from a single viewpoint as input to estimate object depth. In contrast, stereo vision is based on parallax and on matching feature points across different views, and the application of deep learning further improves its accuracy. In addition, Simultaneous Localization and Mapping (SLAM) can establish a model of the road environment, thus helping the vehicle perceive its surroundings and complete its tasks. In this paper, we introduce and compare various methods of object detection and identification, then explain the development of depth estimation and compare various methods based on monocular, stereo, and RGB-D sensors, next review and compare various SLAM methods, and finally summarize the current problems and present the future development trends of vision technologies.
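Depth estimation is a cornerstone of a vast number of applications requiring 3D assessment of the environment, such as robotics, augmented reality, and autonomous driving, to name a few. One prominent technique for depth estimation is stereo matching, which has several advantages: it is considered more accessible than other depth-sensing technologies, can produce dense depth estimates in real time, and has benefited greatly from the advances of deep learning in recent years. However, current techniques for depth estimation from stereo images still suffer from a built-in drawback. To reconstruct depth, a stereo matching algorithm first estimates the disparity map between the left and right images before applying geometric triangulation. A simple analysis reveals that the depth error is quadratically proportional to the object's distance. Therefore, a constant disparity error is translated into a large depth error for objects far from the camera. To mitigate this quadratic relation, we propose a simple but effective method that uses a refinement network for depth estimation. We show analytical and empirical results suggesting that the proposed learning procedure reduces this quadratic relation. We evaluate the proposed refinement procedure on well-known benchmarks and datasets, such as the 演唱者 and KITTI datasets, and demonstrate significant improvements in the depth accuracy metric.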
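The quadratic relation between depth error and distance follows from the triangulation formula of a rectified stereo pair. The short derivation below is a standard first-order argument, not taken from the paper; the symbols $f$ (focal length in pixels), $B$ (baseline), $d$ (disparity), and $Z$ (depth) are introduced here for illustration.

```latex
% Assumed symbols (not in the abstract): f = focal length in pixels,
% B = stereo baseline, d = disparity, Z = depth.
\begin{align}
Z &= \frac{fB}{d}, \\
\frac{\partial Z}{\partial d} &= -\frac{fB}{d^{2}} = -\frac{Z^{2}}{fB}, \\
|\Delta Z| &\approx \frac{Z^{2}}{fB}\,|\Delta d| .
\end{align}
```

As an illustrative calculation with assumed values $f = 700$ px and $B = 0.5$ m, a disparity error of one pixel corresponds to a depth error of about 0.29 m at $Z = 10$ m but about 7.1 m at $Z = 50$ m.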
In this paper, we present a novel method for integrating 3D LiDAR depth measurements into the existing ORB-SLAM3 by building upon the RGB-D mode. We propose and compare two methods of depth map generation: conventional computer vision methods, namely an inverse dilation operation, and a supervised deep learning-based approach. We integrate the former directly into the ORB-SLAM3 framework by adding a so-called RGB-L (LiDAR) mode that directly reads LiDAR point clouds. The proposed methods are evaluated on the KITTI Odometry dataset and compared to each other and to the standard ORB-SLAM3 stereo method. We demonstrate that, depending on the environment, advantages in trajectory accuracy and robustness can be achieved. Furthermore, we demonstrate that the runtime of the ORB-SLAM3 algorithm can be reduced by more than 40% compared to the stereo mode. The related code for the ORB-SLAM3 RGB-L mode will be available as open-source software under https://github.com/TUMFTM/ORB_SLAM3_RGBL.
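The abstract names an inverse dilation operation for turning sparse LiDAR returns into a denser depth map but gives no details. One plausible reading, offered here strictly as an assumption, is that each empty pixel takes the smallest valid depth in its neighbourhood, so that nearer surfaces win where returns overlap. The square window, the `radius` parameter, and the convention that `0` marks a missing depth are all hypothetical choices for this sketch.

```python
from typing import List


def inverse_dilate(sparse_depth: List[List[float]], radius: int = 1) -> List[List[float]]:
    """Fill empty pixels (value 0) with the nearest-surface depth nearby.

    Measured pixels are left unchanged; an empty pixel with no valid
    neighbour inside the window stays 0.
    """
    height, width = len(sparse_depth), len(sparse_depth[0])
    dense = [row[:] for row in sparse_depth]
    for y in range(height):
        for x in range(width):
            if sparse_depth[y][x] > 0:
                continue
            neighbours = [
                sparse_depth[j][i]
                for j in range(max(0, y - radius), min(height, y + radius + 1))
                for i in range(max(0, x - radius), min(width, x + radius + 1))
                if sparse_depth[j][i] > 0
            ]
            if neighbours:
                dense[y][x] = min(neighbours)
    return dense


# Example: two LiDAR returns on the middle row of a 3x4 depth image.
sparse = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 5.0, 0.0, 9.0],
    [0.0, 0.0, 0.0, 0.0],
]
dense = inverse_dilate(sparse)
```

Taking the minimum rather than the maximum is the design choice that distinguishes this from an ordinary dilation on raw depth values, which would let far points overwrite near ones.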
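Supervised learning depth estimation methods can achieve good performance when trained on high-quality ground truth, such as LiDAR data. However, LiDAR can only generate sparse 3D maps, which causes information loss. High-quality per-pixel ground-truth depth data are difficult to acquire. To overcome this limitation, we propose a novel approach that combines structure information from a promising plane-and-parallax geometry pipeline with depth information in a U-Net supervised learning network, which results in quantitative and qualitative improvements over existing popular learning-based methods. In particular, the model is evaluated on two large-scale and challenging datasets, the KITTI Vision Benchmark and the Cityscapes dataset, and achieves the best performance in terms of relative error. Compared with pure depth supervision models, our model has impressive performance on depth prediction of thin objects and edges, and compared with the structure prediction baseline, our model performs more robustly.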
Our long-term goal is to use image-based depth completion to quickly create 3D models from sparse point clouds, e.g. from SfM or SLAM. Much progress has been made in depth completion. However, most current works assume well-distributed samples of known depth, e.g. LiDAR or random uniform sampling, and perform poorly on uneven samples, such as from keypoints, due to the large unsampled regions. To address this problem, we extend CSPN with multiscale prediction and a dilated kernel, leading to much better completion of keypoint-sampled depth. We also show that a model trained on NYUv2 creates surprisingly good point clouds on ETH3D by completing sparse SfM points.
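Lightweight time-of-flight (ToF) depth sensors are small, cheap, and low-energy, and have been massively deployed on mobile devices for purposes such as autofocus and obstacle detection. However, because of their specific measurements (a depth distribution over a region rather than a depth value at a certain pixel) and extremely low resolution, they are insufficient for applications requiring high-fidelity depth, such as 3D reconstruction. In this paper, we propose DELTAR, a novel method that empowers lightweight ToF sensors with the capability of measuring high-resolution and accurate depth by cooperating with a color image. As the core of DELTAR, a feature extractor customized for depth distributions and an attention-based neural architecture are proposed to fuse information from the color and ToF domains efficiently. To evaluate our system in real-world scenarios, we design a data collection device and propose a new approach to calibrate the RGB camera and the ToF sensor. Experiments show that our method produces more accurate depth than existing frameworks and achieves performance on par with a commodity-level RGB-D sensor. Code and data are available at https://zju3dv.github.io/deltar/.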