Adapting to a continuously evolving environment is a safety-critical challenge inevitably faced by all autonomous driving systems. However, existing image and video driving datasets fail to capture the mutable nature of the real world. In this paper, we introduce SHIFT, the largest multi-task synthetic dataset for autonomous driving. It features discrete and continuous shifts in cloudiness, rain intensity, time of day, and vehicle and pedestrian density. Equipped with a comprehensive sensor suite and annotations for several mainstream perception tasks, SHIFT enables investigating how the performance of a perception system degrades at increasing levels of domain shift, fostering the development of continual adaptation strategies to mitigate this problem and assessing model robustness and generality. Our dataset and benchmark toolkit are publicly available at www.vis.xyz/shift.
The robustness of semantic segmentation on edge cases in traffic scenes is an important factor for the safety of intelligent transportation. However, most critical scenes in traffic accidents are highly dynamic and previously unseen, which seriously harms the performance of semantic segmentation methods. In addition, the latency of conventional cameras during high-speed driving further reduces contextual information in the temporal dimension. We therefore propose to extract dynamic context from event-based data, which has higher temporal resolution, to enhance static RGB images, even for traffic accidents involving motion blur, collisions, deformations, overturns, and so on. Furthermore, to evaluate segmentation performance in traffic accidents, we provide a pixel-wise annotated accident dataset, DADA-seg, which contains a variety of critical scenarios from traffic accidents. Our experiments show that event-based data can provide complementary information to stabilize semantic segmentation under adverse conditions by preserving the fine-grained motion of fast-moving foregrounds (crashing objects) in accidents. Our approach achieves a +8.2% performance gain on the proposed accident dataset, exceeding more than 20 state-of-the-art semantic segmentation methods. The proposal is demonstrated to be consistently effective for models learned on multiple source databases, including Cityscapes, KITTI-360, BDD, and ApolloScape.
Datasets drive vision progress, yet existing driving datasets are impoverished in terms of visual content and supported tasks to study multitask learning for autonomous driving. Researchers are usually constrained to study a small set of problems on one dataset, while real-world computer vision applications require performing tasks of various complexities. We construct BDD100K, the largest driving video dataset with 100K videos and 10 tasks to evaluate the exciting progress of image recognition algorithms on autonomous driving. The dataset possesses geographic, environmental, and weather diversity, which is useful for training models that are less likely to be surprised by new conditions. Based on this diverse dataset, we build a benchmark for heterogeneous multitask learning and study how to solve the tasks together. Our experiments show that special training strategies are needed for existing models to perform such heterogeneous tasks. BDD100K opens the door for future studies in this important venue.
Image segmentation for video analysis plays an important role in different research fields, such as smart cities, healthcare, computer vision, geoscience, and remote sensing applications. In this regard, considerable effort has recently been devoted to developing novel segmentation strategies; one of the latest outstanding achievements is panoptic segmentation. The latter arose from the fusion of semantic and instance segmentation. Explicitly, panoptic segmentation is currently being studied to help obtain a more nuanced knowledge of image scenes for video surveillance, crowd counting, autonomous driving, and medical image analysis, and for a deeper understanding of scenes in general. To this end, we present in this paper the first comprehensive review of existing panoptic segmentation methods, to the best of the authors' knowledge. Accordingly, a well-defined taxonomy of existing panoptic techniques is established based on the nature of the adopted algorithms, the application scenarios, and the primary objectives. Moreover, the use of panoptic segmentation for annotating new datasets with pseudo-labels is discussed. Moving forward, ablation studies are conducted to understand panoptic methods from different perspectives. In addition, evaluation metrics suitable for panoptic segmentation are discussed, and a comparison of the performance of existing solutions is provided to inform the state of the art and identify its limitations and strengths. Finally, the challenges currently facing the technology and future trends likely to attract considerable interest in the near future are elaborated, which can serve as a starting point for upcoming research studies. The papers with available code can be found at: https://github.com/elharroussomar/awesome-panoptic-egation
High-quality structured data with rich annotations is a critical component of intelligent vehicle systems dealing with road scenes. However, data curation and annotation require extensive investment and yield low-diversity scenarios. The recently growing interest in synthetic data raises questions about the scope of improvement for such systems, as well as the amount of manual work still required to produce large amounts of varied simulated data. This work proposes a synthetic data generation pipeline that utilizes existing datasets, like nuScenes, to address the difficulties and domain gaps present in simulated datasets. We show that, using annotations and visual cues from existing datasets, we can facilitate automated multimodal data generation, mimicking real-scene properties with high fidelity, along with mechanisms to diversify samples in a physically meaningful way. We demonstrate improvements in mIoU metrics with qualitative and quantitative experiments using real and synthetic data for semantic segmentation on the Cityscapes and KITTI-STEP datasets. All relevant code and data are released on GitHub (https://github.com/shubham1810/trove_toolkit).
Modern computer vision has moved beyond the domain of internet photo collections and into the physical world, guiding camera-equipped robots and autonomous cars through unstructured environments. To enable these embodied agents to interact with real-world objects, cameras are increasingly being used as depth sensors, reconstructing the environment for a variety of downstream reasoning tasks. Machine-learning-aided depth perception, or depth estimation, predicts the distance for each pixel in an image. While impressive strides have been made in depth estimation, significant challenges remain: (1) ground-truth depth labels are difficult to collect at scale, (2) camera information is typically assumed to be known but is often unreliable, and (3) restrictive camera assumptions are common, even though a wide variety of camera types and lenses are used in practice. In this thesis, we focus on relaxing these assumptions and describe contributions toward the ultimate goal of turning cameras into truly generic depth sensors.
Computer vision applications in intelligent transportation systems (ITS) and autonomous driving (AD) have gravitated towards deep neural network architectures in recent years. While performance seems to be improving on benchmark datasets, many real-world challenges are yet to be adequately considered in research. This paper conducts an extensive literature review of the applications of computer vision in ITS and AD, and discusses challenges related to data, models, and complex urban environments. The data challenges are associated with the collection and labeling of training data and its relevance to real-world conditions, bias inherent in datasets, the high volume of data needing to be processed, and privacy concerns. Deep learning (DL) models are commonly too complex for real-time processing on embedded hardware, lack explainability and generalizability, and are hard to test in real-world settings. Complex urban traffic environments have irregular lighting and occlusions, and surveillance cameras can be mounted at a variety of angles, gather dirt, and shake in the wind, while traffic conditions are highly heterogeneous, with rule violations and complex interactions in crowded scenarios. Some representative applications that suffer from these problems are traffic flow estimation, congestion detection, autonomous driving perception, vehicle interaction, and edge computing for practical deployment. The possible ways of dealing with these challenges are also explored, with an emphasis on practical deployment.
The last decade witnessed increasingly rapid progress in self-driving vehicle technology, mainly backed by advances in the area of deep learning and artificial intelligence. The objective of this paper is to survey the current state of the art on deep learning technologies used in autonomous driving. We start by presenting AI-based self-driving architectures, convolutional and recurrent neural networks, as well as the deep reinforcement learning paradigm. These methodologies form a base for the surveyed driving scene perception, path planning, behavior arbitration, and motion control algorithms. We investigate both the modular perception-planning-action pipeline, where each module is built using deep learning methods, as well as End2End systems, which directly map sensory information to steering commands. Additionally, we tackle current challenges encountered in designing AI architectures for autonomous driving, such as their safety, training data sources, and computational hardware. The comparison presented in this survey helps to gain insight into the strengths and limitations of deep learning and AI approaches for autonomous driving and assists with design choices.
Learning powerful representations in bird's-eye view (BEV) for perception tasks is a trend that has attracted extensive attention from both industry and academia. Conventional approaches for most autonomous driving algorithms perform detection, segmentation, tracking, etc., in a frontal or perspective view. As sensor configurations become increasingly complex, integrating multi-source information from different sensors and representing features in a unified view becomes vital. BEV perception inherits several advantages: representing the surrounding scene in BEV is intuitive and fusion-friendly, and representing objects in BEV is most desirable for subsequent modules such as planning and/or control. The core problems of BEV perception lie in (a) how to reconstruct the lost 3D information via view transformation from perspective view to BEV; (b) how to acquire ground-truth annotations in the BEV grid; (c) how to formulate the pipeline to incorporate features from different sources and views; and (d) how to adapt and generalize algorithms as sensor configurations vary across scenarios. In this survey, we review the most recent work on BEV perception and provide an in-depth analysis of different solutions. Moreover, several system designs of BEV approaches from industry are described. Furthermore, we introduce a full suite of practical guidebooks to improve the performance of BEV perception tasks, covering camera, LiDAR, and fusion inputs. Finally, we point out future research directions in this field. We hope this report will shed light for the community and encourage more research on BEV perception. We maintain an active repository to collect the most recent work and provide a toolbox with a bag of tricks at https://github.com/openperceptionx/bevperception-survey-recipe.
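To make core problem (a) concrete, the simplest classical view transformation from perspective view to BEV is inverse perspective mapping under a flat-ground assumption. Below is a minimal NumPy sketch, not taken from the survey: the intrinsics `K`, the camera height, and the function names are illustrative assumptions, and a camera looking straight ahead with no tilt is assumed; real pipelines use calibrated extrinsics or learn the transformation.

```python
import numpy as np

# Hypothetical pinhole intrinsics and camera height, for illustration only;
# real systems read these from calibration files.
K = np.array([[800.0,   0.0, 640.0],
              [  0.0, 800.0, 360.0],
              [  0.0,   0.0,   1.0]])
CAM_HEIGHT = 1.6  # camera height above the ground plane (meters)

def ground_to_pixel(x_fwd, y_left):
    """Project a ground-plane point into the image.

    Camera frame convention: +z forward, +x right, +y down; the camera is
    assumed level (no tilt), so the ground lies at y = CAM_HEIGHT.
    """
    p_cam = np.array([-y_left, CAM_HEIGHT, x_fwd])  # right, down, forward
    uvw = K @ p_cam
    return uvw[:2] / uvw[2]  # pixel coordinates (u, v)

def bev_lookup(image, x_range=(2.0, 30.0), y_range=(-10.0, 10.0), res=0.5):
    """Fill a BEV grid by sampling the image at each cell's projection."""
    h, w = image.shape[:2]
    xs = np.arange(*x_range, res)
    ys = np.arange(*y_range, res)
    bev = np.zeros((len(xs), len(ys)) + image.shape[2:], dtype=image.dtype)
    for i, x in enumerate(xs):
        for j, y in enumerate(ys):
            u, v = ground_to_pixel(x, y)
            if 0 <= int(v) < h and 0 <= int(u) < w:
                bev[i, j] = image[int(v), int(u)]
    return bev
```

For example, a point 10 m straight ahead projects to the image center column at `(640, 488)` with the intrinsics above. The flat-ground assumption is exactly what learned view transformers relax: objects above the ground plane are smeared by this mapping, which is why the survey's problem (a) is about reconstructing lost 3D information rather than assuming it away.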
Panoptic image segmentation is the computer vision task of finding groups of pixels in an image and assigning them semantic classes and object instance identifiers. Research in image segmentation has become increasingly popular due to its critical applications in robotics and autonomous driving. The research community thereby relies on publicly available benchmark datasets to advance the state of the art in computer vision. However, due to the high cost of densely labeling images, there is a shortage of publicly available ground-truth labels suitable for panoptic segmentation. The high labeling cost also makes it challenging to extend existing datasets to the video domain and to multi-camera setups. We therefore present the Waymo Open Dataset: Panoramic Video Panoptic Segmentation dataset, a large-scale dataset that offers high-quality panoptic segmentation labels for autonomous driving. We generate our dataset using the publicly available Waymo Open Dataset, leveraging its diverse set of camera images. Our labels are consistent over time for video processing and consistent across the multiple cameras mounted on the vehicles for full panoramic scene understanding. Specifically, we offer labels for 28 semantic categories and 2,860 temporal sequences captured by five cameras mounted on autonomous vehicles driving in three different geographical locations, leading to a total of 100k labeled camera images. To the best of our knowledge, this makes our dataset an order of magnitude larger than existing datasets of its kind. We further propose a new benchmark for panoramic video panoptic segmentation and establish a number of strong baselines based on the DeepLab family of models. We will make the benchmark and the code publicly available. Find the dataset at https://waymo.com/open.
Advances in perception for self-driving cars have accelerated in recent years due to the availability of large-scale datasets, which are typically collected at specific locations and under nice weather conditions. However, to achieve the high safety requirements, these perception systems must operate robustly under a wide variety of weather conditions, including snow and rain. In this paper, we present a new dataset to enable robust autonomous driving via a novel data collection process: data is repeatedly recorded along a 15 km route under diverse scenes (urban, highway, rural, campus), weather (snow, rain, sun), times (day/night), and traffic conditions (pedestrians, cyclists, and cars). The dataset includes images and point clouds from camera and LiDAR sensors, along with high-precision GPS/INS to establish correspondences across routes. The dataset includes road and object annotations, using amodal masks to capture partial occlusions, as well as 3D bounding boxes. We demonstrate the uniqueness of this dataset by analyzing the performance of baselines in amodal segmentation of roads and objects, depth estimation, and 3D object detection. The repeated routes open new research directions in object discovery, continual learning, and anomaly detection. Link to Ithaca365: https://ithaca365.mae.cornell.edu/
Automated driving systems (ADS) open up a new domain for the automotive industry and offer new possibilities for future transportation with higher efficiency and more comfortable experiences. However, autonomous driving under adverse weather conditions has long been the problem keeping autonomous vehicles (AVs) from reaching Level 4 or higher autonomy. This paper assesses the influences and challenges that weather brings to ADS sensors in an analytic and statistical way, and surveys solutions for inclement weather conditions. State-of-the-art techniques for perception enhancement with regard to each kind of weather are thoroughly reported. External auxiliary solutions such as V2X technology, the coverage of weather conditions in currently available datasets, simulators, and experimental facilities with weather chambers are distinctly sorted out. By pointing out the major weather problems the autonomous driving field is currently facing, and reviewing both hardware and computer science solutions from recent years, this survey outlines the obstacles and directions of ADS development with respect to adverse weather driving conditions.
Domain adaptation is an important task to enable learning when labels are scarce. While most works focus only on the image modality, there are many important multi-modal datasets. In order to leverage multi-modality for domain adaptation, we propose cross-modal learning, where we enforce consistency between the predictions of the two modalities via mutual mimicking. We constrain our network to make correct predictions on labeled data and consistent predictions across modalities on unlabeled target-domain data. Experiments in unsupervised and semi-supervised domain adaptation settings prove the effectiveness of this novel domain adaptation strategy. Specifically, we evaluate on the task of 3D semantic segmentation from either the 2D image, the 3D point cloud, or both. We leverage recent driving datasets to produce a wide variety of domain adaptation scenarios, including changes in scene layout, lighting, sensor setup, and weather, as well as the synthetic-to-real setup. Our method significantly improves over previous uni-modal adaptation baselines on all adaptation scenarios. Our code is publicly available at https://github.com/valeoai/xmuda_journal
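The mutual mimicking between the two modalities can be sketched as a symmetric divergence between per-point predictions. This is not the authors' implementation: it is a minimal NumPy illustration in which `cross_modal_loss` and its signature are our own assumptions; in the actual method the target distribution of each term is detached so that gradients only flow into the branch being trained.

```python
import numpy as np

def softmax(logits, axis=-1):
    """Numerically stable softmax over the class axis."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def kl_div(p, q, eps=1e-8):
    """KL(p || q), averaged over points; eps guards against log(0)."""
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)))

def cross_modal_loss(logits_2d, logits_3d):
    """Symmetric mimicry loss between the image and point-cloud branches.

    logits_2d, logits_3d: (N, C) per-point class scores predicted by the 2D
    and 3D networks for the same N points. Each branch is pushed to match
    the other's predicted distribution.
    """
    p2d = softmax(logits_2d)
    p3d = softmax(logits_3d)
    return kl_div(p3d, p2d) + kl_div(p2d, p3d)
```

On unlabeled target data this term is the only supervision, which is what lets the unlabeled domain shape both branches: the loss is zero exactly when the two modalities already agree and grows with their disagreement.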
As an essential component of many autonomous driving and robotic activities, such as ego-motion estimation, obstacle avoidance, and scene understanding, monocular depth estimation (MDE) has attracted great attention from the computer vision and robotics communities. Over the past decades, a large number of methods have been developed. However, to the best of our knowledge, there is no comprehensive survey of MDE. This paper aims to bridge this gap by reviewing 197 relevant articles published between 1970 and 2021. In particular, we provide a comprehensive survey of MDE covering various methods, introduce the popular performance evaluation metrics, and summarize publicly available datasets. We also summarize available open-source implementations of some representative methods and compare their performance. Furthermore, we review the applications of MDE in some important robotic tasks. Finally, we conclude this paper by presenting some promising future research directions. This survey is expected to assist readers in navigating this research field.
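The popular evaluation metrics the survey refers to typically include the absolute relative error, RMSE, and the threshold accuracy δ < 1.25. A minimal NumPy sketch of these standard metrics follows; the function name and the dictionary return type are our own choices, not from the paper.

```python
import numpy as np

def depth_metrics(pred, gt):
    """Standard MDE metrics, computed on valid (gt > 0) pixels only.

    abs_rel: mean of |pred - gt| / gt
    rmse:    root mean squared error in depth units
    delta1:  fraction of pixels with max(pred/gt, gt/pred) < 1.25
    """
    mask = gt > 0                      # invalid ground truth is encoded as 0
    pred, gt = pred[mask], gt[mask]
    abs_rel = np.mean(np.abs(pred - gt) / gt)
    rmse = np.sqrt(np.mean((pred - gt) ** 2))
    ratio = np.maximum(pred / gt, gt / pred)
    delta1 = np.mean(ratio < 1.25)
    return {"abs_rel": abs_rel, "rmse": rmse, "delta1": delta1}
```

Note the masking step: real depth maps from LiDAR or stereo are sparse, so restricting the metrics to valid pixels is the conventional protocol. A prediction uniformly 10% too deep, for instance, yields abs_rel of 0.1 while still scoring a perfect δ < 1.25.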
Object detection typically assumes that training and test data are drawn from an identical distribution, which, however, does not always hold in practice. Such a distribution mismatch will lead to a significant performance drop. In this work, we aim to improve the cross-domain robustness of object detection. We tackle the domain shift on two levels: 1) the image-level shift, such as image style, illumination, etc., and 2) the instance-level shift, such as object appearance, size, etc. We build our approach on the recent state-of-the-art Faster R-CNN model, and design two domain adaptation components, on the image level and the instance level, to reduce the domain discrepancy. The two domain adaptation components are based on H-divergence theory, and are implemented by learning a domain classifier in an adversarial training manner. The domain classifiers on different levels are further reinforced with a consistency regularization to learn a domain-invariant region proposal network (RPN) in the Faster R-CNN model. We evaluate our newly proposed approach using multiple datasets including Cityscapes, KITTI, SIM10K, etc. The results demonstrate the effectiveness of our proposed approach for robust object detection in various domain shift scenarios.
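Adversarial domain classifiers of this kind are commonly trained through a gradient reversal layer: identity in the forward pass, negated (and scaled) gradient in the backward pass, so the classifier learns to tell domains apart while the feature extractor is pushed toward domain-invariant features. The following NumPy sketch with a manual backward pass is purely illustrative; the toy logistic classifier and all names are our assumptions, not the paper's Faster R-CNN components.

```python
import numpy as np

class GradReverse:
    """Gradient reversal layer: identity forward, -lam * grad backward."""
    def __init__(self, lam=1.0):
        self.lam = lam
    def forward(self, x):
        return x
    def backward(self, grad_output):
        return -self.lam * grad_output

def domain_step(features, domain_labels, w, lam=1.0):
    """One step of a toy linear-logistic domain classifier with reversal.

    features:      (N, D) feature vectors from source and target images
    domain_labels: (N,)   0 for source, 1 for target
    w:             (D,)   domain classifier weights
    Returns the domain loss, the gradient for the classifier weights, and
    the (reversed) gradient that would flow back into the feature extractor.
    """
    grl = GradReverse(lam)
    f = grl.forward(features)
    logits = f @ w
    p = 1.0 / (1.0 + np.exp(-logits))                     # P(domain = target)
    loss = -np.mean(domain_labels * np.log(p + 1e-8)
                    + (1 - domain_labels) * np.log(1 - p + 1e-8))
    dlogits = (p - domain_labels) / len(p)                # d(loss)/d(logits)
    grad_w = f.T @ dlogits                                # classifier descends
    grad_f = grl.backward(np.outer(dlogits, w))           # features ascend
    return loss, grad_w, grad_f
```

The classifier updates with `grad_w` to minimize the domain loss, while the backbone receives `grad_f`, whose sign flip makes its features maximize that same loss, which is the adversarial minimax expressed as a single backward pass.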
TU Dresden, www.cityscapes-dataset.net. Dataset splits: train/val with fine annotation (3,475 images); train with coarse annotation (20,000 images); test with fine annotation (1,525 images).
Ensuring the safety of all traffic participants is a prerequisite for bringing intelligent vehicles closer to practical applications. Assistance systems should not only achieve high accuracy under normal conditions, but also obtain robust perception in extreme situations. However, traffic accidents involving object collisions, deformations, overturns, etc., which are largely unseen in most training sets, severely harm the performance of existing semantic segmentation models. To tackle this issue, we present a rarely addressed task: semantic segmentation in accidental scenarios, along with an accident dataset, DADA-seg. It contains 313 various accident sequences with 40 frames each, whose time windows are located before and during a traffic accident. Every 11th frame is manually annotated for benchmarking the segmentation performance. Furthermore, we propose a novel event-based multi-modal segmentation architecture, ISSAFE. Our experiments indicate that event-based data can provide complementary information to stabilize semantic segmentation under adverse conditions by preserving the fine-grained motion of fast-moving foregrounds (crashing objects) in accidents. Our approach achieves a +8.2% mIoU performance gain on the proposed evaluation set, exceeding more than 10 state-of-the-art segmentation methods. The proposed ISSAFE architecture is demonstrated to be consistently effective for models learned on multiple source databases, including Cityscapes, KITTI-360, BDD, and ApolloScape.
The research community has increasing interest in autonomous driving research, despite the resource intensity of obtaining representative real-world data. Existing self-driving datasets are limited in the scale and variation of the environments they capture, even though generalization within and between operating regions is crucial to the overall viability of the technology. In an effort to help align the research community's contributions with real-world self-driving problems, we introduce a new large-scale, high-quality, diverse dataset. Our new dataset consists of 1150 scenes that each span 20 seconds, consisting of well synchronized and calibrated high-quality LiDAR and camera data captured across a range of urban and suburban geographies. It is 15x more diverse than the largest camera+LiDAR dataset available based on our proposed geographical coverage metric. We exhaustively annotated this data with 2D (camera image) and 3D (LiDAR) bounding boxes, with consistent identifiers across frames. Finally, we provide strong baselines for 2D as well as 3D detection and tracking tasks. We further study the effects of dataset size and generalization across geographies on 3D detection methods. Find data, code and more up-to-date information at http://www.waymo.com/open.
Surround-view cameras are a primary sensor for automated driving, used for near-field perception. They are among the most commonly used sensors in commercial vehicles, primarily for parking visualization and automated parking. Four fisheye cameras with a 190° field of view cover the 360° around the vehicle. Due to their high radial distortion, standard algorithms do not extend to them easily. Previously, we released the first public fisheye surround-view dataset, named WoodScape. In this work, we release a synthetic version of the surround-view dataset, covering many of its weaknesses and extending it. Firstly, it is not possible to obtain ground truth for pixel-wise optical flow and depth from real data. Secondly, in order to sample diverse frames, WoodScape did not have all four cameras annotated simultaneously; this meant that multi-camera algorithms producing a unified output in bird's-eye space could not be designed, which is enabled in the new dataset. We implemented surround-view fisheye geometric projections in the CARLA simulator, matching WoodScape's configuration, and created SynWoodScape.
Robust detection and tracking of objects is crucial for the deployment of autonomous vehicle technology. Image based benchmark datasets have driven development in computer vision tasks such as object detection, tracking and segmentation of agents in the environment. Most autonomous vehicles, however, carry a combination of cameras and range sensors such as lidar and radar. As machine learning based methods for detection and tracking become more prevalent, there is a need to train and evaluate such methods on datasets containing range sensor data along with images. In this work we present nuTonomy scenes (nuScenes), the first dataset to carry the full autonomous vehicle sensor suite: 6 cameras, 5 radars and 1 lidar, all with full 360 degree field of view. nuScenes comprises 1000 scenes, each 20s long and fully annotated with 3D bounding boxes for 23 classes and 8 attributes. It has 7x as many annotations and 100x as many images as the pioneering KITTI dataset. We define novel 3D detection and tracking metrics. We also provide careful dataset analysis as well as baselines for lidar and image based detection and tracking. Data, development kit and more information are available online.