To track the 3D locations and trajectories of the other traffic participants at any given time, modern autonomous vehicles are equipped with multiple cameras that cover the vehicle's full surroundings. Yet, camera-based 3D object tracking methods prioritize optimizing the single-camera setup and resort to post-hoc fusion in a multi-camera setup. In this paper, we propose a method for panoramic 3D object tracking, called CC-3DT, that associates and models object trajectories both temporally and across views, and improves the overall tracking consistency. In particular, our method fuses 3D detections from multiple cameras before association, reducing identity switches significantly and improving motion modeling. Our experiments on large-scale driving datasets show that fusion before association leads to a large margin of improvement over post-hoc fusion. We set a new state-of-the-art with 12.6% improvement in average multi-object tracking accuracy (AMOTA) among all camera-based methods on the competitive NuScenes 3D tracking benchmark, outperforming previously published methods by 6.5% in AMOTA with the same 3D detector.
translated by 谷歌翻译
本文旨在解决多个对象跟踪(MOT),这是计算机视觉中的一个重要问题,但由于许多实际问题,尤其是阻塞,因此仍然具有挑战性。确实,我们提出了一种新的实时深度透视图 - 了解多个对象跟踪(DP-MOT)方法,以解决MOT中的闭塞问题。首先提出了一个简单但有效的主题深度估计(SODE),以在2D场景中自动以无监督的方式自动订购检测到的受试者的深度位置。使用SODE的输出,提出了一个新的活动伪3D KALMAN滤波器,即具有动态控制变量的Kalman滤波器的简单但有效的扩展,以动态更新对象的运动。此外,在数据关联步骤中提出了一种新的高阶关联方法,以合并检测到的对象之间的一阶和二阶关系。与标准MOT基准的最新MOT方法相比,提出的方法始终达到最先进的性能。
translated by 谷歌翻译
3D多对象跟踪旨在唯一,始终如一地识别所有移动实体。尽管在此设置中提供了丰富的时空信息,但当前的3D跟踪方法主要依赖于抽象的信息和有限的历史记录,例如单帧对象边界框。在这项工作中,我们开发了对交通场景的整体表示,该场景利用了现场演员的空间和时间信息。具体而言,我们通过将跟踪的对象表示为时空点和边界框的序列来重新将跟踪作为时空问题,并在悠久的时间历史上进行重新制定。在每个时间戳上,我们通过对对象历史记录的完整顺序进行的细化来改善跟踪对象的位置和运动估计。通过共同考虑时间和空间,我们的代表自然地编码了基本的物理先验,例如对象持久性和整个时间的一致性。我们的时空跟踪框架在Waymo和Nuscenes基准测试中实现了最先进的性能。
translated by 谷歌翻译
以前的在线3D多对象跟踪(3DMOT)方法在与几帧的新检测无关时终止ROCKET。但是如果一个物体刚刚变暗,就像被其他物体暂时封闭或者只是从FOV暂时封闭一样,过早地终止ROCKET将导致身份切换。我们揭示了过早的轨迹终端是现代3DMOT系统中身份开关的主要原因。为了解决这个问题,我们提出了一个不朽的跟踪器,一个简单的跟踪系统,它利用轨迹预测来维护对象变暗的物体的轨迹。我们使用一个简单的卡尔曼滤波器进行轨迹预测,并在目标不可见时通过预测保留轨迹。通过这种方法,我们可以避免由过早托管终止产生的96%的车辆标识开关。如果没有任何学习的参数,我们的方法在Waymo Open DataSet测试集上的车载类别的0.0001级和竞争Mota处实现了不匹配的比率。我们的不匹配比率比任何先前发表的方法低一倍。在NUSCENes上报告了类似的结果。我们相信拟议的不朽追踪器可以为推动3DMOT的极限提供简单而强大的解决方案。我们的代码可在https://github.com/immortaltracker/immortaltracker中找到。
translated by 谷歌翻译
一方面,在最近的文献中,许多3D多对象跟踪(MOT)的作品集中在跟踪准确性和被忽视的计算速度上,通常是通过设计相当复杂的成本功能和功能提取器来进行的。另一方面,某些方法以跟踪准确性为代价过多地关注计算速度。鉴于这些问题,本文提出了一种强大而快速的基于相机融合的MOT方法,该方法在准确性和速度之间取决于良好的权衡。依靠相机和激光雷达传感器的特性,设计并嵌入了提出的MOT方法中的有效的深层关联机制。该关联机制在对象远处并仅由摄像机检测到2D域中的对象,并在对象出现在LIDAR的视野中以实现平滑融合时获得的2D轨迹进行更新,并更新2D轨迹。 2D和3D轨迹。基于典型数据集的广泛实验表明,就跟踪准确性和处理速度而言,我们提出的方法在最先进的MOT方法上具有明显的优势。我们的代码可公开用于社区的利益。
translated by 谷歌翻译
多对象跟踪(MOT)的目标是检测和跟踪场景中的所有对象,同时为每个对象保留唯一的标识符。在本文中,我们提出了一种新的可靠的最新跟踪器,该跟踪器可以结合运动和外观信息的优势,以及摄像机运动补偿以及更准确的Kalman滤波器状态矢量。我们的新跟踪器在Mot17和Mot20测试集的Motchallenge [29,11]的数据集[29,11]中,Bot-Sort-Reid排名第一,就所有主要MOT指标而言:MOTA,IDF1和HOTA。对于Mot17:80.5 Mota,80.2 IDF1和65.0 HOTA。源代码和预培训模型可在https://github.com/niraharon/bot-sort上找到
translated by 谷歌翻译
3D多对象跟踪(MOT)确保在连续动态检测过程中保持一致性,有利于自动驾驶中随后的运动计划和导航任务。但是,基于摄像头的方法在闭塞情况下受到影响,准确跟踪基于激光雷达的方法的对象的不规则运动可能是具有挑战性的。某些融合方法效果很好,但不认为在遮挡下出现外观特征的不可信问题。同时,错误检测问题也显着影响跟踪。因此,我们根据组合的外观运动优化(Camo-Mot)提出了一种新颖的相机融合3D MOT框架,该框架使用相机和激光镜数据,并大大减少了由遮挡和错误检测引起的跟踪故障。对于遮挡问题,我们是第一个提出遮挡头来有效地选择最佳对象外观的人,从而减少了闭塞的影响。为了减少错误检测在跟踪中的影响,我们根据置信得分设计一个运动成本矩阵,从而提高了3D空间中的定位和对象预测准确性。由于现有的多目标跟踪方法仅考虑一个类别,因此我们还建议建立多类损失,以在多类别场景中实现多目标跟踪。在Kitti和Nuscenes跟踪基准测试上进行了一系列验证实验。我们提出的方法在KITTI测试数据集上的所有多模式MOT方法中实现了最先进的性能和最低的身份开关(IDS)值(CAR为23,行人为137)。并且我们提出的方法在Nuscenes测试数据集上以75.3%的AMOTA进行了所有算法中的最新性能。
translated by 谷歌翻译
由于3D对象检测和2D MOT的快速发展,3D多对象跟踪(MOT)已取得了巨大的成就。最近的高级工作通常采用一系列对象属性,例如位置,大小,速度和外观,以提供3D MOT的关联线索。但是,由于某些视觉噪音,例如遮挡和模糊,这些提示可能无法可靠,从而导致跟踪性能瓶颈。为了揭示困境,我们进行了广泛的经验分析,以揭示每个线索的关键瓶颈及其彼此之间的相关性。分析结果激发了我们有效地吸收所有线索之间的优点,并适应性地产生最佳的应对方式。具体而言,我们提出位置和速度质量学习,该学习有效地指导网络估计预测对象属性的质量。基于这些质量估计,我们提出了一种质量意识的对象关联(QOA)策略,以利用质量得分作为实现强大关联的重要参考因素。尽管具有简单性,但广泛的实验表明,提出的策略可显着提高2.2%的AMOTA跟踪性能,而我们的方法的表现优于所有现有的最先进的Nuscenes上的最新作品。此外,Qtrack在Nuscenes验证和测试集上实现了48.0%和51.1%的AMOTA跟踪性能,这大大降低了纯摄像头和基于LIDAR的跟踪器之间的性能差距。
translated by 谷歌翻译
Tracking has traditionally been the art of following interest points through space and time. This changed with the rise of powerful deep networks. Nowadays, tracking is dominated by pipelines that perform object detection followed by temporal association, also known as tracking-by-detection. We present a simultaneous detection and tracking algorithm that is simpler, faster, and more accurate than the state of the art. Our tracker, CenterTrack, applies a detection model to a pair of images and detections from the prior frame. Given this minimal input, CenterTrack localizes objects and predicts their associations with the previous frame. That's it. CenterTrack is simple, online (no peeking into the future), and real-time. It achieves 67.8% MOTA on the MOT17 challenge at 22 FPS and 89.4% MOTA on the KITTI tracking benchmark at 15 FPS, setting a new state of the art on both datasets. CenterTrack is easily extended to monocular 3D tracking by regressing additional 3D attributes. Using monocular video input, it achieves 28.3% AMOTA@0.2 on the newly released nuScenes 3D tracking benchmark, substantially outperforming the monocular baseline on this benchmark while running at 28 FPS.
translated by 谷歌翻译
多任务学习的最新研究揭示了解决单个神经网络中相关问题的好处。 3D对象检测和多对象跟踪(MOT)是两个严重的相互交织的问题,可以预测并关联整个时间的对象实例位置。但是,3D MOT中的大多数先前作品都将检测器视为先前的分离管道,不一致地将检测器的输出作为跟踪器的输入。在这项工作中,我们提出了Minkowski Tracker,这是一种稀疏的时空R-CNN,可以共同解决对象检测和跟踪。受基于区域的CNN(R-CNN)的启发,我们建议将跟踪作为对象检测器R-CNN的第二阶段,该跟踪预测了轨道的分配概率。首先,Minkowski Tracker将4D点云作为输入,以生成时空鸟的视图(BEV)特征通过4D稀疏卷积编码器网络。然后,我们提出的TrackAlign聚集了BEV功能的轨道区域(ROI)功能。最后,Minkowski Tracker根据ROI功能预测的检测到追踪匹配概率更新了跟踪及其置信得分。我们在大规模实验中显示,我们方法的总体性能增益是由于四个因素:1。4D编码器的时间推理提高了检测性能2.对象检测的多任务学习和MOT共同增强了彼此3.检测到轨道比赛得分学习隐式运动模型以增强轨道分配4.检测到轨道匹配分数提高了轨道置信度得分的质量。结果,Minkowski Tracker在没有手工设计的运动模型的情况下实现了Nuscenes数据集跟踪任务上的最新性能。
translated by 谷歌翻译
3D多对象跟踪(MOT)是自动驾驶汽车的关键问题,需要在动态环境中执行信息良好的运动计划。特别是对于密集的占领场景,将现有曲目与新检测相关联仍然具有挑战性,因为现有系统倾向于省略关键的上下文信息。我们提出的解决方案InterTrack引入了3D MOT的相互作用变压器,以生成数据关联的区分对象表示。我们为每个轨道和检测提取状态和形状特征,并通过注意力有效地汇总全局信息。然后,我们对每个轨道/检测功能对进行学习的回归以估计亲和力,并使用强大的两阶段数据关联和轨道管理方法来生成最终轨道。我们在Nuscenes 3D MOT基准上验证了我们的方法,在那里我们观察到了显着的改进,尤其是在物理大小和聚类对象的类别上。从提交开始时,InterTrack在使用CenterPoint检测的方法中排名第1位AMOTA。
translated by 谷歌翻译
近年来,多个对象跟踪引起了研究人员的极大兴趣,它已成为计算机视觉中的趋势问题之一,尤其是随着自动驾驶的最新发展。 MOT是针对不同问题的关键视觉任务之一,例如拥挤的场景中的闭塞,相似的外观,小物体检测难度,ID切换等,以应对这些挑战,因为研究人员试图利用变压器的注意力机制,与田径的相互关系,与田径的相互关系,图形卷积神经网络,与暹罗网络不同帧中对象的外观相似性,他们还尝试了基于IOU匹配的CNN网络,使用LSTM的运动预测。为了将这些零散的技术在雨伞下采用,我们研究了过去三年发表的一百多篇论文,并试图提取近代研究人员更关注的技术来解决MOT的问题。我们已经征集了许多应用,可能性以及MOT如何与现实生活有关。我们的评论试图展示研究人员使用过时的技术的不同观点,并为潜在的研究人员提供了一些未来的方向。此外,我们在这篇评论中包括了流行的基准数据集和指标。
translated by 谷歌翻译
Figure 1: We introduce datasets for 3D tracking and motion forecasting with rich maps for autonomous driving. Our 3D tracking dataset contains sequences of LiDAR measurements, 360 • RGB video, front-facing stereo (middle-right), and 6-dof localization. All sequences are aligned with maps containing lane center lines (magenta), driveable region (orange), and ground height. Sequences are annotated with 3D cuboid tracks (green). A wider map view is shown in the bottom-right.
translated by 谷歌翻译
Multi-object tracking is a cornerstone capability of any robotic system. Most approaches follow a tracking-by-detection paradigm. However, within this framework, detectors function in a low precision-high recall regime, ensuring a low number of false-negatives while producing a high rate of false-positives. This can negatively affect the tracking component by making data association and track lifecycle management more challenging. Additionally, false-negative detections due to difficult scenarios like occlusions can negatively affect tracking performance. Thus, we propose a method that learns shape and spatio-temporal affinities between consecutive frames to better distinguish between true-positive and false-positive detections and tracks, while compensating for false-negative detections. Our method provides a probabilistic matching of detections that leads to robust data association and track lifecycle management. We quantitatively evaluate our method through ablative experiments and on the nuScenes tracking benchmark where we achieve state-of-the-art results. Our method not only estimates accurate, high-quality tracks but also decreases the overall number of false-positive and false-negative tracks. Please see our project website for source code and demo videos: sites.google.com/view/shasta-3d-mot/home.
translated by 谷歌翻译
流行的对象检测度量平均精度(3D AP)依赖于预测的边界框和地面真相边界框之间的结合。但是,基于摄像机的深度估计的精度有限,这可能会导致其他合理的预测,这些预测遭受了如此纵向定位错误,被视为假阳性和假阴性。因此,我们提出了流行的3D AP指标的变体,这些变体旨在在深度估计误差方面更具允许性。具体而言,我们新颖的纵向误差耐受度指标,Let-3D-AP和Let-3D-APL,允许预测的边界框的纵向定位误差,最高为给定的公差。所提出的指标已在Waymo Open DataSet 3D摄像头仅检测挑战中使用。我们认为,它们将通过提供更有信息的性能信号来促进仅相机3D检测领域的进步。
translated by 谷歌翻译
长期以来,多对象跟踪中最常见的范式是逐个检测(TBD),首先检测到对象,然后通过视频帧关联。对于关联,大多数模型用于运动和外观提示。尽管仍然依靠这些提示,但最新的方法(例如,注意力)表明对训练数据和整体复杂框架的需求不断增加。我们声称1)如果采用某些关键的设计选择,可以从很少的培训数据中获得强大的提示,2)鉴于这些强大的提示,标准的基于匈牙利匹配的关联足以获得令人印象深刻的结果。我们的主要见解是确定允许标准重新识别网络在基于外观的跟踪方面表现出色的关键组件。我们广泛地分析了其故障案例,并表明我们的外观特征与简单运动模型的结合导致了强大的跟踪结果。我们的模型在MOT17和MOT20数据集上实现了最新的性能,在IDF1中最多可超过5.4pp,在IDF1和HOTA中的4.4pp优于先前的最新跟踪器。我们将在本文接受后发布代码和模型。
translated by 谷歌翻译
The problem of tracking multiple objects in a video sequence poses several challenging tasks. For tracking-bydetection, these include object re-identification, motion prediction and dealing with occlusions. We present a tracker (without bells and whistles) that accomplishes tracking without specifically targeting any of these tasks, in particular, we perform no training or optimization on tracking data. To this end, we exploit the bounding box regression of an object detector to predict the position of an object in the next frame, thereby converting a detector into a Tracktor. We demonstrate the potential of Tracktor and provide a new state-of-the-art on three multi-object tracking benchmarks by extending it with a straightforward re-identification and camera motion compensation.We then perform an analysis on the performance and failure cases of several state-of-the-art tracking methods in comparison to our Tracktor. Surprisingly, none of the dedicated tracking methods are considerably better in dealing with complex tracking scenarios, namely, small and occluded objects or missing detections. However, our approach tackles most of the easy tracking scenarios. Therefore, we motivate our approach as a new tracking paradigm and point out promising future research directions. Overall, Tracktor yields superior tracking performance than any current tracking method and our analysis exposes remaining and unsolved tracking challenges to inspire future research directions.
translated by 谷歌翻译
对象运动和对象外观是多个对象跟踪(MOT)应用中的常用信息,用于将帧跨越帧的检测相关联,或用于联合检测和跟踪方法的直接跟踪预测。然而,不仅是这两种类型的信息通常是单独考虑的,而且它们也没有帮助直接从当前感兴趣帧中使用视觉信息的用法。在本文中,我们提出了PatchTrack,一种基于变压器的联合检测和跟踪系统,其使用当前感兴趣的帧帧的曲线预测曲目。我们使用卡尔曼滤波器从前一帧预测当前帧中的现有轨道的位置。从预测边界框裁剪的补丁被发送到变压器解码器以推断新曲目。通过利用在补丁中编码的对象运动和对象外观信息,所提出的方法将更多地关注新曲目更有可能发生的位置。我们展示了近期MOT基准的Patchtrack的有效性,包括MOT16(MOTA 73.71%,IDF1 65.77%)和MOT17(MOTA 73.59%,IDF1 65.23%)。结果在https://motchallenge.net/method/mot=4725&chl=10上发布。
translated by 谷歌翻译
Three-dimensional objects are commonly represented as 3D boxes in a point-cloud. This representation mimics the well-studied image-based 2D bounding-box detection but comes with additional challenges. Objects in a 3D world do not follow any particular orientation, and box-based detectors have difficulties enumerating all orientations or fitting an axis-aligned bounding box to rotated objects. In this paper, we instead propose to represent, detect, and track 3D objects as points. Our framework, CenterPoint, first detects centers of objects using a keypoint detector and regresses to other attributes, including 3D size, 3D orientation, and velocity. In a second stage, it refines these estimates using additional point features on the object. In CenterPoint, 3D object tracking simplifies to greedy closest-point matching. The resulting detection and tracking algorithm is simple, efficient, and effective. CenterPoint achieved state-of-theart performance on the nuScenes benchmark for both 3D detection and tracking, with 65.5 NDS and 63.8 AMOTA for a single model. On the Waymo Open Dataset, Center-Point outperforms all previous single model methods by a large margin and ranks first among all Lidar-only submissions. The code and pretrained models are available at https://github.com/tianweiy/CenterPoint.
translated by 谷歌翻译
The research community has increasing interest in autonomous driving research, despite the resource intensity of obtaining representative real world data. Existing selfdriving datasets are limited in the scale and variation of the environments they capture, even though generalization within and between operating regions is crucial to the overall viability of the technology. In an effort to help align the research community's contributions with real-world selfdriving problems, we introduce a new large-scale, high quality, diverse dataset. Our new dataset consists of 1150 scenes that each span 20 seconds, consisting of well synchronized and calibrated high quality LiDAR and camera data captured across a range of urban and suburban geographies. It is 15x more diverse than the largest cam-era+LiDAR dataset available based on our proposed geographical coverage metric. We exhaustively annotated this data with 2D (camera image) and 3D (LiDAR) bounding boxes, with consistent identifiers across frames. Finally, we provide strong baselines for 2D as well as 3D detection and tracking tasks. We further study the effects of dataset size and generalization across geographies on 3D detection methods. Find data, code and more up-to-date information at http://www.waymo.com/open.
translated by 谷歌翻译