多摄像机多对象跟踪目前在计算机视野中引起了注意力,因为它在现实世界应用中的卓越性能,如具有拥挤场景或巨大空间的视频监控。在这项工作中,我们提出了一种基于空间升降的多乳制型配方的数学上优雅的多摄像多对象跟踪方法。我们的模型利用单摄像头跟踪器产生的最先进的TOOTWLET作为提案。由于这些Tracklet可能包含ID-Switch错误,因此我们通过从3D几何投影获得的新型预簇来完善它们。因此,我们派生了更好的跟踪图,没有ID交换机,更精确的数据关联阶段的亲和力成本。然后通过求解全局提升的多乳制型制剂,将轨迹与多摄像机轨迹匹配,该组件包含位于同一相机和相互相机间的Tracklet上的短路和远程时间交互。在Wildtrack DataSet的实验结果是近乎完美的结果,在校园上表现出最先进的追踪器,同时在PETS-09数据集上处于校准状态。我们将在接受纸质时进行我们的实施。
translated by 谷歌翻译
本文旨在解决多个对象跟踪(MOT),这是计算机视觉中的一个重要问题,但由于许多实际问题,尤其是阻塞,因此仍然具有挑战性。确实,我们提出了一种新的实时深度透视图 - 了解多个对象跟踪(DP-MOT)方法,以解决MOT中的闭塞问题。首先提出了一个简单但有效的主题深度估计(SODE),以在2D场景中自动以无监督的方式自动订购检测到的受试者的深度位置。使用SODE的输出,提出了一个新的活动伪3D KALMAN滤波器,即具有动态控制变量的Kalman滤波器的简单但有效的扩展,以动态更新对象的运动。此外,在数据关联步骤中提出了一种新的高阶关联方法,以合并检测到的对象之间的一阶和二阶关系。与标准MOT基准的最新MOT方法相比,提出的方法始终达到最先进的性能。
translated by 谷歌翻译
图提供了一种自然的方式来制定多个对象跟踪(MOT)和多个对象跟踪和分割(MOTS),逐个检测范式中。但是,他们还引入了学习方法的主要挑战,因为定义可以在这种结构化领域运行的模型并不是微不足道的。在这项工作中,我们利用MOT的经典网络流程公式来定义基于消息传递网络(MPN)的完全微分框架。通过直接在图形域上操作,我们的方法可以在整个检测和利用上下文特征上全球推理。然后,它共同预测了数据关联问题的最终解决方案和场景中所有对象的分割掩码,同时利用这两个任务之间的协同作用。我们在几个公开可用的数据集中获得跟踪和细分的最新结果。我们的代码可在github.com/ocetintas/mpntrackseg上找到。
translated by 谷歌翻译
The problem of tracking multiple objects in a video sequence poses several challenging tasks. For tracking-bydetection, these include object re-identification, motion prediction and dealing with occlusions. We present a tracker (without bells and whistles) that accomplishes tracking without specifically targeting any of these tasks, in particular, we perform no training or optimization on tracking data. To this end, we exploit the bounding box regression of an object detector to predict the position of an object in the next frame, thereby converting a detector into a Tracktor. We demonstrate the potential of Tracktor and provide a new state-of-the-art on three multi-object tracking benchmarks by extending it with a straightforward re-identification and camera motion compensation.We then perform an analysis on the performance and failure cases of several state-of-the-art tracking methods in comparison to our Tracktor. Surprisingly, none of the dedicated tracking methods are considerably better in dealing with complex tracking scenarios, namely, small and occluded objects or missing detections. However, our approach tackles most of the easy tracking scenarios. Therefore, we motivate our approach as a new tracking paradigm and point out promising future research directions. Overall, Tracktor yields superior tracking performance than any current tracking method and our analysis exposes remaining and unsolved tracking challenges to inspire future research directions.
translated by 谷歌翻译
多对象跟踪(MOT)的目标是检测和跟踪场景中的所有对象,同时为每个对象保留唯一的标识符。在本文中,我们提出了一种新的可靠的最新跟踪器,该跟踪器可以结合运动和外观信息的优势,以及摄像机运动补偿以及更准确的Kalman滤波器状态矢量。我们的新跟踪器在Mot17和Mot20测试集的Motchallenge [29,11]的数据集[29,11]中,Bot-Sort-Reid排名第一,就所有主要MOT指标而言:MOTA,IDF1和HOTA。对于Mot17:80.5 Mota,80.2 IDF1和65.0 HOTA。源代码和预培训模型可在https://github.com/niraharon/bot-sort上找到
translated by 谷歌翻译
To track the 3D locations and trajectories of the other traffic participants at any given time, modern autonomous vehicles are equipped with multiple cameras that cover the vehicle's full surroundings. Yet, camera-based 3D object tracking methods prioritize optimizing the single-camera setup and resort to post-hoc fusion in a multi-camera setup. In this paper, we propose a method for panoramic 3D object tracking, called CC-3DT, that associates and models object trajectories both temporally and across views, and improves the overall tracking consistency. In particular, our method fuses 3D detections from multiple cameras before association, reducing identity switches significantly and improving motion modeling. Our experiments on large-scale driving datasets show that fusion before association leads to a large margin of improvement over post-hoc fusion. We set a new state-of-the-art with 12.6% improvement in average multi-object tracking accuracy (AMOTA) among all camera-based methods on the competitive NuScenes 3D tracking benchmark, outperforming previously published methods by 6.5% in AMOTA with the same 3D detector.
translated by 谷歌翻译
数据关联是遵循逐个检测范式跟踪的任何多个对象跟踪方法(MOT)方法的关键组件。为了生成完整的轨迹,这种方法采用数据关联过程来在每个时间步长期间建立检测和现有目标之间的分配。最近的数据关联方法试图解决多维线性分配任务或网络流量最小化问题,或者要么通过多个假设跟踪解决。但是,在推论过程中,每个序列帧都需要计算最佳分配的优化步骤,并在任何给定的解决方案中添加显着的计算复杂性。为此,在这项工作的背景下,我们介绍了基于变压器的作业决策网络(TADN),该决策网络(TADN)可以解决数据关联,而无需在推理过程中进行任何明确的优化。特别是,TADN可以在网络的单个正向传球中直接推断检测和活动目标之间的分配对。我们已经将TADN整合到了一个相当简单的MOT框架中,我们设计了一种新颖的培训策略,用于有效的端到端培训,并在两个流行的基准上展示了我们在线视觉跟踪MOT的高潜力,即Mot17和Mot17和UA-DETRAC。我们提出的方法在大多数评估指标中的最新方法都优于最先进的方法,尽管它作为跟踪器的简单性质缺乏重要的辅助组件,例如闭塞处理或重新识别。我们的方法的实现可在https://github.com/psaltaath/tadn-mot上公开获得。
translated by 谷歌翻译
多目标多摄像机跟踪(MTMCT)中的数据关联通常从重新识别(RE-ID)特征距离直接估计亲和力。但是,我们认为它可能不是最佳选择,因为匹配范围与MTMCT问题之间的匹配范围差异。重新ID系统专注于全局匹配,从而从所有相机和常规检索目标。相反,跟踪中的数据关联是一个本地匹配问题,因为其候选者仅来自相邻位置和时间框架。在本文中,我们设计实验,以验证全局重新ID功能距离和本地匹配在跟踪中的本地匹配之间的这种错误,并提出了一种简单但有效的方法来适应MTMCT中的相应匹配范围。我们不是尝试处理所有外观变化,而不是在数据关联期间专门调整关联度量来专门化。为此,我们介绍了一种新的数据采样方案,其中包含用于跟踪中的数据关联的时间窗口。自适应亲和模块最小化不匹配,对全局重新ID距离具有显着的改进,并在CityFlow和DukemTMC数据集中生成竞争性能。
translated by 谷歌翻译
3D多对象跟踪(MOT)确保在连续动态检测过程中保持一致性,有利于自动驾驶中随后的运动计划和导航任务。但是,基于摄像头的方法在闭塞情况下受到影响,准确跟踪基于激光雷达的方法的对象的不规则运动可能是具有挑战性的。某些融合方法效果很好,但不认为在遮挡下出现外观特征的不可信问题。同时,错误检测问题也显着影响跟踪。因此,我们根据组合的外观运动优化(Camo-Mot)提出了一种新颖的相机融合3D MOT框架,该框架使用相机和激光镜数据,并大大减少了由遮挡和错误检测引起的跟踪故障。对于遮挡问题,我们是第一个提出遮挡头来有效地选择最佳对象外观的人,从而减少了闭塞的影响。为了减少错误检测在跟踪中的影响,我们根据置信得分设计一个运动成本矩阵,从而提高了3D空间中的定位和对象预测准确性。由于现有的多目标跟踪方法仅考虑一个类别,因此我们还建议建立多类损失,以在多类别场景中实现多目标跟踪。在Kitti和Nuscenes跟踪基准测试上进行了一系列验证实验。我们提出的方法在KITTI测试数据集上的所有多模式MOT方法中实现了最先进的性能和最低的身份开关(IDS)值(CAR为23,行人为137)。并且我们提出的方法在Nuscenes测试数据集上以75.3%的AMOTA进行了所有算法中的最新性能。
translated by 谷歌翻译
The recent trend in multiple object tracking (MOT) is jointly solving detection and tracking, where object detection and appearance feature (or motion) are learned simultaneously. Despite competitive performance, in crowded scenes, joint detection and tracking usually fail to find accurate object associations due to missed or false detections. In this paper, we jointly model counting, detection and re-identification in an end-to-end framework, named CountingMOT, tailored for crowded scenes. By imposing mutual object-count constraints between detection and counting, the CountingMOT tries to find a balance between object detection and crowd density map estimation, which can help it to recover missed detections or reject false detections. Our approach is an attempt to bridge the gap of object detection, counting, and re-Identification. This is in contrast to prior MOT methods that either ignore the crowd density and thus are prone to failure in crowded scenes, or depend on local correlations to build a graphical relationship for matching targets. The proposed MOT tracker can perform online and real-time tracking, and achieves the state-of-the-art results on public benchmarks MOT16 (MOTA of 77.6), MOT17 (MOTA of 78.0%) and MOT20 (MOTA of 70.2%).
translated by 谷歌翻译
Tracking has traditionally been the art of following interest points through space and time. This changed with the rise of powerful deep networks. Nowadays, tracking is dominated by pipelines that perform object detection followed by temporal association, also known as tracking-by-detection. We present a simultaneous detection and tracking algorithm that is simpler, faster, and more accurate than the state of the art. Our tracker, CenterTrack, applies a detection model to a pair of images and detections from the prior frame. Given this minimal input, CenterTrack localizes objects and predicts their associations with the previous frame. That's it. CenterTrack is simple, online (no peeking into the future), and real-time. It achieves 67.8% MOTA on the MOT17 challenge at 22 FPS and 89.4% MOTA on the KITTI tracking benchmark at 15 FPS, setting a new state of the art on both datasets. CenterTrack is easily extended to monocular 3D tracking by regressing additional 3D attributes. Using monocular video input, it achieves 28.3% AMOTA@0.2 on the newly released nuScenes 3D tracking benchmark, substantially outperforming the monocular baseline on this benchmark while running at 28 FPS.
translated by 谷歌翻译
本文提出了一个自我监督的目标,用于学习表征,将对象定位在遮挡下 - 一种被称为对象永久的属性。一个中心问题是在全部阻塞的情况下选择学习信号。我们没有直接监督看不见的对象的位置,而是提出一个自制的目标,该目标既不需要人类注释,也不需要对对象动态的假设。我们表明,对象永久性可以通过优化内存的时间连贯性来出现:我们沿着马尔可夫沿着记忆的时空图,每个时间步骤中的状态都是序列编码器中的非马克维亚特征。这导致了存储器表示,该内存表示存储遮挡的对象并预测其运动,以更好地本地化。最终的模型在数个复杂性和现实主义的数据集上的现有方法优于现有方法,尽管需要最少的监督,从而广泛适用。
translated by 谷歌翻译
We introduce a novel framework to track multiple objects in overhead camera videos for airport checkpoint security scenarios where targets correspond to passengers and their baggage items. We propose a Self-Supervised Learning (SSL) technique to provide the model information about instance segmentation uncertainty from overhead images. Our SSL approach improves object detection by employing a test-time data augmentation and a regression-based, rotation-invariant pseudo-label refinement technique. Our pseudo-label generation method provides multiple geometrically-transformed images as inputs to a Convolutional Neural Network (CNN), regresses the augmented detections generated by the network to reduce localization errors, and then clusters them using the mean-shift algorithm. The self-supervised detector model is used in a single-camera tracking algorithm to generate temporal identifiers for the targets. Our method also incorporates a multi-view trajectory association mechanism to maintain consistent temporal identifiers as passengers travel across camera views. An evaluation of detection, tracking, and association performances on videos obtained from multiple overhead cameras in a realistic airport checkpoint environment demonstrates the effectiveness of the proposed approach. Our results show that self-supervision improves object detection accuracy by up to $42\%$ without increasing the inference time of the model. Our multi-camera association method achieves up to $89\%$ multi-object tracking accuracy with an average computation time of less than $15$ ms.
translated by 谷歌翻译
长期以来,多对象跟踪中最常见的范式是逐个检测(TBD),首先检测到对象,然后通过视频帧关联。对于关联,大多数模型用于运动和外观提示。尽管仍然依靠这些提示,但最新的方法(例如,注意力)表明对训练数据和整体复杂框架的需求不断增加。我们声称1)如果采用某些关键的设计选择,可以从很少的培训数据中获得强大的提示,2)鉴于这些强大的提示,标准的基于匈牙利匹配的关联足以获得令人印象深刻的结果。我们的主要见解是确定允许标准重新识别网络在基于外观的跟踪方面表现出色的关键组件。我们广泛地分析了其故障案例,并表明我们的外观特征与简单运动模型的结合导致了强大的跟踪结果。我们的模型在MOT17和MOT20数据集上实现了最新的性能,在IDF1中最多可超过5.4pp,在IDF1和HOTA中的4.4pp优于先前的最新跟踪器。我们将在本文接受后发布代码和模型。
translated by 谷歌翻译
Tracking objects over long videos effectively means solving a spectrum of problems, from short-term association for un-occluded objects to long-term association for objects that are occluded and then reappear in the scene. Methods tackling these two tasks are often disjoint and crafted for specific scenarios, and top-performing approaches are often a mix of techniques, which yields engineering-heavy solutions that lack generality. In this work, we question the need for hybrid approaches and introduce SUSHI, a unified and scalable multi-object tracker. Our approach processes long clips by splitting them into a hierarchy of subclips, which enables high scalability. We leverage graph neural networks to process all levels of the hierarchy, which makes our model unified across temporal scales and highly general. As a result, we obtain significant improvements over state-of-the-art on four diverse datasets. Our code and models will be made available.
translated by 谷歌翻译
由于卷积神经网络(CNN)在过去的十年中检测成功,多对象跟踪(MOT)通过检测方法的使用来控制。随着数据集和基础标记网站的发布,研究方向已转向在跟踪时在包括重新识别对象的通用场景(包括重新识别(REID))上的最佳准确性。在这项研究中,我们通过提供专用的行人数据集并专注于对性能良好的多对象跟踪器的深入分析来缩小监视的范围)现实世界应用的技术。为此,我们介绍SOMPT22数据集;一套新的,用于多人跟踪的新套装,带有带注释的简短视频,该视频从位于杆子上的静态摄像头捕获,高度为6-8米,用于城市监视。与公共MOT数据集相比,这提供了室外监视的MOT的更为集中和具体的基准。我们分析了该新数据集上检测和REID网络的使用方式,分析了将MOT跟踪器分类为单发和两阶段。我们新数据集的实验结果表明,SOTA远非高效率,而单一跟踪器是统一快速执行和准确性的良好候选者,并具有竞争性的性能。该数据集将在以下网址提供:sompt22.github.io
translated by 谷歌翻译
近年来,多个对象跟踪引起了研究人员的极大兴趣,它已成为计算机视觉中的趋势问题之一,尤其是随着自动驾驶的最新发展。 MOT是针对不同问题的关键视觉任务之一,例如拥挤的场景中的闭塞,相似的外观,小物体检测难度,ID切换等,以应对这些挑战,因为研究人员试图利用变压器的注意力机制,与田径的相互关系,与田径的相互关系,图形卷积神经网络,与暹罗网络不同帧中对象的外观相似性,他们还尝试了基于IOU匹配的CNN网络,使用LSTM的运动预测。为了将这些零散的技术在雨伞下采用,我们研究了过去三年发表的一百多篇论文,并试图提取近代研究人员更关注的技术来解决MOT的问题。我们已经征集了许多应用,可能性以及MOT如何与现实生活有关。我们的评论试图展示研究人员使用过时的技术的不同观点,并为潜在的研究人员提供了一些未来的方向。此外,我们在这篇评论中包括了流行的基准数据集和指标。
translated by 谷歌翻译
This paper explores a pragmatic approach to multiple object tracking where the main focus is to associate objects efficiently for online and realtime applications. To this end, detection quality is identified as a key factor influencing tracking performance, where changing the detector can improve tracking by up to 18.9%. Despite only using a rudimentary combination of familiar techniques such as the Kalman Filter and Hungarian algorithm for the tracking components, this approach achieves an accuracy comparable to state-of-the-art online trackers. Furthermore, due to the simplicity of our tracking method, the tracker updates at a rate of 260 Hz which is over 20x faster than other state-of-the-art trackers.
translated by 谷歌翻译
我们提出了一种新的方法,用于从室内环境中的RGB-D序列进行连接3D多对象跟踪和重建。为此,我们在每个帧中检测并重建对象,同时预测密集的对应关系映射到归一化对象空间中。我们利用这些对应关系来告知图神经网络,以解决所有对象的最佳,时间一致的7-DOF姿势轨迹。我们方法的新颖性是两个方面:首先,我们提出了一种基于图的新方法,用于随着时间的流逝而进行区分姿势估计,以学习最佳的姿势轨迹。其次,我们提出了沿时间轴的重建和姿势估计的联合公式,以实现健壮和几何一致的多对象跟踪。为了验证我们的方法,我们引入了一个新的合成数据集,其中包含2381个唯一室内序列,总共有60k渲染的RGB-D图像,用于多对象跟踪,并带有移动对象和来自合成3D-Front数据集的相机位置。我们证明,与现有最新方法相比,我们的方法将所有测试序列的累积MOTA得分提高了24.8%。在关于合成和现实世界序列的几个消融中,我们表明我们的基于图的完全端到端学习方法可以显着提高跟踪性能。
translated by 谷歌翻译
为了克服多个对象跟踪任务中的挑战,最近的算法将交互线索与运动和外观特征一起使用。这些算法使用图形神经网络或变压器来提取导致高计算成本的交互功能。在本文中,提出了一种基于几何特征的新型交互提示,旨在检测遮挡和重新识别计算成本低的丢失目标。此外,在大多数算法中,摄像机运动被认为可以忽略不计,这是一个强有力的假设,并不总是正确的,并且导致目标转换或目标不匹配。在本文中,提出了一种测量相机运动和删除其效果的方法,可有效地降低相机运动对跟踪的影响。该算法在MOT17和MOT20数据集上进行了评估,并在MOT20上实现了MOT17的最先进性能和可比较的结果。该代码也可以公开使用。
translated by 谷歌翻译