Accomplishing safe and efficient driving is one of the predominant challenges in the controller design of connected automated vehicles (CAVs). It is often more convenient to address these goals separately and integrate the resulting controllers. In this study, we propose a controller integration scheme to fuse performance-based controllers and safety-oriented controllers safely for the longitudinal motion of a CAV. The resulting structure is compatible with a large class of controllers, and offers flexibility to design each controller individually without affecting the performance of the others. We implement the proposed safe integration scheme on a connected automated truck using an optimal-in-energy controller and a safety-oriented connected cruise controller. We validate the premise of the safe integration through experiments with a full-scale truck in two scenarios: a controlled experiment on a test track and a real-world experiment on a public highway. In both scenarios, we achieve energy efficient driving without violating safety.
translated by 谷歌翻译
Video Question Answering methods focus on commonsense reasoning and visual cognition of objects or persons and their interactions over time. Current VideoQA approaches ignore the textual information present in the video. Instead, we argue that textual information is complementary to the action and provides essential contextualisation cues to the reasoning process. To this end, we propose a novel VideoQA task that requires reading and understanding the text in the video. To explore this direction, we focus on news videos and require QA systems to comprehend and answer questions about the topics presented by combining visual and textual cues in the video. We introduce the ``NewsVideoQA'' dataset that comprises more than $8,600$ QA pairs on $3,000+$ news videos obtained from diverse news channels from around the world. We demonstrate the limitations of current Scene Text VQA and VideoQA methods and propose ways to incorporate scene text information into VideoQA methods.
translated by 谷歌翻译
在本报告中,我们展示了ICDAR 2021版文档视觉问题挑战的结果。此版本在单个文档VQA和Document Collection VQA上补充了以前的任务,并在Infographics VQA上进行了新引入的。信息图表VQA基于超过5,000个信息图表图像和30,000个问题答案对的新数据集。获胜者方法在Infographics VQA任务中获得了0.6120个ANL,0.7743 anlsl在文档集中的VQA任务和单个文档VQA中的0.8705 ANL中。我们展示了用于每个任务的数据集的摘要,每个提交的方法的描述以及它们的性能的结果和分析。由于还提出了自从第一版DocVQA 2020挑战以来在单个文档VQA上取得的摘要。
translated by 谷歌翻译
在智能手机和控制器系统中的爆炸性增长之后,在从集中数据朝向设备生成的数据中消除数据如何生成数据的加速偏移。作为响应,机器学习算法正在适于在本地运行,潜在的硬件有限,设备,以改善用户隐私,减少延迟并更节能。但是,我们对这些方向算法的表现方式和应培训的理解仍然相当有限。为了解决这个问题,介绍了一种方法来自动综合降低的神经网络(具有较少的神经元)近似近似较大的输入/输出映射。从凸的半定程序生成降低的神经网络的权重和偏差,该凸形半定程序产生相对于较大网络的最坏情况近似误差。获得该近似误差的最坏情况界限,并且该方法可以应用于各种神经网络架构。例如,如何区分所提出的方法来产生小型神经网络的现有方法。修剪是在训练成本函数中直接包含最坏情况近似误差,这应该增加鲁棒性。数值示例突出了所提出的方法的潜力。本文的重新实现目的是概括最近导致神经网络对其重量和偏差的鲁棒合成问题的鲁棒性分析。
translated by 谷歌翻译
There are multiple scales of abstraction from which we can describe the same image, depending on whether we are focusing on fine-grained details or a more global attribute of the image. In brain mapping, learning to automatically parse images to build representations of both small-scale features (e.g., the presence of cells or blood vessels) and global properties of an image (e.g., which brain region the image comes from) is a crucial and open challenge. However, most existing datasets and benchmarks for neuroanatomy consider only a single downstream task at a time. To bridge this gap, we introduce a new dataset, annotations, and multiple downstream tasks that provide diverse ways to readout information about brain structure and architecture from the same image. Our multi-task neuroimaging benchmark (MTNeuro) is built on volumetric, micrometer-resolution X-ray microtomography images spanning a large thalamocortical section of mouse brain, encompassing multiple cortical and subcortical regions. We generated a number of different prediction challenges and evaluated several supervised and self-supervised models for brain-region prediction and pixel-level semantic segmentation of microstructures. Our experiments not only highlight the rich heterogeneity of this dataset, but also provide insights into how self-supervised approaches can be used to learn representations that capture multiple attributes of a single image and perform well on a variety of downstream tasks. Datasets, code, and pre-trained baseline models are provided at: https://mtneuro.github.io/ .
translated by 谷歌翻译
The purpose of this work was to tackle practical issues which arise when using a tendon-driven robotic manipulator with a long, passive, flexible proximal section in medical applications. A separable robot which overcomes difficulties in actuation and sterilization is introduced, in which the body containing the electronics is reusable and the remainder is disposable. A control input which resolves the redundancy in the kinematics and a physical interpretation of this redundancy are provided. The effect of a static change in the proximal section angle on bending angle error was explored under four testing conditions for a sinusoidal input. Bending angle error increased for increasing proximal section angle for all testing conditions with an average error reduction of 41.48% for retension, 4.28% for hysteresis, and 52.35% for re-tension + hysteresis compensation relative to the baseline case. Two major sources of error in tracking the bending angle were identified: time delay from hysteresis and DC offset from the proximal section angle. Examination of these error sources revealed that the simple hysteresis compensation was most effective for removing time delay and re-tension compensation for removing DC offset, which was the primary source of increasing error. The re-tension compensation was also tested for dynamic changes in the proximal section and reduced error in the final configuration of the tip by 89.14% relative to the baseline case.
translated by 谷歌翻译
Compliance in actuation has been exploited to generate highly dynamic maneuvers such as throwing that take advantage of the potential energy stored in joint springs. However, the energy storage and release could not be well-timed yet. On the contrary, for multi-link systems, the natural system dynamics might even work against the actual goal. With the introduction of variable stiffness actuators, this problem has been partially addressed. With a suitable optimal control strategy, the approximate decoupling of the motor from the link can be achieved to maximize the energy transfer into the distal link prior to launch. However, such continuous stiffness variation is complex and typically leads to oscillatory swing-up motions instead of clear launch sequences. To circumvent this issue, we investigate decoupling for speed maximization with a dedicated novel actuator concept denoted Bi-Stiffness Actuation. With this, it is possible to fully decouple the link from the joint mechanism by a switch-and-hold clutch and simultaneously keep the elastic energy stored. We show that with this novel paradigm, it is not only possible to reach the same optimal performance as with power-equivalent variable stiffness actuation, but even directly control the energy transfer timing. This is a major step forward compared to previous optimal control approaches, which rely on optimizing the full time-series control input.
translated by 谷歌翻译
The previous fine-grained datasets mainly focus on classification and are often captured in a controlled setup, with the camera focusing on the objects. We introduce the first Fine-Grained Vehicle Detection (FGVD) dataset in the wild, captured from a moving camera mounted on a car. It contains 5502 scene images with 210 unique fine-grained labels of multiple vehicle types organized in a three-level hierarchy. While previous classification datasets also include makes for different kinds of cars, the FGVD dataset introduces new class labels for categorizing two-wheelers, autorickshaws, and trucks. The FGVD dataset is challenging as it has vehicles in complex traffic scenarios with intra-class and inter-class variations in types, scale, pose, occlusion, and lighting conditions. The current object detectors like yolov5 and faster RCNN perform poorly on our dataset due to a lack of hierarchical modeling. Along with providing baseline results for existing object detectors on FGVD Dataset, we also present the results of a combination of an existing detector and the recent Hierarchical Residual Network (HRN) classifier for the FGVD task. Finally, we show that FGVD vehicle images are the most challenging to classify among the fine-grained datasets.
translated by 谷歌翻译
The task of reconstructing 3D human motion has wideranging applications. The gold standard Motion capture (MoCap) systems are accurate but inaccessible to the general public due to their cost, hardware and space constraints. In contrast, monocular human mesh recovery (HMR) methods are much more accessible than MoCap as they take single-view videos as inputs. Replacing the multi-view Mo- Cap systems with a monocular HMR method would break the current barriers to collecting accurate 3D motion thus making exciting applications like motion analysis and motiondriven animation accessible to the general public. However, performance of existing HMR methods degrade when the video contains challenging and dynamic motion that is not in existing MoCap datasets used for training. This reduces its appeal as dynamic motion is frequently the target in 3D motion recovery in the aforementioned applications. Our study aims to bridge the gap between monocular HMR and multi-view MoCap systems by leveraging information shared across multiple video instances of the same action. We introduce the Neural Motion (NeMo) field. It is optimized to represent the underlying 3D motions across a set of videos of the same action. Empirically, we show that NeMo can recover 3D motion in sports using videos from the Penn Action dataset, where NeMo outperforms existing HMR methods in terms of 2D keypoint detection. To further validate NeMo using 3D metrics, we collected a small MoCap dataset mimicking actions in Penn Action,and show that NeMo achieves better 3D reconstruction compared to various baselines.
translated by 谷歌翻译
Rigorous guarantees about the performance of predictive algorithms are necessary in order to ensure their responsible use. Previous work has largely focused on bounding the expected loss of a predictor, but this is not sufficient in many risk-sensitive applications where the distribution of errors is important. In this work, we propose a flexible framework to produce a family of bounds on quantiles of the loss distribution incurred by a predictor. Our method takes advantage of the order statistics of the observed loss values rather than relying on the sample mean alone. We show that a quantile is an informative way of quantifying predictive performance, and that our framework applies to a variety of quantile-based metrics, each targeting important subsets of the data distribution. We analyze the theoretical properties of our proposed method and demonstrate its ability to rigorously control loss quantiles on several real-world datasets.
translated by 谷歌翻译