Video, as a key driver in the global explosion of digital information, can create tremendous benefits for human society. Governments and enterprises are deploying innumerable cameras for a variety of applications, e.g., law enforcement, emergency management, traffic control, and security surveillance, all facilitated by video analytics (VA). This trend is spurred by the rapid advancement of deep learning (DL), which enables more precise models for object classification, detection, and tracking. Meanwhile, with the proliferation of Internet-connected devices, massive amounts of data are generated daily, overwhelming the cloud. Edge computing, an emerging paradigm that moves workloads and services from the network core to the network edge, has been widely recognized as a promising solution. The resulting new intersection, edge video analytics (EVA), has begun to attract widespread attention. Nevertheless, only a few loosely related surveys exist on this topic, and a dedicated venue for collecting and summarizing the latest advances in EVA is highly desired by the community. Moreover, the basic concepts of EVA (e.g., its definition and architectures) remain ambiguous and are neglected by these surveys due to the rapid development of this domain; a thorough clarification is needed to facilitate a consensus on these concepts. To fill these gaps, we conduct a comprehensive survey of recent efforts on EVA. In this paper, we first review the fundamentals of edge computing, followed by an overview of VA. The EVA system and its enabling techniques are discussed next. In addition, we introduce prevalent frameworks and datasets to aid future researchers in the development of EVA systems. Finally, we discuss existing challenges and outline future research directions. We believe this survey will help readers comprehend the relationship between VA and edge computing, and spark new ideas on EVA.
Deep learning (DL) applied to radio frequency fingerprinting (RFF) of devices has attracted considerable attention in physical-layer authentication owing to its remarkable classification performance. Conventional DL-RFF techniques, trained via maximum likelihood estimation (MLE), tend to overfit the channel statistics embedded in the training dataset. This limits their practical application, because it is challenging to collect enough training data to capture the characteristics of all possible wireless channel environments. To address this challenge, we propose a DL framework of disentangled representation learning (DRL) that first learns to factor the input signal into a device-relevant component and a device-irrelevant component via adversarial learning. It then synthesizes a set of augmented signals by shuffling these components within the given training dataset to train the subsequent RFF extractor. The implicit data augmentation in the proposed framework imposes a regularization on the RFF extractor that avoids possible overfitting to device-irrelevant channel statistics, without collecting additional data from unknown channels. Experiments validate that the proposed approach, referred to as DR-RFF, outperforms conventional methods in terms of generalizability to unknown, complicated propagation environments, e.g., dispersive multipath fading channels, even when all the training data are collected in a simple environment dominated by direct line-of-sight (LOS) propagation paths.
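As a rough illustration of the shuffling-based augmentation described above, the sketch below recombines device-relevant and device-irrelevant components across a batch; `disentangler` and `compose` are hypothetical placeholders for the adversarially learned decomposition and recombination steps, which the abstract does not specify.

```python
# Hedged sketch only: the disentangler/compose interfaces are assumptions.
import torch

def shuffle_augment(signals: torch.Tensor, disentangler, compose) -> torch.Tensor:
    """Pair each device-relevant component with a randomly chosen
    device-irrelevant (channel) component from the same batch."""
    device_part, channel_part = disentangler(signals)   # assumed to return two tensors
    perm = torch.randperm(signals.size(0))              # shuffle the channel components
    return compose(device_part, channel_part[perm])     # synthesized augmented signals
```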
In surveillance and search-and-rescue applications, it is important to perform multi-object tracking (MOT) in real time on low-end devices. Today's MOT solutions employ deep neural networks, which tend to have high computational complexity. Recognizing the effect of frame size on tracking performance, we propose DeepScale, a model-agnostic frame-size selection approach that operates on top of existing fully convolutional network-based trackers to accelerate tracking throughput. In the training stage, we incorporate detectability scores into a one-shot tracker architecture so that DeepScale learns representation estimation for different frame sizes in a self-supervised manner. During inference, it can adapt the frame size to the complexity of the visual content according to user-controlled parameters. To exploit the computational resources on edge servers, we propose two computation partition schemes tailored for MOT, namely an edge-server-only scheme with adaptive frame-size transmission and edge-server-assisted tracking. Extensive experiments and benchmarks on MOT datasets demonstrate the effectiveness and flexibility of DeepScale. Compared with a state-of-the-art tracker, DeepScale++, a variant of DeepScale, achieves a 1.57x speedup with only a moderate loss in tracking accuracy on the MOT15 dataset in one configuration. We have implemented and evaluated DeepScale++ and the proposed computation partition schemes on a small-scale testbed consisting of an NVIDIA Jetson TX2 board and a GPU server. The experiments reveal non-trivial trade-offs between tracking performance and latency in comparison with server-only and smart-camera-only solutions.
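A minimal sketch of content-adaptive frame-size selection at inference time is shown below, assuming a `predict_detectability` head that scores a frame at a candidate resolution; the candidate sizes and the threshold are illustrative, not DeepScale's actual settings.

```python
# Illustrative only; candidate sizes, threshold, and the scoring head are assumed.
import torch

CANDIDATE_SIZES = [(576, 320), (864, 480), (1088, 608)]   # small -> large (hypothetical)

def select_frame_size(frame: torch.Tensor, predict_detectability, threshold: float = 0.5):
    """Return the smallest candidate size whose predicted detectability
    clears the user-controlled threshold."""
    for size in CANDIDATE_SIZES:                      # walk from cheapest to costliest
        if predict_detectability(frame, size) >= threshold:
            return size
    return CANDIDATE_SIZES[-1]                        # fall back to the full resolution
```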
Among the most useful data mining primitives are distance measures. With an effective distance measure, it is possible to perform classification, clustering, anomaly detection, segmentation, etc. For single-event time series, Euclidean Distance and Dynamic Time Warping distance are known to be extremely effective. However, for time series containing cyclical behaviors, the semantic meaningfulness of such comparisons is less clear. For example, on two separate days the telemetry from an athlete's workout routine might be very similar. On the second day, however, the order of performing push-ups and squats may change, repetitions of pull-ups may be added, or dumbbell curls may be omitted entirely. Any of these minor changes would defeat existing time series distance measures. Some bag-of-features methods have been proposed to address this problem, but we argue that in many cases, similarity is intimately tied to the shapes of subsequences within these longer time series. In such cases, summative features will lack discrimination ability. In this work we introduce PRCIS, which stands for Pattern Representation Comparison in Series. PRCIS is a distance measure for long time series, which exploits recent progress in our ability to summarize time series with dictionaries. We will demonstrate the utility of our ideas on diverse tasks and datasets.
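The sketch below is not PRCIS itself, only a hedged illustration of comparing dictionary summaries of long time series: each series is reduced to a few z-normalized subsequences, and two summaries are compared by nearest-pattern matching. The window length, pattern count, and selection rule are assumptions.

```python
# Not the authors' algorithm; a toy dictionary-summary distance for illustration.
import numpy as np

def summarize(series: np.ndarray, window: int = 100, n_patterns: int = 8) -> np.ndarray:
    """Summarize a long series as a small set of z-normalized subsequences."""
    subs = np.lib.stride_tricks.sliding_window_view(series, window)[::window // 2]
    subs = (subs - subs.mean(axis=1, keepdims=True)) / (subs.std(axis=1, keepdims=True) + 1e-8)
    idx = np.linspace(0, len(subs) - 1, n_patterns).astype(int)   # crude pattern selection
    return subs[idx]

def dictionary_distance(dict_a: np.ndarray, dict_b: np.ndarray) -> float:
    """Symmetric average distance from each pattern to its nearest counterpart."""
    d = np.linalg.norm(dict_a[:, None, :] - dict_b[None, :, :], axis=-1)
    return 0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean())
```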
Most existing distillation methods ignore the flexible role of the temperature in the loss function and fix it as a hyper-parameter that can be decided by an inefficient grid search. In general, the temperature controls the discrepancy between two distributions and can faithfully determine the difficulty level of the distillation task. Keeping a constant temperature, i.e., a fixed level of task difficulty, is usually sub-optimal for a growing student during its progressive learning stages. In this paper, we propose a simple curriculum-based technique, termed Curriculum Temperature for Knowledge Distillation (CTKD), which controls the task difficulty level during the student's learning career through a dynamic and learnable temperature. Specifically, following an easy-to-hard curriculum, we gradually increase the distillation loss w.r.t. the temperature, leading to increased distillation difficulty in an adversarial manner. As an easy-to-use plug-in technique, CTKD can be seamlessly integrated into existing knowledge distillation frameworks and brings general improvements at a negligible additional computation cost. Extensive experiments on CIFAR-100, ImageNet-2012, and MS-COCO demonstrate the effectiveness of our method. Our code is available at https://github.com/zhengli97/CTKD.
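The following is a minimal PyTorch sketch of a learnable, adversarially updated temperature, assuming a gradient-reversal formulation; the initial temperature, the clamping, and the curriculum factor `lam` (ramped up over epochs) are illustrative choices rather than the paper's exact configuration.

```python
# Sketch under stated assumptions; not the released CTKD code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad):
        return -ctx.lam * grad, None   # reversed gradient: temperature maximizes the loss

class CurriculumTemperatureKD(nn.Module):
    def __init__(self, init_t: float = 4.0):
        super().__init__()
        self.t = nn.Parameter(torch.tensor(init_t))   # learnable temperature

    def forward(self, student_logits, teacher_logits, lam: float):
        t = GradReverse.apply(self.t, lam).clamp(min=1.0)   # lam follows an easy-to-hard schedule
        return F.kl_div(F.log_softmax(student_logits / t, dim=1),
                        F.softmax(teacher_logits.detach() / t, dim=1),
                        reduction="batchmean") * t * t       # standard T^2 scaling
```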
Neural architectures can be naturally viewed as computational graphs. Motivated by this perspective, we study neural architecture search (NAS) in this paper through the lens of learning random graph models. In contrast to existing NAS methods, which largely focus on searching for a single best architecture, i.e., point estimation, we propose GraphPNAS, a deep graph generative model that learns a distribution of well-performing architectures. Relying on graph neural networks (GNNs), our GraphPNAS can better capture the topologies of good neural architectures and the relations between operators therein. Moreover, our graph generator leads to a learnable probabilistic search method that is more flexible and efficient than the commonly used RNN generator and random search methods. Finally, we learn our generator via an efficient reinforcement learning formulation for NAS. To assess the effectiveness of our GraphPNAS, we conduct extensive experiments on three search spaces, including the challenging RandWire on TinyImageNet, ENAS on CIFAR10, and NAS-Bench-101/201. The complexity of RandWire is significantly larger than that of other search spaces in the literature. We show that our proposed graph generator consistently outperforms RNN-based generators and achieves better than or comparable performance to state-of-the-art NAS methods.
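Below is a hedged sketch of a generic REINFORCE-style generator update of the kind mentioned in the abstract; `generator.sample()` and `evaluate()` are hypothetical interfaces standing in for the GNN-based graph generator and the architecture-evaluation reward, and the batch size and mean baseline are illustrative.

```python
# Generic REINFORCE update for an architecture generator; interfaces are assumed.
import torch

def reinforce_step(generator, evaluate, optimizer, batch_size: int = 8):
    samples = [generator.sample() for _ in range(batch_size)]            # (arch, log_prob) pairs
    archs, log_probs = zip(*samples)
    rewards = torch.tensor([evaluate(a) for a in archs], dtype=torch.float32)
    advantages = rewards - rewards.mean()                                # mean baseline
    loss = -(torch.stack(log_probs) * advantages).mean()                 # maximize expected reward
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return rewards.mean().item()
```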
Non-IID data distribution across clients and poisoning attacks are two main challenges in real-world federated learning systems. While both of them have attracted great research interest with specific strategies developed, no known solution manages to address them in a unified framework. To jointly overcome both challenges, we propose SmartFL, a generic approach that optimizes the server-side aggregation process with a small clean server-collected proxy dataset (e.g., around one hundred samples, 0.2% of the dataset) via a subspace training technique. Specifically, the aggregation weight of each participating client at each round is optimized using the server-collected proxy data, which is essentially the optimization of the global model in the convex hull spanned by client models. Since at each round, the number of tunable parameters optimized on the server side equals the number of participating clients (thus independent of the model size), we are able to train a global model with massive parameters using only a small amount of proxy data. We provide theoretical analyses of the convergence and generalization capacity for SmartFL. Empirically, SmartFL achieves state-of-the-art performance on both federated learning with non-IID data distribution and federated learning with malicious clients. The source code will be released.
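A rough sketch of the server-side aggregation step is shown below, assuming the global model is restricted to the convex hull of the client models and the only trainable parameters are one scalar weight per client, tuned on the proxy data; the optimizer, step count, and use of `torch.func.functional_call` are implementation assumptions, not the released SmartFL code.

```python
# Sketch under stated assumptions.
import torch
import torch.nn.functional as F

def aggregate_with_proxy(global_model, client_states, proxy_loader, steps: int = 50):
    logits_w = torch.zeros(len(client_states), requires_grad=True)   # one scalar per client
    opt = torch.optim.Adam([logits_w], lr=0.1)

    def combine(weights):
        # convex combination of the floating-point entries of the client state dicts
        return {k: sum(weights[i] * client_states[i][k] for i in range(len(client_states)))
                for k, v in client_states[0].items() if v.is_floating_point()}

    for _ in range(steps):
        for x, y in proxy_loader:
            merged = combine(torch.softmax(logits_w, dim=0))          # weights stay on the simplex
            out = torch.func.functional_call(global_model, merged, (x,))
            loss = F.cross_entropy(out, y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return combine(torch.softmax(logits_w.detach(), dim=0))           # final aggregated model
```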
Speech representation learning has improved both speech understanding and speech synthesis tasks for a single language. However, its ability in cross-lingual scenarios has not been explored. In this paper, we extend the pretraining method to cross-lingual multi-speaker speech synthesis tasks, including cross-lingual multi-speaker voice cloning and cross-lingual multi-speaker speech editing. We propose a speech-text joint pretraining framework, in which we randomly mask the spectrogram and the phonemes given a speech example and its transcription. By learning to reconstruct the masked parts of the input in different languages, our model shows great improvements over speaker-embedding-based multi-speaker TTS methods. Moreover, our framework is end-to-end for both training and inference without any fine-tuning effort. In cross-lingual multi-speaker voice cloning and cross-lingual multi-speaker speech editing tasks, our experiments show that our model outperforms speaker-embedding-based multi-speaker TTS methods. The code and model are publicly available at PaddleSpeech.
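As a small illustration of the joint masking step described above (a sketch, not the released PaddleSpeech code), the function below masks spectrogram frames and phoneme tokens independently; the mask ratios, zero-filling of frames, and mask token id are assumptions.

```python
# Illustrative joint masking; ratios and the mask id are placeholders.
import torch

def mask_speech_text(spec: torch.Tensor, phonemes: torch.Tensor,
                     spec_ratio: float = 0.5, text_ratio: float = 0.15,
                     mask_id: int = 0):
    spec_mask = torch.rand(spec.size(0)) < spec_ratio          # per-frame mask
    text_mask = torch.rand(phonemes.size(0)) < text_ratio      # per-token mask
    masked_spec = spec.clone()
    masked_spec[spec_mask] = 0.0                               # zero out masked frames
    masked_phonemes = phonemes.clone()
    masked_phonemes[text_mask] = mask_id                       # replace with a [MASK] id
    return masked_spec, masked_phonemes, spec_mask, text_mask
```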
Recovering high-quality surfaces from noisy point clouds, known as point cloud denoising, is a fundamental yet challenging problem in geometry processing. Most existing methods either directly denoise the noisy input or filter raw normals and then update point positions. Motivated by the essential interplay between point cloud denoising and normal filtering, we revisit point cloud denoising from a multi-task perspective and propose an end-to-end network, named PCDNF, to denoise point clouds via joint normal filtering. In particular, we introduce an auxiliary normal-filtering task that helps the overall network remove noise more effectively while preserving geometric features more accurately. Beyond the overall architecture, our network contains two novel modules. On the one hand, to improve noise-removal performance, we design a shape-aware selector that constructs latent tangent-space representations for specific points by comprehensively considering the learned point and normal features as well as geometric priors. On the other hand, point features are better suited to describing geometric details, while normal features are more conducive to representing geometric structures (e.g., edges and corners); combining the two allows us to overcome their respective weaknesses. We therefore design a feature refinement module that fuses point and normal features to better recover geometric information. Extensive evaluations, comparisons, and ablation studies demonstrate that the proposed method outperforms state-of-the-art methods on both point cloud denoising and normal filtering.
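The multi-task objective implied by the abstract can be sketched as a point-position term plus an auxiliary normal-filtering term, as below; the specific loss forms (MSE and cosine) and the weight `alpha` are assumptions, not the paper's exact losses.

```python
# Hedged sketch of a joint denoising / normal-filtering objective.
import torch
import torch.nn.functional as F

def joint_denoise_loss(pred_points, gt_points, pred_normals, gt_normals, alpha: float = 0.5):
    point_loss = F.mse_loss(pred_points, gt_points)                               # position recovery
    normal_loss = (1.0 - F.cosine_similarity(pred_normals, gt_normals, dim=-1)).mean()
    return point_loss + alpha * normal_loss                                        # multi-task trade-off
```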
Ejection fraction (EF) is a key indicator of cardiac function, enabling identification of patients with cardiac dysfunction such as heart failure. It can be estimated from cardiac ultrasound videos known as echocardiograms (echo) by manually tracing the left ventricle and estimating its volume on certain frames. Owing to the manual process and variations in video quality, these estimates exhibit high inter-observer variability. These sources of inaccuracy, together with the need for rapid assessment, call for reliable and explainable machine learning techniques. In this work, we introduce EchoGNN, a model based on graph neural networks (GNNs), to estimate EF from echo videos. Our model first infers a latent echo-graph from the frames of one or multiple echo cine series. It then estimates weights over the nodes and edges of this graph, indicating the importance of individual frames for EF estimation. A GNN regressor uses this weighted graph to predict EF. We show, qualitatively and quantitatively, that the learned graph weights provide explainability by identifying the critical frames for EF estimation, which can be used to determine when human intervention is required. On the EchoNet-Dynamic public EF dataset, EchoGNN achieves EF prediction performance on par with the state of the art and provides explainability, which is crucial given the high inter-observer variability inherent in this task.
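A hedged sketch of the weighted read-out implied above is given below: learned per-frame importance weights pool frame embeddings before a small regressor predicts EF. The layer sizes and the softmax normalization are illustrative, not EchoGNN's actual architecture.

```python
# Illustrative weighted pooling + regression head; sizes are assumptions.
import torch
import torch.nn as nn

class WeightedEFRegressor(nn.Module):
    def __init__(self, dim: int = 128):
        super().__init__()
        self.weight_head = nn.Linear(dim, 1)          # importance score per frame (node)
        self.regressor = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, frame_embeddings: torch.Tensor):
        w = torch.softmax(self.weight_head(frame_embeddings), dim=0)   # (T, 1) frame weights
        pooled = (w * frame_embeddings).sum(dim=0)                     # weighted pooling
        return self.regressor(pooled).squeeze(-1), w.squeeze(-1)       # EF estimate, weights
```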