Deep Neural Networks (DNN) are increasingly used as components of larger software systems that need to process complex data, such as images, written texts, audio/video signals. DNN predictions cannot be assumed to be always correct for several reasons, among which the huge input space that is dealt with, the ambiguity of some inputs data, as well as the intrinsic properties of learning algorithms, which can provide only statistical warranties. Hence, developers have to cope with some residual error probability. An architectural pattern commonly adopted to manage failure-prone components is the supervisor, an additional component that can estimate the reliability of the predictions made by untrusted (e.g., DNN) components and can activate an automated healing procedure when these are likely to fail, ensuring that the Deep Learning based System (DLS) does not cause damages, despite its main functionality being suspended. In this paper, we consider DLS that implement a supervisor by means of uncertainty estimation. After overviewing the main approaches to uncertainty estimation and discussing their pros and cons, we motivate the need for a specific empirical assessment method that can deal with the experimental setting in which supervisors are used, where the accuracy of the DNN matters only as long as the supervisor lets the DLS continue to operate. Then we present a large empirical study conducted to compare the alternative approaches to uncertainty estimation. We distilled a set of guidelines for developers that are useful to incorporate a supervisor based on uncertainty monitoring into a DLS.
translated by 谷歌翻译
在本文中,我们将预处理技术应用于具有不同长度的多通道时间序列数据,我们称之为对齐问题,用于下游机器学习。多种原因可能发生多种渠道时间序列数据的未对准,原因有多种原因,例如丢失的数据,变化的采样率或不一致的收集时间。我们考虑从MIT SuperCloud高性能计算(HPC)中心收集的多渠道时间序列数据,其中不同的工作开始时间和HPC作业的运行时间不同,导致数据不对准。这种未对准使得为计算工作负载分类等任务构建AI/ML方法具有挑战性。在先前使用MIT SuperCloud数据集的监督分类工作的基础上,我们通过三种宽阔的低间接空间方法解决了对齐问题:从全职系列中抽样固定子集,在全职系列上执行摘要统计信息,并对系数进行取样。从映射到频域的时间序列。我们最佳性能模型的分类精度大于95%,以先前的方法对MIT SuperCloud数据集的多通道时间序列分类的表现优于5%。这些结果表明,我们的低间接费用方法与标准机器学习技术结合使用,能够达到高水平的分类准确性,并作为解决对齐问题(例如内核方法)的未来方法的基准。
translated by 谷歌翻译
在复杂的问题上,可以使用非常大规模的模型来实现深神网络(DNN)的最佳预测准确性,包括数十亿个参数。这样的模型只能在专用服务器上运行,该服务器通常由第三方服务提供,这会导致每个预测的巨额货币成本。我们为客户端应用程序提出了一种新的软件体系结构,该软件架构与远程大规模型号一起使用,旨在以可忽略的货币成本在本地进行简单的预测,同时仍然利用大型模型的好处来挑战性投入。。在概念证明中,我们将预测成本降低了多达50%,而不会对系统的准确性产生负面影响。
translated by 谷歌翻译
深度神经网络(DNNS)已成为现代软件系统的关键组成部分,但是在与训练期间观察到的条件不同的条件下,它们很容易失败,或者对真正模棱两可的输入,即。 ,在其地面真实标签中接受多个类别的多个类别的输入。最近的工作提出了DNN主管在可能的错误分类之前检测高确定性输入会导致任何伤害。为了测试和比较DNN主管的能力,研究人员提出了测试生成技术,将测试工作集中在高度确定性输入上,这些输入应被主管识别为异常。但是,现有的测试发电机只能产生分布式输入。没有现有的模型和主管与无关的技术支持真正模棱两可的测试输入。在本文中,我们提出了一种新的方法来生成模棱两可的输入来测试DNN主管,并将其用于比较几种现有的主管技术。特别是,我们建议歧义生成图像分类问题的模棱两可的样本。模棱两可的基于正规化对抗自动编码器的潜在空间中的梯度引导采样。此外,据我们所知,我们进行了最广泛的DNN主管比较研究,考虑到它们可以检测到4种不同类型的高级输入(包括真正模棱两可的)的能力。
translated by 谷歌翻译
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
translated by 谷歌翻译
We present a dynamic path planning algorithm to navigate an amphibious rotor craft through a concave time-invariant obstacle field while attempting to minimize energy usage. We create a nonlinear quaternion state model that represents the rotor craft dynamics above and below the water. The 6 degree of freedom dynamics used within a layered architecture to generate motion paths for the vehicle to follow and the required control inputs. The rotor craft has a 3 dimensional map of its surroundings that is updated via limited range onboard sensor readings within the current medium (air or water). Path planning is done via PRM and D* Lite.
translated by 谷歌翻译
While the capabilities of autonomous systems have been steadily improving in recent years, these systems still struggle to rapidly explore previously unknown environments without the aid of GPS-assisted navigation. The DARPA Subterranean (SubT) Challenge aimed to fast track the development of autonomous exploration systems by evaluating their performance in real-world underground search-and-rescue scenarios. Subterranean environments present a plethora of challenges for robotic systems, such as limited communications, complex topology, visually-degraded sensing, and harsh terrain. The presented solution enables long-term autonomy with minimal human supervision by combining a powerful and independent single-agent autonomy stack, with higher level mission management operating over a flexible mesh network. The autonomy suite deployed on quadruped and wheeled robots was fully independent, freeing the human supervision to loosely supervise the mission and make high-impact strategic decisions. We also discuss lessons learned from fielding our system at the SubT Final Event, relating to vehicle versatility, system adaptability, and re-configurable communications.
translated by 谷歌翻译
We present Muse, a text-to-image Transformer model that achieves state-of-the-art image generation performance while being significantly more efficient than diffusion or autoregressive models. Muse is trained on a masked modeling task in discrete token space: given the text embedding extracted from a pre-trained large language model (LLM), Muse is trained to predict randomly masked image tokens. Compared to pixel-space diffusion models, such as Imagen and DALL-E 2, Muse is significantly more efficient due to the use of discrete tokens and requiring fewer sampling iterations; compared to autoregressive models, such as Parti, Muse is more efficient due to the use of parallel decoding. The use of a pre-trained LLM enables fine-grained language understanding, translating to high-fidelity image generation and the understanding of visual concepts such as objects, their spatial relationships, pose, cardinality etc. Our 900M parameter model achieves a new SOTA on CC3M, with an FID score of 6.06. The Muse 3B parameter model achieves an FID of 7.88 on zero-shot COCO evaluation, along with a CLIP score of 0.32. Muse also directly enables a number of image editing applications without the need to fine-tune or invert the model: inpainting, outpainting, and mask-free editing. More results are available at https://muse-model.github.io
translated by 谷歌翻译
The visual dimension of cities has been a fundamental subject in urban studies, since the pioneering work of scholars such as Sitte, Lynch, Arnheim, and Jacobs. Several decades later, big data and artificial intelligence (AI) are revolutionizing how people move, sense, and interact with cities. This paper reviews the literature on the appearance and function of cities to illustrate how visual information has been used to understand them. A conceptual framework, Urban Visual Intelligence, is introduced to systematically elaborate on how new image data sources and AI techniques are reshaping the way researchers perceive and measure cities, enabling the study of the physical environment and its interactions with socioeconomic environments at various scales. The paper argues that these new approaches enable researchers to revisit the classic urban theories and themes, and potentially help cities create environments that are more in line with human behaviors and aspirations in the digital age.
translated by 谷歌翻译
Logic Mill is a scalable and openly accessible software system that identifies semantically similar documents within either one domain-specific corpus or multi-domain corpora. It uses advanced Natural Language Processing (NLP) techniques to generate numerical representations of documents. Currently it leverages a large pre-trained language model to generate these document representations. The system focuses on scientific publications and patent documents and contains more than 200 million documents. It is easily accessible via a simple Application Programming Interface (API) or via a web interface. Moreover, it is continuously being updated and can be extended to text corpora from other domains. We see this system as a general-purpose tool for future research applications in the social sciences and other domains.
translated by 谷歌翻译