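Emotion classification from EEG signals has seen many advances. However, problems such as scarce data and the difficulty of learning the important features and patterns remain areas with room for improvement, both computationally and in prediction accuracy. This work analyzes the performance of baseline machine learning classifiers on the DEAP dataset alongside a tabular learning approach that delivers results comparable to the state of the art, gaining its performance boost from a deep learning architecture without deploying heavy neural networks.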
translated by Google Translate
A major goal of multimodal research is to improve machine understanding of images and text. Tasks include image captioning, text-to-image generation, and vision-language representation learning. So far, research has focused on the relationships between images and text. For example, captioning models attempt to understand the semantics of images, which are then transformed into text. An important question is: which annotation best reflects a deep understanding of image content? Similarly, given a text, which image best presents the semantics of that text? In this work, we argue that the best caption for a given image is the text that would generate the image most similar to it. Likewise, the best image for a given text is the image that yields the caption best aligned with the original text. To this end, we propose a unified framework that includes both a text-to-image generative model and an image-to-text generative model. Extensive experiments validate our approach.
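The selection rule argued for above (the best caption is the one whose generated image is most similar to the original) can be sketched as follows. This is only a toy illustration, not the paper's framework: images are stand-in feature vectors, similarity is assumed to be cosine similarity, and `toy_text_to_image` is a hypothetical placeholder for a real text-to-image model.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def best_caption(image, captions, text_to_image):
    """Pick the caption whose regenerated image is closest to `image`.

    `text_to_image` is any callable mapping a caption to an image
    representation (hypothetical; a real generative model would go here).
    """
    scores = [cosine(image, text_to_image(c)) for c in captions]
    return captions[int(np.argmax(scores))], scores

# Toy stand-in generator: each caption maps to a fixed 3-d "image" vector.
TOY_IMAGES = {
    "a red square": [1.0, 0.0, 0.0],
    "a green circle": [0.0, 1.0, 0.0],
    "a blue sky": [0.0, 0.0, 1.0],
}

def toy_text_to_image(caption):
    return np.array(TOY_IMAGES[caption])

image = np.array([0.9, 0.1, 0.0])  # mostly "red"
caption, scores = best_caption(image, list(TOY_IMAGES), toy_text_to_image)
print(caption)
```

The symmetric rule for choosing the best image for a text would swap the roles: caption each candidate image and keep the one whose caption best matches the original text.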
Speech-centric machine learning systems have revolutionized many leading domains ranging from transportation and healthcare to education and defense, profoundly changing how people live, work, and interact with each other. However, recent studies have demonstrated that many speech-centric ML systems may need to become more trustworthy before broader deployment. Specifically, concerns over privacy breaches, discriminatory performance, and vulnerability to adversarial attacks have all been raised in ML research. To address these challenges and risks, significant effort has gone into making these ML systems trustworthy, in particular private, safe, and fair. In this paper, we conduct the first comprehensive survey on speech-centric trustworthy ML topics related to privacy, safety, and fairness. In addition to serving as a summary report for the research community, we point out several promising future research directions to inspire researchers who wish to explore this area further.
This work presents a physics-informed deep learning-based super-resolution framework to enhance the spatio-temporal resolution of solutions to time-dependent partial differential equations (PDEs). Prior work on deep learning-based super-resolution models has shown promise in accelerating engineering design by reducing the computational expense of traditional numerical schemes. However, these models rely heavily on the availability of high-resolution (HR) labeled data during training. In this work, we propose a physics-informed deep learning-based framework to enhance the spatial and temporal resolution of coarse-scale (both in space and time) PDE solutions without requiring any HR data. The framework consists of two trainable modules that independently super-resolve the PDE solution, first in the spatial and then in the temporal direction. The physics-based losses are implemented in a novel way to ensure tight coupling between the spatio-temporally refined outputs at different times and to improve framework accuracy. We analyze the capability of the developed framework by investigating its performance on an elastodynamics problem. The proposed framework successfully super-resolves (both in space and time) the low-resolution PDE solutions while satisfying physics-based constraints and yielding high accuracy. Furthermore, the analysis and the speed-up obtained show that the proposed framework is well suited for integration with traditional numerical methods to reduce computational complexity during engineering design.
We introduce Action-GPT, a plug-and-play framework for incorporating Large Language Models (LLMs) into text-based action generation models. Action phrases in current motion capture datasets contain minimal, to-the-point information. By carefully crafting prompts for LLMs, we generate richer, more fine-grained descriptions of the action. We show that using these detailed descriptions instead of the original action phrases leads to better alignment of text and motion spaces. Our experiments show qualitative and quantitative improvement in the quality of synthesized motions produced by recent text-to-motion models. Code, pretrained models and sample videos will be made available at https://actiongpt.github.io
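Multi-party computation (MPC) has been gaining popularity over the past few years as a secure computing model, particularly for machine learning (ML) inference. Compared with its competitors, MPC has lower overhead than homomorphic encryption (HE) and a stronger threat model than hardware-based trusted execution environments (TEEs) such as Intel SGX. Despite these apparent advantages, MPC protocols still pay a substantial performance penalty relative to plaintext computation when applied to ML algorithms. The overhead comes from added computation and communication costs. For multiplications, which are ubiquitous in ML algorithms, MPC protocols add 32x more computation and one round of broadcast among the MPC servers. Moreover, ML computations that have trivial cost in plaintext, such as Softmax, ReLU, and other non-linear operations, become very expensive because of the added communication. These overheads make MPC less attractive for deployment in real-time ML inference frameworks such as speech translation. In this work, we present MPC-Pipe, an MPC pipelined inference technique that uses two ML-specific approaches: 1) an inter-linear-layer pipeline and 2) an inner-layer pipeline. Together, these two techniques shorten the total inference runtime of machine learning models. Compared with current MPC protocol implementations, our experiments show reductions in ML inference latency of up to 12.6% when model weights are public.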
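To make the multiplication overhead described above concrete, here is a minimal sketch of multiplying two additively secret-shared values with a Beaver triple, a commonly used MPC technique. The abstract does not say which protocol MPC-Pipe builds on, so the choice of Beaver triples, the trusted dealer, the ring size `MOD`, and all function names are assumptions made for illustration. The point is that each multiplication needs extra local arithmetic plus one round in which masked values are opened to all parties.

```python
import secrets

MOD = 2 ** 64  # assumed ring size for the shares

def share(value, n=2):
    """Split `value` into n additive shares that sum to it modulo MOD."""
    parts = [secrets.randbelow(MOD) for _ in range(n - 1)]
    parts.append((value - sum(parts)) % MOD)
    return parts

def reveal(parts):
    """Recombine shares (every party broadcasts its share)."""
    return sum(parts) % MOD

def beaver_multiply(x_sh, y_sh):
    """Multiply shared x and y without revealing either of them."""
    n = len(x_sh)
    # Assumption: a trusted dealer hands out shares of a random triple c = a*b.
    a, b = secrets.randbelow(MOD), secrets.randbelow(MOD)
    a_sh, b_sh, c_sh = share(a, n), share(b, n), share(a * b % MOD, n)
    # The single communication round: open the masked values e = x - a, d = y - b.
    e = reveal([(x - ai) % MOD for x, ai in zip(x_sh, a_sh)])
    d = reveal([(y - bi) % MOD for y, bi in zip(y_sh, b_sh)])
    # Local computation: z_i = c_i + e*b_i + d*a_i, and one party adds e*d.
    z_sh = [(c + e * bi + d * ai) % MOD for c, ai, bi in zip(c_sh, a_sh, b_sh)]
    z_sh[0] = (z_sh[0] + e * d) % MOD
    return z_sh

product = reveal(beaver_multiply(share(6), share(7)))
print(product)  # 42
```

Because the opening round is pure communication, it can in principle overlap with local computation for other data, which is the general kind of opportunity a pipelined scheme can exploit.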
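Indoor motion planning focuses on the problem of navigating an agent through a cluttered environment. A great deal of work has been done in this field, but existing methods often fail to find the best balance between computationally inexpensive online path planning and optimality of the path. In addition, these works often prove optimality only for single-start, single-goal worlds. To address these challenges, we present a multiple-waypoint path planner and controller stack for navigation in unknown indoor environments, where the waypoints comprise the goal together with the intermediate points the robot must traverse before reaching it. Our approach uses a global planner (to find the next best waypoint at any instant), a local planner (to plan the path to a specific waypoint), and an adaptive model predictive control strategy (for robust system control and faster maneuvers). We evaluate our algorithm on a set of randomly generated obstacle maps, intermediate waypoints, and start-goal pairs; the results indicate a significant reduction in computational cost together with high accuracy and robust control.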
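The role of the global planner described above (choosing the next best waypoint at any instant) can be illustrated with a deliberately simple stand-in. The abstract does not give the selection criterion, so this sketch assumes a greedy rule: visit the nearest pending intermediate waypoint by straight-line distance, and head for the goal once none remain. Obstacles, the local planner, and the controller are left out, and the function names are hypothetical.

```python
import math

def next_waypoint(position, pending, goal):
    """Greedy stand-in for a global planner: nearest pending waypoint, else the goal."""
    if not pending:
        return goal
    return min(pending, key=lambda w: math.dist(position, w))

def visit_order(start, intermediates, goal):
    """Order in which waypoints would be visited under the greedy rule."""
    position, pending, order = start, list(intermediates), []
    while pending:
        waypoint = next_waypoint(position, pending, goal)
        pending.remove(waypoint)
        order.append(waypoint)
        position = waypoint  # assume the local planner and controller get us there
    order.append(goal)
    return order

order = visit_order((0, 0), [(5, 5), (1, 0), (2, 2)], (6, 6))
print(order)  # [(1, 0), (2, 2), (5, 5), (6, 6)]
```

In a real stack the distance would come from the planner's cost estimate in the partially known map rather than from straight-line distance, and the choice would be re-evaluated as the map is updated.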
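Semantic 3D scene understanding is a problem of critical importance in robotics. Although significant advances have been made in spatial perception, robots are still far from having the common-sense knowledge of household objects and locations that an average person has. We therefore investigate the use of large language models to impart common sense for scene understanding. Specifically, we introduce three paradigms for leveraging language to classify rooms in indoor environments based on the objects they contain: (i) a zero-shot approach, (ii) a feed-forward classifier approach, and (iii) a contrastive classifier approach. These methods operate on 3D scene graphs produced by modern spatial perception systems. We then analyze each approach, demonstrating notable zero-shot generalization and transfer capabilities that stem from the use of language. Finally, we show that these approaches also apply to inferring building labels from the rooms a building contains, and we demonstrate our zero-shot approach in a real environment. All code is available at https://github.com/mit-spark/llm_scene_understanding.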
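Causal reasoning provides a language for asking important interventional and counterfactual questions that go beyond purely statistical association. In medical imaging, for example, we may want to study the causal effect of genetic, environmental, or lifestyle factors on the normal and pathological variation of anatomical phenotypes. However, while anatomical shape models of 3D surface meshes extracted from automated image segmentation can be constructed reliably, computational tools that enable causal reasoning about morphological variation are lacking. To tackle this problem, we propose deep structural causal shape models (CSMs), which use high-quality mesh generation techniques from geometric deep learning within the expressive framework of deep structural causal models. CSMs enable subject-specific prognoses through counterfactual mesh generation ("How would this patient's brain structure change if they were ten years older?"), in contrast to most current work on purely population-level statistical shape modelling. Through a number of qualitative and quantitative experiments on a large dataset of 3D brain structures, we demonstrate the capabilities of CSMs at all levels of Pearl's causal hierarchy.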
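Recent transformer-based offline video instance segmentation (VIS) approaches achieve encouraging results and significantly outperform online approaches. However, their reliance on the whole video and the immense computational complexity caused by full spatio-temporal attention limit their use in real-life applications such as processing lengthy videos. In this paper, we propose an efficient single-stage transformer-based online VIS framework named InstanceFormer, which is especially suitable for long and challenging videos. We propose three novel components to model short-term and long-term dependency and temporal coherence. First, we propagate the representation, location, and semantic information of prior instances to model short-term changes. Second, we propose a novel memory cross-attention in the decoder, which allows the network to look into earlier instances within a certain temporal window. Finally, we employ a temporal contrastive loss to impose coherence on the representation of an instance across all frames. Memory attention and temporal coherence are particularly beneficial for long-range dependency modeling, including challenging scenarios such as occlusion. The proposed InstanceFormer outperforms previous online benchmark methods by a large margin on multiple datasets. Most importantly, InstanceFormer surpasses offline approaches on challenging, long datasets such as YouTube-VIS-2021 and OVIS. Code is available at https://github.com/rajatkoner08/instanceformer.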
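The idea behind a temporal contrastive loss, pulling together the embeddings of the same instance across frames and pushing apart those of different instances, can be sketched in NumPy. The abstract does not specify the exact formulation, so this assumes a standard InfoNCE-style loss over every ordered pair of frames; the array layout `(T, N, D)`, the `temperature` value, and the function name are illustrative assumptions rather than the InstanceFormer implementation.

```python
import numpy as np

def temporal_contrastive_loss(emb, temperature=0.1):
    """InfoNCE-style loss over instance embeddings of shape (T, N, D).

    emb[t, i] is the embedding of instance i in frame t. For every ordered
    pair of distinct frames, instance i in one frame should match instance i
    in the other (positive) rather than any other instance (negatives).
    """
    emb = np.asarray(emb, dtype=float)
    emb = emb / np.linalg.norm(emb, axis=-1, keepdims=True)
    num_frames = emb.shape[0]
    losses = []
    for t in range(num_frames):
        for s in range(num_frames):
            if s == t:
                continue
            logits = emb[t] @ emb[s].T / temperature        # (N, N) similarities
            logits = logits - logits.max(axis=1, keepdims=True)
            log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
            losses.append(-np.mean(np.diag(log_prob)))      # positives on the diagonal
    return float(np.mean(losses))

rng = np.random.default_rng(0)
base = rng.normal(size=(4, 8))                       # 4 instances, 8-d embeddings
coherent = np.stack([base, base, base])              # identities stable over time
scrambled = np.stack([base, np.roll(base, 1, axis=0), np.roll(base, 2, axis=0)])
loss_coherent = temporal_contrastive_loss(coherent)
loss_scrambled = temporal_contrastive_loss(scrambled)
print(loss_coherent < loss_scrambled)  # True
```

The comparison at the end shows the intended behavior: embeddings that keep instance identity consistent across frames receive a lower loss than embeddings whose identities are shuffled between frames.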