The Transformer is an extremely powerful and prominent deep learning architecture. In this work, we challenge the commonly held belief in deep learning that going deeper is better, and show an alternative design approach: building wider attention Transformers. We demonstrate that wide single-layer Transformer models can compete with or outperform deeper ones in a variety of Natural Language Processing (NLP) tasks when both are trained from scratch. We then systematically study the impact of changing the model aspect ratio on Transformers. This ratio balances the number of layers against the number of attention heads per layer while keeping the total number of attention heads and all other hyperparameters constant. On average, across 4 NLP tasks and 10 attention types, single-layer wide models perform 0.3% better than their deep counterparts. We present an in-depth evaluation and demonstrate that wide models require a far smaller memory footprint and can run faster on commodity hardware; in addition, these wider models are more interpretable. For example, a single-layer Transformer on IMDb byte-level text classification has 3.1x faster inference latency on a CPU than its equally accurate deeper counterpart, and is half the size. We therefore put forward wider and shallower models as a viable and desirable alternative for small models on NLP tasks, and as an important area of research for domains beyond this.
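To make the aspect-ratio trade-off concrete, the sketch below enumerates every (layers, heads-per-layer) split of a fixed head budget and counts weights. It assumes a standard encoder in which every head has a fixed dimension `d_head` and every layer carries its own feed-forward block of width `d_ff`; the function names and the parameter formula (biases, embeddings, and layer norms ignored) are illustrative assumptions, not the authors' code.

```python
def aspect_ratio_configs(total_heads):
    """Return every (num_layers, heads_per_layer) split whose product is total_heads."""
    return [(n, total_heads // n) for n in range(1, total_heads + 1) if total_heads % n == 0]


def param_count(num_layers, heads_per_layer, d_model, d_head, d_ff):
    """Rough weight count for an encoder stack (biases, embeddings, norms ignored).

    Assumes each head has a fixed size d_head regardless of how many heads share a layer.
    """
    # Per head: Q, K, V projections (d_model x d_head each) plus its
    # d_head x d_model slice of the output projection.
    attention = num_layers * heads_per_layer * 4 * d_model * d_head
    # Per layer: a two-matrix feed-forward block (d_model -> d_ff -> d_model).
    feed_forward = num_layers * 2 * d_model * d_ff
    return attention + feed_forward


for layers, heads in aspect_ratio_configs(8):
    print(layers, heads, param_count(layers, heads, d_model=64, d_head=8, d_ff=128))
```

Under these assumptions, the attention weights stay at 16,384 for every split of 8 heads, while the feed-forward weights grow linearly with depth (totals of 32,768, 49,152, 81,920, and 147,456 for 1, 2, 4, and 8 layers). This is one plausible reason a wide single-layer model can be smaller than a deep one with the same head count.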
We introduce KiloGram, a resource for studying abstract visual reasoning in humans and machines. Drawing on the history of tangram puzzles as stimuli in cognitive science, we build a richly annotated dataset that, with >1k distinct stimuli, is orders of magnitude larger and more diverse than prior resources. It is both visually and linguistically richer, moving beyond whole shape descriptions to include segmentation maps and part labels. We use this resource to evaluate the abstract visual reasoning capacities of recent multi-modal models. We observe that pre-trained weights demonstrate limited abstract reasoning, which dramatically improves with fine-tuning. We also observe that explicitly describing parts aids abstract reasoning for both humans and models, especially when jointly encoding the linguistic and visual inputs. KiloGram is available at https://lil.nlp.cornell.edu/kilogram .
We explore unifying a neural segmenter with two-pass cascaded encoder ASR into a single model. A key challenge is allowing the segmenter (which runs in real time, synchronously with the decoder) to finalize the 2nd pass (which runs 900 ms behind real time) without introducing user-perceived latency or deletion errors during inference. We propose a design where the neural segmenter is integrated with the causal 1st pass decoder to emit an end-of-segment (EOS) signal in real time. The EOS signal is then used to finalize the non-causal 2nd pass. We experiment with different ways to finalize the 2nd pass, and find that a novel dummy frame injection strategy allows for both high-quality 2nd pass results and low finalization latency. On a real-world long-form captioning task (YouTube), we achieve 2.4% relative WER and 140 ms EOS latency gains over a baseline VAD-based segmenter with the same cascaded encoder.
Artificial intelligence methods, including deep neural networks (DNNs), can provide rapid molecular classification of tumors from routine histology with accuracy that matches or exceeds that of human pathologists. Discerning how neural networks make their predictions remains a significant challenge, but explainability tools help provide insights into what models have learned when the corresponding histologic features are poorly defined. Here, we present a method for improving the explainability of DNN models using synthetic histology generated by a conditional generative adversarial network (cGAN). We show that cGANs generate high-quality synthetic histology images that can be leveraged to explain DNN models trained to classify molecularly subtyped tumors, exposing histologic features associated with molecular state. Fine-tuning synthetic histology through class and layer blending illustrates nuanced morphologic differences between tumor subtypes. Finally, we demonstrate the use of synthetic histology for augmenting the education of pathologists in training, showing that these intuitive visualizations can reinforce and improve understanding of the histologic manifestations of tumor biology.
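Class blending can be pictured as interpolating between the conditioning vectors of two classes and rendering each intermediate vector with the same latent input. The minimal NumPy sketch below assumes class embeddings are plain real-valued vectors; the function name is hypothetical, the generator call itself is left out, and this is not claimed to be the paper's exact procedure. Layer blending is not shown.

```python
import numpy as np


def blend_class_embeddings(emb_a, emb_b, alphas):
    """Linearly interpolate between two class-conditioning vectors.

    alpha = 0 reproduces class A, alpha = 1 reproduces class B; values in between
    are hypothetical "mixed" conditions that a cGAN generator could render with a
    fixed latent vector so that only the class signal changes.
    """
    emb_a = np.asarray(emb_a, dtype=float)
    emb_b = np.asarray(emb_b, dtype=float)
    return np.stack([(1.0 - a) * emb_a + a * emb_b for a in alphas])


# Five evenly spaced steps from a class-A embedding to a class-B embedding.
path = blend_class_embeddings([0.0, 0.0], [2.0, 4.0], np.linspace(0.0, 1.0, 5))
print(path)
```

Holding the latent input fixed while sweeping only the class condition is what would let morphologic changes along the path be attributed to the class signal rather than to random variation.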
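Powered prosthetic legs must anticipate the user's intent when switching between different locomotion modes (e.g., stair ascent/descent, ramp ascent/descent). Many data-driven classification techniques have shown promising results in predicting user intent, but the performance of these intent prediction models on new subjects remains unsatisfactory. In other domains (e.g., image classification), transfer learning has improved accuracy by using features previously learned from a large dataset (i.e., a pre-trained model) and then transferring this learned model to a new task for which only a small dataset is available. In this paper, we develop a deep convolutional neural network with intra-subject (subject-dependent) and inter-subject (subject-independent) validation on a human locomotion dataset. We then apply transfer learning to the subject-independent model using a small portion (10%) of the data from the left-out subject. We compare the performance of these three models. Our results show that the transfer learning (TL) model outperforms the subject-independent (IND) model and is comparable to the subject-dependent (DEP) model (DEP error: 0.74 ± 0.002%, IND error: 11.59 ± 0.076%, TL error: 3.57 ± 0.02% with 10% of the data). Moreover, as expected, transfer learning accuracy improves as more data from the left-out subject becomes available. We also evaluate the intent prediction system under various sensor configurations that may be available in prosthetic applications. Our results suggest that a thigh IMU on the prosthesis is sufficient to predict locomotion intent in practice.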
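The transfer-learning protocol (hold out one subject, fine-tune on 10% of that subject's data, test on the rest) can be sketched as follows. To stay self-contained, a logistic-regression model trained with plain gradient descent stands in for the paper's deep CNN, and the pre-trained weights are simply passed in as an array; all function names, the learning rate, and the step count are illustrative assumptions.

```python
import numpy as np


def split_left_out_subject(X, y, frac=0.10, seed=0):
    """Randomly split the left-out subject's samples into fine-tune and test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_ft = int(round(frac * len(X)))
    ft, test = idx[:n_ft], idx[n_ft:]
    return (X[ft], y[ft]), (X[test], y[test])


def log_loss(w, X, y):
    """Mean binary cross-entropy of a logistic model with weights w."""
    p = 1.0 / (1.0 + np.exp(-X @ w))
    eps = 1e-12  # guards against log(0)
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))


def fine_tune(w_pretrained, X, y, lr=0.1, steps=100):
    """Continue gradient descent from the pre-trained (subject-independent) weights."""
    w = np.array(w_pretrained, dtype=float)  # copy, so the pre-trained weights are kept
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        grad = X.T @ (p - y) / len(y)
        w -= lr * grad
    return w
```

Starting from the pre-trained weights rather than from scratch is the point of the design: with only 10% of a new subject's data, the model needs to adjust features it already has rather than learn them anew.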
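Current graph representation learning techniques use Graph Neural Networks (GNNs) to extract features from dataset embeddings. In this work, we examine the quality of these embeddings and assess how changing them affects the accuracy of GNNs. We explore different embedding extraction techniques for both images and text. We find that the choice of embedding biases the performance of different GNN architectures, so the choice of embedding influences the choice of GNN regardless of the underlying dataset. In addition, compared with models trained from scratch or fine-tuned on the underlying data without using the graph connections, only some GNN models yield an improvement in accuracy. As an alternative, we propose Graph-connected Network (GraNet) layers, which use GNN message passing within large models to allow neighborhood aggregation. This gives the model a chance to inherit weights from large pre-trained models where possible, and we demonstrate that this approach improves accuracy compared with previous methods: on Flickr_v2, GraNet beats GAT2 and GraphSAGE by 7.7% and 1.7%, respectively.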
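Neighborhood aggregation by message passing can be illustrated with a simple mean aggregator: each node adds the average of its neighbors' features to its own. This is a hypothetical, parameter-free sketch, not the GraNet layer itself.

```python
import numpy as np


def mean_neighbor_aggregate(h, neighbors):
    """One round of message passing with a residual mean aggregator.

    h: (num_nodes, dim) node features, e.g. activations from a pre-trained model.
    neighbors: list where neighbors[v] is the list of node indices adjacent to v.
    """
    out = h.copy()
    for v, nbrs in enumerate(neighbors):
        if nbrs:  # isolated nodes keep their features unchanged
            out[v] = h[v] + h[nbrs].mean(axis=0)
    return out


h = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
print(mean_neighbor_aggregate(h, [[1, 2], [0], []]))
```

The residual form is a deliberate choice in this sketch: for a node with no neighbors the update reduces to the identity, which is one way a message-passing step could be inserted between layers of a pre-trained network without discarding the computation it already performs.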
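We challenge AI models to "demonstrate understanding" of the sophisticated multimodal humor of The New Yorker Caption Contest. Concretely, we develop three carefully circumscribed tasks that involve grasping potentially complex and unexpected relationships between image and caption, as well as complex and unexpected allusions to the wide variety of human experience; these are the hallmarks of a New Yorker-caliber cartoon. We investigate vision-and-language models that take the cartoon pixels and caption directly as input, as well as language-only models for which we circumvent image processing by providing textual descriptions of the image. Even with the rich, multifaceted annotations we provide for the cartoon images, we identify performance gaps between high-quality machine learning models (e.g., a fine-tuned, 175B-parameter language model) and humans. We publicly release our corpora, including annotations describing the image's locations/entities, what is unusual about the scene, and an explanation of the joke.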
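Compared with LTE networks, the vision of 5G lies in providing higher data rates, low latency (to enable near-real-time applications), significantly increased base station capacity, and near-perfect quality of service (QoS) for users. To provide such services, 5G systems will support various combinations of access technologies such as LTE, NR, NR-U, and Wi-Fi. Each radio access technology (RAT) provides a different type of access, and these should be optimally allocated and managed among users. Besides resource management, 5G systems will also support dual connectivity services. Network orchestration therefore becomes a more difficult problem for system managers than it was with legacy access technologies. In this paper, we propose a RAT allocation algorithm based on federated meta-learning (FML), which enables the RAN Intelligent Controller (RIC) to adapt more quickly to dynamically changing environments. We designed a simulation environment that contains LTE and 5G NR service technologies. In the simulation, our objective is to meet UE demands within the transmission deadline in order to provide higher QoS values. We compared the proposed algorithm with a single RL agent, the Reptile algorithm, and a rule-based heuristic method. Simulation results show that the proposed FML method achieves higher caching rates in the first deployment round, by 21% and 12% respectively. Moreover, among the compared methods, the proposed method adapts fastest to new tasks and environments.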
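Reptile, one of the baselines named above, repeatedly adapts a shared initialization to a task with a few gradient steps and then moves the initialization toward the adapted weights. The sketch below applies that update with the adapted weights averaged across several clients, a simple federated variant. The toy tasks (1-D least squares, y = a·x), learning rates, and step counts are hypothetical, and this is not the paper's FML algorithm.

```python
import numpy as np


def inner_sgd(theta, x, y, lr=0.1, steps=5):
    """Adapt the shared parameter to one client's task: least-squares fit of y ~ w * x."""
    w = theta
    for _ in range(steps):
        grad = 2 * np.mean((w * x - y) * x)  # derivative of mean squared error w.r.t. w
        w = w - lr * grad
    return w


def federated_reptile(client_tasks, theta=0.0, meta_lr=0.5, rounds=50):
    """Reptile-style meta-update using the average of the clients' adapted weights."""
    for _ in range(rounds):
        adapted = [inner_sgd(theta, x, y) for x, y in client_tasks]
        # Move the shared initialization part of the way toward the adapted weights.
        theta = theta + meta_lr * (np.mean(adapted) - theta)
    return theta
```

With three clients whose slopes are 1, 2, and 3, the shared initialization settles near 2, the midpoint of the client tasks, so each client can reach its own optimum in a few local steps from a common starting point.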
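Game theory has been an effective tool for controlling the spread of disease and for suggesting optimal policies at both the individual and regional levels. In this AMS Notices article, we focus on decision-making for COVID-19 interventions, aiming to provide mathematical models and efficient machine learning methods, to justify related policies implemented in the past, and to explain, from a game-theoretic perspective, how authorities' decisions affect their neighboring regions.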
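How one region's decision affects its neighbor can be illustrated with a two-player game. The sketch below enumerates the pure-strategy Nash equilibria of a bimatrix game; the payoffs (strategy 0 = strict intervention, 1 = lax) are hypothetical numbers with a prisoner's-dilemma structure, not values from the article.

```python
from itertools import product


def pure_nash_equilibria(payoff_a, payoff_b):
    """Return all (i, j) strategy pairs where neither player gains by deviating alone.

    payoff_a[i][j] and payoff_b[i][j] are the payoffs to regions A and B when
    A plays strategy i and B plays strategy j.
    """
    n_a, n_b = len(payoff_a), len(payoff_a[0])
    equilibria = []
    for i, j in product(range(n_a), range(n_b)):
        a_best = all(payoff_a[i][j] >= payoff_a[k][j] for k in range(n_a))
        b_best = all(payoff_b[i][j] >= payoff_b[i][k] for k in range(n_b))
        if a_best and b_best:
            equilibria.append((i, j))
    return equilibria


# Hypothetical payoffs: a lax region free-rides on its neighbor's strict measures.
region_a = [[3, 0], [4, 1]]
region_b = [[3, 4], [0, 1]]
print(pure_nash_equilibria(region_a, region_b))
```

With these payoffs the only equilibrium is (1, 1): both regions relax, even though mutual strict intervention (payoff 3 each) would leave both better off. This gap between individually rational and jointly optimal choices is the kind of cross-region effect a game-theoretic analysis makes explicit.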
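Existing datasets for training deep learning models for narrowband radio frequency (RF) signal classification lack the diversity of signal types and channel impairments needed to adequately assess model performance in the real world. We introduce the Sig53 dataset, which consists of 5 million synthetically generated samples from 53 different signal classes with expertly chosen impairments. We also introduce TorchSig, a signal processing machine learning toolkit that can be used to generate this dataset. TorchSig incorporates data-handling principles common to the vision domain, and it is intended as an open-source foundation for future signal machine learning research. Initial experiments on the Sig53 dataset are conducted using state-of-the-art (SoTA) convolutional neural networks (ConvNets) and Transformers. These experiments reveal that Transformers outperform ConvNets without requiring additional regularization or a ConvNet teacher, contrary to results in the vision domain. Additional experiments show that TorchSig's domain-specific data augmentations facilitate model training, ultimately benefiting model performance. Finally, TorchSig supports on-the-fly synthetic data creation at training time, enabling large-scale training sessions with virtually unlimited datasets.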
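On-the-fly generation can be pictured as an endless generator that draws a random class, synthesizes baseband symbols, and applies random impairments each time a sample is requested. The NumPy sketch below uses only two modulations (BPSK, QPSK) and two impairments (carrier phase offset, AWGN at a random SNR); it is a hypothetical illustration and does not use or reproduce TorchSig's API.

```python
import numpy as np

# Unit-average-power constellations for two hypothetical signal classes.
CONSTELLATIONS = {
    "bpsk": np.array([1, -1], dtype=complex),
    "qpsk": np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2),
}


def synthetic_stream(num_symbols=128, snr_db_range=(0.0, 20.0), seed=0):
    """Yield an endless sequence of (iq_samples, label) pairs, generated on demand."""
    rng = np.random.default_rng(seed)
    labels = sorted(CONSTELLATIONS)
    while True:
        label = labels[rng.integers(len(labels))]
        points = CONSTELLATIONS[label]
        symbols = points[rng.integers(len(points), size=num_symbols)]
        # Impairment 1: random carrier phase offset applied to the whole burst.
        symbols = symbols * np.exp(1j * rng.uniform(0.0, 2 * np.pi))
        # Impairment 2: complex AWGN at a random SNR (signal power is 1).
        snr_db = rng.uniform(*snr_db_range)
        noise_power = 10 ** (-snr_db / 10)
        noise = np.sqrt(noise_power / 2) * (
            rng.standard_normal(num_symbols) + 1j * rng.standard_normal(num_symbols)
        )
        yield symbols + noise, label
```

Because every sample is drawn fresh, a training loop can pull as many examples as it needs without storing a dataset; a real pipeline would add many more classes and impairments.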