Cost-sensitive classification is critical in applications where the costs of misclassification errors differ widely. However, over-parameterization poses a fundamental challenge for cost-sensitive modeling with deep neural networks (DNNs). The ability of a DNN to fully interpolate its training data can render training-set evaluation uninformative: it fails to distinguish a cost-sensitive solution from one that simply maximizes overall accuracy. This calls for rethinking cost-sensitive classification in DNNs. To address this challenge, this paper proposes a cost-sensitive adversarial data augmentation (CSADA) framework that makes over-parameterized models cost-sensitive. The overall idea is to generate targeted adversarial examples that push the decision boundary in cost-aware directions. These targeted adversarial samples are generated by maximizing the probability of critical misclassifications and are used to train a model that is more conservative in its critical pairwise decisions. Experiments on well-known public benchmark datasets and a pharmacy medication image (PMI) dataset show that our method effectively minimizes overall cost and reduces critical errors while achieving comparable performance in terms of overall accuracy.
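The targeted adversarial augmentation described in the abstract can be illustrated with a targeted FGSM-style step that pushes an input toward a chosen "critical" class. This is only a minimal sketch under a linear softmax model; the function names, step size, and toy data are assumptions, not the paper's exact procedure.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def targeted_adversarial_step(x, W, b, target, eps):
    """One targeted FGSM-style step that increases the probability of the
    `target` class under a linear softmax model (illustrative only)."""
    p = softmax(W @ x + b)
    # For logits z = W x + b, the gradient of log p[target] w.r.t. x is
    # W[target] - sum_k p[k] * W[k].
    grad = W[target] - p @ W
    return x + eps * np.sign(grad)

# Toy two-class linear model; x starts confidently in class 0.
W = np.eye(2)
b = np.zeros(2)
x = np.array([1.0, -1.0])
x_adv = targeted_adversarial_step(x, W, b, target=1, eps=0.5)
p_before = softmax(W @ x + b)[1]      # probability of the "critical" class 1
p_after = softmax(W @ x_adv + b)[1]   # increases after the targeted step
```

In CSADA these examples would then be added to training so the model becomes more conservative about the costly class pair; here the step merely demonstrates the boundary-pushing direction.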
Inspired by foundational studies in classical and quantum physics, and by information retrieval studies in quantum information theory, we have recently proved that the notions of 'energy' and 'entropy' can be consistently introduced in human language and, more generally, in human culture. More explicitly, if energy is attributed to words according to their frequency of appearance in a text, then the ensuing energy levels are distributed non-classically, namely, they obey Bose-Einstein, rather than Maxwell-Boltzmann, statistics, as a consequence of the genuinely 'quantum indistinguishability' of the words that appear in the text. Secondly, the 'quantum entanglement' due to the way meaning is carried by a text reduces the (von Neumann) entropy of the words that appear in the text, a behaviour which cannot be explained within classical (thermodynamic or information-theoretic) notions of entropy. We claim here that this 'quantum-type behaviour is valid in general in human cognition', namely, any text is conceptually more concrete than the words composing it, which entails that the entropy of the overall text decreases. This result extends to human culture, whose collaborative entities likewise have lower entropy than their constituent elements. We use these findings to propose the development of a new 'non-classical thermodynamic theory for human cognition and human culture', which bridges concepts and quantum entities and agrees with some recent findings on the conceptual, not physical, nature of quantum entities.
The distributed representation of symbols is one of the key technologies in machine learning systems today, playing a pivotal role in modern natural language processing. Traditional word embeddings associate a separate vector with each word. While this approach is simple and leads to good performance, it requires a lot of memory for representing a large vocabulary. To reduce the memory footprint, the default embedding layer in spaCy is a hash embeddings layer. It is a stochastic approximation of traditional embeddings that provides unique vectors for a large number of words without explicitly storing a separate vector for each of them. To be able to compute meaningful representations for both known and unknown words, hash embeddings represent each word as a summary of the normalized word form, subword information and word shape. Together, these features produce a multi-embedding of a word. In this technical report we first lay out a bit of history and introduce the embedding methods in spaCy in detail. Second, we critically evaluate the hash embedding architecture with multi-embeddings on Named Entity Recognition datasets from a variety of domains and languages. The experiments validate most key design choices behind spaCy's embedders, but we also uncover a few surprising results.
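The hashing idea described above, mapping arbitrary strings into a fixed table and summing several surface-feature embeddings into a multi-embedding, can be sketched as follows. This is a simplification, not spaCy's actual `MultiHashEmbed` implementation: the feature definitions, table sizes, pooling by summation, and use of Python's built-in `hash` are all assumptions for illustration.

```python
import numpy as np

def hashed_vector(key, table):
    """Map an arbitrary string into a fixed table via hashing; no vocabulary
    is stored, so unseen strings still receive a (shared) vector."""
    idx = hash(key) % table.shape[0]  # built-in hash stands in for spaCy's hasher
    return table[idx]

def multi_embed(word, tables):
    """Sum hashed embeddings of several surface features, in the spirit of
    spaCy's multi-embeddings (feature set simplified for illustration)."""
    feats = {
        "norm": word.lower(),
        "prefix": word[:1],
        "suffix": word[-3:],
        "shape": "".join(
            "X" if c.isupper() else "x" if c.isalpha() else
            "d" if c.isdigit() else c
            for c in word
        ),
    }
    return sum(hashed_vector(f"{name}|{val}", tables[name])
               for name, val in feats.items())

rng = np.random.default_rng(0)
tables = {name: rng.normal(size=(1000, 8))
          for name in ("norm", "prefix", "suffix", "shape")}
v_known = multi_embed("Apple", tables)
v_unknown = multi_embed("Zxqv", tables)  # unseen words still get a vector
```

Collisions in the small table are the price of the fixed memory footprint; spaCy mitigates this with multiple hash seeds per feature, which this sketch omits.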
Quantifying the deviation of a probability distribution from a target is challenging when the target is defined by a density with an intractable normalizing constant. The kernel Stein discrepancy (KSD) was proposed to address this problem and has been applied to various tasks including diagnosing approximate MCMC samplers and goodness-of-fit testing for unnormalized statistical models. This article investigates a convergence control property of the diffusion kernel Stein discrepancy (DKSD), an instance of the KSD proposed by Barp et al. (2019). We extend the result of Gorham and Mackey (2017), which showed that the KSD controls the bounded-Lipschitz metric, to functions of polynomial growth. Specifically, we prove that the DKSD controls the integral probability metric defined by a class of pseudo-Lipschitz functions, a polynomial generalization of Lipschitz functions. We also provide practical sufficient conditions on the reproducing kernel for the stated property to hold. In particular, we show that the DKSD detects non-convergence in moments with an appropriate kernel.
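As a reference point for the convergence-control discussion above: the KSD in its common Langevin form compares a candidate distribution $Q$ to the target density $p$ using only the score $\nabla \log p$, so the intractable normalizing constant cancels. A standard way to write it (the DKSD of Barp et al. additionally inserts a diffusion matrix into the operator) is:

```latex
% Langevin KSD over the unit ball of a vector-valued RKHS \mathcal{H}_k^d;
% the normalizing constant of p drops out because only \nabla \log p appears.
\mathrm{KSD}(Q)
  \;=\; \sup_{\|f\|_{\mathcal{H}_k^d} \le 1}
        \mathbb{E}_{X \sim Q}\!\left[
            \nabla \log p(X)^{\top} f(X) \;+\; \nabla \cdot f(X)
        \right]
```

The article's result replaces the bounded-Lipschitz test class of Gorham and Mackey with pseudo-Lipschitz functions of polynomial growth, so that a vanishing DKSD also forces convergence of moments.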
Maximum mean discrepancies (MMDs), such as the kernel Stein discrepancy (KSD), have become central to a wide range of applications, including hypothesis testing, sampler selection, distribution approximation, and variational inference. In each setting, these kernel-based discrepancy measures are required to (i) separate a target P from other probability measures or even (ii) control weak convergence to P. In this article we derive new sufficient and necessary conditions that ensure (i) and (ii). For MMDs on separable metric spaces, we characterize the kernels that separate Bochner-embeddable measures, and we introduce simple conditions for separating all measures with unbounded kernels and for controlling convergence with bounded kernels. We use these results on $\mathbb{R}^d$ to substantially broaden the known conditions for KSD separation and convergence control, and to develop the first KSDs known to exactly metrize weak convergence to P. Along the way, we highlight the implications of our results for hypothesis testing, measuring and improving sample quality, and sampling with Stein variational gradient descent.
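Properties (i) and (ii) above are statements about the MMD in its two standard dual forms: as an integral probability metric over the unit ball of the RKHS $\mathcal{H}_k$, and as a distance between Bochner mean embeddings:

```latex
% MMD as an IPM and, equivalently, as a distance between mean embeddings.
\mathrm{MMD}_k(P, Q)
  \;=\; \sup_{\|f\|_{\mathcal{H}_k} \le 1}
        \left| \mathbb{E}_{X \sim P} f(X) - \mathbb{E}_{Y \sim Q} f(Y) \right|
  \;=\; \left\| \mu_P - \mu_Q \right\|_{\mathcal{H}_k},
\qquad
\mu_P \;:=\; \mathbb{E}_{X \sim P}\, k(X, \cdot)
```

In this notation, separation (i) says $\mathrm{MMD}_k(P,Q) = 0$ only when $Q = P$, i.e. the embedding $P \mapsto \mu_P$ is injective on the measures considered, while convergence control (ii) asks that $\mathrm{MMD}_k(Q_n, P) \to 0$ imply $Q_n \Rightarrow P$.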
Subseasonal forecasting, the prediction of temperature and precipitation 2 to 6 weeks ahead, is critical for effective water allocation, wildfire management, and drought and flood mitigation. Recent international research efforts have improved the subseasonal capabilities of operational dynamical models, yet temperature and precipitation prediction skill remains poor, partly due to stubborn errors in the representation of atmospheric dynamics and physics within dynamical models. To counter these errors, we introduce an adaptive bias correction (ABC) method that combines state-of-the-art dynamical forecasts with observations using machine learning. When applied to the leading subseasonal model from the European Centre for Medium-Range Weather Forecasts (ECMWF), ABC improves temperature forecasting skill by 60-90% and precipitation forecasting skill by 40-69% over the contiguous United States. We couple these performance improvements with a practical workflow, based on Cohort Shapley, for explaining ABC skill gains and identifying high-skill windows of opportunity under specific climate conditions.
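The generic idea behind bias correction, learning a systematic forecast error from historical forecast-observation pairs and subtracting it from new forecasts, can be sketched in a few lines. This toy example uses a constant additive bias and synthetic data; the paper's ABC method is far richer (adaptive, ML-based, combining several learned models), so every name and number below is an illustrative assumption.

```python
import numpy as np

def fit_bias_correction(forecasts, observations):
    """Estimate a constant additive bias from historical pairs."""
    return float(np.mean(observations - forecasts))

def correct(forecast, bias):
    """Apply the learned additive correction to a raw forecast."""
    return forecast + bias

rng = np.random.default_rng(1)
truth = rng.normal(20.0, 3.0, size=200)            # pseudo temperature observations
raw = truth - 1.5 + rng.normal(0, 0.5, size=200)   # model runs ~1.5 degrees cold
bias = fit_bias_correction(raw, truth)             # recovers roughly +1.5
corrected = correct(raw, bias)
rmse_raw = np.sqrt(np.mean((raw - truth) ** 2))
rmse_cor = np.sqrt(np.mean((corrected - truth) ** 2))  # smaller than rmse_raw
```

Even this crude correction removes the systematic offset; the reported 40-90% skill gains come from replacing the constant bias with adaptive, condition-dependent machine-learned corrections.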
The relationship between brains and computers is often taken to be merely metaphorical. However, genuine computational systems can be implemented in virtually any medium; thus, one can take seriously the view that brains literally compute. But without empirical criteria for what makes a physical system a genuine computational system, computation remains a matter of perspective, especially for natural systems (e.g., brains) that were not explicitly designed and engineered as computers. Considerations from actual examples of physical computers, analog and digital, contemporary and historical, make these empirical criteria clear. Finally, applying these criteria to the brain shows how we can view the brain as a computer (perhaps an analog one), which in turn illuminates how that claim is both informative and falsifiable.
Parameter-efficient methods are able to use a single frozen pre-trained large language model (LLM) to perform many tasks by learning task-specific soft prompts that modulate model behavior when concatenated to the input text. However, these learned prompts are tightly coupled to a given frozen model: if the model is updated, corresponding new prompts must be obtained. In this work, we propose and investigate several approaches to "prompt recycling," where a prompt trained on a source model is transformed to work with a new target model. Our methods do not rely on supervised pairs of prompts, task-specific data, or training updates with the target model, which would be as costly as re-tuning prompts for the target model from scratch. We show that recycling between models is possible (our best settings successfully recycle $88.9\%$ of prompts, producing a recycled prompt that outperforms a baseline), but substantial performance headroom remains, calling for improved recycling techniques.
Training reinforcement learning agents that continually learn across multiple environments is a challenging problem. It is made more difficult by the lack of reproducible experiments and standard metrics for comparing different continual learning approaches. To address this, we present Tella, a tool for testing and evaluating lifelong learning agents. Tella supplies specified, reproducible curricula to lifelong learning agents while logging detailed data for evaluation and standardized analysis. Researchers can define and share their own curricula over various learning environments, or run against curricula created under the DARPA Lifelong Learning Machines (L2M) program.
The machine learning and clinical research communities utilize real-world data (RWD), including data captured in electronic health records (EHRs), in markedly different ways. While clinical researchers use RWD cautiously for clinical investigations, ML-for-healthcare teams consume public datasets with minimal scrutiny to develop new algorithms. This study bridges this gap by developing and validating ML-DQA, a data quality assurance framework grounded in RWD best practices. The ML-DQA framework was applied to five ML projects across two geographies, each addressing different medical conditions and different populations. Across these five projects, RWD on a total of 247,536 patients was collected, and a total of 2,999 quality checks and 24 quality reports were generated. Five generalizable practices emerged: all projects used a similar method to group redundant data-element representations; all projects used automated utilities to build diagnosis and medication data elements; all projects used a common rule-based transformation library; all projects used a unified approach to assign data quality checks to data elements; and all projects used a similar approach to clinical adjudication. An average of 5.8 people per project, including clinicians, data scientists, and trainees, took part in implementing ML-DQA, and each project involved an average of 23.4 data elements. This study demonstrates the important role that ML-DQA plays in healthcare projects and provides teams with a framework to carry out these essential activities.