Combining large-scale pre-trained models with prototypical neural networks is a de facto paradigm in few-shot named entity recognition. Existing methods, unfortunately, overlook the fact that embeddings from pre-trained models carry a large amount of information about word frequencies, which biases prototypical neural networks against learning word entities. This discrepancy constrains the synergy between the two models. We therefore propose a one-line-code normalization method that reconciles the mismatch, and we support it on both empirical and theoretical grounds. Our experiments on nine benchmark datasets show that our method outperforms the counterpart models and is comparable to state-of-the-art methods. Beyond the model enhancement, our work also provides an analytical viewpoint for addressing general problems in few-shot named entity recognition and in other tasks that rely on pre-trained models or prototypical neural networks.
Translated by Google Translate
Metric-based meta-learning is one of the de facto standards in few-shot learning. It consists of representation learning and metric calculation designs. Previous works construct class representations in different ways, ranging from mean output embeddings to covariances and distributions. However, using embeddings in space lacks expressivity and cannot capture class information robustly, while complex statistical modeling makes metric design difficult. In this work, we use tensor fields ("areas") to model classes from a geometrical perspective for few-shot learning. We present a simple and effective method, dubbed hypersphere prototypes (HyperProto), where class information is represented by hyperspheres of dynamic size with two sets of learnable parameters: the hypersphere's center and its radius. Extending from points to areas, hyperspheres are much more expressive than embeddings. Moreover, metric-based classification is more convenient with hypersphere prototypes than with statistical modeling, as we only need to calculate the distance from a data point to the surface of the hypersphere. Following this idea, we also develop two variants of prototypes under other measurements. Extensive experiments and analysis on few-shot learning tasks across NLP and CV, together with comparisons against more than 20 competitive baselines, demonstrate the effectiveness of our approach.
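As a rough illustration of the distance-to-surface idea in the abstract above, the following sketch scores a query point against hypersphere prototypes. It is not the authors' implementation: the centers and radii are hand-set rather than learned, the embedding network is omitted, and the names `distance_to_surface` and `classify` are hypothetical.

```python
import math

def distance_to_surface(point, center, radius):
    """Signed distance from a point to a hypersphere's surface.

    Negative inside the sphere, zero on the surface, positive outside.
    """
    return math.dist(point, center) - radius

def classify(point, prototypes):
    """Pick the class whose hypersphere surface is nearest (signed distance).

    prototypes: dict mapping label -> (center, radius). In the paper both
    are learnable; here they are fixed values chosen for illustration.
    """
    return min(
        prototypes,
        key=lambda label: distance_to_surface(point, *prototypes[label]),
    )

# Two hand-set prototypes: a small sphere "A" and a larger sphere "B".
prototypes = {"A": ([0.0, 0.0], 1.0), "B": ([5.0, 0.0], 3.0)}

# The query is equally far from both centers, so the radius decides.
query = [2.5, 0.0]
prediction = classify(query, prototypes)
```

In this toy setup a point prototype (center only) could not separate the two classes for this query, whereas the radius lets the larger class claim it.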
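Few-shot named entity recognition (NER) enables us to build an NER system for a new domain using very few labeled examples. However, existing prototypical networks for this task suffer from roughly estimated label dependencies and closely distributed prototypes, and thus often cause misclassifications. To address these issues, we propose EP-Net, an entity-level prototypical network enhanced by dispersedly distributed prototypes. EP-Net builds entity-level prototypes and treats text spans as candidate entities, so it no longer requires label dependencies. In addition, EP-Net trains the prototypes from scratch so that they are distributed dispersedly, and it uses a space projection to align spans with prototypes in the embedding space. Experimental results on two evaluation tasks and the Few-NERD settings show that EP-Net consistently outperforms previous strong models in overall performance. Extensive analyses further validate the effectiveness of EP-Net.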
It has been experimentally demonstrated that humans are able to learn in a manner that allows them to make predictions on categories for which they have not seen any examples (Malaviya et al., 2022). Sucholutsky and Schonlau (2020) have recently presented a machine learning approach that aims to do the same. They utilise synthetically generated data and demonstrate that it is possible to achieve sub-linear scaling and develop models that can learn to recognise N classes from M training samples where M is less than N - aka less-than-one shot learning. Their method was, however, defined for univariate or simple multivariate data (Sucholutsky et al., 2021). We extend it to work on large, high-dimensional and real-world datasets and empirically validate it in this new and challenging setting. We apply this method to learn previously unseen NLP tasks from very few examples (4, 8 or 16). We first generate compact, sophisticated less-than-one shot representations called soft-label prototypes which are fitted on training data, capturing the distribution of different classes across the input domain space. We then use a modified k-Nearest Neighbours classifier to demonstrate that soft-label prototypes can classify data competitively, even outperforming much more computationally complex few-shot learning methods.
Few-shot relation extraction (FSRE) aims at recognizing unseen relations by learning from merely a handful of annotated instances. To generalize to new relations more effectively, this paper proposes a novel pipeline for the FSRE task based on queRy-information guided Attention and adaptive Prototype fuSion, namely RAPS. Specifically, RAPS first derives the relation prototype with the query-information guided attention module, which exploits rich interactive information between the support instances and the query instances in order to obtain more accurate initial prototype representations. RAPS then combines the derived initial prototype with the relation information through the adaptive prototype fusion mechanism to obtain the integrated prototype used for both training and prediction. Experiments on the benchmark dataset FewRel 1.0 show a significant improvement of our method over state-of-the-art methods.
Few-shot named entity recognition (NER) targets generalizing to unseen labels and/or domains with few labeled examples. Existing metric learning methods compute token-level similarities between query and support sets, but are not able to fully incorporate label semantics into modeling. To address this issue, we propose a simple method to largely improve metric learning for NER: 1) multiple prompt schemas are designed to enhance label semantics; 2) we propose a novel architecture to effectively combine multiple prompt-based representations. Empirically, our method achieves new state-of-the-art (SOTA) results under 16 of the 18 considered settings, substantially outperforming the previous SOTA by an average of 8.84% and a maximum of 34.51% in relative gains of micro F1. Our code is available at https://github.com/AChen-qaq/ProML.
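We propose a zero-shot learning relation classification (ZSLRC) framework that improves on the state of the art through its ability to recognize novel relations that are not present in the training data. The zero-shot learning approach mimics the way humans learn and recognize new concepts without prior knowledge. To achieve this, ZSLRC uses advanced prototypical networks that are modified to exploit weighted side (auxiliary) information. ZSLRC's side information is built from keywords, hypernyms of name entities, and labels and their synonyms. ZSLRC also includes an automatic hypernym extraction framework that acquires hypernyms of various name entities directly from the web. ZSLRC improves on state-of-the-art few-shot learning relation classification methods, which rely on labeled training data, and is therefore applicable even in real-world scenarios where some relations have no corresponding labeled training examples. We report results from extensive experiments on two public datasets (NYT and FewRel) and show that ZSLRC significantly outperforms state-of-the-art methods on supervised learning, few-shot learning, and zero-shot learning tasks. Our experimental results also demonstrate the effectiveness and robustness of the proposed model.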
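Few-shot learning (FSL) methods typically assume clean support sets with accurately labeled samples when training on novel classes. This assumption can often be unrealistic: support sets, no matter how small, may still include mislabeled samples. Robustness to label noise is therefore essential for FSL methods to be practical, yet this problem surprisingly remains largely unexplored. To address mislabeled samples in FSL settings, we make several technical contributions. (1) We offer simple yet effective feature aggregation methods that improve the prototypes used by ProtoNet, a popular FSL technique. (2) We describe a novel Transformer model for Noisy Few-Shot Learning (TraNFS). TraNFS leverages a transformer's attention mechanism to weigh mislabeled against correct samples. (3) Finally, we extensively test these methods on noisy versions of MiniImageNet and TieredImageNet. Our results show that TraNFS is on par with leading FSL methods on clean support sets, yet outperforms them by far in the presence of label noise.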
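Few-shot image classification is a challenging problem that aims to achieve human-level recognition based on only a small number of training images. One main solution for few-shot image classification is deep metric learning. By classifying unseen samples according to their distances to the few seen samples in an embedding space learned by powerful deep neural networks, these methods can avoid overfitting to the few training images and have achieved state-of-the-art performance. In this paper, we provide an up-to-date review of deep metric learning methods for few-shot image classification from 2018 to 2022 and categorize them into three groups according to the three stages of metric learning, namely learning feature embeddings, learning class representations, and learning distance measures. With this taxonomy, we identify the novelties of the different methods and the problems they face. We conclude the review with a discussion of current challenges and future trends in few-shot image classification.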
We propose prototypical networks for the problem of few-shot classification, where a classifier must generalize to new classes not seen in the training set, given only a small number of examples of each new class. Prototypical networks learn a metric space in which classification can be performed by computing distances to prototype representations of each class. Compared to recent approaches for few-shot learning, they reflect a simpler inductive bias that is beneficial in this limited-data regime, and achieve excellent results. We provide an analysis showing that some simple design decisions can yield substantial improvements over recent approaches involving complicated architectural choices and meta-learning. We further extend prototypical networks to zero-shot learning and achieve state-of-the-art results on the CU-Birds dataset.
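The classification rule described in the abstract above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's code: embeddings are taken as precomputed (the learned embedding network is omitted), each prototype is the mean of its class's support embeddings, squared Euclidean distance is used as the metric, and the function names are hypothetical.

```python
import math

def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def squared_euclidean(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_prototypes(support):
    """support: dict mapping label -> list of support embeddings."""
    return {label: mean_vector(vecs) for label, vecs in support.items()}

def class_probabilities(query, prototypes):
    """Softmax over negative squared distances to each prototype."""
    logits = {
        label: -squared_euclidean(query, proto)
        for label, proto in prototypes.items()
    }
    top = max(logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(v - top) for label, v in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}

# Toy 2-way 2-shot episode with made-up 2-D embeddings.
support = {
    "cat": [[0.0, 0.0], [0.0, 2.0]],
    "dog": [[4.0, 4.0], [6.0, 4.0]],
}
prototypes = build_prototypes(support)
probs = class_probabilities([1.0, 1.0], prototypes)
prediction = max(probs, key=probs.get)
```

During training, the negative log of the true class's probability would serve as the episode loss; that step is left out here to keep the sketch short.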
Despite significant progress in object categorization in recent years, a number of important challenges remain; mainly, the ability to learn from limited labeled data and to recognize object classes within a large, potentially open, set of labels. Zero-shot learning is one way of addressing these challenges, but it has only been shown to work with class vocabularies of limited size and typically requires a separation between supervised and unsupervised classes, allowing the former to inform the latter but not vice versa. We propose the notion of vocabulary-informed learning to alleviate the above-mentioned challenges and address the problems of supervised, zero-shot, generalized zero-shot and open set recognition using a unified framework. Specifically, we propose a weighted maximum margin framework for semantic manifold-based recognition that incorporates distance constraints from (both supervised and unsupervised) vocabulary atoms. Distance constraints ensure that labeled samples are projected closer to their correct prototypes, in the embedding space, than to others. We show that the resulting model improves supervised, zero-shot, generalized zero-shot, and large open set recognition, with a class vocabulary of up to 310K on the Animals with Attributes and ImageNet datasets.
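Nowadays, transformer-based models are gradually becoming the default choice for artificial intelligence pioneers. These models show superiority even in few-shot scenarios. In this paper, we revisit classical methods and propose a new few-shot alternative. Specifically, we investigate the few-shot one-class problem, which takes a known sample as a reference to detect whether an unknown instance belongs to the same class. This problem can be studied from the perspective of sequence matching. We show that, with meta-learning, the classical sequence matching method, i.e. Compare-Aggregate, significantly outperforms transformer-based ones. The classical approach also requires much less training cost. Furthermore, we perform an empirical comparison between the two kinds of sequence matching approaches under simple fine-tuning and meta-learning. Meta-learning causes the features of transformer models to have highly correlated dimensions. The reason is closely related to the number of layers and heads in the transformer models. Experimental code and data are available at https://github.com/hmt2014/fewone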
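Relation extraction (RE) refers to extracting relation triples from input text. Existing neural-network-based systems for RE rely heavily on manually labeled training data, but there are still many domains where sufficient labeled data does not exist. Inspired by distance-based few-shot named entity recognition methods, we put forward a definition of the few-shot RE task based on sequence-tagging joint extraction approaches, and propose a few-shot RE framework for the task. In addition, we apply two actual sequence tagging models to our framework (called Few-shot TPLinker and Few-shot BiTT) and achieve solid results on two few-shot RE tasks constructed from a public dataset.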
The distributed representation of symbols is one of the key technologies in machine learning systems today, playing a pivotal role in modern natural language processing. Traditional word embeddings associate a separate vector with each word. While this approach is simple and leads to good performance, it requires a lot of memory for representing a large vocabulary. To reduce the memory footprint, the default embedding layer in spaCy is a hash embeddings layer. It is a stochastic approximation of traditional embeddings that provides unique vectors for a large number of words without explicitly storing a separate vector for each of them. To be able to compute meaningful representations for both known and unknown words, hash embeddings represent each word as a summary of the normalized word form, subword information and word shape. Together, these features produce a multi-embedding of a word. In this technical report we lay out a bit of history and introduce the embedding methods in spaCy in detail. Second, we critically evaluate the hash embedding architecture with multi-embeddings on Named Entity Recognition datasets from a variety of domains and languages. The experiments validate most key design choices behind spaCy's embedders, but we also uncover a few surprising results.
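To make the idea of a hash embedding concrete, here is a small sketch of one common construction: each key is hashed several times into a fixed-size table and the selected rows are summed, so unseen words still receive a vector. This is an illustration only and does not reproduce spaCy's actual implementation; the table size, width, number of hashes, the use of SHA-256, the single shared table, and the simplified feature extraction in `word_features` are all assumptions made for the example.

```python
import hashlib
import random

NUM_ROWS = 50    # hypothetical table size, far smaller than a vocabulary
WIDTH = 4        # hypothetical vector width per feature
NUM_HASHES = 3   # hypothetical number of hash functions per key

# A randomly initialised table stands in for learned parameters.
_rng = random.Random(0)
TABLE = [[_rng.uniform(-1.0, 1.0) for _ in range(WIDTH)] for _ in range(NUM_ROWS)]

def row_indices(key):
    """Map a string key to NUM_HASHES table rows using seeded hashes."""
    indices = []
    for seed in range(NUM_HASHES):
        digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
        indices.append(int.from_bytes(digest[:8], "big") % NUM_ROWS)
    return indices

def hash_embed(key):
    """Sum the rows selected by the key's hashes."""
    rows = [TABLE[i] for i in row_indices(key)]
    return [sum(column) for column in zip(*rows)]

def word_features(word):
    """Simplified stand-ins for normalized form, subword info and shape."""
    shape = "".join(
        "X" if c.isupper() else "x" if c.islower() else "d" if c.isdigit() else c
        for c in word
    )
    return {"norm": word.lower(), "prefix": word[:1], "suffix": word[-3:], "shape": shape}

def multi_embed(word):
    """Concatenate one hash embedding per feature into a multi-embedding."""
    vector = []
    for name, value in word_features(word).items():
        # Prefixing the feature name keeps features apart in the shared table.
        vector.extend(hash_embed(f"{name}={value}"))
    return vector

vector = multi_embed("Apple")
```

Because several hashes are combined, two words collide completely only if all of their rows coincide, which is why a small table can still give most words distinct vectors.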
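In this paper, we consider the framework of multi-task representation (MTR) learning, where the goal is to use source tasks to learn a representation that reduces the sample complexity of solving a target task. We start by reviewing recent advances in MTR theory and show that they can provide novel insights into popular meta-learning algorithms when analyzed within this framework. In particular, we highlight a fundamental difference between gradient-based and metric-based algorithms in practice and put forward a theoretical analysis to explain it. Finally, we use the derived insights to improve the performance of meta-learning methods via a new spectral-based regularization term, and confirm its efficiency through experimental studies on few-shot classification benchmarks. To the best of our knowledge, this is the first contribution that puts the most recent learning bounds of MTR theory into practice for the task of few-shot classification.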
Partial label learning (PLL) is an important problem that allows each training example to be labeled with a coarse candidate set, which well suits many real-world data annotation scenarios with label ambiguity. Despite the promise, the performance of PLL often lags behind the supervised counterpart. In this work, we bridge the gap by addressing two key research challenges in PLL -- representation learning and label disambiguation -- in one coherent framework. Specifically, our proposed framework PiCO consists of a contrastive learning module along with a novel class prototype-based label disambiguation algorithm. PiCO produces closely aligned representations for examples from the same classes and facilitates label disambiguation. Theoretically, we show that these two components are mutually beneficial, and can be rigorously justified from an expectation-maximization (EM) algorithm perspective. Moreover, we study a challenging yet practical noisy partial label learning setup, where the ground-truth may not be included in the candidate set. To remedy this problem, we present an extension PiCO+ that performs distance-based clean sample selection and learns robust classifiers by a semi-supervised contrastive learning algorithm. Extensive experiments demonstrate that our proposed methods significantly outperform the current state-of-the-art approaches in standard and noisy PLL tasks and even achieve comparable results to fully supervised learning.
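Model hubs with many pre-trained models (PTMs) have become a cornerstone of deep learning. Although built at high cost, they remain under-exploited: practitioners usually pick one PTM from the provided model hub by popularity and then fine-tune it to solve the target task. This naive but common practice poses two obstacles to fully exploiting pre-trained model hubs: (1) selecting a PTM by popularity comes with no optimality guarantee; (2) only one PTM is used while the remaining PTMs are ignored. Ideally, to exploit pre-trained model hubs to the fullest, one would need to try all combinations of PTMs and extensively fine-tune each combination, which incurs exponentially many combinations and an unaffordable computational budget. In this paper, we propose a new paradigm of exploiting model hubs by ranking and tuning pre-trained models: (1) Our conference paper (You et al., 2021) proposed LogME to estimate the maximum value of label evidence given the features extracted by pre-trained models, which can rank all the PTMs in a model hub, for various types of PTMs and tasks, before fine-tuning. (2) The best-ranked PTM can be fine-tuned and deployed if we have no preference regarding the model's architecture, or the target PTM can be tuned by the top-K ranked PTMs via the proposed B-Tuning algorithm. The ranking part is based on the conference paper, and in this paper we complete its theoretical analysis, including a convergence proof for the heuristic evidence maximization procedure and the influence of feature dimension. The tuning part introduces a novel Bayesian Tuning (B-Tuning) method for tuning multiple PTMs, which surpasses dedicated methods designed for tuning homogeneous PTMs and sets a new state of the art for tuning heterogeneous PTMs. The new paradigm of exploiting PTM hubs may be of interest to a large audience across the machine learning community.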
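We present FREDo, a few-shot document-level relation extraction (FSDLRE) benchmark. In contrast to existing benchmarks, which are built on sentence-level relation extraction corpora, we argue that document-level corpora provide more realism, particularly regarding none-of-the-above (NOTA) distributions. We therefore propose a set of FSDLRE tasks and construct a benchmark based on two existing supervised learning datasets, DocRED and sciERC. We adapt the state-of-the-art sentence-level method MNAV to the document level and develop it further for improved domain adaptation. We find FSDLRE to be a challenging setting with interesting new characteristics, such as the ability to sample NOTA instances from the support set. The data, code, and trained models are available online (https://github.com/nicpopovic/fredo).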
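Few-shot open-set recognition aims to classify both seen and novel images given only limited training data for the seen classes. The challenge of this task is that the model must not only learn a discriminative classifier for the pre-defined classes from few training examples, but also reject inputs from unseen classes that never appear at training time. In this paper, we propose to solve the problem from two novel aspects. First, instead of learning decision boundaries between seen classes, as is done in standard closed-set classification, we reserve space for unseen classes, so that images located in these regions are recognized as belonging to unseen classes. Second, to learn such decision boundaries effectively, we propose to exploit the background features of seen classes. Since these background regions do not contribute significantly to closed-set classification decisions, it is natural to use them as pseudo unseen classes for classifier learning. Our extensive experiments show that the proposed method not only outperforms multiple baselines but also sets new state-of-the-art results on three popular benchmarks, namely tieredImageNet, miniImageNet, and Caltech-UCSD Birds-200-2011 (CUB).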
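Named entity recognition (NER) aims to identify mentions of named entities in unstructured text and classify them into predefined named entity classes. Although pre-trained language models based on deep learning achieve good predictive performance, many domain-specific NER tasks still require a substantial amount of labeled data. Active learning (AL), a general framework for the label acquisition problem, has been used in NER tasks to minimize annotation cost without sacrificing model performance. However, the heavily imbalanced class distribution of tokens makes it challenging to design effective AL querying methods for NER. We propose AL sentence query evaluation functions that pay more attention to potential positive tokens, and we evaluate these functions with both sentence-based and token-based cost evaluation strategies. We also propose a better data-driven normalization approach to penalize sentences that are too long or too short. Our experiments on three datasets from different domains show that the proposed approach reduces the number of annotated tokens while achieving better or comparable predictive performance relative to conventional methods.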