细颗粒实体打字(FET)旨在推断本文中提及的特定语义类型。 FET的现代方法主要集中于学习某种类型的外观。很少有作品直接建模类型差异,也就是说,让模型知道一种类型与其他类型不同的程度。为了减轻这个问题,我们提出了一种富含类型的FET的分层对比策略。我们的方法可以直接建模层次类型之间的差异,并提高区分多元类似类型的能力。一方面,我们将类型嵌入到实体上下文中,以使类型的信息直接感知。另一方面,我们在层次结构上设计了一个约束的对比策略,以直接建模类型差异,这可以同时感知不同粒度下类型之间的区分性。 BBN,Ontonotes和Figer的三个基准测试的实验结果表明,我们的方法通过有效建模类型差异在FET上实现了显着性能。
translated by 谷歌翻译
Neural language representation models such as BERT pre-trained on large-scale corpora can well capture rich semantic patterns from plain text, and be fine-tuned to consistently improve the performance of various NLP tasks. However, the existing pre-trained language models rarely consider incorporating knowledge graphs (KGs), which can provide rich structured knowledge facts for better language understanding. We argue that informative entities in KGs can enhance language representation with external knowledge. In this paper, we utilize both large-scale textual corpora and KGs to train an enhanced language representation model (ERNIE), which can take full advantage of lexical, syntactic, and knowledge information simultaneously. The experimental results have demonstrated that ERNIE achieves significant improvements on various knowledge-driven tasks, and meanwhile is comparable with the state-of-the-art model BERT on other common NLP tasks. The source code and experiment details of this paper can be obtained from https:// github.com/thunlp/ERNIE.
translated by 谷歌翻译
关系提取(RE)是自然语言处理的基本任务。RE试图通过识别文本中的实体对之间的关系信息来将原始的,非结构化的文本转变为结构化知识。RE有许多用途,例如知识图完成,文本摘要,提问和搜索查询。RE方法的历史可以分为四个阶段:基于模式的RE,基于统计的RE,基于神经的RE和大型语言模型的RE。这项调查始于对RE的早期阶段的一些示例性作品的概述,突出了局限性和缺点,以使进度相关。接下来,我们回顾流行的基准测试,并严格检查用于评估RE性能的指标。然后,我们讨论遥远的监督,这是塑造现代RE方法发展的范式。最后,我们回顾了重点是降级和培训方法的最新工作。
translated by 谷歌翻译
我们研究了很少的细粒实体键入(FET)的问题,其中只有几个带注释的实体对每种实体类型提供了上下文。最近,基于及时的调整通过将实体类型分类任务作为“填补空白”的问题来表明在几次射击方案中表现出优越的性能。这允许有效利用预训练的语言模型(PLM)的强语建模能力。尽管当前基于及时的调整方法成功了,但仍有两个主要挑战:(1)提示中的口头化器要么是由外部知识基础手动设计或构建的,而无需考虑目标语料库和标签层次结构信息,而且(2)当前方法主要利用PLM的表示能力,但没有通过广泛的通用域预训练来探索其产生的功率。在这项工作中,我们为由两个模块组成的几个弹药fet提出了一个新颖的框架:(1)实体类型标签解释模块自动学习将类型标签与词汇联系起来,通过共同利用几个播放实例和标签层次结构和标签层次结构,以及(2)基于类型的上下文化实例生成器根据给定实例生成新实例,以扩大培训集以更好地概括。在三个基准数据集上,我们的模型优于大量利润的现有方法。可以在https://github.com/teapot123/fine-graining-entity-typing上找到代码。
translated by 谷歌翻译
与伯特(Bert)等语言模型相比,已证明知识增强语言表示的预培训模型在知识基础构建任务(即〜关系提取)中更有效。这些知识增强的语言模型将知识纳入预训练中,以生成实体或关系的表示。但是,现有方法通常用单独的嵌入表示每个实体。结果,这些方法难以代表播出的实体和大量参数,在其基础代币模型之上(即〜变压器),必须使用,并且可以处理的实体数量为由于内存限制,实践限制。此外,现有模型仍然难以同时代表实体和关系。为了解决这些问题,我们提出了一个新的预培训模型,该模型分别从图书中学习实体和关系的表示形式,并分别在文本中跨越跨度。通过使用SPAN模块有效地编码跨度,我们的模型可以代表实体及其关系,但所需的参数比现有模型更少。我们通过从Wikipedia中提取的知识图对我们的模型进行了预训练,并在广泛的监督和无监督的信息提取任务上进行了测试。结果表明,我们的模型比基线学习对实体和关系的表现更好,而在监督的设置中,微调我们的模型始终优于罗伯塔,并在信息提取任务上取得了竞争成果。
translated by 谷歌翻译
Pre-trained Language Models (PLMs) have been applied in NLP tasks and achieve promising results. Nevertheless, the fine-tuning procedure needs labeled data of the target domain, making it difficult to learn in low-resource and non-trivial labeled scenarios. To address these challenges, we propose Prompt-based Text Entailment (PTE) for low-resource named entity recognition, which better leverages knowledge in the PLMs. We first reformulate named entity recognition as the text entailment task. The original sentence with entity type-specific prompts is fed into PLMs to get entailment scores for each candidate. The entity type with the top score is then selected as final label. Then, we inject tagging labels into prompts and treat words as basic units instead of n-gram spans to reduce time complexity in generating candidates by n-grams enumeration. Experimental results demonstrate that the proposed method PTE achieves competitive performance on the CoNLL03 dataset, and better than fine-tuned counterparts on the MIT Movie and Few-NERD dataset in low-resource settings.
translated by 谷歌翻译
我们为指定实体识别(NER)提出了一个有效的双重编码框架,该框架将对比度学习用于映射候选文本跨度,并将实体类型映射到同一矢量表示空间中。先前的工作主要将NER作为序列标记或跨度分类。相反,我们将NER视为一个度量学习问题,它最大程度地提高了实体提及的向量表示之间的相似性及其类型。这使得易于处理嵌套和平坦的ner,并且可以更好地利用嘈杂的自我诉讼信号。 NER对本双重编码器制定的主要挑战在于将非实体跨度与实体提及分开。我们没有明确标记所有非实体跨度为外部(O)与大多数先前方法相同的类别(O),而是引入了一种新型的动态阈值损失,这与标准的对比度损失一起学习。实验表明,我们的方法在受到监督和远处有监督的设置中的表现良好(例如,Genia,NCBI,BC5CDR,JNLPBA)。
translated by 谷歌翻译
Document-level relation extraction faces two overlooked challenges: long-tail problem and multi-label problem. Previous work focuses mainly on obtaining better contextual representations for entity pairs, hardly address the above challenges. In this paper, we analyze the co-occurrence correlation of relations, and introduce it into DocRE task for the first time. We argue that the correlations can not only transfer knowledge between data-rich relations and data-scarce ones to assist in the training of tailed relations, but also reflect semantic distance guiding the classifier to identify semantically close relations for multi-label entity pairs. Specifically, we use relation embedding as a medium, and propose two co-occurrence prediction sub-tasks from both coarse- and fine-grained perspectives to capture relation correlations. Finally, the learned correlation-aware embeddings are used to guide the extraction of relational facts. Substantial experiments on two popular DocRE datasets are conducted, and our method achieves superior results compared to baselines. Insightful analysis also demonstrates the potential of relation correlations to address the above challenges.
translated by 谷歌翻译
With the success of the prompt-tuning paradigm in Natural Language Processing (NLP), various prompt templates have been proposed to further stimulate specific knowledge for serving downstream tasks, e.g., machine translation, text generation, relation extraction, and so on. Existing prompt templates are mainly shared among all training samples with the information of task description. However, training samples are quite diverse. The sharing task description is unable to stimulate the unique task-related information in each training sample, especially for tasks with the finite-label space. To exploit the unique task-related information, we imitate the human decision process which aims to find the contrastive attributes between the objective factual and their potential counterfactuals. Thus, we propose the \textbf{C}ounterfactual \textbf{C}ontrastive \textbf{Prompt}-Tuning (CCPrompt) approach for many-class classification, e.g., relation classification, topic classification, and entity typing. Compared with simple classification tasks, these tasks have more complex finite-label spaces and are more rigorous for prompts. First of all, we prune the finite label space to construct fact-counterfactual pairs. Then, we exploit the contrastive attributes by projecting training instances onto every fact-counterfactual pair. We further set up global prototypes corresponding with all contrastive attributes for selecting valid contrastive attributes as additional tokens in the prompt template. Finally, a simple Siamese representation learning is employed to enhance the robustness of the model. We conduct experiments on relation classification, topic classification, and entity typing tasks in both fully supervised setting and few-shot setting. The results indicate that our model outperforms former baselines.
translated by 谷歌翻译
Ultra-fine entity typing (UFET) aims to predict a wide range of type phrases that correctly describe the categories of a given entity mention in a sentence. Most recent works infer each entity type independently, ignoring the correlations between types, e.g., when an entity is inferred as a president, it should also be a politician and a leader. To this end, we use an undirected graphical model called pairwise conditional random field (PCRF) to formulate the UFET problem, in which the type variables are not only unarily influenced by the input but also pairwisely relate to all the other type variables. We use various modern backbones for entity typing to compute unary potentials, and derive pairwise potentials from type phrase representations that both capture prior semantic information and facilitate accelerated inference. We use mean-field variational inference for efficient type inference on very large type sets and unfold it as a neural network module to enable end-to-end training. Experiments on UFET show that the Neural-PCRF consistently outperforms its backbones with little cost and results in a competitive performance against cross-encoder based SOTA while being thousands of times faster. We also find Neural- PCRF effective on a widely used fine-grained entity typing dataset with a smaller type set. We pack Neural-PCRF as a network module that can be plugged onto multi-label type classifiers with ease and release it in https://github.com/modelscope/adaseq/tree/master/examples/NPCRF.
translated by 谷歌翻译
旨在从非结构化文本中提取结构信息的知识提取(KE)通常会遭受数据稀缺性和新出现的看不见类型,即低资源场景。许多低资源KE的神经方法已广泛研究并取得了令人印象深刻的表现。在本文中,我们在低资源场景中介绍了对KE的文献综述,并将现有作品分为三个范式:(1)利用更高的资源数据,(2)利用更强的模型,(3)利用数据和模型一起。此外,我们描述了有前途的应用,并概述了未来研究的一些潜在方向。我们希望我们的调查能够帮助学术和工业界更好地理解这一领域,激发更多的想法并提高更广泛的应用。
translated by 谷歌翻译
事实证明,将先验知识纳入预训练的语言模型中对知识驱动的NLP任务有效,例如实体键入和关系提取。当前的培训程序通常通过使用知识掩盖,知识融合和知识更换将外部知识注入模型。但是,输入句子中包含的事实信息尚未完全开采,并且尚未严格检查注射的外部知识。结果,无法完全利用上下文信息,并将引入额外的噪音,或者注入的知识量受到限制。为了解决这些问题,我们提出了MLRIP,该MLRIP修改了Ernie-Baidu提出的知识掩盖策略,并引入了两阶段的实体替代策略。进行全面分析的广泛实验说明了MLRIP在军事知识驱动的NLP任务中基于BERT的模型的优势。
translated by 谷歌翻译
Few-shot named entity recognition (NER) targets generalizing to unseen labels and/or domains with few labeled examples. Existing metric learning methods compute token-level similarities between query and support sets, but are not able to fully incorporate label semantics into modeling. To address this issue, we propose a simple method to largely improve metric learning for NER: 1) multiple prompt schemas are designed to enhance label semantics; 2) we propose a novel architecture to effectively combine multiple prompt-based representations. Empirically, our method achieves new state-of-the-art (SOTA) results under 16 of the 18 considered settings, substantially outperforming the previous SOTA by an average of 8.84% and a maximum of 34.51% in relative gains of micro F1. Our code is available at https://github.com/AChen-qaq/ProML.
translated by 谷歌翻译
Contrastive learning has become a new paradigm for unsupervised sentence embeddings. Previous studies focus on instance-wise contrastive learning, attempting to construct positive pairs with textual data augmentation. In this paper, we propose a novel Contrastive learning method with Prompt-derived Virtual semantic Prototypes (ConPVP). Specifically, with the help of prompts, we construct virtual semantic prototypes to each instance, and derive negative prototypes by using the negative form of the prompts. Using a prototypical contrastive loss, we enforce the anchor sentence embedding to be close to its corresponding semantic prototypes, and far apart from the negative prototypes as well as the prototypes of other sentences. Extensive experimental results on semantic textual similarity, transfer, and clustering tasks demonstrate the effectiveness of our proposed model compared to strong baselines. Code is available at https://github.com/lemon0830/promptCSE.
translated by 谷歌翻译
Alignment between image and text has shown promising improvements on patch-level pre-trained document image models. However, investigating more effective or finer-grained alignment techniques during pre-training requires a large amount of computation cost and time. Thus, a question naturally arises: Could we fine-tune the pre-trained models adaptive to downstream tasks with alignment objectives and achieve comparable or better performance? In this paper, we propose a new model architecture with alignment-enriched tuning (dubbed AETNet) upon pre-trained document image models, to adapt downstream tasks with the joint task-specific supervised and alignment-aware contrastive objective. Specifically, we introduce an extra visual transformer as the alignment-ware image encoder and an extra text transformer as the alignment-ware text encoder before multimodal fusion. We consider alignment in the following three aspects: 1) document-level alignment by leveraging the cross-modal and intra-modal contrastive loss; 2) global-local alignment for modeling localized and structural information in document images; and 3) local-level alignment for more accurate patch-level information. Experiments on various downstream tasks show that AETNet can achieve state-of-the-art performance on various downstream tasks. Notably, AETNet consistently outperforms state-of-the-art pre-trained models, such as LayoutLMv3 with fine-tuning techniques, on three different downstream tasks.
translated by 谷歌翻译
机器学习方法尤其是深度神经网络取得了巨大的成功,但其中许多往往依赖于一些标记的样品进行训练。在真实世界的应用中,我们经常需要通过例如具有新兴预测目标和昂贵的样本注释的动态上下文来解决样本短缺。因此,低资源学习,旨在学习具有足够资源(特别是培训样本)的强大预测模型,现在正在被广泛调查。在所有低资源学习研究中,许多人更喜欢以知识图(kg)的形式利用一些辅助信息,这对于知识表示变得越来越受欢迎,以减少对标记样本的依赖。在这项调查中,我们非常全面地审查了90美元的报纸关于两个主要的低资源学习设置 - 零射击学习(ZSL)的预测,从未出现过训练,而且很少拍摄的学习(FSL)预测的新类仅具有可用的少量标记样本。我们首先介绍了ZSL和FSL研究中使用的KGS以及现有的和潜在的KG施工解决方案,然后系统地分类和总结了KG感知ZSL和FSL方法,将它们划分为不同的范例,例如基于映射的映射,数据增强,基于传播和基于优化的。我们接下来呈现了不同的应用程序,包括计算机视觉和自然语言处理中的kg增强预测任务,还包括kg完成的任务,以及每个任务的一些典型评估资源。我们最终讨论了一些关于新学习和推理范式的方面的一些挑战和未来方向,以及高质量的KGs的建设。
translated by 谷歌翻译
当大型训练数据集不可用于低资源域时,命名实体识别(NER)模型通常表现不佳。最近,预先训练大规模语言模型已成为应对数据稀缺问题的有希望的方向。然而,语言建模和ner任务之间的潜在差异可能会限制模型的性能,并且由于收集的网数据集通常很小或大而是低质量,因此已经研究了NER任务的预训练。在本文中,我们构建了一个具有相对高质量的大规模核心语料库,我们基于创建的数据集预先列车。实验结果表明,我们的预训练模型可以显着优于八大域的低资源场景中的百合形和其他强基线。此外,实体表示的可视化进一步指示Ner-BERT用于对各种实体进行分类的有效性。
translated by 谷歌翻译
Ultra-fine entity typing (UFET) predicts extremely free-formed types (e.g., president, politician) of a given entity mention (e.g., Joe Biden) in context. State-of-the-art (SOTA) methods use the cross-encoder (CE) based architecture. CE concatenates the mention (and its context) with each type and feeds the pairs into a pretrained language model (PLM) to score their relevance. It brings deeper interaction between mention and types to reach better performance but has to perform N (type set size) forward passes to infer types of a single mention. CE is therefore very slow in inference when the type set is large (e.g., N = 10k for UFET). To this end, we propose to perform entity typing in a recall-expand-filter manner. The recall and expand stages prune the large type set and generate K (K is typically less than 256) most relevant type candidates for each mention. At the filter stage, we use a novel model called MCCE to concurrently encode and score these K candidates in only one forward pass to obtain the final type prediction. We investigate different variants of MCCE and extensive experiments show that MCCE under our paradigm reaches SOTA performance on ultra-fine entity typing and is thousands of times faster than the cross-encoder. We also found MCCE is very effective in fine-grained (130 types) and coarse-grained (9 types) entity typing. Our code is available at \url{https://github.com/modelscope/AdaSeq/tree/master/examples/MCCE}.
translated by 谷歌翻译
As an important fine-grained sentiment analysis problem, aspect-based sentiment analysis (ABSA), aiming to analyze and understand people's opinions at the aspect level, has been attracting considerable interest in the last decade. To handle ABSA in different scenarios, various tasks are introduced for analyzing different sentiment elements and their relations, including the aspect term, aspect category, opinion term, and sentiment polarity. Unlike early ABSA works focusing on a single sentiment element, many compound ABSA tasks involving multiple elements have been studied in recent years for capturing more complete aspect-level sentiment information. However, a systematic review of various ABSA tasks and their corresponding solutions is still lacking, which we aim to fill in this survey. More specifically, we provide a new taxonomy for ABSA which organizes existing studies from the axes of concerned sentiment elements, with an emphasis on recent advances of compound ABSA tasks. From the perspective of solutions, we summarize the utilization of pre-trained language models for ABSA, which improved the performance of ABSA to a new stage. Besides, techniques for building more practical ABSA systems in cross-domain/lingual scenarios are discussed. Finally, we review some emerging topics and discuss some open challenges to outlook potential future directions of ABSA.
translated by 谷歌翻译
Relation extraction (RE), which has relied on structurally annotated corpora for model training, has been particularly challenging in low-resource scenarios and domains. Recent literature has tackled low-resource RE by self-supervised learning, where the solution involves pretraining the relation embedding by RE-based objective and finetuning on labeled data by classification-based objective. However, a critical challenge to this approach is the gap in objectives, which prevents the RE model from fully utilizing the knowledge in pretrained representations. In this paper, we aim at bridging the gap and propose to pretrain and finetune the RE model using consistent objectives of contrastive learning. Since in this kind of representation learning paradigm, one relation may easily form multiple clusters in the representation space, we further propose a multi-center contrastive loss that allows one relation to form multiple clusters to better align with pretraining. Experiments on two document-level RE datasets, BioRED and Re-DocRED, demonstrate the effectiveness of our method. Particularly, when using 1% end-task training data, our method outperforms PLM-based RE classifier by 10.5% and 5.8% on the two datasets, respectively.
translated by 谷歌翻译