最近有关增强基于BERT的语言表示模型的工作,具有知识图(KGS)对多个NLP任务具有有希望的结果。最先进的方法通常将原始输入句型与KGS中的三元组集成,并将组合的表示馈送到BERT模型中。然而,随着BERT模型的序列长度是有限的,除了原始输入句子之外,框架不能包含太多知识,因此被迫丢弃一些知识。对于那些输入的下游任务是一个很长的段落甚至是文档,例如QA或阅读理解任务,问题尤其严重。为了解决这个问题,我们提出屋顶,一个模型,具有两个底层臀部和融合层。其中一个底层伯特编码了知识资源,另一个伯特编码了原始输入句子,并且屋顶等融合层集成了伯特的编码。 QA任务的实验结果揭示了拟议模型的有效性。
translated by 谷歌翻译
We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models (Peters et al., 2018a;Radford et al., 2018), BERT is designed to pretrain deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be finetuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial taskspecific architecture modifications.BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE score to 80.5% (7.7% point absolute improvement), MultiNLI accuracy to 86.7% (4.6% absolute improvement), SQuAD v1.1 question answering Test F1 to 93.2 (1.5 point absolute improvement) and SQuAD v2.0 Test F1 to 83.1 (5.1 point absolute improvement).
translated by 谷歌翻译
Neural language representation models such as BERT pre-trained on large-scale corpora can well capture rich semantic patterns from plain text, and be fine-tuned to consistently improve the performance of various NLP tasks. However, the existing pre-trained language models rarely consider incorporating knowledge graphs (KGs), which can provide rich structured knowledge facts for better language understanding. We argue that informative entities in KGs can enhance language representation with external knowledge. In this paper, we utilize both large-scale textual corpora and KGs to train an enhanced language representation model (ERNIE), which can take full advantage of lexical, syntactic, and knowledge information simultaneously. The experimental results have demonstrated that ERNIE achieves significant improvements on various knowledge-driven tasks, and meanwhile is comparable with the state-of-the-art model BERT on other common NLP tasks. The source code and experiment details of this paper can be obtained from https:// github.com/thunlp/ERNIE.
translated by 谷歌翻译
Supervised Question Answering systems (QA systems) rely on domain-specific human-labeled data for training. Unsupervised QA systems generate their own question-answer training pairs, typically using secondary knowledge sources to achieve this outcome. Our approach (called PIE-QG) uses Open Information Extraction (OpenIE) to generate synthetic training questions from paraphrased passages and uses the question-answer pairs as training data for a language model for a state-of-the-art QA system based on BERT. Triples in the form of <subject, predicate, object> are extracted from each passage, and questions are formed with subjects (or objects) and predicates while objects (or subjects) are considered as answers. Experimenting on five extractive QA datasets demonstrates that our technique achieves on-par performance with existing state-of-the-art QA systems with the benefit of being trained on an order of magnitude fewer documents and without any recourse to external reference data sources.
translated by 谷歌翻译
问题答案(QA)是自然语言处理中最具挑战性的最具挑战性的问题之一(NLP)。问答(QA)系统试图为给定问题产生答案。这些答案可以从非结构化或结构化文本生成。因此,QA被认为是可以用于评估文本了解系统的重要研究区域。大量的QA研究致力于英语语言,调查最先进的技术和实现最先进的结果。然而,由于阿拉伯QA中的研究努力和缺乏大型基准数据集,在阿拉伯语问答进展中的研究努力得到了很大速度的速度。最近许多预先接受的语言模型在许多阿拉伯语NLP问题中提供了高性能。在这项工作中,我们使用四个阅读理解数据集来评估阿拉伯QA的最先进的接种变压器模型,它是阿拉伯语 - 队,ArcD,AQAD和TYDIQA-GoldP数据集。我们微调并比较了Arabertv2基础模型,ArabertV0.2大型型号和ARAElectra模型的性能。在最后,我们提供了一个分析,了解和解释某些型号获得的低绩效结果。
translated by 谷歌翻译
Natural Language Processing (NLP) has been revolutionized by the use of Pre-trained Language Models (PLMs) such as BERT. Despite setting new records in nearly every NLP task, PLMs still face a number of challenges including poor interpretability, weak reasoning capability, and the need for a lot of expensive annotated data when applied to downstream tasks. By integrating external knowledge into PLMs, \textit{\underline{K}nowledge-\underline{E}nhanced \underline{P}re-trained \underline{L}anguage \underline{M}odels} (KEPLMs) have the potential to overcome the above-mentioned limitations. In this paper, we examine KEPLMs systematically through a series of studies. Specifically, we outline the common types and different formats of knowledge to be integrated into KEPLMs, detail the existing methods for building and evaluating KEPLMS, present the applications of KEPLMs in downstream tasks, and discuss the future research directions. Researchers will benefit from this survey by gaining a quick and comprehensive overview of the latest developments in this field.
translated by 谷歌翻译
We present SpanBERT, a pre-training method that is designed to better represent and predict spans of text. Our approach extends BERT by (1) masking contiguous random spans, rather than random tokens, and (2) training the span boundary representations to predict the entire content of the masked span, without relying on the individual token representations within it. Span-BERT consistently outperforms BERT and our better-tuned baselines, with substantial gains on span selection tasks such as question answering and coreference resolution. In particular, with the same training data and model size as BERT large , our single model obtains 94.6% and 88.7% F1 on SQuAD 1.1 and 2.0 respectively. We also achieve a new state of the art on the OntoNotes coreference resolution task (79.6% F1), strong performance on the TACRED relation extraction benchmark, and even gains on GLUE. 1 * Equal contribution. 1 Our code and pre-trained models are available at https://github.com/facebookresearch/ SpanBERT.
translated by 谷歌翻译
This paper presents a new UNIfied pre-trained Language Model (UNILM) that can be fine-tuned for both natural language understanding and generation tasks. The model is pre-trained using three types of language modeling tasks: unidirectional, bidirectional, and sequence-to-sequence prediction. The unified modeling is achieved by employing a shared Transformer network and utilizing specific self-attention masks to control what context the prediction conditions on. UNILM compares favorably with BERT on the GLUE benchmark, and the SQuAD 2.0 and CoQA question answering tasks. Moreover, UNILM achieves new state-ofthe-art results on five natural language generation datasets, including improving the CNN/DailyMail abstractive summarization ROUGE-L to 40.51 (2.04 absolute improvement), the Gigaword abstractive summarization ROUGE-L to 35.75 (0.86 absolute improvement), the CoQA generative question answering F1 score to 82.5 (37.1 absolute improvement), the SQuAD question generation BLEU-4 to 22.12 (3.75 absolute improvement), and the DSTC7 document-grounded dialog response generation NIST-4 to 2.67 (human performance is 2.65). The code and pre-trained models are available at https://github.com/microsoft/unilm. * Equal contribution. † Contact person.
translated by 谷歌翻译
知识库问题应答(KBQA)旨在在外部知识库的帮助下回答自然语言问题。核心思想是找到内部知识与知识库的已知三元组之间的内部知识之间的联系。 KBQA任务管道包含几个步骤,包括实体识别,关系提取和实体链接。这种管道方法意味着任何过程中的错误将不可避免地传播到最终预测。为了解决上述问题,本文提出了一种具有预培训语言模型(PLM)和知识图(KG)的语料库生成 - 检索方法(CGRM)。首先,基于MT5模型,我们设计了两个新的预训练任务:基于段落的知识屏蔽语言建模和问题,以获取知识增强型T5(KT5)模型。其次,在用一系列启发式规则预处理知识图的预处理之后,KT5模型基于处理的三元组生成自然语言QA对。最后,我们通过检索合成数据集直接解决QA。我们在NLPCC-ICCPOL 2016 KBQA数据集上测试我们的方法,结果表明,我们的框架提高了KBQA的性能,直接向前的方法与最先进的方法竞争。
translated by 谷歌翻译
与伯特(Bert)等语言模型相比,已证明知识增强语言表示的预培训模型在知识基础构建任务(即〜关系提取)中更有效。这些知识增强的语言模型将知识纳入预训练中,以生成实体或关系的表示。但是,现有方法通常用单独的嵌入表示每个实体。结果,这些方法难以代表播出的实体和大量参数,在其基础代币模型之上(即〜变压器),必须使用,并且可以处理的实体数量为由于内存限制,实践限制。此外,现有模型仍然难以同时代表实体和关系。为了解决这些问题,我们提出了一个新的预培训模型,该模型分别从图书中学习实体和关系的表示形式,并分别在文本中跨越跨度。通过使用SPAN模块有效地编码跨度,我们的模型可以代表实体及其关系,但所需的参数比现有模型更少。我们通过从Wikipedia中提取的知识图对我们的模型进行了预训练,并在广泛的监督和无监督的信息提取任务上进行了测试。结果表明,我们的模型比基线学习对实体和关系的表现更好,而在监督的设置中,微调我们的模型始终优于罗伯塔,并在信息提取任务上取得了竞争成果。
translated by 谷歌翻译
事实证明,将先验知识纳入预训练的语言模型中对知识驱动的NLP任务有效,例如实体键入和关系提取。当前的培训程序通常通过使用知识掩盖,知识融合和知识更换将外部知识注入模型。但是,输入句子中包含的事实信息尚未完全开采,并且尚未严格检查注射的外部知识。结果,无法完全利用上下文信息,并将引入额外的噪音,或者注入的知识量受到限制。为了解决这些问题,我们提出了MLRIP,该MLRIP修改了Ernie-Baidu提出的知识掩盖策略,并引入了两阶段的实体替代策略。进行全面分析的广泛实验说明了MLRIP在军事知识驱动的NLP任务中基于BERT的模型的优势。
translated by 谷歌翻译
三重提取是自然语言处理和知识图构建信息提取的重要任务。在本文中,我们重新审视了序列生成的端到端三重提取任务。由于生成三重提取可能难以捕获长期依赖性并产生不忠的三元组,因此我们引入了一种新型模型,即与生成变压器的对比度三重提取。具体而言,我们为基于编码器的生成引入了一个共享的变压器模块。为了产生忠实的结果,我们提出了一个新颖的三胞胎对比训练对象。此外,我们引入了两种机制,以进一步提高模型性能(即,批处理动态注意力掩盖和三个方面的校准)。在三个数据集(即NYT,WebNLG和MIE)上进行的实验结果表明,我们的方法比基线的方法更好。
translated by 谷歌翻译
学术知识图(KGS)提供了代表科学出版物编码的知识的丰富的结构化信息来源。随着出版的科学文学的庞大,包括描述科学概念的过多的非均匀实体和关系,这些公斤本质上是不完整的。我们呈现Exbert,一种利用预先训练的变压器语言模型来执行学术知识图形完成的方法。我们将知识图形的三元组模型为文本并执行三重分类(即,属于KG或不属于KG)。评估表明,在三重分类,链路预测和关系预测的任务中,Exbert在三个学术kg完成数据集中表现出其他基线。此外,我们将两个学术数据集作为研究界的资源,从公共公共公报和在线资源中收集。
translated by 谷歌翻译
链路预测在知识图中起着重要作用,这是许多人工智能任务的重要资源,但它通常受不完整的限制。在本文中,我们提出了知识图表BERT for Link预测,名为LP-BERT,其中包含两个培训阶段:多任务预训练和知识图微调。预训练策略不仅使用掩码语言模型(MLM)来学习上下文语料库的知识,还引入掩模实体模型(MEM)和掩模关系模型(MRM),其可以通过预测语义来学习三元组的关系信息基于实体和关系元素。结构化三维关系信息可以转换为非结构化语义信息,可以将其与上下文语料库信息一起集成到培训模型中。在微调阶段,灵感来自对比学习,我们在样本批量中进行三样式的负面取样,这大大增加了负采样的比例,同时保持训练时间几乎不变。此外,我们提出了一种基于Triples的逆关系的数据增强方法,以进一步增加样本分集。我们在WN18RR和UMLS数据集上实现最先进的结果,特别是HITS @ 10指示器从WN18RR数据集上的先前最先进的结果提高了5 \%。
translated by 谷歌翻译
知识增强的预训练预审语言模型(Keplms)是预先接受的模型,具有从知识图中注入的关系三元组,以提高语言理解能力。为了保证有效的知识注入,之前的研究将模型与知识编码器集成,以表示从知识图表中检索的知识。知识检索和编码的操作带来了重要的计算负担,限制了在需要高推理速度的现实应用程序中使用这些模型。在本文中,我们提出了一种名为DKPLM的新型KEPLM,其在预训练,微调和推理阶段进行了预先训练的语言模型的知识注射过程,这有助于KEPLMS在现实世界场景中的应用。具体而言,我们首先检测知识感知的长尾实体作为知识注射的目标,增强了Keplms的语义理解能力,避免注入冗余信息。长尾实体的嵌入式被相关知识三元组形成的“伪令牌表示”取代。我们进一步设计了用于预培训的关系知识解码任务,以强制模型通过关系三重重建来真正了解注入的知识。实验表明,我们的模型在零拍摄知识探测任务和多种知识意识语言理解任务中显着优于其他KEPLS。我们进一步表明,由于分解机制,DKPLM具有比其他竞争模型更高的推理速度。
translated by 谷歌翻译
如今,预先训练的语言模型对于问题产生(QG)任务取得了巨大成功,并明显超过传统的顺序到序列方法。但是,预训练的模型将输入段视为平坦序列,因此不了解输入段的文本结构。对于QG任务,我们将文本结构建模为答案位置和句法依赖性,并提出答案局部性建模和句法掩盖的注意,以解决这些局限性。特别是,我们以高斯偏见为局部建模,以使模型能够专注于答案的上下文,并提出一种掩盖注意机制,以使输入段落的句法结构在问题生成过程中访问。在小队数据集上进行的实验表明,我们提出的两个模块改善了强大的预训练模型ProPHETNET的性能,并将它们梳理在一起,可以通过最先进的预培训模型来实现非常有竞争力的结果。
translated by 谷歌翻译
来自变压器(BERT)的双向编码器表示显示了各种NLP任务的奇妙改进,并且已经提出了其连续的变体来进一步提高预先训练的语言模型的性能。在本文中,我们的目标是首先介绍中国伯特的全文掩蔽(WWM)策略,以及一系列中国预培训的语言模型。然后我们还提出了一种简单但有效的型号,称为Macbert,这在几种方面提高了罗伯塔。特别是,我们提出了一种称为MLM作为校正(MAC)的新掩蔽策略。为了展示这些模型的有效性,我们创建了一系列中国预先培训的语言模型,作为我们的基线,包括BERT,Roberta,Electra,RBT等。我们对十个中国NLP任务进行了广泛的实验,以评估创建的中国人托管语言模型以及提议的麦克白。实验结果表明,Macbert可以在许多NLP任务上实现最先进的表演,我们还通过几种可能有助于未来的研究的调查结果来消融细节。我们开源我们的预先培训的语言模型,以进一步促进我们的研究界。资源可用:https://github.com/ymcui/chinese-bert-wwm
translated by 谷歌翻译
Language model pre-training has proven to be useful in learning universal language representations. As a state-of-the-art language model pre-training model, BERT (Bidirectional Encoder Representations from Transformers) has achieved amazing results in many language understanding tasks. In this paper, we conduct exhaustive experiments to investigate different fine-tuning methods of BERT on text classification task and provide a general solution for BERT fine-tuning. Finally, the proposed solution obtains new state-of-the-art results on eight widely-studied text classification datasets. 1
translated by 谷歌翻译
尽管预训练的语言模型(PLM)已经在各种自然语言处理(NLP)任务上实现了最先进的表现,但在处理知识驱动的任务时,它们被证明缺乏知识。尽管为PLM注入知识做出了许多努力,但此问题仍然开放。为了应对挑战,我们提出\ textbf {dictbert},这是一种具有词典知识增强PLM的新方法,比知识图(kg)更容易获取。在预训练期间,我们通过对比度学习将两个新颖的预训练任务注入plms:\ textit {dictionary输入预测}和\ textit {entriT {输入描述歧视}。在微调中,我们使用预先训练的dictbert作为插件知识库(KB)来检索输入序列中确定的条目的隐式知识,并将检索到的知识注入输入中,以通过新颖的额外跳动来增强其表示形式注意机制。我们评估了我们的方法,包括各种知识驱动和语言理解任务,包括NER,关系提取,CommonSenseQA,OpenBookQa和Glue。实验结果表明,我们的模型可以显着改善典型的PLM:它的大幅提高了0.5 \%,2.9 \%,9.0 \%,7.1 \%\%和3.3 \%,分别对Roberta--也有效大的。
translated by 谷歌翻译
Entities, as important carriers of real-world knowledge, play a key role in many NLP tasks. We focus on incorporating entity knowledge into an encoder-decoder framework for informative text generation. Existing approaches tried to index, retrieve, and read external documents as evidence, but they suffered from a large computational overhead. In this work, we propose an encoder-decoder framework with an entity memory, namely EDMem. The entity knowledge is stored in the memory as latent representations, and the memory is pre-trained on Wikipedia along with encoder-decoder parameters. To precisely generate entity names, we design three decoding methods to constrain entity generation by linking entities in the memory. EDMem is a unified framework that can be used on various entity-intensive question answering and generation tasks. Extensive experimental results show that EDMem outperforms both memory-based auto-encoder models and non-memory encoder-decoder models.
translated by 谷歌翻译