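Machine learning is shifting towards general-purpose pretrained generative models, trained in a self-supervised manner on large amounts of data, which can then be applied to solve a large number of tasks. However, because of their generic training methodology, these models often fail to meet some downstream requirements (e.g., hallucinations in abstractive summarization or incorrect formatting in automatic code generation). This raises the important question of how to adapt pretrained generative models to a new task without destroying their capabilities. Recent work has suggested solving this problem by representing task-specific requirements through energy-based models (EBMs) and approximating these EBMs using distributional policy gradients (DPG). Unfortunately, this approach is limited to unconditional distributions, represented by unconditional EBMs. In this paper, we extend this approach to conditional settings by proposing Conditional DPG (CDPG). We evaluate CDPG on three different control objectives across two tasks: summarization with T5 and code generation with GPT-Neo. Our results show that fine-tuning with CDPG robustly moves these pretrained models closer to meeting the control objectives and, in contrast with baseline approaches, does not result in catastrophic forgetting.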
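The following minimal Python sketch makes the reweighting concrete. It computes the per-sample weights of a DPG-style update for a single context c and rests on several assumptions:

- The conditional EBM takes the form P_c(x) = a(x|c) b(x, c), where a is the original pretrained model and b is a binary constraint.
- Samples are drawn from the current policy π_θ(·|c) itself.
- The partition function Z_c is estimated from the same batch by importance sampling.

The names `cdpg_weights` and `cdpg_surrogate_loss` are hypothetical. A real implementation would obtain the log-probabilities from the language models and average the loss over many contexts.

```python
import math


def cdpg_weights(log_a, log_pi, b):
    """Per-sample weights P_c(x) / (Z_c * pi(x|c)) for one context c.

    log_a:  log a(x|c) under the original pretrained model (one per sample)
    log_pi: log pi_theta(x|c) under the policy that drew the samples
    b:      binary constraint scores b(x, c), each 0 or 1
    """
    # Importance ratios P_c(x) / pi(x|c) with P_c(x) = a(x|c) * b(x, c);
    # samples that violate the constraint have P_c(x) = 0.
    ratios = [math.exp(la - lp) if bx else 0.0
              for la, lp, bx in zip(log_a, log_pi, b)]
    # Monte Carlo estimate of Z_c = E_{x ~ pi}[P_c(x) / pi(x|c)].
    z_c = sum(ratios) / len(ratios)
    if z_c == 0.0:
        # No sample satisfied the constraint: no learning signal for this context.
        return [0.0] * len(ratios)
    return [r / z_c for r in ratios]


def cdpg_surrogate_loss(weights, log_pi):
    """Loss whose gradient (weights held constant) is -mean(w * grad log pi)."""
    return -sum(w * lp for w, lp in zip(weights, log_pi)) / len(weights)


# Two samples for one context: the first satisfies the constraint, the second does not.
w = cdpg_weights(log_a=[math.log(0.2), math.log(0.1)],
                 log_pi=[math.log(0.2), math.log(0.1)],
                 b=[1, 0])
print(w)  # [2.0, 0.0]
```

Minimizing the surrogate loss with automatic differentiation, with the weights treated as constants, raises the log-probability of constraint-satisfying samples in proportion to how under-represented they are under the current policy.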
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
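Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. To inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 442 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to billions of parameters. In addition, a team of human expert raters performed all tasks to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but remain poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.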
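Energy-based models (EBMs) allow for extremely flexible specifications of probability distributions. However, they do not provide a mechanism for obtaining exact samples from these distributions. Monte Carlo techniques can help us obtain samples if some proposal distribution that we can easily sample from is available. For instance, rejection sampling can provide exact samples but is often difficult or impossible to apply because it requires a proposal distribution that upper-bounds the target distribution. Approximate Markov chain Monte Carlo sampling techniques are usually easier to design, exploiting a local proposal distribution that performs local edits on an evolving sample. However, because of the local nature of the proposal distribution, these techniques can be inefficient, and they do not provide an estimate of the quality of their samples. In this work, we propose a new approximate sampling technique, Quasi Rejection Sampling (QRS), that allows for a trade-off between sampling efficiency and sampling quality while providing explicit convergence bounds and diagnostics. QRS capitalizes on the availability of high-quality global proposal distributions obtained from deep learning models. We demonstrate the effectiveness of QRS sampling for discrete EBMs over text on the tasks of controlled text generation with distributional constraints and paraphrase generation. We show that we can sample from such EBMs with arbitrary precision, at the cost of sampling efficiency.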
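The acceptance rule behind QRS can be sketched in a few lines of Python. This sketch assumes the standard QRS formulation:

- Draw x from a global proposal q.
- Accept x with probability min(1, P(x) / (β q(x))), where P is the unnormalized EBM and β > 0 is a user-chosen constant.

When β is large enough that β q(x) ≥ P(x) everywhere, this reduces to exact rejection sampling. A smaller β accepts more samples but biases the output towards q. The helper names and the two-point toy distribution are illustrative, not taken from the paper.

```python
import math
import random


def qrs_sample(propose, log_q, log_P, beta, rng, max_tries=100_000):
    """Draw one sample by quasi-rejection sampling.

    propose(rng) samples x ~ q; log_q and log_P return log q(x) and the
    unnormalized log P(x) of the target EBM.
    """
    log_beta = math.log(beta)
    for _ in range(max_tries):
        x = propose(rng)
        # Accept with probability min(1, P(x) / (beta * q(x))), computed in log space.
        log_ratio = log_P(x) - log_q(x) - log_beta
        accept_prob = 1.0 if log_ratio >= 0 else math.exp(log_ratio)
        if rng.random() < accept_prob:
            return x
    raise RuntimeError("no sample accepted; beta may be too large")


def qrs_distribution(q, P, beta):
    """Exact output distribution and acceptance rate of QRS on a finite space."""
    # Probability of proposing x and then accepting it: min(q(x), P(x) / beta).
    mass = {x: min(q[x], P[x] / beta) for x in q}
    acceptance = sum(mass.values())
    return {x: m / acceptance for x, m in mass.items()}, acceptance


# Toy unnormalized target P and uniform proposal q over {"a", "b"}.
q = {"a": 0.5, "b": 0.5}
P = {"a": 1.0, "b": 3.0}
for beta in (1.0, 3.0, 6.0):
    dist, acc = qrs_distribution(q, P, beta)
    print(beta, {x: round(p, 3) for x, p in dist.items()}, round(acc, 3))
```

With β = 1, QRS accepts every proposal and simply reproduces q. With β = 6, the maximum of P(x)/q(x) in this toy example, the output matches the normalized target (0.25, 0.75) at an acceptance rate of 2/3. This is the efficiency-for-quality trade-off the abstract describes.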
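The power of natural language generation models has provoked a flurry of interest in automatic methods to detect whether a piece of text was written by a human or a machine. So far, the problem has been framed in a standard supervised way: a classifier is trained on annotated data to predict the origin of a given new document. In this paper, we frame the problem in an unsupervised and distributional way. We assume that we have access to a large collection of unannotated documents, a large fraction of which is machine-generated. We propose a method to detect these machine-generated documents by leveraging repeated higher-order n-grams, which we show appear more often in machine-generated text than in human-written text. This weak signal is the starting point of a self-training setup in which pseudo-labelled documents are used to train an ensemble of classifiers. Our experiments show that leveraging this signal allows us to rank suspicious documents accurately. For the largest model we used (GPT2-large), precision at 5000 is over 90% for top-k sampling strategies and over 80% for nucleus sampling. The drop in performance with increasing model size is small, which could indicate that the results hold for other current and future large language models.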
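The weak repeated-n-gram signal can be illustrated with a small Python sketch. It assumes whitespace tokenization and a fixed n-gram order. It also assumes that a document's score is the number of its distinct n-grams that also occur in at least one other document of the collection. The paper's exact n-gram selection, order, and scoring may differ, and `repeated_ngram_scores` and `pseudo_label` are hypothetical helpers. The top-ranked documents would then serve as pseudo-labelled "machine" examples for self-training.

```python
from collections import Counter


def ngrams(tokens, n):
    """Set of distinct n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def repeated_ngram_scores(docs, n=5):
    """Score each document by how many of its n-grams also appear in another document."""
    doc_ngrams = [ngrams(doc.split(), n) for doc in docs]
    # Document frequency: in how many documents each n-gram occurs.
    df = Counter(g for grams in doc_ngrams for g in grams)
    return [sum(1 for g in grams if df[g] > 1) for grams in doc_ngrams]


def pseudo_label(docs, n=5, k=1):
    """Indices of the k highest-scoring documents, used as 'machine' seeds."""
    scores = repeated_ngram_scores(docs, n)
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:k])


docs = [
    "the cat sat on the mat today",
    "a dog ran in the park",
    "the cat sat on the mat again",
]
print(repeated_ngram_scores(docs, n=4))  # [3, 0, 3]
```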
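One of the main goals of Artificial Life is to research the conditions for the emergence of life, not necessarily as it is, but as it could be. Artificial Chemistries are one of the most important tools for this purpose, because they provide a basic framework for investigating the conditions under which metabolisms capable of reproducing themselves, and ultimately of evolving, can emerge. While there have been successful attempts at producing examples of emergent self-reproducing metabolisms, the rules involved remain too complex to shed much light on the underlying principles at work. In this paper, we hypothesize that the key property needed for self-reproducing metabolisms to emerge is the existence of an auto-catalyzed subset of Turing-complete reactions. We validate this hypothesis with a minimalistic Artificial Chemistry with conservation laws, which is based on a Turing-complete rewriting system called Combinatory Logic. Our experiments show that a single run of this chemistry, starting from a tabula rasa state, discovers, with no external intervention, a wide range of emergent structures, including ones that self-reproduce in each cycle. All of these structures take the form of recursive algorithms that acquire basic constituents from the environment and decompose them in a process that is remarkably similar to biological metabolisms.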
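Combinatory Logic itself is easy to state. Terms are built by application from the combinators S, K, and I, and computation is rewriting with three rules: I x → x, K x y → x, and S x y z → x z (y z).

The Python sketch below implements only these plain rewriting rules with leftmost-outermost reduction. It does not model the conservation laws, reactant pools, or any other part of the artificial chemistry described above. As an illustrative choice, atoms are represented as strings and the application of f to x as the pair `(f, x)`.

```python
def rebuild(head, args):
    """Re-apply a head term to a list of arguments: head a1 a2 ... an."""
    for a in args:
        head = (head, a)
    return head


def step(t):
    """Perform one leftmost-outermost reduction step; return (new_term, changed)."""
    # Unwind the application spine: t = head a1 a2 ... an.
    spine, head = [], t
    while isinstance(head, tuple):
        spine.append(head[1])
        head = head[0]
    args = spine[::-1]
    if head == "I" and len(args) >= 1:          # I x -> x
        new, rest = args[0], args[1:]
    elif head == "K" and len(args) >= 2:        # K x y -> x
        new, rest = args[0], args[2:]
    elif head == "S" and len(args) >= 3:        # S x y z -> x z (y z)
        x, y, z = args[:3]
        new, rest = ((x, z), (y, z)), args[3:]
    else:
        # The head is not a redex: try to reduce the arguments, left to right.
        for i, a in enumerate(args):
            reduced, changed = step(a)
            if changed:
                args[i] = reduced
                return rebuild(head, args), True
        return t, False
    return rebuild(new, rest), True


def normalize(t, max_steps=1000):
    """Reduce t until no rule applies, or give up after max_steps."""
    for _ in range(max_steps):
        t, changed = step(t)
        if not changed:
            return t
    raise RuntimeError("no normal form within the step limit")


print(normalize(((("S", "K"), "K"), "x")))  # x
```

The example shows the classic identity S K K x → K x (K x) → x. The term S I I (S I I), by contrast, never reaches a normal form.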
Automated text analysis has become a widely used tool in political science. In this research, we use a BERT model trained on German party manifestos to identify the individual parties' contribution to the coalition agreement of 2021.
Text-to-SQL semantic parsing is an important NLP task that greatly facilitates interaction between users and databases and has become a key component of many human-computer interaction systems. Much recent progress in text-to-SQL has been driven by large-scale datasets, but most of them are English-centric. In this work, we present MultiSpider, the largest multilingual text-to-SQL dataset, which covers seven languages (English, German, French, Spanish, Japanese, Chinese, and Vietnamese). Using MultiSpider, we further identify the lexical and structural challenges of text-to-SQL (caused by specific language properties and dialect sayings) and their intensity across different languages. Experimental results under three typical settings (zero-shot, monolingual, and multilingual) reveal a 6.1% absolute drop in accuracy in non-English languages. We conduct qualitative and quantitative analyses to understand the reason for the performance drop in each language. Besides the dataset, we also propose a simple schema augmentation framework, SAVe (Schema-Augmentation-with-Verification), which significantly boosts the overall performance by about 1.8% and closes the 29.5% performance gap across languages.
Heating in private households is a major contributor to today's emissions. Heat pumps are a promising alternative for heat generation and a key technology for achieving the goals of the German energy transition and reducing dependence on fossil fuels. Today, most heat pumps in the field are controlled by a simple heating curve, a naive mapping from the current outdoor temperature to a control action. A more advanced approach is model predictive control (MPC), which several research works have applied to heat pump control. However, MPC depends heavily on a building model, which has several disadvantages. Motivated by this and by recent breakthroughs in the field, this work applies deep reinforcement learning (DRL) to heat pump control in a simulated environment. A comparison with MPC shows that DRL can be applied in a model-free manner to achieve MPC-like performance. This work extends previous studies that applied DRL to building heating operation in two ways: it analyzes the learned control strategies in depth, and it provides a detailed comparison of the two state-of-the-art control methods.
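For reference, a heating curve of the kind called naive above can be as simple as a clipped linear function of the outdoor temperature. The sketch below assumes a linear curve that returns a supply-water temperature setpoint. The slope, room setpoint, and limits are illustrative values, not figures from the paper, and real controllers often use a slightly curved characteristic.

```python
def heating_curve(t_outdoor, t_room_set=20.0, slope=1.2, t_min=25.0, t_max=55.0):
    """Map the outdoor temperature (deg C) to a supply-water setpoint (deg C).

    Linear heating curve: the colder it is outside, the hotter the supply water.
    All parameter defaults are illustrative assumptions.
    """
    t_supply = t_room_set + slope * (t_room_set - t_outdoor)
    # Clip the setpoint to the heat pump's allowed operating range.
    return max(t_min, min(t_max, t_supply))


for t_out in (-20.0, 0.0, 10.0, 20.0):
    print(t_out, heating_curve(t_out))
```

The setpoint depends only on the current outdoor temperature. The curve therefore ignores forecasts and the building's thermal state, which is the information MPC (through its building model) and DRL (through learning) can exploit.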
Early detection of relevant locations in a piece of news is especially important in extreme events such as environmental disasters, war conflicts, disease outbreaks, or political turmoil. This detection also helps recommender systems promote relevant news based on user locations. When the relevant locations are not mentioned explicitly in the text, state-of-the-art methods typically fail to recognize them because they rely on syntactic recognition. In contrast, by incorporating a knowledge base and connecting entities with their locations, our system successfully infers the relevant locations even when they are not mentioned explicitly. To evaluate the effectiveness of our approach, and given the lack of datasets in this area, we also contribute a gold-standard multilingual news-location dataset, NewsLOC, to the research community. It contains annotations of the relevant locations (and their WikiData IDs) for 600+ Wikinews articles in five languages: English, French, German, Italian, and Spanish. Through experimental evaluations, we show that our proposed system outperforms the baselines and that fine-tuning the model on semi-supervised data further increases the classification rate. The source code and the NewsLOC dataset are publicly available to the research community at https://github.com/vsuarezpaniagua/NewsLocation.
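The idea of inferring locations that are never named can be illustrated with a toy Python sketch. Each entity recognized in the article is looked up in a knowledge base that maps it to an associated location, and locations are ranked by how many entities point to them. The dictionary `KB_LOCATION` is a hypothetical stand-in for a resource such as Wikidata. Entity recognition and linking are assumed to have happened already, and the majority-vote ranking is an illustrative choice rather than the authors' method.

```python
from collections import Counter

# Hypothetical stand-in for a knowledge base: entity -> associated location.
KB_LOCATION = {
    "Eiffel Tower": "Paris",
    "Louvre": "Paris",
    "Bundestag": "Berlin",
}


def infer_locations(entities, kb=KB_LOCATION, top_k=1):
    """Rank the locations implied by the (already linked) entities of an article."""
    # Count how many entities point to each location; unknown entities are skipped.
    counts = Counter(kb[e] for e in entities if e in kb)
    return [loc for loc, _ in counts.most_common(top_k)]


# "Paris" is never mentioned, but two of the entities are located there.
print(infer_locations(["Louvre", "Eiffel Tower", "Bundestag"]))  # ['Paris']
```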