将神经表示与语言因素联系起来至关重要,对于人类可以解释的NLP模型。在这些因素中,句法角色(例如主题,直接对象,$ \ dots $)及其实现是必不可少的标记,因为它们可以理解为谓语结构的分解,因此可以理解为句子的含义。从引起注意的深层概率生成模型开始,我们衡量潜在变量与句法角色实现之间的相互作用,并表明可以在不监督的情况下获得句子的表示,而不同的语法角色对应于清晰识别不同的潜在变量。我们提出的概率模型是注意力驱动的变异自动编码器(Advae)。从基于变压器的机器翻译模型中汲取灵感,可以通过注意力分析潜在变量和输入令牌之间的相互作用。我们还制定了一个评估协议,以衡量句法角色的实现。该协议基于对编码器的注意最大值和解码器的潜在变量扰动。我们在SNLI数据集中对原始英语文本进行的实验表明,可以在没有监督的情况下诱导$ \ textit {i)} $ dentangement句法角色,$ \ textit {ii)} $ advae分离句法角色比经典序列VAE和Transferaler sequence VAE和Transformer Vaes更好,$ \ textit {iii)} $句法角色的实现可以通过仅仅干预相关的潜在变量在句子中分别修改。我们的工作构成了无监督的可控内容生成的第一步。我们的工作代码公开可用。
translated by 谷歌翻译
We present Azimuth, an open-source and easy-to-use tool to perform error analysis for text classification. Compared to other stages of the ML development cycle, such as model training and hyper-parameter tuning, the process and tooling for the error analysis stage are less mature. However, this stage is critical for the development of reliable and trustworthy AI systems. To make error analysis more systematic, we propose an approach comprising dataset analysis and model quality assessment, which Azimuth facilitates. We aim to help AI practitioners discover and address areas where the model does not generalize by leveraging and integrating a range of ML techniques, such as saliency maps, similarity, uncertainty, and behavioral analyses, all in one tool. Our code and documentation are available at github.com/servicenow/azimuth.
translated by 谷歌翻译
Recent research has shown remarkable performance in leveraging multiple extraneous conditional and non-mutually exclusive semantic concepts for sound source separation, allowing the flexibility to extract a given target source based on multiple different queries. In this work, we propose a new optimal condition training (OCT) method for single-channel target source separation, based on greedy parameter updates using the highest performing condition among equivalent conditions associated with a given target source. Our experiments show that the complementary information carried by the diverse semantic concepts significantly helps to disentangle and isolate sources of interest much more efficiently compared to single-conditioned models. Moreover, we propose a variation of OCT with condition refinement, in which an initial conditional vector is adapted to the given mixture and transformed to a more amenable representation for target source extraction. We showcase the effectiveness of OCT on diverse source separation experiments where it improves upon permutation invariant models with oracle assignment and obtains state-of-the-art performance in the more challenging task of text-based source separation, outperforming even dedicated text-only conditioned models.
translated by 谷歌翻译
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
translated by 谷歌翻译
Parameter-efficient fine-tuning (PEFT) methods can adapt large language models to downstream tasks by training a small amount of newly added parameters. In multi-task settings, PEFT adapters typically train on each task independently, inhibiting transfer across tasks, or on the concatenation of all tasks, which can lead to negative interference. To address this, Polytropon (Ponti et al.) jointly learns an inventory of PEFT adapters and a routing function to share variable-size sets of adapters across tasks. Subsequently, adapters can be re-combined and fine-tuned on novel tasks even with limited data. In this paper, we investigate to what extent the ability to control which adapters are active for each task leads to sample-efficient generalization. Thus, we propose less expressive variants where we perform weighted averaging of the adapters before few-shot adaptation (Poly-mu) instead of learning a routing function. Moreover, we introduce more expressive variants where finer-grained task-adapter allocation is learned through a multi-head routing function (Poly-S). We test these variants on three separate benchmarks for multi-task learning. We find that Poly-S achieves gains on all three (up to 5.3 points on average) over strong baselines, while incurring a negligible additional cost in parameter count. In particular, we find that instruction tuning, where models are fully fine-tuned on natural language instructions for each task, is inferior to modular methods such as Polytropon and our proposed variants.
translated by 谷歌翻译
经常性的神经网络传感器(RNN-T)目标在建立当今最好的自动语音识别(ASR)系统中发挥着重要作用。与连接员时间分类(CTC)目标类似,RNN-T损失使用特定规则来定义生成一组对准以形成用于全汇训练的格子。但是,如果这些规则是最佳的,则在很大程度上未知,并且会导致最佳ASR结果。在这项工作中,我们介绍了一种新的传感器目标函数,它概括了RNN-T丢失来接受标签的图形表示,从而提供灵活和有效的框架来操纵训练格子,例如用于限制对齐或研究不同的转换规则。我们证明,与标准RNN-T相比,具有CTC样格子的基于传感器的ASR实现了更好的结果,同时确保了严格的单调对齐,这将允许更好地优化解码过程。例如,所提出的CTC样换能器系统对于测试 - LibrisPeech的其他条件,实现了5.9%的字误差率,相对于基于等效的RNN-T系统的提高,对应于4.8%。
translated by 谷歌翻译
常见的策略梯度方法依赖于代理函数序列的最大化。近年来,已经提出了许多这样的代理功能,大多数没有强烈的理论担保,导致TRPO,PPO或MPO等算法。我们而不是设计另一个代理函数,而是根据功能镜中的函数提出一般框架(FMA-PG),这导致了整个代理功能。我们构建了使策略改进保证能够担保的代理功能,这是由最现有的代理职能共享的属性。至关重要,无论政策参数化的选择如何,这些保证都会持有。此外,FMA-PG的特定实例化恢复了重要的实施启发式(例如,使用前向VS反向KL发散),导致TRPO的变体具有额外的理想性质。通过对简单强盗问题的实验,我们评估FMA-PG实例化的算法。拟议的框架还提出了一种改进的PPO变体,其鲁棒性和效率我们在Mujoco套件上证明。
translated by 谷歌翻译
我们研究了随机双线性最小利益的优化问题,呈现了恒定步长的相同样本随机以(SEG)方法的分析,并呈现了产生有利收敛的方法的变化。在锐度对比度与基本的SEG方法相比,其最后迭代仅对纳什均衡的固定邻域,SEG以相同的标准设置在相同的标准设置下可被提供给NASH均衡的迭代,并且通过结合预定,进一步提高了这种速率重新启动程序。在插值环境中,噪声在纳什均衡消失时,我们达到了最佳的常量收敛速度。我们展示了验证我们理论发现的数值实验,并在配备迭代平均和重启时证明SEG方法的有效性。
translated by 谷歌翻译
The recent increase in public and academic interest in preserving biodiversity has led to the growth of the field of conservation technology. This field involves designing and constructing tools that utilize technology to aid in the conservation of wildlife. In this article, we will use case studies to demonstrate the importance of designing conservation tools with human-wildlife interaction in mind and provide a framework for creating successful tools. These case studies include a range of complexities, from simple cat collars to machine learning and game theory methodologies. Our goal is to introduce and inform current and future researchers in the field of conservation technology and provide references for educating the next generation of conservation technologists. Conservation technology not only has the potential to benefit biodiversity but also has broader impacts on fields such as sustainability and environmental protection. By using innovative technologies to address conservation challenges, we can find more effective and efficient solutions to protect and preserve our planet's resources.
translated by 谷歌翻译
As various city agencies and mobility operators navigate toward innovative mobility solutions, there is a need for strategic flexibility in well-timed investment decisions in the design and timing of mobility service regions, i.e. cast as "real options" (RO). This problem becomes increasingly challenging with multiple interacting RO in such investments. We propose a scalable machine learning based RO framework for multi-period sequential service region design & timing problem for mobility-on-demand services, framed as a Markov decision process with non-stationary stochastic variables. A value function approximation policy from literature uses multi-option least squares Monte Carlo simulation to get a policy value for a set of interdependent investment decisions as deferral options (CR policy). The goal is to determine the optimal selection and timing of a set of zones to include in a service region. However, prior work required explicit enumeration of all possible sequences of investments. To address the combinatorial complexity of such enumeration, we propose a new variant "deep" RO policy using an efficient recurrent neural network (RNN) based ML method (CR-RNN policy) to sample sequences to forego the need for enumeration, making network design & timing policy tractable for large scale implementation. Experiments on multiple service region scenarios in New York City (NYC) shows the proposed policy substantially reduces the overall computational cost (time reduction for RO evaluation of > 90% of total investment sequences is achieved), with zero to near-zero gap compared to the benchmark. A case study of sequential service region design for expansion of MoD services in Brooklyn, NYC show that using the CR-RNN policy to determine optimal RO investment strategy yields a similar performance (0.5% within CR policy value) with significantly reduced computation time (about 5.4 times faster).
translated by 谷歌翻译