The Invariant Risk Minimization (IRM) framework aims to learn invariant features from a set of environments to solve the out-of-distribution (OOD) generalization problem. The underlying assumption is that the causal components of the data-generating distributions remain constant across the environments, or alternately, that the data "overlaps" across environments sufficiently to identify meaningful invariant features. Consequently, when the "overlap" assumption does not hold, the set of truly invariant features may be insufficient for optimal predictive performance. Such cases arise naturally in networked settings and hierarchical data-generating models, wherein IRM performance becomes suboptimal. To mitigate this failure case, we argue for a partial-invariance framework. The key idea is to introduce flexibility into the IRM framework by partitioning the environments based on hierarchical differences, while enforcing invariance locally within the partitions. We motivate this framework in classification settings with causal distribution shifts across environments. Our results demonstrate the capability of partial invariant risk minimization to alleviate the trade-off between fairness and risk in certain settings.
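The abstract does not spell out the training objective. As a rough illustration, a partially invariant variant of the widely used IRMv1 penalty might apply the invariance regularizer only within each environment partition; the partition structure and weighting below are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def irm_penalty(logits, y):
    # IRMv1-style penalty: squared gradient of the risk w.r.t. a dummy scale.
    scale = torch.tensor(1.0, requires_grad=True)
    loss = F.cross_entropy(logits * scale, y)
    grad = torch.autograd.grad(loss, [scale], create_graph=True)[0]
    return grad.pow(2)

def partial_invariance_loss(model, envs, partition, lam=1.0):
    """envs: list of (x, y) batches, one per environment.
    partition: list of lists of environment indices (hypothetical grouping,
    e.g. derived from hierarchical differences between environments)."""
    total = 0.0
    for group in partition:
        risks, penalties = [], []
        for e in group:
            x, y = envs[e]
            logits = model(x)
            risks.append(F.cross_entropy(logits, y))
            penalties.append(irm_penalty(logits, y))
        # Invariance is enforced only locally, within each partition.
        total = total + torch.stack(risks).mean() + lam * torch.stack(penalties).mean()
    return total / len(partition)
```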
Large pre-trained models, such as BERT, GPT, and wav2vec, have demonstrated great potential for learning representations that are transferable to a wide variety of downstream tasks. Obtaining large quantities of supervised data is difficult due to limited resources and time. In light of this, a significant amount of research has been conducted on adapting large pre-trained models to diverse downstream tasks via fine-tuning, linear probing, or prompt tuning in low-resource settings. Normalization techniques are essential for accelerating training and improving the generalization of deep neural networks, and have been successfully used in a wide variety of applications. Many normalization techniques have been proposed, but the success of normalization in low-resource downstream NLP and speech tasks is limited. One reason is the inability of the rescaling parameters of normalization to capture expressiveness. We propose Kullback-Leibler (KL) Regularized Normalization (KL-Norm), which makes the normalized data well behaved and helps generalization: it reduces over-fitting, generalizes well to out-of-domain distributions, and removes irrelevant biases and features, with a negligible increase in model parameters and memory overhead. Detailed experimental evaluation on multiple low-resource NLP and speech tasks demonstrates the superior performance of KL-Norm compared to other popular normalization and regularization techniques.
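The abstract does not give the exact formulation of KL-Norm. One plausible reading, sketched below, treats the normalized activations as samples from a learned Gaussian and adds a KL term against a standard-normal prior, in the style of variational regularizers; all layer and parameter choices here are illustrative assumptions, not the paper's layer.

```python
import torch
import torch.nn as nn

class KLRegularizedNorm(nn.Module):
    """Illustrative sketch (not the paper's exact layer): layer-normalize the
    input, then model it as a Gaussian whose KL divergence from N(0, I) is
    exposed as an auxiliary loss term."""
    def __init__(self, dim):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.mu = nn.Linear(dim, dim)       # posterior mean head (assumed)
        self.logvar = nn.Linear(dim, dim)   # posterior log-variance head (assumed)

    def forward(self, x):
        h = self.norm(x)
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterized sample keeps the layer stochastic during training.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        # Closed-form KL( N(mu, sigma^2) || N(0, 1) ), averaged over the batch.
        kl = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1).sum(-1).mean()
        return z, kl  # add `beta * kl` to the task loss
```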
We present Mu$^{2}$SLAM, a multilingual sequence-to-sequence model pre-trained jointly on unlabeled speech, unlabeled text and supervised data spanning Automatic Speech Recognition (ASR), Automatic Speech Translation (AST) and Machine Translation (MT), in over 100 languages. By leveraging a quantized representation of speech as a target, Mu$^{2}$SLAM trains the speech-text models with a sequence-to-sequence masked denoising objective similar to T5 on the decoder and a masked language modeling (MLM) objective on the encoder, for both unlabeled speech and text, while utilizing the supervised tasks to improve cross-lingual and cross-modal representation alignment within the model. On CoVoST AST, Mu$^{2}$SLAM establishes a new state-of-the-art for models trained on public datasets, improving on xx-en translation over the previous best by 1.9 BLEU points and on en-xx translation by 1.1 BLEU points. On Voxpopuli ASR, our model matches the performance of an mSLAM model fine-tuned with an RNN-T decoder, despite using a relatively weaker sequence-to-sequence architecture. On text understanding tasks, our model improves by more than 6% over mSLAM on XNLI, getting closer to the performance of mT5 models of comparable capacity on XNLI and TydiQA, paving the way towards a single model for all speech and text understanding tasks.
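As a loose illustration of the T5-style masked denoising objective mentioned above, the helper below corrupts a token sequence by replacing random spans with sentinel tokens and emits the corresponding target sequence. The sentinel IDs and span parameters are assumptions, not Mu$^{2}$SLAM's actual configuration.

```python
import random

def span_mask(tokens, sentinel_ids, mask_ratio=0.15, mean_span=3):
    """T5-style span corruption (illustrative): replace random spans in the
    input with sentinels; the target lists each sentinel followed by the
    tokens it replaced."""
    n = len(tokens)
    n_to_mask = max(1, int(n * mask_ratio))
    inputs, targets, i, s = [], [], 0, 0
    while i < n:
        if n_to_mask > 0 and s < len(sentinel_ids) and random.random() < mask_ratio:
            span = min(mean_span, n - i, n_to_mask)
            inputs.append(sentinel_ids[s])
            targets.append(sentinel_ids[s])
            targets.extend(tokens[i:i + span])
            i += span
            n_to_mask -= span
            s += 1
        else:
            inputs.append(tokens[i])
            i += 1
    return inputs, targets
```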
End-to-end text-to-speech (TTS) systems have been developed for European languages like English and Spanish with state-of-the-art speech quality, prosody, and naturalness. However, the development of end-to-end TTS for Indian languages lags behind in terms of quality. The challenges involved in such a task are: 1) scarcity of quality training data; 2) low efficiency during training and inference; 3) slow convergence in the case of large vocabulary sizes. In this paper, we investigate fine-tuning the English-pretrained Tacotron2 model with limited Sanskrit data to synthesize natural-sounding Sanskrit speech in low-resource settings. Our experiments show encouraging results, achieving an overall MOS of 3.38 from 37 evaluators with good spoken knowledge of Sanskrit. This is an encouraging result, considering that the speech data used amounts to only 2.5 hours.
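The recipe above (start from an English-pretrained Tacotron2, fine-tune on ~2.5 h of Sanskrit) can be sketched generically. The loop below is a minimal sketch: the model loader is a hypothetical placeholder, and freezing the text encoder while fine-tuning at a reduced learning rate is a common low-resource choice, not necessarily the authors' exact strategy.

```python
import torch

def finetune(model, sanskrit_loader, epochs=50, lr=1e-4, device="cuda"):
    """Generic low-resource fine-tuning loop for a seq2seq TTS model whose
    forward pass returns a (mel_out, ...) tuple like Tacotron2's."""
    model.to(device).train()
    # Freeze the text encoder so the limited data mainly adapts the decoder
    # (an assumed strategy for ~2.5 h of speech).
    for p in model.encoder.parameters():
        p.requires_grad = False
    opt = torch.optim.Adam((p for p in model.parameters() if p.requires_grad), lr=lr)
    mel_loss = torch.nn.MSELoss()
    for _ in range(epochs):
        for text, mel_target in sanskrit_loader:
            text, mel_target = text.to(device), mel_target.to(device)
            mel_out = model(text)[0]          # hypothetical forward signature
            loss = mel_loss(mel_out, mel_target)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```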
Our education system comprises a series of curricula. For example, when we learn mathematics at school, we learn in order from addition, to multiplication, and later to integration. Delineating a curriculum for teaching either a human or a machine shares the underlying goal of maximizing positive knowledge transfer from early to later tasks and minimizing forgetting of the early tasks. Here, we exhaustively surveyed the effect of curricula on existing continual learning algorithms in the class-incremental setting, where algorithms must learn classes one at a time from a continuous stream of data. We observed that across a breadth of possible class orders (curricula), curricula influence the retention of information, and that this effect is not just a product of stochasticity. Further, as a first step toward automated curriculum design, we proposed a method capable of designing and ranking effective curricula based on inter-class feature similarities. We compared the predicted curricula against empirically determined effective curricula and observed significant overlaps between the two. To support the study of curriculum design, we conducted a series of human psychophysics experiments and contributed a new continual learning benchmark in object recognition. We assessed the degree of agreement in effective curricula between humans and machines. Surprisingly, our curriculum designer successfully predicts an optimal set of curricula that is effective for human learning. There are many considerations in curriculum design, such as timely student feedback and learning with multiple modalities. Our study is the first attempt to set a standard framework for the community to tackle the problem of teaching humans and machines to learn to learn continuously.
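The abstract describes ranking curricula by inter-class feature similarities without giving the rule. A minimal sketch of one such heuristic follows, scoring a class order by the feature-space similarity between consecutive classes; the choice of embedding and the scoring direction are assumptions, not the paper's exact designer.

```python
import numpy as np

def class_prototypes(features, labels):
    """Mean feature vector per class; `features` is (N, D), `labels` is (N,)."""
    classes = np.unique(labels)
    return {c: features[labels == c].mean(axis=0) for c in classes}

def curriculum_score(order, prototypes):
    """Score a class order by average cosine similarity of consecutive classes."""
    sims = []
    for a, b in zip(order[:-1], order[1:]):
        u, v = prototypes[a], prototypes[b]
        sims.append(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))
    return float(np.mean(sims))

def rank_curricula(candidate_orders, features, labels):
    protos = class_prototypes(features, labels)
    return sorted(candidate_orders,
                  key=lambda o: curriculum_score(o, protos), reverse=True)
```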
In this paper, we propose a backpropagation-free approach to robotic control through the neuro-cognitive computational framework of neural generative coding (NGC), designing an agent built completely from powerful predictive coding/processing circuits that embody principles of planning. Concretely, we craft an adaptive agent system, which we call active predictive coding (ActPC), that balances internally generated epistemic signals (meant to encourage intelligent exploration) with internally generated instrumental signals (meant to encourage goal-seeking behavior), ultimately learning to control several simulated robotic systems as well as a complex robotic arm, using a realistic robotics simulator (the Surreal Robotics Suite), on block-lifting and pick-and-place tasks. Notably, our experimental results demonstrate that the proposed ActPC agent performs well in the face of sparse (extrinsic) reward signals, and is competitive with or outperforms several powerful backprop-based RL approaches.
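The abstract describes balancing two internally generated signals but not their exact form. A common reading in the predictive-coding literature, sketched below, uses prediction error as the epistemic term and a goal-proximity value as the instrumental term; the weighting scheme and function names are illustrative assumptions, not ActPC's actual circuits.

```python
import numpy as np

def epistemic_signal(prediction, observation):
    """Epistemic drive (assumed form): magnitude of the prediction error,
    rewarding states the generative circuit cannot yet predict."""
    return float(np.sum((observation - prediction) ** 2))

def total_drive(prediction, observation, instrumental_value, alpha=0.5):
    """Blend exploration (epistemic) and goal-seeking (instrumental) terms;
    `instrumental_value` would come from a goal-directed circuit."""
    return alpha * epistemic_signal(prediction, observation) \
        + (1.0 - alpha) * instrumental_value
```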
Hyperparameter optimization in machine learning is often performed using naive techniques that only yield an approximate set of hyperparameters. Although techniques such as Bayesian optimization conduct an intelligent search over a given domain of hyperparameters, they do not guarantee an optimal solution. A major drawback of most of these approaches is the exponential growth of the search domain with the number of hyperparameters, which increases the computational cost and makes the methods slow. The hyperparameter optimization problem is inherently a bilevel optimization task, and some studies have attempted bilevel solution methodologies for it. However, these studies assume a unique set of model weights minimizing the training loss, which is generally violated by deep learning architectures. This paper discusses a gradient-based bilevel method that addresses these drawbacks for solving the hyperparameter optimization problem. The proposed method can handle continuous hyperparameters, for which we chose regularization hyperparameters in our experiments. The method guarantees convergence to the set of optimal hyperparameters, which we prove theoretically in this study. The idea is based on approximating the lower-level optimal value function using Gaussian process regression. As a result, the bilevel problem is reduced to a single-level constrained optimization task, which is solved using the augmented Lagrangian method. We conducted an extensive computational study on the MNIST and CIFAR-10 datasets with multilayer perceptron and LeNet architectures to corroborate the efficiency of the method. A comparative study against grid search, random search, Bayesian optimization, and Hyperband shows that the proposed algorithm converges with less computation and leads to models that generalize better on the test set.
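To make the reduction concrete: if f is the validation loss, g the training loss, and v(lam) = min_w g(w, lam) the lower-level optimal value function, replacing v with a GP surrogate turns the bilevel problem into min f(w, lam) subject to g(w, lam) <= v_hat(lam). The sketch below runs this on a toy ridge-like problem with a bare-bones augmented Lagrangian loop; the objectives and all names are illustrative, not the paper's implementation.

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Toy inner problem: training loss g(w, lam) = (w - 1)^2 + lam * w^2.
g = lambda w, lam: (w - 1.0) ** 2 + lam * w ** 2
# Upper-level (validation) objective, toy stand-in.
f = lambda w, lam: (w - 0.8) ** 2

# Step 1: sample the lower-level optimal value v(lam) = min_w g(w, lam)
# at a few hyperparameter settings and fit a GP surrogate v_hat.
lam_grid = np.array([0.01, 0.1, 0.5, 1.0, 2.0])
v_samples = np.array([min(g(w, l) for w in np.linspace(-2, 2, 2001))
                      for l in lam_grid])
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-6)
gp.fit(lam_grid.reshape(-1, 1), v_samples)
v_hat = lambda lam: float(gp.predict(np.array([[lam]]))[0])

# Step 2: single-level reformulation
#   min_{w, lam} f(w, lam)  s.t.  g(w, lam) - v_hat(lam) <= 0
# solved with a bare-bones augmented Lagrangian loop.
def aug_lag(x, mu, rho):
    w, lam = x
    c = g(w, lam) - v_hat(lam)                   # want c <= 0
    return f(w, lam) + (rho / 2.0) * max(0.0, mu / rho + c) ** 2

mu, rho, x = 0.0, 10.0, np.array([0.0, 0.5])
for _ in range(15):
    x = minimize(lambda z: aug_lag(z, mu, rho), x, method="Nelder-Mead").x
    mu = max(0.0, mu + rho * (g(x[0], x[1]) - v_hat(x[1])))
print("approx. optimal (w, lam):", x)
```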
Session-based recommender systems capture a user's short-term interest within a session. The session context, i.e., a user's high-level interest or intent within a session, is not explicitly given in most datasets, and implicitly inferring the session context as an aggregation of item-level attributes is crude. In this paper, we propose ISCON, which implicitly contextualizes sessions. ISCON first generates implicit contexts for sessions by building a session information graph, learning graph embeddings, and clustering to assign sessions to contexts. ISCON then trains a session-context predictor and uses the predicted contexts' embeddings to enhance next-item prediction accuracy. Experiments on four datasets show that ISCON achieves superior next-item prediction accuracy over state-of-the-art models. A case study of ISCON on the Reddit dataset confirms that the assigned session contexts are unique and meaningful.
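The abstract leaves the context-generation pipeline abstract. The snippet below sketches the embed-then-cluster step with off-the-shelf k-means, where each session is represented by a precomputed graph embedding; the embedding source and the number of contexts are assumptions, not ISCON's actual choices.

```python
import numpy as np
from sklearn.cluster import KMeans

def assign_session_contexts(session_embeddings, n_contexts=10, seed=0):
    """Cluster session embeddings (S, D) into implicit contexts.
    Returns per-session context IDs and the context (centroid) embeddings
    that a downstream context predictor could be trained against."""
    km = KMeans(n_clusters=n_contexts, n_init=10, random_state=seed)
    context_ids = km.fit_predict(session_embeddings)
    return context_ids, km.cluster_centers_

# Usage with random stand-in embeddings (real ones would come from a
# graph-embedding step over the session information graph):
emb = np.random.default_rng(0).normal(size=(500, 64))
ctx_ids, ctx_emb = assign_session_contexts(emb)
```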
Rough sets over up-directed relations were introduced and studied by the present author in an earlier paper. In this research, she extends that work in two distinct granular directions, with surprising algebraic semantics. The granules are based on ideas of generalized closure under up-directedness, possibly understood as a form of weak consequence. This yields approximation operators that satisfy cautious monotony, while pi-groupoidal approximations (which additionally involve strategic choice and algebraic operators) have better properties. This research is motivated primarily by the problem of conceptual structures from the perspective of distributed cognition, by real and virtual classroom learning contexts, and by student-centered teaching. Rough clustering techniques for datasets involving up-directed relations (such as Sentinel project image data) are additionally proposed. This research is expected to find important theoretical and practical applications in the related domains.
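For orientation, granular rough approximations are typically defined from a collection of granules $\mathcal{G}$ over a universe $S$; the textbook pattern, which the abstract's operators presumably refine, is the following (this is the standard form, not necessarily the paper's exact definition):

```latex
A^{l} = \bigcup \{ g \in \mathcal{G} : g \subseteq A \}, \qquad
A^{u} = \bigcup \{ g \in \mathcal{G} : g \cap A \neq \emptyset \},
\quad \text{for } A \subseteq S .
```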
Single-subject mapping of resting-state brain functional activity to non-imaging phenotypes is a major goal of neuroimaging. The vast majority of learning approaches applied today rely either on static representations or on short-term temporal correlations. This is at odds with the nature of brain activity, which is dynamic and exhibits both short- and long-range dependencies. Furthermore, new sophisticated deep learning approaches have been developed and validated on single tasks/datasets. Applying these models to studies with different targets typically requires exhaustive hyperparameter search, model engineering, and trial and error to obtain competitive results against simpler linear models. This, in turn, limits their adoption and hinders fair benchmarking in a rapidly developing field of research. To this end, we propose fMRI-S4, a versatile deep learning model for classifying phenotypes and psychiatric disorders from the timecourses of resting-state functional magnetic resonance imaging scans. fMRI-S4 captures short- and long-range temporal dependencies in the signal using 1D convolutions and the recently introduced state-space model S4. The proposed architecture is lightweight, sample-efficient, and robust across tasks/datasets. We validate fMRI-S4 on the tasks of diagnosing major depressive disorder (MDD) and autism spectrum disorder (ASD) and on sex classification, using three multi-site rs-fMRI datasets. We show that fMRI-S4 outperforms existing methods on all three tasks and can be trained as a plug-and-play model without special hyperparameter tuning for each setting.
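The abstract names the two building blocks but not how they are wired. A minimal sketch follows, stacking a 1D convolution with a simplified diagonal state-space recurrence (a stand-in for the full S4 layer, which uses a far more careful parameterization and a fast convolutional view) over an ROI-timecourse input; shapes and layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class DiagonalSSM(nn.Module):
    """Toy diagonal state-space layer: x_{t+1} = A x_t + B u_t, y_t = C x_t.
    A stand-in for S4, which parameterizes A far more carefully."""
    def __init__(self, dim, state=16):
        super().__init__()
        self.log_a = nn.Parameter(torch.full((dim, state), -0.5))
        self.B = nn.Parameter(torch.randn(dim, state) * 0.1)
        self.C = nn.Parameter(torch.randn(dim, state) * 0.1)

    def forward(self, u):                      # u: (batch, time, dim)
        A = torch.exp(-torch.exp(self.log_a))  # stable per-channel diagonal
        x = torch.zeros(u.size(0), u.size(2), self.B.size(1), device=u.device)
        ys = []
        for t in range(u.size(1)):             # slow explicit recurrence
            x = A * x + self.B * u[:, t].unsqueeze(-1)
            ys.append((self.C * x).sum(-1))
        return torch.stack(ys, dim=1)

class TimecourseClassifier(nn.Module):
    """Conv front-end + SSM + pooled linear head (illustrative sizes)."""
    def __init__(self, n_rois=100, hidden=64, n_classes=2):
        super().__init__()
        self.conv = nn.Conv1d(n_rois, hidden, kernel_size=5, padding=2)
        self.ssm = DiagonalSSM(hidden)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):                      # x: (batch, time, n_rois)
        h = self.conv(x.transpose(1, 2)).transpose(1, 2).relu()
        h = self.ssm(h)
        return self.head(h.mean(dim=1))        # mean-pool over time
```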