在本文中,我们分享了我们努力建立能够翻译一千多种语言的实用机器翻译(MT)系统的发现。我们在三个研究领域中描述了结果:(i)通过利用半监督预训练的语言识别和开发数据驱动的过滤技术来构建1500多种语言的清洁,网挖数据集; (ii)通过利用大规模的多语言模型来开发用于服务不足的语言的实用MT模型,该模型训练了有监督的并行数据,以使用100多种高资源语言和单语言数据集,以增加1000多种语言; (iii)研究这些语言的评估指标的局限性,并对我们MT模型的输出进行定性分析,突出显示了这些类型模型的几种频繁误差模式。我们希望我们的工作为旨在为当前研究的语言构建MT系统的从业者提供有用的见解,并突出显示可以补充Data-Sparse设置中大量多语言模型的弱点的研究方向。
translated by 谷歌翻译
Neural Machine Translation (NMT) is an end-to-end learning approach for automated translation, with the potential to overcome many of the weaknesses of conventional phrase-based translation systems. Unfortunately, NMT systems are known to be computationally expensive both in training and in translation inference -sometimes prohibitively so in the case of very large data sets and large models. Several authors have also charged that NMT systems lack robustness, particularly when input sentences contain rare words. These issues have hindered NMT's use in practical deployments and services, where both accuracy and speed are essential. In this work, we present GNMT, Google's Neural Machine Translation system, which attempts to address many of these issues. Our model consists of a deep LSTM network with 8 encoder and 8 decoder layers using residual connections as well as attention connections from the decoder network to the encoder. To improve parallelism and therefore decrease training time, our attention mechanism connects the bottom layer of the decoder to the top layer of the encoder. To accelerate the final translation speed, we employ low-precision arithmetic during inference computations. To improve handling of rare words, we divide words into a limited set of common sub-word units ("wordpieces") for both input and output. This method provides a good balance between the flexibility of "character"-delimited models and the efficiency of "word"-delimited models, naturally handles translation of rare words, and ultimately improves the overall accuracy of the system. Our beam search technique employs a length-normalization procedure and uses a coverage penalty, which encourages generation of an output sentence that is most likely to cover all the words in the source sentence. To directly optimize the translation BLEU scores, we consider refining the models by using reinforcement learning, but we found that the improvement in the BLEU scores did not reflect in the human evaluation. On the WMT'14 English-to-French and English-to-German benchmarks, GNMT achieves competitive results to state-of-the-art. Using a human side-by-side evaluation on a set of isolated simple sentences, it reduces translation errors by an average of 60% compared to Google's phrase-based production system.
translated by 谷歌翻译
Creating compelling captions for data visualizations has been a longstanding challenge. Visualization researchers are typically untrained in journalistic reporting and hence the captions that are placed below data visualizations tend to be not overly engaging and rather just stick to basic observations about the data. In this work we explore the opportunities offered by the newly emerging crop of large language models (LLM) which use sophisticated deep learning technology to produce human-like prose. We ask, can these powerful software devices be purposed to produce engaging captions for generic data visualizations like a scatterplot. It turns out that the key challenge lies in designing the most effective prompt for the LLM, a task called prompt engineering. We report on first experiments using the popular LLM GPT-3 and deliver some promising results.
translated by 谷歌翻译
We present Mu$^{2}$SLAM, a multilingual sequence-to-sequence model pre-trained jointly on unlabeled speech, unlabeled text and supervised data spanning Automatic Speech Recognition (ASR), Automatic Speech Translation (AST) and Machine Translation (MT), in over 100 languages. By leveraging a quantized representation of speech as a target, Mu$^{2}$SLAM trains the speech-text models with a sequence-to-sequence masked denoising objective similar to T5 on the decoder and a masked language modeling (MLM) objective on the encoder, for both unlabeled speech and text, while utilizing the supervised tasks to improve cross-lingual and cross-modal representation alignment within the model. On CoVoST AST, Mu$^{2}$SLAM establishes a new state-of-the-art for models trained on public datasets, improving on xx-en translation over the previous best by 1.9 BLEU points and on en-xx translation by 1.1 BLEU points. On Voxpopuli ASR, our model matches the performance of an mSLAM model fine-tuned with an RNN-T decoder, despite using a relatively weaker sequence-to-sequence architecture. On text understanding tasks, our model improves by more than 6\% over mSLAM on XNLI, getting closer to the performance of mT5 models of comparable capacity on XNLI and TydiQA, paving the way towards a single model for all speech and text understanding tasks.
translated by 谷歌翻译
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
translated by 谷歌翻译
We report on experiments for the fingerprint modality conducted during the First BioSecure Residential Workshop. Two reference systems for fingerprint verification have been tested together with two additional non-reference systems. These systems follow different approaches of fingerprint processing and are discussed in detail. Fusion experiments I volving different combinations of the available systems are presented. The experimental results show that the best recognition strategy involves both minutiae-based and correlation-based measurements. Regarding the fusion experiments, the best relative improvement is obtained when fusing systems that are based on heterogeneous strategies for feature extraction and/or matching. The best combinations of two/three/four systems always include the best individual systems whereas the best verification performance is obtained when combining all the available systems.
translated by 谷歌翻译
Artificial Intelligence (AI) is having a tremendous impact across most areas of science. Applications of AI in healthcare have the potential to improve our ability to detect, diagnose, prognose, and intervene on human disease. For AI models to be used clinically, they need to be made safe, reproducible and robust, and the underlying software framework must be aware of the particularities (e.g. geometry, physiology, physics) of medical data being processed. This work introduces MONAI, a freely available, community-supported, and consortium-led PyTorch-based framework for deep learning in healthcare. MONAI extends PyTorch to support medical data, with a particular focus on imaging, and provide purpose-specific AI model architectures, transformations and utilities that streamline the development and deployment of medical AI models. MONAI follows best practices for software-development, providing an easy-to-use, robust, well-documented, and well-tested software framework. MONAI preserves the simple, additive, and compositional approach of its underlying PyTorch libraries. MONAI is being used by and receiving contributions from research, clinical and industrial teams from around the world, who are pursuing applications spanning nearly every aspect of healthcare.
translated by 谷歌翻译
在许多现实世界和高影响力决策设置中,从分类过程中说明预测性不确定性的神经网络的概率预测至关重要。但是,实际上,大多数数据集经过非稳定神经网络的培训,默认情况下,这些神经网络不会捕获这种固有的不确定性。这个众所周知的问题导致了事后校准程序的开发,例如PLATT缩放(Logistic),等渗和β校准,这将得分转化为校准良好的经验概率。校准方法的合理替代方法是使用贝叶斯神经网络,该网络直接建模预测分布。尽管它们已应用于图像和文本数据集,但在表格和小型数据制度中的采用有限。在本文中,我们证明了与校准神经网络相比,贝叶斯神经网络在各种数据集中进行实验,从而产生竞争性能。
translated by 谷歌翻译
在汽车行业中,标记数据的匮乏是典型的挑战。注释的时间序列测量需要固体域知识和深入的探索性数据分析,这意味着高标签工作。传统的主动学习(AL)通过根据估计的分类概率积极查询最有用的实例来解决此问题,并在迭代中重新审视该模型。但是,学习效率强烈依赖于初始模型,从而导致初始数据集和查询编号的大小之间的权衡。本文提出了一个新颖的几杆学习(FSL)基于AL框架,该框架通过将原型网络(Protonet)纳入AL迭代来解决权衡问题。一方面,结果表明了对初始模型的鲁棒性,另一方面,通过在每种迭代中的支持设置的主动选择方面的学习效率。该框架已在UCI HAR/HAPT数据集​​和现实世界制动操纵数据集上进行了验证。学习绩效在两个数据集上都显着超过了传统的AL算法,分别以10%和5%的标签工作实现了90%的分类精度。
translated by 谷歌翻译
非平稳来源分离是具有许多不同方法的盲源分离的一个完善的分支。但是,对于这些方法都没有大样本结果可用。为了弥合这一差距,我们开发了NSS-JD的大样本理论,NSS-JD是一种基于块构成协方差矩阵的联合对角线化的非平稳源分离方法。我们在独立高斯非平稳源信号的瞬时线性混合模型以及一组非常通用的假设下工作:除了有界条件外,我们做出的唯一假设是,源表现出有限的依赖性,其方差函数足够差异,足以差异为渐近可分离。在以前的条件下显示,未混合估计器及其在标准平方根速率下的限制高斯分布的一致性显示。模拟实验用于验证理论结果并研究块长度对分离的影响。
translated by 谷歌翻译