AI正在经历范式转变,随着模型的兴起(例如Bert,Dall-E,GPT-3),这些模型经过大规模的数据训练,并且可以适应广泛的下游任务。我们称这些模型基础模型来强调其至关重要但不完整的特征。该报告提供了基础模型的机会和风险的详尽说明,包括其功能(例如语言,愿景,机器人技术,推理,人类互动)和技术原则(例如,模型架构,培训程序,数据,系统,安全,安全性,评估,理论)对其应用(例如法律,医疗保健,教育)和社会影响(例如不平等,滥用,经济和环境影响,法律和道德考虑)。尽管基础模型基于标准的深度学习和转移学习,但它们的规模导致了新的新兴能力,以及它们在许多任务中的有效性都激发了同质化。同质化提供了强大的杠杆作用,但要求谨慎,因为基础模型的缺陷均由下游的所有适应模型继承。尽管即将广泛地部署基础模型,但我们目前对它们的工作方式,失败以及由于其新兴属性的影响而缺乏清晰的了解。为了解决这些问题,我们认为基础模型的许多批判性研究都需要与他们的基本社会技术性质相称。
translated by 谷歌翻译
虽然胸部X射线解释的深度学习模型通常在自动放射学报告贴标程序生成的标签上培训,但还没有系统地研究了报告标签的改进对胸部X射线分类模型的性能的影响。我们首先比较Chexpert,Chexbert和VisualChexbert贴标程序从放射学报告中提取精确的胸X射线图像标签的任务,报告了VisualChexbert标签人优于Chexpert和Chexbert贴标者。接下来,在培训图像分类模型之后,使用不同放射学报告贴标程序的标签在胸部X射线的一个最大数据集之一上,我们表明,VisualChexbert标签器培训的图像分类模型从VisualChexbert贴标程序达到了从标签培训的图像分类模型Chexpert和Chexbert贴标员。我们的工作表明,最近的放射学报告标签的改进可以转化为更高的表演胸部X射线分类模型的发展。
translated by 谷歌翻译
Multiple studies have focused on predicting the prospective popularity of an online document as a whole, without paying attention to the contributions of its individual parts. We introduce the task of proactively forecasting popularities of sentences within online news documents solely utilizing their natural language content. We model sentence-specific popularity forecasting as a sequence regression task. For training our models, we curate InfoPop, the first dataset containing popularity labels for over 1.7 million sentences from over 50,000 online news documents. To the best of our knowledge, this is the first dataset automatically created using streams of incoming search engine queries to generate sentence-level popularity annotations. We propose a novel transfer learning approach involving sentence salience prediction as an auxiliary task. Our proposed technique coupled with a BERT-based neural model exceeds nDCG values of 0.8 for proactive sentence-specific popularity forecasting. Notably, our study presents a non-trivial takeaway: though popularity and salience are different concepts, transfer learning from salience prediction enhances popularity forecasting. We release InfoPop and make our code publicly available: https://github.com/sayarghoshroy/InfoPopularity
translated by 谷歌翻译
Speech systems are sensitive to accent variations. This is especially challenging in the Indian context, with an abundance of languages but a dearth of linguistic studies characterising pronunciation variations. The growing number of L2 English speakers in India reinforces the need to study accents and L1-L2 interactions. We investigate the accents of Indian English (IE) speakers and report in detail our observations, both specific and common to all regions. In particular, we observe the phonemic variations and phonotactics occurring in the speakers' native languages and apply this to their English pronunciations. We demonstrate the influence of 18 Indian languages on IE by comparing the native language pronunciations with IE pronunciations obtained jointly from existing literature studies and phonetically annotated speech of 80 speakers. Consequently, we are able to validate the intuitions of Indian language influences on IE pronunciations by justifying pronunciation rules from the perspective of Indian language phonology. We obtain a comprehensive description in terms of universal and region-specific characteristics of IE, which facilitates accent conversion and adaptation of existing ASR and TTS systems to different Indian accents.
translated by 谷歌翻译
In molecular research, simulation \& design of molecules are key areas with significant implications for drug development, material science, and other fields. Current classical computational power falls inadequate to simulate any more than small molecules, let alone protein chains on hundreds of peptide. Therefore these experiment are done physically in wet-lab, but it takes a lot of time \& not possible to examine every molecule due to the size of the search area, tens of billions of dollars are spent every year in these research experiments. Molecule simulation \& design has lately advanced significantly by machine learning models, A fresh perspective on the issue of chemical synthesis is provided by deep generative models for graph-structured data. By optimising differentiable models that produce molecular graphs directly, it is feasible to avoid costly search techniques in the discrete and huge space of chemical structures. But these models also suffer from computational limitations when dimensions become huge and consume huge amount of resources. Quantum Generative machine learning in recent years have shown some empirical results promising significant advantages over classical counterparts.
translated by 谷歌翻译
Developing and least developed countries face the dire challenge of ensuring that each child in their country receives required doses of vaccination, adequate nutrition and proper medication. International agencies such as UNICEF, WHO and WFP, among other organizations, strive to find innovative solutions to determine which child has received the benefits and which have not. Biometric recognition systems have been sought out to help solve this problem. To that end, this report establishes a baseline accuracy of a commercial contactless palmprint recognition system that may be deployed for recognizing children in the age group of one to five years old. On a database of contactless palmprint images of one thousand unique palms from 500 children, we establish SOTA authentication accuracy of 90.85% @ FAR of 0.01%, rank-1 identification accuracy of 99.0% (closed set), and FPIR=0.01 @ FNIR=0.3 for open-set identification using PalmMobile SDK from Armatura.
translated by 谷歌翻译
Selective classification involves identifying the subset of test samples that a model can classify with high accuracy, and is important for applications such as automated medical diagnosis. We argue that this capability of identifying uncertain samples is valuable for training classifiers as well, with the aim of building more accurate classifiers. We unify these dual roles by training a single auxiliary meta-network to output an importance weight as a function of the instance. This measure is used at train time to reweight training data, and at test-time to rank test instances for selective classification. A second, key component of our proposal is the meta-objective of minimizing dropout variance (the variance of classifier output when subjected to random weight dropout) for training the metanetwork. We train the classifier together with its metanetwork using a nested objective of minimizing classifier loss on training data and meta-loss on a separate meta-training dataset. We outperform current state-of-the-art on selective classification by substantial margins--for instance, upto 1.9% AUC and 2% accuracy on a real-world diabetic retinopathy dataset. Finally, our meta-learning framework extends naturally to unsupervised domain adaptation, given our unsupervised variance minimization meta-objective. We show cumulative absolute gains of 3.4% / 3.3% accuracy and AUC over the other baselines in domain shift settings on the Retinopathy dataset using unsupervised domain adaptation.
translated by 谷歌翻译
Many real-world learning scenarios face the challenge of slow concept drift, where data distributions change gradually over time. In this setting, we pose the problem of learning temporally sensitive importance weights for training data, in order to optimize predictive accuracy. We propose a class of temporal reweighting functions that can capture multiple timescales of change in the data, as well as instance-specific characteristics. We formulate a bi-level optimization criterion, and an associated meta-learning algorithm, by which these weights can be learned. In particular, our formulation trains an auxiliary network to output weights as a function of training instances, thereby compactly representing the instance weights. We validate our temporal reweighting scheme on a large real-world dataset of 39M images spread over a 9 year period. Our extensive experiments demonstrate the necessity of instance-based temporal reweighting in the dataset, and achieve significant improvements to classical batch-learning approaches. Further, our proposal easily generalizes to a streaming setting and shows significant gains compared to recent continual learning methods.
translated by 谷歌翻译
Intelligently extracting and linking complex scientific information from unstructured text is a challenging endeavor particularly for those inexperienced with natural language processing. Here, we present a simple sequence-to-sequence approach to joint named entity recognition and relation extraction for complex hierarchical information in scientific text. The approach leverages a pre-trained large language model (LLM), GPT-3, that is fine-tuned on approximately 500 pairs of prompts (inputs) and completions (outputs). Information is extracted either from single sentences or across sentences in abstracts/passages, and the output can be returned as simple English sentences or a more structured format, such as a list of JSON objects. We demonstrate that LLMs trained in this way are capable of accurately extracting useful records of complex scientific knowledge for three representative tasks in materials chemistry: linking dopants with their host materials, cataloging metal-organic frameworks, and general chemistry/phase/morphology/application information extraction. This approach represents a simple, accessible, and highly-flexible route to obtaining large databases of structured knowledge extracted from unstructured text. An online demo is available at http://www.matscholar.com/info-extraction.
translated by 谷歌翻译
Representing and reasoning about uncertainty is crucial for autonomous agents acting in partially observable environments with noisy sensors. Partially observable Markov decision processes (POMDPs) serve as a general framework for representing problems in which uncertainty is an important factor. Online sample-based POMDP methods have emerged as efficient approaches to solving large POMDPs and have been shown to extend to continuous domains. However, these solutions struggle to find long-horizon plans in problems with significant uncertainty. Exploration heuristics can help guide planning, but many real-world settings contain significant task-irrelevant uncertainty that might distract from the task objective. In this paper, we propose STRUG, an online POMDP solver capable of handling domains that require long-horizon planning with significant task-relevant and task-irrelevant uncertainty. We demonstrate our solution on several temporally extended versions of toy POMDP problems as well as robotic manipulation of articulated objects using a neural perception frontend to construct a distribution of possible models. Our results show that STRUG outperforms the current sample-based online POMDP solvers on several tasks.
translated by 谷歌翻译