Recent work in large language modeling (LLMs) has used fine-tuning to align outputs with the preferences of a prototypical user. This work assumes that human preferences are static and homogeneous across individuals, so that aligning to a a single "generic" user will confer more general alignment. Here, we embrace the heterogeneity of human preferences to consider a different challenge: how might a machine help people with diverse views find agreement? We fine-tune a 70 billion parameter LLM to generate statements that maximize the expected approval for a group of people with potentially diverse opinions. Human participants provide written opinions on thousands of questions touching on moral and political issues (e.g., "should we raise taxes on the rich?"), and rate the LLM's generated candidate consensus statements for agreement and quality. A reward model is then trained to predict individual preferences, enabling it to quantify and rank consensus statements in terms of their appeal to the overall group, defined according to different aggregation (social welfare) functions. The model produces consensus statements that are preferred by human users over those from prompted LLMs (>70%) and significantly outperforms a tight fine-tuned baseline that lacks the final ranking step. Further, our best model's consensus statements are preferred over the best human-generated opinions (>65%). We find that when we silently constructed consensus statements from only a subset of group members, those who were excluded were more likely to dissent, revealing the sensitivity of the consensus to individual contributions. These results highlight the potential to use LLMs to help groups of humans align their values with one another.
translated by 谷歌翻译
我们提出了一个基于最小描述长度(MDL)原理的多任务加固学习的新颖框架。在我们称MDL-Control(MDL-C)的这种方法中,代理商在面临的任务中学习了共同的结构,然后将其提炼成更简单的表示,从而促进更快的收敛性和对新任务的概括。这样一来,MDL-C自然将适应性适应与任务分布的认知不确定性平衡。我们通过MDL原理与贝叶斯推论之间的正式联系来激励MDL-C,得出理论性能保证,并在离散和高维连续控制任务上证明了MDL-C的经验有效性。从经验上讲,该框架用于修改现有的策略优化方法,并在离散和高维连续控制问题中改善其多任务性能。
translated by 谷歌翻译
人工智能系统越来越涉及持续学习,以实现在系统培训期间不遇到的一般情况下的灵活性。与自治系统的人类互动广泛研究,但在系统积极学习的同时,研究发生了迄今为止发生的互动,并且可以在几分钟内明显改变其行为。在这项试验研究中,我们调查如何在代理商发展能力时如何发展人类和不断学习的预测代理人之间的互动。此外,我们可以比较两个不同的代理架构来评估代理设计中的代表性选择如何影响人工代理交互。我们开发虚拟现实环境和基于时间的预测任务,其中从增强学习(RL)算法增强人类预测中学到的预测。我们评估参与者在此任务中的性能和行为如何在代理类型中不同,使用定量和定性分析。我们的研究结果表明,系统的人类信任可能受到与代理人的早期互动的影响,并且反过来的信任会影响战略行为,但试点研究的限制排除了任何结论的声明。我们将信任作为互动的关键特征,以考虑基于RL的技术在考虑基于RL的技术时,并对这项研究进行了几项建议,以准备更大规模的调查。本文的视频摘要可以在https://youtu.be/ovyjdnbqtwq找到。
translated by 谷歌翻译
There are multiple scales of abstraction from which we can describe the same image, depending on whether we are focusing on fine-grained details or a more global attribute of the image. In brain mapping, learning to automatically parse images to build representations of both small-scale features (e.g., the presence of cells or blood vessels) and global properties of an image (e.g., which brain region the image comes from) is a crucial and open challenge. However, most existing datasets and benchmarks for neuroanatomy consider only a single downstream task at a time. To bridge this gap, we introduce a new dataset, annotations, and multiple downstream tasks that provide diverse ways to readout information about brain structure and architecture from the same image. Our multi-task neuroimaging benchmark (MTNeuro) is built on volumetric, micrometer-resolution X-ray microtomography images spanning a large thalamocortical section of mouse brain, encompassing multiple cortical and subcortical regions. We generated a number of different prediction challenges and evaluated several supervised and self-supervised models for brain-region prediction and pixel-level semantic segmentation of microstructures. Our experiments not only highlight the rich heterogeneity of this dataset, but also provide insights into how self-supervised approaches can be used to learn representations that capture multiple attributes of a single image and perform well on a variety of downstream tasks. Datasets, code, and pre-trained baseline models are provided at: https://mtneuro.github.io/ .
translated by 谷歌翻译
Artificial Intelligence (AI) has become commonplace to solve routine everyday tasks. Because of the exponential growth in medical imaging data volume and complexity, the workload on radiologists is steadily increasing. We project that the gap between the number of imaging exams and the number of expert radiologist readers required to cover this increase will continue to expand, consequently introducing a demand for AI-based tools that improve the efficiency with which radiologists can comfortably interpret these exams. AI has been shown to improve efficiency in medical-image generation, processing, and interpretation, and a variety of such AI models have been developed across research labs worldwide. However, very few of these, if any, find their way into routine clinical use, a discrepancy that reflects the divide between AI research and successful AI translation. To address the barrier to clinical deployment, we have formed MONAI Consortium, an open-source community which is building standards for AI deployment in healthcare institutions, and developing tools and infrastructure to facilitate their implementation. This report represents several years of weekly discussions and hands-on problem solving experience by groups of industry experts and clinicians in the MONAI Consortium. We identify barriers between AI-model development in research labs and subsequent clinical deployment and propose solutions. Our report provides guidance on processes which take an imaging AI model from development to clinical implementation in a healthcare institution. We discuss various AI integration points in a clinical Radiology workflow. We also present a taxonomy of Radiology AI use-cases. Through this report, we intend to educate the stakeholders in healthcare and AI (AI researchers, radiologists, imaging informaticists, and regulators) about cross-disciplinary challenges and possible solutions.
translated by 谷歌翻译
In unstructured environments, robots run the risk of unexpected collisions. How well they react to these events is determined by how transparent they are to collisions. Transparency is affected by structural properties as well as sensing and control architectures. In this paper, we propose the collision reflex metric as a way to formally quantify transparency. It is defined as the total impulse transferred in collision, which determines the collision mitigation capabilities of a closed-loop robotic system taking into account structure, sensing, and control. We analyze the effect of motor scaling, stiffness, and configuration on the collision reflex of a system using an analytical model. Physical experiments using the move-until-touch behavior are conducted to compare the collision reflex of direct-drive and quasi-direct-drive actuators and robotic hands (Schunk WSG-50 and Dexterous DDHand.) For transparent systems, we see a counter-intuitive trend: the impulse may be lower at higher pre-impact velocities.
translated by 谷歌翻译
Deep neural networks (DNNs) detect patterns in data and have shown versatility and strong performance in many computer vision applications. However, DNNs alone are susceptible to obvious mistakes that violate simple, common sense concepts and are limited in their ability to use explicit knowledge to guide their search and decision making. While overall DNN performance metrics may be good, these obvious errors, coupled with a lack of explainability, have prevented widespread adoption for crucial tasks such as medical image analysis. The purpose of this paper is to introduce SimpleMind, an open-source software framework for Cognitive AI focused on medical image understanding. It allows creation of a knowledge base that describes expected characteristics and relationships between image objects in an intuitive human-readable form. The SimpleMind framework brings thinking to DNNs by: (1) providing methods for reasoning with the knowledge base about image content, such as spatial inferencing and conditional reasoning to check DNN outputs; (2) applying process knowledge, in the form of general-purpose software agents, that are chained together to accomplish image preprocessing, DNN prediction, and result post-processing, and (3) performing automatic co-optimization of all knowledge base parameters to adapt agents to specific problems. SimpleMind enables reasoning on multiple detected objects to ensure consistency, providing cross checking between DNN outputs. This machine reasoning improves the reliability and trustworthiness of DNNs through an interpretable model and explainable decisions. Example applications are provided that demonstrate how SimpleMind supports and improves deep neural networks by embedding them within a Cognitive AI framework.
translated by 谷歌翻译
The NASA Astrophysics Data System (ADS) is an essential tool for researchers that allows them to explore the astronomy and astrophysics scientific literature, but it has yet to exploit recent advances in natural language processing. At ADASS 2021, we introduced astroBERT, a machine learning language model tailored to the text used in astronomy papers in ADS. In this work we: - announce the first public release of the astroBERT language model; - show how astroBERT improves over existing public language models on astrophysics specific tasks; - and detail how ADS plans to harness the unique structure of scientific papers, the citation graph and citation context, to further improve astroBERT.
translated by 谷歌翻译
The meaningful use of electronic health records (EHR) continues to progress in the digital era with clinical decision support systems augmented by artificial intelligence. A priority in improving provider experience is to overcome information overload and reduce the cognitive burden so fewer medical errors and cognitive biases are introduced during patient care. One major type of medical error is diagnostic error due to systematic or predictable errors in judgment that rely on heuristics. The potential for clinical natural language processing (cNLP) to model diagnostic reasoning in humans with forward reasoning from data to diagnosis and potentially reduce the cognitive burden and medical error has not been investigated. Existing tasks to advance the science in cNLP have largely focused on information extraction and named entity recognition through classification tasks. We introduce a novel suite of tasks coined as Diagnostic Reasoning Benchmarks, DR.BENCH, as a new benchmark for developing and evaluating cNLP models with clinical diagnostic reasoning ability. The suite includes six tasks from ten publicly available datasets addressing clinical text understanding, medical knowledge reasoning, and diagnosis generation. DR.BENCH is the first clinical suite of tasks designed to be a natural language generation framework to evaluate pre-trained language models. Experiments with state-of-the-art pre-trained generative language models using large general domain models and models that were continually trained on a medical corpus demonstrate opportunities for improvement when evaluated in DR. BENCH. We share DR. BENCH as a publicly available GitLab repository with a systematic approach to load and evaluate models for the cNLP community.
translated by 谷歌翻译
随着超维数据的大数据分析的最新激增,对机器学习应用程序的降低技术的兴趣重新引起了人们的兴趣。为了使这些方法提高绩效提高并了解基础数据,需要确定适当的指标。此步骤通常被忽略,通常会选择指标,而无需考虑数据的基本几何形状。在本文中,我们提出了一种将弹性指标纳入T分布的随机邻居嵌入(T-SNE)和均匀的歧管近似和投影(UMAP)的方法。我们将方法应用于功能数据,该功能数据以旋转,参数化和比例为特征。如果这些属性被忽略,它们可能会导致不正确的分析和分类性能差。通过我们的方法,我们证明了三个基准数据集(MPEG-7,CAR数据集和Themoor的平面数据集)的形状识别任务的提高,我们分别获得了0.77、0.95和1.00 F1分数。
translated by 谷歌翻译