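We provide a theoretical framework to study a phenomenon that we call one-shot generalization. This phenomenon refers to the ability of an algorithm to perform transfer learning within a single task, meaning that it correctly classifies a test point that has a single exemplar in the training set. We propose a simple data model and use it to study this phenomenon in two ways. First, we prove a non-asymptotic baseline: kernel methods based on nearest-neighbor classification cannot perform one-shot generalization, regardless of the choice of kernel and the size of the training set. Second, we show empirically that the most direct neural network architecture for our data model performs one-shot generalization almost perfectly. This stark difference leads us to believe that the one-shot generalization mechanism is partly responsible for the empirical success of neural networks.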
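As a rough illustration of the kind of baseline named in this abstract, the sketch below classifies a query by its nearest training point in the feature space induced by a kernel, using the standard identity d(x, y)^2 = k(x, x) - 2k(x, y) + k(y, y). The RBF kernel, the function names, and the toy data are assumptions chosen for illustration; the abstract does not specify the authors' exact formulation.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel; an illustrative choice, not taken from the paper."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def kernel_distance_sq(x, y, kernel):
    """Squared distance between x and y in the kernel's feature space."""
    return kernel(x, x) - 2.0 * kernel(x, y) + kernel(y, y)


def nearest_neighbor_predict(train_points, train_labels, query, kernel=rbf_kernel):
    """Return the label of the training point closest to the query."""
    best = min(
        range(len(train_points)),
        key=lambda i: kernel_distance_sq(train_points[i], query, kernel),
    )
    return train_labels[best]


# Hypothetical toy data: class "b" has a single training exemplar.
points = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
labels = ["a", "a", "b"]
prediction = nearest_neighbor_predict(points, labels, (4.5, 5.2))
```

Swapping in a different kernel only changes the distance function; the prediction rule itself stays the same.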
translated by 谷歌翻译
Knowledge distillation (KD) has gained a lot of attention in the field of model compression for edge devices thanks to its effectiveness in compressing large, powerful networks into smaller, lower-capacity models. Online distillation, in which the teacher and the student learn collaboratively, has also gained much interest due to its ability to improve the performance of the networks involved. The Kullback-Leibler (KL) divergence ensures proper knowledge transfer between the teacher and the student. However, most online KD techniques run into bottlenecks when there is a capacity gap between the networks. When the models are trained cooperatively and simultaneously, the KL distance becomes incapable of properly minimizing the gap between the teacher's and student's distributions. Alongside accuracy, critical edge device applications need well-calibrated compact networks. Confidence calibration provides a sensible way of obtaining trustworthy predictions. We propose BD-KD: Balancing of Divergences for online Knowledge Distillation. We show that adaptively balancing between the reverse and forward divergences shifts the focus of the training strategy to the compact student network without limiting the teacher network's learning process. We demonstrate that, by performing this balancing at the level of the student distillation loss, we improve both the accuracy and the calibration of the compact student network. We conducted extensive experiments using a variety of network architectures and show improvements on multiple datasets, including CIFAR-10, CIFAR-100, Tiny-ImageNet, and ImageNet. We illustrate the effectiveness of our approach through comprehensive comparisons and ablations with current state-of-the-art online and offline KD techniques.
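To make the forward and reverse divergences concrete, here is a minimal sketch that computes KL(teacher || student) and KL(student || teacher) from logits and mixes them with a fixed weight. The fixed weight `alpha`, the function names, and the example logits are assumptions for illustration only; the abstract describes an adaptive balancing scheme whose details are not given here, so this is not the BD-KD loss itself.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def balanced_kd_loss(teacher_logits, student_logits, alpha=0.5, temperature=1.0):
    """Weighted mix of forward and reverse KL (alpha is a hypothetical fixed weight)."""
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    forward = kl_divergence(teacher, student)  # KL(teacher || student)
    reverse = kl_divergence(student, teacher)  # KL(student || teacher)
    return alpha * forward + (1.0 - alpha) * reverse, forward, reverse


# Hypothetical logits: a confident teacher and a less confident student.
loss, forward, reverse = balanced_kd_loss([4.0, 0.0, 0.0], [1.0, 0.5, 0.0], alpha=0.3)
```

The two terms generally differ because KL divergence is asymmetric: the forward term penalizes the student for missing mass where the teacher is confident, while the reverse term penalizes the student for placing mass where the teacher does not.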
Objective: Imbalances of the electrolyte concentration levels in the body can lead to catastrophic consequences, but accurate and accessible measurements could improve patient outcomes. While blood tests provide accurate measurements, they are invasive and the laboratory analysis can be slow or inaccessible. In contrast, an electrocardiogram (ECG) is a widely adopted tool which is quick and simple to acquire. However, the problem of estimating continuous electrolyte concentrations directly from ECGs is not well-studied. We therefore investigate if regression methods can be used for accurate ECG-based prediction of electrolyte concentrations. Methods: We explore the use of deep neural networks (DNNs) for this task. We analyze the regression performance across four electrolytes, utilizing a novel dataset containing over 290000 ECGs. For improved understanding, we also study the full spectrum from continuous predictions to binary classification of extreme concentration levels. To enhance clinical usefulness, we finally extend to a probabilistic regression approach and evaluate different uncertainty estimates. Results: We find that the performance varies significantly between different electrolytes, which is clinically justified in the interplay of electrolytes and their manifestation in the ECG. We also compare the regression accuracy with that of traditional machine learning models, demonstrating superior performance of DNNs. Conclusion: Discretization can lead to good classification performance, but does not help solve the original problem of predicting continuous concentration levels. While probabilistic regression demonstrates potential practical usefulness, the uncertainty estimates are not particularly well-calibrated. Significance: Our study is a first step towards accurate and reliable ECG-based prediction of electrolyte concentration levels.
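One common way to set up probabilistic regression and check calibration, sketched here under stated assumptions, is to predict a Gaussian mean and standard deviation per sample, score it with the negative log-likelihood, and compare the empirical coverage of a nominal 95% interval against 0.95. The Gaussian form, the function names, and the numbers below are all hypothetical; the abstract does not say which probabilistic model or uncertainty estimates the authors used.

```python
import math


def gaussian_nll(y, mean, std):
    """Negative log-likelihood of y under a Gaussian N(mean, std^2)."""
    var = std ** 2
    return 0.5 * (math.log(2.0 * math.pi * var) + (y - mean) ** 2 / var)


def interval_coverage(targets, means, stds, z=1.96):
    """Fraction of targets inside mean +/- z * std (z=1.96 is a nominal 95% interval)."""
    hits = sum(1 for y, m, s in zip(targets, means, stds) if abs(y - m) <= z * s)
    return hits / len(targets)


# Made-up concentration values and predictions, for illustration only.
targets = [4.0, 4.5, 3.2, 5.1]
means = [4.1, 4.4, 4.0, 5.0]
stds = [0.2, 0.2, 0.2, 0.2]
coverage = interval_coverage(targets, means, stds)
```

If the predicted uncertainties were well calibrated, the coverage on a large held-out set would be close to the nominal level; a clearly lower value suggests overconfident predictions.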
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
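The removal or cancellation of noise has wide-ranging applications in imaging and acoustics. In everyday applications, denoising may even include generative aspects that are unfaithful to the ground truth. For scientific applications, however, denoising must reproduce the ground truth accurately. Here, we show how data can be denoised with a deep convolutional neural network so that weak signals emerge with quantitative accuracy. In particular, we study X-ray diffraction from crystalline materials. We demonstrate that weak signals arising from charge ordering, which are insignificant in the noisy data, become visible and accurate in the denoised data. This success is enabled by supervised training of a deep neural network on pairs of low- and high-noise data. In this way, the neural network learns the statistical properties of the noise. We demonstrate that using artificial noise (such as Poisson and Gaussian) does not yield such quantitatively accurate results. Our approach thus illustrates a practical strategy for noise filtering that can be applied to challenging acquisition problems.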
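Knowledge distillation (KD) is an effective tool for compressing deep classification models for edge devices. However, the performance of KD is affected by the large capacity gap between the teacher and student networks. Recent methods have resorted to a multiple teacher assistant (TA) setting for KD, which sequentially decreases the size of the teacher model to partially bridge the size gap between these models. This paper proposes a new technique called Curriculum Expert Selection for Knowledge Distillation to efficiently enhance the learning of a compact student under the capacity gap problem. The technique is built on the hypothesis that a student network should be guided gradually using a stratified teaching curriculum, because it learns easy (hard) data samples better from a lower (higher) capacity teacher network. Specifically, our method is a gradual TA-based KD technique that selects a single teacher per input image, based on a curriculum driven by the difficulty of classifying the image. In this work, we empirically verify our hypothesis, conduct rigorous experiments on the CIFAR-10, CIFAR-100, CINIC-10, and ImageNet datasets, and show improved accuracy on VGG-like models, ResNets, and WideResNets architectures.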
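Common data models solve many challenges of standardizing electronic health record (EHR) data, but they cannot integrate the resources needed for deep phenotyping. Open Biological and Biomedical Ontology (OBO) Foundry ontologies provide computable semantic representations of biological knowledge and enable the integration of a variety of biomedical data. However, mapping EHR data to OBO Foundry ontologies requires significant manual curation and domain expertise. We introduce a framework for mapping Observational Medical Outcomes Partnership (OMOP) standard vocabularies to OBO Foundry ontologies. Using this framework, we produced mappings for 92,367 conditions, 8,615 drug ingredients, and 10,673 measurement results. Domain experts verified the mapping accuracy, and when examined across 24 hospitals, the mappings covered 99% of conditions and drug ingredients and 68% of measurements. Finally, we demonstrate that OMOP2OBO mappings can help systematically identify undiagnosed rare disease patients who might benefit from genetic testing.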
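Fine-tuning BERT-based models is resource-intensive in memory, computation, and time. While many prior works aim to improve inference efficiency through compression techniques such as pruning, they do not explicitly address the computational challenges of training on downstream tasks. We introduce Learner modules and priming, novel methods for fine-tuning that exploit the overparameterization of pre-trained language models to gain benefits in convergence speed and resource utilization. Learner modules navigate the double bind of 1) training efficiently by fine-tuning a subset of parameters, and 2) training effectively by ensuring quick convergence and high metric scores. Our results on DistilBERT show that Learners perform on par with or surpass the baselines. Learners train 7x fewer parameters than state-of-the-art methods on GLUE. On CoLA, Learners fine-tune 20% faster and have significantly lower resource utilization.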
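Conventional audio-visual models have independent audio and video branches. We design a unified model for audio and video processing called the Unified Audio-Visual Model (UAVM). In this paper, we describe UAVM, report its new state-of-the-art audio-visual event classification accuracy of 65.8% on VGGSound, and describe the intriguing properties of the model.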
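Linear perspective cues deriving from regularities of the built environment can be used to recalibrate both intrinsic and extrinsic camera parameters online, but these estimates can be unreliable due to irregularities in the scene, uncertainties in line segment estimation, and background clutter. Here we address this challenge through four initiatives. First, we use the PanoContext panoramic image dataset [27] to curate a novel and realistic dataset of planar projections over a broad range of scenes, focal lengths, and camera poses. Second, we use this novel dataset and the YorkUrbanDB [4] to systematically evaluate the linear perspective deviation measures frequently found in the literature, and show that the choice of deviation measure and likelihood model has a huge impact on reliability. Third, we use these findings to create a novel system for online camera calibration that we call fR, and show that it outperforms the prior state of the art, substantially reducing the error in estimated camera rotation and focal length. Our fourth contribution is a novel and efficient approach to estimating uncertainty that can dramatically improve online reliability for performance-critical applications by strategically selecting which frames to use for recalibration.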