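Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. To inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 442 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.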
translated by 谷歌翻译
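Object detection has made tremendous strides in computer vision. Detecting small objects with degraded appearance remains a prominent challenge, especially in aerial observation. To collect sufficient positive/negative samples for heuristic training, most object detectors preset region anchors so that Intersection-over-Union (IoU) can be computed against the ground-truth data. In this setting, small objects are frequently abandoned or mislabeled. In this paper, we present an effective Dynamic Enhancement Anchor (DEA) network to construct a novel training sample generator. Unlike other state-of-the-art techniques, the proposed network uses a sample discriminator to realize interactive sample screening between an anchor-based unit and an anchor-free unit, generating eligible samples. In addition, multi-task joint training with a conservative anchor-based inference scheme enhances the performance of the proposed model while reducing computational complexity. The proposed scheme supports both oriented and horizontal object detection tasks. Extensive experiments on two challenging aerial benchmarks (i.e., DOTA and HRSC2016) indicate that our method achieves state-of-the-art accuracy with moderate inference speed and computational overhead for training. On DOTA, our DEA-Net integrated with the RoI-Transformer baseline surpasses the advanced method by 0.40% mean Average Precision (mAP) for oriented object detection with a weaker backbone network (ResNet-101 vs. ResNet-152), and by 3.08% mAP for horizontal object detection with the same backbone. In addition, our DEA-Net integrated with the ReDet baseline achieves state-of-the-art performance of 80.37%. On HRSC2016, it surpasses the previous best model by 1.1% using only 3 horizontal anchors.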
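The anchor-labelling step described above can be illustrated with a minimal sketch. It shows why a small object tends to be dropped under IoU-based assignment: even when the object lies entirely inside an anchor, the IoU stays low. The function names `iou` and `label_anchor` and the thresholds 0.5 and 0.4 are assumptions chosen for illustration; they are not taken from the paper.

```python
from typing import Tuple

# Axis-aligned box as (x_min, y_min, x_max, y_max); a hypothetical convention.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-Union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_anchor(anchor: Box, gt: Box,
                 pos_thr: float = 0.5, neg_thr: float = 0.4) -> str:
    """Label an anchor by its IoU with one ground-truth box.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    score = iou(anchor, gt)
    if score >= pos_thr:
        return "positive"
    if score < neg_thr:
        return "negative"
    return "ignored"


# A 32x32 anchor fully containing an 8x8 object: IoU = 64 / 1024.
anchor = (0.0, 0.0, 32.0, 32.0)
small_object = (12.0, 12.0, 20.0, 20.0)
print(iou(anchor, small_object))           # 0.0625
print(label_anchor(anchor, small_object))  # negative
```

Under these assumed thresholds the anchor is labelled negative although it covers the object completely, which is the failure mode that motivates sample generators beyond fixed anchors.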
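In recent years, the rising popularity of neural networks (NNs) and their increasing prevalence in real-world applications have drawn attention to the importance of their verification. While verification is known to be computationally hard in theory, many techniques have been proposed for solving it in practice. It has been observed in the literature that, by default, neural networks rarely satisfy the logical constraints we want to verify. A good course of action is to train the given NN to satisfy these constraints before verifying them. This idea is sometimes referred to as continuous verification, referring to the loop between training and verification. Usually, training with constraints is implemented by specifying a translation of a given formal logic language into loss functions. These loss functions are then used to train neural networks. Because these functions need to be differentiable for training purposes, these translations are called differentiable logics (DL). This raises several research questions. What kinds of differentiable logics are possible? What difference does a specific choice of DL make in the context of continuous verification? What are the desirable criteria for a DL from the point of view of the resulting loss function? In this extended abstract we discuss and answer these questions.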
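As a rough illustration of what translating a logical constraint into a loss function can look like, the sketch below follows one well-known style of translation (comparison as a hinge, conjunction as a sum, disjunction as a product, as in DL2). This choice is an assumption made for illustration; the abstract does not commit to a particular differentiable logic. The names `leq`, `land`, `lor`, and `in_unit_interval` are hypothetical.

```python
def leq(a: float, b: float) -> float:
    """Loss for the atom a <= b: zero when satisfied, linear in the violation."""
    return max(0.0, a - b)


def land(*losses: float) -> float:
    """Conjunction: all conjuncts must have zero loss, so losses are summed."""
    return sum(losses)


def lor(*losses: float) -> float:
    """Disjunction: one zero-loss disjunct suffices, so losses are multiplied."""
    product = 1.0
    for loss in losses:
        product *= loss
    return product


def in_unit_interval(y: float) -> float:
    """Loss for the constraint 0 <= y and y <= 1 on a network output y."""
    return land(leq(0.0, y), leq(y, 1.0))


print(in_unit_interval(0.3))    # 0.0  (constraint satisfied)
print(in_unit_interval(1.5))    # 0.5  (violated by 0.5)
print(in_unit_interval(-0.25))  # 0.25 (violated by 0.25)
```

Written with plain floats here so it runs without dependencies; the same expressions, written over tensors in an automatic-differentiation framework, are differentiable almost everywhere and can be added to a training loss.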
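Excitable optoelectronic devices are one of the key building blocks for implementing artificial spiking neurons in neuromorphic (brain-inspired) photonic systems. This work introduces and experimentally investigates an opto-electro-optical (O/E/O) artificial neuron built with a resonant tunnelling diode (RTD) coupled to a photodetector as a receiver and a vertical-cavity surface-emitting laser as a transmitter. We demonstrate a well-defined excitability threshold, above which the neuron produces 100 ns optical spiking responses with a characteristic neural-like refractory period. We use its fan-in capability to perform in-device coincidence detection (logical AND) and exclusive logical OR (XOR) tasks. These results provide the first experimental validation of deterministic triggering and tasks in an RTD-based spiking optoelectronic neuron with both input and output optical (I/O) terminals. Furthermore, we investigate in theory the prospects for a nanophotonic implementation of the proposed system, with a monolithic design combining a nanoscale RTD element and a nanolaser. This demonstrates the potential of integrated RTD-based excitable nodes for low-footprint, high-speed optoelectronic spiking neurons in future neuromorphic photonic hardware.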
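With the proliferation of deep machine learning into real-life applications, one particular property of this technology has drawn attention: robustness. Neural networks notoriously exhibit low robustness and can be highly sensitive to small input perturbations. Recently, many methods for verifying general robustness properties of networks have been proposed, but they are mostly applied in computer vision. In this paper, we propose a verification specification for natural language understanding classification based on larger regions of interest, and we discuss the challenges of this task. We observe that, although the data is almost linearly separable, the verifier struggles to output positive results, and we explain the problems and their implications.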
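Neural networks are very successful at detecting patterns in noisy data and have become the technology of choice in many fields. However, their usefulness is hampered by their susceptibility to adversarial attacks. Recently, many methods for measuring and improving a network's robustness to adversarial perturbations have been proposed, and this growing body of research has given rise to numerous explicit or implicit notions of robustness. The connections between these notions are often subtle, and a systematic comparison between them is missing from the literature. In this paper, we begin to address this gap by setting up general principles for the empirical analysis and evaluation of a network's robustness as a mathematical property: during the network's training phase, during its verification, and after its deployment. We then apply these principles in a case study that showcases the practical benefits of our general approach.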
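One simple way to make "robustness as a mathematical property" concrete is to estimate empirically how often a classifier keeps its label inside an L-infinity ball around an input. The sketch below is only one of the many possible notions the abstract alludes to, and it is an assumption for illustration: the function `robust_fraction`, the toy linear classifier, and all parameter values are hypothetical and not taken from the paper.

```python
import random
from typing import Callable, Sequence


def robust_fraction(classify: Callable[[Sequence[float]], int],
                    x: Sequence[float],
                    eps: float,
                    n_samples: int = 1000,
                    seed: int = 0) -> float:
    """Fraction of random perturbations within the L-infinity ball of radius
    eps around x for which the predicted label is unchanged."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    label = classify(x)
    kept = 0
    for _ in range(n_samples):
        perturbed = [xi + rng.uniform(-eps, eps) for xi in x]
        if classify(perturbed) == label:
            kept += 1
    return kept / n_samples


def toy_classifier(x: Sequence[float]) -> int:
    """Hypothetical linear classifier: class 1 if x[0] + x[1] > 0, else 0."""
    return 1 if x[0] + x[1] > 0 else 0


point = [1.0, 1.0]
print(robust_fraction(toy_classifier, point, eps=0.5))  # 1.0: no flip is possible
print(robust_fraction(toy_classifier, point, eps=3.0))  # below 1.0: some flips found
```

Sampling can only find counterexamples; a fraction of 1.0 does not prove robustness in general, which is the gap that formal verification is meant to close.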
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of the challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based on either multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
To analyze this characteristic of vulnerability, we developed an automated deep learning method for detecting microvessels in intravascular optical coherence tomography (IVOCT) images. A total of 8,403 IVOCT image frames from 85 lesions and 37 normal segments were analyzed. Manual annotation was done using dedicated software (OCTOPUS) previously developed by our group. Data augmentation in the polar (r, θ) domain was applied to raw IVOCT images to ensure that microvessels appear at all possible angles. Pre-processing methods included guidewire/shadow detection, lumen segmentation, pixel shifting, and noise reduction. DeepLab v3+ was used to segment microvessel candidates. A bounding box on each candidate was classified as either microvessel or non-microvessel using a shallow convolutional neural network. For better classification, we used data augmentation (i.e., angle rotation) on bounding boxes with a microvessel during network training. Data augmentation and pre-processing steps improved microvessel segmentation performance significantly, yielding a method with a Dice coefficient of 0.71 ± 0.10 and pixel-wise sensitivity/specificity of 87.7 ± 6.6% / 99.8 ± 0.1%. The network for classifying microvessels from candidates performed exceptionally well, with sensitivity of 99.5 ± 0.3%, specificity of 98.8 ± 1.0%, and accuracy of 99.1 ± 0.5%. The classification step eliminated the majority of residual false positives, and the Dice coefficient increased from 0.71 to 0.73. In addition, our method produced 698 image frames with microvessels present, compared to 730 from manual analysis, representing a 4.4% difference. When compared to the manual method, the automated method improved microvessel continuity, implying improved segmentation performance. The method will be useful for research purposes as well as potential future treatment planning.
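For readers unfamiliar with the reported metrics, the following minimal sketch computes the Dice coefficient and pixel-wise sensitivity and specificity from a pair of binary masks, and reproduces the 4.4% frame-count difference quoted above. The masks are made-up toy data, and the function names are hypothetical; this is not the authors' evaluation code.

```python
from typing import Sequence, Tuple


def confusion(pred: Sequence[int], truth: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for two flattened binary masks of equal length."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)
    return tp, fp, fn, tn


def ratio(num: float, den: float) -> float:
    """Safe division; returns 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def dice(pred: Sequence[int], truth: Sequence[int]) -> float:
    tp, fp, fn, _ = confusion(pred, truth)
    return ratio(2 * tp, 2 * tp + fp + fn)


def sensitivity(pred: Sequence[int], truth: Sequence[int]) -> float:
    tp, _, fn, _ = confusion(pred, truth)
    return ratio(tp, tp + fn)


def specificity(pred: Sequence[int], truth: Sequence[int]) -> float:
    _, fp, _, tn = confusion(pred, truth)
    return ratio(tn, tn + fp)


# Toy masks (illustrative only): tp=2, fp=1, fn=1, tn=4.
truth = [1, 1, 1, 0, 0, 0, 0, 0]
pred = [1, 1, 0, 1, 0, 0, 0, 0]
print(round(dice(pred, truth), 3))         # 0.667
print(round(sensitivity(pred, truth), 3))  # 0.667
print(round(specificity(pred, truth), 3))  # 0.8

# Frame-count difference quoted in the abstract: (730 - 698) / 730.
print(round((730 - 698) / 730 * 100, 1))   # 4.4
```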
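Diffusion models have recently been studied as powerful generative inverse problem solvers, owing to their high-quality reconstructions and the ease of combining them with existing iterative solvers. However, most works focus on solving simple linear inverse problems in noiseless settings, which significantly under-represents the complexity of real-world problems. In this work, we extend diffusion solvers to efficiently handle general noisy (non)linear inverse problems via the Laplace approximation of the posterior sampling. Interestingly, the resulting posterior sampling scheme is a blended version of diffusion sampling with the manifold-constrained gradient, without a strict measurement-consistency projection step, yielding a more desirable generative path in noisy settings than previous studies. Our method demonstrates that diffusion models can incorporate various measurement noise statistics, such as Gaussian and Poisson, and can also efficiently handle noisy nonlinear inverse problems such as Fourier phase retrieval and non-uniform deblurring.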