Neural image classifiers are known to suffer severe performance degradation when exposed to inputs that exhibit covariate shift with respect to the training distribution. Successful hand-crafted augmentation pipelines aim either to approximate the expected test-domain conditions or to perturb the features that are specific to the training environment. Developing effective pipelines is typically cumbersome, and the resulting transformations have an impact on classifier performance that is hard to understand and control. In this paper, we show that the ability of recent Text-to-Image (T2I) generators to simulate image interventions via natural-language prompts can be leveraged to train more robust models, offering a more interpretable and controllable alternative to traditional augmentation methods. We find that a variety of prompting mechanisms are effective for producing synthetic training data sufficient to achieve state-of-the-art performance on widely adopted domain-generalization benchmarks and to reduce classifiers' dependency on spurious features. Our work suggests that further progress in T2I generation, and a tighter integration with other research fields, may represent a significant step towards the development of more robust machine learning systems.
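The abstract does not list its prompts. As an illustration of how a prompt can act as an intervention, the following is a minimal sketch, assuming a simple template in which the class label is held fixed while a domain description is varied. Each generated image therefore inherits the class label of its prompt. The class names, domain descriptions and `build_prompts` helper are hypothetical, and the T2I generator call itself is omitted.

```python
from itertools import product

# Hypothetical class labels and domain descriptions; not taken from the paper.
CLASS_NAMES = ["dog", "guitar", "house"]
DOMAINS = ["a pencil sketch", "a cartoon", "an oil painting"]


def build_prompts(class_names, domains, template="{domain} of a {cls}"):
    """Return one (label, prompt) pair per class/domain combination.

    The label is kept unchanged, so an image generated from the prompt can be
    used as a training example for that class under a new (intervened) domain.
    """
    return [(cls, template.format(domain=domain, cls=cls))
            for cls, domain in product(class_names, domains)]


if __name__ == "__main__":
    for label, prompt in build_prompts(CLASS_NAMES, DOMAINS):
        print(label, "->", prompt)
```

Separating the label from the domain phrase is the design choice that matters here. It lets the same class be rendered under many controlled, human-readable changes of environment.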
Much of the information about breathing is contained within the photoplethysmography (PPG) signal, through changes in venous blood flow, heart rate and stroke volume. We aim to leverage this fact by employing a novel deep learning framework based on a repurposed convolutional autoencoder. Our model aims to encode all of the relevant respiratory information contained within the PPG waveform and decode it into a waveform that is similar to a gold-standard respiratory reference. The model is applied to two PPG data sets, namely Capnobase and BIDMC. We show that the model is capable of producing respiratory waveforms that approach the gold standard, while also producing state-of-the-art respiratory rate estimates. We also show that, when it comes to capturing more advanced respiratory waveform characteristics such as duty cycle, our model is for the most part unsuccessful. A suggested reason for this, in light of a previous study on in-ear PPG, is that the respiratory variations in finger PPG are far weaker than those at other recording locations. Importantly, our model can perform these waveform estimates in a fraction of a millisecond, giving it the capacity to produce over 6 hours of respiratory waveforms in a single second. Moreover, we attempt to interpret the behaviour of the kernel weights within the model, showing that, in part, our model intuitively selects different breathing frequencies. The model proposed in this work could help improve the usefulness of consumer PPG-based wearables for medical applications where detailed respiratory information is required.
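To make "respiratory rate estimate from a waveform" concrete, here is a minimal sketch of one generic way to turn an estimated respiratory waveform into breaths per minute. It assumes that each breath produces exactly one upward crossing of the signal mean. This is a simple baseline for illustration, not the method used in the paper. The sampling rate and the synthetic test signal are assumptions.

```python
import math


def respiratory_rate_bpm(waveform, fs):
    """Estimate breaths per minute from a respiratory-like waveform.

    Counts upward crossings of the signal mean, assuming one crossing per breath.
    waveform: sequence of samples; fs: sampling frequency in Hz.
    """
    if len(waveform) < 2 or fs <= 0:
        raise ValueError("need at least two samples and a positive sampling rate")
    mean = sum(waveform) / len(waveform)
    centred = [x - mean for x in waveform]
    # A breath is counted when the centred signal goes from negative to non-negative.
    crossings = sum(1 for prev, cur in zip(centred, centred[1:]) if prev < 0 <= cur)
    duration_min = len(waveform) / fs / 60.0
    return crossings / duration_min


if __name__ == "__main__":
    fs = 25.0  # Hz (assumed sampling rate)
    # Synthetic 60 s breathing signal at 0.25 Hz, i.e. 15 breaths per minute.
    signal = [math.sin(2 * math.pi * 0.25 * n / fs + 0.3) for n in range(1500)]
    print(respiratory_rate_bpm(signal, fs))
```

Real PPG-derived waveforms are noisy, so a practical pipeline would usually band-pass filter the signal first. This sketch skips that step for brevity.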
Social network analysis faces profound difficulties in sharing data between researchers due to privacy and security concerns. A potential remedy is synthetic networks, which closely resemble their real counterparts but can be freely distributed. Generating synthetic networks requires creating network topologies that function as realistically as possible in application. Widely applied models are currently rule-based and can struggle to reproduce structural dynamics. Motivated by recent developments in Graph Neural Network (GNN) models for network generation, we evaluate the potential of GNNs for generating synthetic social networks. Our application of GNNs is grounded in a reasonable use case and includes an empirical evaluation using Maximum Mean Discrepancy (MMD). We include social-network-specific measurements, which allow us to evaluate how realistically synthetic networks behave in typical social network analysis applications. We find that the Graph Recurrent Attention Network (GRAN) extends well to social networks. Compared with a popular rule-based benchmark, the Recursive-MATrix (R-MAT) generation method, GRAN is better able to replicate realistic structural dynamics. GRAN is more computationally costly than R-MAT, but not excessively so, which makes it an effective option for researchers seeking to create datasets of synthetic social networks.
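MMD compares two samples through a kernel: it is small when the samples come from similar distributions. The following is a minimal sketch of the biased (V-statistic) estimate of squared MMD between two 1-D samples, for example node degrees from a real and a synthetic network. It assumes a Gaussian kernel with a hand-chosen bandwidth `sigma`. The kernels and graph statistics used in the paper may differ, and the degree lists below are hypothetical.

```python
import math


def gaussian_kernel(a, b, sigma):
    """Gaussian (RBF) kernel between two scalars."""
    return math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2))


def mmd_squared(xs, ys, sigma=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two 1-D samples."""
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    k_xx = sum(gaussian_kernel(a, b, sigma) for a in xs for b in xs) / len(xs) ** 2
    k_yy = sum(gaussian_kernel(a, b, sigma) for a in ys for b in ys) / len(ys) ** 2
    k_xy = sum(gaussian_kernel(a, b, sigma) for a in xs for b in ys) / (len(xs) * len(ys))
    return k_xx + k_yy - 2.0 * k_xy


if __name__ == "__main__":
    # Hypothetical node-degree samples from a real and a synthetic network.
    real_degrees = [1, 2, 2, 3, 5, 8]
    synthetic_degrees = [1, 2, 3, 3, 4, 7]
    print(mmd_squared(real_degrees, synthetic_degrees, sigma=2.0))
```

Identical samples give exactly zero, and well-separated samples approach the maximum of 2 for this kernel. The choice of `sigma` controls which scale of difference the statistic is sensitive to.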
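From early image processing to modern computational imaging, successful models and algorithms have relied on a fundamental property of natural signals: symmetry. Here, symmetry refers to the invariance of signal sets to transformations such as translation, rotation or scaling. Symmetry can also be incorporated into deep neural networks in the form of equivariance, allowing for more data-efficient learning. While there has been important progress in recent years in the design of end-to-end equivariant networks for image classification, computational imaging introduces unique challenges for equivariant network solutions. This is because we typically only observe the image through some noisy, ill-conditioned forward operator that itself may not be equivariant. We review the emerging field of equivariant imaging and show how it can provide improved generalization and new imaging opportunities. Along the way, we show the interplay between the acquisition physics and group actions, and the links to iterative reconstruction, blind compressed sensing and self-supervised learning.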
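An operator A is equivariant to a group action T if A(T x) = T A(x) for every group element. The following is a minimal sketch, assuming 1-D signals and the group of cyclic shifts. It checks that a circular convolution (a blur-like forward operator) is shift-equivariant, while a fixed inpainting mask is not. The non-equivariant mask is exactly the situation the abstract describes for many forward operators. The helper names and example signals are illustrative only.

```python
def shift(x, s):
    """Cyclically shift a 1-D signal right by s samples (a translation group action)."""
    s %= len(x)
    return x[-s:] + x[:-s] if s else list(x)


def circular_conv(x, h):
    """Circular convolution of signal x with a short filter h."""
    n = len(x)
    return [sum(h[k] * x[(i - k) % n] for k in range(len(h))) for i in range(n)]


def mask(x, keep):
    """Inpainting-style operator: zero out entries whose flag in `keep` is False."""
    return [v if m else 0 for v, m in zip(x, keep)]


def is_equivariant(op, x, shifts):
    """Check op(shift(x, s)) == shift(op(x), s) for every shift s."""
    return all(op(shift(x, s)) == shift(op(x), s) for s in shifts)


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]
    print(is_equivariant(lambda v: circular_conv(v, [1, 2]), x, range(5)))  # convolution
    print(is_equivariant(lambda v: mask(v, [True, True, False, True, False]), x, range(1, 5)))
```

Integer signals are used so that the equality check is exact. With floating-point data, the comparison would need a tolerance.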
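The increased use of text data in social science research has benefited from easy-to-access data (e.g., Twitter). This trend comes at the cost of research that requires sensitive but hard-to-share data (e.g., interview data, police reports, electronic health records). We introduce a solution to this stalemate with the open-source text anonymisation software _Textwash_. This paper presents an empirical evaluation of the tool using the TILD criteria: a technical evaluation (how accurate is the tool?), an information loss evaluation (how much information is lost in the anonymisation process?) and a de-anonymisation test (can humans identify individuals from anonymised text data?). The findings suggest that Textwash performs similarly to state-of-the-art entity recognition models and introduces a negligible information loss of 0.84%. For the de-anonymisation test, we tasked humans with identifying individuals from a crowdsourced dataset of descriptions of very famous, semi-famous and non-existent people. For realistic use cases of the tool, the de-anonymisation rate ranged from 1.01% to 2.01%. We replicated the findings in a second study and conclude that Textwash succeeds in removing potentially sensitive information, rendering person descriptions practically anonymous.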
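Inspired by the use of random projections in biological sensing systems, we present a new algorithm for processing data in classification problems. It is based on observations of the human brain and the olfactory system of the fruit fly. The algorithm randomly projects data into a space of greatly increased dimension and then applies a cap operation to truncate the smaller entries. This leads to an algorithm that achieves a sparse representation with minimal loss in classification accuracy. It is also more robust, in the sense that classification accuracy improves when noise is added to the data. This is demonstrated by numerical experiments, which supplement theoretical results showing that the resulting signal transform is continuous and invertible in an appropriate sense.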
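The two-step transform described above, lifting to a much higher dimension and then capping, can be sketched as follows. This sketch assumes a dense Gaussian random matrix and a cap that keeps the k largest projected values and sets the rest to zero. The abstract does not specify the matrix distribution, the expansion factor or k, so all three are illustrative choices. The fruit-fly model is often described with sparse binary projections instead.

```python
import random


def random_projection_matrix(n_in, n_out, seed=0):
    """Dense Gaussian random matrix with n_out rows and n_in columns (assumed choice)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]


def project_and_cap(x, W, k):
    """Lift x to len(W) dimensions, then keep only the k largest projected entries."""
    y = [sum(w * xj for w, xj in zip(row, x)) for row in W]
    # Indices of the k largest values; every other entry is truncated to zero.
    keep = set(sorted(range(len(y)), key=lambda i: y[i], reverse=True)[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(y)]


if __name__ == "__main__":
    W = random_projection_matrix(n_in=4, n_out=40)  # 10x dimension increase (assumed)
    z = project_and_cap([0.5, -1.0, 2.0, 0.3], W, k=5)
    print(sum(1 for v in z if v != 0.0), "non-zero entries out of", len(z))
```

The output has at most k non-zero entries out of `len(W)`, which is the sparse representation the abstract refers to. A classifier would then be trained on these capped vectors.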
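In many real-world settings, only incomplete measurement data are available for training, which can pose a problem for learning reconstruction functions. Indeed, learning with a fixed incomplete measurement process is in general impossible, because there is no information in the nullspace of the measurement operator. This limitation can be overcome by using measurements from multiple operators. While this idea has been successfully applied in various applications, a precise characterization of the conditions for learning is still lacking. In this paper, we fill this gap by presenting necessary and sufficient conditions for learning the underlying signal model needed for reconstruction. These conditions indicate the interplay between the number of distinct measurement operators, the number of measurements per operator, the dimension of the model and the dimension of the signals. Furthermore, we propose a novel and conceptually simple unsupervised learning loss that only requires access to incomplete measurement data and matches the performance of supervised learning when the sufficient condition is verified. We validate our theoretical bounds through a series of experiments on various imaging inverse problems, such as accelerated magnetic resonance imaging, compressed sensing and image inpainting. These experiments also demonstrate the advantages of the proposed unsupervised loss over previous methods.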
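The nullspace argument can be made concrete with inpainting masks. The following is a minimal sketch, assuming each operator simply observes a subset of signal entries. Any vector that is zero on the observed entries is invisible to that operator, so x and x + v produce identical measurements. Such a vector is hidden from every operator only if it lies in all the nullspaces, which for masks means it is supported outside the union of observed entries. This illustrates only the nullspace intuition. It does not reproduce the paper's necessary and sufficient conditions, which also involve the model dimension and the number of measurements per operator. The helper names are hypothetical.

```python
def apply_mask(x, observed):
    """Inpainting-style measurement: keep only the entries listed in `observed`."""
    return [x[i] for i in sorted(observed)]


def nullspace_direction(n, observed):
    """A vector invisible to the mask (1 on every unobserved entry), or None if none exists."""
    v = [0 if i in observed else 1 for i in range(n)]
    return v if any(v) else None


def common_nullspace_trivial(n, operators):
    """True if no nonzero vector lies in the nullspace of every mask simultaneously."""
    union = set().union(*operators)
    return union == set(range(n))


if __name__ == "__main__":
    x = [1, 2, 3, 4]
    a1 = {0, 1}
    v = nullspace_direction(len(x), a1)
    x_alt = [a + b for a, b in zip(x, v)]
    print(apply_mask(x, a1), apply_mask(x_alt, a1))        # same measurements
    print(common_nullspace_trivial(4, [a1]))              # single operator: False
    print(common_nullspace_trivial(4, [a1, {2, 3}]))      # two operators: True
```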
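A recent paper by Davies et al. (2021) describes how deep learning (DL) techniques were used to find plausible hypotheses that led to two original mathematical results: one in knot theory and one in representation theory. I argue that the significance and novelty of this application of DL to mathematics is significantly overstated in the paper under review, and wildly overstated in some accounts in the popular science press. In the knot theory result, the role of DL was small, and a conventional statistical analysis would probably have sufficed. In the representation theory result, the role of DL is much larger; however, it is not very different in kind from what has been done in experimental mathematics for decades. Moreover, it is not clear whether the distinctive features of DL that make it useful here will apply across a wide range of mathematical problems. Finally, I argue that describing DL here as "guiding human intuition" is unhelpful and misleading. What the DL primarily does is mark many possible conjectures as false and a few others as possibly worthy of study. Certainly, the representation theory result is an original and interesting application of DL to mathematical research, but its larger significance is uncertain.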
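Decentralized Finance (DeFi) is a system of financial products and services built on smart contracts across various blockchains. In the past year, DeFi has gained popularity and market capitalization. However, it has also become an epicenter of cryptocurrency-related crime, in particular various types of securities violations. The lack of Know Your Customer requirements in DeFi has left governments unsure how to handle the extent of violations in this space. This study aims to address this problem with a machine learning approach that identifies DeFi projects potentially engaging in securities violations based on their tokens' smart contract code. We broaden the detection of specific types of securities violations by building a random forest classifier on features extracted from the smart contract code of DeFi projects' tokens. The final classifier achieves a 99.1% F1 score. Such high performance is surprising for any classification problem; however, further feature-level analysis shows that a single feature makes this problem highly detectable. Another contribution of our study is a new dataset comprising (a) a verified ground-truth dataset of tokens involved in securities violations and (b) a set of valid tokens from a DeFi aggregator that conducts due diligence on projects. This paper further discusses the use of our model by prosecutors in enforcement efforts and connects its potential use to the wider legal context.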
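For readers interpreting the 99.1% figure, the F1 score is the harmonic mean of precision and recall on the positive class. Here, the positive class is assumed to be tokens involved in securities violations. The following is a minimal sketch of the computation from binary labels and predictions. The labels in the usage example are hypothetical and unrelated to the paper's data.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 = 2 * precision * recall / (precision + recall) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical labels: 1 = violating token, 0 = valid token.
    print(f1_score([1, 1, 1, 0, 0], [1, 1, 0, 1, 0]))
```

Because F1 ignores true negatives, a very high score mainly says the classifier rarely misses violating tokens and rarely flags valid ones. It says nothing about why, which is what the abstract's feature-level analysis addresses.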
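We introduce a new real-valued invariant, called the natural slope of a hyperbolic knot in the 3-sphere, which is defined in terms of its cusp geometry. We show that twice the knot signature and the natural slope differ by at most a constant times the hyperbolic volume divided by the cube of the injectivity radius. This inequality was discovered using machine learning to detect relationships between various knot invariants. It has applications to Dehn surgery and to the 4-ball genus. We also show a refined version of the inequality, in which the upper bound is a linear function of the volume and the slope is corrected by terms corresponding to short geodesics that link the knot an odd number of times.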
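The main inequality stated in words above can be written compactly. The notation is introduced here for readability. For a hyperbolic knot K in the 3-sphere, let σ(K) denote the signature, slope(K) the natural slope, vol(K) the hyperbolic volume and inj(K) the injectivity radius. The constant c is not specified in the abstract.

```latex
\[
  \bigl|\, 2\sigma(K) - \operatorname{slope}(K) \,\bigr|
  \;\le\; c \,\frac{\operatorname{vol}(K)}{\operatorname{inj}(K)^{3}}
\]
```

The refined version replaces the right-hand side by a linear function of vol(K) and corrects slope(K) by short-geodesic terms. Its exact form is not given in the abstract.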