Machine learning is the study of computer algorithms that improve automatically through data and experience. Machine learning algorithms build a model from sample data, called training data, to make predictions or decisions without being explicitly programmed to do so. A variety of well-known machine learning algorithms have been developed in computer science to analyze data. This paper introduces a new machine learning algorithm called impact learning. Impact learning is a supervised learning algorithm that can be applied to both classification and regression problems. It also shows its strength in analyzing competitive data: the algorithm learns from competitive situations, where the competition arises from the effects of independent features. It is trained on the impacts of the features, derived from the intrinsic rate of natural increase (RNI). We also demonstrate the advantage of impact learning over conventional machine learning algorithms.
translated by 谷歌翻译
We present NusaCrowd, a collaborative initiative to collect and unite existing resources for Indonesian languages, including opening access to previously non-public resources. Through this initiative, we have brought together 137 datasets and 117 standardized data loaders. The quality of the datasets has been assessed manually and automatically, and their effectiveness has been demonstrated in multiple experiments. NusaCrowd's data collection enables the creation of the first zero-shot benchmarks for natural language understanding and generation in Indonesian and its local languages. Furthermore, NusaCrowd enables the creation of the first multilingual automatic speech recognition benchmark in Indonesian and its local languages. Our work is intended to help advance natural language processing research in under-represented languages.
The increasing importance of both deep neural networks (DNNs) and cloud services for training them means that bad actors have more incentive and opportunity to insert backdoors that alter the behavior of trained models. In this paper, we introduce a novel method for backdoor detection that extracts features from the weights of a pre-trained DNN using independent vector analysis (IVA) and passes them to a machine learning classifier. Compared with other detection techniques, this approach has several benefits: it requires no training data, applies across domains, works with a wide range of network architectures, makes no assumptions about the nature of the triggers used to change network behavior, and is highly scalable. We discuss the detection pipeline and then demonstrate results on two computer vision datasets covering image classification and object detection. Our method is more efficient and more accurate than the competing algorithms, helping to ensure the safe application of deep learning and AI.
Neglected tropical diseases (NTDs) continue to affect the livelihood of individuals in countries in the Southeast Asia and Western Pacific region. These diseases have existed for a long time and have caused devastating health problems and economic decline for people in low- and middle-income (developing) countries. An estimated 1.7 billion people suffer from one or more NTDs annually, which puts approximately one in five individuals at risk. In addition to their health and social impact, NTDs impose a significant financial burden on patients and their close relatives, and are responsible for billions of dollars in lost revenue from reduced labor productivity in developing countries alone. There is an urgent need to improve efforts to control and eradicate or eliminate NTDs. This can be achieved by using machine learning tools to improve surveillance, prediction, and detection programs, and to combat NTDs through the discovery of new therapeutics against these pathogens. This review surveys the current application of machine learning tools to NTDs and the challenges of advancing the state of the art in NTD surveillance, management, and treatment.
Using 3D CNNs on high-resolution medical volumes is very computationally demanding, especially for large datasets such as the UK Biobank, which aims to scan 100,000 subjects. Here we demonstrate that using 2D CNNs on a few 2D projections of the 3D volumes (representing the mean and standard deviation across axial, sagittal and coronal slices) leads to reasonable test accuracy when predicting age from brain volumes. Using our approach, one training epoch with 20,324 subjects takes 40 to 70 seconds on a single GPU, which is almost 100 times faster than a small 3D CNN. These results are important for researchers who do not have access to expensive GPU hardware for 3D CNNs.
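The projection step described above can be illustrated with a minimal NumPy sketch. The function name `project_volume` and the mapping of array axes to the axial, sagittal and coronal directions are assumptions made for illustration; the abstract does not specify the authors' preprocessing code or array layout.

```python
import numpy as np

def project_volume(volume: np.ndarray) -> list:
    """Collapse a 3D volume into six 2D images.

    For each of the three array axes, take the mean and the standard
    deviation across all slices along that axis. Which axis corresponds
    to axial, sagittal or coronal depends on how the volume is stored
    (an assumption here, not something stated in the abstract).
    """
    if volume.ndim != 3:
        raise ValueError("expected a 3D volume")
    projections = []
    for axis in range(3):
        projections.append(volume.mean(axis=axis))  # mean projection
        projections.append(volume.std(axis=axis))   # std projection
    return projections

# Hypothetical volume; a real brain scan would be loaded from disk instead.
volume = np.random.default_rng(0).random((8, 10, 12))
images = project_volume(volume)
```

Each subject is then represented by six small 2D images rather than one large 3D array, which is what makes a 2D CNN applicable.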
Large annotated datasets are required to train segmentation networks. In medical imaging, it is often difficult, time-consuming and expensive to create such datasets, and it may also be difficult to share these datasets with other researchers. AI models can today generate very realistic synthetic images, which can potentially be shared openly as they do not belong to specific persons. However, recent work has shown that using synthetic images for training deep networks often leads to worse performance than using real images. Here we demonstrate that using synthetic images and annotations from an ensemble of 10 GANs, instead of from a single GAN, increases the Dice score on real test images by 4.7% to 14.0% on specific classes.
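For readers unfamiliar with the metric, the per-class Dice score reported above can be computed as twice the overlap between predicted and reference masks divided by their combined size. The sketch below is a generic illustration, not the authors' evaluation code; the function name `dice_score` and the convention of returning 1.0 when a class is absent from both masks are assumptions.

```python
import numpy as np

def dice_score(pred: np.ndarray, target: np.ndarray, label: int) -> float:
    """Dice score for one class: 2 * |P & T| / (|P| + |T|)."""
    pred_mask = pred == label
    target_mask = target == label
    denom = pred_mask.sum() + target_mask.sum()
    if denom == 0:
        # Class absent in both masks; treated as a perfect match (assumed convention).
        return 1.0
    overlap = np.logical_and(pred_mask, target_mask).sum()
    return float(2.0 * overlap / denom)

# Toy 2x2 label maps: class 1 is over-segmented by one pixel.
pred = np.array([[1, 1], [0, 0]])
target = np.array([[1, 0], [0, 0]])
score_class_1 = dice_score(pred, target, label=1)  # 2 * 1 / (2 + 1)
```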
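Smart meter measurements, though critical for accurate demand forecasting, face several drawbacks, including consumer privacy and data breach issues, to name a few. Recent literature has explored federated learning (FL) as a promising privacy-preserving machine learning alternative that enables collaborative learning of a model for short-term load forecasting without exposing private raw data. Despite its virtues, standard FL remains vulnerable to an intractable cyber threat known as the Byzantine attack, which is carried out by faulty and/or malicious clients. Therefore, to improve the robustness of federated short-term load forecasting against Byzantine threats, we develop a state-of-the-art private and secure FL-based framework that ensures the privacy of each individual smart meter's data while protecting the security of the FL models and architecture. Our proposed framework leverages the idea of gradient quantization through the sign stochastic gradient descent (SignSGD) algorithm, in which clients transmit only the "sign" of the gradient to the control center after local model training. As we highlight through experiments on benchmark neural networks with a set of Byzantine attack models, our proposed approach mitigates such threats quite effectively and thus outperforms conventional Fed-SGD models.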
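The sign-only communication described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' framework: the function names are hypothetical, and aggregating the signs by a coordinate-wise majority vote is an assumption, since the abstract says only that clients transmit the sign of the gradient.

```python
import numpy as np

def client_message(gradient: np.ndarray) -> np.ndarray:
    """Client side: send only the sign of each gradient coordinate."""
    return np.sign(gradient).astype(np.int8)

def aggregate_signs(messages: list) -> np.ndarray:
    """Server side: coordinate-wise majority vote over client signs
    (assumed aggregation rule, not stated in the abstract)."""
    return np.sign(np.sum(np.stack(messages), axis=0))

def server_step(weights: np.ndarray, messages: list, lr: float = 0.01) -> np.ndarray:
    """Move the global weights one step against the voted sign."""
    return weights - lr * aggregate_signs(messages)

# Two honest clients and one hypothetical Byzantine client that sends
# a large gradient pointing the wrong way.
honest_a = np.array([0.5, -2.0, 0.1])
honest_b = np.array([1.5, -0.3, 0.2])
byzantine = np.array([-100.0, 100.0, -100.0])
messages = [client_message(g) for g in (honest_a, honest_b, byzantine)]
new_weights = server_step(np.zeros(3), messages, lr=0.1)
```

Because every client contributes at most plus or minus one per coordinate, the magnitude of a malicious gradient carries no extra weight; in this toy case the two honest clients outvote the attacker on every coordinate.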
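Apparent parallels between natural language and biological sequences have led to a recent surge in the application of deep language models (LMs) to the analysis of antibody and other biological sequences. However, the lack of a rigorous linguistic formalization of biological sequence languages, which would define basic components such as the lexicon (i.e., the discrete units of the language) and the grammar (i.e., the rules that link sequence well-formedness, structure, and meaning), has led to largely domain-unspecific applications of LMs that do not take into account the underlying structure of the biological sequences studied. A linguistic formalization, on the other hand, establishes linguistically informed and thus domain-adapted components for LM applications. It would facilitate a better understanding of how the differences and similarities between natural language and biological sequences influence the quality of LMs, which is crucial for building interpretable models. Deciphering the rules of antibody specificity is crucial to accelerating rational and in silico biotherapeutic drug design. Here, we formalize the properties of the antibody language and thereby establish a foundation not only for the application of linguistic tools in adaptive immune receptor analysis but also for systematic immunolinguistic studies of immune receptor specificity.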
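Pruning is one of the predominant approaches for compressing deep neural networks (DNNs). Recently, coresets (provable data summarizations) have been leveraged for pruning DNNs, adding the advantage of theoretical guarantees on the trade-off between the compression rate and the approximation error. However, coresets in this domain were either data-dependent or generated under restrictive assumptions on both the model's weights and inputs. In real-world scenarios, such assumptions are rarely satisfied, which limits the applicability of coresets. To this end, we propose a novel and robust framework for computing such coresets under mild assumptions on the model's weights and without any assumptions on the training data. The idea is to compute the importance of each neuron in each layer with respect to the output of the following layer. This is achieved by a combination of the Löwner ellipsoid and the Caratheodory theorem. Our method is simultaneously data-independent, applicable to various networks and datasets (due to the simplified assumptions), and theoretically supported. Our method outperforms existing coreset-based neural pruning approaches across a wide range of networks and datasets. For example, our method achieved a 62% compression rate on ResNet50 on ImageNet with a 1.09% drop in accuracy.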
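Language is a method by which individuals express their thoughts. Each language has its own set of alphabetic and numeric characters. People can communicate with one another through oral or written communication. However, each language also has a sign language counterpart. Individuals who are deaf and/or mute communicate through sign language. The Bangla language also has a sign language, called BDSL. This dataset consists of Bangla hand sign images. The collection contains images of 49 individual Bangla alphabet signs. BDSL49 is a dataset of 29,490 images with 49 labels. During data collection, images of 14 different adults were recorded, each with a different background and appearance. Several strategies were used during preparation to eliminate noise from the dataset. The dataset is freely available to researchers, who can use it to develop automated systems with machine learning, computer vision, and deep learning techniques. In addition, two models were applied to this dataset: the first for detection and the second for recognition.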