在这项研究中,要求各种印度生物的听众倾听并认识到美国扬声器所说的速度话语。我们识别出一个话语时,我们有三种来自每个听众的回应:1。句子难度评级,2.扬声器难度评级,以及讲话的转录。从这些转录中,计算并用作标准以评估识别和原始句子之间的相似性。本研究中选择的句子分为三组:简单,中和硬,基于此研究它们中的单词的频率。我们观察到句子,扬声器难度评级和行动从易于难以句子的句子增加。我们还使用以下三种自动语音识别(ASR)进行人类语音识别性能,在声学模型(AM)和语言模型(LM)(LM)(LM):ASR1)训练中,录制了印度源头和LM的录音Timit Text,ASR2)我正在使用来自Libli语音语料库的本地美国扬声器和LM的录音,以及ASR3)正在使用来自美国原住民扬声器和LM构建的录音在Libli语音和Timit文本上。我们观察到HSR性能类似于ASR1的性能,而ASR3则实现最佳性能。扬声器诞生明智的分析表明,与少数其他生命神相比,印度听众的扬声器的话语更难以识别
translated by 谷歌翻译
Cashews are grown by over 3 million smallholders in more than 40 countries worldwide as a principal source of income. As the third largest cashew producer in Africa, Benin has nearly 200,000 smallholder cashew growers contributing 15% of the country's national export earnings. However, a lack of information on where and how cashew trees grow across the country hinders decision-making that could support increased cashew production and poverty alleviation. By leveraging 2.4-m Planet Basemaps and 0.5-m aerial imagery, newly developed deep learning algorithms, and large-scale ground truth datasets, we successfully produced the first national map of cashew in Benin and characterized the expansion of cashew plantations between 2015 and 2021. In particular, we developed a SpatioTemporal Classification with Attention (STCA) model to map the distribution of cashew plantations, which can fully capture texture information from discriminative time steps during a growing season. We further developed a Clustering Augmented Self-supervised Temporal Classification (CASTC) model to distinguish high-density versus low-density cashew plantations by automatic feature extraction and optimized clustering. Results show that the STCA model has an overall accuracy of 80% and the CASTC model achieved an overall accuracy of 77.9%. We found that the cashew area in Benin has doubled from 2015 to 2021 with 60% of new plantation development coming from cropland or fallow land, while encroachment of cashew plantations into protected areas has increased by 70%. Only half of cashew plantations were high-density in 2021, suggesting high potential for intensification. Our study illustrates the power of combining high-resolution remote sensing imagery and state-of-the-art deep learning algorithms to better understand tree crops in the heterogeneous smallholder landscape.
translated by 谷歌翻译
Speech systems are sensitive to accent variations. This is especially challenging in the Indian context, with an abundance of languages but a dearth of linguistic studies characterising pronunciation variations. The growing number of L2 English speakers in India reinforces the need to study accents and L1-L2 interactions. We investigate the accents of Indian English (IE) speakers and report in detail our observations, both specific and common to all regions. In particular, we observe the phonemic variations and phonotactics occurring in the speakers' native languages and apply this to their English pronunciations. We demonstrate the influence of 18 Indian languages on IE by comparing the native language pronunciations with IE pronunciations obtained jointly from existing literature studies and phonetically annotated speech of 80 speakers. Consequently, we are able to validate the intuitions of Indian language influences on IE pronunciations by justifying pronunciation rules from the perspective of Indian language phonology. We obtain a comprehensive description in terms of universal and region-specific characteristics of IE, which facilitates accent conversion and adaptation of existing ASR and TTS systems to different Indian accents.
translated by 谷歌翻译
Dengue fever is a virulent disease spreading over 100 tropical and subtropical countries in Africa, the Americas, and Asia. This arboviral disease affects around 400 million people globally, severely distressing the healthcare systems. The unavailability of a specific drug and ready-to-use vaccine makes the situation worse. Hence, policymakers must rely on early warning systems to control intervention-related decisions. Forecasts routinely provide critical information for dangerous epidemic events. However, the available forecasting models (e.g., weather-driven mechanistic, statistical time series, and machine learning models) lack a clear understanding of different components to improve prediction accuracy and often provide unstable and unreliable forecasts. This study proposes an ensemble wavelet neural network with exogenous factor(s) (XEWNet) model that can produce reliable estimates for dengue outbreak prediction for three geographical regions, namely San Juan, Iquitos, and Ahmedabad. The proposed XEWNet model is flexible and can easily incorporate exogenous climate variable(s) confirmed by statistical causality tests in its scalable framework. The proposed model is an integrated approach that uses wavelet transformation into an ensemble neural network framework that helps in generating more reliable long-term forecasts. The proposed XEWNet allows complex non-linear relationships between the dengue incidence cases and rainfall; however, mathematically interpretable, fast in execution, and easily comprehensible. The proposal's competitiveness is measured using computational experiments based on various statistical metrics and several statistical comparison tests. In comparison with statistical, machine learning, and deep learning methods, our proposed XEWNet performs better in 75% of the cases for short-term and long-term forecasting of dengue incidence.
translated by 谷歌翻译
The increasing number of surveillance cameras and security concerns have made automatic violent activity detection from surveillance footage an active area for research. Modern deep learning methods have achieved good accuracy in violence detection and proved to be successful because of their applicability in intelligent surveillance systems. However, the models are computationally expensive and large in size because of their inefficient methods for feature extraction. This work presents a novel architecture for violence detection called Two-stream Multi-dimensional Convolutional Network (2s-MDCN), which uses RGB frames and optical flow to detect violence. Our proposed method extracts temporal and spatial information independently by 1D, 2D, and 3D convolutions. Despite combining multi-dimensional convolutional networks, our models are lightweight and efficient due to reduced channel capacity, yet they learn to extract meaningful spatial and temporal information. Additionally, combining RGB frames and optical flow yields 2.2% more accuracy than a single RGB stream. Regardless of having less complexity, our models obtained state-of-the-art accuracy of 89.7% on the largest violence detection benchmark dataset.
translated by 谷歌翻译
Predicting stock market movements has always been of great interest to investors and an active area of research. Research has proven that popularity of products is highly influenced by what people talk about. Social media like Twitter, Reddit have become hotspots of such influences. This paper investigates the impact of social media posts on close price prediction of stocks using Twitter and Reddit posts. Our objective is to integrate sentiment of social media data with historical stock data and study its effect on closing prices using time series models. We carried out rigorous experiments and deep analysis using multiple deep learning based models on different datasets to study the influence of posts by executives and general people on the close price. Experimental results on multiple stocks (Apple and Tesla) and decentralised currencies (Bitcoin and Ethereum) consistently show improvements in prediction on including social media data and greater improvements on including executive posts.
translated by 谷歌翻译
数据文章介绍了路线损坏数据集RDD2022,其中包括来自六个国家,日本,印度,捷克共和国,挪威,美国和中国的47,420条道路图像。图像已注释了超过55,000个道路损坏的实例。数据集中捕获了四种类型的道路损坏,即纵向裂缝,横向裂纹,鳄鱼裂纹和坑洼。设想注释的数据集用于开发基于深度学习的方法以自动检测和对道路损害进行分类。该数据集已作为基于人群传感的道路伤害检测挑战(CRDDC2022)的一部分发布。 CRDDC2022挑战邀请了来自全球的研究人员提出解决方案,以在多个国家 /地区自动道路损害检测。市政当局和道路机构可以使用RDD2022数据集,并使用RDD2022培训的模型用于低成本自动监测道路状况。此外,计算机视觉和机器学习研究人员可能会使用数据集对其他类型的其他基于图像的应用程序(分类,对象检测等)进行不同算法的性能。
translated by 谷歌翻译
语言模型既展示了定量的改进,又展示了新的定性功能,随着规模的增加。尽管它们具有潜在的变革性影响,但这些新能力的特征却很差。为了为未来的研究提供信息,为破坏性的新模型能力做准备,并改善社会有害的效果,至关重要的是,我们必须了解目前和近乎未来的能力和语言模型的局限性。为了应对这一挑战,我们介绍了超越模仿游戏基准(Big Bench)。 Big Bench目前由204个任务组成,由132家机构的442位作者贡献。任务主题是多样的,从语言学,儿童发展,数学,常识性推理,生物学,物理学,社会偏见,软件开发等等。 Big-Bench专注于被认为超出当前语言模型的功能的任务。我们评估了OpenAI的GPT型号,Google内部密集变压器体系结构和大型基础上的开关稀疏变压器的行为,跨越了数百万到数十亿个参数。此外,一个人类专家评估者团队执行了所有任务,以提供强大的基准。研究结果包括:模型性能和校准都随规模改善,但绝对的术语(以及与评估者的性能相比);在模型类中的性能非常相似,尽管带有稀疏性。逐渐和预测的任务通常涉及大量知识或记忆成分,而在临界规模上表现出“突破性”行为的任务通常涉及多个步骤或组成部分或脆性指标;社交偏见通常会随着含糊不清的环境而随着规模而增加,但这可以通过提示来改善。
translated by 谷歌翻译
Analysis of Indian English (IE) pronunciation variabilities are useful in building systems for Automatic Speech Recognition (ASR) and Text-to-Speech (TTS) synthesis in the Indian context. Typically, these pronunciation variabilities have been explored by comparing IE pronunciation with Received Pronunciation (RP). However, to explore these variabilities, it is required to have labelled pronunciation data at the phonetic level, which is scarce for IE. Moreover, versatility of IE stems from the influence of a large diversity of the speakers' mother tongues and demographic region differences. Prior linguistic works have characterised features of IE variabilities qualitatively by reporting phonetic rules that represent such variations relative to RP. The qualitative descriptions often lack quantitative descriptors and data-driven analysis of diverse IE pronunciation data to characterise IE on the phonetic level. To address these issues, in this work, we consider a corpus, Indic TIMIT, containing a large set of IE varieties from 80 speakers from various regions of India. We present an analysis to obtain the new set of phonetic rules representing IE pronunciation variabilities relative to RP in a data-driven manner. We do this using 15,974 phonetic transcriptions, of which 13,632 were obtained manually in addition to those part of the corpus. Furthermore, we validate the rules obtained from the analysis against the existing phonetic rules to identify the relevance of the obtained phonetic rules and test the efficacy of Grapheme-to-Phoneme (G2P) conversion developed based on the obtained rules considering Phoneme Error Rate (PER) as the metric for performance.
translated by 谷歌翻译
机器学习开始在一系列环境应用中提供最先进的性能,例如水文流域中的流量预测。但是,由于主要的水文工艺的可变性,在实践中建立准确的大规模模型在实践中仍然具有挑战性,这是通过一组与过程相关的盆地特征捕获的。现有的盆地特征遭受了噪音和不确定性的影响,以及许多其他事情,这会对模型性能产生不利影响。为了应对上述挑战,在本文中,我们提出了一种新颖的知识引导的自学学习(KGSSL)逆框架,以从驱动程序和响应数据中提取系统特征。即使特征被损坏,这个首先的框架即使在特征被损坏的情况下也达到了强大的性能。我们表明,KGSSL为骆驼的流量建模(大型研究的流域属性和气象学)实现了最新的结果,这是一个广泛使用的水文基准数据集。具体而言,KGSSL在重建特性中最多优于其他方法16 \%。此外,我们表明KGSSL比基线方法相对强大,并且在插入KGSSL推断的特征时,基线模型的表现优于35 \%。
translated by 谷歌翻译