Forecasting the popularity of new garment designs is very important in an industry as fast-paced as fashion, both for profitability and for reducing unsold inventory. Here, we address this task in order to provide informative forecasts to fashion designers within a virtual reality designer application that will allow them to fine-tune their creations based on current consumer preferences within an interactive and immersive environment. To achieve this we have to deal with the following central challenges: (1) the proposed method should not hinder the creative process and thus has to rely only on the garment's visual characteristics, (2) new garments lack historical data from which to extrapolate their future popularity and (3) fashion trends in general are highly dynamic. To this end, we develop a computer vision pipeline fine-tuned on fashion imagery in order to extract relevant visual features along with the category and attributes of the garment. We propose a hierarchical label sharing (HLS) pipeline for automatically capturing hierarchical relations among fashion categories and attributes. Moreover, we propose MuQAR, a Multimodal Quasi-AutoRegressive neural network that forecasts the popularity of new garments by combining their visual and categorical features, while an autoregressive neural network models the popularity time series of the garment's category and attributes. Both the proposed HLS and MuQAR prove capable of surpassing the current state-of-the-art on key benchmark datasets: DeepFashion for image classification and VISUELLE for new garment sales forecasting.
translated by Google Translate
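For fashion outfits to be considered aesthetically pleasing, the garments that constitute them need to be compatible in terms of visual aspects such as style, category and color. With the advent and omnipresence of computer vision deep learning models, interest has also grown in the task of visual compatibility detection, with the aim of developing quality fashion outfit recommendation systems. Previous works have defined visual compatibility as a binary classification task, with the items in an outfit considered either fully compatible or fully incompatible. However, this is not applicable to outfit maker applications, where users create their own outfits and need to know which specific items may be incompatible with the rest of the outfit. To address this, we propose the Visual Incompatibility Transformer (VICTOR), which is optimized for two tasks: 1) overall compatibility as regression and 2) the detection of mismatching items. Unlike previous works that rely either on feature extraction from ImageNet-pretrained models or on end-to-end fine-tuning, we utilize fashion-specific contrastive language-image pre-training to fine-tune computer vision neural networks on fashion imagery. Moreover, we build upon the Polyvore outfit benchmark to generate partially mismatching outfits, creating a new dataset termed Polyvore-MISFITs, which is used to train VICTOR. A series of ablation and comparative analyses show that the proposed architecture can compete with and even surpass the current state-of-the-art on Polyvore datasets while reducing instance-wise floating-point operations by 88%, striking a balance between high performance and efficiency.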
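New fashion product sales forecasting is a challenging problem that involves many business dynamics and cannot be solved by classical forecasting approaches. In this paper, we investigate the effectiveness of systematically probing exogenous knowledge in the form of Google Trends time series and combining it with multi-modal information related to a brand-new fashion item, in order to effectively forecast its sales despite the lack of past data. In particular, we propose a neural network-based approach in which an encoder learns a representation of the exogenous time series, while the decoder forecasts the sales based on the Google Trends encoding and the available visual and metadata information. Our model works in a non-autoregressive manner, avoiding the compounding effect of large first-step errors. As a second contribution, we present Visuelle, a publicly available dataset for the task of new fashion product sales forecasting, containing multimodal information for 5577 real, new products sold between 2016 and 2019 by Nunalie, an Italian fast-fashion company. The dataset is equipped with images of the products, metadata, related sales, and associated Google Trends. We use Visuelle to compare our approach against state-of-the-art alternatives and several baselines, showing that our neural network-based approach is the most accurate in terms of both percentage and absolute error. Notably, the addition of exogenous knowledge boosts forecasting accuracy by 1.5% in terms of WAPE, revealing the importance of exploiting informative external information. The code and dataset are both available at https://github.com/humaticslab/gtm-transformer.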
We present Visuelle 2.0, the first dataset useful for facing diverse prediction problems that a fast-fashion company has to manage routinely. Furthermore, we demonstrate how the use of computer vision is substantial in this scenario. Visuelle 2.0 contains data for 6 seasons / 5355 clothing products of Nuna Lie, a famous Italian company with hundreds of shops located in different areas within the country. In particular, we focus on a specific prediction problem, namely short-observation new product sale forecasting (SO-fore). SO-fore assumes that the season has started and a set of new products is on the shelves of the different stores. The goal is to forecast the sales for a particular horizon, given a short, available past (few weeks), since no earlier statistics are available. To be successful, SO-fore approaches should capture this short past and exploit other modalities or exogenous data. To these aims, Visuelle 2.0 is equipped with disaggregated data at the item-shop level and multi-modal information for each clothing item, allowing computer vision approaches to come into play. The main message that we deliver is that the use of image data with deep networks boosts performances obtained when using the time series in long-term forecasting scenarios, ameliorating the WAPE and MAE by up to 5.48% and 7% respectively compared to competitive baseline methods. The dataset is available at https://humaticslab.github.io/forecasting/visuelle
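The two error measures quoted above can be illustrated with a minimal sketch. It assumes the commonly used definitions (MAE as the mean of absolute errors, WAPE as the sum of absolute errors divided by the sum of absolute actual values); the function names and sample numbers are hypothetical, and the exact normalization used by the dataset authors may differ.

```python
def mae(actual, forecast):
    """Mean absolute error: the average of |actual - forecast|."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def wape(actual, forecast):
    """Weighted absolute percentage error, in percent.

    Assumed definition: 100 * sum(|actual - forecast|) / sum(|actual|).
    """
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(a) for a in actual)
    if total == 0:
        raise ValueError("WAPE is undefined when all actual values are zero")
    abs_error = sum(abs(a - f) for a, f in zip(actual, forecast))
    return 100.0 * abs_error / total


# Hypothetical weekly sales of one item and a model's forecast.
weekly_sales = [10, 20, 30, 40]
predicted = [12, 18, 33, 35]
print(mae(weekly_sales, predicted))   # absolute errors are 2, 2, 3, 5
print(wape(weekly_sales, predicted))
```

Because WAPE divides by total actual sales, it is less distorted by low-volume items than a per-item percentage error would be, which is one reason it is a natural fit for sales data.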
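We propose a data-centric pipeline able to generate exogenous observation data for the New Fashion Product Performance Forecasting (NFPPF) problem, i.e., predicting the performance of a brand-new clothing probe with no available past observations. Our pipeline manufactures the missing past starting from a single, available image of the clothing probe. It starts by expanding the textual tags associated with the image and querying related fashionable or unfashionable images uploaded to the web at a specific time in the past. A binary classifier is robustly trained on these web images through confident learning, to learn what was fashionable in the past and how much the probe image conforms to this notion of fashionability. This compliance produces the POtential Performance (POP) time series, indicating how well the probe could have performed had it been available earlier. POP proves to be highly predictive of the probe's future performance, improving the sales forecasts of all state-of-the-art models on the recent Visuelle fast-fashion dataset. We also show that POP reflects the ground-truth popularity of new styles (ensembles of clothing items) on the Fashion Forward benchmark, demonstrating that our webly-learned signal is a truthful expression of popularity, accessible to everyone and generalizable to any time of analysis. Forecasting code, data and the POP time series are available at: https://github.com/humaticslab/pop-mining-potential-performance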
The fashion industry is one of the most active and competitive markets in the world, manufacturing millions of products and reaching large audiences every year. A plethora of business processes are involved in this large-scale industry, but due to the generally short life-cycle of clothing items, supply-chain management and retailing strategies are crucial for good market performance. Correctly understanding the wants and needs of clients, managing logistic issues and marketing the correct products are high-level problems with a lot of uncertainty associated with them, given the number of influencing factors but most importantly due to the unpredictability often associated with the future. It is therefore clear that forecasting methods, which generate predictions of the future, are indispensable in order to ameliorate all the various business processes that deal with the true purpose and meaning of fashion: having a lot of people wear a particular product or style, rendering these items, people and consequently brands fashionable. In this paper, we provide an overview of three concrete forecasting tasks that any fashion company can apply in order to improve its industrial and market impact. We underline advances and issues in all three tasks and argue about their importance and the impact they can have at an industrial level. Finally, we highlight issues and directions of future work, reflecting on how learning-based forecasting methods can further aid the fashion industry.
The International Workshop on Reading Music Systems (WoRMS) is a workshop that tries to connect researchers who develop systems for reading music, such as in the field of Optical Music Recognition, with other researchers and practitioners that could benefit from such systems, like librarians or musicologists. The relevant topics of interest for the workshop include, but are not limited to: Music reading systems; Optical music recognition; Datasets and performance evaluation; Image processing on music scores; Writer identification; Authoring, editing, storing and presentation systems for music scores; Multi-modal systems; Novel input-methods for music to produce written music; Web-based Music Information Retrieval services; Applications and projects; Use-cases related to written music. These are the proceedings of the 3rd International Workshop on Reading Music Systems, held in Alicante on the 23rd of July 2021.
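The aesthetic quality of an image is defined as the measure or appreciation of the beauty of an image. Aesthetics is inherently a subjective property, but there are certain factors that influence it, such as the semantic content of the image, the attributes describing its artistic aspect, the photographic setup used for the shot, etc. In this paper we propose a method for the automatic prediction of the aesthetics of an image based on the analysis of its semantic content, artistic style and composition. The proposed network includes: a pre-trained network for semantic feature extraction (the Backbone); a Multi-Layer Perceptron (MLP) network that relies on the Backbone features to predict image attributes (the AttributeNet); and a self-adaptive Hypernetwork that exploits the attribute prior encoded in the embedding generated by the AttributeNet to predict the parameters of the target network dedicated to aesthetic estimation (the AestheticNet). Given an image, the proposed multi-network is able to predict style and composition attributes as well as the aesthetic score distribution. Results on three benchmark datasets demonstrate the effectiveness of the proposed method, while the ablation study gives a better understanding of the proposed network.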
Context-aware decision support in the operating room can foster surgical safety and efficiency by leveraging real-time feedback from surgical workflow analysis. Most existing works recognize surgical activities at a coarse-grained level, such as phases, steps or events, leaving out fine-grained interaction details about the surgical activity; yet those are needed for more helpful AI assistance in the operating room. Recognizing surgical actions as triplets of <instrument, verb, target> combination delivers comprehensive details about the activities taking place in surgical videos. This paper presents CholecTriplet2021: an endoscopic vision challenge organized at MICCAI 2021 for the recognition of surgical action triplets in laparoscopic videos. The challenge granted private access to the large-scale CholecT50 dataset, which is annotated with action triplet information. In this paper, we present the challenge setup and assessment of the state-of-the-art deep learning methods proposed by the participants during the challenge. A total of 4 baseline methods from the challenge organizers and 19 new deep learning algorithms by competing teams are presented to recognize surgical action triplets directly from surgical videos, achieving mean average precision (mAP) ranging from 4.2% to 38.1%. This study also analyzes the significance of the results obtained by the presented approaches, performs a thorough methodological comparison between them, in-depth result analysis, and proposes a novel ensemble method for enhanced recognition. Our analysis shows that surgical workflow analysis is not yet solved, and also highlights interesting directions for future research on fine-grained surgical activity recognition which is of utmost importance for the development of AI in surgery.
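The era of information explosion has prompted the accumulation of a tremendous amount of time-series data, both stationary and non-stationary. State-of-the-art algorithms have achieved decent performance in dealing with stationary temporal data. However, traditional algorithms that tackle stationary time series do not apply to non-stationary series such as those in Forex trading. This paper investigates applicable models that can improve the accuracy of forecasting future trends of non-stationary time-series sequences. In particular, we focus on identifying potential models and investigate the effects of recognizing patterns from historical data. We propose a combination of an RNN-based seq2seq model with an attention mechanism and an enriched set of features extracted via dynamic time warping and the zigzag peak-valley indicator. Customized loss functions and evaluation metrics are designed to focus more on the peaks and valley points of the predicted sequence. Our results show that our model can predict 4-hour future trends with high accuracy on the Forex dataset, which is crucial in realistic scenarios to assist foreign exchange trading decisions. We further evaluate the effects of various loss functions, evaluation metrics, model variants, and components on model performance.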
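Dynamic time warping, one of the feature sources named in this abstract, can be sketched as follows. This is a minimal textbook implementation assuming an absolute-difference local cost and no warping-window constraint; it is not the authors' feature-extraction code, and the function name and sample values are hypothetical.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences.

    cost[i][j] holds the cheapest cumulative cost of aligning the first
    i points of `a` with the first j points of `b`.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] is matched to b[j-1] again
                cost[i][j - 1],      # b[j-1] is matched to a[i-1] again
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


# A price window and a time-stretched copy of it align at zero cost.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

Unlike a point-by-point distance, this measure tolerates patterns that unfold at different speeds, which is why it is often used to compare a recent price window against historical windows.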
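In recent years, multi-task learning has achieved great success in a wide range of applications. Although single-model training has delivered strong results over the years, it ignores valuable information that might help us estimate a metric better. When learning related tasks, multi-task learning is able to generalize models better. We try to enhance the feature mapping of multi-task models by sharing features among related tasks and through inductive transfer learning. We are also interested in learning the relationships among the various tasks in order to gain greater benefits from multi-task learning. In this chapter, our objective is to visualize existing multi-task models, compare their performance and the methods used to evaluate it, and discuss the problems faced during the design and implementation of these models in various domains, as well as the advantages and milestones they have achieved.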
Wind power forecasting helps with the planning for the power systems by contributing to having a higher level of certainty in decision-making. Due to the randomness inherent to meteorological events (e.g., wind speeds), making highly accurate long-term predictions for wind power can be extremely difficult. One approach to remedy this challenge is to utilize weather information from multiple points across a geographical grid to obtain a holistic view of the wind patterns, along with temporal information from the previous power outputs of the wind farms. Our proposed CNN-RNN architecture combines convolutional neural networks (CNNs) and recurrent neural networks (RNNs) to extract spatial and temporal information from multi-dimensional input data to make day-ahead predictions. In this regard, our method incorporates an ultra-wide learning view, combining data from multiple numerical weather prediction models, wind farms, and geographical locations. Additionally, we experiment with global forecasting approaches to understand the impact of training the same model over the datasets obtained from multiple different wind farms, and we employ a method where spatial information extracted from convolutional layers is passed to a tree ensemble (e.g., Light Gradient Boosting Machine (LGBM)) instead of fully connected layers. The results show that our proposed CNN-RNN architecture outperforms other models such as LGBM, Extra Tree regressor and linear regression when trained globally, but fails to replicate such performance when trained individually on each farm. We also observe that passing the spatial information from CNN to LGBM improves its performance, providing further evidence of CNN's spatial feature extraction capabilities.
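Social media networks have become a significant aspect of people's lives, serving as a platform for their ideas, opinions and emotions. Consequently, automated sentiment analysis (SA) is critical for recognising people's feelings in ways that other information sources cannot. The analysis of these feelings has led to various applications, including brand evaluations, YouTube film reviews and healthcare applications. As social media continues to develop, people post a massive amount of information in different forms, including text, photos, audio and video. Traditional SA algorithms have therefore become limited, as they do not consider the expressiveness of other modalities. By including such characteristics from various sources, these multimodal data streams provide new opportunities for optimising the expected results beyond text-based SA. Our study focuses on the forefront field of multimodal SA, which examines visual and textual data posted on social media networks. Many people are more likely to use this kind of information to express themselves on these platforms. To serve as a resource for academics in this rapidly growing field, we present a comprehensive overview of textual and visual SA, including data pre-processing, feature extraction techniques, sentiment benchmark datasets, and the efficacy of multiple classification methodologies suited to each field. We also provide a brief introduction to the most frequently used data fusion strategies and a summary of existing research on visual-textual SA. Finally, we highlight the most significant challenges and investigate several important sentiment applications.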
Stock market prediction is a traditional yet complex problem that has been researched within diverse research areas and application domains due to its non-linear, highly volatile and complex nature. Existing surveys on stock market prediction often focus on traditional machine learning methods instead of deep learning methods. Deep learning has dominated many domains and, in recent years, has gained much success and popularity in stock market prediction. This motivates us to provide a structured and comprehensive overview of the research on stock market prediction focusing on deep learning techniques. We present four elaborated subtasks of stock market prediction and propose a novel taxonomy to summarize the state-of-the-art models based on deep neural networks from 2011 to 2022. In addition, we provide detailed statistics on the datasets and evaluation metrics commonly used in the stock market. Finally, we highlight some open issues and point out several future directions by sharing some new perspectives on stock market prediction.
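The availability of the sheer volume of Copernicus Sentinel imagery has created new opportunities for large-scale land use land cover (LULC) mapping using deep learning. However, training on such large datasets is a non-trivial task. In this work we experiment with the BigEarthNet dataset for LULC image classification and benchmark different state-of-the-art models, including Convolutional Neural Networks, Multi-Layer Perceptrons, Vision Transformers, EfficientNets and Wide Residual Network (WRN) architectures. Our aim is to balance classification accuracy, training time and inference rate. We propose a framework based on EfficientNets for compound scaling of WRNs in terms of network depth, width and input data resolution, in order to efficiently train and test different model setups. We design a novel scaled WRN architecture enhanced with an Efficient Channel Attention mechanism. Our proposed lightweight model has fewer trainable parameters, improves the average F-score classification accuracy over all 19 LULC classes by 4.5%, and trains two times faster than the state-of-the-art ResNet50 model that we use as a baseline. We provide more than 50 trained models, along with our code for distributed training on multiple GPU nodes.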
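The compound scaling idea referred to in this abstract can be sketched as below. The sketch follows the EfficientNet formulation, in which depth, width and input resolution grow as alpha^phi, beta^phi and gamma^phi for a single coefficient phi; the default coefficients are those reported for EfficientNet, while the base network values (depth 16, width factor 4, 120-pixel inputs) and the function name are hypothetical and are not taken from the paper described here.

```python
def compound_scale(phi, base_depth, base_width, base_resolution,
                   alpha=1.2, beta=1.1, gamma=1.15):
    """Scale depth, width and input resolution together with one coefficient.

    alpha, beta and gamma default to the EfficientNet values, chosen there so
    that alpha * beta**2 * gamma**2 is roughly 2, i.e. each unit increase of
    phi roughly doubles the compute cost. They are assumptions here.
    """
    depth = round(base_depth * alpha ** phi)
    width = round(base_width * beta ** phi)
    resolution = round(base_resolution * gamma ** phi)
    return depth, width, resolution


# Hypothetical base network: depth 16, width factor 4, 120x120 inputs.
for phi in range(3):
    print(phi, compound_scale(phi, 16, 4, 120))
```

Scaling all three dimensions with one knob keeps the search over model setups one-dimensional, which is what makes it practical to train and compare many configurations.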
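Population-level societal events, such as civil unrest and crime, often have a significant impact on our daily lives. Forecasting such events is of great importance for decision-making and resource allocation. Event prediction has traditionally been challenging due to the lack of knowledge about the true causes and underlying mechanisms of event occurrence. In recent years, research on event forecasting has made significant progress for two main reasons: (1) the development of machine learning and deep learning algorithms and (2) the accessibility of public data such as social media, news sources, blogs, economic indicators, and other metadata sources. The explosive growth of data, together with advances in software and hardware technologies, has led to the application of deep learning techniques in societal event studies. This paper is dedicated to providing a systematic and comprehensive overview of deep learning technologies for societal event prediction. We focus on two domains of societal events: civil unrest and crime. We first introduce how event forecasting problems are formulated as a machine learning prediction task. Then, we summarize data resources, traditional methods, and recent developments in deep learning models for these problems. Finally, we discuss the challenges in societal event forecasting and put forward some promising directions for future research.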
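Online advertising revenues account for an increasing share of publishers' revenue streams, especially for small and medium-sized publishers who depend on the advertising networks of tech companies such as Google and Facebook. Publishers may therefore benefit significantly from accurate online advertising revenue forecasts to better manage their website monetization strategies. However, publishers who only have access to their own revenue data lack a holistic view of the total publisher ad market, which in turn limits their ability to generate insights into their own future online advertising revenues. To address this business issue, we leverage a proprietary database encompassing Google Adsense revenues from a large collection of publishers in diverse areas. We adopt the Temporal Fusion Transformer (TFT) model, a novel attention-based architecture, to predict publishers' advertising revenues. We leverage multiple covariates, including not only the publisher's own characteristics but also other publishers' advertising revenues. Our prediction results outperform several benchmark deep learning time-series forecasting models over multiple time horizons. Moreover, we interpret the results by analyzing variable importance weights to identify significant features and self-attention weights to reveal persistent temporal patterns.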
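In this paper, we propose a novel pipeline that leverages language foundation models for temporal sequential pattern mining, such as human mobility forecasting tasks. For example, in the task of predicting Place-of-Interest (POI) customer flows, the number of visits is typically extracted from historical logs, and only the numerical data are used to predict visitor flows. In this research, we perform the forecasting task directly on natural language input that includes all kinds of information, such as numerical values and contextual semantic information. Specific prompts are introduced to transform numerical temporal sequences into sentences so that existing language models can be directly applied. We design an AuxMobLCast pipeline for predicting the number of visitors in each POI, integrating an auxiliary POI category classification task with the encoder-decoder architecture. This research provides empirical evidence of the effectiveness of the proposed AuxMobLCast pipeline in discovering sequential patterns in mobility forecasting tasks. The results, evaluated on three real-world datasets, demonstrate that pre-trained language foundation models also perform well in forecasting temporal sequences. This study could provide visionary insights and lead to new research directions for predicting human mobility.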
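The idea of turning a numerical visit history into a sentence can be sketched as follows. The template wording, the function name and the sample POI are all hypothetical illustrations; the abstract does not give the actual prompts used in the pipeline.

```python
def mobility_prompt(poi_name, dates, visits, target_date):
    """Render a visit-count history as a natural-language forecasting prompt.

    The sentence template is an illustrative assumption, not the prompt
    used by the authors.
    """
    if len(dates) != len(visits) or not visits:
        raise ValueError("dates and visits must be non-empty and of equal length")
    counts = ", ".join(str(v) for v in visits)
    return (
        f"From {dates[0]} to {dates[-1]}, there were {counts} people "
        f"visiting {poi_name} on each day. "
        f"How many people will visit {poi_name} on {target_date}?"
    )


prompt = mobility_prompt(
    poi_name="the city library",  # hypothetical POI
    dates=["2021-03-01", "2021-03-02", "2021-03-03"],
    visits=[12, 15, 9],
    target_date="2021-03-04",
)
print(prompt)
```

A text-to-text language model can then be trained or prompted to complete such a question with a number, so contextual details such as the POI's name travel in the same input as the counts.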
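Deep learning belongs to the field of artificial intelligence, where machines perform tasks that typically require some kind of human intelligence. Similar to the basic structure of a brain, a deep learning algorithm consists of an artificial neural network, which resembles the biological brain structure. Mimicking the way humans learn through their senses, deep learning networks are fed with (sensory) data such as texts, images, videos or sounds. These networks outperform state-of-the-art methods in different tasks and, because of this, the whole field has seen exponential growth over the last few years. This growth has resulted in well over 10,000 publications per year in recent years. For example, a search engine that covers only a subset of all publications in the medical field already returned a large number of results in Q3 2020 for the search term "deep learning", around 90% of which are from the last three years. Consequently, a complete overview of the field of deep learning is already impossible to obtain and, in the near future, it may become difficult to obtain an overview even of a subfield. However, there are several review articles about deep learning that focus on specific scientific fields or applications, for example deep learning advances in computer vision or in specific tasks such as object detection. With these surveys as a foundation, the aim of this contribution is to provide a first high-level, categorized meta-survey of deep learning across different scientific disciplines. The categories (computer vision, language processing, medical informatics and other works) have been chosen according to the underlying data sources (image, language, medical, mixed). In addition, we review the common architectures, methods, pros, cons, evaluations, challenges and future directions for every sub-category.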
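Considering the multimodal nature of transport systems and potential cross-modal correlations, there is a growing trend of enhancing demand forecasting accuracy by learning from multimodal data. These multimodal forecasting models can improve accuracy but are less practical when different parts of the multimodal datasets are owned by different institutions that cannot directly share data. While the institutions may not be able to share their data directly, they may share forecasting models trained on their data, since such models cannot be used to identify the exact information in their datasets. This study proposes an unsupervised knowledge adaptation demand forecasting framework that forecasts the demand of a target mode by utilizing a pre-trained model based on the data of another mode, without requiring direct data sharing from the source mode. The proposed framework exploits the potential shared patterns among multiple transport modes to improve forecasting performance while avoiding the direct sharing of data among different institutions. Specifically, a pre-trained forecasting model is first learned from the data of a source mode, which can capture and memorize the source travel patterns. Then, the demand data of the target dataset is encoded into an individual knowledge part and a shared knowledge part, from which travel patterns are extracted by an individual extraction network and a shared extraction network, respectively. An unsupervised knowledge adaptation strategy is used to form the shared features for further forecasting by making the pre-trained network and the shared extraction network similar. Our findings show that forecasting performance can be improved by sharing a pre-trained model with the target mode, without relying on direct data sharing.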