Intelligent Paper Notes

Visual Time Series Forecasting: An Image-driven Approach

Naftali Cohen, Srijan Sood, Zhen Zeng, Tucker Balch, Manuela Veloso

Category: Computer Vision | Machine Learning

2021-07-02

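In this work, we address time-series forecasting as a computer vision task. We capture the input data as an image and train a model to produce the subsequent image. This approach results in predicting distributions rather than point values. To assess the robustness and quality of our approach, we examine various datasets and multiple evaluation metrics. Our experiments show that our forecasting tool is effective for cyclic data but somewhat less so for irregular data such as stock prices. Importantly, when using image-based evaluation metrics, we find that our method outperforms various baselines, including ARIMA and a numerical variation of our deep learning approach.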
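
The abstract describes encoding the input series as an image and reading the output as a distribution rather than a point value. The sketch below illustrates that idea under one assumption: a simple one-pixel-per-time-step rasterization with min-max scaling. The authors' actual rendering, image size, and model are not specified here, and `series_to_image` and `column_to_distribution` are hypothetical helper names.

```python
import numpy as np

def series_to_image(values, height=32):
    """Rasterize a 1-D series into a binary image (hypothetical encoding):
    one column per time step, one lit pixel at the row of the min-max
    scaled value. Row 0 is the top, so larger values map to smaller rows."""
    v = np.asarray(values, dtype=float)
    span = v.max() - v.min()
    scaled = (v - v.min()) / span if span > 0 else np.zeros_like(v)
    rows = np.round((1.0 - scaled) * (height - 1)).astype(int)
    img = np.zeros((height, len(v)), dtype=np.float32)
    img[rows, np.arange(len(v))] = 1.0
    return img

def column_to_distribution(img_column):
    """Turn one column of a predicted (possibly blurry) image into a
    probability distribution over value bins instead of a single point."""
    col = np.clip(np.asarray(img_column, dtype=float), 0.0, None)
    total = col.sum()
    return col / total if total > 0 else np.full_like(col, 1.0 / len(col))

if __name__ == "__main__":
    t = np.linspace(0, 4 * np.pi, 64)
    img = series_to_image(np.sin(t), height=16)
    print(img.shape)  # (16, 64)
```

A model trained to emit the next image would produce columns with spread-out intensity. Normalizing each column gives the forecast distribution for that time step.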

Visual Time Series Forecasting: An Image-driven Approach

Naftali Cohen, Srijan Sood, Zhen Zeng, Tucker Balch, Manuela Veloso

Category: Computer Vision | Machine Learning

2020-11-18

A Survey of Graph Neural Networks for Social Recommender Systems

Kartik Sharma, Yeon-Chang Lee, Sivagami Nambi, Aditya Salian, Shlok Shah, Sang-Wook Kim, Srijan Kumar

Category: Machine Learning

2022-12-08

Social recommender systems (SocialRS) simultaneously leverage user-to-item interactions as well as user-to-user social relations for the task of generating item recommendations to users. Additionally exploiting social relations is clearly effective in understanding users' tastes due to the effects of homophily and social influence. For this reason, SocialRS has increasingly attracted attention. In particular, with the advance of Graph Neural Networks (GNN), many GNN-based SocialRS methods have been developed recently. Therefore, we conduct a comprehensive and systematic review of the literature on GNN-based SocialRS. In this survey, we first identify 80 papers on GNN-based SocialRS after annotating 2151 papers by following the PRISMA framework (Preferred Reporting Items for Systematic Reviews and Meta-Analysis). Then, we comprehensively review them in terms of their inputs and architectures to propose a novel taxonomy: (1) input taxonomy includes 5 groups of input type notations and 7 groups of input representation notations; (2) architecture taxonomy includes 8 groups of GNN encoder, 2 groups of decoder, and 12 groups of loss function notations. We classify the GNN-based SocialRS methods into several categories as per the taxonomy and describe their details. Furthermore, we summarize the benchmark datasets and metrics widely used to evaluate the GNN-based SocialRS methods. Finally, we conclude this survey by presenting some future research directions.

Robustness of Fusion-based Multimodal Classifiers to Cross-Modal Content Dilutions

Gaurav Verma, Vishwa Vinay, Ryan A. Rossi, Srijan Kumar

Category: Machine Learning | Artificial Intelligence

2022-11-04

As multimodal learning finds applications in a wide variety of high-stakes societal tasks, investigating their robustness becomes important. Existing work has focused on understanding the robustness of vision-and-language models to imperceptible variations on benchmark tasks. In this work, we investigate the robustness of multimodal classifiers to cross-modal dilutions - a plausible variation. We develop a model that, given a multimodal (image + text) input, generates additional dilution text that (a) maintains relevance and topical coherence with the image and existing text, and (b) when added to the original text, leads to misclassification of the multimodal input. Via experiments on Crisis Humanitarianism and Sentiment Detection tasks, we find that the performance of task-specific fusion-based multimodal classifiers drops by 23.3% and 22.5%, respectively, in the presence of dilutions generated by our model. Metric-based comparisons with several baselines and human evaluations indicate that our dilutions show higher relevance and topical coherence, while simultaneously being more effective at demonstrating the brittleness of the multimodal classifiers. Our work aims to highlight and encourage further research on the robustness of deep multimodal models to realistic variations, especially in human-facing societal applications. The code and other resources are available at https://claws-lab.github.io/multimodal-robustness/.

M2TRec: Metadata-aware Multi-task Transformer for Large-scale and Cold-start free Session-based Recommendations

Walid Shalaby, Sejoon Oh, Amir Afsharinejad, Srijan Kumar, Xiquan Cui

Category: Artificial Intelligence | Machine Learning

2022-09-23

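Session-based recommender systems (SBRSs) have shown superior performance over conventional methods. However, they show limited scalability on large-scale industrial datasets, since most models learn one embedding per item. This leads to large memory requirements (one vector stored per item) and poor performance on sparse sessions with cold-start or unpopular items. Using one public and one large industrial dataset, we show experimentally that state-of-the-art SBRSs perform poorly on sparse sessions with sparse items. We propose M2TRec, a Metadata-aware Multi-task Transformer model for session-based recommendations. Our proposed method learns a transformation function from item metadata to embeddings, and is therefore free of per-item embeddings (i.e., it does not need to learn one embedding per item). It integrates item metadata to learn shared representations of diverse item attributes. During inference, new or unpopular items receive the same representations for the attributes they share with items observed during training, and therefore have representations similar to those items, enabling recommendations of even cold-start and sparse items. In addition, M2TRec is trained in a multi-task setting to predict the next item in the session together with its primary category and subcategory. Our multi-task strategy makes the model converge faster and significantly improves overall performance. Experimental results show significant performance gains from our proposed approach on sparse items in both datasets.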
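
The key property claimed above is that an item's embedding is computed from its metadata, so an unseen item that shares attributes with a training item lands near it. The sketch below illustrates only that property. Averaging attribute vectors is a stand-in assumption for M2TRec's learned transformation, and the attribute names and random vectors are hypothetical placeholders for trained parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 256

# Hypothetical attribute vocabulary; in a trained model these rows would be learned.
attribute_table = {
    name: rng.normal(size=DIM)
    for name in ["brand:acme", "brand:zen", "cat:tools", "cat:garden", "sub:drills", "sub:hoses"]
}

def item_embedding(attributes):
    """Map an item's metadata attributes to an embedding by averaging the
    attribute vectors, so no per-item (item-ID) embedding is stored."""
    return np.mean([attribute_table[a] for a in attributes], axis=0)

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

seen_item = item_embedding(["brand:acme", "cat:tools", "sub:drills"])
new_item = item_embedding(["brand:acme", "cat:tools", "sub:hoses"])     # cold-start: shares 2 of 3 attributes
other_item = item_embedding(["brand:zen", "cat:garden", "sub:hoses"])   # shares none with seen_item
print(cosine(seen_item, new_item), cosine(seen_item, other_item))
```

Memory scales with the number of distinct attribute values rather than the number of items. This is the scalability argument the abstract makes.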

Recent trends and analysis of Generative Adversarial Networks in Cervical Cancer Imaging

Tamanna Sood

Category: Computer Vision

2022-09-23

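Cervical cancer is one of the most common types of cancer among women. It accounts for 6% to 29% of all cancers in women and is caused by the human papillomavirus (HPV). The 5-year survival rate for cervical cancer ranges from 17% to 92%, depending on the stage at which it is detected. Early detection of the disease enables better treatment of patients. Many deep learning algorithms are now used to detect cervical cancer. A special class of deep learning techniques, Generative Adversarial Networks (GANs), is rapidly gaining ground in the screening, detection, and classification of cervical cancer. In this work, we present a detailed analysis of recent trends in the use of various GAN models, their applications, and the metrics used to evaluate their performance in the field of cervical cancer imaging.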

Multiple Waypoint Navigation in Unknown Indoor Environments

Shivam Sood, Jaskaran Singh Sodhi, Parv Maheshwari, Karan Uppal, Debashish Chakravarty

Category: Robotics

2022-09-18

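Indoor motion planning focuses on navigating an agent through a cluttered environment. Much work has been done in this field, but existing methods often fail to strike the right balance between computationally inexpensive online path planning and path optimality. In addition, these works typically demonstrate optimality for single-start, single-goal worlds. To address these challenges, we present a multiple-waypoint path planner and controller stack for navigation in unknown indoor environments, where the waypoints include the goal together with the intermediate points that the robot must traverse before reaching it. Our approach uses a global planner (to find the next best waypoint at any instant), a local planner (to plan the path to a specific waypoint), and an adaptive Model Predictive Control strategy (for robust system control and faster maneuvers). We evaluate our algorithm on a set of randomly generated obstacle maps, intermediate waypoints, and start-goal pairs. The results show a significant reduction in computational cost, with high accuracy and robust control.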

Implicit Session Contexts for Next-Item Recommendations

Sejoon Oh, Ankur Bhardwaj, Jongseok Han, Sungchul Kim, Ryan A. Rossi, Srijan Kumar

Category: Machine Learning

2022-08-18

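Session-based recommender systems capture a user's short-term interests within a session. Session contexts (i.e., a user's high-level interests or intents within a session) are not explicitly given in most datasets, and implicitly inferring the session context as an aggregation of item-level attributes is crude. In this paper, we propose ISCON, which implicitly contextualizes sessions. ISCON first generates implicit contexts for sessions by creating a session-item graph, learning graph embeddings, and clustering them to assign sessions to contexts. ISCON then trains a session context predictor and uses the embeddings of the predicted contexts to improve next-item prediction accuracy. Experiments on four datasets show that ISCON achieves better next-item prediction accuracy than state-of-the-art models. A case study of ISCON on the Reddit dataset confirms that the assigned session contexts are distinct and meaningful.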
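
The context-assignment step above clusters learned session embeddings so that each cluster acts as an implicit context. The abstract does not name the clustering algorithm, so the sketch below uses plain k-means (Lloyd's algorithm) as an assumed stand-in. The session embeddings are synthetic placeholders for the learned graph embeddings, and `kmeans` and `session_context` are hypothetical names.

```python
import numpy as np

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(points, dtype=float)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Squared Euclidean distance from every point to every centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster becomes empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

# Hypothetical session embeddings: two well-separated groups of sessions.
rng = np.random.default_rng(1)
session_emb = np.vstack([rng.normal(0.0, 0.1, size=(5, 4)),
                         rng.normal(5.0, 0.1, size=(5, 4))])
_, session_context = kmeans(session_emb, k=2)
print(session_context)  # one implicit context ID per session
```

The resulting context IDs would then serve as training labels for the session context predictor described in the abstract.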

Signed Link Representation in Continuous-Time Dynamic Signed Networks

Mohit Raghavendra, Kartik Sharma, Anand Kumar M, Srijan Kumar

Category: Machine Learning

2022-07-07

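Signed networks allow us to model bi-faceted relationships and interactions, such as friend/enemy and support/oppose. In real datasets these interactions are often temporal, with nodes and edges appearing over time. Learning the dynamics of signed networks is therefore crucial for effectively predicting the sign and strength of future links. Existing works model either signed networks or dynamic networks, but not both together. In this work, we study dynamic signed networks, in which links are both signed and evolving over time. Our model learns a Signed link's Evolution using Memory modules and Balanced Aggregation (hence the name SEMBA). Each node maintains two separate memory encodings, one for positive and one for negative interactions. When a new edge arrives, each interacting node aggregates this signed information while exploiting balance theory. Node embeddings are generated from the updated memories and are then used to train multiple downstream tasks, including link sign prediction and link weight prediction. Our results show that SEMBA outperforms all baselines, achieving an 8% increase in AUC and a 50% reduction in FPR. Results on predicting signed weights show that SEMBA reduces the squared error by 9% while reducing the KL divergence on the distribution of predicted signed weights by 69%.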
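
The abstract says each node keeps separate positive and negative memories and updates them using balance theory, but it does not give SEMBA's update equations. The sketch below is a simplified, hypothetical illustration of the balance-theory routing only:

- A friend's friends are friends, and a friend's enemies are enemies.
- An enemy's friends are enemies, and an enemy's enemies are friends.

`Node`, `_messages`, `balanced_update`, the message averaging, and the moving-average `rate` are all assumptions, not the paper's method.

```python
import numpy as np

class Node:
    """Hypothetical node state: static features plus separate positive/negative memories."""
    def __init__(self, features):
        self.x = np.asarray(features, dtype=float)
        self.pos = np.zeros_like(self.x)  # summary of the node's "friends"
        self.neg = np.zeros_like(self.x)  # summary of the node's "enemies"

def _messages(sign, sender):
    """Return (message for receiver's pos memory, message for receiver's neg memory)."""
    friend_side = (sender.x + sender.pos) / 2  # the sender together with its friends
    enemy_side = sender.neg.copy()             # the sender's enemies
    if sign > 0:
        # Friend's friends are friends; friend's enemies are enemies.
        return friend_side, enemy_side
    # Enemy's friends are enemies; enemy's enemies are friends.
    return enemy_side, friend_side

def balanced_update(u, v, sign, rate=0.5):
    """Apply a new signed edge (u, v): both endpoints blend balance-theory
    messages, computed from the other endpoint's pre-edge state, into memory."""
    u_pos_msg, u_neg_msg = _messages(sign, v)
    v_pos_msg, v_neg_msg = _messages(sign, u)
    u.pos = (1 - rate) * u.pos + rate * u_pos_msg
    u.neg = (1 - rate) * u.neg + rate * u_neg_msg
    v.pos = (1 - rate) * v.pos + rate * v_pos_msg
    v.neg = (1 - rate) * v.neg + rate * v_neg_msg

a, b, c = Node([1, 0, 0]), Node([0, 1, 0]), Node([0, 0, 1])
balanced_update(a, b, sign=-1)  # a and b are enemies
balanced_update(b, c, sign=-1)  # b and c are enemies
print(c.pos)  # a (the enemy of c's enemy) now appears in c's positive memory
```

Keeping two memories lets positive and negative evidence stay separate until embeddings are generated. A single signed sum could cancel out.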

Video + CLIP Baseline for Ego4D Long-term Action Anticipation

Srijan Das, Michael S. Ryoo

Category: Computer Vision | Machine Learning

2022-07-01

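In this report, we present an adaptation of image-text models for long-term action anticipation. Our Video + CLIP framework uses a large-scale pre-trained paired image-text model, CLIP, together with the SlowFast network as a video encoder. The CLIP embedding provides a fine-grained understanding of objects relevant to an action, while the SlowFast network models temporal information within a video clip of a few frames. We show that the features obtained from the two encoders are complementary, and the combined model therefore outperforms the baseline on Ego4D for the task of long-term action anticipation. Our code is available at github.com/srijandas07/clip_baseline_lta_ego4d.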