Smart Paper Notes

Fine-grained Few-shot Recognition by Deep Object Parsing

Pengkai Zhu , Ruizhao Zhu , Samarth Mishra , Venkatesh Saligrama

Categories: Computer Vision | Artificial Intelligence

2022-07-14

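In our framework, an object is made up of k distinct parts or units, and we parse a test instance by inferring its k parts. Each part occupies a distinct location in feature space, and the instance's features at that location manifest as an active subset of part templates shared across all instances. We recognize a test instance by comparing its active templates and the relative geometry of its part locations against those of the few instances presented. We propose an end-to-end training method to learn part templates on top of a convolutional backbone. To combat visual distortions such as orientation, pose, and size, we learn multi-scale templates and, at test time, parse and match instances across these scales. We show that our method is competitive with state-of-the-art methods and, by virtue of parsing, is also interpretable.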

translated by Google Translate

Task2Sim : Towards Effective Pre-training and Transfer from Synthetic Data

Samarth Mishra , Rameswar Panda , Cheng Perng Phoo , Chun-Fu Chen , Leonid Karlinsky , Kate Saenko , Venkatesh Saligrama , Rogerio S. Feris

Categories: Computer Vision | Machine Learning

2021-11-30

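Pre-training models on Imagenet or other large-scale datasets has led to major advances in computer vision, albeit accompanied by shortcomings related to curation cost, privacy, usage rights, and ethical issues. In this paper, for the first time, we study the transferability of models pre-trained on synthetic data generated by graphics simulators to downstream tasks from very different domains. In using such synthetic data for pre-training, we find that downstream performance on different tasks is favored by different configurations of simulation parameters (e.g., lighting, object pose, backgrounds), and that there is no one-size-fits-all solution. It is thus better to tailor synthetic pre-training data to a specific downstream task for best performance. We introduce Task2Sim, a unified model that maps downstream task representations to optimal simulation parameters in order to generate synthetic pre-training data for them. Task2Sim learns this mapping by training to find the best set of parameters on a set of "seen" tasks. Once trained, it can be used to predict the best simulation parameters for a novel "unseen" task without additional training. Given a budget on the number of images per class, our extensive experiments with 20 diverse downstream tasks show that Task2Sim's task-adaptive pre-training data results in significantly better downstream performance than non-adaptively chosen simulation parameters, on both seen and unseen tasks. It is even competitive with pre-training on real images.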

Surprisingly Simple Semi-Supervised Domain Adaptation with Pretraining and Consistency

Samarth Mishra , Kate Saenko , Venkatesh Saligrama

Categories: Computer Vision | Machine Learning

2021-01-29

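Most modern unsupervised domain adaptation (UDA) approaches are rooted in domain alignment, i.e., learning to align source and target features so that a target-domain classifier can be learned using source labels. In semi-supervised domain adaptation (SSDA), where the learner can access a small number of target-domain labels, prior approaches have followed UDA theory and used domain alignment for learning. We show that the SSDA case is different, and that a good target classifier can be learned without alignment. We use self-supervised pretraining (via rotation prediction) and consistency regularization to obtain well-separated target clusters, which helps in learning a low-error target classifier. With our Pretraining and Consistency (PAC) approach, we achieve state-of-the-art target accuracy on this semi-supervised domain adaptation task, surpassing multiple adversarial domain alignment methods across multiple datasets. PAC, while using simple techniques, performs remarkably well on large and challenging SSDA benchmarks such as DomainNet and Visda-17, often outperforming the recent state of the art by sizeable margins. Code for our experiments can be found at https://github.com/venkatesh-saligrama/pac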
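
The two ingredients named in the abstract above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' implementation: images are stand-in nested lists, and the helper names rotate90, rotation_batch, and consistency_penalty are hypothetical. The consistency term shown is a generic mean squared difference between two predicted probability vectors; the paper's exact loss may differ.

```python
from typing import List, Tuple

Image = List[List[int]]  # assumption: a tiny 2D grid stands in for an image


def rotate90(img: Image) -> Image:
    """Rotate a 2D grid by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def rotation_batch(img: Image) -> List[Tuple[Image, int]]:
    """Build the rotation-prediction pretext task for one image.

    Returns four (rotated_image, label) pairs, where label k means the
    image was rotated clockwise by k * 90 degrees.
    """
    batch = []
    current = [list(row) for row in img]
    for k in range(4):
        batch.append((current, k))
        current = rotate90(current)
    return batch


def consistency_penalty(p_a: List[float], p_b: List[float]) -> float:
    """Mean squared difference between predictions for two views of one input."""
    if len(p_a) != len(p_b):
        raise ValueError("prediction vectors must have the same length")
    return sum((a - b) ** 2 for a, b in zip(p_a, p_b)) / len(p_a)
```

A network trained to predict the label k from the rotated image needs no human annotation, and the penalty is zero exactly when the classifier gives the same prediction for both perturbed views.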

Statistical Machine Translation for Indic Languages

Sudhansu Bala Das , Divyajoti Panda , Tapas Kumar Mishra , Bidyut Kr. Patra

Categories: Natural Language Processing

2023-01-02

A Machine Translation (MT) system generally aims to automatically render a source language into a target language while preserving the original context, using various Natural Language Processing (NLP) techniques. Among the various NLP methods is Statistical Machine Translation (SMT), which uses probabilistic and statistical techniques to analyze information and perform the conversion. This paper canvasses the development of bilingual SMT models for translating English to fifteen low-resource Indian Languages (ILs) and vice versa. At the outset, all 15 languages are introduced with a short description relevant to our experimental needs. Further, a detailed analysis of the Samanantar and OPUS datasets for model building, along with the standard benchmark dataset (Flores-200) for fine-tuning and testing, is carried out as part of our experiment. Different preprocessing approaches are proposed in this paper to handle noise in the datasets. To create the system, the MOSES open-source SMT toolkit is explored. Distance reordering is used with the aim of capturing the rules of grammar and context-dependent adjustments through a phrase reordering categorization framework. In our experiment, the quality of the translation is evaluated using standard metrics such as BLEU, METEOR, and RIBES.
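
To make the BLEU metric mentioned above more concrete, here is a deliberately simplified sentence-level sketch in plain Python. It is an assumption-laden illustration, not the evaluation script used in the paper: it handles a single reference, uses only unigrams and bigrams by default (standard BLEU uses up to 4-grams at corpus level), and applies no smoothing. The function names ngram_counts and simple_bleu are hypothetical.

```python
import math
from collections import Counter
from typing import List


def ngram_counts(tokens: List[str], n: int) -> Counter:
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate: List[str], reference: List[str], max_n: int = 2) -> float:
    """Geometric mean of clipped n-gram precisions times a brevity penalty."""
    if not candidate:
        return 0.0
    log_precision = 0.0
    for n in range(1, max_n + 1):
        cand = ngram_counts(candidate, n)
        ref = ngram_counts(reference, n)
        total = sum(cand.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
        if total == 0 or clipped == 0:
            return 0.0  # no smoothing in this sketch
        log_precision += math.log(clipped / total) / max_n
    # Penalize candidates that are shorter than the reference.
    if len(candidate) >= len(reference):
        brevity = 1.0
    else:
        brevity = math.exp(1.0 - len(reference) / len(candidate))
    return brevity * math.exp(log_precision)
```

The clipping step is what keeps a candidate from scoring well by repeating a single correct word, and the brevity penalty prevents very short outputs from getting perfect precision for free.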

Skeletal Video Anomaly Detection using Deep Learning: Survey, Challenges and Future Directions

Pratik K. Mishra , Alex Mihailidis , Shehroz S. Khan

Categories: Computer Vision

2022-12-31

The existing methods for video anomaly detection mostly utilize videos containing identifiable facial and appearance-based features. The use of videos with identifiable faces raises privacy concerns, especially when used in a hospital or community-based setting. Appearance-based features can also be sensitive to pixel-based noise, straining the anomaly detection methods to model the changes in the background and making it difficult to focus on the actions of humans in the foreground. Structural information in the form of skeletons describing the human motion in the videos is privacy-protecting and can overcome some of the problems posed by appearance-based features. In this paper, we present a survey of privacy-protecting deep learning anomaly detection methods using skeletons extracted from videos. We present a novel taxonomy of algorithms based on the various learning approaches. We conclude that skeleton-based approaches for anomaly detection can be a plausible privacy-protecting alternative for video anomaly detection. Lastly, we identify major open research questions and provide guidelines to address them.

Offline Policy Optimization in RL with Variance Regularization

Riashat Islam , Samarth Sinha , Homanga Bharadhwaj , Samin Yeasar Arnob , Zhuoran Yang , Animesh Garg , Zhaoran Wang , Lihong Li , Doina Precup

Categories: Machine Learning

2022-12-29

Learning policies from fixed offline datasets is a key challenge in scaling up reinforcement learning (RL) algorithms towards practical applications. This is often because off-policy RL algorithms suffer from distributional shift, due to a mismatch between the dataset and the target policy, leading to high variance and over-estimation of value functions. In this work, we propose variance regularization for offline RL algorithms, using stationary distribution corrections. We show that by using Fenchel duality, we can avoid double sampling issues when computing the gradient of the variance regularizer. The proposed algorithm for offline variance regularization (OVAR) can be used to augment any existing offline policy optimization algorithm. We show that the regularizer leads to a lower bound on the offline policy optimization objective, which can help avoid over-estimation errors and explains the benefits of our approach across a range of continuous control domains when compared to existing state-of-the-art algorithms.

Escaping Saddle Points for Effective Generalization on Class-Imbalanced Data

Harsh Rangwani , Sumukh K Aithal , Mayank Mishra , R. Venkatesh Babu

Categories: Machine Learning | Computer Vision

2022-12-28

Real-world datasets exhibit imbalances of varying types and degrees. Several techniques based on re-weighting and margin adjustment of the loss are often used to enhance the performance of neural networks, particularly on minority classes. In this work, we analyze the class-imbalanced learning problem by examining the loss landscape of neural networks trained with re-weighting and margin-based techniques. Specifically, we examine the spectral density of the Hessian of the class-wise loss, through which we observe that the network weights converge to a saddle point in the loss landscapes of minority classes. Following this observation, we also find that optimization methods designed to escape from saddle points can be effectively used to improve generalization on minority classes. We further theoretically and empirically demonstrate that Sharpness-Aware Minimization (SAM), a recent technique that encourages convergence to flat minima, can be effectively used to escape saddle points for minority classes. Using SAM results in a 6.2% increase in accuracy on the minority classes over the state-of-the-art Vector Scaling Loss, leading to an overall average increase of 4% across imbalanced datasets. The code is available at: https://github.com/val-iisc/Saddle-LongTail.
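
As a rough illustration of the SAM update the abstract refers to, the sketch below implements a single step in plain Python: move to a nearby worst-case point within radius rho along the normalized gradient, then update the original weights using the gradient taken at that point. The names sam_step, grad_fn, lr, and rho are hypothetical, and this is the generic form of SAM rather than the authors' training code.

```python
import math
from typing import Callable, List

Vector = List[float]


def sam_step(
    w: Vector,
    grad_fn: Callable[[Vector], Vector],
    lr: float = 0.1,
    rho: float = 0.05,
) -> Vector:
    """One Sharpness-Aware Minimization step on a weight vector w."""
    g = grad_fn(w)
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        # Zero gradient: no ascent direction is defined, so w is left unchanged.
        return list(w)
    # Ascent step: perturb the weights towards higher loss, within radius rho.
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    # Descent step: apply the gradient from the perturbed point to the original w.
    g_adv = grad_fn(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]
```

For the quadratic loss 0.5 * ||w||^2, whose gradient is w itself, calling sam_step([3.0, 4.0], lambda v: list(v), lr=0.1, rho=0.5) evaluates the gradient at the perturbed point (3.3, 4.4) instead of at (3.0, 4.0).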

Privacy-Protecting Behaviours of Risk Detection in People with Dementia using Videos

Pratik K. Mishra , Andrea Iaboni , Bing Ye , Kristine Newman , Alex Mihailidis , Shehroz S. Khan

Categories: Computer Vision

2022-12-20

People living with dementia often exhibit behavioural and psychological symptoms of dementia that can put their own and others' safety at risk. Existing video surveillance systems in long-term care facilities can be used to monitor such behaviours of risk and alert the staff, in order to prevent potential injuries or, in some cases, death. However, these behaviours of risk events are heterogeneous and infrequent in comparison to normal events. Moreover, analyzing raw videos can also raise privacy concerns. In this paper, we present two novel privacy-protecting video-based anomaly detection approaches to detect behaviours of risk in people with dementia. We either extracted body pose information as skeletons or used semantic segmentation masks to replace multiple humans in the scene with their semantic boundaries. Our work differs from most existing approaches for video anomaly detection, which focus on appearance-based features that can put the privacy of a person at risk and are also susceptible to pixel-based noise, including illumination and viewing direction. We used anonymized videos of normal activities to train customized spatio-temporal convolutional autoencoders and identify behaviours of risk as anomalies. We show our results on a real-world study conducted in a dementia care unit with patients with dementia, containing approximately 21 hours of normal activities data for training and 9 hours of data containing normal and behaviours of risk events for testing. We compared our approaches with the original RGB videos and obtained an equivalent area under the receiver operating characteristic curve performance of 0.807 for the skeleton-based approach and 0.823 for the segmentation mask-based approach. This is one of the first studies to incorporate privacy for the detection of behaviours of risk in people with dementia.
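
The area under the ROC curve reported above can be read as the probability that a randomly chosen anomalous clip receives a higher anomaly score than a randomly chosen normal clip. The sketch below computes it that way in plain Python. It assumes, as is common for autoencoder-based detectors, that each clip has a scalar anomaly score such as a reconstruction error; the function name roc_auc and the example values are hypothetical and are not data from the study.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (anomalous, normal) pairs ranked correctly.

    labels: 1 for an anomalous clip, 0 for a normal clip.
    scores: higher means more anomalous. Ties count as half a correct pair.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one anomalous and one normal example")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Illustrative values only: two normal and two anomalous clips.
example_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The pairwise form is quadratic in the number of clips, which is fine for a sketch; a rank-based computation gives the same value more efficiently on large test sets.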

Self-Instruct: Aligning Language Model with Self Generated Instructions

Yizhong Wang , Yeganeh Kordi , Swaroop Mishra , Alisa Liu , Noah A. Smith , Daniel Khashabi , Hannaneh Hajishirzi

Categories: Natural Language Processing | Artificial Intelligence

2022-12-20

Large "instruction-tuned" language models (finetuned to respond to instructions) have demonstrated a remarkable ability to generalize zero-shot to new tasks. Nevertheless, they depend heavily on human-written instruction data that is limited in quantity, diversity, and creativity, which hinders the generality of the tuned model. We introduce Self-Instruct, a framework for improving the instruction-following capabilities of pretrained language models by bootstrapping off their own generations. Our pipeline generates instruction, input, and output samples from a language model, then prunes them before using them to finetune the original model. Applying our method to vanilla GPT3, we demonstrate a 33% absolute improvement over the original model on Super-NaturalInstructions, on par with the performance of InstructGPT_001, which is trained with private user data and human annotations. For further evaluation, we curate a set of expert-written instructions for novel tasks, and show through human evaluation that tuning GPT3 with Self-Instruct outperforms using existing public instruction datasets by a large margin, leaving only a 5% absolute gap behind InstructGPT_001. Self-Instruct provides an almost annotation-free method for aligning pre-trained language models with instructions, and we release our large synthetic dataset to facilitate future studies on instruction tuning.
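
The pruning stage of such a pipeline can be pictured as a filter that drops generated instructions too similar to ones already kept. The abstract does not specify the criterion, so the sketch below makes its own assumptions: similarity is Jaccard overlap on lowercased whitespace tokens and the threshold is 0.7. The function names jaccard and prune_instructions are hypothetical, and none of this should be read as the authors' actual filter.

```python
from typing import List


def jaccard(a: str, b: str) -> float:
    """Token-set overlap between two instructions, in the range [0, 1]."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def prune_instructions(
    candidates: List[str], pool: List[str], max_similarity: float = 0.7
) -> List[str]:
    """Return the candidates that are novel relative to the pool.

    A candidate is kept only if it is non-empty and its similarity to every
    instruction already in the pool, including earlier kept candidates, is
    below max_similarity.
    """
    kept = list(pool)
    added = []
    for cand in candidates:
        if cand.strip() and all(jaccard(cand, k) < max_similarity for k in kept):
            kept.append(cand)
            added.append(cand)
    return added
```

Comparing each candidate against earlier accepted candidates as well as the seed pool keeps near-duplicates generated in the same batch from both surviving.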

Do language models have coherent mental models of everyday things?

Yuling Gu , Bhavana Dalvi Mishra , Peter Clark

Categories: Natural Language Processing | Artificial Intelligence

2022-12-20

When people think of everyday things like an "egg," they typically have a mental image associated with it. This commonsense knowledge helps us understand how these everyday things work and how to interact with them. For example, when someone tries to make a fried egg, they know that it has a shell and that it can be cracked open to reveal the egg white and yolk inside. However, if a system does not have a coherent picture of such everyday things, thinking that the egg yolk surrounds the shell, then it might have to resort to ridiculous approaches such as trying to scrape the egg yolk off the shell into the pan. Do language models have a coherent picture of such everyday things? To investigate this, we propose a benchmark dataset consisting of 100 everyday things, their parts, and the relationships between these parts. We observe that state-of-the-art pre-trained language models (LMs) like GPT-3 and Macaw have fragments of knowledge about these entities, but they fail to produce consistent parts mental models. We propose a simple extension to these LMs where we apply a constraint satisfaction layer on top of raw predictions from LMs to produce more consistent and accurate parts mental models of everyday things.
