我们提出了一种用于少量视频分类的新方法,该方法可以执行外观和时间对齐。特别是,给定一对查询和支持视频,我们通过框架级功能匹配进行外观对齐,以在视频之间达到外观相似性得分,同时利用时间订单保留的先验来获得视频之间的时间相似性得分。此外,我们介绍了一些视频分类框架,该框架利用了多个步骤的上述外观和时间相似性得分,即基于原型的训练和测试,以及电感和thresductive和转导的原型细化。据我们所知,我们的工作是第一个探索跨传感器的视频分类的工作。动力学和某些事物的V2数据集进行了广泛的实验表明,外观和时间对齐对于具有时间订单敏感性的数据集至关重要。我们的方法与两个数据集上的以前方法相似或更好的结果。我们的代码可在https://github.com/vinairesearch/fsvc-ata上找到。
translated by 谷歌翻译
在这项工作中,我们建议使用分布式样本,即来自目标类别外部的未标记样本,以改善几乎没有记录的学习。具体而言,我们利用易于可用的分布样品来驱动分类器,以避免通过最大化原型到分布样品的距离,同时最大程度地减少分布样品的距离(即支持,查询数据),以避免使用分类器。。我们的方法易于实施,不可知论的是提取器,轻量级,而没有任何额外的预训练费用,并且适用于归纳和跨传输设置。对各种标准基准测试的广泛实验表明,所提出的方法始终提高具有不同架构的预审计网络的性能。
translated by 谷歌翻译
在本文中,我们对单一和多对象文本到图像合成的最先进方法进行了研究,并提出了用于评估这些方法的共同框架。我们首先识别当前评估文本到图像模型的几个常见问题,即:(i)用于图像质量评估的常用度量,例如,Inception得分(是),通常是对单个对象的错误匹配案例或滥用多目标案例; (ii)在现有的R精度(RP)和SOA度量中出现过烧点现象,用于分别评估文本相关性和对象精度方面; (iii)在多目标案例评估中的许多重要因素主要被解雇,例如对象保真度,位置对准,计数对准; (iv)基于当前度量的方法的排名与真实图像高度不一致。然后,为了克服这些限制,我们提出了一个组合的现有和新度量标准,以系统地评估方法。对于现有的指标,我们通过使用温度缩放来校准所使用的分类器的置信度的改进版本的名称为*;我们还提出了一种解决方案来减轻RP和SOA的过度问题。关于在多目标情况下缺乏重要评估因素的一套新度量,我们开发CA用于计数对齐,PA用于定位对齐,以对象为中心,是(O-IS),以对象为中心的FID(O- FID)对于对象保真度。因此,我们的基准导致现有方法中高度一致的排名,与人类评估良好。我们还通过众所周知的Attngan简单修改,为基准创建一个强大的基线模型(Attngan ++)。我们将发布此工具箱进行统一评估,所谓的明智,以标准化文本到图像综合模型的评估。
translated by 谷歌翻译
由于GaN潜在空间的勘探和利用,近年来,现实世界的图像操纵实现了奇妙的进展。 GaN反演是该管道的第一步,旨在忠实地将真实图像映射到潜在代码。不幸的是,大多数现有的GaN反演方法都无法满足下面列出的三个要求中的至少一个:重建质量,可编辑性和快速推断。我们在本研究中提出了一种新的两阶段策略,同时适合所有要求。在第一阶段,我们训练编码器将输入图像映射到StyleGan2 $ \ Mathcal {W} $ - 空间,这被证明具有出色的可编辑性,但重建质量较低。在第二阶段,我们通过利用一系列HyperNetWorks来补充初始阶段的重建能力以在反转期间恢复缺失的信息。这两个步骤互相补充,由于Hypernetwork分支和由于$ \ Mathcal {W} $ - 空间中的反转,因此由于HyperNetwork分支和优异的可编辑性而相互作用。我们的方法完全是基于编码器的,导致极快的推断。关于两个具有挑战性的数据集的广泛实验证明了我们方法的优越性。
translated by 谷歌翻译
Here, we demonstrate how machine learning enables the prediction of comonomers reactivity ratios based on the molecular structure of monomers. We combined multi-task learning, multi-inputs, and Graph Attention Network to build a model capable of predicting reactivity ratios based on the monomers chemical structures.
translated by 谷歌翻译
Modern deep neural networks have achieved superhuman performance in tasks from image classification to game play. Surprisingly, these various complex systems with massive amounts of parameters exhibit the same remarkable structural properties in their last-layer features and classifiers across canonical datasets. This phenomenon is known as "Neural Collapse," and it was discovered empirically by Papyan et al. \cite{Papyan20}. Recent papers have theoretically shown the global solutions to the training network problem under a simplified "unconstrained feature model" exhibiting this phenomenon. We take a step further and prove the Neural Collapse occurrence for deep linear network for the popular mean squared error (MSE) and cross entropy (CE) loss. Furthermore, we extend our research to imbalanced data for MSE loss and present the first geometric analysis for Neural Collapse under this setting.
translated by 谷歌翻译
Machine Reading Comprehension has become one of the most advanced and popular research topics in the fields of Natural Language Processing in recent years. The classification of answerability questions is a relatively significant sub-task in machine reading comprehension; however, there haven't been many studies. Retro-Reader is one of the studies that has solved this problem effectively. However, the encoders of most traditional machine reading comprehension models in general and Retro-Reader, in particular, have not been able to exploit the contextual semantic information of the context completely. Inspired by SemBERT, we use semantic role labels from the SRL task to add semantics to pre-trained language models such as mBERT, XLM-R, PhoBERT. This experiment was conducted to compare the influence of semantics on the classification of answerability for the Vietnamese machine reading comprehension. Additionally, we hope this experiment will enhance the encoder for the Retro-Reader model's Sketchy Reading Module. The improved Retro-Reader model's encoder with semantics was first applied to the Vietnamese Machine Reading Comprehension task and obtained positive results.
translated by 谷歌翻译
RTE is a significant problem and is a reasonably active research community. The proposed research works on the approach to this problem are pretty diverse with many different directions. For Vietnamese, the RTE problem is moderately new, but this problem plays a vital role in natural language understanding systems. Currently, methods to solve this problem based on contextual word representation learning models have given outstanding results. However, Vietnamese is a semantically rich language. Therefore, in this paper, we want to present an experiment combining semantic word representation through the SRL task with context representation of BERT relative models for the RTE problem. The experimental results give conclusions about the influence and role of semantic representation on Vietnamese in understanding natural language. The experimental results show that the semantic-aware contextual representation model has about 1% higher performance than the model that does not incorporate semantic representation. In addition, the effects on the data domain in Vietnamese are also higher than those in English. This result also shows the positive influence of SRL on RTE problem in Vietnamese.
translated by 谷歌翻译
To the best of our knowledge, this paper made the first attempt to answer whether word segmentation is necessary for Vietnamese sentiment classification. To do this, we presented five pre-trained monolingual S4- based language models for Vietnamese, including one model without word segmentation, and four models using RDRsegmenter, uitnlp, pyvi, or underthesea toolkits in the pre-processing data phase. According to comprehensive experimental results on two corpora, including the VLSP2016-SA corpus of technical article reviews from the news and social media and the UIT-VSFC corpus of the educational survey, we have two suggestions. Firstly, using traditional classifiers like Naive Bayes or Support Vector Machines, word segmentation maybe not be necessary for the Vietnamese sentiment classification corpus, which comes from the social domain. Secondly, word segmentation is necessary for Vietnamese sentiment classification when word segmentation is used before using the BPE method and feeding into the deep learning model. In this way, the RDRsegmenter is the stable toolkit for word segmentation among the uitnlp, pyvi, and underthesea toolkits.
translated by 谷歌翻译
Diabetic Retinopathy (DR) is a leading cause of vision loss in the world, and early DR detection is necessary to prevent vision loss and support an appropriate treatment. In this work, we leverage interactive machine learning and introduce a joint learning framework, termed DRG-Net, to effectively learn both disease grading and multi-lesion segmentation. Our DRG-Net consists of two modules: (i) DRG-AI-System to classify DR Grading, localize lesion areas, and provide visual explanations; (ii) DRG-Expert-Interaction to receive feedback from user-expert and improve the DRG-AI-System. To deal with sparse data, we utilize transfer learning mechanisms to extract invariant feature representations by using Wasserstein distance and adversarial learning-based entropy minimization. Besides, we propose a novel attention strategy at both low- and high-level features to automatically select the most significant lesion information and provide explainable properties. In terms of human interaction, we further develop DRG-Net as a tool that enables expert users to correct the system's predictions, which may then be used to update the system as a whole. Moreover, thanks to the attention mechanism and loss functions constraint between lesion features and classification features, our approach can be robust given a certain level of noise in the feedback of users. We have benchmarked DRG-Net on the two largest DR datasets, i.e., IDRID and FGADR, and compared it to various state-of-the-art deep learning networks. In addition to outperforming other SOTA approaches, DRG-Net is effectively updated using user feedback, even in a weakly-supervised manner.
translated by 谷歌翻译