由于视频帧中存在各种干扰,因此基于视频的人重新识别(REID)具有挑战性。最近的方法使用时间聚合策略来解决此问题。在这项工作中,我们提出了一个新颖的环境感应注意网络(CSA-NET),该网络既改进框架特征提取和时间聚集步骤。首先,我们介绍了上下文传感渠道注意(CSCA)模块,该模块强调了每个帧信息渠道的响应。这些信息通道不仅可以参考每个单独的框架,还可以参考整个序列的内容。因此,CSCA探索了序列的每个帧的个性和全局上下文。其次,我们提出了对比特征聚合(CFA)模块,该模块预测了时间聚集的框架权重。在这里,每个帧的重量是以对比的方式确定的:即,不仅是由每个单独框架的质量,而且还取决于顺序中其他帧的平均质量。因此,它有效地促进了相对良好的框架的贡献。四个数据集的广泛实验结果表明,CSA-NET始终达到最新的性能。
translated by 谷歌翻译
决策者经常面对“许多匪徒”问题,其中必须同时学习相关但异构的情境匪徒实例。例如,大型零售商可能希望在许多商店中动态地学习产品需求,以解决定价或库存问题,这使得可以共同学习为服务类似客户的商店;或者,医院网络可能希望在许多提供商中动态学习患者风险以分配个性化干预措施,这使得可以为服务类似患者群体的医院共同学习。我们研究每个匪徒实例中未知参数可以分解为全局参数加上稀疏实例特定术语的设置。然后,我们提出了一种新颖的两级估计器,通过使用强大的统计数据组合(在类似的实例中学到)和套索回归(将结果进行替代),以样本有效的方式利用这种结构。我们在强盗算法中嵌入了这个估计器,并证明它在上下文维度下,它可以改善渐近遗憾界限。这种改进是数据较差的实例的指数。我们进一步展示了我们的结果如何依赖于强盗实例的基础网络结构。
translated by 谷歌翻译
In this paper, we study the problem of knowledge-intensive text-to-SQL, in which domain knowledge is necessary to parse expert questions into SQL queries over domain-specific tables. We formalize this scenario by building a new Chinese benchmark KnowSQL consisting of domain-specific questions covering various domains. We then address this problem by presenting formulaic knowledge, rather than by annotating additional data examples. More concretely, we construct a formulaic knowledge bank as a domain knowledge base and propose a framework (ReGrouP) to leverage this formulaic knowledge during parsing. Experiments using ReGrouP demonstrate a significant 28.2% improvement overall on KnowSQL.
translated by 谷歌翻译
Accompanying rapid industrialization, humans are suffering from serious air pollution problems. The demand for air quality prediction is becoming more and more important to the government's policy-making and people's daily life. In this paper, We propose GreenEyes -- a deep neural network model, which consists of a WaveNet-based backbone block for learning representations of sequences and an LSTM with a Temporal Attention module for capturing the hidden interactions between features of multi-channel inputs. To evaluate the effectiveness of our proposed method, we carry out several experiments including an ablation study on our collected and preprocessed air quality data near HKUST. The experimental results show our model can effectively predict the air quality level of the next timestamp given any segment of the air quality data from the data set. We have also released our standalone dataset at https://github.com/AI-Huang/IAQI_Dataset The model and code for this paper are publicly available at https://github.com/AI-Huang/AirEvaluation
translated by 谷歌翻译
Conditional variational models, using either continuous or discrete latent variables, are powerful for open-domain dialogue response generation. However, previous works show that continuous latent variables tend to reduce the coherence of generated responses. In this paper, we also found that discrete latent variables have difficulty capturing more diverse expressions. To tackle these problems, we combine the merits of both continuous and discrete latent variables and propose a Hybrid Latent Variable (HLV) method. Specifically, HLV constrains the global semantics of responses through discrete latent variables and enriches responses with continuous latent variables. Thus, we diversify the generated responses while maintaining relevance and coherence. In addition, we propose Conditional Hybrid Variational Transformer (CHVT) to construct and to utilize HLV with transformers for dialogue generation. Through fine-grained symbolic-level semantic information and additive Gaussian mixing, we construct the distribution of continuous variables, prompting the generation of diverse expressions. Meanwhile, to maintain the relevance and coherence, the discrete latent variable is optimized by self-separation training. Experimental results on two dialogue generation datasets (DailyDialog and Opensubtitles) show that CHVT is superior to traditional transformer-based variational mechanism w.r.t. diversity, relevance and coherence metrics. Moreover, we also demonstrate the benefit of applying HLV to fine-tuning two pre-trained dialogue models (PLATO and BART-base).
translated by 谷歌翻译
Complex dialogue mappings (CDM), including one-to-many and many-to-one mappings, tend to make dialogue models generate incoherent or dull responses, and modeling these mappings remains a huge challenge for neural dialogue systems. To alleviate these problems, methods like introducing external information, reconstructing the optimization function, and manipulating data samples are proposed, while they primarily focus on avoiding training with CDM, inevitably weakening the model's ability of understanding CDM in human conversations and limiting further improvements in model performance. This paper proposes a Sentence Semantic \textbf{Seg}mentation guided \textbf{C}onditional \textbf{V}ariational \textbf{A}uto-\textbf{E}ncoder (SegCVAE) method which can model and take advantages of the CDM data. Specifically, to tackle the incoherent problem caused by one-to-many, SegCVAE uses response-related prominent semantics to constrained the latent variable. To mitigate the non-diverse problem brought by many-to-one, SegCVAE segments multiple prominent semantics to enrich the latent variables. Three novel components, Internal Separation, External Guidance, and Semantic Norms, are proposed to achieve SegCVAE. On dialogue generation tasks, both the automatic and human evaluation results show that SegCVAE achieves new state-of-the-art performance.
translated by 谷歌翻译
In this work, we explore combining automatic hyperparameter tuning and optimization for federated learning (FL) in an online, one-shot procedure. We apply a principled approach on a method for adaptive client learning rate, number of local steps, and batch size. In our federated learning applications, our primary motivations are minimizing communication budget as well as local computational resources in the training pipeline. Conventionally, hyperparameter tuning methods involve at least some degree of trial-and-error, which is known to be sample inefficient. In order to address our motivations, we propose FATHOM (Federated AuTomatic Hyperparameter OptiMization) as a one-shot online procedure. We investigate the challenges and solutions of deriving analytical gradients with respect to the hyperparameters of interest. Our approach is inspired by the fact that, with the exception of local data, we have full knowledge of all components involved in our training process, and this fact can be exploited in our algorithm impactfully. We show that FATHOM is more communication efficient than Federated Averaging (FedAvg) with optimized, static valued hyperparameters, and is also more computationally efficient overall. As a communication efficient, one-shot online procedure, FATHOM solves the bottleneck of costly communication and limited local computation, by eliminating a potentially wasteful tuning process, and by optimizing the hyperparamters adaptively throughout the training procedure without trial-and-error. We show our numerical results through extensive empirical experiments with the Federated EMNIST-62 (FEMNIST) and Federated Stack Overflow (FSO) datasets, using FedJAX as our baseline framework.
translated by 谷歌翻译
Recent methods for deep metric learning have been focusing on designing different contrastive loss functions between positive and negative pairs of samples so that the learned feature embedding is able to pull positive samples of the same class closer and push negative samples from different classes away from each other. In this work, we recognize that there is a significant semantic gap between features at the intermediate feature layer and class labels at the final output layer. To bridge this gap, we develop a contrastive Bayesian analysis to characterize and model the posterior probabilities of image labels conditioned by their features similarity in a contrastive learning setting. This contrastive Bayesian analysis leads to a new loss function for deep metric learning. To improve the generalization capability of the proposed method onto new classes, we further extend the contrastive Bayesian loss with a metric variance constraint. Our experimental results and ablation studies demonstrate that the proposed contrastive Bayesian metric learning method significantly improves the performance of deep metric learning in both supervised and pseudo-supervised scenarios, outperforming existing methods by a large margin.
translated by 谷歌翻译
基于自动机的方法使机器人能够执行各种复杂的任务。但是,大多数现有的基于自动机的算法都高度依赖于已考虑任务的状态的手动定制表示,从而限制了其在深度强化学习算法中的适用性。为了解决这个问题,通过将变压器纳入强化学习中,我们开发了一个双转化器引导的时间逻辑框架(T2TL),该逻辑框架(T2TL)两次利用变压器的结构特征,即首先通过变压器模块编码LTL指令,以有效地理解对有效的理解培训期间的任务说明,然后再次通过变压器编码上下文变量,以改善任务性能。特别是,LTL指令由Co-Safe LTL指定。作为具有语义的改写操作,LTL的进展被利用以将复杂的任务分解为可学习的子目标,这不仅将非马克维亚奖励决策转换为马尔可夫的奖励决策过程,而且通过同时学习多个子 - 学习效率,提高了采样效率。任务。进一步纳入了环境不足的LTL预训练方案,以促进变压器模块的学习,从而改善LTL的表示。模拟和实验结果证明了T2TL框架的有效性。
translated by 谷歌翻译
透明的物体广泛用于工业自动化和日常生活中。但是,强大的视觉识别和对透明物体的感知一直是一个主要挑战。目前,由于光的折射和反射,大多数商用级深度摄像机仍然不擅长感知透明物体的表面。在这项工作中,我们从单个RGB-D输入中提出了一种基于变压器的透明对象深度估计方法。我们观察到,变压器的全球特征使得更容易提取上下文信息以执行透明区域的深度估计。此外,为了更好地增强细粒度的特征,功能融合模块(FFM)旨在帮助连贯的预测。我们的经验证据表明,与以前的最新基于卷积的数据集相比,我们的模型在最近的流行数据集中有了重大改进,例如RMSE增长25%,RER增长21%。广泛的结果表明,我们的基于变压器的模型可以更好地汇总对象的RGB和不准确的深度信息,以获得更好的深度表示。我们的代码和预培训模型将在https://github.com/yuchendoudou/tode上找到。
translated by 谷歌翻译