在本文中,我们研究了部分可观察到的动态系统的在线增强学习(RL)。我们专注于预测状态表示(PSRS)模型,该模型是捕获其他知名模型(例如可观察到的马尔可夫决策过程(POMDP))的表达模型。 PSR使用一组未来观察结果的预测表示状态,并完全使用可观察的数量来定义。我们为PSRS开发了一种新型的基于模型的算法,该算法可以在样本复杂性中学习相对于系统的所有相关参数的多项式缩放的近乎最佳策略。我们的算法自然可以与功能近似合作,以扩展到具有较大状态和观察空间的系统。我们表明,给定一个可实现的模型类别,学习近乎最佳策略的样本复杂性仅相对于模型类的统计复杂性,而没有任何明确的多项式依赖性对状态和观察空间的大小依赖。值得注意的是,我们的工作是表明多项式样本复杂性与PSR中全球最佳政策竞争的第一项工作。最后,我们演示了如何直接使用我们的一般定理来得出特殊模型的样本复杂性界限,包括$ m $ $ step弱揭示和$ m $ $ $ - 可解码的表格pomdps,具有低率潜在过渡的POMDP和具有线性pomdps的POMDP排放和潜在过渡。
translated by 谷歌翻译
离线增强学习(RL)的样本效率保证通常依赖于对功能类别(例如Bellman-Completeness)和数据覆盖范围(例如,全政策浓缩性)的强有力的假设。尽管最近在放松这些假设方面做出了努力,但现有作品只能放松这两个因素之一,从而使另一个因素的强烈假设完好无损。作为一个重要的开放问题,我们是否可以实现对这两个因素的假设较弱的样本效率离线RL?在本文中,我们以积极的态度回答了这个问题。我们基于MDP的原始偶对偶进行分析了一种简单的算法,其中双重变量(打折占用)是使用密度比函数对离线数据进行建模的。通过适当的正则化,我们表明该算法仅在可变性和单极浓缩性下具有多项式样品的复杂性。我们还基于不同的假设提供了替代分析,以阐明离线RL原始二算法的性质。
translated by 谷歌翻译
政策优化,通过大规模优化技术最大化价值函数来学习兴趣的政策,位于现代强化学习(RL)的核心。除了价值最大化之外,其他实际考虑因素也出现,包括令人鼓舞的探索,以及确保由于安全,资源和运营限制而确保学习政策的某些结构性。这些考虑通常可以通过诉诸正规化的RL来占据,这增加了目标值函数,并通过结构促进正则化术语。专注于无限范围打折马尔可夫决策过程,本文提出了一种用于解决正规化的RL的广义策略镜血压(GPMD)算法。作为策略镜血压LAN的概括(2021),所提出的算法可以容纳一般类凸常规的常规阶级,以及在使用中的规则器的认识到的广泛的Bregman分歧。我们展示了我们的算法在整个学习速率范围内,以无维的方式在全球解决方案的整个学习速率范围内融合到全球解决方案,即使常规器缺乏强大的凸起和平滑度。此外,在不精确的策略评估和不完美的政策更新方面,该线性收敛特征是可透明的。提供数值实验以证实GPMD的适用性和吸引力性能。
translated by 谷歌翻译
In this paper, we study the problem of knowledge-intensive text-to-SQL, in which domain knowledge is necessary to parse expert questions into SQL queries over domain-specific tables. We formalize this scenario by building a new Chinese benchmark KnowSQL consisting of domain-specific questions covering various domains. We then address this problem by presenting formulaic knowledge, rather than by annotating additional data examples. More concretely, we construct a formulaic knowledge bank as a domain knowledge base and propose a framework (ReGrouP) to leverage this formulaic knowledge during parsing. Experiments using ReGrouP demonstrate a significant 28.2% improvement overall on KnowSQL.
translated by 谷歌翻译
Most existing text-video retrieval methods focus on cross-modal matching between the visual content of offline videos and textual query sentences. However, in real scenarios, online videos are frequently accompanied by relevant text information such as titles, tags, and even subtitles, which can be utilized to match textual queries. This inspires us to generate associated captions from offline videos to help with existing text-video retrieval methods. To do so, we propose to use the zero-shot video captioner with knowledge of pre-trained web-scale models (e.g., CLIP and GPT-2) to generate captions for offline videos without any training. Given the captions, one question naturally arises: what can auxiliary captions do for text-video retrieval? In this paper, we present a novel framework Cap4Video, which makes use of captions from three aspects: i) Input data: The video and captions can form new video-caption pairs as data augmentation for training. ii) Feature interaction: We perform feature interaction between video and caption to yield enhanced video representations. iii) Output score: The Query-Caption matching branch can be complementary to the original Query-Video matching branch for text-video retrieval. We conduct thorough ablation studies to demonstrate the effectiveness of our method. Without any post-processing, our Cap4Video achieves state-of-the-art performance on MSR-VTT (51.4%), VATEX (66.6%), MSVD (51.8%), and DiDeMo (52.0%).
translated by 谷歌翻译
Vision-language models (VLMs) that are pre-trained on large-scale image-text pairs have demonstrated impressive transferability on a wide range of visual tasks. Transferring knowledge from such powerful pre-trained VLMs is emerging as a promising direction for building effective video recognition models. However, the current exploration is still limited. In our opinion, the greatest charm of pre-trained vision-language models is to build a bridge between visual and textual domains. In this paper, we present a novel framework called BIKE which utilizes the cross-modal bridge to explore bidirectional knowledge: i) We propose a Video Attribute Association mechanism which leverages the Video-to-Text knowledge to generate textual auxiliary attributes to complement video recognition. ii) We also present a Temporal Concept Spotting mechanism which uses the Text-to-Video expertise to capture temporal saliency in a parameter-free manner to yield enhanced video representation. The extensive studies on popular video datasets (ie, Kinetics-400 & 600, UCF-101, HMDB-51 and ActivityNet) show that our method achieves state-of-the-art performance in most recognition scenarios, eg, general, zero-shot, and few-shot video recognition. To the best of our knowledge, our best model achieves a state-of-the-art accuracy of 88.4% on challenging Kinetics-400 with the released CLIP pre-trained model.
translated by 谷歌翻译
The node-place model has been widely used to classify and evaluate transit stations, which sheds light on individual travel behaviors and supports urban planning through effectively integrating land use and transportation development. This article adapts this model to investigate whether and how node, place, and mobility would be associated with the transmission risks and presences of the local COVID-19 cases in a city. Similar studies on the model and its relevance to COVID-19, according to our knowledge, have not been undertaken before. Moreover, the unique metric drawn from detailed visit history of the infected, i.e., the COVID-19 footprints, is proposed and exploited. This study then empirically uses the adapted model to examine the station-level factors affecting the local COVID-19 footprints. The model accounts for traditional measures of the node and place as well as actual human mobility patterns associated with the node and place. It finds that stations with high node, place, and human mobility indices normally have more COVID-19 footprints in proximity. A multivariate regression is fitted to see whether and to what degree different indices and indicators can predict the COVID-19 footprints. The results indicate that many of the place, node, and human mobility indicators significantly impact the concentration of COVID-19 footprints. These are useful for policy-makers to predict and monitor hotspots for COVID-19 and other pandemics transmission.
translated by 谷歌翻译
Text-to-SQL semantic parsing is an important NLP task, which greatly facilitates the interaction between users and the database and becomes the key component in many human-computer interaction systems. Much recent progress in text-to-SQL has been driven by large-scale datasets, but most of them are centered on English. In this work, we present MultiSpider, the largest multilingual text-to-SQL dataset which covers seven languages (English, German, French, Spanish, Japanese, Chinese, and Vietnamese). Upon MultiSpider, we further identify the lexical and structural challenges of text-to-SQL (caused by specific language properties and dialect sayings) and their intensity across different languages. Experimental results under three typical settings (zero-shot, monolingual and multilingual) reveal a 6.1% absolute drop in accuracy in non-English languages. Qualitative and quantitative analyses are conducted to understand the reason for the performance drop of each language. Besides the dataset, we also propose a simple schema augmentation framework SAVe (Schema-Augmentation-with-Verification), which significantly boosts the overall performance by about 1.8% and closes the 29.5% performance gap across languages.
translated by 谷歌翻译
Practical applications employing deep learning must guarantee inference quality. However, we found that the inference quality of state-of-the-art and state-of-the-practice in practical applications has a long tail distribution. In the real world, many tasks have strict requirements for the quality of deep learning inference, such as safety-critical and mission-critical tasks. The fluctuation of inference quality seriously affects its practical applications, and the quality at the tail may lead to severe consequences. State-of-the-art and state-of-the-practice with outstanding inference quality designed and trained under loose constraints still have poor inference quality under constraints with practical application significance. On the one hand, the neural network models must be deployed on complex systems with limited resources. On the other hand, safety-critical and mission-critical tasks need to meet more metric constraints while ensuring high inference quality. We coin a new term, ``tail quality,'' to characterize this essential requirement and challenge. We also propose a new metric, ``X-Critical-Quality,'' to measure the inference quality under certain constraints. This article reveals factors contributing to the failure of using state-of-the-art and state-of-the-practice algorithms and systems in real scenarios. Therefore, we call for establishing innovative methodologies and tools to tackle this enormous challenge.
translated by 谷歌翻译
Previous computation models either have equivalent abilities in representing all computations but fail to provide primitive operators for programming complex algorithms or lack generalized expression ability to represent newly-added computations. This article presents a unified computation model with generalized expression ability and a concise set of primitive operators for programming high-level algorithms. We propose a unified data abstraction -- Tensor of List, and offer a unified computation model based on Tensor of List, which we call the ToL model (in short, ToL). ToL introduces five atomic computations that can represent any elementary computation by finite composition, ensured with strict formal proof. Based on ToL, we design a pure-functional language -- ToLang. ToLang provides a concise set of primitive operators that can be used to program complex big data and AI algorithms. Our evaluations show ToL has generalized expression ability and a built-in performance indicator, born with a strictly defined computation metric -- elementary operation count (EOPs), consistent with FLOPs within a small error range.
translated by 谷歌翻译