本文提出了一项新的统计分析,旨在解释自然语言处理(NLP)中训练技术的最新成就。我们证明,当预训练任务的类(例如,蒙版语言模型任务中的不同单词)的类别足够多样化,从某种意义上说,最后一个线性层的最小奇异值在预训练中(表示为$ \ \ \ \ \ Tilde {\ nu} $)很大,然后预训练可以显着提高下游任务的样本效率。特别是,我们显示转移学习过量风险享受$ o \ left(\ frac {1} {\ tilde {\ nu} \ sqrt {n}} \ right)$ rate,与$ o \ left相比(\)标准监督学习中的frac {1} {\ sqrt {m}} \ right)$ rate。在这里,$ n $是预训练数据的数量,$ m $是下游任务中的数据数,通常是$ n \ gg m $。我们的证明依赖于矢量形式的rademacher复杂性链规则来拆卸复合函数类别和修改的自我符合条件。这些技术可能具有独立的兴趣。
translated by 谷歌翻译
在表演性预测中,一个预测模型会影响生成未来数据的分布,这一现象在经典监督学习中被忽略。在这种闭环环境中,表明性能的自然衡量标准捕获了部署后预测模型产生的预期损失。最小化性能风险的核心难度是数据分布本身取决于模型参数。这种依赖性受环境的约束,而不是在学习者的控制之下。结果,即使选择凸损耗函数也可能导致高度非凸性的性能最小化问题。先前的工作已经确定了一对关于损失的一般条件以及从模型参数到分布的映射,这意味着表现风险的凸度。在本文中,我们放宽了这些假设,并专注于获得较弱的凸度概念,而无需牺牲迭代优化方法的表演风险最小化问题的舒适性。
translated by 谷歌翻译
基于大/基础模型的最新突破揭示了人工智能的模糊大道,即出价数据,大/基础模型,大型学习,$ \ cdots $。遵循该大道,我们在这里详细介绍了新引入的大型学习。具体而言,大型学习通过同时学习为一个通用基础模型建模大规模完整/不完整数据中固有的可用信息,同时学习建模多个与所有关节/条件/边际数据分布(因此命名为大型学习)。我们透露,大型学习是现有的基础模型隐含的工作。因此,我们的大型学习为灵活的设计和基础模型的改进提供了高级指导,从而加快了互联网上真正的自学学习。此外,Big Learning($ i $)还具有奇妙的灵活性,用于培训数据和培训任务定制; ($ ii $)培训后可能会提供所有联合/条件/边际数据功能; ($ iii $)通过改进的模型概括大大减少了训练测试差距; ($ iv $)统一传统的机器学习范式,例如监督学习,无监督的学习,生成学习等,并使他们的灵活合作表现出了普遍的学习范式。
translated by 谷歌翻译
超参数优化是机器学习中的一个重要问题,因为它旨在在任何模型中实现最先进的性能。在这一领域取得了巨大努力,例如随机搜索,网格搜索,贝叶斯优化。在本文中,我们将超参数优化过程模拟为马尔可夫决策过程,并用加强学习解决它。提出了一种基于软演员评论家的新型超参数优化方法和分层混合阵列。实验表明,所提出的方法可以在较短的时间内获得更好的超参数。
translated by 谷歌翻译
Compressed videos often exhibit visually annoying artifacts, known as Perceivable Encoding Artifacts (PEAs), which dramatically degrade video visual quality. Subjective and objective measures capable of identifying and quantifying various types of PEAs are critical in improving visual quality. In this paper, we investigate the influence of four spatial PEAs (i.e. blurring, blocking, bleeding, and ringing) and two temporal PEAs (i.e. flickering and floating) on video quality. For spatial artifacts, we propose a visual saliency model with a low computational cost and higher consistency with human visual perception. In terms of temporal artifacts, self-attention based TimeSFormer is improved to detect temporal artifacts. Based on the six types of PEAs, a quality metric called Saliency-Aware Spatio-Temporal Artifacts Measurement (SSTAM) is proposed. Experimental results demonstrate that the proposed method outperforms state-of-the-art metrics. We believe that SSTAM will be beneficial for optimizing video coding techniques.
translated by 谷歌翻译
As one of the most important psychic stress reactions, micro-expressions (MEs), are spontaneous and transient facial expressions that can reveal the genuine emotions of human beings. Thus, recognizing MEs (MER) automatically is becoming increasingly crucial in the field of affective computing, and provides essential technical support in lie detection, psychological analysis and other areas. However, the lack of abundant ME data seriously restricts the development of cutting-edge data-driven MER models. Despite the recent efforts of several spontaneous ME datasets to alleviate this problem, it is still a tiny amount of work. To solve the problem of ME data hunger, we construct a dynamic spontaneous ME dataset with the largest current ME data scale, called DFME (Dynamic Facial Micro-expressions), which includes 7,526 well-labeled ME videos induced by 671 participants and annotated by more than 20 annotators throughout three years. Afterwards, we adopt four classical spatiotemporal feature learning models on DFME to perform MER experiments to objectively verify the validity of DFME dataset. In addition, we explore different solutions to the class imbalance and key-frame sequence sampling problems in dynamic MER respectively on DFME, so as to provide a valuable reference for future research. The comprehensive experimental results show that our DFME dataset can facilitate the research of automatic MER, and provide a new benchmark for MER. DFME will be published via https://mea-lab-421.github.io.
translated by 谷歌翻译
Face Anti-spoofing (FAS) is essential to secure face recognition systems from various physical attacks. However, recent research generally focuses on short-distance applications (i.e., phone unlocking) while lacking consideration of long-distance scenes (i.e., surveillance security checks). In order to promote relevant research and fill this gap in the community, we collect a large-scale Surveillance High-Fidelity Mask (SuHiFiMask) dataset captured under 40 surveillance scenes, which has 101 subjects from different age groups with 232 3D attacks (high-fidelity masks), 200 2D attacks (posters, portraits, and screens), and 2 adversarial attacks. In this scene, low image resolution and noise interference are new challenges faced in surveillance FAS. Together with the SuHiFiMask dataset, we propose a Contrastive Quality-Invariance Learning (CQIL) network to alleviate the performance degradation caused by image quality from three aspects: (1) An Image Quality Variable module (IQV) is introduced to recover image information associated with discrimination by combining the super-resolution network. (2) Using generated sample pairs to simulate quality variance distributions to help contrastive learning strategies obtain robust feature representation under quality variation. (3) A Separate Quality Network (SQN) is designed to learn discriminative features independent of image quality. Finally, a large number of experiments verify the quality of the SuHiFiMask dataset and the superiority of the proposed CQIL.
translated by 谷歌翻译
Interview has been regarded as one of the most crucial step for recruitment. To fully prepare for the interview with the recruiters, job seekers usually practice with mock interviews between each other. However, such a mock interview with peers is generally far away from the real interview experience: the mock interviewers are not guaranteed to be professional and are not likely to behave like a real interviewer. Due to the rapid growth of online recruitment in recent years, recruiters tend to have online interviews, which makes it possible to collect real interview data from real interviewers. In this paper, we propose a novel application named EZInterviewer, which aims to learn from the online interview data and provides mock interview services to the job seekers. The task is challenging in two ways: (1) the interview data are now available but still of low-resource; (2) to generate meaningful and relevant interview dialogs requires thorough understanding of both resumes and job descriptions. To address the low-resource challenge, EZInterviewer is trained on a very small set of interview dialogs. The key idea is to reduce the number of parameters that rely on interview dialogs by disentangling the knowledge selector and dialog generator so that most parameters can be trained with ungrounded dialogs as well as the resume data that are not low-resource. Evaluation results on a real-world job interview dialog dataset indicate that we achieve promising results to generate mock interviews. With the help of EZInterviewer, we hope to make mock interview practice become easier for job seekers.
translated by 谷歌翻译
Nowadays, time-stamped web documents related to a general news query floods spread throughout the Internet, and timeline summarization targets concisely summarizing the evolution trajectory of events along the timeline. Unlike traditional document summarization, timeline summarization needs to model the time series information of the input events and summarize important events in chronological order. To tackle this challenge, in this paper, we propose a Unified Timeline Summarizer (UTS) that can generate abstractive and extractive timeline summaries in time order. Concretely, in the encoder part, we propose a graph-based event encoder that relates multiple events according to their content dependency and learns a global representation of each event. In the decoder part, to ensure the chronological order of the abstractive summary, we propose to extract the feature of event-level attention in its generation process with sequential information remained and use it to simulate the evolutionary attention of the ground truth summary. The event-level attention can also be used to assist in extracting summary, where the extracted summary also comes in time sequence. We augment the previous Chinese large-scale timeline summarization dataset and collect a new English timeline dataset. Extensive experiments conducted on these datasets and on the out-of-domain Timeline 17 dataset show that UTS achieves state-of-the-art performance in terms of both automatic and human evaluations.
translated by 谷歌翻译
For Prognostics and Health Management (PHM) of Lithium-ion (Li-ion) batteries, many models have been established to characterize their degradation process. The existing empirical or physical models can reveal important information regarding the degradation dynamics. However, there is no general and flexible methods to fuse the information represented by those models. Physics-Informed Neural Network (PINN) is an efficient tool to fuse empirical or physical dynamic models with data-driven models. To take full advantage of various information sources, we propose a model fusion scheme based on PINN. It is implemented by developing a semi-empirical semi-physical Partial Differential Equation (PDE) to model the degradation dynamics of Li-ion-batteries. When there is little prior knowledge about the dynamics, we leverage the data-driven Deep Hidden Physics Model (DeepHPM) to discover the underlying governing dynamic models. The uncovered dynamics information is then fused with that mined by the surrogate neural network in the PINN framework. Moreover, an uncertainty-based adaptive weighting method is employed to balance the multiple learning tasks when training the PINN. The proposed methods are verified on a public dataset of Li-ion Phosphate (LFP)/graphite batteries.
translated by 谷歌翻译