Smart Paper Notes

Distributed Newton-Type Methods with Communication Compression and Bernoulli Aggregation

Rustem Islamov, Xun Qian, Slavomír Hanzely, Mher Safaryan, Peter Richtárik

Category: Machine Learning

2022-06-07

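Despite their high computation and communication costs, Newton-type methods remain an appealing option for distributed training because of their robustness to ill-conditioned convex problems. In this work, we study communication compression and aggregation mechanisms for curvature information in order to reduce these costs while preserving theoretically superior local convergence guarantees. We prove that the class of three point compressors (3PC) recently developed by Richtarik et al. [2022] for gradient communication can be generalized to Hessian communication as well. This result opens up a wide variety of communication strategies, such as contractive compression and lazy aggregation, that can be used to compress prohibitively costly curvature information. Moreover, we discover several new 3PC mechanisms, such as adaptive thresholding and Bernoulli aggregation, which require reduced communication and only occasional Hessian computations. Furthermore, we extend and analyze our approach for bidirectional communication compression and partial device participation setups to cater to the practical considerations of applications in federated learning. For all our methods, we derive condition-number-independent local linear and/or superlinear convergence rates. Finally, through extensive numerical evaluations on convex optimization problems, we illustrate that our designed schemes achieve state-of-the-art communication complexity compared to several key baselines that use second-order information.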
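
As a rough illustration of two mechanisms named in the abstract, the Python sketch below shows a Top-K operator (a common example of a contractive compressor) and a Bernoulli-style aggregation rule that transmits a fresh value only with probability p and otherwise reuses the previous one. The function names `top_k` and `bernoulli_aggregate`, the flattened toy Hessian, and the choice of Top-K are assumptions made for illustration; the abstract does not give the authors' exact definitions.

```python
import random


def top_k(vec, k):
    """Keep the k largest-magnitude entries of vec and zero out the rest."""
    order = sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(vec)]


def bernoulli_aggregate(previous, fresh, p, rng):
    """With probability p return the fresh value, otherwise reuse the previous one.

    The boolean in the result says whether a new value had to be
    computed and communicated in this round.
    """
    if rng.random() < p:
        return list(fresh), True
    return list(previous), False


# Hypothetical toy data: a 2x2 Hessian flattened into a list.
hessian_entries = [0.1, -3.0, 0.5, 2.0]
compressed = top_k(hessian_entries, 2)

rng = random.Random(0)
estimate, communicated = bernoulli_aggregate([0.0] * 4, compressed, 0.5, rng)
```

With a small p, most rounds reuse the stored estimate, which is how a rule of this kind trades accuracy of the curvature estimate for fewer Hessian computations and transmissions.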

Translated by Google Translate

Toward Open-World Electroencephalogram Decoding Via Deep Learning: A Comprehensive Survey

Xun Chen, Chang Li, Aiping Liu, Martin J. McKeown, Ruobing Qian, Z. Jane Wang

Category: Machine Learning

2021-12-08

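Electroencephalogram (EEG) decoding aims to identify the perceptual, semantic, and cognitive content of neural processing based on non-invasively measured brain activity. Traditional EEG decoding methods have achieved moderate success when applied to data acquired in static, well-controlled lab environments. However, an open-world environment is a more realistic setting, where situations affecting EEG recordings can emerge unexpectedly, significantly weakening the robustness of existing methods. In recent years, deep learning (DL) has emerged as a potential solution because of its superior capacity for feature extraction. It overcomes the limitations of "handcrafted" features or features extracted using shallow architectures, but typically requires large amounts of costly, expertly labeled data, which are not always obtainable. Combining DL with domain-specific knowledge may allow the development of robust approaches to decoding brain activity even with small-sample data. Although various DL methods have been proposed to tackle some of the challenges in EEG decoding, a systematic tutorial overview, particularly for open-world applications, is currently lacking. This article therefore provides a comprehensive survey of DL methods for open-world EEG decoding and identifies promising research directions to inspire future studies of EEG decoding in real-world applications.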

Basis Matters: Better Communication-Efficient Second Order Methods for Federated Learning

Xun Qian, Rustem Islamov, Mher Safaryan, Peter Richtárik

Category: Machine Learning

2021-11-02

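Recent advances in distributed optimization have shown that Newton-type methods with proper communication compression mechanisms can guarantee fast local rates and low communication cost compared to first-order methods. We discover that the communication cost of these methods can be further reduced, sometimes dramatically so, with a surprisingly simple trick: Basis Learn (BL). The idea is to transform the usual representation of the local Hessians via a change of basis in the space of matrices and to apply compression tools to the new representation. To demonstrate the potential of using custom bases, we design a new Newton-type method (BL1), which reduces communication cost via both the BL technique and a bidirectional compression mechanism. Furthermore, we present two alternative extensions (BL2 and BL3) to partial participation to accommodate federated learning applications. We prove local linear and superlinear rates independent of the condition number. Finally, we support our claims by comparing several first- and second-order methods.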

Credible Remote Sensing Scene Classification Using Evidential Fusion on Aerial-Ground Dual-view Images

Kun Zhao, Qian Gao, Siyuan Hao, Jie Sun, Lijian Zhou

Category: Computer Vision | Artificial Intelligence

2023-01-02

Because they offer more comprehensive information than data from a single view, multi-view (multi-source, multi-modal, multi-perspective, etc.) data are being used more frequently in remote sensing tasks. However, as the number of views grows, the issue of data quality becomes more apparent, limiting the potential benefits of multi-view data. Although recent deep neural network (DNN) based models can learn the weight of data adaptively, they do not explicitly quantify the data quality of each view when fusing them, which leaves these models hard to interpret, unsatisfactory in performance, and inflexible in downstream remote sensing tasks. To fill this gap, this paper introduces evidential deep learning to the task of aerial-ground dual-view remote sensing scene classification to model the credibility of each view. Specifically, the theory of evidence is used to calculate an uncertainty value that describes the decision-making risk of each view. Based on this uncertainty, a novel decision-level fusion strategy is proposed to ensure that the view with lower risk obtains more weight, making the classification more credible. On two well-known, publicly available datasets of aerial-ground dual-view remote sensing images, the proposed approach achieves state-of-the-art results, demonstrating its effectiveness. The code and datasets of this article are available at the following address: https://github.com/gaopiaoliang/Evidential.
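
To make the idea of a per-view uncertainty value concrete, the sketch below uses the subjective-logic formulation commonly used in evidential deep learning: for K classes with non-negative evidence e_k, the Dirichlet strength is S = sum(e_k + 1), the belief masses are b_k = e_k / S, and the uncertainty is u = K / S. The fusion rule, which weights each view by 1 - u, is a hypothetical stand-in; the abstract does not state the authors' exact strategy. The names `opinion` and `fuse` and the evidence values are also illustrative assumptions.

```python
def opinion(evidence):
    """Return (beliefs, uncertainty) for a list of non-negative class evidence."""
    num_classes = len(evidence)
    strength = sum(e + 1.0 for e in evidence)  # Dirichlet strength S
    beliefs = [e / strength for e in evidence]  # b_k = e_k / S
    uncertainty = num_classes / strength        # u = K / S
    return beliefs, uncertainty


def fuse(aerial_evidence, ground_evidence):
    """Hypothetical decision-level fusion: weight each view by 1 - u."""
    b_a, u_a = opinion(aerial_evidence)
    b_g, u_g = opinion(ground_evidence)
    w_a, w_g = 1.0 - u_a, 1.0 - u_g
    total = w_a + w_g
    if total == 0.0:  # neither view carries evidence: fall back to equal weights
        w_a = w_g = total = 1.0
    scores = [(w_a * a + w_g * g) / total for a, g in zip(b_a, b_g)]
    return scores.index(max(scores)), scores


# Toy example: the aerial view is confident, the ground view has little evidence.
aerial = [9.0, 1.0, 0.0]
ground = [0.0, 1.0, 0.0]
predicted_class, fused_scores = fuse(aerial, ground)
```

In this toy case the ground view has far less total evidence, so its uncertainty is higher and it contributes less to the fused decision.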

HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training

Qinghao Ye, Guohai Xu, Ming Yan, Haiyang Xu, Qi Qian, Ji Zhang, Fei Huang

Category: Computer Vision | Natural Language Processing

2022-12-30

Video-language pre-training has advanced the performance of various downstream video-language tasks. However, most previous methods directly inherit or adapt typical image-language pre-training paradigms to video-language pre-training, and thus do not fully exploit the unique characteristic of video, namely its temporal dimension. In this paper, we propose a Hierarchical Temporal-Aware video-language pre-training framework, HiTeA, with two novel pre-training tasks for modeling cross-modal alignment between moments and texts as well as the temporal relations of video-text pairs. Specifically, we propose a cross-modal moment exploration task to explore moments in videos, which results in detailed video moment representations. In addition, the inherent temporal relations are captured by aligning video-text pairs as a whole at different time resolutions with a multi-modal temporal relation exploration task. Furthermore, we introduce the shuffling test to evaluate the temporal reliance of datasets and video-language pre-training models. We achieve state-of-the-art results on 15 well-established video-language understanding and generation tasks, especially on temporal-oriented datasets (e.g., SSv2-Template and SSv2-Label), with 8.6% and 11.1% improvement respectively. HiTeA also demonstrates strong generalization ability when directly transferred to downstream tasks in a zero-shot manner. Models and demo will be available on ModelScope.

Exploring Depth Information for Face Manipulation Detection

Haoyue Wang, Meiling Li, Sheng Li, Zhenxing Qian, Xinpeng Zhang

Category: Computer Vision

2022-12-29

Face manipulation detection has received a lot of attention because of its importance for the reliability and security of face images. Recent studies focus on using auxiliary information or prior knowledge to capture robust manipulation traces, an approach that has been shown to be promising. The face depth map is an important face feature that has been shown to be effective in other areas such as face recognition and face detection, yet it has received little attention in the literature on detecting manipulated face images. In this paper, we explore the possibility of incorporating the face depth map as auxiliary information to tackle the problem of face manipulation detection in real-world applications. To this end, we first propose a Face Depth Map Transformer (FDMT) to estimate the face depth map patch by patch from an RGB face image, which is able to capture the local depth anomalies created by manipulation. The estimated face depth map is then treated as auxiliary information and integrated with the backbone features using a newly designed Multi-head Depth Attention (MDA) mechanism. Various experiments demonstrate the advantage of our proposed method for face manipulation detection.

A Dynamics Theory of Implicit Regularization in Deep Low-Rank Matrix Factorization

Jian Cao, Chen Qian, Yihui Huang, Dicheng Chen, Yuncheng Gao, Jiyang Dong, Di Guo, Xiaobo Qu

Category: Machine Learning

2022-12-29

Implicit regularization is an important way to interpret neural networks. Recent theory starts to explain implicit regularization with the model of deep matrix factorization (DMF) and analyze the trajectory of discrete gradient dynamics in the optimization process. These discrete gradient dynamics are relatively small but not infinitesimal, thus fitting well with the practical implementation of neural networks. Currently, discrete gradient dynamics analysis has been successfully applied to shallow networks but encounters the difficulty of complex computation for deep networks. In this work, we introduce another discrete gradient dynamics approach to explain implicit regularization, i.e. landscape analysis. It mainly focuses on gradient regions, such as saddle points and local minima. We theoretically establish the connection between saddle point escaping (SPE) stages and the matrix rank in DMF. We prove that, for a rank-R matrix reconstruction, DMF will converge to a second-order critical point after R stages of SPE. This conclusion is further experimentally verified on a low-rank matrix reconstruction problem. This work provides a new theory to analyze implicit regularization in deep learning.

Automatic Recognition and Classification of Future Work Sentences from Academic Articles in a Specific Domain

Chengzhi Zhang, Yi Xiang, Wenke Hao, Zhicheng Li, Yuchen Qian, Yuzhuo Wang

Category: Natural Language Processing

2022-12-28

Future work sentences (FWS) are the sentences in academic papers that contain the authors' description of their proposed follow-up research directions. This paper presents methods to automatically extract FWS from academic papers and classify them according to the different future directions embodied in the paper's content. FWS recognition methods will enable subsequent researchers to locate future work sentences more accurately and quickly and reduce the time and cost of acquiring the corpus. Existing work on automatic identification of future work sentences is limited and cannot accurately identify FWS in academic papers, which prevents data mining on a large scale. Furthermore, the content of future work covers many aspects, and subdividing that content is conducive to the analysis of specific development directions. In this paper, Natural Language Processing (NLP) is used as a case study, and FWS are extracted from academic papers and classified into different types. We manually build an annotated corpus with six different types of FWS. Then, automatic recognition and classification of FWS are implemented using machine learning models, and the performance of these models is compared based on the evaluation metrics. The results show that the Bernoulli Bayesian model has the best performance in the automatic recognition task, with the Macro F1 reaching 90.73%, and the SCIBERT model has the best performance in the automatic classification task, with the weighted average F1 reaching 72.63%. Finally, we extract keywords from FWS to gain a deeper understanding of the key content they describe, and we also demonstrate that the content of FWS is reflected in subsequent research work by measuring the similarity between future work sentences and the abstracts.
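
The abstract reports two different averages of per-class F1. As a minimal sketch of how they differ, the Python below computes both on a hypothetical four-sentence toy example: Macro F1 is the unweighted mean of the per-class F1 scores, while weighted average F1 weights each class by its number of true instances. The labels and function names are illustrative assumptions and do not come from the paper's corpus.

```python
def per_class_f1(y_true, y_pred):
    """Return ({label: F1}, {label: support}) for every label that occurs."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores, support = {}, {}
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
        support[label] = tp + fn  # number of true instances of this label
    return scores, support


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores, _ = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Mean of the per-class F1 scores, weighted by class support."""
    scores, support = per_class_f1(y_true, y_pred)
    total = sum(support.values())
    return sum(scores[label] * support[label] for label in scores) / total


# Hypothetical labels: "fws" marks a future work sentence, "other" anything else.
y_true = ["fws", "fws", "fws", "other"]
y_pred = ["fws", "fws", "other", "other"]
macro = macro_f1(y_true, y_pred)
weighted = weighted_f1(y_true, y_pred)
```

On imbalanced data the two numbers can diverge noticeably, because Macro F1 gives a rare class the same influence as a frequent one.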

MC-Nonlocal-PINNs: handling nonlocal operators in PINNs via Monte Carlo sampling

Xiaodong Feng, Yue Qian, Wanfang Shen

Category: Machine Learning

2022-12-26

We propose Monte Carlo Nonlocal physics-informed neural networks (MC-Nonlocal-PINNs), a generalization of MC-fPINNs in \cite{guo2022monte}, for solving general nonlocal models such as integral equations and nonlocal PDEs. As in MC-fPINNs, our MC-Nonlocal-PINNs handle the nonlocal operators in a Monte Carlo way, resulting in a very stable approach for high-dimensional problems. We present a variety of test problems, including high-dimensional Volterra-type integral equations, hypersingular integral equations, and nonlocal PDEs, to demonstrate the effectiveness of our approach.
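
As a rough sketch of what handling a nonlocal operator "in a Monte Carlo way" can look like, the Python below estimates a one-dimensional Volterra-type integral, the integral over [0, x] of K(x, s) u(s) ds, by averaging the integrand at uniformly sampled points. In a PINN, u would be a neural network; here it is a plain function. The uniform sampling scheme, the function name `mc_volterra`, and the toy kernel are assumptions for illustration; the abstract does not describe the authors' sampling strategy.

```python
import random


def mc_volterra(kernel, u, x, n_samples, rng):
    """Monte Carlo estimate of the integral over [0, x] of kernel(x, s) * u(s) ds."""
    total = 0.0
    for _ in range(n_samples):
        s = rng.uniform(0.0, x)  # sample the integration variable uniformly
        total += kernel(x, s) * u(s)
    # Mean of the integrand times the length of the integration interval.
    return x * total / n_samples


# Toy check with a known answer: K = 1 and u(s) = s give x**2 / 2.
rng = random.Random(0)
estimate = mc_volterra(lambda x, s: 1.0, lambda s: s, 2.0, 20000, rng)
exact = 2.0 ** 2 / 2.0
```

The estimate is built from point evaluations of u only, so the same loop applies unchanged when u is a network, and the statistical error shrinks roughly like one over the square root of the number of samples.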

A Manipulator-Assisted Multiple UAV Landing System for USV Subject to Disturbance

Ruoyu Xu, Chongfeng Liu, Zhongzhong Cao, Yuquan Wang, Huihuan Qian

Category: Robotics

2022-12-23

Marine waves significantly disturb the unmanned surface vehicle (USV) motion. An unmanned aerial vehicle (UAV) can hardly land on a USV that undergoes irregular motion. An oversized landing platform is usually necessary to guarantee the landing safety, which limits the number of UAVs that can be carried. We propose a landing system assisted by tether and robot manipulation. The system can land multiple UAVs without increasing the USV's size. An MPC controller stabilizes the end-effector and tracks the UAVs, and an adaptive estimator addresses the disturbance caused by the base motion. The working strategy of the system is designed to plan the motion of each device. We have validated the manipulator controller through simulations and well-controlled indoor experiments. During the field tests, the proposed system caught and placed the UAVs when the disturbed USV roll range was approximately 12 degrees.
