Most Graph Neural Networks follow the message-passing paradigm, assuming the observed structure depicts the ground-truth node relationships. However, this fundamental assumption cannot always be satisfied, as real-world graphs are often incomplete, noisy, or redundant. How to reveal the inherent graph structure in a unified way remains under-explored. We propose PRI-GSL, a Graph Structure Learning framework guided by the Principle of Relevant Information, which provides a simple and unified framework for identifying self-organization and revealing the hidden structure. PRI-GSL learns a structure that contains the most relevant yet least redundant information, quantified by von Neumann entropy and Quantum Jensen-Shannon divergence. PRI-GSL incorporates the evolution of continuous-time quantum walks with graph wavelets to encode node structural roles, showing how the nodes interplay and self-organize with the graph structure. Extensive experiments demonstrate the superior effectiveness and robustness of PRI-GSL.
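As an illustration of the von Neumann entropy the abstract refers to, here is a minimal sketch in which the trace-rescaled graph Laplacian is treated as a density matrix (a common convention; the paper's exact rescaling may differ):

```python
import numpy as np

def von_neumann_entropy(adj):
    """Von Neumann entropy of a graph: treat the rescaled Laplacian
    L / trace(L) as a density matrix and take -sum(p * log2(p)) over
    its eigenvalues (illustrative sketch, not PRI-GSL's exact code)."""
    deg = np.diag(adj.sum(axis=1))
    lap = deg - adj
    rho = lap / np.trace(lap)           # unit trace, positive semi-definite
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]        # convention: 0 * log 0 = 0
    return float(-(evals * np.log2(evals)).sum())

# a 4-node path graph 0-1-2-3
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
print(round(von_neumann_entropy(A), 3))
```

A denser, more regular graph spreads the Laplacian spectrum more evenly and thus yields a higher entropy, which is what makes the quantity usable as a redundancy measure for structure learning.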
Text-driven person image generation is an emerging and challenging task in cross-modality image generation. Controllable person image generation promotes a wide range of applications such as digital human interaction and virtual try-on. However, previous methods mostly employ single-modality information as the prior condition (e.g., pose-guided person image generation), or utilize preset words for text-driven human synthesis. Introducing a sentence composed of free-form words together with an editable semantic pose map to describe person appearance is a more user-friendly way. In this paper, we propose HumanDiffusion, a coarse-to-fine alignment diffusion framework, for text-driven person image generation. Specifically, two collaborative modules are proposed: the Stylized Memory Retrieval (SMR) module for fine-grained feature distillation in data processing, and the Multi-scale Cross-modality Alignment (MCA) module for coarse-to-fine feature alignment in diffusion. These two modules guarantee the alignment quality of the text and image, from the image level to the feature level and from low resolution to high resolution. As a result, HumanDiffusion realizes open-vocabulary person image generation with desired semantic poses. Extensive experiments conducted on DeepFashion demonstrate the superiority of our method compared with previous approaches. Moreover, better results can be obtained for complicated person images with varied details and uncommon poses.
Topology imbalance is a graph-specific imbalance problem caused by the uneven topological positions of labeled nodes, and it significantly damages the performance of GNNs. What topology imbalance means and how to measure its impact on graph learning remain under-explored. In this paper, we provide a new understanding of topology imbalance from a global view of the supervision information distribution, in terms of under-reaching and over-squashing, which motivates two quantitative metrics as measurements. In light of our analysis, we propose a novel position-aware graph structure learning framework named PASTEL, which directly optimizes the information propagation paths and solves the topology-imbalance issue in essence. Our key insight is to enhance the connectivity of nodes within the same class to obtain more supervision information, thereby relieving the under-reaching and over-squashing phenomena. Specifically, we design an anchor-based position encoding mechanism, which better incorporates relative topological positions and enhances the intra-class inductive bias by maximizing the label influence. We further propose a class-wise conflict measure as the edge weights, which benefits the separation of different node classes. Extensive experiments demonstrate the superior potential and adaptability of PASTEL in enhancing GNNs' power in different data annotation scenarios.
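To give an intuition for anchor-based position encoding, here is a generic sketch (our simplification, not the paper's exact mechanism): each node is encoded by its BFS shortest-path distance to a small set of anchor nodes, mapped to 1 / (1 + d) so that nearby anchors contribute larger values.

```python
from collections import deque
import numpy as np

def position_encoding(adj_list, anchors):
    """Encode each node v by 1 / (1 + d(v, a)) for every anchor a,
    where d is the shortest-path hop distance computed by BFS.
    Unreachable pairs get 0. Illustrative sketch only."""
    n = len(adj_list)
    enc = np.zeros((n, len(anchors)))
    for j, a in enumerate(anchors):
        dist = [-1] * n
        dist[a] = 0
        q = deque([a])
        while q:                         # standard BFS from the anchor
            u = q.popleft()
            for v in adj_list[u]:
                if dist[v] < 0:
                    dist[v] = dist[u] + 1
                    q.append(v)
        for v in range(n):
            enc[v, j] = 1.0 / (1 + dist[v]) if dist[v] >= 0 else 0.0
    return enc

# 4-node path graph 0-1-2-3 with anchors at both ends
adj = [[1], [0, 2], [1, 3], [2]]
print(position_encoding(adj, [0, 3]))
```

Nodes in the same class that sit close to the same anchors receive similar encodings, which is the kind of relative-position signal a structure learner can use to strengthen intra-class connectivity.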
Temporal sentence grounding aims to localize the target moment that corresponds to a given sentence query in an untrimmed video. However, recent works find that existing methods suffer from a severe temporal bias problem: instead of locating the target moment based on visual-textual semantic alignment, these methods over-rely on the temporal biases of queries in the training set. To this end, this paper proposes a novel training framework for grounding models that uses shuffled videos to address the temporal bias problem without losing grounding accuracy. Our framework introduces two auxiliary tasks, cross-modal matching and temporal order discrimination, to promote the training of the grounding model. The cross-modal matching task exploits the content consistency between shuffled and original videos to force the grounding model to mine visual content to match queries semantically. The temporal order discrimination task leverages the difference in temporal order to strengthen the understanding of long-term temporal context. Extensive experiments on Charades-STA and ActivityNet Captions demonstrate the effectiveness of our method in mitigating the reliance on temporal biases and strengthening the model's generalization ability to different temporal distributions. Code is available at https://github.com/haojc/shufflingvideosfortsg.
With the success of self-supervised learning (SSL), it has become a mainstream paradigm to fine-tune from self-supervised pretrained models to boost performance on downstream tasks. However, we find that current SSL models suffer severe accuracy drops when performing low-bit quantization, prohibiting their deployment in resource-constrained applications. In this paper, we propose a method called Synergistic Self-supervised and Quantization Learning (SSQL) to pretrain quantization-friendly self-supervised models, thereby facilitating downstream deployment. SSQL contrasts the features of the quantized and full-precision models in a self-supervised fashion, where the bit-width of the quantized model is randomly selected at each step. SSQL not only significantly improves the accuracy when quantized to lower bit-widths, but also improves the accuracy of full-precision models in most cases. By training only once, SSQL can simultaneously benefit various downstream tasks at different bit-widths. Moreover, the flexibility of bit-widths is achieved without additional storage overhead, requiring only one copy of the weights during training and inference. We theoretically analyze the optimization process of SSQL and conduct exhaustive experiments on various benchmarks to further demonstrate the effectiveness of our method. Our code is available at https://github.com/megvii-research/ssql-eccv2022.
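The two ingredients the abstract combines, uniform fake quantization at a randomly chosen bit-width and a contrastive pull between quantized and full-precision features, can be sketched as follows. For brevity this toy quantizes a feature vector directly rather than the network weights, and it is our simplification, not SSQL's code:

```python
import numpy as np

def fake_quantize(w, bits):
    """Uniform symmetric fake quantization: round onto a grid with
    2**(bits-1) - 1 positive levels, then dequantize back. In a real
    training loop gradients would pass via a straight-through estimator."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.clip(np.round(w / scale), -qmax, qmax) * scale

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
feat_fp = rng.standard_normal(128)       # full-precision "feature"
bits = int(rng.choice([2, 4, 8]))        # random bit-width at each step
feat_q = fake_quantize(feat_fp, bits)    # quantized counterpart
loss = 1.0 - cosine(feat_fp, feat_q)     # pull the two features together
```

Minimizing such a loss while resampling the bit-width every step pushes the full-precision features toward representations that survive quantization at any of the sampled bit-widths.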
Graph Neural Networks (GNNs) have shown promising results on a broad spectrum of applications. Most empirical studies of GNNs directly take the observed graph as input, assuming the observed structure perfectly depicts the accurate and complete relations between nodes. However, graphs in the real world are inevitably noisy or incomplete, which can even deteriorate the quality of graph representations. In this work, we propose a novel Variational Information Bottleneck guided Graph Structure Learning framework, namely VIB-GSL, from the perspective of information theory. VIB-GSL advances the Information Bottleneck (IB) principle for graph structure learning, providing a more elegant and universal framework for mining underlying task-relevant relations. VIB-GSL learns an informative and compressive graph structure to distill the actionable information for specific downstream tasks. VIB-GSL deduces a variational approximation for irregular graph data to form a tractable IB objective function, which facilitates training stability. Extensive experimental results demonstrate the superior effectiveness and robustness of VIB-GSL.
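For reference, the IB principle as typically instantiated for graph structure learning reads as follows (notation is ours; the paper's variational bounds differ in detail):

```latex
% Find a bottleneck graph G_{IB} that is maximally informative about the
% label Y while compressing the noisy input graph G; the Lagrange
% multiplier \beta > 0 trades prediction off against compression.
\min_{G_{IB}} \; -\,I(G_{IB};\, Y) \;+\; \beta\, I(G_{IB};\, G)
```

The "variational approximation" mentioned above replaces the two intractable mutual-information terms with learnable bounds so that the objective can be optimized by gradient descent.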
Federated learning achieves joint training of deep models by connecting decentralized data sources, which can significantly mitigate the risk of privacy leakage. However, in a more general case, the distributions of labels among clients are different, called ``label distribution skew''. Directly applying conventional federated learning without consideration of the label distribution skew issue significantly hurts the performance of the global model. To this end, we propose a novel federated learning method, named FedMGD, to alleviate the performance degradation caused by the label distribution skew issue. It introduces a global Generative Adversarial Network to model the global data distribution without access to local datasets, so the global model can be trained using the global information of the data distribution without privacy leakage. The experimental results demonstrate that our proposed method significantly outperforms the state-of-the-art on several public benchmarks. Code is available at \url{https://github.com/Sheng-T/FedMGD}.
Generalist models, which are capable of performing diverse multi-modal tasks in a task-agnostic way within a single model, have been explored recently. Though a hopeful route toward general-purpose AI, existing generalist models are still at an early stage, where modality and task coverage are limited. To empower multi-modal task-scaling and speed up this line of research, we release a generalist model learning system, OFASys, built on top of a declarative task interface named multi-modal instruction. At the core of OFASys is the idea of decoupling multi-modal task representations from the underlying model implementations. In OFASys, a task involving multiple modalities can be defined declaratively, even with just a single line of code. The system automatically generates task plans from such instructions for training and inference, and it also facilitates multi-task training for diverse multi-modal workloads. As a starting point, we provide presets of 7 different modalities and 23 highly diverse example tasks in OFASys, with which we also develop a first-of-its-kind single model, OFA+, that can handle text, image, speech, video, and motion data. The single OFA+ model achieves 95% of the performance, on average, with only 16% of the parameters of 15 task-finetuned models, showcasing the performance reliability of multi-modal task-scaling provided by OFASys. Available at https://github.com/OFA-Sys/OFASys
Generative modeling of human motion has broad applications in computer animation, virtual reality, and robotics. Conventional approaches develop separate models for different motion synthesis tasks, and typically use a model of a small size to avoid overfitting the scarce data available in each setting. It remains an open question whether developing a single unified model is feasible, which may 1) benefit the acquisition of novel skills by combining skills learned from multiple tasks, and 2) help in increasing the model capacity without overfitting by combining multiple data sources. Unification is challenging because 1) it involves diverse control signals as well as targets of varying granularity, and 2) motion datasets may use different skeletons and default poses. In this paper, we present MoFusion, a framework for unified motion synthesis. MoFusion employs a Transformer backbone to ease the inclusion of diverse control signals via cross attention, and pretrains the backbone as a diffusion model to support multi-granularity synthesis ranging from motion completion of a body part to whole-body motion generation. It uses a learnable adapter to accommodate the differences between the default skeletons used by the pretraining and the fine-tuning data. Empirical results show that pretraining is vital for scaling the model size without overfitting, and demonstrate MoFusion's potential in various tasks, e.g., text-to-motion, motion completion, and zero-shot mixing of multiple control signals. Project page: \url{https://ofa-sys.github.io/MoFusion/}.
In this work, we focus on the problem of safe policy transfer in reinforcement learning: we seek to leverage existing policies when learning a new task with specified constraints. This problem is important for safety-critical applications where interactions are costly and unconstrained policies can lead to undesirable or dangerous outcomes, e.g., with physical robots that interact with humans. We propose a Constrained Markov Decision Process (CMDP) formulation that simultaneously enables the transfer of policies and adherence to safety constraints. Our formulation cleanly separates task goals from safety considerations and permits the specification of a wide variety of constraints. Our approach relies on a novel extension of generalized policy improvement to constrained settings via a Lagrangian formulation. We devise a dual optimization algorithm that estimates the optimal dual variable of a target task, thus enabling safe transfer of policies derived from successor features learned on source tasks. Our experiments in simulated domains show that our approach is effective; it visits unsafe states less frequently and outperforms alternative state-of-the-art methods when taking safety constraints into account.
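The Lagrangian treatment of a constrained objective, with an inner best response and an outer dual-ascent update on the multiplier, can be illustrated on a toy scalar problem (our example, not the paper's algorithm):

```python
# Toy Lagrangian dual ascent:
#   maximize the "return"  r(x) = -(x - 2)^2
#   subject to the "cost"  c(x) = x <= 1.
# Inner step: primal best response
#   x(lam) = argmax_x  r(x) - lam * (x - 1)  =  2 - lam / 2.
# Outer step: dual ascent, raising lam while the constraint is violated.
lam, eta = 0.0, 0.5
for _ in range(200):
    x = 2.0 - lam / 2.0                     # primal player's best response
    lam = max(0.0, lam + eta * (x - 1.0))   # grow lam if x > 1, keep lam >= 0
print(round(x, 3), round(lam, 3))           # constrained optimum: x* = 1, lam* = 2
```

The estimated optimal dual variable plays the same role as in the safe-transfer setting above: once it is known for the target task, any policy improvement step penalized by it automatically respects the safety constraint.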