Video super-resolution is one of the most popular tasks on mobile devices, widely used to automatically enhance low-bitrate and low-resolution video streams. While numerous solutions have been proposed for this problem, they are usually quite computationally demanding, demonstrating low FPS rates and poor power efficiency on mobile devices. In this Mobile AI challenge, we address this problem by inviting the participants to design an end-to-end real-time video super-resolution solution for mobile NPUs optimized for low energy consumption. The participants were provided with the REDS training dataset containing video sequences for a 4X video upscaling task. The runtime and power efficiency of all models were evaluated on the powerful MediaTek Dimensity 9000 platform with a dedicated AI processing unit capable of accelerating floating-point and quantized neural networks. All proposed solutions are fully compatible with the above NPU, demonstrating up to 500 FPS and a power consumption of 0.2 [Watt / 30 FPS]. A detailed description of all models developed in the challenge is provided in this paper.
As multi-drug combinations are widely used, accurate prediction of drug-drug interactions (DDIs) has become increasingly critical. In our approach, we represent drug interactions as a graph: nodes represent drugs and edges represent drug-drug interactions. Under this formulation, we cast DDI prediction as a link prediction problem, using known drug node properties and DDI types to predict unknown DDI types. This work proposes a Graph Distance Neural Network (GDNN) to predict drug-drug interactions. First, GDNN generates initial node features via a target-point method that fully encodes distance information in the graph. Second, GDNN adopts an improved message-passing framework to produce better embeddings for each drug node, jointly accounting for node and edge features. Third, GDNN aggregates the embeddings and passes them through an MLP to output the final predicted DDI type. GDNN achieves Hits@20 = 0.9037 on the OGB-DDI dataset, demonstrating that GDNN can effectively predict DDIs.
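To make the three-stage recipe concrete, here is a minimal PyTorch sketch of a message-passing link-type predictor in the same spirit; the target-point initialization, the exact message-passing variant, and all layer names below are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class MessagePassingLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.w_self = nn.Linear(dim, dim)   # transform the node itself
        self.w_neigh = nn.Linear(dim, dim)  # transform aggregated neighbors

    def forward(self, h, adj):
        # h: (N, dim) node embeddings, adj: (N, N) float adjacency matrix
        neigh = adj @ h / adj.sum(dim=1, keepdim=True).clamp(min=1)
        return torch.relu(self.w_self(h) + self.w_neigh(neigh))

class LinkTypePredictor(nn.Module):
    """Predicts a DDI type for a drug pair from the pair's embeddings."""
    def __init__(self, dim, num_types):
        super().__init__()
        self.mp1 = MessagePassingLayer(dim)
        self.mp2 = MessagePassingLayer(dim)
        self.mlp = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, num_types))

    def forward(self, x, adj, pairs):
        # pairs: (num_pairs, 2) long tensor of drug-node index pairs
        h = self.mp2(self.mp1(x, adj), adj)
        # concatenate the two endpoint embeddings of each queried pair
        z = torch.cat([h[pairs[:, 0]], h[pairs[:, 1]]], dim=-1)
        return self.mlp(z)  # (num_pairs, num_types) logits per DDI type
```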
In novel class discovery (NCD), we are given labeled data from seen classes and unlabeled data from unseen classes, and we train clustering models for the unseen classes. However, the implicit assumptions behind NCD remain unclear. In this paper, we demystify the assumptions behind NCD and find that high-level semantic features should be shared between the seen and unseen classes. Based on this finding, NCD is theoretically solvable under certain assumptions and can be naturally linked to meta-learning, which rests on exactly the same assumption as NCD. Thus, we can empirically solve the NCD problem with meta-learning algorithms after slight modifications. As demonstrated in experiments, this meta-learning-based approach significantly reduces the amount of unlabeled data needed for training and makes NCD more practical. The use of very limited data is also justified by the application scenario of NCD: since it is unnatural to label only seen-class data, unseen classes arise from sampling rather than deliberate labeling. Hence, unseen-class data should be collected in the same way as seen-class data, which is why they are novel and need to be clustered in the first place.
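For readers unfamiliar with the meta-learning side of this link, below is a minimal sketch of one prototypical-network episode, a standard meta-learning objective of the kind the paper connects NCD to; the paper's specific modifications are not reproduced here, and `encoder` is an assumed feature extractor.

```python
import torch
import torch.nn.functional as F

def episode_loss(encoder, support_x, support_y, query_x, query_y, n_way):
    # support/query batches come from *seen* classes during meta-training;
    # at test time the same encoder is used to cluster unseen-class data.
    z_s = encoder(support_x)                       # (n_support, d)
    z_q = encoder(query_x)                         # (n_query, d)
    # class prototypes: mean embedding of each class's support examples
    protos = torch.stack(
        [z_s[support_y == c].mean(dim=0) for c in range(n_way)])
    # classify queries by (negative) distance to each prototype
    logits = -torch.cdist(z_q, protos)             # (n_query, n_way)
    return F.cross_entropy(logits, query_y)
```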
In this paper, a semantic communication framework for image transmission is developed. In the investigated framework, a set of servers cooperatively transmit images to a set of users using semantic communication techniques. To evaluate the performance of the studied semantic communication system, a multimodal metric is proposed to measure the correlation between the extracted semantic information and the original image. To meet each user's requirement on this metric, referred to as the ISS requirement, each server must jointly determine the semantic information to be transmitted and the resource blocks (RBs) used for semantic information transmission. We formulate this problem as an optimization problem that aims to minimize each server's transmission latency while satisfying the ISS requirement. To solve this problem, a value-decomposition-based entropy-maximized multi-agent reinforcement learning (RL) algorithm is proposed, which enables the servers to coordinate during training and to execute RB allocation in a distributed manner, approaching globally optimal performance with fewer training iterations. Compared to traditional multi-agent RL, the proposed method improves the servers' exploration of valuable actions and the probability of finding a globally optimal RB allocation policy based on local observations. Simulation results show that the proposed algorithm can reduce the transmission delay by up to 16.1% compared to traditional multi-agent RL.
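As a rough illustration of the value-decomposition and entropy-maximization ingredients, the sketch below sums per-server Q-values into a joint value and uses a soft (log-sum-exp) bootstrap target; the paper's actual network architecture, reward, and entropy formulation are assumptions here.

```python
import torch
import torch.nn.functional as F

def vdn_entropy_loss(q_nets, target_nets, obs, actions, reward, next_obs,
                     gamma=0.99, alpha=0.01):
    # obs, next_obs: lists of per-agent observation tensors (batch, obs_dim)
    # actions: (batch, n_agents) long tensor of chosen RB-allocation actions
    q_tot, next_v = 0.0, 0.0
    for i, (q_net, tgt) in enumerate(zip(q_nets, target_nets)):
        q_i = q_net(obs[i])                                  # (batch, n_actions)
        # value decomposition: joint Q is the sum of local Qs
        q_tot = q_tot + q_i.gather(1, actions[:, i:i + 1]).squeeze(1)
        with torch.no_grad():
            # soft (entropy-maximized) next-state value:
            # V(s') = alpha * logsumexp(Q(s', .) / alpha)
            next_v = next_v + alpha * torch.logsumexp(
                tgt(next_obs[i]) / alpha, dim=1)
    target = reward + gamma * next_v
    return F.mse_loss(q_tot, target)
```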
Training a Neural Radiance Field (NeRF) without pre-computed camera poses is challenging. Recent advances in this direction demonstrate the possibility of jointly optimising a NeRF and camera poses in forward-facing scenes. However, these methods still face difficulties during dramatic camera movement. We tackle this challenging problem by incorporating undistorted monocular depth priors. These priors are generated by correcting scale and shift parameters during training, with which we are then able to constrain the relative poses between consecutive frames. This constraint is achieved using our proposed novel loss functions. Experiments on real-world indoor and outdoor scenes show that our method can handle challenging camera trajectories and outperforms existing methods in terms of novel view rendering quality and pose estimation accuracy.
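As a concrete illustration of the scale-and-shift correction, the sketch below aligns a monocular depth map to rendered depth by ordinary least squares over the two unknowns; variable names are illustrative and the paper's training-time integration and pose-constraint losses are omitted.

```python
import torch

def align_scale_shift(d_mono, d_nerf):
    # solve min_{s,t} || s * d_mono + t - d_nerf ||^2 in closed form
    x, y = d_mono.flatten(), d_nerf.flatten()
    A = torch.stack([x, torch.ones_like(x)], dim=1)   # (N, 2) design matrix
    s, t = torch.linalg.lstsq(A, y.unsqueeze(1)).solution.flatten()
    return s * d_mono + t   # corrected depth prior, same shape as d_mono
```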
The credit assignment problem of neural networks refers to evaluating the credit of each network component to the final outputs. For an untrained neural network, approaches to tackling it have made great contributions to parameter updates and model evolution during the training phase. For trained neural networks this problem has received little attention; nevertheless, it plays an increasingly important role in neural network patching, specification, and verification. Based on Koopman operator theory, this paper presents an alternative linear-dynamics perspective on the credit assignment problem for trained neural networks. Regarding a neural network as a composition of sub-dynamics, we utilize step-delay embedding to capture snapshots of each component, characterizing the established mapping as exactly as possible. To circumvent the dimension-difference problem encountered during the embedding, a composition and decomposition of an auxiliary linear layer, termed minimal linear dimension alignment, is carefully designed with rigorous formal guarantees. Afterwards, each component is approximated by a Koopman operator, and we derive the Jacobian matrix and its corresponding determinant, in analogy to backward propagation. We can then define a metric with algebraic interpretability for the credit assignment of each network component. Experiments conducted on typical neural networks demonstrate the effectiveness of the proposed method.
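The Koopman approximation step can be illustrated with a DMD-style least-squares fit over snapshot pairs, as sketched below; the step-delay embedding and the dimension-alignment layer are omitted, and the log-determinant score is only a simplified stand-in for the paper's metric.

```python
import numpy as np

def fit_koopman(X, Y):
    # X: (d, m) input snapshots, Y: (d, m) outputs of one sub-network;
    # columns are paired samples and the dimensions are assumed aligned.
    K = Y @ np.linalg.pinv(X)      # least-squares solution of K X = Y
    return K

def credit_score(K):
    # log |det K| measures how the component expands or contracts volume,
    # a crude proxy for its contribution to the network's output
    sign, logdet = np.linalg.slogdet(K)
    return logdet
```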
Label shift has been widely believed to be harmful to the generalization performance of machine learning models. Researchers have proposed many approaches to mitigate its impact, e.g., balancing the training data. However, these methods often target the underparametrized regime, where the sample size is much larger than the data dimension; research in the overparametrized regime is very limited. To bridge this gap, we propose a new asymptotic analysis of the Fisher Linear Discriminant classifier for binary classification with label shift. Specifically, we prove that a phase transition phenomenon exists: in a certain overparametrized regime, the classifier trained on imbalanced data outperforms its counterpart trained on reduced, balanced data. Moreover, we investigate the impact of regularization on label shift: the aforementioned phase transition vanishes as the regularization becomes strong.
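For reference, a minimal sketch of the (regularized) Fisher Linear Discriminant studied here is given below, assuming a pooled covariance estimate and a midpoint threshold; it is meant to fix notation, not to reproduce the paper's asymptotic setup.

```python
import numpy as np

def fit_fld(X0, X1, lam=0.0):
    # X0, X1: (n0, d) and (n1, d) samples of the two classes; class
    # imbalance enters through the per-class means and pooled covariance.
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    Xc = np.vstack([X0 - mu0, X1 - mu1])
    sigma = Xc.T @ Xc / len(Xc)                     # pooled covariance
    # regularized direction: w = (Sigma + lam * I)^{-1} (mu1 - mu0)
    w = np.linalg.solve(sigma + lam * np.eye(len(mu0)), mu1 - mu0)
    b = -w @ (mu0 + mu1) / 2                        # midpoint threshold
    return lambda X: (X @ w + b > 0).astype(int)    # predicts class 0/1
```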
Data-driven identification of differential equations is an interesting but challenging problem, especially when the given data are corrupted by noise. When the governing differential equation is a linear combination of various differential terms, the identification problem can be formulated as solving a linear system, with the feature matrix consisting of linear and nonlinear terms multiplied by a coefficient vector. This product is equal to the time-derivative term and thus generates the dynamical behavior. The goal is to identify the correct terms that form the equation to capture the dynamics of the given data. We propose a general and robust framework to recover differential equations using a weak formulation, for both ordinary and partial differential equations (ODEs and PDEs). The weak formulation facilitates an efficient and robust way to handle noise. For a recovery that is robust against noise and the choice of hyper-parameters, we introduce two new mechanisms, narrow-fit and trimming, for coefficient support and value recovery, respectively. For each sparsity level, Subspace Pursuit is utilized to find an initial support set from the large dictionary. Then, we focus on highly dynamic regions (rows of the feature matrix) and error-normalize the feature matrix in the narrow-fit step. The support is further updated by trimming the terms that contribute the least. Finally, the support set of features with the smallest cross-validation error is chosen as the result. A comprehensive set of numerical experiments is presented for both systems of ODEs and PDEs with various noise levels. The proposed method gives a robust recovery of the coefficients and a significant denoising effect that can handle up to $100\%$ noise-to-signal ratio for some equations. We compare the proposed method with several state-of-the-art algorithms for the recovery of differential equations.
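The weak formulation can be sketched in a few lines: multiplying by a compactly supported test function phi and integrating by parts moves the derivative off the noisy data, so for an ODE u' = sum_j c_j f_j(u) one solves -∫ phi'(t) u(t) dt = sum_j c_j ∫ phi(t) f_j(u(t)) dt. The numpy sketch below builds this linear system, with plain least squares standing in for Subspace Pursuit; narrow-fit, trimming, and the paper's choice of test functions are omitted, and the bump function used here is an assumption.

```python
import numpy as np

def weak_features(t, u, library, n_test=30, width=41):
    # t: uniform time grid, u: (len(t),) noisy samples,
    # library: list of candidate functions f_j acting elementwise on u
    dt = t[1] - t[0]
    s = np.linspace(-1.0, 1.0, width)
    phi = (1 - s**2) ** 2                       # bump, vanishes at both ends
    dphi = np.gradient(phi, s * (width - 1) / 2 * dt)   # d(phi)/dt
    A, b = [], []
    for k in range(n_test):                     # sliding local test functions
        i0 = k * (len(t) - width) // max(n_test - 1, 1)
        seg = u[i0:i0 + width]
        b.append(-(dphi * seg).sum() * dt)      # -∫ phi' u dt
        A.append([(phi * f(seg)).sum() * dt for f in library])
    return np.array(A), np.array(b)

# e.g., to recover u' = u - u^2 from (t, u) data:
# library = [lambda v: v, lambda v: v**2, lambda v: np.ones_like(v)]
# A, b = weak_features(t, u, library)
# c = np.linalg.lstsq(A, b, rcond=None)[0]
```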
This paper proposes a new method that fuses acoustic measurements in the reverberant field with motion reports from a low-precision inertial measurement unit (IMU) for simultaneous localization and mapping (SLAM). Unlike existing studies that use acoustic data only for direction-of-arrival (DOA) estimation, the source-to-sensor distance is computed from the direct-to-reverberant energy ratio (DRR) and used as a new constraint to eliminate the nonlinear noise in the motion reports. A particle filter is applied to estimate the critical distance, which is key to associating the source distance with the DRR. A keyframe method is used to eliminate the bias of the source position estimates toward the robot. The proposed DOA-DRR acoustic SLAM (D-D SLAM) is designed for three-dimensional motion and is suitable for most robots. The method is the first acoustic SLAM algorithm validated on a real-world indoor scene dataset containing only acoustic data and IMU measurements. Compared with previous methods, D-D SLAM achieves acceptable performance in localizing the robot and building a source map from a real-world indoor dataset. The average localization accuracy is 0.48 m, while the source position error converges to less than 0.25 m within 2.8 s. These results demonstrate the effectiveness of D-D SLAM in real-world indoor scenes, which may be particularly useful in foggy environments (i.e., environments unsuitable for optical or laser sensing).
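The distance-from-DRR idea admits a one-line model: under a diffuse-field assumption the direct energy falls as 1/d^2 while the reverberant energy is roughly constant, so DRR_linear ≈ (d_c / d)^2, where d_c is the critical distance (the range at which DRR = 0 dB). The sketch below inverts this relation; the particle filter that estimates d_c and the paper's exact acoustic model are not reproduced.

```python
def source_distance(drr_db: float, critical_distance_m: float) -> float:
    # invert DRR_linear = (d_c / d)^2, with DRR given in dB:
    # d = d_c * 10^(-DRR_dB / 20)
    return critical_distance_m * 10 ** (-drr_db / 20.0)

# e.g., at DRR = +6 dB the source is about half the critical distance away:
# source_distance(6.0, 2.0) -> ~1.0 m
```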
Predicting the future motion of road participants is crucial for autonomous driving yet extremely challenging due to staggering motion uncertainty. Recently, most motion prediction methods have resorted to goal-based strategies, i.e., predicting the endpoints of motion trajectories as conditions for regressing the entire trajectories, so that the search space of solutions can be reduced. However, accurate goal coordinates are hard to predict and evaluate. Moreover, the point representation of a destination limits the utilization of the rich road context, leading to inaccurate predictions. A goal area, i.e., a region of possible destinations, rather than exact goal coordinates, can provide a softer constraint with more tolerance and guidance for searching potential trajectories. With this in mind, we propose a new goal-area-based framework, named Goal Area Network (GANet), for motion prediction, which models goal areas rather than exact goal coordinates as preconditions for trajectory prediction, and performs more reliably and accurately. Specifically, we propose a GoICrop operator to effectively extract semantic lane features in goal areas and to model actors' future interactions, which benefits future trajectory estimation greatly. GANet ranks first among all public literature on the leaderboard (as of paper submission), and its source code will be released.
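One possible reading of a goal-area crop is sketched below in PyTorch: gather lane-node features that fall inside a radius around a predicted goal center and attention-pool them into a single context vector. The radius test, layer names, and pooling scheme are illustrative assumptions, not GoICrop's actual definition.

```python
import torch
import torch.nn as nn

class GoalAreaCrop(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.query = nn.Linear(dim, dim)   # projects agent state to a query
        self.key = nn.Linear(dim, dim)     # projects lane features to keys

    def forward(self, agent_feat, lane_feat, lane_xy, goal_xy, radius=10.0):
        # agent_feat: (dim,), lane_feat: (L, dim), lane_xy: (L, 2), goal_xy: (2,)
        inside = (lane_xy - goal_xy).norm(dim=-1) < radius   # goal-area mask
        feats = lane_feat[inside]                            # (M, dim)
        if feats.numel() == 0:
            return torch.zeros_like(agent_feat)
        # scaled dot-product attention of the agent over goal-area lane nodes
        attn = (self.key(feats) @ self.query(agent_feat)) / feats.shape[-1] ** 0.5
        return attn.softmax(dim=0) @ feats   # weighted goal-area context
```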