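Real-time object detection for unmanned aerial vehicles (UAVs) is a challenging problem because edge GPU devices acting as Internet of Things (IoT) nodes have limited computing resources. To address this problem, this paper proposes a novel lightweight deep learning architecture based on the YOLOX model for real-time object detection on edge GPUs. First, we design an efficient and lightweight PixSF head to replace the original head of YOLOX for better detection of small objects; it can further be built with depthwise separable convolutions (DS Conv) to obtain an even lighter head. Then, a slimmer structure is developed in the neck layer to reduce the number of network parameters, as a trade-off between accuracy and speed. In addition, we embed an attention module into the head layer to improve the feature extraction of the prediction head. Meanwhile, we also improve the label assignment strategy and the loss function to alleviate the class imbalance and box optimization problems of UAV datasets. Finally, an auxiliary head is proposed for online distillation to improve the position embedding and feature extraction capabilities of the PixSF head. The performance of our lightweight model is experimentally validated on the NVIDIA Jetson NX and Jetson Nano GPU embedded platforms. Extensive experiments show that, compared with current models, the FasterX model achieves a better trade-off between accuracy and latency on the VisDrone2021 dataset.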
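Why embedding DS Conv makes a head lighter can be seen from parameter counts alone. The sketch below compares a standard k×k convolution with a depthwise separable one (a per-channel k×k depthwise conv followed by a 1×1 pointwise conv). The 256-channel, 3×3 configuration is an illustrative assumption, not the PixSF head's actual layout.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    # One k x k filter per (input channel, output channel) pair, plus one bias per output channel.
    return c_in * c_out * k * k + c_out


def ds_conv_params(c_in: int, c_out: int, k: int) -> int:
    # Depthwise stage: a single k x k filter per input channel, plus bias.
    depthwise = c_in * k * k + c_in
    # Pointwise stage: a 1 x 1 convolution that mixes channels, plus bias.
    pointwise = c_in * c_out + c_out
    return depthwise + pointwise


if __name__ == "__main__":
    std = standard_conv_params(256, 256, 3)
    ds = ds_conv_params(256, 256, 3)
    print(std, ds, round(std / ds, 1))
```

For this configuration the separable layer needs roughly 8.6x fewer parameters; the saving grows with kernel size and channel count, which is why it suits edge GPUs such as the Jetson Nano.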
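Automatically writing long articles is a complex and challenging language generation task. Prior work has mainly focused on generating these articles from human-written prompts that provide some topical context and some metadata about the article. However, in many applications, such as generating news stories, these articles are often paired with images and their captions or alt text, which in turn are based on real-world events and may reference many different named entities that are difficult for language models to recognize and predict correctly. To address both problems, this paper introduces Engin, an entity-aware news generation method with image information that incorporates news image information into language models. Engin generates news articles conditioned on both the metadata and information extracted from images, such as captions and named entities. We also propose an entity-aware mechanism to help our model better recognize and predict the entity names in news. We conduct experiments on two public large-scale news datasets, GoodNews and VisualNews. Quantitative results show that our approach improves article perplexity by 4-5 points over the base models. Qualitative results demonstrate that the text generated by Engin is more consistent with the news images. We also conduct article quality annotation experiments on the generated articles to verify that our model produces higher-quality articles. Finally, we investigate the effectiveness of methods for automatically detecting machine-generated articles.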
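Intelligent decision-making for unmanned combat aerial vehicles (UCAVs) has long been a challenging problem. Conventional search methods can hardly meet the real-time demands of highly dynamic air combat scenarios. Reinforcement learning (RL) methods can significantly shorten decision time by using neural networks. However, the sparse reward problem limits their convergence speed, and artificial prior-experience rewards can easily deviate from the optimal convergence direction of the original task, which creates great difficulties for RL air combat applications. In this paper, we propose a homotopy-based soft actor-critic method (HSAC) that addresses these problems by following the homotopy path between the original task with sparse rewards and an auxiliary task with artificial prior-experience rewards. The convergence and feasibility of the method are also proven in this paper. To validate our method, we build a detailed 3D air combat simulation environment for training RL-based methods, and we implement our method on an attack-horizontal-flight UCAV task and a self-play confrontation task. Experimental results show that our method performs better than methods that use only sparse rewards or only artificial prior-experience rewards. The agent trained by our method achieves a 98.3% win rate in the attack-horizontal-flight task and an average win rate of 67.4% when facing agents trained by the other two methods.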
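The homotopy idea can be illustrated with a reward that is continuously deformed from the auxiliary task into the original one. The sketch below assumes a linear homotopy r_λ = (1 − λ)·r_prior + λ·r_sparse and a linear schedule for λ; both the blending form and the schedule are our illustrative assumptions, and the function names are hypothetical. The abstract does not specify how HSAC actually parameterizes or traverses the path.

```python
def homotopy_reward(r_sparse: float, r_prior: float, lam: float) -> float:
    """Blend the auxiliary (prior-experience) reward with the original sparse reward.

    lam = 0 gives the purely auxiliary task; lam = 1 gives the purely original task.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return (1.0 - lam) * r_prior + lam * r_sparse


def lam_schedule(step: int, total_steps: int) -> float:
    """Hypothetical linear schedule moving along the path from auxiliary to original task."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    return min(1.0, max(0.0, step / total_steps))


if __name__ == "__main__":
    # Early in training the shaped prior reward dominates; by the end only the sparse reward remains.
    for step in (0, 50, 100):
        lam = lam_schedule(step, 100)
        print(step, lam, homotopy_reward(r_sparse=1.0, r_prior=0.2, lam=lam))
```

Ending at λ = 1 is what keeps the final objective identical to the original sparse-reward task, so the prior reward speeds up early learning without biasing the converged policy.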
To facilitate research on text generation, this paper presents a comprehensive and unified library, TextBox 2.0, focusing on the use of pre-trained language models (PLMs). To be comprehensive, our library covers 13 common text generation tasks and their corresponding 83 datasets, and further incorporates 45 PLMs covering general, translation, Chinese, dialogue, controllable, distilled, prompting, and lightweight PLMs. We also implement 4 efficient training strategies and provide 4 generation objectives for pre-training new PLMs from scratch. To be unified, we design the interfaces to support the entire research pipeline (from data loading to training and evaluation), ensuring that each step can be carried out in a unified way. Despite its rich functionality, the library is easy to use, either through the friendly Python API or the command line. To validate the effectiveness of our library, we conduct extensive experiments and exemplify four types of research scenarios. The project is released at https://github.com/RUCAIBox/TextBox.
Establishing open and general benchmarks has been a critical driving force behind the success of modern machine learning techniques. As machine learning is applied to broader domains and tasks, there is a need for richer and more diverse benchmarks that better reflect real application scenarios. Graph learning is an emerging field of machine learning that urgently needs more and better benchmarks. To meet this need, we introduce Graph Learning Indexer (GLI), a benchmark curation platform for graph learning. In comparison to existing graph learning benchmark libraries, GLI highlights two novel design objectives. First, GLI is designed to incentivize dataset contributors. In particular, we incorporate various measures to minimize the effort of contributing and maintaining a dataset, increase the usability of contributed datasets, and encourage attribution to the different contributors of each dataset. Second, GLI is designed to curate a knowledge base of benchmark datasets rather than a plain collection. We use multiple sources of meta information to augment the benchmark datasets with rich characteristics, so that they can be easily selected and used in downstream research or development. The source code of GLI is available at https://github.com/Graph-Learning-Benchmarks/gli.
Neural networks, especially the recently proposed neural operator models, are increasingly being used to approximate the solution operators of differential equations. Compared to traditional numerical solvers, they are much faster and more efficient in practical applications. However, one critical issue is that training neural operator models requires a large amount of ground-truth data, which usually comes from slow numerical solvers. In this paper, we propose a physics-guided data augmentation (PGDA) method to improve the accuracy and generalization of neural operator models. Training data is augmented naturally through physical properties of differential equations such as linearity and translation. We demonstrate the advantage of PGDA on a variety of linear differential equations, showing that PGDA can improve sample complexity and is robust to distributional shift.
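The premise behind augmenting through linearity and translation is that, for a linear and translation-invariant equation, new (input, solution) pairs follow from existing ones without calling the slow solver again. The sketch below assumes a 1D periodic Poisson equation −u″ = f, solved spectrally as a stand-in for the ground-truth solver; the equation, grid, and function names are illustrative choices, not taken from the paper.

```python
import numpy as np


def solve_periodic_poisson(f: np.ndarray, length: float = 2 * np.pi) -> np.ndarray:
    """Solve -u'' = f on a uniform periodic 1D grid via FFT (returns the zero-mean solution).

    Stands in for the expensive numerical solver that produces ground-truth pairs.
    Assumes f has zero mean, otherwise no periodic solution exists.
    """
    n = f.size
    k = 2 * np.pi * np.fft.fftfreq(n, d=length / n)  # angular wavenumbers
    f_hat = np.fft.fft(f)
    u_hat = np.zeros_like(f_hat)
    nonzero = k != 0
    # -u'' = f becomes k^2 * u_hat = f_hat in Fourier space; the k = 0 mode is set to zero.
    u_hat[nonzero] = f_hat[nonzero] / k[nonzero] ** 2
    return np.real(np.fft.ifft(u_hat))


def augment_linear(pair1, pair2, a: float, b: float):
    """Linearity: a*f1 + b*f2 maps to a*u1 + b*u2 for a linear operator."""
    (f1, u1), (f2, u2) = pair1, pair2
    return a * f1 + b * f2, a * u1 + b * u2


def augment_translate(pair, shift: int):
    """Translation: shifting the input shifts the solution by the same amount (periodic grid)."""
    f, u = pair
    return np.roll(f, shift), np.roll(u, shift)


if __name__ == "__main__":
    x = np.linspace(0, 2 * np.pi, 64, endpoint=False)
    pair1 = (np.sin(x), solve_periodic_poisson(np.sin(x)))
    pair2 = (np.cos(3 * x), solve_periodic_poisson(np.cos(3 * x)))
    f_new, u_new = augment_linear(pair1, pair2, 2.0, -0.5)
    # The augmented pair matches what the solver would have produced, at no extra solver cost.
    print(np.max(np.abs(solve_periodic_poisson(f_new) - u_new)))
```

Each call to the augmentation functions yields a valid training pair for free; for nonlinear equations these identities do not hold, which is consistent with the abstract restricting its experiments to linear differential equations.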
Accurate polyp segmentation is of great importance for colorectal cancer diagnosis and treatment. However, due to the high cost of producing accurate mask annotations, existing polyp segmentation methods suffer from severe data shortage and impaired model generalization. Conversely, coarse polyp bounding box annotations are more accessible. Thus, in this paper, we propose a boosted BoxPolyp model to make full use of both accurate mask annotations and extra coarse box annotations. In practice, box annotations are used to alleviate the over-fitting issue of previous polyp segmentation models, with fine-grained polyp areas generated through the iterative boosted segmentation model. To achieve this goal, a fusion filter sampling (FFS) module is first proposed to generate pixel-wise pseudo labels from box annotations with less noise, leading to significant performance improvements. Besides, considering the appearance consistency of the same polyp, an image consistency (IC) loss is designed. This IC loss explicitly narrows the distance between features extracted by two different networks, which improves the robustness of the model. Note that BoxPolyp is a plug-and-play model that can be merged into any backbone of choice. Quantitative and qualitative experimental results on five challenging benchmarks confirm that our proposed model outperforms previous state-of-the-art methods by a large margin.
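A consistency loss of this kind penalizes disagreement between two networks' features for the same polyp image. The sketch below assumes the distance is a plain mean squared error between equally shaped feature maps; the actual distance, any feature normalization, and the function name `image_consistency_loss` are our assumptions, since the abstract does not define them.

```python
import numpy as np


def image_consistency_loss(feat_a: np.ndarray, feat_b: np.ndarray) -> float:
    """Mean squared distance between features of the same image from two networks.

    Hypothetical stand-in for the IC loss: 0 when both networks agree exactly,
    growing as their feature maps drift apart.
    """
    if feat_a.shape != feat_b.shape:
        raise ValueError("feature maps must have the same shape")
    return float(np.mean((feat_a - feat_b) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(8, 16, 16))  # (channels, height, width) from network A
    noisy = feats + 0.1 * rng.normal(size=feats.shape)  # slightly different features from network B
    print(image_consistency_loss(feats, feats), image_consistency_loss(feats, noisy))
```

In training, a term like this would be added to the segmentation loss so that gradients pull the two networks' representations of the same polyp together.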
Prompt tuning has been employed as an efficient way to adapt large vision-language pre-trained models (e.g., CLIP) to various downstream tasks in data-limited or label-limited settings. Nonetheless, in existing methods, visual data (e.g., images) is by default a prerequisite for learning prompts. In this work, we argue that the effectiveness of image-text contrastive learning in aligning the two modalities (for training CLIP) further makes it feasible to treat texts as images for prompt tuning, and we introduce TaI prompting. In contrast to visual data, text descriptions are easy to collect, and their class labels can be derived directly. In particular, we apply TaI prompting to multi-label image recognition, where sentences in the wild serve as alternatives to images for prompt tuning. Moreover, with TaI, double-grained prompt tuning (TaI-DPT) is further presented to extract both coarse-grained and fine-grained embeddings for enhancing multi-label recognition performance. Experimental results show that our proposed TaI-DPT outperforms zero-shot CLIP by a large margin on multiple benchmarks, e.g., MS-COCO, VOC2007, and NUS-WIDE, and it can be combined with existing methods that learn prompts from images to further improve recognition performance. Code is released at https://github.com/guozix/TaI-DPT.
Artificial Intelligence (AI) is having a tremendous impact across most areas of science. Applications of AI in healthcare have the potential to improve our ability to detect, diagnose, prognose, and intervene on human disease. For AI models to be used clinically, they need to be made safe, reproducible and robust, and the underlying software framework must be aware of the particularities (e.g., geometry, physiology, physics) of the medical data being processed. This work introduces MONAI, a freely available, community-supported, and consortium-led PyTorch-based framework for deep learning in healthcare. MONAI extends PyTorch to support medical data, with a particular focus on imaging, and provides purpose-specific AI model architectures, transformations and utilities that streamline the development and deployment of medical AI models. MONAI follows best practices for software development, providing an easy-to-use, robust, well-documented, and well-tested software framework. MONAI preserves the simple, additive, and compositional approach of its underlying PyTorch libraries. MONAI is being used by, and receiving contributions from, research, clinical and industrial teams around the world, who are pursuing applications spanning nearly every aspect of healthcare.
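Feature selection has attracted considerable attention over the past decades because it reduces data dimensionality while preserving the original physical meaning of the features, which gives it better interpretability than feature extraction. However, most existing feature selection methods, especially deep-learning-based ones, focus only on features with high scores and neglect both the features that score lower during training and the order of important candidate features. This can be risky, because some important and relevant features may unfortunately be overlooked during training, leading to suboptimal solutions or misleading selections. In our work, we address feature selection by exploiting features with lower importance scores and propose a feature selection framework based on a novel complementary feature mask. Our method is generic and can be easily integrated into existing deep-learning-based feature selection methods to improve their performance. Experiments on benchmark datasets show that the proposed method selects more representative and informative features than state-of-the-art methods.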
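One way to picture a complementary feature mask: alongside the mask that keeps the currently top-scoring features, its complement exposes the low-scoring ones so they are not permanently ignored during training. The sketch below assumes a hard top-k mask over a score vector; the function name is hypothetical, and how the paper actually builds the mask and uses the two masked inputs is not specified in the abstract.

```python
import numpy as np


def complementary_masks(scores: np.ndarray, k: int):
    """Return a binary mask keeping the k highest-scoring features, and its complement."""
    if not 0 < k <= scores.size:
        raise ValueError("k must be in [1, number of features]")
    order = np.argsort(scores)[::-1]  # feature indices sorted by descending score
    mask = np.zeros_like(scores, dtype=float)
    mask[order[:k]] = 1.0
    return mask, 1.0 - mask


if __name__ == "__main__":
    scores = np.array([0.9, 0.1, 0.5, 0.3])  # importance scores from some selector
    x = np.array([10.0, 20.0, 30.0, 40.0])  # one input sample
    top, comp = complementary_masks(scores, 2)
    # The selected view and the complementary view together cover every feature,
    # so a low-scoring but relevant feature can still influence training.
    print(x * top, x * comp)
```

Because the two masks partition the features, a training objective that looks at both views gives low-scored features a chance to be re-ranked instead of being dropped early.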