Smart Paper Notes

Deep Spectral Q-learning with Application to Mobile Health

Yuhe Gao, Chengchun Shi, Rui Song

Categories: Machine Learning (Statistics) | Machine Learning

2023-01-03

Dynamic treatment regimes assign personalized treatments to patients sequentially over time based on their baseline information and time-varying covariates. In mobile health applications, these covariates are typically collected at different frequencies over a long time horizon. In this paper, we propose a deep spectral Q-learning algorithm, which integrates principal component analysis (PCA) with deep Q-learning to handle the mixed frequency data. In theory, we prove that the mean return under the estimated optimal policy converges to that under the optimal one and establish its rate of convergence. The usefulness of our proposal is further illustrated via simulations and an application to a diabetes dataset.
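
The abstract's core idea, compressing high-frequency covariates with PCA before they reach a Q-function, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the array sizes, the number of components, and the helper name `pca_compress` are all assumptions, and the Q-network itself is omitted.

```python
import numpy as np

def pca_compress(high_freq, n_components):
    """Project each row of high_freq onto its top principal components."""
    centered = high_freq - high_freq.mean(axis=0)
    # SVD of the centered data; the rows of vt are the principal directions.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

rng = np.random.default_rng(0)
n_patients, n_low, n_high = 50, 3, 96               # hypothetical sizes
low_freq = rng.normal(size=(n_patients, n_low))     # e.g. covariates recorded once per stage
high_freq = rng.normal(size=(n_patients, n_high))   # e.g. frequent sensor readings

scores = pca_compress(high_freq, n_components=4)
# Low-frequency covariates plus PCA scores form the state fed to a Q-function.
state = np.concatenate([low_freq, scores], axis=1)
print(state.shape)
```

The point of the design is that the Q-function sees a fixed, low-dimensional state regardless of how densely the high-frequency covariates were sampled.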

Biomedical image analysis competitions: The state of current participation practice

Matthias Eisenmann, Annika Reinke, Vivienn Weru, Minu Dietlinde Tizabi, Fabian Isensee, Tim J. Adler, Patrick Godau, Veronika Cheplygina, Michal Kozubek, Sharib Ali

Categories: Computer Vision | Machine Learning

2022-12-16

The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of the challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.

Learning Dual-Fused Modality-Aware Representations for RGBD Tracking

Shang Gao, Jinyu Yang, Zhe Li, Feng Zheng, Aleš Leonardis, Jingkuan Song

Categories: Computer Vision

2022-11-06

With the development of depth sensors in recent years, RGBD object tracking has received significant attention. Compared with traditional RGB object tracking, the added depth modality can effectively reduce interference between the target and the background. However, some existing RGBD trackers use the two modalities separately and thus ignore particularly useful information shared between them. Other methods fuse the two modalities by treating them equally, so modality-specific features are lost. To tackle these limitations, we propose a novel Dual-fused Modality-aware Tracker (termed DMTracker), which aims to learn informative and discriminative representations of the target objects for robust RGBD tracking. The first fusion module extracts the information shared between modalities based on cross-modal attention. The second integrates the RGB-specific and depth-specific information to enhance the fused features. By fusing both the modality-shared and modality-specific information in a modality-aware scheme, our DMTracker can learn discriminative representations in complex tracking scenes. Experiments show that our proposed tracker achieves very promising results on challenging RGBD benchmarks. Code is available at https://github.com/ShangGaoG/DMTracker.
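
As a rough illustration of what "cross-modal attention" can look like, the sketch below applies standard scaled dot-product attention with RGB tokens as queries and depth tokens as keys and values. It is a generic attention example under assumed shapes, not DMTracker's actual fusion module; the token counts, feature width, and the function name `cross_attention` are hypothetical.

```python
import numpy as np

def cross_attention(queries, keys, values):
    """Scaled dot-product attention from one modality onto another."""
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ values, weights

rng = np.random.default_rng(0)
rgb_tokens = rng.normal(size=(6, 16))     # hypothetical: 6 RGB tokens, width 16
depth_tokens = rng.normal(size=(4, 16))   # hypothetical: 4 depth tokens, width 16

# Each RGB token gathers information from the depth tokens it attends to.
fused, attn = cross_attention(rgb_tokens, depth_tokens, depth_tokens)
print(fused.shape, attn.shape)
```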

AdaFocusV3: On Unified Spatial-temporal Dynamic Video Recognition

Yulin Wang, Yang Yue, Xinhong Xu, Ali Hassani, Victor Kulikov, Nikita Orlov, Shiji Song, Humphrey Shi, Gao Huang

Categories: Computer Vision | Artificial Intelligence | Machine Learning

2022-09-27

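Recent research has shown that reducing temporal redundancy and reducing spatial redundancy are both effective approaches to efficient video recognition, for example by allocating most of the computation to task-relevant frames or to the most valuable image regions within each frame. In most existing works, however, either type of redundancy is modeled while the other is left out. This paper explores a unified formulation of spatial-temporal dynamic computation on top of the recently proposed AdaFocusV2 algorithm, leading to an improved AdaFocusV3 framework. Our method reduces the computational cost by activating the expensive high-capacity network only on a few small but informative 3D video cubes. These cubes are cropped from the space formed by frame height, frame width, and video duration, and their locations are determined adaptively, on a per-sample basis, by a lightweight policy network. At test time, the number of cubes used for each video is configured dynamically: video cubes are processed sequentially until a sufficiently reliable prediction is produced. Notably, AdaFocusV3 can be trained effectively by approximating the non-differentiable cropping operation with interpolation of deep features. Extensive empirical results on six benchmark datasets (ActivityNet, FCVID, Mini-Kinetics, Something-Something V1&V2, and Diving48) show that our model is considerably more efficient than competitive baselines.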
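
The test-time behavior described in the abstract, processing cubes one after another until the prediction is reliable enough, can be sketched as below. The aggregation rule (a running average of per-cube class probabilities), the confidence criterion, and the threshold value are assumptions made for illustration; the abstract does not specify them.

```python
def predict_with_early_stop(cube_probs, threshold=0.8):
    """Process per-cube class probabilities in order; stop once confident.

    Returns (predicted_class, cubes_used).
    """
    n_classes = len(cube_probs[0])
    running = [0.0] * n_classes
    for used, probs in enumerate(cube_probs, start=1):
        running = [r + p for r, p in zip(running, probs)]
        mean = [r / used for r in running]
        best = max(range(n_classes), key=mean.__getitem__)
        if mean[best] >= threshold:
            break  # prediction is reliable enough; skip the remaining cubes
    return best, used

# Hypothetical per-cube outputs of the high-capacity network.
easy_video = [[0.9, 0.05, 0.05], [0.8, 0.1, 0.1]]
hard_video = [[0.4, 0.35, 0.25], [0.5, 0.3, 0.2], [0.7, 0.2, 0.1]]

print(predict_with_early_stop(easy_video))
print(predict_with_early_stop(hard_video))
```

An easy video stops after its first cube, while a hard one consumes every available cube, which is where the per-video savings come from.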

AcroFOD: An Adaptive Method for Cross-domain Few-shot Object Detection

Yipeng Gao, Lingxiao Yang, Yunmu Huang, Song Xie, Shiyong Li, Wei-shi Zheng

Categories: Computer Vision

2022-09-22

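Under domain shift, cross-domain few-shot object detection aims to adapt an object detector to the target domain using only a few annotated target samples. Two significant challenges exist: (1) highly insufficient target-domain data; and (2) potential over-adaptation and misleading guidance caused by amplifying target samples inappropriately and without any restriction. To address these challenges, we propose an adaptive method consisting of two parts. First, we propose an adaptive optimization strategy that selects augmented data similar to the target samples rather than blindly increasing the amount of data. Specifically, we filter out, at the very beginning, the augmented candidates that deviate significantly from the target feature distribution. Second, to further relieve the data limitation, we propose multi-level domain-aware data augmentation, which exploits cross-image foreground-background mixing to increase the diversity and plausibility of the augmented data. Experiments show that the proposed method achieves state-of-the-art performance on multiple benchmarks.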
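
The filtering step in the first part can be illustrated with a small sketch: rank augmented candidates by how far their features lie from the target features and keep only the closest ones. The distance measure (Euclidean distance to the mean target feature), the `keep_ratio` parameter, and the function names are assumptions for illustration; the abstract does not say how deviation from the target distribution is measured.

```python
import math

def mean_vector(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def filter_candidates(candidates, target_features, keep_ratio=0.5):
    """Keep the augmented candidates closest to the mean target feature."""
    center = mean_vector(target_features)
    ranked = sorted(candidates, key=lambda f: math.dist(f, center))
    n_keep = max(1, int(len(ranked) * keep_ratio))
    return ranked[:n_keep]

# Hypothetical 2-D features for a few target samples and augmented candidates.
target = [[1.0, 1.0], [1.2, 0.8]]
augmented = [[1.1, 0.9], [5.0, 5.0], [0.9, 1.1], [-3.0, 2.0]]

kept = filter_candidates(augmented, target, keep_ratio=0.5)
print(kept)
```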

A Tent Lévy Flying Sparrow Search Algorithm for Feature Selection: A COVID-19 Case Study

Qinwen Yang, Yuelin Gao, Yanjie Song

Categories: Machine Learning

2022-09-20

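The "curse of dimensionality" brought about by the rapid development of information science can have a negative impact when dealing with large datasets. In this paper, we propose a variant of the sparrow search algorithm (SSA), called the Tent Lévy flying sparrow search algorithm (TFSSA), and use it in wrapper mode to select the best subset of features for classification. SSA is a recently proposed algorithm that has not yet been systematically applied to feature selection problems. After validation on the CEC2020 benchmark functions, TFSSA is used to select the best feature combination so as to maximize classification accuracy and minimize the number of selected features. The proposed TFSSA is compared with nine algorithms from the literature. Nine evaluation metrics are used to assess and compare the performance of these algorithms on 21 datasets from the UCI repository. In addition, the method is applied to a coronavirus disease (COVID-19) dataset, where it obtains the best average classification accuracy (93.47%) and the best average number of selected features (2.1). The experimental results confirm the advantages of the proposed algorithm over other wrapper-based algorithms in improving classification accuracy and reducing the number of selected features.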
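
Two ingredients named in the abstract can be sketched in a few lines: a tent chaotic map, which is commonly used to spread out initial candidate positions, and a wrapper-style fitness that trades classification error against the number of selected features. The map parameter `a`, the 0.5 threshold for turning positions into a feature mask, and the weight `alpha` are assumptions; the paper's exact formulation, including its Lévy flight step, is not given in the abstract and is omitted here.

```python
def tent_sequence(x0, length, a=0.7):
    """Generate a tent-map chaotic sequence with values in [0, 1]."""
    xs, x = [], x0
    for _ in range(length):
        x = x / a if x < a else (1.0 - x) / (1.0 - a)
        xs.append(x)
    return xs

def to_mask(position, threshold=0.5):
    """Turn a continuous position into a feature-selection mask."""
    return [v > threshold for v in position]

def fitness(error_rate, mask, alpha=0.99):
    """Lower is better: weighted error plus the fraction of features kept."""
    return alpha * error_rate + (1.0 - alpha) * sum(mask) / len(mask)

position = tent_sequence(x0=0.3, length=8)   # one hypothetical candidate
mask = to_mask(position)
# error_rate would come from training a classifier on the masked features.
print(mask, fitness(error_rate=0.1, mask=mask))
```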

ActiveNeRF: Learning where to See with Uncertainty Estimation

Xuran Pan, Zihang Lai, Shiji Song, Gao Huang

Categories: Computer Vision

2022-09-18

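Recently, Neural Radiance Fields (NeRF) have shown promising performance in reconstructing 3D scenes and synthesizing novel views from a sparse set of 2D images. Although effective, NeRF's performance is strongly influenced by the quality of the training samples. With only a limited number of images of the scene, NeRF fails to generalize well to novel views and may collapse to trivial solutions in unobserved regions. This makes NeRF impractical in resource-constrained scenarios. In this paper, we present a novel learning framework, ActiveNeRF, which aims to model a 3D scene with a constrained input budget. Specifically, we first incorporate uncertainty estimation into a NeRF model, which ensures robustness under few observations and provides an interpretation of how NeRF understands the scene. On this basis, we propose to supplement the existing training set with newly captured samples following an active learning scheme. By evaluating the reduction in uncertainty given new inputs, we select the samples that bring the most information gain. In this way, the quality of novel view synthesis can be improved with minimal additional resources. Extensive experiments validate the performance of our model on both realistic and synthetic scenes, especially when training data is scarce. Code will be released at https://github.com/leaplabthu/activenerf.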

Learning to Weight Samples for Dynamic Early-exiting Networks

Yizeng Han, Yifan Pu, Zihang Lai, Chaofei Wang, Shiji Song, Junfen Cao, Wenhui Huang, Chao Deng, Gao Huang

Categories: Computer Vision

2022-09-17

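Early exiting is an effective paradigm for improving the inference efficiency of deep networks. By constructing classifiers with varying resource demands (the exits), such networks allow easy samples to be output at early exits, removing the need to execute deeper layers. While existing works mainly focus on the architectural design of multi-exit networks, the training strategies for such models remain largely unexplored. Current state-of-the-art models treat all samples the same during training. The early-exiting behavior at test time is thus ignored, which leads to a gap between training and testing. In this paper, we propose to bridge this gap by sample weighting. Intuitively, easy samples, which generally exit early in the network during inference, should contribute more to training the early classifiers. The late classifiers, in turn, should emphasize the training of hard samples, which mostly exit from deeper layers. Our work proposes a weight prediction network that weights the loss of different training samples at each exit. This weight prediction network and the backbone model are jointly optimized under a meta-learning framework with a novel optimization objective. By bringing the adaptive behavior at inference time into the training phase, we show that the proposed weighting mechanism consistently improves the trade-off between classification accuracy and inference efficiency. Code is available at https://github.com/leaplabthu/l2w-den.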
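
The two mechanisms in the abstract, confidence-based early exiting at inference and per-exit sample weighting during training, can be sketched as follows. The exit criterion (maximum softmax probability against a fixed threshold) is a common choice assumed here, and the weights are hand-set placeholders; in the paper they are produced by a weight prediction network trained with meta-learning, which this sketch does not attempt to reproduce.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def early_exit_predict(exit_logits, threshold=0.9):
    """Return (class, exit_index) from the first sufficiently confident exit."""
    last = len(exit_logits) - 1
    for idx, logits in enumerate(exit_logits):
        probs = softmax(logits)
        conf = max(probs)
        if conf >= threshold or idx == last:  # the final exit always answers
            return probs.index(conf), idx

def weighted_exit_loss(exit_logits, label, weights):
    """Sum of per-exit cross-entropy losses, each scaled by a sample weight."""
    return sum(-w * math.log(softmax(logits)[label])
               for logits, w in zip(exit_logits, weights))

# Hypothetical logits at two exits for an easy and a hard sample.
easy = [[5.0, 0.0], [6.0, 0.0]]
hard = [[0.2, 0.0], [0.0, 3.0]]

print(early_exit_predict(easy), early_exit_predict(hard))
# Placeholder weights: emphasize the late exit for the hard sample.
print(weighted_exit_loss(hard, label=1, weights=[0.2, 1.0]))
```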

LKD-Net: Large Kernel Convolution Network for Single Image Dehazing

Pinjun Luo, Guoqiang Xiao, Xinbo Gao, Song Wu

Categories: Computer Vision | Machine Learning

2022-09-05

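Single-image dehazing methods based on deep convolutional neural networks (CNNs) have achieved significant success. Earlier methods sought to improve performance by increasing the depth and width of the network. Current methods focus on increasing the convolution kernel size to benefit from a larger receptive field. However, directly enlarging the convolution kernel introduces a large amount of computational overhead and many parameters. This paper therefore designs a novel Large Kernel Convolution Dehaze Block (LKD Block), which consists of a Decomposition depth-wise Large Kernel Convolution Block (DLKCB) and a Channel Enhanced Feed-forward Network (CEFN). The DLKCB splits a depth-wise large-kernel convolution into a smaller depth-wise convolution and a depth-wise dilated convolution, without introducing a large number of parameters or much computational overhead. Meanwhile, the CEFN incorporates a channel attention mechanism into the feed-forward network to exploit important channels and enhance robustness. The Large Kernel Convolution Dehaze Network (LKD-Net) is built by combining multiple LKD Blocks with up-sampling and down-sampling modules. The evaluation results demonstrate the effectiveness of the DLKCB and CEFN, and our LKD-Net outperforms the state of the art. On the SOTS indoor dataset, our LKD-Net significantly outperforms the Transformer-based method Dehamer with only 1.79% of its #Param and 48.9% of its FLOPs. The source code of our LKD-Net is available at https://github.com/swu-cs-medialab/lkd-net.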
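
To see why splitting a large depth-wise kernel into a small depth-wise convolution followed by a dilated depth-wise convolution saves parameters, the arithmetic can be worked through directly. The kernel sizes, dilation, and channel count below are hypothetical values chosen for illustration; the abstract does not state the configuration LKD-Net uses.

```python
def stacked_receptive_field(k_small, k_dilated, dilation):
    """Receptive field of a k_small conv followed by a dilated k_dilated conv (stride 1)."""
    return (k_small - 1) + (k_dilated - 1) * dilation + 1

def depthwise_params(kernel, channels):
    """Weights in a depth-wise kernel x kernel convolution (bias ignored)."""
    return kernel * kernel * channels

channels = 64                                    # hypothetical channel count
rf = stacked_receptive_field(k_small=5, k_dilated=7, dilation=3)

decomposed = depthwise_params(5, channels) + depthwise_params(7, channels)
direct = depthwise_params(rf, channels)          # one kernel covering the same field

print(rf, decomposed, direct)
```

With these assumed sizes the stacked pair covers a 23 x 23 field using 4,736 weights, against 33,856 for a single depth-wise kernel of that size.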

Less is More: Rethinking State-of-the-art Continual Relation Extraction Models with a Frustratingly Easy but Effective Approach

Peiyi Wang, Yifan Song, Tianyu Liu, Rundong Gao, Binghuai Lin, Yunbo Cao, Zhifang Sui

Categories: Natural Language Processing

2022-09-01

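Continual relation extraction (CRE) requires a model to continually learn new relations from a class-incremental data stream. In this paper, we propose a Frustratingly easy but Effective Approach (FEA) with two learning stages for CRE: 1) Fast Adaption (FA) warms up the model using only the new data; 2) Balanced Tuning (BT) fine-tunes the model on balanced memory data. Despite its simplicity, FEA achieves comparable (on TACRED) or superior (on FewRel) performance compared with state-of-the-art baselines. Through careful examination, we find that the data imbalance between new and old relations leads to a skewed decision boundary in the head classifier on top of the pretrained encoder, which hurts overall performance. In FEA, the FA stage unleashes the potential of the memory data for the subsequent fine-tuning, while the BT stage helps establish a more balanced decision boundary. Under a unified view, we find that two strong CRE baselines can be subsumed into the proposed training pipeline. The success of FEA also provides actionable insights and suggestions for future model design in CRE.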
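
The data side of the two stages can be sketched as follows: stage one would train on the new task's examples only, and stage two would fine-tune on a memory that holds the same number of examples for every relation seen so far. Random sampling, the `per_relation` budget, and the example data are assumptions for illustration; the abstract does not describe how memory examples are chosen, and the training loops themselves are omitted.

```python
import random
from collections import Counter, defaultdict

def balanced_memory(examples, per_relation, seed=0):
    """Sample the same number of stored examples for every relation."""
    by_relation = defaultdict(list)
    for text, relation in examples:
        by_relation[relation].append((text, relation))
    rng = random.Random(seed)
    memory = []
    for relation in sorted(by_relation):
        pool = by_relation[relation]
        memory.extend(rng.sample(pool, min(per_relation, len(pool))))
    return memory

# Hypothetical data: few stored examples of an old relation, many of a new one.
old_memory = [("sentence a", "founded_by"), ("sentence b", "founded_by")]
new_data = [(f"sentence {i}", "born_in") for i in range(5)]

# Stage 1 (Fast Adaption): train on new_data only.
# Stage 2 (Balanced Tuning): fine-tune on an equal number of examples per relation.
tuning_set = balanced_memory(old_memory + new_data, per_relation=2)
print(Counter(relation for _, relation in tuning_set))
```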
