Smart Paper Notes

Boosting COVID-19 Severity Detection with Infection-aware Contrastive Mixup Classification

Junlin Hou, Jilan Xu, Nan Zhang, Yuejie Zhang, Xiaobo Zhang, Rui Feng

Category: Computer Vision

2022-11-26

This paper presents our solution for the 2nd COVID-19 Severity Detection Competition. The task is to distinguish the Mild, Moderate, Severe, and Critical grades in COVID-19 chest CT images. In our approach, we devise a novel infection-aware 3D Contrastive Mixup Classification network for severity grading. Specifically, we train two segmentation networks to first extract the lung region and then the inner lesion region. The lesion segmentation mask serves as complementary information for the original CT slices. To relieve the issue of imbalanced data distribution, we further improve the advanced Contrastive Mixup Classification network with a weighted cross-entropy loss. On the COVID-19 severity detection leaderboard, our approach won first place with a Macro F1 Score of 51.76%. It significantly outperforms the baseline method by over 11.46%.
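The weighted cross-entropy idea mentioned in the abstract can be illustrated with a minimal sketch. The inverse-frequency weighting rule and the helper names `inverse_frequency_weights` and `weighted_cross_entropy` are assumptions made for illustration; the abstract does not say how the authors chose their class weights.

```python
import math

def inverse_frequency_weights(labels, num_classes):
    """Assumed rule: weight class c by N / (K * count_c), so rare classes count more."""
    counts = [0] * num_classes
    for y in labels:
        counts[y] += 1
    n = len(labels)
    return [n / (num_classes * c) if c > 0 else 0.0 for c in counts]

def weighted_cross_entropy(logits, labels, weights):
    """Weighted mean of -log softmax(logits)[y], normalized by the sum of the weights used."""
    total, norm = 0.0, 0.0
    for z, y in zip(logits, labels):
        m = max(z)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in z))
        total += weights[y] * (log_sum_exp - z[y])
        norm += weights[y]
    return total / norm

# Toy example: three samples of class 0 and one sample of class 1.
labels = [0, 0, 0, 1]
weights = inverse_frequency_weights(labels, num_classes=2)
loss = weighted_cross_entropy([[0.0, 0.0]] * 4, labels, weights)
```

With this rule the single class-1 sample carries three times the weight of each class-0 sample, so errors on the minority grade are penalized more heavily.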

Translated by Google Translate

Cross-Field Transformer for Diabetic Retinopathy Grading on Two-field Fundus Images

Junlin Hou, Jilan Xu, Fan Xiao, Rui-Wei Zhao, Yuejie Zhang, Haidong Zou, Lina Lu, Wenwen Xue, Rui Feng

Category: Computer Vision

2022-11-26

Automatic diabetic retinopathy (DR) grading based on fundus photography has been widely explored to benefit routine screening and early treatment. Existing research generally focuses on single-field fundus images, which have a limited field of view for precise eye examinations. In clinical applications, ophthalmologists adopt two-field fundus photography as the dominant tool, where the information from each field (i.e., macula-centric and optic disc-centric) is highly correlated and complementary, and benefits comprehensive decisions. However, automatic DR grading based on two-field fundus photography remains a challenging task due to the lack of publicly available datasets and effective fusion strategies. In this work, we first construct a new benchmark dataset (DRTiD) for DR grading, consisting of 3,100 two-field fundus images. To the best of our knowledge, it is the largest public DR dataset with diverse and high-quality two-field images. Then, we propose a novel DR grading approach, namely Cross-Field Transformer (CrossFiT), to capture the correspondence between the two fields as well as the long-range spatial correlations within each field. Considering the inherent two-field geometric constraints, we define aligned position embeddings to preserve relatively consistent positions in the fundus. In addition, we perform masked cross-field attention during interaction to filter the noisy relations between fields. Extensive experiments on our DRTiD dataset and the public DeepDRiD dataset demonstrate the effectiveness of our CrossFiT network. The new dataset and the source code of CrossFiT will be publicly available at https://github.com/FDU-VTS/DRTiD.
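The general idea of masked cross-attention between two token sets can be sketched as follows. This is a generic scaled dot-product formulation and not the CrossFiT implementation: the function name `masked_cross_attention`, the single-head layout, and the way the mask is supplied are all assumptions, since the abstract does not specify how the mask is built.

```python
import math

def masked_cross_attention(queries, keys, values, mask):
    """Queries come from one field, keys/values from the other.

    queries: n x d, keys: m x d, values: m x d_v.
    mask[i][j] is True when query i is allowed to attend to key j.
    """
    d = len(queries[0])
    out = []
    for q, allowed in zip(queries, mask):
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(d) if ok else -math.inf
            for k, ok in zip(keys, allowed)
        ]
        m = max(scores)
        if m == -math.inf:
            # Every key is masked out: return a zero vector for this query.
            out.append([0.0] * len(values[0]))
            continue
        exps = [math.exp(s - m) for s in scores]  # masked entries become 0.0
        z = sum(exps)
        out.append([
            sum(e / z * v[c] for e, v in zip(exps, values))
            for c in range(len(values[0]))
        ])
    return out

attended = masked_cross_attention(
    queries=[[1.0, 0.0]],
    keys=[[1.0, 0.0], [0.0, 1.0]],
    values=[[1.0, 2.0], [3.0, 4.0]],
    mask=[[True, False]],
)
```

Because the second key is masked, the single query attends only to the first key and simply copies its value vector.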

Modality-Aware Contrastive Instance Learning with Self-Distillation for Weakly-Supervised Audio-Visual Violence Detection

Jiashuo Yu, Jinyu Liu, Ying Cheng, Rui Feng, Yuejie Zhang

Category: Computer Vision

2022-07-12

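Weakly-supervised audio-visual violence detection aims to distinguish snippets containing multimodal violence events using only video-level labels. Many prior works perform audio-visual integration and interaction in an early or intermediate manner, yet overlook modality heterogeneity in the weakly-supervised setting. In this paper, we analyze the modality asynchrony and undifferentiated instances phenomena of the multiple instance learning (MIL) procedure, and further investigate their negative impact on weakly-supervised audio-visual learning. To address these issues, we propose a modality-aware contrastive instance learning with self-distillation (MACIL-SD) strategy. Specifically, we leverage a lightweight two-stream network to generate audio and visual bags, in which unimodal background, violent, and normal instances are clustered into semi-bags in an unsupervised way. Then, audio and visual violent semi-bag representations are assembled as positive pairs, and violent semi-bags are combined with background and normal instances as contrastive negative pairs. Furthermore, a self-distillation module is applied to transfer unimodal visual knowledge to the audio-visual model, which alleviates noise and narrows the semantic gap between unimodal and multimodal features. Experiments show that our framework outperforms previous methods with lower complexity on the large-scale XD-Violence dataset. Results also demonstrate that our proposed approach can be used as a plug-in module to enhance other networks. Code is available at https://github.com/justinyuu/macil_sd.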
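A contrastive objective over one positive pair and several negatives can be sketched with an InfoNCE-style loss. The choice of InfoNCE, cosine similarity, the temperature value, and the names `cosine` and `info_nce` are assumptions for illustration; the abstract only states that positive and negative pairs are contrasted.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(anchor, positive, negatives, temperature=0.1):
    """-log( exp(s_pos / T) / (exp(s_pos / T) + sum_j exp(s_neg_j / T)) )."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum_exp - logits[0]

# Hypothetical embeddings: an audio anchor, its visual positive, one negative.
aligned = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
misaligned = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
```

The loss is close to zero when the anchor matches its positive and large when it is closer to a negative, which is the pressure that pulls paired representations together.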

IDEA: Increasing Text Diversity via Online Multi-Label Recognition for Vision-Language Pre-training

Xinyu Huang, Youcai Zhang, Ying Cheng, Weiwei Tian, Ruiwei Zhao, Rui Feng, Yuejie Zhang, Yaqian Li, Yandong Guo, Xiaobo Zhang

Category: Computer Vision | Machine Learning

2022-07-12

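Vision-Language Pre-training (VLP) with large-scale image-text pairs has demonstrated superior performance in various fields. However, the image-text pairs that co-occur on the Internet typically lack explicit alignment information, which is suboptimal for VLP. Existing methods adopt an off-the-shelf object detector to exploit additional image tag information. However, the object detector is time-consuming and can only identify pre-defined object categories, which limits the model capacity. Inspired by the observation that texts contain incomplete fine-grained image information, we introduce IDEA, which stands for increasing text diversity via online multi-label recognition for VLP. IDEA shows that multi-label learning with image tags extracted from the texts can be jointly optimized during VLP. Moreover, IDEA can identify valuable image tags online to provide more explicit textual supervision. Comprehensive experiments demonstrate that IDEA can significantly boost performance on multiple downstream datasets with a small extra computational cost.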
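Turning a caption into a multi-label target can be sketched as below. The vocabulary-matching rule, the binary cross-entropy loss, and the names `extract_tags` and `binary_cross_entropy` are assumptions for illustration; the abstract does not describe how IDEA extracts tags or which loss it uses.

```python
import math
import re

def extract_tags(caption, vocabulary):
    """Multi-hot target: 1.0 for each vocabulary tag that appears as a word in the caption."""
    words = set(re.findall(r"[a-z]+", caption.lower()))
    return [1.0 if tag in words else 0.0 for tag in vocabulary]

def binary_cross_entropy(probs, targets, eps=1e-12):
    """Mean per-tag binary cross-entropy between predicted probabilities and targets."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(targets)

vocabulary = ["dog", "cat", "ball"]  # hypothetical tag list
targets = extract_tags("A dog chases a ball.", vocabulary)
loss = binary_cross_entropy([0.5, 0.5, 0.5], targets)
```

In a joint setup, a loss of this kind would be added to the usual pre-training objectives so that the image encoder is also supervised by the tags found in its caption.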

FDVTS's Solution for 2nd COV19D Competition on COVID-19 Detection and Severity Analysis

Junlin Hou, Jilan Xu, Rui Feng, Yuejie Zhang

Category: Computer Vision

2022-07-05

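This paper presents our solution for the 2nd COVID-19 Competition, held within the framework of the AIMIA Workshop at the European Conference on Computer Vision (ECCV 2022). In our approach, we employ an effective 3D Contrastive Mixup Classification network for COVID-19 diagnosis on chest CT images, which is composed of contrastive representation learning and mixup classification. For the COVID-19 detection challenge, our approach reaches a macro F1 score of 0.9245 on 484 validation CT scans, which significantly outperforms the baseline method by 16.5%. In the COVID-19 severity detection challenge, our approach achieves a macro F1 score of 0.7186 on 61 validation samples, which also surpasses the baseline by 8.86%.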
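The macro F1 score reported above is the unweighted mean of the per-class F1 scores. A minimal sketch of that standard definition follows; the function name `macro_f1` and the convention of scoring an empty class as 0 are choices made here, not details taken from the competition.

```python
def macro_f1(y_true, y_pred, num_classes):
    """Unweighted mean of per-class F1 = 2*TP / (2*TP + FP + FN)."""
    scores = []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)  # empty class scores 0
    return sum(scores) / num_classes

score = macro_f1(y_true=[0, 0, 1, 1], y_pred=[0, 1, 1, 1], num_classes=2)
```

In the toy example, class 0 has F1 = 2/3 and class 1 has F1 = 4/5, so the macro average is 11/15. Every class contributes equally regardless of its size, which is why the metric suits imbalanced severity grades.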

MM-Pyramid: Multimodal Pyramid Attentional Network for Audio-Visual Event Localization and Video Parsing

Jiashuo Yu, Ying Cheng, Rui-Wei Zhao, Rui Feng, Yuejie Zhang

Category: Computer Vision

2021-11-24

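Recognizing and localizing events in videos is a fundamental task for video understanding. Since events may occur in both the auditory and visual modalities, detailed multimodal perception is essential for complete scene comprehension. Most previous works analyze videos from a holistic perspective. However, they do not consider semantic information at multiple scales, which makes it difficult for the model to localize events of various lengths. In this paper, we present a Multimodal Pyramid Attentional Network (MM-Pyramid) that captures and integrates multi-level temporal features for audio-visual event localization and audio-visual video parsing. Specifically, we first propose the attentive feature pyramid module. This module captures temporal pyramid features via several stacked pyramid units, each composed of a fixed-size attention block and a dilated convolution block. We also design an adaptive semantic fusion module, which leverages a unit-level attention block and a selective fusion block to integrate pyramid features interactively. Extensive experiments on audio-visual event localization and weakly-supervised audio-visual video parsing tasks verify the effectiveness of our approach.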
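How dilated convolutions expose several temporal scales can be shown with a small sketch. It covers only the dilated-convolution part of a pyramid unit (no attention block), and the names `dilated_conv1d` and `temporal_pyramid`, the zero padding, and the dilation rates are assumptions for illustration.

```python
def dilated_conv1d(x, kernel, dilation):
    """Zero-padded, same-length 1D cross-correlation with an odd-sized kernel."""
    half = len(kernel) // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t + (j - half) * dilation  # taps are spaced `dilation` steps apart
            if 0 <= idx < len(x):
                acc += w * x[idx]
        out.append(acc)
    return out

def temporal_pyramid(x, kernel, dilations=(1, 2, 4)):
    """One output sequence per dilation rate; larger rates see longer time spans."""
    return [dilated_conv1d(x, kernel, d) for d in dilations]

levels = temporal_pyramid([1.0, 2.0, 3.0, 4.0, 5.0], kernel=[1.0, 0.0, 1.0])
```

With the same three-tap kernel, dilation 1 sums immediate neighbors while dilation 2 sums samples two steps away, so stacking rates widens the receptive field without adding parameters.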

Deep Unfolded Tensor Robust PCA with Self-supervised Learning

Harry Dong, Megna Shah, Sean Donegan, Yuejie Chi

Category: Machine Learning (Statistics) | Machine Learning

2022-12-21

Tensor robust principal component analysis (RPCA), which seeks to separate a low-rank tensor from its sparse corruptions, has been crucial in data science and machine learning where tensor structures are becoming more prevalent. While powerful, existing tensor RPCA algorithms can be difficult to use in practice, as their performance can be sensitive to the choice of additional hyperparameters, which are not straightforward to tune. In this paper, we describe a fast and simple self-supervised model for tensor RPCA using deep unfolding by only learning four hyperparameters. Despite its simplicity, our model expunges the need for ground truth labels while maintaining competitive or even greater performance compared to supervised deep unfolding. Furthermore, our model is capable of operating in extreme data-starved scenarios. We demonstrate these claims on a mix of synthetic data and real-world tasks, comparing performance against previously studied supervised deep unfolding methods and Bayesian optimization baselines.
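Many RPCA iterations estimate the sparse corruptions by soft-thresholding the residual between the observation and the current low-rank estimate, and the threshold is one of the hyperparameters that is awkward to tune by hand. The sketch below shows that generic step on flattened entries; it is not taken from the paper, and the names `soft_threshold` and `sparse_step` are hypothetical.

```python
def soft_threshold(x, tau):
    """Shrink x toward zero by tau; values with |x| <= tau become exactly 0."""
    if x > tau:
        return x - tau
    if x < -tau:
        return x + tau
    return 0.0

def sparse_step(observed, low_rank_estimate, tau):
    """Estimate sparse corruptions entrywise from the residual observed - low_rank."""
    return [soft_threshold(y - l, tau) for y, l in zip(observed, low_rank_estimate)]

corruptions = sparse_step([1.0, 5.0, -4.0], [1.0, 1.0, 0.0], tau=0.5)
```

In a deep-unfolded model, each iteration of such an algorithm becomes a layer, and quantities like `tau` are learned rather than hand-tuned.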

Minimax-Optimal Multi-Agent RL in Zero-Sum Markov Games With a Generative Model

Gen Li, Yuejie Chi, Yuting Wei, Yuxin Chen

Category: Machine Learning | Machine Learning (Statistics)

2022-08-22

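This paper is concerned with two-player zero-sum Markov games, arguably the most basic setting in multi-agent reinforcement learning, with the goal of learning a Nash equilibrium (NE) sample-optimally. All prior results suffer from at least one of two obstacles: the curse of multiple agents and the barrier of long horizon, regardless of the sampling protocol in use. We take a step towards settling this problem, assuming access to a flexible sampling mechanism: the generative model. Focusing on non-stationary finite-horizon Markov games, we develop a learning algorithm $\mathsf{Nash}\text{-}\mathsf{Q}\text{-}\mathsf{FTRL}$ and an adaptive sampling scheme that leverage the optimism principle in adversarial learning (particularly the Follow-the-Regularized-Leader (FTRL) method), with a delicate design of bonus terms that ensures certain decomposability under the FTRL dynamics. Our algorithm learns an $\varepsilon$-approximate Markov NE policy using $$ \widetilde{O}\bigg( \frac{H^4 S(A+B)}{\varepsilon^2} \bigg) $$ samples, where $S$ is the number of states, $H$ is the horizon, and $A$ (resp. $B$) denotes the number of actions for the max-player (resp. min-player). This is nearly unimprovable in a minimax sense. Along the way, we derive a refined regret bound for FTRL that makes explicit the role of variance-type quantities, which might be of independent interest.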
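Two ingredients of the abstract can be made concrete with a short sketch. The first function is FTRL over the probability simplex with an entropy regularizer, which reduces to exponential weights; assuming this particular regularizer is an illustration choice, not a statement about Nash-Q-FTRL. The second evaluates the leading term of the stated sample bound with constants and logarithmic factors dropped. Both function names are hypothetical.

```python
import math

def ftrl_entropy_policy(cumulative_payoffs, eta):
    """FTRL with entropy regularization: play action a with probability
    proportional to exp(eta * cumulative_payoffs[a])."""
    m = max(cumulative_payoffs)  # subtract the max for numerical stability
    weights = [math.exp(eta * (g - m)) for g in cumulative_payoffs]
    z = sum(weights)
    return [w / z for w in weights]

def sample_bound(H, S, A, B, eps):
    """Leading term H^4 * S * (A + B) / eps^2 (constants and log factors ignored)."""
    return H**4 * S * (A + B) / eps**2

policy = ftrl_entropy_policy([1.0, 0.0], eta=math.log(3))
bound = sample_bound(H=2, S=3, A=4, B=5, eps=0.5)
```

The bound scales with A + B rather than A * B, which is what breaking the curse of multiple agents means here: with A = 4 and B = 5 the action factor is 9 instead of 20.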

Distributionally Robust Model-Based Offline Reinforcement Learning with Near-Optimal Sample Complexity

Laixi Shi, Yuejie Chi

Category: Machine Learning | Machine Learning (Statistics)

2022-08-11

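This paper concerns the central issues of model robustness and sample efficiency in offline reinforcement learning (RL), which aims to learn to make decisions from history data without active exploration. Due to the uncertainties and variabilities of the environment, it is critical to learn a robust policy, with as few samples as possible, that performs well even when the deployed environment deviates from the nominal one used to collect the history dataset. We consider a distributionally robust formulation of offline RL, focusing on tabular non-stationary finite-horizon robust Markov decision processes with an uncertainty set specified by the Kullback-Leibler divergence. To combat sample scarcity, we propose a model-based algorithm that combines distributionally robust value iteration with the principle of pessimism in the face of uncertainty, by penalizing the robust value estimates with a carefully designed data-driven penalty term. Under a mild and tailored assumption on the history dataset that measures distribution shift without requiring full coverage of the state-action space, we establish the finite-sample complexity of the proposed algorithm, and further show that it is almost unimprovable in light of a nearly matching information-theoretic lower bound, up to a polynomial factor of the horizon length. To the best of our knowledge, this provides the first provably near-optimal robust offline RL algorithm that learns under model uncertainty and partial coverage.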
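The inner step of distributionally robust value iteration with a KL uncertainty set is a worst-case expectation, which has a well-known one-dimensional dual: inf over P with KL(P || P0) <= sigma of E_P[V] equals sup over lambda > 0 of -lambda * log E_{P0}[exp(-V / lambda)] - lambda * sigma. The sketch below maximizes that dual by ternary search. The function name, the search interval, and the iteration count are assumptions, and the paper's data-driven pessimism penalty is not included.

```python
import math

def kl_robust_expectation(p0, values, sigma, lam_max=1e3, iters=200):
    """Worst-case mean of `values` over distributions within KL radius sigma of p0."""
    support = [(p, v) for p, v in zip(p0, values) if p > 0]
    v_min = min(v for _, v in support)

    def dual(lam):
        # Shift by v_min so every exponent is <= 0 and the sum stays positive.
        s = sum(p * math.exp(-(v - v_min) / lam) for p, v in support)
        return v_min - lam * math.log(s) - lam * sigma

    lo, hi = 1e-6, lam_max
    for _ in range(iters):  # ternary search works because the dual is concave in lam
        a = lo + (hi - lo) / 3
        b = hi - (hi - lo) / 3
        if dual(a) < dual(b):
            lo = a
        else:
            hi = b
    return dual((lo + hi) / 2)

robust = kl_robust_expectation([0.5, 0.5], [0.0, 1.0], sigma=0.1)
```

For a fair coin over values 0 and 1 the nominal mean is 0.5, while the robust value at radius 0.1 drops to roughly 0.28. Any lambda yields a valid lower bound on the robust value, so capping the search at `lam_max` errs on the conservative side.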

SoteriaFL: A Unified Framework for Private Federated Learning with Communication Compression

Zhize Li, Haoyu Zhao, Boyue Li, Yuejie Chi

Category: Machine Learning

2022-06-20

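To enable large-scale machine learning in bandwidth-hungry environments such as wireless networks, significant progress has been made recently in designing communication-efficient federated learning algorithms with the aid of communication compression. On the other hand, privacy preservation, especially at the client level, is another important desideratum that has not yet been addressed simultaneously in the presence of advanced communication compression techniques. In this paper, we propose a unified framework that enhances the communication efficiency of private federated learning with communication compression. Exploiting both general compression operators and local differential privacy, we first examine a simple algorithm that applies compression directly to differentially private stochastic gradient descent, and identify its limitations. We then propose a unified framework, SoteriaFL, for private federated learning, which accommodates a general family of local gradient estimators, including popular stochastic variance-reduced gradient methods and the state-of-the-art shifted compression scheme. We provide a comprehensive characterization of its performance trade-offs in terms of privacy, utility, and communication complexity, where SoteriaFL is shown to achieve better communication complexity, without sacrificing privacy or utility, than other private federated learning algorithms without communication compression.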
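The simple baseline described above, compression applied directly to a privatized gradient, can be sketched on the client side as clip, add Gaussian noise, then compress. The random-k sparsifier, the helper names, and the parameter values are assumptions for illustration; calibrating `sigma` to a target privacy budget is outside this sketch, and SoteriaFL's shifted compression is not shown.

```python
import math
import random

def clip(grad, max_norm):
    """Scale the gradient so its Euclidean norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]

def add_gaussian_noise(grad, sigma, rng):
    """Perturb each coordinate with independent N(0, sigma^2) noise."""
    return [g + rng.gauss(0.0, sigma) for g in grad]

def rand_k(vec, k, rng):
    """Unbiased random-k sparsifier: keep k coordinates, rescale them by d / k."""
    d = len(vec)
    keep = set(rng.sample(range(d), k))
    return [v * d / k if i in keep else 0.0 for i, v in enumerate(vec)]

def private_compressed_gradient(grad, max_norm, sigma, k, rng):
    """Clip, privatize, then compress the message a client would send."""
    return rand_k(add_gaussian_noise(clip(grad, max_norm), sigma, rng), k, rng)

rng = random.Random(0)
message = private_compressed_gradient([3.0, 4.0, 0.0, 0.0], 1.0, 0.1, 2, rng)
```

Only `k` of the `d` coordinates are transmitted, and the `d / k` rescaling keeps the compressed vector unbiased at the cost of extra variance on top of the privacy noise.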
