Smart Paper Notes

An FPGA-based Solution for Convolution Operation Acceleration

Trung Dinh Pham , Bao Gia Bach , Lam Trinh Luu , Minh Dinh Nguyen , Hai Duc Pham , Khoa Bui Anh , Xuan Quang Nguyen , Cuong Pham Quoc

Categories: Artificial Intelligence | Machine Learning

2022-06-09

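Hardware-based acceleration is widely used to speed up computationally intensive mathematical operations. This paper proposes an FPGA-based architecture for accelerating the convolution operation, a complex and expensive computing step that appears in many convolutional neural network models. We target the design at the standard convolution operation and intend to launch the product as an edge-AI solution. The goal of the project is to produce an FPGA IP core that can process one convolutional layer at a time. The architecture uses Verilog HDL as its primary design language, which lets system developers deploy the IP core. Experimental results show that a single computing core synthesized on a simple edge-computing FPGA board delivers 0.224 GOPS. When the board is fully utilized, 4.48 GOPS can be achieved.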
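
As a rough illustration of the workload such a core handles, the sketch below implements a direct 2D convolution (stride 1, no padding) in plain Python and counts its multiply-accumulate (MAC) operations. It is not the authors' Verilog design; the function names and the toy input are assumptions made for illustration. The helper `implied_core_count` only checks the arithmetic implied by the reported figures, assuming throughput scales linearly with the number of cores.

```python
def conv2d_valid(image, kernel):
    """Direct 2D convolution in the CNN sense (no kernel flip), stride 1, no padding.

    Returns the output feature map and the number of multiply-accumulates performed.
    """
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = ih - kh + 1, iw - kw + 1
    out = [[0] * ow for _ in range(oh)]
    macs = 0
    for r in range(oh):
        for c in range(ow):
            acc = 0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
                    macs += 1
            out[r][c] = acc
    return out, macs


def implied_core_count(total_gops, per_core_gops):
    """Number of cores implied by the totals, assuming linear scaling (an assumption)."""
    return round(total_gops / per_core_gops)


# Toy 4x4 input and 2x2 kernel (hypothetical values).
image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
kernel = [[1, 0], [0, 1]]
feature_map, macs = conv2d_valid(image, kernel)
cores = implied_core_count(4.48, 0.224)
```

Under the linear-scaling assumption, the reported 4.48 GOPS at 0.224 GOPS per core corresponds to 20 cores on the board.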

Translated by Google Translate

ColonFormer: An Efficient Transformer based Method for Colon Polyp Segmentation

Nguyen Thanh Duc , Nguyen Thi Oanh , Nguyen Thi Thuy , Tran Minh Triet , Dinh Viet Sang

Categories: Computer Vision

2022-05-17

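Identifying polyps is a challenging problem in the automatic analysis of endoscopic images for computer-aided clinical support systems. Models based on convolutional neural networks (CNNs), transformers, and their combinations have been proposed to segment polyps, with promising results. However, these approaches are limited either in how they model the local appearance of polyps or by a lack of multi-level features for spatial dependency in the decoding process. This paper proposes a novel network, ColonFormer, to address these limitations. ColonFormer is an encoder-decoder architecture capable of modeling long-range semantic information in both the encoder and decoder branches. The encoder is a lightweight transformer-based architecture that models global semantic relations at multiple scales. The decoder is a hierarchical network structure designed to learn multi-level features that enrich the feature representation. In addition, a new skip-connection technique is added to refine the boundaries of polyp objects in the global map for accurate segmentation. Extensive experiments were conducted on five popular benchmark datasets for polyp segmentation: Kvasir, CVC-Clinic DB, CVC-ColonDB, CVC-T, and ETIS-Larib. The results show that ColonFormer outperforms other state-of-the-art methods on all benchmark datasets.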

Simultaneous face detection and 360 degree headpose estimation

Hoang Nguyen Viet , Linh Nguyen Viet , Tuan Nguyen Dinh , Duc Tran Minh , Long Tran Quoc

Categories: Computer Vision

2021-11-23

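Face detection and head pose estimation on digital images have many practical applications, including building surveillance cameras and analyzing and processing customer behavior, and they have attracted the attention of many researchers. Many proposed deep learning models achieve state-of-the-art accuracy: YOLO, SSD, and MTCNN address face detection, while HopeNet and FSA-Net address head pose estimation. In many state-of-the-art methods, the pipeline for this task consists of two stages, face detection followed by head pose estimation. The two stages are completely independent and share no information. This keeps the setup simple, but it fails to exploit most of the features extracted by each model. In this paper, we propose the Multitask-Net model, motivated by reusing the features extracted by the face detection model and sharing them with the head pose estimation branch to improve accuracy. In addition, with varied data covering a large Euler angle domain for the face, our model can predict results over the full 360-degree Euler angle domain. By applying multi-task learning, Multitask-Net simultaneously predicts the position and orientation of the human head. To improve the model's ability to predict head orientation, we change the representation of the face from Euler angles to vectors of the rotation matrix.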
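
The switch from Euler angles to a rotation matrix can be sketched as follows. The abstract does not state which angle convention the authors use, so the Z-Y-X order below (yaw about z, then pitch about y, then roll about x) and all function names are assumptions for illustration. The example also shows one commonly cited motivation for such a representation: two yaw angles on either side of the ±180° wrap-around differ by 358° numerically, yet their rotation matrices are nearly identical.

```python
import math


def euler_to_rotation_matrix(yaw_deg, pitch_deg, roll_deg):
    """Build R = Rz(yaw) * Ry(pitch) * Rx(roll); the Z-Y-X convention is assumed here."""
    y, p, r = (math.radians(a) for a in (yaw_deg, pitch_deg, roll_deg))
    cy, sy = math.cos(y), math.sin(y)
    cp, sp = math.cos(p), math.sin(p)
    cr, sr = math.cos(r), math.sin(r)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp, cp * sr, cp * cr],
    ]


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two 3x3 matrices."""
    return max(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


# Yaw angles just either side of the +/-180 degree wrap-around.
near_pos = euler_to_rotation_matrix(179.0, 0.0, 0.0)
near_neg = euler_to_rotation_matrix(-179.0, 0.0, 0.0)
wrap_gap = max_abs_diff(near_pos, near_neg)  # small, although the angles differ by 358 degrees
```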

UET-Headpose: A sensor-based top-view head pose dataset

Linh Nguyen Viet , Tuan Nguyen Dinh , Hoang Nguyen Viet , Duc Tran Minh , Long Tran Quoc

Categories: Computer Vision | Artificial Intelligence

2021-11-13

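Head pose estimation is a challenging task that involves predicting a three-dimensional vector, and it serves many applications in human-machine interaction and customer behavior. Previous studies have proposed several precise methods for collecting head pose data, but these methods require expensive devices such as depth cameras or a complex laboratory setup. In this study, we introduce a new low-cost, easy-to-set-up approach to collecting head pose images, resulting in the UET-Headpose dataset, which contains top-view head pose data. The method uses an absolute orientation sensor instead of depth cameras, so it can be set up quickly while still ensuring good results. Experiments show how the distribution of our dataset differs from that of available datasets such as the CMU Panoptic Dataset \cite{CMU}. Using the UET-Headpose dataset and other head pose datasets, we also introduce a full-range model called FSANet, which achieves significantly better head pose estimation results on the UET-Headpose dataset, especially on top-view images. In addition, the model is very lightweight and takes small images as input.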

CoughTrigger: Earbuds IMU Based Cough Detection Activator Using An Energy-efficient Sensitivity-prioritized Time Series Classifier

Shibo Zhang , Ebrahim Nemati , Minh Dinh , Nathan Folkman , Tousif Ahmed , Mahbubur Rahman , Jilong Kuang , Nabil Alshurafa , Alex Gao

Categories: Machine Learning

2021-11-07

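Persistent coughing is a major symptom of respiratory diseases. Detecting coughs with wearables has received increasing research attention, especially during the COVID-19 pandemic. Among all sensor types, microphones are the most widely used for cough detection. However, the high power consumption required to process audio signals hinders continuous audio-based cough detection on battery-limited commercial wearables such as earbuds. We present CoughTrigger, which uses a lower-power sensor, an inertial measurement unit (IMU), as a cough detection activator that triggers a higher-power sensor for audio processing and classification. It can run as a standby service with minimal battery consumption and trigger audio-based cough detection when a candidate cough is detected from the IMU. In addition, using the IMU improves the specificity of cough detection. Experiments were conducted on 45 subjects, and our IMU-based model achieved an AUC of 0.77 under leave-one-subject-out evaluation. We also validated its effectiveness on free-living data and through an on-device implementation.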
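
The activator idea can be sketched as a two-stage cascade: a cheap IMU check runs on every window, and the expensive audio classifier runs only on windows the IMU flags. The sketch below is not the paper's classifier. It swaps in a simple motion-energy threshold for the IMU stage and a stub for the audio stage, and all names, values, and the threshold are hypothetical. A deliberately low threshold reflects the sensitivity-prioritized design: the IMU stage may over-trigger, and the audio stage is left to reject false candidates.

```python
def imu_energy(window):
    """Motion energy of one IMU window: mean squared deviation from the window mean."""
    mean = sum(window) / len(window)
    return sum((x - mean) ** 2 for x in window) / len(window)


def run_cascade(imu_windows, audio_classifier, imu_threshold):
    """Run the audio classifier only on windows whose IMU energy reaches the threshold.

    Returns the indices classified as coughs and how many times audio processing ran.
    """
    detections = []
    audio_calls = 0
    for idx, window in enumerate(imu_windows):
        if imu_energy(window) < imu_threshold:
            continue  # stay in low-power standby; the audio stage is never woken
        audio_calls += 1
        if audio_classifier(idx):
            detections.append(idx)
    return detections, audio_calls


# Hypothetical single-axis IMU windows: still, cough-like burst, still, head shake.
imu_windows = [
    [0.0, 0.01, 0.0, 0.01],
    [0.0, 1.0, -1.0, 0.2],
    [0.0, 0.0, 0.0, 0.0],
    [0.5, -0.6, 0.4, -0.5],
]


def stub_audio_classifier(idx):
    """Stand-in for an audio model: pretends only window 1 sounds like a cough."""
    return idx == 1


detections, audio_calls = run_cascade(imu_windows, stub_audio_classifier, imu_threshold=0.05)
```

In this toy run the audio stage is woken for two of the four windows (the burst and the head shake) and confirms only the burst.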

A New Look and Convergence Rate of Federated Multi-Task Learning with Laplacian Regularization

Canh T. Dinh , Tung T. Vu , Nguyen H. Tran , Minh N. Dao , Hongyu Zhang

Categories: Machine Learning

2021-02-14

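Non-independent and identically distributed (non-IID) data across clients is regarded as a key factor that degrades the performance of federated learning (FL). Several approaches to handling non-IID data, such as personalized FL and federated multi-task learning (FMTL), are of great interest to the research community. In this work, we first formulate the FMTL problem using Laplacian regularization, which explicitly leverages the relationships among client models for multi-task learning. We then introduce a new view of the FMTL problem, showing for the first time that the formulated FMTL problem can be used for both conventional FL and personalized FL. We also propose two algorithms, FedU and dFedU, which solve the formulated FMTL problem in communication-centralized and decentralized schemes, respectively. Theoretically, we prove that the convergence rates of both algorithms achieve linear speedup for strongly convex objectives and sublinear speedup for nonconvex objectives. Experimentally, we show that our algorithms outperform the conventional algorithm FedAvg in FL settings, MOCHA in FMTL settings, and pFedMe and Per-FedAvg in personalized FL settings.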
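
One common way to write a Laplacian-regularized multi-task objective is sketched below. The abstract gives no formula, so the notation is assumed here for illustration: $F_k$ is the local loss of client $k$, $w_k$ its model, $\mathcal{N}_k$ its set of related clients, $a_{kl} \ge 0$ a symmetric relationship weight, and $\eta \ge 0$ the regularization strength.

```latex
\min_{w_1, \dots, w_N} \;
  \sum_{k=1}^{N} F_k(w_k)
  \;+\; \frac{\eta}{2} \sum_{k=1}^{N} \sum_{l \in \mathcal{N}_k}
        a_{kl} \, \lVert w_k - w_l \rVert^2
% With W the matrix whose k-th row is w_k, and L = D - A the graph Laplacian
% of the symmetric weight matrix A = (a_{kl}) with degree matrix D, the penalty
% equals \eta \, \mathrm{tr}(W^\top L W) when the double sum runs over ordered pairs.
```

Under this form, $\eta = 0$ decouples the clients into independent local training, while a large $\eta$ pulls related models toward a common solution, which is one way to read the abstract's claim that the same formulation covers both personalized and conventional FL.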

Comparison and Evaluation of Methods for a Predict+Optimize Problem in Renewable Energy

Christoph Bergmeir , Frits de Nijs , Abishek Sriramulu , Mahdi Abolghasemi , Richard Bean , John Betts , Quang Bui , Nam Trong Dinh , Nils Einecke , Rasul Esmaeilbeigi

Categories: Artificial Intelligence

2022-12-21

Algorithms that involve both forecasting and optimization are at the core of solutions to many difficult real-world problems, such as in supply chains (inventory optimization), traffic, and in the transition towards carbon-free energy generation in battery/load/production scheduling in sustainable energy systems. Typically, in these scenarios we want to solve an optimization problem that depends on unknown future values, which therefore need to be forecast. As both forecasting and optimization are difficult problems in their own right, relatively little research has been done in this area. This paper presents the findings of the "IEEE-CIS Technical Challenge on Predict+Optimize for Renewable Energy Scheduling," held in 2021. We present a comparison and evaluation of the seven highest-ranked solutions in the competition, to provide researchers with a benchmark problem and to establish the state of the art for this benchmark, with the aim of fostering and facilitating research in this area. The competition used data from the Monash Microgrid, as well as weather data and energy market data. It focused on two main challenges: forecasting renewable energy production and demand, and obtaining an optimal schedule for the activities (lectures) and on-site batteries that leads to the lowest cost of energy. The most accurate forecasts were obtained by gradient-boosted tree and random forest models, and optimization was mostly performed using mixed integer linear and quadratic programming. The winning method predicted different scenarios and optimized over all scenarios jointly using a sample average approximation method.
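
The idea behind sample average approximation can be shown on a toy problem: instead of optimizing against a single point forecast, pick the decision that minimizes the average (here, total) cost over several forecast scenarios. The sketch below is not the winning team's formulation. The cost model, prices, scenario values, and function names are all hypothetical, and the search is a brute-force scan rather than a mixed integer program.

```python
def cost(charge, demand, cheap_price=1, peak_price=4):
    """Cost of pre-charging `charge` units cheaply and buying any shortfall at peak price."""
    shortfall = max(0, demand - charge)
    return cheap_price * charge + peak_price * shortfall


def total_cost(charge, scenarios):
    """Sum of costs over all demand scenarios (proportional to the sample average)."""
    return sum(cost(charge, demand) for demand in scenarios)


def best_decision(candidates, scenarios):
    """Brute-force minimizer of the scenario-summed cost."""
    return min(candidates, key=lambda charge: total_cost(charge, scenarios))


scenarios = [2, 4, 9]        # hypothetical demand forecasts
candidates = range(0, 11)    # allowed pre-charge levels

saa_choice = best_decision(candidates, scenarios)                 # optimize over all scenarios jointly
mean_demand = sum(scenarios) // len(scenarios)                    # single point forecast
point_choice = best_decision(candidates, [mean_demand])           # optimize against the mean only
```

In this toy setting the joint optimization hedges against the expensive high-demand scenario, whereas the point-forecast decision is cheaper only if demand actually equals the mean.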

Managing Large Dataset Gaps in Urban Air Quality Prediction: DCU-Insight-AQ at MediaEval 2022

Dinh Viet Cuong , Phuc H. Le-Khac , Adam Stapleton , Elke Eichlemann , Mark Roantree , Alan F. Smeaton

Categories: Machine Learning | Artificial Intelligence

2022-12-19

Calculating an Air Quality Index (AQI) typically uses data streams from air quality sensors deployed at fixed locations, and the calculation is a real-time process. If one or more sensors are broken or offline, the real-time AQI value cannot be computed. Estimating AQI values for some point in the future is a predictive process that uses historical AQI values to train and build models. In this work we focus on gap filling in air quality data, where the task is to predict the AQI at 1, 5 and 7 days into the future. We consider the scenario in which one or more air, weather and traffic sensors are offline, and explore prediction accuracy in such situations. The work is part of the MediaEval'2022 Urban Air: Urban Life and Air Pollution task, submitted by the DCU-Insight-AQ team, and uses multimodal and crossmodal data consisting of AQI, weather and CCTV traffic images for air pollution prediction.

Biomedical image analysis competitions: The state of current participation practice

Matthias Eisenmann , Annika Reinke , Vivienn Weru , Minu Dietlinde Tizabi , Fabian Isensee , Tim J. Adler , Patrick Godau , Veronika Cheplygina , Michal Kozubek , Sharib Ali

Categories: Computer Vision | Machine Learning

2022-12-16

The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical image analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.

Improving Warped Planar Object Detection Network For Automatic License Plate Recognition

Nguyen Dinh Tra , Nguyen Cong Tri , Phan Duy Hung

Categories: Computer Vision | Artificial Intelligence

2022-12-14

This paper aims to improve the Warped Planar Object Detection Network (WPOD-Net) using feature engineering to increase accuracy. What problems does adding feature engineering to WPOD-Net solve? More specifically, we think that it makes sense to add knowledge about edges in the image to enhance the information used to determine the license plate contour in the original WPOD-Net model. The Sobel filter was selected experimentally and acts as a Convolutional Neural Network layer; its edge information is combined with the information from the original network to create the final embedding vector. The proposed model was compared with the original model on a dataset that we collected for evaluation. The results are evaluated using the Quadrilateral Intersection over Union value and demonstrate that the model achieves a significant improvement in performance.
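
A Sobel filter can act as a convolution layer because it is simply a pair of fixed 3x3 kernels applied in the same sliding-window fashion as learned kernels. The sketch below shows that operation in plain Python on a toy image with one vertical edge. It does not reproduce how the authors combine the edge map with WPOD-Net features; the function names and the test image are assumptions for illustration.

```python
import math

# Standard 3x3 Sobel kernels for horizontal and vertical intensity gradients.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def correlate_valid(image, kernel):
    """Slide the kernel over the image (stride 1, no padding), as a CNN layer would."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def sobel_edges(image):
    """Return horizontal gradient, vertical gradient, and gradient magnitude maps."""
    gx = correlate_valid(image, SOBEL_X)
    gy = correlate_valid(image, SOBEL_Y)
    magnitude = [
        [math.hypot(a, b) for a, b in zip(row_x, row_y)]
        for row_x, row_y in zip(gx, gy)
    ]
    return gx, gy, magnitude


# Toy 4x5 grayscale image: dark on the left, bright on the right (one vertical edge).
image = [[0, 0, 0, 10, 10] for _ in range(4)]
gx, gy, magnitude = sobel_edges(image)
```

The horizontal-gradient map responds only where the window overlaps the dark-to-bright transition, and the vertical-gradient map stays at zero because every row of the toy image is identical.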
