Smart Paper Notes

Learning to integrate vision data into road network data

Oliver Stromann , Alireza Razavi , Michael Felsberg

Category: Computer Vision

2021-12-20

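Road networks are the core infrastructure for connected and autonomous vehicles, but creating meaningful representations of them for machine learning applications is a challenging task. In this work, we propose integrating remote sensing vision data into road network data to improve embeddings learned with graph neural networks. We present a segmentation of road edges based on spatio-temporal road and traffic characteristics, which allows the attribute set of road networks to be enriched with visual features from satellite imagery and digital surface models. We show that both the segmentation and the integration of vision data can improve performance on a road type classification task, and we achieve state-of-the-art performance on the OSM + Didi Chuxing dataset for Chengdu, China.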

translated by Google Translate

ParsiNorm: A Persian Toolkit for Speech Processing Normalization

Romina Oji , Seyedeh Fatemeh Razavi , Sajjad Abdi Dehsorkh , Alireza Hariri , Hadi Asheri , Reshad Hosseini

Categories: Natural Language Processing | Machine Learning

2021-11-01

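In general, speech processing models consist of a language model together with an acoustic model. Regardless of the language model's complexity and variant, it requires three critical preprocessing steps: cleaning, normalization, and tokenization. Among these steps, normalization is essential for format unification in purely textual applications. For language models embedded in speech processing modules, however, normalization is not limited to format unification: it must also convert every readable symbol, number, and so on into the way it is pronounced. To the best of our knowledge, there is no Persian normalization toolkit for language models embedded in speech processing modules, so in this paper we propose an open-source normalization toolkit for text processing in speech applications. Briefly, we consider different kinds of readable Persian text, such as symbols (common currencies, #, @, URLs, etc.) and numbers (dates, times, phone numbers, national codes, etc.). A comparison with other available Persian text normalization tools indicates the superiority of the proposed method for speech processing. In addition, comparing the model's performance with that of other common natural language libraries, such as Hazm and Parsivar, indicates that the proposed method performs properly. Finally, its evaluation on some Persian Wikipedia data confirms the proper performance of the proposed method.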

Conformal Prediction Intervals for Remaining Useful Lifetime Estimation

Alireza Javanmardi , Eyke Hüllermeier

Category: Machine Learning

2022-12-30

The main objective of Prognostics and Health Management is to estimate the Remaining Useful Lifetime (RUL), namely, the time that a system or a piece of equipment is still in working order before starting to function incorrectly. In recent years, numerous machine learning algorithms have been proposed for RUL estimation, mainly focusing on providing more accurate RUL predictions. However, there are many sources of uncertainty in the problem, such as inherent randomness of systems failure, lack of knowledge regarding their future states, and inaccuracy of the underlying predictive models, making it infeasible to predict the RULs precisely. Hence, it is of utmost importance to quantify the uncertainty alongside the RUL predictions. In this work, we investigate the conformal prediction (CP) framework that represents uncertainty by predicting sets of possible values for the target variable (intervals in the case of RUL) instead of making point predictions. Under very mild technical assumptions, CP formally guarantees that the actual value (true RUL) is covered by the predicted set with a degree of certainty that can be prespecified. We study three CP algorithms to conformalize any single-point RUL predictor and turn it into a valid interval predictor. Finally, we conformalize two single-point RUL predictors, deep convolutional neural networks and gradient boosting, and illustrate their performance on the Commercial Modular Aero-Propulsion System Simulation (C-MAPSS) data sets.
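
The interval construction described above can be illustrated with split (inductive) conformal prediction, one common CP variant. The abstract does not name the three algorithms the authors study, so this is only a minimal sketch under stated assumptions: the function name, the absolute-residual nonconformity score, and the toy calibration values are all hypothetical and are not taken from the paper.

```python
import math


def split_conformal_interval(cal_preds, cal_targets, test_pred, alpha=0.1):
    """Wrap a point RUL prediction in an interval with nominal coverage 1 - alpha.

    Assumes a held-out calibration set that was not used to fit the predictor.
    """
    # Nonconformity scores: absolute residuals on the calibration set.
    scores = sorted(abs(y - p) for p, y in zip(cal_preds, cal_targets))
    n = len(scores)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: the interval is unbounded.
        return (-math.inf, math.inf)
    q = scores[rank - 1]
    return (test_pred - q, test_pred + q)


# Hypothetical calibration predictions and true RULs (in cycles).
cal_preds = [100, 80, 60, 40, 20, 120, 90, 70, 50]
cal_targets = [95, 86, 57, 48, 19, 110, 92, 66, 53]
interval = split_conformal_interval(cal_preds, cal_targets, test_pred=75, alpha=0.2)
print(interval)
```

The point predictor is treated as a black box here, which is what lets the same wrapper be applied to any single-point RUL model.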

EuclidNets: An Alternative Operation for Efficient Inference of Deep Learning Models

Xinlin Li , Mariana Parazeres , Adam Oberman , Alireza Ghaffari , Masoud Asgharian , Vahid Partovi Nia

Category: Machine Learning

2022-12-22

With the advent of deep learning applications on edge devices, researchers actively try to optimize their deployment on low-power and memory-restricted devices. There are established compression methods, such as quantization, pruning, and architecture search, that leverage commodity hardware. Apart from conventional compression algorithms, one may redesign the operations of deep learning models so that they lead to a more efficient implementation. To this end, we propose EuclidNet, a compression method designed to be implemented in hardware, which replaces multiplication, $xw$, with Euclidean distance, $(x-w)^2$. We show that EuclidNet is aligned with matrix multiplication and that it can be used as a measure of similarity in convolutional layers. Furthermore, we show that under various transformations and noise scenarios, EuclidNet performs on par with deep learning models designed with multiplication operations.
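
To see why a squared distance can stand in for a product as a similarity measure, it helps to expand the square: $-\|x-w\|^2 = 2\,x \cdot w - \|x\|^2 - \|w\|^2$. The sketch below checks this identity numerically. It is an illustration only, not the authors' implementation; the function names and the example vectors are hypothetical.

```python
def dot_score(x, w):
    """Conventional similarity: the inner product of input and weight vectors."""
    return sum(xi * wi for xi, wi in zip(x, w))


def neg_sq_dist_score(x, w):
    """Distance-based similarity: the negative squared Euclidean distance."""
    return -sum((xi - wi) ** 2 for xi, wi in zip(x, w))


# Hypothetical input and weight vectors.
x = [1.0, 2.0, 3.0]
w = [0.5, -1.0, 2.0]

lhs = neg_sq_dist_score(x, w)
# Expanding the square recovers the dot product plus two norm terms.
rhs = 2 * dot_score(x, w) - dot_score(x, x) - dot_score(w, w)
print(lhs, rhs)
```

The two scores differ only by the norm terms, so for inputs and weights of comparable norm they rank candidates similarly.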

Reinforcement Learning Based Approaches to Adaptive Context Caching in Distributed Context Management Systems

Shakthi Weerasinghe , Arkady Zaslavsky , Seng W. Loke , Amin Abken , Alireza Hassani

Category: Machine Learning

2022-12-22

Performance metrics-driven context caching has a profound impact on throughput and response time in distributed context management systems for real-time context queries. This paper proposes a reinforcement learning based approach to adaptively cache context with the objective of minimizing the cost incurred by context management systems in responding to context queries. Our novel algorithms enable context queries and sub-queries to reuse and repurpose cached context in an efficient manner. This approach is distinct from traditional data caching approaches in three main features. First, we make selective context cache admissions using no prior knowledge of the context or the context query load. Second, we develop and incorporate innovative heuristic models to calculate the expected performance of caching an item when making the decisions. Third, our strategy defines a time-aware continuous cache action space. We present two reinforcement learning agents: a value function estimating actor-critic agent and a policy search agent using the deep deterministic policy gradient method. The paper also proposes adaptive policies such as eviction and cache memory scaling to complement our objective. Our method is evaluated using a synthetically generated load of context sub-queries and a synthetic data set inspired by real-world data and query samples. We further investigate optimal adaptive caching configurations under different settings. This paper presents, compares, and discusses our findings that the proposed selective caching methods reach short- and long-term cost- and performance-efficiency. The paper demonstrates that the proposed methods outperform other modes of context management, such as redirector mode, database mode, and a cache-all policy, by up to 60% in cost efficiency.

Biomedical image analysis competitions: The state of current participation practice

Matthias Eisenmann , Annika Reinke , Vivienn Weru , Minu Dietlinde Tizabi , Fabian Isensee , Tim J. Adler , Patrick Godau , Veronika Cheplygina , Michal Kozubek , Sharib Ali

Categories: Computer Vision | Machine Learning

2022-12-16

The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.

Privacy-Preserving Collaborative Learning through Feature Extraction

Alireza Sarmadi , Hao Fu , Prashanth Krishnamurthy , Siddharth Garg , Farshad Khorrami

Category: Machine Learning

2022-12-13

We propose a framework in which multiple entities collaborate to build a machine learning model while preserving privacy of their data. The approach utilizes feature embeddings from shared/per-entity feature extractors transforming data into a feature space for cooperation between entities. We propose two specific methods and compare them with a baseline method. In Shared Feature Extractor (SFE) Learning, the entities use a shared feature extractor to compute feature embeddings of samples. In Locally Trained Feature Extractor (LTFE) Learning, each entity uses a separate feature extractor and models are trained using concatenated features from all entities. As a baseline, in Cooperatively Trained Feature Extractor (CTFE) Learning, the entities train models by sharing raw data. Secure multi-party algorithms are utilized to train models without revealing data or features in plain text. We investigate the trade-offs among SFE, LTFE, and CTFE in regard to performance, privacy leakage (using an off-the-shelf membership inference attack), and computational cost. LTFE provides the most privacy, followed by SFE, and then CTFE. Computational cost is lowest for SFE and the relative speed of CTFE and LTFE depends on network architecture. CTFE and LTFE provide the best accuracy. We use MNIST, a synthetic dataset, and a credit card fraud detection dataset for evaluations.

Aerobat, A Bioinspired Drone to Test High-DOF Actuation and Embodied Aerial Locomotion

Alireza Ramezani , Eric Sihite

Category: Robotics

2022-12-10

This work presents an actuation framework for a bioinspired flapping drone called Aerobat. This drone, capable of producing dynamically versatile wing conformations, possesses 14 body joints and is tail-less. Therefore, in our robot, unlike mainstream flapping wing designs that are open-loop stable and have no pronounced morphing characteristics, the actuation and closed-loop feedback design can pose significant challenges. We propose a framework based on integrating mechanical intelligence and control. In this design framework, small adjustments led by several tiny low-power actuators called primers can yield significant flight control roles owing to the robot's computational structures. Since the primers are incredibly lightweight, the system can host them in large numbers. In this work, we aim to show the feasibility of joint motion regulation in Aerobat's untethered flights.

Wake-Based Locomotion Gait Design for Aerobat

Eric Sihite , Alireza Ramezani

Category: Robotics

2022-12-10

Flying animals, such as bats, fly through their fluidic environment as they create air jets and form wake structures downstream of their flight path. Bats, in particular, dynamically morph their highly flexible and dexterous armwing to manipulate their fluidic environment, which is key to their agility and flight efficiency. This paper presents the theoretical and numerical analysis of a wake-structure-based gait design, inspired by bat flight, for flapping robots, using the notion of reduced-order models and an unsteady aerodynamic model incorporating the Wagner function. The objective of this paper is to introduce the notion of gait design for flapping robots by systematically searching the design space in the context of optimization. The solution found using our gait design framework was used to design and test a flapping robot.

REVEAL: Retrieval-Augmented Visual-Language Pre-Training with Multi-Source Multimodal Knowledge Memory

Ziniu Hu , Ahmet Iscen , Chen Sun , Zirui Wang , Kai-Wei Chang , Yizhou Sun , Cordelia Schmid , David A. Ross , Alireza Fathi

Categories: Computer Vision | Artificial Intelligence

2022-12-10

In this paper, we propose an end-to-end Retrieval-Augmented Visual Language Model (REVEAL) that learns to encode world knowledge into a large-scale memory, and to retrieve from it to answer knowledge-intensive queries. REVEAL consists of four key components: the memory, the encoder, the retriever, and the generator. The large-scale memory encodes various sources of multimodal world knowledge (e.g., image-text pairs, question answering pairs, knowledge graph triplets, etc.) via a unified encoder. The retriever finds the most relevant knowledge entries in the memory, and the generator fuses the retrieved knowledge with the input query to produce the output. A key novelty in our approach is that the memory, encoder, retriever, and generator are all pre-trained end-to-end on a massive amount of data. Furthermore, our approach can use a diverse set of multimodal knowledge sources, which is shown to result in significant gains. We show that REVEAL achieves state-of-the-art results on visual question answering and image captioning.
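
The retrieval step described above can be pictured as a top-k lookup over encoded memory entries. The sketch below is a toy illustration, not REVEAL's actual retriever: it assumes dot-product scoring over fixed embeddings, and the function name, memory keys, and vectors are hypothetical.

```python
def top_k_entries(query, memory, k=2):
    """Return the keys of the k memory entries most similar to the query.

    Similarity is assumed to be the dot product between the query embedding
    and each stored entry embedding.
    """
    def score(vec):
        return sum(q * m for q, m in zip(query, vec))

    ranked = sorted(memory, key=lambda key: score(memory[key]), reverse=True)
    return ranked[:k]


# Hypothetical memory with one entry per knowledge source type.
memory = {
    "image_text_pair": [1.0, 0.0],
    "qa_pair": [0.6, 0.8],
    "kg_triplet": [0.0, 1.0],
}
retrieved = top_k_entries([0.9, 0.1], memory, k=2)
print(retrieved)
```

In the pipeline the abstract describes, the retrieved entries would then be passed to the generator together with the input query.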
