In many scenarios, such as genome-wide association studies where dependencies among variables commonly exist, it is of interest to infer the interaction effects in the model. However, testing pairwise interactions among millions of variables in complex and high-dimensional data suffers from low statistical power and huge computational cost. To address these challenges, we propose a two-stage testing procedure with false discovery rate (FDR) control, which is known as a less conservative multiple-testing correction. Theoretically, the difficulty of FDR control lies in the data dependence across the two stages and in the fact that the number of hypothesis tests conducted in the second stage depends on the screening results of the first stage. By using the Cramér-type moderate deviation technique, we show that our procedure controls FDR asymptotically in generalized linear models (GLMs), where the models are allowed to be misspecified. In addition, the asymptotic power of the FDR control procedure is rigorously established. We demonstrate via comprehensive simulation studies that our two-stage procedure is computationally more efficient than the classical BH procedure, with comparable or improved statistical power. Finally, we apply the proposed method to bladder cancer data from dbGaP, where the scientific goal is to identify genetic susceptibility loci for bladder cancer.
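As a rough illustration of the two-stage idea (not the paper's exact procedure, which relies on GLM test statistics and Cramér-type moderate deviations), the sketch below screens variables by marginal association in stage one and then applies a Benjamini-Hochberg correction only to the interaction tests among the screened variables; the Pearson-correlation test statistic and the per-stage use of the same alpha are illustrative assumptions.

```python
import numpy as np
from itertools import combinations
from scipy import stats

def bh_reject(pvals, alpha):
    """Benjamini-Hochberg step-up procedure: boolean mask of rejected hypotheses."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    if m == 0:
        return np.zeros(0, dtype=bool)
    order = np.argsort(p)
    passed = p[order] <= alpha * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.max(np.nonzero(passed)[0])       # largest index meeting its threshold
        reject[order[:k + 1]] = True
    return reject

def two_stage_interactions(X, y, alpha=0.05):
    """Stage 1: screen variables by marginal association with y.
    Stage 2: BH-adjusted tests of pairwise interactions among screened variables only."""
    n, p = X.shape
    stage1_p = np.array([stats.pearsonr(X[:, j], y)[1] for j in range(p)])
    screened = np.flatnonzero(bh_reject(stage1_p, alpha))
    pairs = list(combinations(screened, 2))
    stage2_p = np.array([stats.pearsonr(X[:, j] * X[:, k], y)[1] for j, k in pairs])
    keep = bh_reject(stage2_p, alpha)
    return [pairs[i] for i in np.flatnonzero(keep)]
```

Because stage two only tests pairs that survive screening, the number of interaction tests is far smaller than the full p-choose-2 set, which is where the computational saving over a one-stage BH procedure comes from.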
Human limb motion tracking and recognition plays an important role in medical rehabilitation training, lower-limb assistance, prosthesis design for amputees, feedback control of assistive robots, etc. Lightweight wearable sensors, including inertial sensors, surface electromyography sensors, and flexible strain/pressure sensors, are promising candidates for the next generation of human motion capture devices. In this paper, we present a wireless wearable device consisting of a 16-channel flexible sponge-based pressure sensor array that recognizes various human lower-limb motions by detecting the contours formed on the human skin by the actions of the calf gastrocnemius muscles. Each sensing element is a round porous structure of thin carbon nanotube/polydimethylsiloxane nanocomposite with a diameter of 4 mm and a thickness of about 400 {\mu}m. Ten human subjects were recruited to perform ten different lower-limb motions while wearing the developed device. The motion classification results with a support vector machine method show a macro recall of about 97.3% over all ten tested motions. This work demonstrates a portable wearable muscle activity detection device for lower-limb motion recognition, which can potentially be used in assistive robot control, healthcare, sports monitoring, etc.
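The classification step can be reproduced in outline with scikit-learn; in this minimal sketch the synthetic 16-channel frames, the RBF kernel, and the train/test split are placeholders for the recorded pressure data and the authors' actual protocol.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import recall_score

# Stand-in data: one 16-channel pressure vector per frame, 10 motion classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 16))
y = rng.integers(0, 10, size=2000)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0, gamma="scale"))
clf.fit(X_tr, y_tr)
macro_recall = recall_score(y_te, clf.predict(X_te), average="macro")
print(f"macro recall: {macro_recall:.3f}")   # the paper reports ~0.973 on real sensor data
```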
Blind image quality assessment (BIQA) remains challenging due to the diversity of distortions and the variation of image content, which complicate the distortion patterns across different scales and aggravate the difficulty of the regression problem in BIQA. However, existing BIQA methods often fail to consider multi-scale distortion patterns and image content, and little research has been done on learning strategies that make the regression model perform better. In this paper, we propose a simple yet effective Progressive Multi-Task Image Quality Assessment (PMT-IQA) model, which contains a multi-scale feature extraction module (MS) and a progressive multi-task learning module (PMT), to help the model learn complex distortion patterns and optimize the regression problem following the easy-to-hard progression of human learning. To verify the effectiveness of the proposed PMT-IQA model, we conduct experiments on four widely used public datasets; the experimental results indicate that PMT-IQA outperforms the comparison approaches, with both the MS and PMT modules contributing to the improvement.
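The following sketch is only meant to make the two ideas concrete, not to reproduce PMT-IQA: multi-scale pooling of backbone features, plus a loss whose weight shifts from an easy coarse-level classification task to the hard score-regression task over training. The placeholder backbone, dimensions, and the linear weight schedule are assumptions.

```python
import torch
import torch.nn as nn

class MultiScaleTwoHead(nn.Module):
    """Minimal sketch: multi-scale pooled features feed a coarse quality-level
    classifier (easy task) and a continuous score regressor (hard task)."""
    def __init__(self, in_dim=512, num_levels=5):
        super().__init__()
        self.backbone = nn.Conv2d(3, in_dim, kernel_size=3, padding=1)   # placeholder backbone
        self.pools = nn.ModuleList([nn.AdaptiveAvgPool2d(s) for s in (1, 2, 4)])
        feat_dim = in_dim * (1 + 4 + 16)
        self.cls_head = nn.Linear(feat_dim, num_levels)
        self.reg_head = nn.Linear(feat_dim, 1)

    def forward(self, x):
        f = self.backbone(x)
        ms = torch.cat([p(f).flatten(1) for p in self.pools], dim=1)     # multi-scale features
        return self.cls_head(ms), self.reg_head(ms).squeeze(-1)

def progressive_loss(logits, score, level_target, score_target, epoch, total_epochs):
    """Weight moves from the easy classification task to the hard regression task."""
    w = epoch / max(total_epochs - 1, 1)
    ce = nn.functional.cross_entropy(logits, level_target)
    mse = nn.functional.mse_loss(score, score_target)
    return (1 - w) * ce + w * mse
```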
With the development of technology and the sharing economy, Airbnb, as a well-known short-term rental platform, has become the first choice for many young people. Airbnb's pricing has long been a problem worth studying. While previous studies achieve promising results, deficiencies remain: (1) the feature attributes of rentals are not rich enough; (2) the research on rental text information is not deep enough; (3) there are few studies that predict the rental price in combination with the points of interest (POI) around the house. To address these challenges, we propose a multi-source information embedding (MSIE) model to predict the rental price of Airbnb listings. Specifically, we first select statistical features to embed the original rental data. Secondly, we generate word feature vectors and emotional scores from three different kinds of text information to form the text feature embedding. Thirdly, we use the points of interest (POI) around the rental house to generate a variety of spatial network graphs, and learn embeddings of these networks to obtain the spatial feature embedding. Finally, we combine the three modules into a multi-source rental representation and use a fully connected neural network to predict the price. The analysis of the experimental results shows the effectiveness of our proposed model.
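A compressed sketch of the final fusion-and-regression step, assuming the three embedding blocks have already been computed; the feature dimensions, the random placeholder data, and the MLP layer sizes are illustrative, not the paper's settings.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Placeholder shapes for the three sources described above.
n = 1000
stat_feats = np.random.rand(n, 20)     # statistical features of the listing
text_feats = np.random.rand(n, 64)     # word vectors + emotional scores from listing texts
poi_feats  = np.random.rand(n, 32)     # embeddings learned from POI spatial network graphs
price      = np.random.rand(n) * 300

X = np.hstack([stat_feats, text_feats, poi_feats])   # multi-source rental representation
model = MLPRegressor(hidden_layer_sizes=(128, 64), max_iter=500, random_state=0)
model.fit(X, price)
print("train R^2:", model.score(X, price))
```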
Neural operators, which emerge as implicit solution operators of hidden governing equations, have recently become popular tools for learning responses of complex real-world physical systems. Nevertheless, the majority of neural operator applications have thus far been data-driven, which neglects the intrinsic preservation of fundamental physical laws in the data. In this paper, we introduce a novel integral neural operator architecture to learn physical models with fundamental conservation laws automatically guaranteed. In particular, by replacing the frame-dependent position information with its invariant counterpart in the kernel space, the proposed neural operator is by design translation- and rotation-invariant, and consequently abides by the conservation laws of linear and angular momentum. As applications, we demonstrate the expressivity and efficacy of our model in learning complex material behaviors from both synthetic and experimental datasets, and show that, by automatically satisfying these essential physical laws, our learned neural operator is not only generalizable in handling translated and rotated datasets, but also achieves state-of-the-art accuracy and efficiency compared to baseline neural operator models.
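To make the invariance idea concrete, here is a hedged single-layer sketch of a kernel integral operator whose kernel sees only pairwise distances rather than absolute coordinates, so the layer commutes with translations and rotations of the input frame. The kernel MLP, discretization by a simple mean, and all dimensions are assumptions rather than the paper's architecture.

```python
import torch
import torch.nn as nn

class InvariantKernelIntegral(nn.Module):
    """One integral-operator layer: u_i = (1/N) * sum_j K(|x_i - x_j|) v_j,
    where the learned kernel K depends only on the pairwise distance."""
    def __init__(self, channels, hidden=64):
        super().__init__()
        self.kernel = nn.Sequential(
            nn.Linear(1, hidden), nn.GELU(), nn.Linear(hidden, channels * channels)
        )
        self.channels = channels

    def forward(self, coords, feats):
        # coords: (N, d) node positions, feats: (N, C) input function values
        dist = torch.cdist(coords, coords).unsqueeze(-1)          # (N, N, 1) invariant input
        N, C = coords.shape[0], self.channels
        K = self.kernel(dist).view(N, N, C, C)                    # kernel matrices per pair
        return torch.einsum("ijcd,jd->ic", K, feats) / N          # discretized kernel integral
```

Because rigid motions of `coords` leave every pairwise distance unchanged, the layer's output is identical for translated or rotated inputs, which is the mechanism behind the invariance claim.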
With the development of natural language processing (NLP) techniques, automatic diagnosis of eye diseases using ophthalmology electronic medical records (OEMR) has become possible. It aims to evaluate the condition of each of a patient's eyes, and we formulate it as a particular multi-label classification task in this paper. Although there are a few related studies on other diseases, automatic diagnosis of eye diseases exhibits unique characteristics. First, descriptions of both eyes are mixed up in OEMR documents, with both free text and templated asymptomatic descriptions, resulting in sparsity and clutter of information. Second, OEMR documents contain multiple parts of descriptions and have long document lengths. Third, it is critical to provide explainability for the disease diagnosis model. To overcome these challenges, we present an effective automatic eye disease diagnosis framework, NEEDED. In this framework, a preprocessing module is integrated to improve the density and quality of information. Then, we design a hierarchical transformer structure for learning the contextualized representation of each sentence in the OEMR document. For the diagnosis part, we propose an attention-based predictor that enables traceable diagnosis by obtaining disease-specific information. Experiments on a real dataset and comparisons with several baseline models show the advantages and explainability of our framework.
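A hedged structural sketch of the two ideas named above, hierarchical encoding plus a per-disease attention predictor whose weights make the diagnosis traceable to supporting sentences; layer counts, dimensions, and the mean-pooling of sentence tokens are assumptions, not NEEDED's released design.

```python
import torch
import torch.nn as nn

class HierarchicalDiagnoser(nn.Module):
    """Encode tokens within each sentence, contextualize sentences across the document,
    then let each disease query attend to the sentences that support it."""
    def __init__(self, d_model=128, num_diseases=10):
        super().__init__()
        sent_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        doc_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.sent_encoder = nn.TransformerEncoder(sent_layer, num_layers=2)
        self.doc_encoder = nn.TransformerEncoder(doc_layer, num_layers=2)
        self.disease_queries = nn.Parameter(torch.randn(num_diseases, d_model))
        self.attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.out = nn.Linear(d_model, 1)

    def forward(self, token_emb):
        # token_emb: (num_sentences, max_tokens, d_model) for one document
        sent_repr = self.sent_encoder(token_emb).mean(dim=1)      # one vector per sentence
        doc_repr = self.doc_encoder(sent_repr.unsqueeze(0))       # sentences in document context
        q = self.disease_queries.unsqueeze(0)
        pooled, weights = self.attn(q, doc_repr, doc_repr)        # weights: which sentences support each disease
        return self.out(pooled).squeeze(-1).squeeze(0), weights   # per-disease logits + attention map
```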
Structured tabular data exist across nearly all fields. Reasoning tasks over these data aim to answer questions or determine the truthfulness of hypothesis sentences by understanding the semantic meaning of a table. While previous works have devoted significant effort to the tabular reasoning task, they always assume there are sufficient labeled data. However, constructing reasoning samples over tables (and related text) is labor-intensive, especially when the reasoning process is complex. When labeled data is insufficient, model performance suffers a severe decline. In this paper, we propose a unified framework for unsupervised complex tabular reasoning (UCTR), which generates sufficient and diverse synthetic data with complex logic for tabular reasoning tasks, assuming no human-annotated data at all. We first utilize a random sampling strategy to collect diverse programs of different types and execute them on tables based on a "Program-Executor" module. To bridge the gap between the programs and natural language sentences, we design a powerful "NL-Generator" module to generate natural language sentences with complex logic from these programs. Since a table often occurs with its surrounding texts, we further propose novel "Table-to-Text" and "Text-to-Table" operators to handle joint table-text reasoning scenarios. This way, we can adequately exploit the unlabeled table resources to obtain a well-performing reasoning model under an unsupervised setting. Our experiments cover different tasks (question answering and fact verification) and different domains (general and specific), showing that our unsupervised methods can achieve up to 93% of the performance of supervised models. We also find that it can substantially boost supervised performance in low-resourced domains as a data augmentation technique. Our code is available at https://github.com/leezythu/UCTR.
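A toy sketch of the sample-and-execute loop: a trivially small program grammar is sampled, executed on a table, and verbalized into a labeled claim. The aggregate-only grammar and template verbalizer are stand-ins for the paper's "Program-Executor" and "NL-Generator" modules, which are far richer.

```python
import random
import pandas as pd

def sample_program(table: pd.DataFrame):
    """Sample one simple column-aggregate program and execute it on the table."""
    col = random.choice([c for c in table.columns if pd.api.types.is_numeric_dtype(table[c])])
    op = random.choice(["max", "min", "sum"])
    return {"op": op, "col": col, "answer": getattr(table[col], op)()}

def verbalize(prog, corrupt=False):
    """Stand-in NL generator: turn a program/answer pair into a (claim, label) sample.
    Corrupting the answer yields a refuted claim for fact-verification training."""
    answer = prog["answer"] + (1 if corrupt else 0)
    claim = f"The {prog['op']} of {prog['col']} is {answer}."
    return claim, not corrupt

table = pd.DataFrame({"year": [2019, 2020, 2021], "sales": [10, 25, 40]})
for _ in range(3):
    print(verbalize(sample_program(table), corrupt=random.random() < 0.5))
```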
Graphic layout designs play an essential role in visual communication. Yet handcrafting layout designs is skill-demanding, time-consuming, and non-scalable to batch production. Although generative models have emerged to make design automation no longer utopian, it remains non-trivial to customize designs that comply with designers' multimodal desires, i.e., constrained by background images and driven by foreground contents. In this study, we propose \textit{LayoutDETR}, which inherits the high quality and realism of generative modeling while reformulating content-aware requirements as a detection problem: we learn to detect, in a background image, the reasonable locations, scales, and spatial relations for multimodal elements in a layout. Experiments validate that our solution yields new state-of-the-art performance for layout generation on public benchmarks and on our newly curated ads banner dataset. For practical usage, we build our solution into a graphical system that facilitates user studies. We demonstrate that our designs attract more subjective preference than baselines by significant margins. Our code, models, dataset, graphical system, and demos are available at https://github.com/salesforce/LayoutDETR.
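A hedged sketch of the detection reformulation only: learnable element queries attend to background-image features and are decoded into normalized layout boxes, DETR-style. The crude patch embedder, layer counts, and fixed number of queries are assumptions; the released LayoutDETR additionally conditions on foreground contents and uses generative training.

```python
import torch
import torch.nn as nn

class LayoutDetectorSketch(nn.Module):
    """Element queries attend to background features and decode into (cx, cy, w, h) boxes."""
    def __init__(self, d_model=256, num_elements=8):
        super().__init__()
        self.backbone = nn.Conv2d(3, d_model, kernel_size=16, stride=16)   # crude patch embedder
        dec_layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers=2)
        self.element_queries = nn.Parameter(torch.randn(num_elements, d_model))
        self.box_head = nn.Linear(d_model, 4)

    def forward(self, background):
        # background: (B, 3, H, W) banner image
        mem = self.backbone(background).flatten(2).transpose(1, 2)          # (B, patches, d)
        q = self.element_queries.unsqueeze(0).expand(background.size(0), -1, -1)
        dec = self.decoder(q, mem)                                          # (B, elements, d)
        return torch.sigmoid(self.box_head(dec))                            # normalized boxes in [0, 1]
```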
Recent years have witnessed the rapid progress of image captioning. However, the demands for large memory storage and heavy computational burden prevent these captioning models from being deployed on mobile devices. The main obstacles lie in the heavyweight visual feature extractors (i.e., object detectors) and complicated cross-modal fusion networks. To this end, we propose LightCap, a lightweight image captioner for resource-limited devices. The core design is built on the recent CLIP model for efficient image captioning. To be specific, on the one hand, we leverage the CLIP model to extract compact grid features without relying on time-consuming object detectors. On the other hand, we transfer the image-text retrieval design of CLIP to image captioning scenarios by devising a novel visual concept extractor and a cross-modal modulator. We further optimize the cross-modal fusion model and parallel prediction heads via sequential and ensemble distillations. With the carefully designed architecture, our model contains merely 40M parameters, reducing the model size by more than 75% and the FLOPs by more than 98% compared with current state-of-the-art methods. Despite the low capacity, our model still exhibits state-of-the-art performance on prevalent datasets, e.g., 136.6 CIDEr on the COCO Karpathy test split. Tested on a smartphone with only a single CPU, the proposed LightCap achieves a fast inference speed of 188 ms per image, which makes it ready for practical applications.
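A minimal sketch of the detector-free feature extraction step using the Hugging Face CLIP vision tower: patch-level (grid) features are taken from the last hidden state instead of running an object detector. The checkpoint name and the downstream use of these features are assumptions; the actual LightCap pipeline adds a concept extractor, a cross-modal modulator, and distilled fusion heads.

```python
import torch
from PIL import Image
from transformers import CLIPVisionModel, CLIPImageProcessor

processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-base-patch32")
vision = CLIPVisionModel.from_pretrained("openai/clip-vit-base-patch32").eval()

image = Image.new("RGB", (224, 224))                # placeholder image
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    out = vision(**inputs)

grid = out.last_hidden_state[:, 1:, :]              # drop CLS token: (1, 49, 768) patch grid
print(grid.shape)                                   # compact grid features for a caption decoder
```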
Dense retrievers have made significant strides in obtaining state-of-the-art results on text retrieval and open-domain question answering (ODQA). Yet most of these achievements were made possible with the help of large annotated datasets, and unsupervised learning for dense retrieval models remains an open problem. In this work, we explore two categories of methods for creating pseudo query-document pairs, named query extraction (QExt) and transferred query generation (TQGen), to augment retriever training in an annotation-free and scalable manner. Specifically, QExt extracts pseudo queries from document structures or by selecting salient random spans, and TQGen utilizes generation models trained for other NLP tasks (e.g., summarization) to produce pseudo queries. Extensive experiments show that dense retrievers trained with individual augmentation methods perform comparably to multiple strong baselines, and combining them leads to further improvements, achieving state-of-the-art performance for unsupervised dense retrieval on both BEIR and ODQA datasets.
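A small sketch of QExt-style pair creation, assuming the first sentence stands in for a structural field such as a title and that random spans approximate salient-span selection; the span length and number of spans are illustrative parameters, not the paper's settings.

```python
import random
import re

def extract_pseudo_queries(document: str, num_spans: int = 2, span_len: int = 8):
    """Return (pseudo_query, document) training pairs extracted without annotations."""
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    pairs = [(sentences[0], document)]                        # structural pseudo query
    tokens = document.split()
    for _ in range(num_spans):
        if len(tokens) > span_len:
            start = random.randrange(len(tokens) - span_len)
            span = " ".join(tokens[start:start + span_len])   # random-span pseudo query
            pairs.append((span, document))
    return pairs

doc = ("Dense retrievers map queries and passages into a shared vector space. "
       "Training them usually requires large collections of annotated query-passage pairs.")
for q, d in extract_pseudo_queries(doc):
    print("pseudo query:", q)
```

TQGen would replace the extraction step with an off-the-shelf generation model (e.g., a summarizer) applied to the document; the resulting pairs feed a standard contrastive retriever-training loop.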