There are multiple scales of abstraction from which we can describe the same image, depending on whether we are focusing on fine-grained details or a more global attribute of the image. In brain mapping, learning to automatically parse images to build representations of both small-scale features (e.g., the presence of cells or blood vessels) and global properties of an image (e.g., which brain region the image comes from) is a crucial and open challenge. However, most existing datasets and benchmarks for neuroanatomy consider only a single downstream task at a time. To bridge this gap, we introduce a new dataset, annotations, and multiple downstream tasks that provide diverse ways to read out information about brain structure and architecture from the same image. Our multi-task neuroimaging benchmark (MTNeuro) is built on volumetric, micrometer-resolution X-ray microtomography images spanning a large thalamocortical section of mouse brain, encompassing multiple cortical and subcortical regions. We generated a number of different prediction challenges and evaluated several supervised and self-supervised models for brain-region prediction and pixel-level semantic segmentation of microstructures. Our experiments not only highlight the rich heterogeneity of this dataset, but also provide insights into how self-supervised approaches can be used to learn representations that capture multiple attributes of a single image and perform well on a variety of downstream tasks. Datasets, code, and pre-trained baseline models are provided at: https://mtneuro.github.io/.
The proliferation of smartphones has accelerated mobility studies by greatly increasing the type and volume of mobility data available. One such source of mobility data is GPS technology, which is becoming increasingly common and helps the research community understand people's mobility patterns. However, there is no standardized framework for using machine learning methods to study the different mobility patterns created by the non-Work, non-Home locations of Working and Nonworking users on Workdays and Offdays. We propose a new mobility metric, Daily Characteristic Distance, and use it to generate features for each user together with Origin-Destination matrix features. We then use those features with an unsupervised machine learning method, $k$-means clustering, and obtain three clusters of users for each type of day (Workday and Offday). Finally, we propose two new metrics for the analysis of the clustering results, namely User Commonality and Average Frequency. Using the proposed metrics, interesting user behaviors can be discerned, which helps us better understand the users' mobility patterns.
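The clustering step described in the abstract above can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not define Daily Characteristic Distance or the Origin-Destination features, so the two feature columns below are synthetic placeholders, and the function names and the deterministic farthest-point initialisation are assumptions made only to keep the example reproducible.

```python
import numpy as np


def farthest_point_init(features, k):
    """Pick k starting centroids deterministically: row 0, then the row
    farthest from everything chosen so far (an illustrative choice)."""
    chosen = [0]
    for _ in range(k - 1):
        diffs = features[:, None, :] - features[chosen][None, :, :]
        nearest = (diffs ** 2).sum(axis=2).min(axis=1)
        chosen.append(int(nearest.argmax()))
    return features[chosen].astype(float)


def kmeans(features, k=3, n_iter=100):
    """Plain Lloyd's k-means; returns (labels, centroids)."""
    centroids = farthest_point_init(features, k)
    for _ in range(n_iter):
        # Squared Euclidean distance from every user to every centroid.
        diffs = features[:, None, :] - centroids[None, :, :]
        labels = (diffs ** 2).sum(axis=2).argmin(axis=1)
        # Recompute each centroid; keep the old one if a cluster is empty.
        updated = np.array([
            features[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return labels, centroids


# Hypothetical per-user features: [distance-like value, trip-share value].
rng = np.random.default_rng(42)
group_centers = np.array([[1.0, 0.1], [5.0, 0.5], [12.0, 0.9]])
user_features = np.vstack(
    [c + 0.05 * rng.standard_normal((20, 2)) for c in group_centers]
)

labels, centroids = kmeans(user_features, k=3)
print(np.bincount(labels))  # number of users assigned to each cluster
```

In practice the feature columns would usually be standardised before clustering, since k-means is sensitive to feature scale; whether the authors do so is not stated in the abstract.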
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
Video super-resolution is one of the most popular tasks on mobile devices, being widely used for the automatic improvement of low-bitrate and low-resolution video streams. While numerous solutions have been proposed for this problem, they are usually quite computationally demanding, demonstrating low FPS rates and poor power efficiency on mobile devices. In this Mobile AI challenge, we address this problem and challenge the participants to design an end-to-end real-time video super-resolution solution for mobile NPUs optimized for low energy consumption. The participants were provided with the REDS training dataset containing video sequences for a 4X video upscaling task. The runtime and power efficiency of all models were evaluated on the powerful MediaTek Dimensity 9000 platform with a dedicated AI processing unit capable of accelerating floating-point and quantized neural networks. All proposed solutions are fully compatible with the above NPU, demonstrating rates of up to 500 FPS and a power consumption of 0.2 [Watt / 30 FPS]. A detailed description of all models developed in the challenge is provided in this paper.
Image super-resolution is a common task on mobile and IoT devices, where one often needs to upscale and enhance low-resolution images and video frames. While numerous solutions have been proposed for this problem in the past, they are usually not compatible with low-power mobile NPUs, which have many computational and memory constraints. In this Mobile AI challenge, we address this problem and challenge the participants to design an efficient quantized image super-resolution solution that can demonstrate real-time performance on mobile NPUs. The participants were provided with the DIV2K dataset and trained INT8 models to perform high-quality 3X image upscaling. The runtime of all models was evaluated on the Synaptics VS680 Smart Home board with a dedicated edge NPU capable of accelerating quantized neural networks. All proposed solutions are fully compatible with the above NPU, demonstrating rates of up to 60 FPS when reconstructing Full HD resolution images. A detailed description of all models developed in the challenge is provided in this paper.
The role of mobile cameras has increased dramatically over the past few years, leading to more and more research in automatic image quality enhancement and RAW photo processing. In this Mobile AI challenge, the target was to develop an efficient end-to-end AI-based image signal processing (ISP) pipeline that replaces the standard mobile ISPs and can run on modern smartphone GPUs using TensorFlow Lite. The participants were provided with the large-scale Fujifilm UltraISP dataset consisting of thousands of paired photos captured with a normal mobile camera sensor and a professional 102MP medium-format Fujifilm GFX100 camera. The runtime of the resulting models was evaluated on the Snapdragon 8 Gen 1 GPU, which provides excellent acceleration results for the majority of common deep learning ops. The proposed solutions are compatible with all recent mobile GPUs, being able to process Full HD photos in less than 20-50 milliseconds while achieving high fidelity results. A detailed description of all models developed in this challenge is provided in this paper.
Person re-identification plays a significant role in realistic scenarios due to its various applications in public security and video surveillance. Recently, supervised or semi-unsupervised learning paradigms, which benefit from large-scale datasets and strong computing performance, have achieved competitive performance on a specific target domain. However, when Re-ID models are directly deployed in a new domain without target samples, they always suffer from considerable performance degradation and poor domain generalization. To address this challenge, we propose a Deep Multimodal Fusion network to elaborate rich semantic knowledge for assisting in representation learning during pre-training. Importantly, a multimodal fusion strategy is introduced to translate the features of different modalities into a common space, which can significantly boost the generalization capability of the Re-ID model. In the fine-tuning stage, a realistic dataset is adopted to fine-tune the pre-trained model for better distribution alignment with real-world data. Comprehensive experiments on benchmarks demonstrate that our method can significantly outperform previous domain generalization or meta-learning methods by a clear margin. Our source code will also be publicly available at https://github.com/JeremyXSC/DMF.
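Lossless image compression is an essential research field in image compression. Recently, learning-based image compression methods have achieved impressive performance compared with traditional lossless methods such as WebP, JPEG2000, and FLIF. However, there are still many impressive lossy compression methods that can be applied to lossless compression. Therefore, in this paper, we explore methods widely used in lossy compression and apply them to lossless compression. Inspired by the impressive performance of the Gaussian mixture model (GMM) in lossy compression, we build a lossless network architecture with a GMM. In addition, noting the success of attention modules and autoregressive models, we propose to use attention modules and to add an extra autoregressive model for raw images in our network architecture to boost performance. Experimental results show that our approach outperforms most classical lossless compression methods and existing learning-based methods.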
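As background for the GMM-based entropy model mentioned in the abstract above, the sketch below shows one common way a Gaussian mixture can assign a probability, and hence an ideal code length in bits, to an integer pixel value. It is an illustration under stated assumptions, not the authors' architecture: the abstract gives no formulas, the mixture parameters here are hand-picked rather than predicted by a network, and the function names are hypothetical.

```python
import math


def gaussian_cdf(x, mu, sigma):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def gmm_pixel_probability(value, weights, means, sigmas):
    """Probability mass the mixture puts on the bin [value-0.5, value+0.5]."""
    return sum(
        w * (gaussian_cdf(value + 0.5, m, s) - gaussian_cdf(value - 0.5, m, s))
        for w, m, s in zip(weights, means, sigmas)
    )


def code_length_bits(value, weights, means, sigmas):
    """Ideal entropy-coding cost of one pixel value under the mixture."""
    p = max(gmm_pixel_probability(value, weights, means, sigmas), 1e-12)
    return -math.log2(p)


# Hand-picked mixture parameters (assumed; a learned model would predict
# these per pixel from context).
weights = [0.7, 0.3]
means = [120.0, 200.0]
sigmas = [2.0, 10.0]

bits_likely = code_length_bits(120, weights, means, sigmas)
bits_unlikely = code_length_bits(60, weights, means, sigmas)
print(bits_likely, bits_unlikely)
```

A value close to a sharp mixture component costs few bits, while a value far from every component costs many; a lossless codec built this way relies on an entropy coder (for example arithmetic coding) to approach these ideal lengths.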
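Electronic health records (EHRs) exhibit large amounts of missing data due to variations in patient conditions and treatment needs. Imputation of missing values is considered an effective approach to this challenge. Existing work treats the imputation method and the prediction model as two independent parts of an EHR-based machine learning system. We propose an integrated end-to-end approach using a Compound Density Network (CDNet), which allows the imputation method and the prediction model to be tuned together within a single framework. CDNet consists of a Gated Recurrent Unit (GRU), a Mixture Density Network (MDN), and a Regularized Attention Network (RAN). The GRU serves as a latent variable model for the EHR data. The MDN is designed to sample the latent variables generated by the GRU. The RAN acts as a regularizer for less reliable imputed values. The architecture of CDNet enables the GRU and the MDN to iteratively leverage each other's output to impute missing values, leading to more accurate and robust predictions. We validate CDNet on the mortality prediction task on the MIMIC-III dataset. Our model outperforms state-of-the-art models by significant margins. We also show empirically that regularizing imputed values is a key factor for superior prediction performance. An analysis of prediction uncertainty shows that our model can capture both aleatoric and epistemic uncertainty, giving model users a better understanding of the model's results.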
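Deep metric learning (DML) learns an embedding function that projects semantically similar data into nearby regions of the embedding space, and it plays a vital role in many applications such as image retrieval and face recognition. However, the performance of DML methods often depends heavily on the sampling method used to select effective data from the embedding space during training. In practice, the embeddings are obtained by some deep model, and the embedding space often contains barren areas due to the absence of training points, resulting in the so-called "missing embedding" issue. This issue may impair sample quality, which leads to degraded DML performance. In this work, we investigate how to alleviate the "missing embedding" issue to improve sampling quality and achieve effective DML. To this end, we propose a Densely-Anchored Sampling (DAS) scheme that treats an embedding with a corresponding data point as an "anchor" and exploits the embedding space near the anchor to densely produce embeddings without data points. Specifically, we propose to exploit the embedding space around a single anchor with Discriminative Feature Scaling (DFS) and around multiple anchors with Memorized Transformation Shifting (MTS). In this way, by combining embeddings with and without data points, we are able to provide more embeddings to facilitate the sampling process and thereby boost the performance of DML. Our method integrates effortlessly into existing DML frameworks and improves them without bells and whistles. Extensive experiments on three benchmark datasets demonstrate the superiority of our method.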