Smart Paper Notes

3D-LDM: Neural Implicit 3D Shape Generation with Latent Diffusion Models

Gimin Nam, Mariem Khlifi, Andrew Rodriguez, Alberto Tono, Linqi Zhou, Paul Guerrero

Category: Computer Vision

2022-12-01

Diffusion models have shown great promise for image generation, beating GANs in terms of generation diversity, with comparable image quality. However, their application to 3D shapes has been limited to point or voxel representations that in practice cannot accurately represent a 3D surface. We propose a diffusion model for neural implicit representations of 3D shapes that operates in the latent space of an auto-decoder. This allows us to generate diverse and high-quality 3D surfaces. We additionally show that we can condition our model on images or text to enable image-to-3D generation and text-to-3D generation using CLIP embeddings. Furthermore, adding noise to the latent codes of existing shapes allows us to explore shape variations.
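
The last point, perturbing the latent code of an existing shape, can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a DDPM-style forward process with a linear beta schedule, and the names `alpha_bar` and `noise_latent`, the schedule constants, and the 4-dimensional example code are all hypothetical. Producing an actual shape variation would additionally require the trained denoiser and the auto-decoder, which are omitted here.

```python
import math
import random


def alpha_bar(t, num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_s) over the first t steps.

    Assumes a linear beta schedule; the abstract does not state the
    schedule actually used.
    """
    prod = 1.0
    for s in range(t):
        beta = beta_start + (beta_end - beta_start) * s / (num_steps - 1)
        prod *= 1.0 - beta
    return prod


def noise_latent(z0, t, rng):
    """Sample z_t = sqrt(alpha_bar_t) * z0 + sqrt(1 - alpha_bar_t) * eps."""
    ab = alpha_bar(t)
    return [
        math.sqrt(ab) * z + math.sqrt(1.0 - ab) * rng.gauss(0.0, 1.0)
        for z in z0
    ]


rng = random.Random(0)
z0 = [0.5, -1.0, 0.25, 2.0]            # hypothetical latent code of an existing shape
z_small = noise_latent(z0, 50, rng)    # mild perturbation
z_large = noise_latent(z0, 800, rng)   # strong perturbation
```

A small `t` keeps the noised code close to the original, while a large `t` moves it toward pure Gaussian noise.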

Translated by Google Translate

NeuForm: Adaptive Overfitting for Neural Shape Editing

Connor Z. Lin, Niloy J. Mitra, Gordon Wetzstein, Leonidas Guibas, Paul Guerrero

Categories: Computer Vision | Machine Learning

2022-07-18

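Neural representations are popular for representing shapes because they can be learned from sensor data and used for data cleanup, model completion, shape editing, and shape synthesis. Current neural representations either overfit to a single object instance or represent a collection of objects. However, neither allows accurate editing of neural scene representations: methods that overfit to an object achieve highly accurate reconstructions but do not generalize to unseen object configurations and thus cannot support editing, while methods that represent a family of objects with variations do generalize but produce only approximate reconstructions. We propose NeuForm, which combines the advantages of overfitted and generalizable representations by adaptively using the one best suited to each shape region: the overfitted representation where reliable data is available, and the generalizable representation everywhere else. We achieve this with a carefully designed architecture and a method that blends the network weights of the two representations, avoiding seams and other artifacts. We demonstrate edits that successfully reconfigure parts of human-designed shapes, such as chairs, tables, and lamps, while preserving semantic integrity and the accuracy of the overfitted shape representation. We compare against two state-of-the-art competitors and show clear improvements in the plausibility and fidelity of the resulting edits.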

TileGen: Tileable, Controllable Material Generation and Capture

Xilong Zhou, Miloš Hašan, Valentin Deschaintre, Paul Guerrero, Kalyan Sunkavalli, Nima Kalantari

Category: Computer Vision

2022-06-12

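Recent methods (e.g. MaterialGAN) have used unconditional GANs to generate per-pixel material maps, or as a prior for reconstructing materials from input photographs. These models can generate varied random material appearances, but they have no mechanism to constrain the generated material to a specific category or to control its coarse structure, such as the exact brick layout on a brick wall. Furthermore, materials reconstructed from a single input photo commonly have artifacts and are generally not tileable, which limits their use in practical content creation pipelines. We propose TileGen, a generative model for SVBRDFs that is specific to a material category, always tileable, and conditioned on a provided input structure pattern. TileGen is a variant of StyleGAN whose architecture is modified to always produce tileable (periodic) material maps. In addition to the standard "style" latent code, TileGen can optionally take a condition image, giving the user direct control over the dominant spatial (and optionally color) features of the material. For example, in brick materials the user can specify the brick layout and brick color, and in leather materials the locations of wrinkles and folds. Our inverse rendering approach can find, by optimization, a material that perceptually matches a single target photograph. This reconstruction can also be conditioned on a user-provided pattern. The resulting materials are tileable, can be larger than the target image, and can be edited by varying the condition.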

The Shape Part Slot Machine: Contact-based Reasoning for Generating 3D Shapes from Parts

Kai Wang, Paul Guerrero, Vladimir Kim, Siddhartha Chaudhuri, Minhyuk Sung, Daniel Ritchie

Categories: Computer Vision | Machine Learning

2021-12-01

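We present the Shape Part Slot Machine, a new method for assembling novel 3D shapes from existing parts by performing contact-based reasoning. Our method represents each shape as a graph of "slots", where each slot is a region of contact between two shape parts. Based on this representation, we design a graph-neural-network-based model for generating new slot graphs and retrieving compatible parts, as well as a gradient-descent-based optimization scheme for assembling the retrieved parts into a complete shape that respects the generated slot graph. This approach does not require any semantic part labels. Interestingly, it also does not require complete part geometries: reasoning about the regions where parts connect is sufficient to generate novel, high-quality 3D shapes. We demonstrate that our method generates shapes that outperform those of existing modeling-by-assembly approaches in terms of quality, diversity, and structural complexity.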

Mapping Knowledge Representations to Concepts: A Review and New Perspectives

Lars Holmberg, Paul Davidsson, Per Linde

Categories: Artificial Intelligence | Machine Learning

2022-12-31

The success of neural networks builds to a large extent on their ability to create internal knowledge representations from real-world high-dimensional data, such as images, sound, or text. Extracting and presenting these representations in order to explain a neural network's decisions is an active and multifaceted research field. To gain a deeper understanding of a central aspect of this field, we have performed a targeted review focusing on research that aims to associate internal representations with human-understandable concepts. In doing this, we added a perspective on the existing research by using primarily deductive nomological explanations as a proposed taxonomy. We find this taxonomy and theories of causality useful for understanding what can be expected, and not expected, from neural network explanations. The analysis additionally uncovers an ambiguity in the reviewed literature related to the goal of model explainability: is it to understand the ML model, or is it to provide actionable explanations useful in the deployment domain?

On Implicit Bias in Overparameterized Bilevel Optimization

Paul Vicol, Jonathan Lorraine, Fabian Pedregosa, David Duvenaud, Roger Grosse

Category: Machine Learning

2022-12-28

Many problems in machine learning involve bilevel optimization (BLO), including hyperparameter optimization, meta-learning, and dataset distillation. Bilevel problems consist of two nested sub-problems, called the outer and inner problems, respectively. In practice, often at least one of these sub-problems is overparameterized. In this case, there are many ways to choose among optima that achieve equivalent objective values. Inspired by recent studies of the implicit bias induced by optimization algorithms in single-level optimization, we investigate the implicit bias of gradient-based algorithms for bilevel optimization. We delineate two standard BLO methods -- cold-start and warm-start -- and show that the converged solution or long-run behavior depends to a large degree on these and other algorithmic choices, such as the hypergradient approximation. We also show that the inner solutions obtained by warm-start BLO can encode a surprising amount of information about the outer objective, even when the outer parameters are low-dimensional. We believe that implicit bias deserves as central a role in the study of bilevel optimization as it has attained in the study of single-level neural net optimization.
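
The difference between cold-start and warm-start can be illustrated on a toy problem. The sketch below is not the paper's setup: it assumes a hypothetical overparameterized inner loss 0.5 * (a(lam) . w - 1)^2 with a(lam) = (cos lam, sin lam), whose minimizers form a whole line, and it replaces the hypergradient-based outer update with a fixed schedule of outer parameter values so that only the effect of the inner initialization is visible. All function names are illustrative.

```python
import math


def inner_step(w, lam, lr=0.5):
    """One gradient step on the toy inner loss 0.5 * (a(lam) . w - 1)^2."""
    a = (math.cos(lam), math.sin(lam))
    residual = a[0] * w[0] + a[1] * w[1] - 1.0
    return (w[0] - lr * residual * a[0], w[1] - lr * residual * a[1])


def solve_inner(w, lam, steps=200):
    """Run gradient descent on the inner problem starting from w."""
    for _ in range(steps):
        w = inner_step(w, lam)
    return w


def run(schedule, warm_start):
    """Solve the inner problem for each outer value in the schedule."""
    w = (0.0, 0.0)
    for lam in schedule:
        if not warm_start:
            w = (0.0, 0.0)  # cold-start: re-initialize the inner parameters
        w = solve_inner(w, lam)  # warm-start: continue from the previous w
    return w


# Fixed, hypothetical schedule of outer parameter values.
schedule = [0.0, math.pi / 2]
cold = run(schedule, warm_start=False)  # approximately (0, 1)
warm = run(schedule, warm_start=True)   # approximately (1, 1)
```

Both results minimize the inner loss for the final outer value, but the warm-start solution retains a component inherited from the earlier outer value, while the cold-start solution is the minimum-norm one.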

Brain Cancer Segmentation Using YOLOv5 Deep Neural Network

Sudipto Paul, Dr. Md Taimur Ahad, Md. Mahedi Hasan

Category: Computer Vision

2022-12-27

A brain tumor is an abnormal growth of brain cells. The brain's architecture is extremely intricate, with several regions controlling various nervous system processes. A brain tumor can develop in any part of the brain or skull, including the brain's protective coating, the base of the skull, the brainstem, the sinuses, the nasal cavity, and many other places. Over the past ten years, numerous advances have been made in computer-aided brain tumor diagnosis. Recently, instance segmentation has attracted considerable interest in computer vision applications. It seeks to assign distinct IDs to the objects in a scene, even if they are members of the same class. Typically, a two-stage pipeline is used to perform instance segmentation. This study demonstrates brain cancer segmentation using YOLOv5. YOLO takes as its dataset images with corresponding text files. You Only Look Once (YOLO) is a popular, widely used algorithm that is well known for its object recognition ability. YOLOv2, v3, v4, and v5 are among the YOLO versions published in recent years. Early brain tumor detection is one of the most important tasks for neurologists and radiologists. However, manually identifying and segmenting brain tumors from Magnetic Resonance Imaging (MRI) data can be difficult and error-prone. An automated brain tumor detection system is therefore necessary for early diagnosis of the condition. The model in this paper has three classes: meningioma, pituitary, and glioma. The results show that our model achieves competitive accuracy in terms of runtime usage on an M2 10-core GPU.

Large Language Models Encode Clinical Knowledge

Karan Singhal, Shekoofeh Azizi, Tao Tu, S. Sara Mahdavi, Jason Wei, Hyung Won Chung, Nathan Scales, Ajay Tanwani, Heather Cole-Lewis, Stephen Pfohl

Category: Natural Language Processing

2022-12-26

Large language models (LLMs) have demonstrated impressive capabilities in natural language understanding and generation, but the quality bar for medical and clinical applications is high. Today, attempts to assess models' clinical knowledge typically rely on automated evaluations on limited benchmarks. There is no standard to evaluate model predictions and reasoning across a breadth of tasks. To address this, we present MultiMedQA, a benchmark combining six existing open question answering datasets spanning professional medical exams, research, and consumer queries; and HealthSearchQA, a new free-response dataset of medical questions searched online. We propose a framework for human evaluation of model answers along multiple axes including factuality, precision, possible harm, and bias. In addition, we evaluate PaLM (a 540-billion parameter LLM) and its instruction-tuned variant, Flan-PaLM, on MultiMedQA. Using a combination of prompting strategies, Flan-PaLM achieves state-of-the-art accuracy on every MultiMedQA multiple-choice dataset (MedQA, MedMCQA, PubMedQA, MMLU clinical topics), including 67.6% accuracy on MedQA (US Medical License Exam questions), surpassing prior state-of-the-art by over 17%. However, human evaluation reveals key gaps in Flan-PaLM responses. To resolve this we introduce instruction prompt tuning, a parameter-efficient approach for aligning LLMs to new domains using a few exemplars. The resulting model, Med-PaLM, performs encouragingly, but remains inferior to clinicians. We show that comprehension, recall of knowledge, and medical reasoning improve with model scale and instruction prompt tuning, suggesting the potential utility of LLMs in medicine. Our human evaluations reveal important limitations of today's models, reinforcing the importance of both evaluation frameworks and method development in creating safe, helpful LLM models for clinical applications.

Generalizable Natural Language Processing Framework for Migraine Reporting from Social Media

Yuting Guo, Swati Rajwal, Sahithi Lakamana, Chia-Chun Chiang, Paul C. Menell, Adnan H. Shahid, Yi-Chieh Chen, Nikita Chhabra, Wan-Ju Chao, Chieh-Ju Chao

Category: Natural Language Processing

2022-12-23

Migraine is a high-prevalence and disabling neurological disorder. However, information on migraine management in real-world settings could be limited to traditional health information sources. In this paper, we (i) verify that there is substantial migraine-related chatter available on social media (Twitter and Reddit), self-reported by migraine sufferers; (ii) develop a platform-independent text classification system for automatically detecting self-reported migraine-related posts; and (iii) conduct analyses of the self-reported posts to assess the utility of social media for studying this problem. We manually annotated 5750 Twitter posts and 302 Reddit posts. Our system achieved an F1 score of 0.90 on Twitter and 0.93 on Reddit. Analysis of information posted by our 'migraine cohort' revealed the presence of a plethora of relevant information about migraine therapies and patient sentiments associated with them. Our study forms the foundation for conducting an in-depth analysis of migraine-related information using social media data.

Re-basin via implicit Sinkhorn differentiation

Fidel A. Guerrero Peña, Heitor Rapela Medeiros, Thomas Dubail, Masih Aminbeidokhti, Eric Granger, Marco Pedersoli

Category: Computer Vision

2022-12-22

The recent emergence of new algorithms for permuting models into functionally equivalent regions of the solution space has shed some light on the complexity of error surfaces and on some promising properties such as mode connectivity. However, finding the right permutation is challenging, and current optimization techniques are not differentiable, which makes them difficult to integrate into gradient-based optimization and often leads to sub-optimal solutions. In this paper, we propose a Sinkhorn re-basin network with the ability to obtain the transportation plan that better suits a given objective. Unlike the current state of the art, our method is differentiable and, therefore, easy to adapt to any task within the deep learning domain. Furthermore, we show the advantage of our re-basin method by proposing a new cost function that allows performing incremental learning by exploiting the linear mode connectivity property. The benefit of our method is compared against similar approaches from the literature, under several conditions for both optimal transport finding and linear mode connectivity. The effectiveness of our continual learning method based on re-basin is also shown for several common benchmark datasets, providing experimental results that are competitive with state-of-the-art results from the literature.
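
As a rough illustration of why a Sinkhorn-based relaxation is differentiable, the sketch below applies plain Sinkhorn normalization to a small cost matrix. It is not the authors' network and shows only the forward normalization, not the implicit differentiation named in the title; the function name, the temperature `tau`, the iteration count, and the 3x3 cost matrix are all assumptions made for the example.

```python
import math


def sinkhorn(cost, tau=0.1, iters=50):
    """Turn a square cost matrix into an approximately doubly stochastic plan.

    Each entry starts as exp(-cost / tau); rows and columns are then
    normalized alternately. Every operation is smooth, so the result
    is differentiable with respect to the cost.
    """
    n = len(cost)
    plan = [[math.exp(-c / tau) for c in row] for row in cost]
    for _ in range(iters):
        # Normalize each row to sum to 1.
        row_sums = [sum(row) for row in plan]
        plan = [[plan[i][j] / row_sums[i] for j in range(n)] for i in range(n)]
        # Normalize each column to sum to 1.
        col_sums = [sum(plan[i][j] for i in range(n)) for j in range(n)]
        plan = [[plan[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    return plan


# Hypothetical cost matrix: the cheapest matching is 0->0, 1->2, 2->1.
cost = [
    [0.0, 1.0, 1.0],
    [1.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
]
plan = sinkhorn(cost)
assignment = [max(range(3), key=lambda j: plan[i][j]) for i in range(3)]
# assignment == [0, 2, 1]
```

With a small `tau` the plan is close to a hard permutation matrix; a larger `tau` gives a softer plan.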
