Geometry problem solving is a well-recognized testbed for evaluating the high-level multi-modal reasoning capability of deep models. In most existing works, two main geometry problems: calculation and proving, are usually treated as two specific tasks, hindering a deep model to unify its reasoning capability on multiple math tasks. However, in essence, these two tasks have similar problem representations and overlapped math knowledge which can improve the understanding and reasoning ability of a deep model on both two tasks. Therefore, we construct a large-scale Unified Geometry problem benchmark, UniGeo, which contains 4,998 calculation problems and 9,543 proving problems. Each proving problem is annotated with a multi-step proof with reasons and mathematical expressions. The proof can be easily reformulated as a proving sequence that shares the same formats with the annotated program sequence for calculation problems. Naturally, we also present a unified multi-task Geometric Transformer framework, Geoformer, to tackle calculation and proving problems simultaneously in the form of sequence generation, which finally shows the reasoning ability can be improved on both two tasks by unifying formulation. Furthermore, we propose a Mathematical Expression Pretraining (MEP) method that aims to predict the mathematical expressions in the problem solution, thus improving the Geoformer model. Experiments on the UniGeo demonstrate that our proposed Geoformer obtains state-of-the-art performance by outperforming task-specific model NGS with over 5.6% and 3.2% accuracies on calculation and proving problems, respectively.
translated by 谷歌翻译
注意机制在视力识别方面取得了巨大成功。许多作品致力于提高注意力机制的有效性,该机制精心设计了注意操作员的结构。这些作品需要大量实验才能在场景变化时挑选最佳设置,这会消耗大量时间和计算资源。此外,神经网络通常包含许多网络层,并且大多数研究通常使用相同的注意模块来增强不同的网络层,从而阻碍了自我发挥机制的性能的进一步改善。为了解决上述问题,我们提出了一个自我发挥的模块SEM。基于注意模块和替代注意操作员的输入信息,SEM可以自动决定选择和集成注意操作员以计算注意力图。 SEM的有效性通过广泛使用的基准数据集和流行的自我发挥网络的广泛实验来证明。
translated by 谷歌翻译
最近,提出了许多有效的自我发场模块来启动模型性能,通过利用计算机视觉中的卷积神经网络的内部信息。总的来说,许多以前的作品都忽略了考虑自我发挥机制的合并策略的设计,因为它们采用了全球平均水平,这是理所当然的,这阻碍了自我发挥机制的表现进一步改善。但是,我们从经验上发现并验证了一种现象,即全球最大速度和全球最小程度的简单线性组合可以产生匹配或超过全球平均平均水平的性能的合并策略。基于这一经验观察,我们提出了一个简单的自我发场模块SPENET,该模块Spenet采用了基于全球最大功能和全球最小程度的自适应汇总策略,以及用于生成注意力图的轻量级模块。 Spenet的有效性通过广泛使用的基准数据集和流行的自我注意力网络进行了广泛的实验证明。
translated by 谷歌翻译
自动数学问题解决最近引起了越来越多的关注作为长期的AI基准。在本文中,我们专注于解决几何问题,这需要全面了解文本描述,视觉图和定理知识。但是,现有方法高度依赖于手工规则,并且仅在小规模数据集上进行评估。因此,我们提出了一个几何问题应答DataSet GeoQA,其中包含4,998个几何问题,其中具有相应的注释程序,其说明了给定问题的解决过程。与另一个公开的数据集GEOS相比,GeoQA是25倍,程序注释可以为未来的明确和解释数值推理提供实际测试平台。此外,我们通过全面解析多媒体信息和产生可解释程序来引入神经几何求解器(NGS)来解决几何问题。我们进一步为NGS添加了多个自我监督的辅助任务,以增强跨模型语义表示。关于GeoQA的广泛实验验证了我们提出的NGS和辅助任务的有效性。然而,结果仍然明显低于人类性能,这为未来的研究留下了大型空间。我们的基准和代码在https://github.com/chen-judge/geoqa发布。
translated by 谷歌翻译
Artificial Intelligence (AI) has become commonplace to solve routine everyday tasks. Because of the exponential growth in medical imaging data volume and complexity, the workload on radiologists is steadily increasing. We project that the gap between the number of imaging exams and the number of expert radiologist readers required to cover this increase will continue to expand, consequently introducing a demand for AI-based tools that improve the efficiency with which radiologists can comfortably interpret these exams. AI has been shown to improve efficiency in medical-image generation, processing, and interpretation, and a variety of such AI models have been developed across research labs worldwide. However, very few of these, if any, find their way into routine clinical use, a discrepancy that reflects the divide between AI research and successful AI translation. To address the barrier to clinical deployment, we have formed MONAI Consortium, an open-source community which is building standards for AI deployment in healthcare institutions, and developing tools and infrastructure to facilitate their implementation. This report represents several years of weekly discussions and hands-on problem solving experience by groups of industry experts and clinicians in the MONAI Consortium. We identify barriers between AI-model development in research labs and subsequent clinical deployment and propose solutions. Our report provides guidance on processes which take an imaging AI model from development to clinical implementation in a healthcare institution. We discuss various AI integration points in a clinical Radiology workflow. We also present a taxonomy of Radiology AI use-cases. Through this report, we intend to educate the stakeholders in healthcare and AI (AI researchers, radiologists, imaging informaticists, and regulators) about cross-disciplinary challenges and possible solutions.
translated by 谷歌翻译
Smart City applications, such as traffic monitoring and disaster response, often use swarms of intelligent and cooperative drones to efficiently collect sensor data over different areas of interest and time spans. However, when the required sensing becomes spatio-temporally large and varying, a collective arrangement of sensing tasks to a large number of battery-restricted and distributed drones is challenging. To address this problem, we introduce a scalable and energy-aware model for planning and coordination of spatio-temporal sensing. The coordination model is built upon a decentralized multi-agent collective learning algorithm (EPOS) to ensure scalability, resilience, and flexibility that existing approaches lack of. Experimental results illustrate the outstanding performance of the proposed method compared to state-of-the-art methods. Analytical results contribute a deeper understanding of how coordinated mobility of drones influences sensing performance. This novel coordination solution is applied to traffic monitoring using real-world data to demonstrate a $46.45\%$ more accurate and $2.88\%$ more efficient detection of vehicles as the number of drones become a scarce resource.
translated by 谷歌翻译
Unbiased learning to rank (ULTR) studies the problem of mitigating various biases from implicit user feedback data such as clicks, and has been receiving considerable attention recently. A popular ULTR approach for real-world applications uses a two-tower architecture, where click modeling is factorized into a relevance tower with regular input features, and a bias tower with bias-relevant inputs such as the position of a document. A successful factorization will allow the relevance tower to be exempt from biases. In this work, we identify a critical issue that existing ULTR methods ignored - the bias tower can be confounded with the relevance tower via the underlying true relevance. In particular, the positions were determined by the logging policy, i.e., the previous production model, which would possess relevance information. We give both theoretical analysis and empirical results to show the negative effects on relevance tower due to such a correlation. We then propose three methods to mitigate the negative confounding effects by better disentangling relevance and bias. Empirical results on both controlled public datasets and a large-scale industry dataset show the effectiveness of the proposed approaches.
translated by 谷歌翻译
Benefiting from its single-photon sensitivity, single-photon avalanche diode (SPAD) array has been widely applied in various fields such as fluorescence lifetime imaging and quantum computing. However, large-scale high-fidelity single-photon imaging remains a big challenge, due to the complex hardware manufacture craft and heavy noise disturbance of SPAD arrays. In this work, we introduce deep learning into SPAD, enabling super-resolution single-photon imaging over an order of magnitude, with significant enhancement of bit depth and imaging quality. We first studied the complex photon flow model of SPAD electronics to accurately characterize multiple physical noise sources, and collected a real SPAD image dataset (64 $\times$ 32 pixels, 90 scenes, 10 different bit depth, 3 different illumination flux, 2790 images in total) to calibrate noise model parameters. With this real-world physical noise model, we for the first time synthesized a large-scale realistic single-photon image dataset (image pairs of 5 different resolutions with maximum megapixels, 17250 scenes, 10 different bit depth, 3 different illumination flux, 2.6 million images in total) for subsequent network training. To tackle the severe super-resolution challenge of SPAD inputs with low bit depth, low resolution, and heavy noise, we further built a deep transformer network with a content-adaptive self-attention mechanism and gated fusion modules, which can dig global contextual features to remove multi-source noise and extract full-frequency details. We applied the technique on a series of experiments including macroscopic and microscopic imaging, microfluidic inspection, and Fourier ptychography. The experiments validate the technique's state-of-the-art super-resolution SPAD imaging performance, with more than 5 dB superiority on PSNR compared to the existing methods.
translated by 谷歌翻译
Artificial Intelligence (AI) and its applications have sparked extraordinary interest in recent years. This achievement can be ascribed in part to advances in AI subfields including Machine Learning (ML), Computer Vision (CV), and Natural Language Processing (NLP). Deep learning, a sub-field of machine learning that employs artificial neural network concepts, has enabled the most rapid growth in these domains. The integration of vision and language has sparked a lot of attention as a result of this. The tasks have been created in such a way that they properly exemplify the concepts of deep learning. In this review paper, we provide a thorough and an extensive review of the state of the arts approaches, key models design principles and discuss existing datasets, methods, their problem formulation and evaluation measures for VQA and Visual reasoning tasks to understand vision and language representation learning. We also present some potential future paths in this field of research, with the hope that our study may generate new ideas and novel approaches to handle existing difficulties and develop new applications.
translated by 谷歌翻译
One of the key challenges in deploying RL to real-world applications is to adapt to variations of unknown environment contexts, such as changing terrains in robotic tasks and fluctuated bandwidth in congestion control. Existing works on adaptation to unknown environment contexts either assume the contexts are the same for the whole episode or assume the context variables are Markovian. However, in many real-world applications, the environment context usually stays stable for a stochastic period and then changes in an abrupt and unpredictable manner within an episode, resulting in a segment structure, which existing works fail to address. To leverage the segment structure of piecewise stable context in real-world applications, in this paper, we propose a \textit{\textbf{Se}gmented \textbf{C}ontext \textbf{B}elief \textbf{A}ugmented \textbf{D}eep~(SeCBAD)} RL method. Our method can jointly infer the belief distribution over latent context with the posterior over segment length and perform more accurate belief context inference with observed data within the current context segment. The inferred belief context can be leveraged to augment the state, leading to a policy that can adapt to abrupt variations in context. We demonstrate empirically that SeCBAD can infer context segment length accurately and outperform existing methods on a toy grid world environment and Mujuco tasks with piecewise-stable context.
translated by 谷歌翻译