Document Visual Question Answering (DocVQA) refers to the task of answering questions from document images. Existing work on DocVQA only considers single-page documents. However, in real scenarios documents are mostly composed of multiple pages that should be processed together. In this work we extend DocVQA to the multi-page scenario. For that, we first create a new dataset, MP-DocVQA, where questions are posed over multi-page documents instead of single pages. Second, we propose a new hierarchical method, Hi-VT5, based on the T5 architecture, that overcomes the limitations of current methods in processing long multi-page documents. The proposed method is based on a hierarchical transformer architecture where the encoder summarizes the most relevant information of every page and the decoder then takes this summarized information to generate the final answer. Through extensive experimentation, we demonstrate that our method is able, in a single stage, to answer the questions and provide the page that contains the relevant information to find the answer, which can be used as a kind of explainability measure.
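In this report we present the results of the ICDAR 2021 edition of the Document Visual Question Answering challenges. This edition complements the previous tasks on Single Document VQA and Document Collection VQA with a newly introduced task on Infographics VQA. Infographics VQA is based on a new dataset of more than 5,000 infographic images and 30,000 question-answer pairs. The winning methods scored 0.6120 ANLS on the Infographics VQA task, 0.7743 ANLSL on the Document Collection VQA task, and 0.8705 ANLS on Single Document VQA. We present a summary of the datasets used for each task, a description of each submitted method, and the results and an analysis of their performance. A summary of the progress made on Single Document VQA since the first edition of the DocVQA 2020 challenge is also presented.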
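For readers unfamiliar with the metric, ANLS (Average Normalized Levenshtein Similarity) can be sketched as follows. This is a minimal illustration, assuming the commonly used definition: a prediction scores one minus its normalized edit distance to the closest ground-truth answer, or zero if that distance reaches a threshold of 0.5. The function names, the lowercasing, and the whitespace stripping are assumptions made for this sketch, not the challenge's official evaluation code.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def anls(predictions, ground_truths, threshold=0.5):
    """Average over questions of the best thresholded similarity.

    predictions:   list of predicted answer strings, one per question
    ground_truths: list of lists of accepted answer strings
    threshold:     assumed cut-off on the normalized distance (0.5 here)
    """
    scores = []
    for pred, answers in zip(predictions, ground_truths):
        pred_norm = pred.strip().lower()
        best = 0.0
        for ans in answers:
            ans_norm = ans.strip().lower()
            longest = max(len(pred_norm), len(ans_norm))
            # Normalized Levenshtein distance in [0, 1].
            nl = levenshtein(pred_norm, ans_norm) / longest if longest else 0.0
            similarity = 1.0 - nl if nl < threshold else 0.0
            best = max(best, similarity)
        scores.append(best)
    return sum(scores) / len(scores) if scores else 0.0
```

The threshold lets near-miss answers (for example, OCR slips) earn partial credit while unrelated answers earn none.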
Hyperspectral Imaging (HSI) provides detailed spectral information and has been utilised in many real-world applications. This work introduces an HSI dataset of building facades in a light industry environment with the aim of classifying different building materials in a scene. The dataset is called the Light Industrial Building HSI (LIB-HSI) dataset. This dataset consists of nine categories and 44 classes. In this study, we investigated deep learning based semantic segmentation algorithms on RGB and hyperspectral images to classify various building materials, such as timber, brick and concrete.
Scene text images have different shapes and are subject to various distortions, e.g. perspective distortions. To handle these challenges, state-of-the-art methods rely on a rectification network, which is connected to the text recognition network. They form a linear pipeline which applies text rectification to all input images, even to images that can be recognized without it. Undoubtedly, the rectification network improves the overall text recognition performance. However, in some cases, the rectification network generates unnecessary distortions on images, resulting in incorrect predictions for images that would otherwise have been recognized correctly. In order to alleviate the unnecessary distortions, the portmanteauing of features is proposed. The portmanteau feature, inspired by the portmanteau word, is a feature containing information from both the original text image and the rectified image. To generate the portmanteau feature, a non-linear input pipeline with a block matrix initialization is presented. In this work, the transformer is chosen as the recognition network due to its utilization of attention and inherent parallelism, which can effectively handle the portmanteau feature. The proposed method is examined on 6 benchmarks and compared with 13 state-of-the-art methods. The experimental results show that the proposed method outperforms the state-of-the-art methods on several of the benchmarks.
This paper computationally demonstrates a sharp improvement in predictive performance for $k$ nearest neighbors thanks to an efficient forward selection of the predictor variables. We show on both simulated and real-world data that this novel approach repeatedly outperforms regression models under stepwise selection.
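The general idea of forward selection for $k$ nearest neighbors can be sketched as below. This is only an illustration under stated assumptions: the abstract does not give the selection criterion, the error estimate, or the stopping rule, so the sketch uses leave-one-out mean squared error and stops when no remaining variable lowers it. The names `knn_loo_mse` and `forward_select` are hypothetical and the code is not the authors' implementation.

```python
import random


def knn_loo_mse(X, y, features, k=3):
    """Leave-one-out MSE of k-NN regression restricted to `features`."""
    n = len(X)
    total = 0.0
    for i in range(n):
        # Squared Euclidean distance from point i to every other point,
        # computed only over the chosen feature columns.
        dists = []
        for j in range(n):
            if j == i:
                continue
            d = sum((X[i][f] - X[j][f]) ** 2 for f in features)
            dists.append((d, j))
        dists.sort()
        neighbours = [j for _, j in dists[:k]]
        pred = sum(y[j] for j in neighbours) / len(neighbours)
        total += (y[i] - pred) ** 2
    return total / n


def forward_select(X, y, k=3):
    """Greedily add the feature that most lowers the LOO error.

    Stops as soon as no remaining feature improves the error
    (an assumed stopping rule, chosen for simplicity).
    """
    remaining = set(range(len(X[0])))
    selected = []
    best_err = float("inf")
    while remaining:
        err, feat = min(
            (knn_loo_mse(X, y, selected + [f], k), f) for f in remaining
        )
        if err >= best_err:
            break
        selected.append(feat)
        remaining.remove(feat)
        best_err = err
    return selected, best_err


if __name__ == "__main__":
    # Synthetic data: only column 0 carries signal, the rest is noise.
    rng = random.Random(0)
    X = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(60)]
    y = [2.0 * row[0] + rng.gauss(0, 0.05) for row in X]
    print(forward_select(X, y))
```

Because irrelevant variables distort the distances that $k$ nearest neighbors relies on, leaving them out tends to help the method more than it helps a linear model, which is the intuition behind pairing the two.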
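Subseasonal forecasting (predicting temperature and precipitation 2 to 6 weeks ahead) is critical for effective water allocation, wildfire management, and drought and flood mitigation. Recent international research efforts have advanced the subseasonal capabilities of operational dynamical models, yet temperature and precipitation prediction skill remains poor, partly due to stubborn errors in representing atmospheric dynamics and physics inside dynamical models. To counter these errors, we introduce an adaptive bias correction (ABC) method that combines state-of-the-art dynamical forecasts with observations using machine learning. When applied to the leading subseasonal model from the European Centre for Medium-Range Weather Forecasts (ECMWF), ABC improves temperature forecasting skill by 60-90% and precipitation forecasting skill by 40-69% in the contiguous U.S. We couple these performance improvements with a practical workflow, based on Cohort Shapley, for explaining ABC skill gains and identifying higher-skill windows of opportunity based on specific climate conditions.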
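Iterating between training and evaluating a machine learning model is an important process for improving its performance. However, while teachable interfaces enable blind users to train and test an object recognizer with photos taken in their distinctive environment, the accessibility of the training iteration and evaluation steps has received little attention. Iteration assumes visual inspection of the training photos, which is inaccessible to blind users. We explore this challenge through MyCam, a mobile app that incorporates automatically estimated descriptors for non-visual access to the photos in users' training sets. We explore how blind participants (N=12) interact with MyCam and the descriptors through an evaluation study in their homes. We demonstrate that the real-time photo-level descriptors enabled blind users to reduce the number of photos with cropped objects, and that participants could add more variation by iterating through and assessing the quality of their training sets. In addition, participants found the app simple to use, indicating that they could train it effectively and that the descriptors were useful. However, the subjective responses were not reflected in the performance of their models, partly due to little variation in the training photos and cluttered backgrounds.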
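Drori et al. (2022) report that "A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level ... [It] automatically answers 81% of university-level mathematics problems." The system they describe is indeed impressive; however, the above description is very much overstated. The work of solving the problems is done not by a neural network but by the symbolic algebra package Sympy. Problems of various formats are excluded from consideration. The so-called "explanations" are just rewordings of lines of code. Answers are marked as correct even when they are not in the form specified in the problem. Most seriously, it seems that in many cases the system uses the correct answer given in the test corpus to guide its path to solving the problem.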