Each grid block in a 3D geological model needs a rock type that represents all the physical and chemical properties of that block. The properties that classify rock types are lithology, permeability, and capillary pressure. Scientists and engineers have determined these properties using conventional laboratory measurements, which embed destructive methods into the sample or alter some of its properties (i.e., wettability, permeability, and porosity) because the measurement processes include sample crushing, fluid flow, or fluid saturation. Recently, digital rock physics (DRP) has emerged to quantify these properties from micro-computed tomography (uCT) and magnetic resonance imaging (MRI) images. However, the literature has not yet attempted rock typing in a fully digital context. We propose performing digital rock typing (DRT) by: (1) integrating the latest DRP advances in a novel process that honors digital rock property determination; (2) digitalizing the latest rock typing approaches in carbonate; and (3) introducing a novel carbonate rock typing process that exploits computer vision capabilities to provide more insight into the heterogeneous carbonate rock texture.
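To make concrete the kind of conventional, property-based typing that DRT aims to digitalize, the sketch below bins grid blocks into rock types with the classical flow-zone-indicator (FZI) relation. It is only an illustration under stated assumptions (capillary pressure is omitted, the bin count is arbitrary), not the image-based DRT process proposed in the paper.

```python
import numpy as np

def fzi_rock_types(porosity, permeability_md, n_types=4):
    """Assign a rock-type label per grid block from porosity (fraction) and
    permeability (millidarcy) via the flow-zone indicator."""
    porosity = np.asarray(porosity, dtype=float)
    permeability_md = np.asarray(permeability_md, dtype=float)
    rqi = 0.0314 * np.sqrt(permeability_md / porosity)  # reservoir quality index (um)
    phi_z = porosity / (1.0 - porosity)                 # pore-to-matrix volume ratio
    fzi = rqi / phi_z                                   # flow-zone indicator
    # Split log10(FZI) into quantile bins: one rock-type label per grid block.
    edges = np.quantile(np.log10(fzi), np.linspace(0.0, 1.0, n_types + 1))
    return np.clip(np.digitize(np.log10(fzi), edges[1:-1]), 0, n_types - 1)

# Example: high-quality and tight blocks usually land in different types.
types = fzi_rock_types([0.25, 0.22, 0.08, 0.05], [500.0, 350.0, 2.0, 0.5])
```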
Permeability has a dominant influence on the flow properties of natural fluids. Lattice Boltzmann simulators determine permeability from nano- and micro-pore networks. These simulators carry out millions of flow-dynamics calculations, with accumulated errors and high consumption of computing power. To predict permeability efficiently and consistently, we propose a morphology decoder: a parallel and serial flow reconstruction of machine-learning-segmented 3D micro-computed tomography and nuclear magnetic resonance images. For 3D vision, we introduce controllable-measurable-volume as a new supervised segmentation, in which a unique set of voxel intensities corresponds to grain and pore-throat sizes. The morphology decoder demarks and aggregates the morphological boundaries in a novel way to produce permeability. The morphology decoder method consists of five novel processes described in this paper: (1) geometrical 3D permeability, (2) machine-learning-guided 3D property recognition of rock morphology, (3) a 3D image property integration model for permeability, (4) an MRI permeability imager, and (5) the morphology decoder, the process that integrates the other four novel processes.
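The abstract does not specify the decoder's internals; as a hedged stand-in for the general idea of turning an intensity-segmented 3D volume into a geometric permeability estimate, the sketch below thresholds voxel intensities into pore and grain and applies a Kozeny-Carman relation. The threshold, the constant c, and the surface-area estimate are assumptions, and this is not the paper's morphology decoder.

```python
import numpy as np

def segment_by_intensity(volume, grain_threshold):
    """Label voxels as pore (True) or grain (False) by intensity thresholding."""
    return np.asarray(volume) < grain_threshold

def kozeny_carman_permeability(pore_mask, voxel_size_m, c=5.0):
    """Rough geometric permeability estimate (m^2) from a binary pore mask via
    k = phi^3 / (c * (1 - phi)^2 * S^2), with S the surface area per unit
    solid volume estimated from pore/grain voxel-face counts."""
    phi = pore_mask.mean()
    # Count pore-grain interfaces along each axis to estimate surface area.
    faces = sum(np.sum(np.diff(pore_mask.astype(np.int8), axis=ax) != 0)
                for ax in range(3))
    surface_area = faces * voxel_size_m ** 2
    solid_volume = (1.0 - phi) * pore_mask.size * voxel_size_m ** 3
    S = surface_area / solid_volume  # specific surface (1/m)
    return phi ** 3 / (c * (1.0 - phi) ** 2 * S ** 2)
```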
Deciding on the best machine learning algorithm is not easy, yet we have to choose one. To help future researchers, we describe in this paper the state of the art of the best-performing algorithms. We built a synthetic dataset and performed supervised machine learning with five different algorithms. For heterogeneity, we identified Random Forest, among others, as the best algorithm.
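A minimal sketch of this kind of comparison is shown below, using scikit-learn cross-validation on a synthetic classification dataset. The abstract does not name the five algorithms or describe its dataset, so the model set and dataset parameters here are hypothetical choices for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Hypothetical synthetic dataset standing in for the paper's own data.
X, y = make_classification(n_samples=2000, n_features=20, n_informative=8,
                           random_state=0)

models = {
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
    "logistic_regression": LogisticRegression(max_iter=2000),
    "svm_rbf": SVC(),
    "knn": KNeighborsClassifier(),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```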
Automated image processing algorithms can improve the quality, efficiency, and consistency of classifying heterogeneous carbonate rock morphology and can handle massive numbers of data and images seamlessly. Geologists face difficulty in setting the direction of the best method for determining petrophysical properties from rock images, micro-computed tomography (uCT), or magnetic resonance imaging (MRI). Most successful work comes from homogeneous rocks, focuses on 2D images with less attention to 3D, and requires numerical simulation. Currently, image analysis methods converge to three approaches: image processing, artificial intelligence, and image processing combined with artificial intelligence. In this work, we propose two methods to determine porosity from 3D uCT and MRI images: an image processing method with an Image Resolution Optimized Gaussian Algorithm (IROGA), and an advanced image recognition method enabled by Machine Learning Difference of Gaussian Random Forest (MLDGRF). We built reference 3D micro models and collected images to calibrate the IROGA and MLDGRF methods. To evaluate the predictive capability of these calibrated methods, we ran them on 3D uCT and MRI images of natural heterogeneous carbonate rock. We measured the porosity and lithology of the carbonate rock with three industry-standard methods, respectively, as reference values. Notably, IROGA and MLDGRF produced accuracies of 96.2% and 97.1%, and 91.7% and 94.4%, respectively, in comparison with the three experimental measurements. We measured the limestone and pyrite reference values using two methods, X-ray powder diffraction and grain-density measurements. MLDGRF produced lithology (limestone and pyrite) volumes with 97.7% accuracy.
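Neither IROGA nor MLDGRF is fully specified by the abstract. As a rough, hedged sketch of their common ingredient, difference-of-Gaussian (DoG) filtering of a 3D image followed by thresholding, the snippet below estimates porosity as the pore-voxel fraction. The sigma values and the below-mean threshold are assumptions; MLDGRF would additionally feed such filter responses to a Random Forest classifier, which is not reproduced here.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_porosity(volume, sigma_small=1.0, sigma_large=4.0):
    """Band-pass a 3D image with a difference of Gaussians, threshold it, and
    report the pore-voxel fraction as a porosity estimate."""
    volume = np.asarray(volume, dtype=float)
    dog = gaussian_filter(volume, sigma_small) - gaussian_filter(volume, sigma_large)
    pore_mask = dog < dog.mean()  # assumption: pore voxels respond below the mean
    return pore_mask.mean(), pore_mask  # porosity estimate and the segmentation
```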
Retrieval-augmented in-context learning has emerged as a powerful approach for addressing knowledge-intensive tasks using frozen language models (LM) and retrieval models (RM). Existing work has combined these in simple "retrieve-then-read" pipelines in which the RM retrieves passages that are inserted into the LM prompt. To begin to fully realize the potential of frozen LMs and RMs, we propose Demonstrate-Search-Predict (DSP), a framework that relies on passing natural language texts in sophisticated pipelines between an LM and an RM. DSP can express high-level programs that bootstrap pipeline-aware demonstrations, search for relevant passages, and generate grounded predictions, systematically breaking down problems into small transformations that the LM and RM can handle more reliably. We have written novel DSP programs for answering questions in open-domain, multi-hop, and conversational settings, establishing in early evaluations new state-of-the-art in-context learning results and delivering 37-200%, 8-40%, and 80-290% relative gains against vanilla LMs, a standard retrieve-then-read pipeline, and a contemporaneous self-ask pipeline, respectively.
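A schematic sketch of a Demonstrate-Search-Predict style pipeline is shown below, built around a frozen language model `lm(prompt) -> str` and retrieval model `rm(query, k) -> list[str]`. These callables, the prompt formats, and the exact-match check are hypothetical stand-ins to convey the control flow; they are not the DSP framework's actual API.

```python
def render(question, passages, demos):
    """Format demonstrations, retrieved passages, and the question into one prompt."""
    demo_txt = "\n\n".join(f"Q: {q}\nContext: {p}\nA: {a}" for q, p, a in demos)
    return f"{demo_txt}\n\nContext: {' '.join(passages)}\nQ: {question}\nA:"

def search(question, lm, rm, hops=2, k=3):
    """Multi-hop search: the LM writes follow-up queries grounded in earlier passages."""
    passages, query = [], question
    for _ in range(hops):
        passages += rm(query, k=k)
        query = lm(f"Context: {' '.join(passages)}\n"
                   f"Write a follow-up search query for: {question}")
    return passages

def demonstrate(labeled_questions, lm, rm):
    """Bootstrap pipeline-aware demonstrations: keep traces whose answer matches the label."""
    demos = []
    for question, gold in labeled_questions:
        passages = search(question, lm, rm)
        answer = lm(render(question, passages, []))
        if answer.strip().lower() == gold.strip().lower():
            demos.append((question, passages, answer))
    return demos

def predict(question, lm, rm, demos):
    """Generate a grounded answer conditioned on demonstrations and retrieved passages."""
    return lm(render(question, search(question, lm, rm), demos))
```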
Over the last decade, an approach that has gained a lot of popularity to tackle non-parametric testing problems on general (i.e., non-Euclidean) domains is based on the notion of reproducing kernel Hilbert space (RKHS) embedding of probability distributions. The main goal of our work is to understand the optimality of two-sample tests constructed based on this approach. First, we show that the popular MMD (maximum mean discrepancy) two-sample test is not optimal in terms of the separation boundary measured in Hellinger distance. Second, we propose a modification to the MMD test based on spectral regularization by taking into account the covariance information (which is not captured by the MMD test) and prove the proposed test to be minimax optimal with a smaller separation boundary than that achieved by the MMD test. Third, we propose an adaptive version of the above test which involves a data-driven strategy to choose the regularization parameter and show the adaptive test to be almost minimax optimal up to a logarithmic factor. Moreover, our results hold for the permutation variant of the test where the test threshold is chosen elegantly through the permutation of the samples. Through numerical experiments on synthetic and real-world data, we demonstrate the superior performance of the proposed test in comparison to the MMD test.
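For concreteness, below is a minimal sketch of the baseline being improved upon: the unbiased MMD^2 estimator with a Gaussian kernel and a permutation-calibrated threshold. The bandwidth and permutation count are arbitrary choices here, and the paper's spectrally regularized and adaptive tests are not reproduced.

```python
import numpy as np

def gaussian_kernel(A, B, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and B."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / (2 * bandwidth**2))

def mmd2_unbiased(X, Y, bandwidth=1.0):
    """Unbiased estimate of the squared MMD between samples X and Y."""
    m, n = len(X), len(Y)
    Kxx, Kyy, Kxy = (gaussian_kernel(X, X, bandwidth),
                     gaussian_kernel(Y, Y, bandwidth),
                     gaussian_kernel(X, Y, bandwidth))
    term_x = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
    term_y = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    return term_x + term_y - 2 * Kxy.mean()

def mmd_permutation_test(X, Y, bandwidth=1.0, n_perm=200, alpha=0.05, seed=0):
    """Reject H0: P = Q if the observed MMD^2 exceeds the permutation quantile."""
    rng = np.random.default_rng(seed)
    observed = mmd2_unbiased(X, Y, bandwidth)
    Z = np.vstack([X, Y])
    stats = []
    for _ in range(n_perm):
        idx = rng.permutation(len(Z))
        stats.append(mmd2_unbiased(Z[idx[:len(X)]], Z[idx[len(X):]], bandwidth))
    threshold = np.quantile(stats, 1 - alpha)
    return observed > threshold, observed, threshold
```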
When annotators label data, a key metric for quality assurance is inter-annotator agreement (IAA): the extent to which annotators agree on their labels. Though many IAA measures exist for simple categorical and ordinal labeling tasks, relatively little work has considered more complex labeling tasks, such as structured, multi-object, and free-text annotations. Krippendorff's alpha, best known for use with simpler labeling tasks, does have a distance-based formulation with broader applicability, but little work has studied its efficacy and consistency across complex annotation tasks. We investigate the design and evaluation of IAA measures for complex annotation tasks, with evaluation spanning seven diverse tasks: image bounding boxes, image keypoints, text sequence tagging, ranked lists, free text translations, numeric vectors, and syntax trees. We identify the difficulty of interpretability and the complexity of choosing a distance function as key obstacles in applying Krippendorff's alpha generally across these tasks. We propose two novel, more interpretable measures, showing they yield more consistent IAA measures across tasks and annotation distance functions.
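As a reference point, the distance-based form of Krippendorff's alpha is alpha = 1 - D_o / D_e, the ratio of observed to expected disagreement under an arbitrary distance d. The sketch below is a simplified, hedged version of that idea (it averages pairwise distances directly rather than using the coefficient's exact per-item weighting); the distance callable is where the task-specific complexity studied in the paper enters.

```python
import itertools
import numpy as np

def distance_based_alpha(labels_by_item, distance):
    """Simplified Krippendorff-style agreement: 1 - D_o / D_e.
    labels_by_item: list of lists; labels_by_item[i] holds the labels different
    annotators gave to item i (missing annotations simply omitted).
    distance: callable d(a, b) >= 0 with d(a, a) == 0, e.g. squared difference
    for numbers, 1 - IoU for bounding boxes, or an edit distance for trees."""
    # Observed disagreement: label pairs on the same item.
    within = [distance(a, b)
              for item in labels_by_item if len(item) > 1
              for a, b in itertools.combinations(item, 2)]
    # Expected disagreement: label pairs pooled across all items.
    pooled = [lab for item in labels_by_item for lab in item]
    between = [distance(a, b) for a, b in itertools.combinations(pooled, 2)]
    return 1.0 - np.mean(within) / np.mean(between)

# Example with a numeric distance: near-perfect agreement yields alpha close to 1.
alpha = distance_based_alpha([[1.0, 1.1], [4.0, 4.2], [9.0, 8.9]],
                             lambda a, b: (a - b) ** 2)
```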
Generating a chain of thought (CoT) can increase large language model (LLM) performance on a wide range of tasks. Zero-shot CoT evaluations, however, have been conducted primarily on logical tasks (e.g. arithmetic, commonsense QA). In this paper, we perform a controlled evaluation of zero-shot CoT across two sensitive domains: harmful questions and stereotype benchmarks. We find that using zero-shot CoT reasoning in a prompt can significantly increase a model's likelihood to produce undesirable output. Without future advances in alignment or explicit mitigation instructions, zero-shot CoT should be avoided on tasks where models can make inferences about marginalized groups or harmful topics.
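For readers unfamiliar with the setup, zero-shot CoT typically just appends a reasoning trigger to the prompt, as in the minimal sketch below; the exact templates used in the paper's evaluation are not reproduced here.

```python
def zero_shot_cot_prompt(question: str,
                         trigger: str = "Let's think step by step.") -> str:
    """Build a zero-shot chain-of-thought prompt by appending a reasoning
    trigger phrase to the question."""
    return f"Q: {question}\nA: {trigger}"
```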
Multi-object state estimation is a fundamental problem for robotic applications where a robot must interact with other moving objects. Typically, other objects' relevant state features are not directly observable, and must instead be inferred from observations. Particle filtering can perform such inference given approximate transition and observation models. However, these models are often unknown a priori, yielding a difficult parameter estimation problem since observations jointly carry transition and observation noise. In this work, we consider learning maximum-likelihood parameters using particle methods. Recent methods addressing this problem typically differentiate through time in a particle filter, which requires workarounds to the non-differentiable resampling step, that yield biased or high variance gradient estimates. By contrast, we exploit Fisher's identity to obtain a particle-based approximation of the score function (the gradient of the log likelihood) that yields a low variance estimate while only requiring stepwise differentiation through the transition and observation models. We apply our method to real data collected from autonomous vehicles (AVs) and show that it learns better models than existing techniques and is more stable in training, yielding an effective smoother for tracking the trajectories of vehicles around an AV.
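The identity the abstract alludes to is Fisher's identity, written out below for a state-space model with transition density f_theta and observation density g_theta. A weighted-particle approximation of the right-hand side yields the score estimate, requiring stepwise gradients of f_theta and g_theta only, with no differentiation through the resampling step.

```latex
% Fisher's identity: the score of the marginal likelihood equals the posterior
% expectation of the complete-data score.
\nabla_\theta \log p_\theta(y_{1:T})
  = \mathbb{E}_{p_\theta(x_{1:T}\mid y_{1:T})}
      \bigl[\nabla_\theta \log p_\theta(x_{1:T}, y_{1:T})\bigr]
  = \mathbb{E}\Bigl[\sum_{t=1}^{T}
      \nabla_\theta \log f_\theta(x_t \mid x_{t-1})
    + \nabla_\theta \log g_\theta(y_t \mid x_t)
    \Bigm| y_{1:T}\Bigr]
```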
Neural information retrieval (IR) systems have progressed rapidly in recent years, in large part due to the release of publicly available benchmarking tasks. Unfortunately, some dimensions of this progress are illusory: the majority of the popular IR benchmarks today focus exclusively on downstream task accuracy and thus conceal the costs incurred by systems that trade away efficiency for quality. Latency, hardware cost, and other efficiency considerations are paramount to the deployment of IR systems in user-facing settings. We propose that IR benchmarks structure their evaluation methodology to include not only metrics of accuracy, but also efficiency considerations such as query latency and the corresponding cost budget for a reproducible hardware setting. For the popular IR benchmarks MS MARCO and XOR-TyDi, we show how the best choice of IR system varies according to how these efficiency considerations are chosen and weighed. We hope that future benchmarks will adopt these guidelines toward more holistic IR evaluation.
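As one concrete way such an efficiency metric could be operationalized, the hedged sketch below reports mean and p95 per-query latency for any retrieval callable; the function name and warmup count are illustrative, and a full benchmark would also fix and report the hardware setting and cost budget.

```python
import statistics
import time

def measure_query_latency(search_fn, queries, warmup=5):
    """Report mean and p95 per-query latency (ms) for a retrieval callable."""
    for q in queries[:warmup]:
        search_fn(q)  # warm caches before timing
    latencies = []
    for q in queries:
        start = time.perf_counter()
        search_fn(q)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return {"mean_ms": statistics.mean(latencies),
            "p95_ms": statistics.quantiles(latencies, n=20)[18]}
```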