Code generation models have achieved impressive performance. However, they tend to be brittle as slight edits to a prompt could lead to very different generations; these robustness properties, critical for user experience when deployed in real-life applications, are not well understood. Most existing works on robustness in text or code tasks have focused on classification, while robustness in generation tasks is an uncharted area and to date there is no comprehensive benchmark for robustness in code generation. In this paper, we propose ReCode, a comprehensive robustness evaluation benchmark for code generation models. We customize over 30 transformations specifically for code on docstrings, function and variable names, code syntax, and code format. They are carefully designed to be natural in real-life coding practice, preserve the original semantic meaning, and thus provide multifaceted assessments of a model's robustness performance. With human annotators, we verified that over 90% of the perturbed prompts do not alter the semantic meaning of the original prompt. In addition, we define robustness metrics for code generation models considering the worst-case behavior under each type of perturbation, taking advantage of the fact that executing the generated code can serve as objective evaluation. We demonstrate ReCode on SOTA models using HumanEval, MBPP, as well as function completion tasks derived from them. Interesting observations include: better robustness for CodeGen over InCoder and GPT-J; models are most sensitive to syntax perturbations; more challenging robustness evaluation on MBPP over HumanEval.
translated by 谷歌翻译
Individual particle rotation and displacement were measured in triaxial tests on transparent sand stabilized with geogrid simulants. The Cellpose U-Net model, originally developed to segment biological cells, was trained to segment images of fused quartz particles. The Score-CAM metric from the field of Explainable AI was used to validate the application of Cellpose to segment particles of fused quartz. These segmented particles were characterized in terms of Fourier shape descriptors and tracked across images. The measured particle displacements in the monotonic triaxial tests correlated with displacement fields from Digital Image Correlation (DIC). In contrast to DIC, the new technique also allows for the measurement of individual particle rotation. The particle rotation measurements were found to be repeatable across different specimens. A state boundary line between probable and improbable particle motions could be identified for a given test based on the measured particle displacements and rotations. The size of the zone of probable motions was used to quantify the effectiveness of the stabilizing inclusions. The results of repeated load tests revealed that the honeycomb inclusions used stabilized the specimens by reducing both particle displacements and rotations.
translated by 谷歌翻译
The vision community has explored numerous pose guided human editing methods due to their extensive practical applications. Most of these methods still use an image-to-image formulation in which a single image is given as input to produce an edited image as output. However, the problem is ill-defined in cases when the target pose is significantly different from the input pose. Existing methods then resort to in-painting or style transfer to handle occlusions and preserve content. In this paper, we explore the utilization of multiple views to minimize the issue of missing information and generate an accurate representation of the underlying human model. To fuse the knowledge from multiple viewpoints, we design a selector network that takes the pose keypoints and texture from images and generates an interpretable per-pixel selection map. After that, the encodings from a separate network (trained on a single image human reposing task) are merged in the latent space. This enables us to generate accurate, precise, and visually coherent images for different editing tasks. We show the application of our network on 2 newly proposed tasks - Multi-view human reposing, and Mix-and-match human image generation. Additionally, we study the limitations of single-view editing and scenarios in which multi-view provides a much better alternative.
translated by 谷歌翻译
Most speech enhancement (SE) models learn a point estimate, and do not make use of uncertainty estimation in the learning process. In this paper, we show that modeling heteroscedastic uncertainty by minimizing a multivariate Gaussian negative log-likelihood (NLL) improves SE performance at no extra cost. During training, our approach augments a model learning complex spectral mapping with a temporary submodel to predict the covariance of the enhancement error at each time-frequency bin. Due to unrestricted heteroscedastic uncertainty, the covariance introduces an undersampling effect, detrimental to SE performance. To mitigate undersampling, our approach inflates the uncertainty lower bound and weights each loss component with their uncertainty, effectively compensating severely undersampled components with more penalties. Our multivariate setting reveals common covariance assumptions such as scalar and diagonal matrices. By weakening these assumptions, we show that the NLL achieves superior performance compared to popular losses including the mean squared error (MSE), mean absolute error (MAE), and scale-invariant signal-to-distortion ratio (SI-SDR).
translated by 谷歌翻译
We introduce a new method for diverse foreground generation with explicit control over various factors. Existing image inpainting based foreground generation methods often struggle to generate diverse results and rarely allow users to explicitly control specific factors of variation (e.g., varying the facial identity or expression for face inpainting results). We leverage contrastive learning with latent codes to generate diverse foreground results for the same masked input. Specifically, we define two sets of latent codes, where one controls a pre-defined factor (``known''), and the other controls the remaining factors (``unknown''). The sampled latent codes from the two sets jointly bi-modulate the convolution kernels to guide the generator to synthesize diverse results. Experiments demonstrate the superiority of our method over state-of-the-arts in result diversity and generation controllability.
translated by 谷歌翻译
与计算机视觉合并的基于无人机的遥感系统(UAV)遥感系统具有协助建筑物建设和灾难管理的潜力,例如地震期间的损害评估。可以通过检查来评估建筑物到地震的脆弱性,该检查考虑到相关组件的预期损害进展以及组件对结构系统性能的贡献。这些检查中的大多数是手动进行的,导致高利用人力,时间和成本。本文提出了一种通过基于无人机的图像数据收集和用于后处理的软件库来自动化这些检查的方法,该方法有助于估算地震结构参数。这里考虑的关键参数是相邻建筑物,建筑计划形状,建筑计划区域,屋顶上的对象和屋顶布局之间的距离。通过使用距离测量传感器以及通过Google Earth获得的数据进行的现场测量,可以验证所提出的方法在估计上述参数估算上述参数方面的准确性。可以从https://uvrsabi.github.io/访问其他详细信息和代码。
translated by 谷歌翻译
本文描述了对象目标导航任务的框架,该任务要求机器人从随机的启动位置查找并移至目标对象类的最接近实例。该框架使用机器人轨迹的历史记录来学习空间关系图(SRG)和图形卷积网络(GCN)基于基于不同语义标记区域的可能性以及这些区域不同对象类别的发生的可能性。为了在评估过程中定位目标对象实例,机器人使用贝叶斯推理和SRG估计可见区域,并使用学习的GCN嵌入来对可见区域进行排名,并选择接下来的区域。
translated by 谷歌翻译
人们最近开始通过社交网站上用户生成的多媒体材料来传达自己的思想和观点。此信息可以是图像,文本,视频或音频。近年来,这种模式的发生频率有所增加。 Twitter是最广泛使用的社交媒体网站之一,它也是最好的地点之一,可以使人们对与蒙基波疾病有关的事件有一种了解。这是因为Twitter上的推文被缩短并经常更新,这两者都促成了平台的角色。这项研究的基本目标是对人们对这种情况的存在的各种反应进行更深入的理解。这项研究重点是找出个人对猴蛋白酶疾病的看法,该疾病介绍了基于CNN和LSTM的混合技术。我们已经考虑了用户推文的所有三个可能的极性:正,负和中立。使用CNN和LSTM构建的架构来确定预测模型的准确性。推荐模型的准确性在Monkeypox Tweet数据集上为94%。其他性能指标(例如准确性,召回和F1得分)也用于测试我们的模型和最大程度和资源有效的方式。然后将发现与更传统的机器学习方法进行比较。这项研究的发现有助于提高对普通人群中蒙基托感染的认识。
translated by 谷歌翻译
有效的视觉在延迟预算下的精度最大化。这些作品一次评估脱机准确性,一次是一张图像。但是,诸如自动驾驶之类的实时视觉应用在流媒体设置中运行,在这些设置中,地面真相在推理开始和终点之间会发生变化。这会导致明显的准确性下降。因此,最近提出的一项旨在最大程度地提高流媒体设置准确性的工作。在本文中,我们建议在每个环境环境中最大化流的准确性。我们认为场景难度会影响初始(离线)精度差异,而场景中的障碍物位移会影响后续的准确性降解。我们的方法章鱼使用这些方案属性来选择在测试时最大化流量准确性的配置。我们的方法将跟踪性能(S-MOTA)提高了7.4%,而常规静态方法则提高了。此外,使用我们的方法提高性能,而不是离线准确性的进步,而不是代替而不是进步。
translated by 谷歌翻译
许多测量机器人和动态障碍状态的商品传感器具有非高斯噪声特征。然而,许多当前的方法将运动和感知的潜在不确定性视为高斯,主要是为了确保计算障碍。另一方面,与非高斯不确定性一起工作的现有计划者不会阐明运动和感知噪声的分布特征,例如偏见以避免有效碰撞。本文通过将避免反应性碰撞解释为碰撞约束违规与Dirac Delta分布之间的分配匹配问题来填补这一空白。为了确保策划者的快速反应性,我们将每个分布嵌入重现Hilbert空间,并将分布匹配重新匹配,以最大程度地减少两个分布之间的最大平均差异(MMD)。我们表明,评估给定对照输入的MMD归结为仅矩阵矩阵产品。我们利用这种见解来开发一种简单的控制抽样方法,以避免动态和不确定的障碍。我们在两个方面推进了最新的。首先,我们进行了广泛的实证研究,以表明我们的计划者可以从样本级别的信息中推断出分布偏差。因此,它使用此见解来指导机器人良好的同型。我们还强调了基本不确定性的高斯近似如何失去偏置估计值,并引导机器人以高碰撞概率为不利状态。其次,我们显示了与以前的非参数和高斯近似反应性碰撞避免碰撞的碰撞方法的拟议分布匹配方法的切实比较优势。
translated by 谷歌翻译