Nine language-vision AI models trained on web scrapes with the Contrastive Language-Image Pretraining (CLIP) objective are evaluated for evidence of a bias studied by psychologists: the sexual objectification of girls and women, which occurs when a person's human characteristics are disregarded and the person is treated as a body or a collection of body parts. A first experiment uses standardized images of women from the Sexual OBjectification and EMotion Database, and finds that, commensurate with prior research in psychology, human characteristics are disassociated from images of objectified women: the models' recognition of emotional state is mediated by whether the subject is fully or partially clothed. Embedding association tests (EATs) return significant effect sizes for both anger (d > .8) and sadness (d > .5). A second experiment measures the effect in a representative application: an automatic image captioner (Antarctic Captions) includes words denoting emotion less than 50% as often for images of partially clothed women as for images of fully clothed women. A third experiment finds that images of female professionals (scientists, doctors, executives) are more likely to be associated with sexual descriptions than images of male professionals. A fourth experiment shows that a prompt of "a [age] year old girl" generates sexualized images (as determined by an NSFW classifier) up to 73% of the time for VQGAN-CLIP (age 17), and up to 40% of the time for Stable Diffusion (ages 14 and 18); the corresponding rate for boys never surpasses 9%. The evidence indicates that language-vision AI models trained on automatically collected web scrapes learn biases of sexual objectification, which propagate to downstream applications.
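For readers unfamiliar with the test statistic, the following is a minimal sketch of how a WEAT-style embedding association test effect size can be computed over CLIP-like embeddings; the arrays, dimensionality, and stimuli here are placeholders, not the paper's materials.

```python
import numpy as np

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def association(w, A, B):
    # Differential association of one embedding w with attribute sets A and B.
    return np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B])

def eat_effect_size(X, Y, A, B):
    # Cohen's-d-style effect size over two target sets X, Y (e.g., embeddings of
    # images of fully vs. partially clothed subjects) and two attribute sets
    # A, B (e.g., embeddings of emotion vs. neutral text prompts).
    assoc_X = [association(x, A, B) for x in X]
    assoc_Y = [association(y, A, B) for y in Y]
    pooled = np.std(assoc_X + assoc_Y, ddof=1)
    return (np.mean(assoc_X) - np.mean(assoc_Y)) / pooled

# Toy usage with random 512-d "CLIP-like" embeddings (illustrative only):
rng = np.random.default_rng(0)
X, Y, A, B = (rng.normal(size=(8, 512)) for _ in range(4))
print(eat_effect_size(list(X), list(Y), list(A), list(B)))
```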
Three state-of-the-art language-and-image AI models, CLIP, SLIP, and BLIP, are evaluated for evidence of a bias previously observed in social and experimental psychology: equating American identity with being White. Embedding association tests (EATs) using standardized images of self-identified Asian, Black, Latina/o, and White individuals from the Chicago Face Database (CFD) reveal that White individuals are more associated with collective in-group words than are Asian, Black, or Latina/o individuals. In assessments of three core aspects of American identity reported by social psychologists, single-category EATs reveal that images of White individuals are more associated with patriotism and with being born in America, but that, consistent with prior findings in psychology, White individuals are less associated with treating people of all races and backgrounds equally. Three downstream machine learning tasks demonstrate biases associating American with White. In a visual question answering task using BLIP, 97% of White individuals are identified as American, compared to only 3% of Asian individuals. When asked what state the depicted individual lives in, the model responds China 53% of the time for Asian individuals, but always responds with an American state for White individuals. In an image captioning task, BLIP remarks upon the race of Asian individuals as much as 36% of the time, but never remarks upon race for White individuals. Finally, a text-based, CLIP-guided synthetic image generator (VQGAN), provided with an initializing image from the CFD and the text "an American person," lightens the skin tone of individuals of all races (by 35% for Black individuals, based on pixel brightness). The results indicate that language-and-image AI encodes biases equating American identity with being White, and that these biases propagate to downstream applications of such models.
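A rough sketch of the visual question answering probe described above, using the publicly available Hugging Face BLIP VQA checkpoint as a stand-in; the paper's exact BLIP weights, question wording, and CFD image handling are not reproduced, and the image path is a placeholder.

```python
from PIL import Image
from transformers import BlipProcessor, BlipForQuestionAnswering

# Assumed checkpoint; the study's BLIP configuration may differ.
processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
model = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base")

image = Image.open("cfd_face.jpg").convert("RGB")   # placeholder CFD image path
question = "What nationality is this person?"        # assumed probe wording

inputs = processor(image, question, return_tensors="pt")
answer_ids = model.generate(**inputs)
print(processor.decode(answer_ids[0], skip_special_tokens=True))
```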
Statistical regularities in language corpora encode well-known social biases into word embeddings. Here, we focus on gender to provide a comprehensive analysis of widely used static English word embeddings trained on internet corpora (GloVe 2014, fastText 2017). Using the Single-Category Word Embedding Association Test, we demonstrate the widespread prevalence of gender biases, which also manifest in (a) the frequency of words associated with men versus women; (b) the parts of speech of gender-associated words; (c) the semantic categories of gender-associated words; and (d) the valence, arousal, and dominance of gender-associated words. First, in terms of word frequency: we find that, of the 1,000 most frequent words in the vocabulary, 77% are more associated with men than with women, providing direct evidence of a male default in the everyday language of the English-speaking world. Second, turning to parts of speech: the top male-associated words are typically verbs (e.g., fight, overpower), while the top female-associated words are typically adjectives and adverbs (e.g., giving, emotionally); gender biases in the embeddings thus also permeate parts of speech. Third, for semantic categories: we perform bottom-up cluster analyses of the top 1,000 words associated with each gender. The top male-associated concepts include roles and domains of big tech, engineering, religion, sports, and violence; in contrast, the top female-associated concepts are less focused on roles and include female-specific slurs and sexual content, as well as appearance and kitchen terms. Fourth, using human ratings of valence, arousal, and dominance from a lexicon of roughly 20,000 words, we find that male-associated words are higher on arousal and dominance, while female-associated words are higher on valence.
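A minimal sketch of a Single-Category Word Embedding Association Test (SC-WEAT) effect size over GloVe vectors; the attribute word lists, target word, and local embedding file path are illustrative assumptions, not the study's materials.

```python
import numpy as np

def load_glove(path, vocab):
    # GloVe text format: one word followed by its vector components per line.
    vecs = {}
    with open(path, encoding="utf8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            if parts[0] in vocab:
                vecs[parts[0]] = np.array(parts[1:], dtype=float)
    return vecs

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def sc_weat(w, A, B):
    # Effect size of one target embedding w's association with attribute sets A vs. B.
    sims_A = [cosine(w, a) for a in A]
    sims_B = [cosine(w, b) for b in B]
    return (np.mean(sims_A) - np.mean(sims_B)) / np.std(sims_A + sims_B, ddof=1)

male = ["he", "him", "his", "man"]          # illustrative attribute words
female = ["she", "her", "hers", "woman"]
target = "career"
emb = load_glove("glove.840B.300d.txt", set(male + female + [target]))  # assumed local file
print(sc_weat(emb[target], [emb[w] for w in male], [emb[w] for w in female]))
```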
Causal models with unobserved variables impose nontrivial constraints on the distributions over the observed variables. When a common cause of two variables is unobserved, it is impossible to uncover the causal relation between them without making additional assumptions about the model. In this work, we consider causal models with a promise that the unobserved variables have known cardinality. We derive inequality constraints implied by d-separation in such models. Moreover, we explore the possibility of exploiting this result to study causal influence in models involving quantum systems.
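As an illustration of the kind of restriction a bounded latent cardinality imposes (a standard example given for intuition, not necessarily among the inequalities derived in the work above): for the structure A ← Λ → B in which the latent common cause Λ takes at most c values, the observed joint distribution must factorize as

```latex
P(a,b) \;=\; \sum_{\lambda=1}^{c} P(\lambda)\, P(a\mid\lambda)\, P(b\mid\lambda),
```

so the matrix with entries P(a,b) admits a nonnegative factorization of inner dimension c, and in particular has rank at most c. No such restriction arises when the cardinality of Λ is unbounded, which is why promising a known cardinality yields testable constraints.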
Bell's theorem is typically understood as a proof that quantum theory is incompatible with local hidden-variable models. More generally, the violation of a Bell inequality can be seen as witnessing the impossibility of explaining quantum correlations with classical causal models. However, the violation of a Bell inequality does not rule out classical models that allow some measurement dependence, that is, models in which the choices made by the observers can be correlated with the source generating the systems to be measured. Here, we show that the level of measurement dependence can be quantitatively upper bounded if we arrange a Bell test within a network. Furthermore, we show that these results can be adapted to derive nonlinear Bell inequalities for a large class of causal networks and to identify quantum-realizable correlations that violate them.
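For background, a standard two-party illustration (not the network inequalities derived in the work above): with binary settings and ±1-valued outcomes, any classical causal model satisfying measurement independence obeys the CHSH inequality

```latex
S \;\equiv\; \langle A_0 B_0\rangle + \langle A_0 B_1\rangle + \langle A_1 B_0\rangle - \langle A_1 B_1\rangle \;\le\; 2,
```

whereas quantum correlations can reach S = 2√2 (the Tsirelson bound). If the settings are allowed to correlate with the hidden variable, even maximal quantum violations can be reproduced classically, which is why quantitatively bounding measurement dependence requires additional structure such as the network arrangements considered above.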
We present a dynamic path planning algorithm to navigate an amphibious rotor craft through a concave, time-invariant obstacle field while attempting to minimize energy usage. We create a nonlinear quaternion state model that represents the rotor craft dynamics above and below the water. The 6-degree-of-freedom dynamics are used within a layered architecture to generate motion paths for the vehicle to follow and the required control inputs. The rotor craft has a 3-dimensional map of its surroundings that is updated via limited-range onboard sensor readings within the current medium (air or water). Path planning is done via PRM and D* Lite.
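A small sketch of the attitude portion of a quaternion state model with forward-Euler integration; the vehicle's full 6-DOF dynamics and the air/water medium switching are not reproduced, and the angular rate and step size are invented.

```python
import numpy as np

# Convention: q = (w, x, y, z); body angular rate omega in rad/s.
def quat_mul(q, r):
    w1, x1, y1, z1 = q
    w2, x2, y2, z2 = r
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def integrate_attitude(q, omega, dt):
    # qdot = 0.5 * q ⊗ (0, omega); forward-Euler step followed by renormalization.
    qdot = 0.5 * quat_mul(q, np.concatenate(([0.0], omega)))
    q_next = q + dt * qdot
    return q_next / np.linalg.norm(q_next)

q = np.array([1.0, 0.0, 0.0, 0.0])      # identity attitude
omega = np.array([0.0, 0.0, 0.5])       # constant yaw rate of 0.5 rad/s
for _ in range(100):
    q = integrate_attitude(q, omega, 0.01)
print(q)                                 # ~ rotation of 0.5 rad about the body z-axis
```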
We explore the use of large language models (LLMs) for zero-shot semantic parsing. Semantic parsing involves mapping natural language utterances to task-specific meaning representations. Language models are generally trained on publicly available text and code and cannot be expected to generalize directly to domain-specific parsing tasks in a zero-shot setting. In this work, we propose ZEROTOP, a zero-shot task-oriented parsing method that decomposes a semantic parsing problem into a set of abstractive and extractive question-answering (QA) problems, enabling us to leverage the ability of LLMs to zero-shot answer reading comprehension questions. For each utterance, we prompt the LLM with questions corresponding to its top-level intent and a set of slots and use the LLM generations to construct the target meaning representation. We observe that current LLMs fail to detect unanswerable questions and, as a result, cannot handle questions corresponding to missing slots. To address this problem, we fine-tune a language model on public QA datasets using synthetic negative samples. Experimental results show that our QA-based decomposition paired with the fine-tuned LLM can correctly parse ~16% of utterances in the MTOP dataset without requiring any annotated data.
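A schematic sketch of the QA-style decomposition described above; `ask_llm` is a hypothetical stand-in for an LLM call, and the intent/slot inventory is invented, so ZEROTOP's actual prompts, schema, and handling of unanswerable questions may differ.

```python
def ask_llm(prompt: str) -> str:
    raise NotImplementedError("plug in an LLM client here")  # hypothetical LLM call

INTENTS = ["CREATE_ALARM", "GET_WEATHER"]                    # illustrative schema
SLOTS = {"CREATE_ALARM": ["DATE_TIME"],
         "GET_WEATHER": ["LOCATION", "DATE_TIME"]}

def parse(utterance: str) -> str:
    # Abstractive question for the top-level intent.
    intent = ask_llm(
        f'"{utterance}"\nWhich of the following best describes the request: '
        f"{', '.join(INTENTS)}?"
    ).strip()
    # Extractive question per candidate slot; an unanswerable question means
    # the slot is absent and is simply omitted from the parse.
    filled = []
    for slot in SLOTS.get(intent, []):
        span = ask_llm(
            f'"{utterance}"\nWhat {slot.lower().replace("_", " ")} is mentioned '
            "in the request? Answer with a span from the request, or 'unanswerable'."
        ).strip()
        if span.lower() != "unanswerable":
            filled.append(f"[SL:{slot} {span} ]")
    # Assemble a TOP-style meaning representation.
    return f"[IN:{intent} {''.join(filled)}]"
```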
Algorithms that involve both forecasting and optimization are at the core of solutions to many difficult real-world problems, such as in supply chains (inventory optimization), traffic, and the transition towards carbon-free energy generation in battery/load/production scheduling in sustainable energy systems. Typically, in these scenarios we want to solve an optimization problem that depends on unknown future values, which therefore need to be forecast. As both forecasting and optimization are difficult problems in their own right, relatively little research has been done in this area. This paper presents the findings of the "IEEE-CIS Technical Challenge on Predict+Optimize for Renewable Energy Scheduling", held in 2021. We present a comparison and evaluation of the seven highest-ranked solutions in the competition, to provide researchers with a benchmark problem and to establish the state of the art for this benchmark, with the aim of fostering and facilitating research in this area. The competition used data from the Monash Microgrid, as well as weather data and energy market data. It then focused on two main challenges: forecasting renewable energy production and demand, and obtaining an optimal schedule for the activities (lectures) and on-site batteries that leads to the lowest cost of energy. The most accurate forecasts were obtained by gradient-boosted tree and random forest models, and optimization was mostly performed using mixed integer linear and quadratic programming. The winning method predicted different scenarios and optimized over all scenarios jointly using a sample average approximation method.
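A toy sketch of sample average approximation for a predict-then-optimize battery schedule, far simpler than the competition problem: one schedule is chosen to minimize the average energy cost across several forecast scenarios. The prices, demands, and battery parameters are invented, and cvxpy is used here only as a convenient LP modeling layer, not as the tooling used by the competing teams.

```python
import numpy as np
import cvxpy as cp

T, S = 24, 3                                   # hours, forecast scenarios
rng = np.random.default_rng(0)
price = rng.uniform(20.0, 120.0, size=(S, T))  # $/MWh per scenario and hour
demand = rng.uniform(0.5, 2.0, size=(S, T))    # MWh baseline load per scenario

p_max, capacity = 1.0, 4.0                     # power and energy limits (assumed)
u = cp.Variable(T)                             # battery power: >0 charge, <0 discharge
soc = cp.cumsum(u)                             # state of charge, starting empty

constraints = [u <= p_max, u >= -p_max, soc >= 0, soc <= capacity]
# Grid cost in scenario s: pay the scenario's price for baseline demand plus charging.
scenario_cost = [price[s] @ (demand[s] + u) for s in range(S)]
problem = cp.Problem(cp.Minimize(sum(scenario_cost) / S), constraints)
problem.solve()
print(np.round(u.value, 2))                    # charges in cheap hours, discharges in expensive ones
```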
We apply the vision transformer, a deep machine learning model built around the attention mechanism, to mel-spectrogram representations of raw audio recordings. When adding mel-based data augmentation techniques and sample weighting, we achieve comparable performance on both tasks (the PRS and CCS challenges) of ComParE21, outperforming most single-model baselines. We further introduce overlapping vertical patching and evaluate the influence of parameter configurations. Index Terms: audio classification, attention, mel-spectrogram, unbalanced data sets, computational paralinguistics
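A minimal sketch of mel-spectrogram extraction followed by overlapping vertical patching along the time axis, where each patch spans the full frequency range; the FFT size, hop length, patch width, stride, and input file path are assumptions, not the paper's settings.

```python
import torch
import torchaudio

wave, sr = torchaudio.load("example.wav")      # placeholder input recording
mel = torchaudio.transforms.MelSpectrogram(
    sample_rate=sr, n_fft=1024, hop_length=256, n_mels=128)(wave)
logmel = torch.log(mel + 1e-6)                 # shape: (channels, n_mels, time)

patch_w, stride = 16, 8                        # 50% overlap between neighboring patches (assumed)
patches = logmel.unfold(-1, patch_w, stride)   # (channels, n_mels, num_patches, patch_w)

# Flatten each vertical patch into a token for a vision transformer.
tokens = patches.permute(0, 2, 1, 3).flatten(start_dim=2)
print(tokens.shape)                            # (channels, num_patches, n_mels * patch_w)
```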
Common to all different kinds of recurrent neural networks (RNNs) is the intention to model relations between data points through time. Even when there is no immediate relationship between subsequent data points (e.g., when the data points are generated at random), we show that RNNs are still able to remember a few data points back into the sequence by memorizing them by heart using standard backpropagation. However, we also show that for classical RNNs, LSTM, and GRU networks, the distance between recurrent calls across which data points can be reproduced this way is highly limited (compared to even a loose connection between data points) and subject to various constraints imposed by the type and size of the RNN in question. This implies the existence of a hard limit (well below the information-theoretic one) on the distance between related data points within which RNNs are still able to recognize said relation.
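A small sketch of the memorization setup described above, under assumed details: an LSTM is trained with standard backpropagation to reproduce an i.i.d. random input seen a fixed number of steps earlier, so any success comes purely from carrying values across recurrent calls.

```python
import torch
import torch.nn as nn

class LagRecall(nn.Module):
    def __init__(self, dim=1, hidden=64):
        super().__init__()
        self.rnn = nn.LSTM(dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, dim)

    def forward(self, x):
        out, _ = self.rnn(x)
        return self.head(out)

lag, seq_len, batch = 5, 50, 128         # assumed lag and sequence length
model = LagRecall()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(1000):
    x = torch.rand(batch, seq_len, 1)    # unrelated random data points
    target = x[:, :-lag]                 # the value seen `lag` steps earlier
    pred = model(x)[:, lag:]             # prediction at time t should equal x[t - lag]
    loss = nn.functional.mse_loss(pred, target)
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 200 == 0:
        print(step, loss.item())         # watch how well the lag can be memorized
```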