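Iterating on the training and evaluation of a machine learning model is an important process for improving its performance. However, while teachable interfaces enable blind users to train and test an object recognizer with photos taken in their own distinctive environments, the accessibility of the training iteration and evaluation steps has received little attention. Iteration assumes visual inspection of the training photos, which is inaccessible to blind users. We explore this challenge through MyCam, a mobile app that incorporates automatically estimated descriptors to give non-visual access to the photos in a user's training set. We explore how blind participants (n = 12) interact with MyCam and the descriptors through an evaluation study in their homes. We demonstrate that real-time photo-level descriptors enabled blind users to reduce the number of photos with cropped objects, and that participants could add more variation by iterating through and accessing the quality of their training sets. In addition, participants found the app simple to use, indicating that they could train it effectively and that the descriptors were useful. However, these subjective responses were not reflected in the performance of their models, partly because of little variation in the training photos and cluttered backgrounds.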
translated by 谷歌翻译
A fundamental characteristic common to both human vision and natural language is their compositional nature. Yet, despite the performance gains contributed by large vision and language pretraining, we find that - across 6 architectures trained with 4 algorithms on massive datasets - they exhibit little compositionality. To arrive at this conclusion, we introduce a new compositionality evaluation benchmark CREPE which measures two important aspects of compositionality identified by cognitive science literature: systematicity and productivity. To measure systematicity, CREPE consists of three test datasets. The three test sets are designed to test models trained on three of the popular training datasets: CC-12M, YFCC-15M, and LAION-400M. They contain 385K, 385K, and 373K image-text pairs and 237K, 210K, and 178K hard negative captions. To test productivity, CREPE contains 17K image-text pairs with nine different complexities plus 246K hard negative captions with atomic, swapping, and negation foils. The datasets are generated by repurposing the Visual Genome scene graphs and region descriptions and applying handcrafted templates and GPT-3. For systematicity, we find that model performance decreases consistently when novel compositions dominate the retrieval set, with Recall@1 dropping by up to 8%. For productivity, models' retrieval success decays as complexity increases, frequently nearing random chance at high complexity. These results hold regardless of model and training dataset size.
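As a minimal sketch of the retrieval metric mentioned above, the snippet below computes Recall@k for image-to-text retrieval where each query is scored against its true caption plus hard negatives. The function name `recall_at_k`, the score matrix layout, and the toy numbers are assumptions made for illustration; they are not taken from the CREPE benchmark code.

```python
from typing import Sequence


def recall_at_k(
    scores: Sequence[Sequence[float]],
    positive_index: Sequence[int],
    k: int = 1,
) -> float:
    """Fraction of queries whose true caption ranks within the top k.

    scores[i][j] is a hypothetical similarity between image i and
    candidate caption j (the true caption plus its hard negatives).
    positive_index[i] is the column of the true caption for image i.
    """
    if not scores or len(scores) != len(positive_index):
        raise ValueError("scores and positive_index must be non-empty and aligned")
    hits = 0
    for row, pos in zip(scores, positive_index):
        # Rank candidates by descending score; equal scores keep their order.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if pos in ranked[:k]:
            hits += 1
    return hits / len(scores)


# Toy example: three images, each with one true caption and two hard negatives.
toy_scores = [
    [0.9, 0.4, 0.1],  # true caption (index 0) ranked first
    [0.2, 0.7, 0.6],  # true caption (index 2) ranked second
    [0.3, 0.5, 0.8],  # true caption (index 2) ranked first
]
toy_positive = [0, 2, 2]
toy_r1 = recall_at_k(toy_scores, toy_positive, k=1)
toy_r2 = recall_at_k(toy_scores, toy_positive, k=2)
```

With hard negatives in the candidate set, a drop in Recall@1 means the model ranks a foil above the true caption more often.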
This paper proposes embedded Gaussian Process Barrier States (GP-BaS), a methodology to safely control the unmodeled dynamics of nonlinear systems using Bayesian learning. Gaussian Processes (GPs) are used to model the dynamics of the safety-critical system, and the learned model is subsequently used in the GP-BaS model. We derive the barrier state dynamics from the GP posterior and use them to construct a safety-embedded Gaussian process dynamical model (GPDM). We show that the safety-critical system can be controlled to remain inside the safe region as long as we can design a controller that renders the BaS-GPDM's trajectories bounded (or asymptotically stable). The proposed approach overcomes various limitations of early attempts at combining GPs with barrier functions because it avoids restrictive assumptions such as linearity of the system with respect to control, the relative degree of the constraints, and the number or nature of the constraints. This work is implemented on various examples of trajectory optimization and control, including optimal stabilization of an unstable linear system, safe trajectory optimization of a Dubins vehicle navigating through an obstacle course, and a quadrotor in an obstacle avoidance task, using GP differentiable dynamic programming (GP-DDP). The proposed framework is capable of maintaining safe optimization and control of unmodeled dynamics and is purely data driven.
The use of emojis adds a visual modality to textual communication that is often private. Predicting emojis, however, is a challenge for machine learning because emoji use tends to cluster into frequently used and rarely used emojis. Much of the machine learning research on emoji use has focused on high resource languages and has conceptualised the task of predicting emojis around traditional server-side machine learning approaches. However, traditional machine learning approaches for private communication can introduce privacy concerns, as these approaches require all data to be transmitted to a central storage. In this paper, we seek to address the dual concerns of emphasising high resource languages for emoji prediction and risking the privacy of people's data. We introduce a new dataset of 118k tweets (augmented from 25k unique tweets) for emoji prediction in Hindi, and propose a modification to the federated learning algorithm, CausalFedGSD, which aims to strike a balance between model performance and user privacy. We show that our approach obtains scores comparable to those of more complex centralised models while reducing the amount of data required to optimise the models and minimising risks to user privacy.
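To illustrate why federated learning avoids sending raw data to central storage, here is a minimal sketch of plain federated averaging (FedAvg), in which the server combines locally trained parameters only. This is not CausalFedGSD or the modification proposed in the abstract; the function name `federated_average` and the toy values are assumptions for illustration.

```python
from typing import List, Sequence


def federated_average(
    client_weights: Sequence[Sequence[float]],
    client_sizes: Sequence[int],
) -> List[float]:
    """Average client parameter vectors, weighted by local dataset size.

    Only parameters and example counts reach the server; the clients'
    raw text (for example, tweets) stays on their devices.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("client_weights and client_sizes must be non-empty and aligned")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of client examples must be positive")
    dim = len(client_weights[0])
    averaged = [0.0] * dim
    for weights, size in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all clients must share the same parameter shape")
        for i, w in enumerate(weights):
            averaged[i] += w * size / total
    return averaged


# Toy example: two clients holding 1 and 3 local examples respectively.
toy_global = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

The client with more local examples pulls the global parameters further towards its own update.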
Damage to the inferior frontal gyrus (Broca's area) can cause agrammatic aphasia, wherein patients, although able to comprehend, lack the ability to form complete sentences. This inability leads to communication gaps that cause difficulties in their daily lives. Assistive devices can help mitigate these issues and enable patients to communicate effectively. However, because large scale studies of linguistic deficits in aphasia are lacking, research on such assistive technology is relatively limited. In this work, we present two contributions that aim to re-initiate research and development in this field. First, we propose a model that uses linguistic features from small scale studies on aphasia patients and generates large scale datasets of synthetic aphasic utterances from grammatically correct datasets. We show that the mean length of utterance, the noun/verb ratio, and the simple/complex sentence ratio of our synthetic datasets correspond to the reported features of aphasic speech. Second, we demonstrate how the synthetic datasets may be utilized to develop assistive devices for aphasia patients. The pre-trained T5 transformer is fine-tuned on the generated dataset to suggest 5 corrected sentences given an aphasic utterance as input. We evaluate the efficacy of the T5 model using BLEU and cosine semantic similarity scores, obtaining a BLEU score of 0.827 out of 1.00 and a semantic similarity of 0.904 out of 1.00. These results provide a strong foundation for the concept that a synthetic dataset based on small scale studies on aphasia can be used to develop effective assistive technology.
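The cosine semantic similarity score used above can be sketched as the cosine of the angle between two sentence embedding vectors. The abstract does not say which embedding model produces the vectors, so the vectors below are hypothetical placeholders and `cosine_similarity` is an illustrative name.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical embeddings of a reference sentence and a suggested correction.
reference_vec = [1.0, 2.0, 3.0]
suggestion_vec = [2.0, 4.0, 6.0]   # same direction as the reference
unrelated_vec = [3.0, 0.0, -1.0]   # orthogonal to the reference

same_direction = cosine_similarity(reference_vec, suggestion_vec)
orthogonal = cosine_similarity(reference_vec, unrelated_vec)
```

A score near 1.00 means the suggested sentence points in almost the same direction as the reference in embedding space, independent of vector length.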
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
We present SLATE, a sequence labeling approach for extracting tasks from free-form content such as digitally handwritten (or "inked") notes on a virtual whiteboard. Our approach allows us to create a single, low-latency model to simultaneously perform sentence segmentation and classification of these sentences into task/non-task sentences. SLATE greatly outperforms a baseline two-model (sentence segmentation followed by classification model) approach, achieving a task F1 score of 84.4%, a sentence segmentation (boundary similarity) score of 88.4% and three times lower latency compared to the baseline. Furthermore, we provide insights into tackling challenges of performing NLP on the inking domain. We release both our code and dataset for this novel task.
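One way a single sequence labeler can perform segmentation and classification together is to predict one tag per token, where the tag prefix marks a sentence boundary and the tag suffix carries the class. The sketch below decodes such tags; the tag names (`B-TASK`, `I-TASK`, `B-NONTASK`, `I-NONTASK`) and the function `decode_sentences` are assumptions for illustration and are not taken from the SLATE code or paper.

```python
from typing import List, Sequence, Tuple


def decode_sentences(
    tokens: Sequence[str],
    tags: Sequence[str],
) -> List[Tuple[str, bool]]:
    """Turn per-token tags into (sentence, is_task) pairs.

    A tag starting with "B-" opens a new sentence; a tag starting with
    "I-" continues the current one. The label of the opening tag decides
    whether the whole sentence counts as a task.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    sentences: List[Tuple[str, bool]] = []
    current: List[str] = []
    current_label = ""
    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "B" or not current:
            # Close the previous sentence, if any, before starting a new one.
            if current:
                sentences.append((" ".join(current), current_label == "TASK"))
            current = [token]
            current_label = label
        else:
            current.append(token)
    if current:
        sentences.append((" ".join(current), current_label == "TASK"))
    return sentences


# Toy example: one task sentence followed by one non-task sentence.
toy_tokens = ["email", "Sam", "the", "slides", "great", "meeting", "today"]
toy_tags = ["B-TASK", "I-TASK", "I-TASK", "I-TASK",
            "B-NONTASK", "I-NONTASK", "I-NONTASK"]
toy_decoded = decode_sentences(toy_tokens, toy_tags)
```

Because boundaries and classes come from the same tag sequence, one forward pass of one model yields both outputs, which is consistent with the latency advantage over a two-model pipeline described above.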
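We study the problem of Vision-and-Language Navigation (VLN) in the context of autonomous driving in outdoor settings. We solve the problem by explicitly grounding the navigable regions that correspond to the textual command. At each timestamp, the model predicts a segmentation mask corresponding to the intermediate or final navigable region. Our work contrasts with existing work in VLN, which poses the task as a node selection problem given a discrete connected graph corresponding to the environment. We do not assume the availability of such a discretised map. Our work moves towards continuity in the action space, provides interpretability through visual feedback, and allows VLN on commands that require finer manoeuvres, such as "park between the two cars". Furthermore, we propose a novel meta-dataset, CARLA-NAV, to allow efficient training and validation. The dataset comprises pre-recorded training sequences and a live environment for validation and testing. We provide extensive qualitative and quantitative empirical results to validate the efficacy of the proposed approach.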
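Humans have a natural ability to effortlessly comprehend linguistic commands such as "park next to the yellow sedan" and instinctively know which region of the road the vehicle should navigate to. Extending this ability to autonomous vehicles is the next step towards creating fully autonomous agents that respond and act according to human commands. To this end, we propose the novel task of Referring Navigable Regions (RNR), i.e., grounding regions of interest for navigation based on a linguistic command. RNR differs from Referring Image Segmentation (RIS), which focuses on grounding the object referred to by a natural language expression rather than grounding a navigable region. For example, for the command "park next to the yellow sedan", RIS aims to segment the referred sedan, whereas RNR aims to segment the suggested parking region on the road. We introduce a new dataset, Talk2Car-RegSeg, which extends the existing Talk2Car dataset with segmentation masks for the regions described by the linguistic commands. A separate test split with concise manoeuvre-oriented commands is provided to assess the practicality of our dataset. We benchmark the proposed dataset using a novel transformer-based architecture. We present extensive ablations and show superior performance on multiple evaluation metrics. A downstream path planner that generates trajectories from the RNR outputs confirms the efficacy of the proposed framework.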
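In this research, we extend the universal reinforcement learning (URL) agent models of artificial general intelligence to quantum environments. The utility function of a classical exploratory stochastic knowledge seeking agent, KL-KSA, is generalized to distance measures from quantum information theory on density matrices. Quantum process tomography (QPT) algorithms form the tractable subset of programs for modeling environmental dynamics. The optimal QPT policy is selected using a mutable cost function based on algorithmic complexity as well as computational resource complexity. Instead of Turing machines, we estimate the cost metrics on a high-level language to allow realistic experimentation. The entire agent design is encapsulated in a self-replicating quine, which mutates the cost function based on the predictive value of the optimal policy selection scheme. Thus, multiple agents with Pareto-optimal QPT policies evolve using genetic programming, mimicking the development of physical theories, each with different resource trade-offs. This formal framework is termed the Quantum Knowledge Seeking Agent (QKSA). Despite its importance, few quantum reinforcement learning models exist, in contrast to the current thrust in quantum machine learning. QKSA is the first proposal for a framework that resembles the classical URL models. Similar to how AIXI-tl is a resource-bounded active version of Solomonoff universal induction, QKSA is a resource-bounded participatory observer framework for the recently proposed algorithmic information-based reconstruction of quantum mechanics. QKSA can be applied to simulating and studying aspects of quantum information theory. Specifically, we demonstrate that it can be used to accelerate quantum variational algorithms that include tomographic reconstruction as an integral subroutine.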