One of the most successful paradigms for reward learning uses human feedback in the form of comparisons. Although these methods hold promise, human comparison labeling is expensive and time-consuming, constituting a major bottleneck to their broader applicability. Our insight is that we can greatly improve how effectively human time is used in these approaches by batching comparisons together, rather than having the human label each comparison individually. To do so, we leverage data dimensionality-reduction and visualization techniques to provide the human with an interactive GUI displaying the state space, in which the user can label subportions of the state space. Across some simple MuJoCo tasks, we show that this high-level approach holds promise and is able to greatly increase the performance of the resulting agents, given the same amount of human labeling time.
Randomly masking and predicting word tokens has been a successful approach in pre-training language models for a variety of downstream tasks. In this work, we observe that the same idea also applies naturally to sequential decision making, where many well-studied tasks like behavior cloning, offline RL, inverse dynamics, and waypoint conditioning correspond to different sequence maskings over a sequence of states, actions, and returns. We introduce the FlexiBiT framework, which provides a unified way to specify models which can be trained on many different sequential decision making tasks. We show that a single FlexiBiT model is simultaneously capable of carrying out many tasks with performance similar to or better than specialized models. Additionally, we show that performance can be further improved by fine-tuning our general model on specific tasks of interest.
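To make the "tasks as maskings" idea concrete, the following is a minimal sketch of how different tasks could be written as visibility masks over one trajectory of states, actions, and returns. The function name `make_mask`, the task labels, and the exact choice of which tokens are visible are illustrative assumptions, not the masks defined by the FlexiBiT authors.

```python
def make_mask(horizon, task, t=0, waypoint=None):
    """Return visibility masks over a trajectory of length `horizon`.

    True  = token is shown to the model as input.
    False = token is masked out (a candidate prediction target).
    The task definitions below are hypothetical encodings for illustration.
    """
    mask = {
        "states": [False] * horizon,
        "actions": [False] * horizon,
        "returns": [False] * horizon,
    }
    if task in ("behavior_cloning", "offline_rl"):
        # Show states up to time t and actions before t; action t is hidden.
        for i in range(t + 1):
            mask["states"][i] = True
        for i in range(t):
            mask["actions"][i] = True
        if task == "offline_rl":
            # Assumed return conditioning: also reveal the first return token.
            mask["returns"][0] = True
    elif task == "inverse_dynamics":
        # Show two consecutive states; the action between them is hidden.
        mask["states"][t] = True
        mask["states"][t + 1] = True
    elif task == "waypoint":
        # Show the current state and a desired future state.
        mask["states"][t] = True
        mask["states"][waypoint] = True
    else:
        raise ValueError(f"unknown task: {task}")
    return mask


if __name__ == "__main__":
    for name in ("behavior_cloning", "offline_rl", "inverse_dynamics"):
        print(name, make_mask(4, name, t=1))
    print("waypoint", make_mask(4, "waypoint", t=0, waypoint=3))
```

Under this encoding, a single sequence model trained to fill in hidden tokens would see each task as a different mask pattern over the same trajectory, which is the sense in which the tasks are unified.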
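The content that a recommender system (RS) shows to users influences them. Therefore, when choosing a recommender to deploy, one is implicitly also choosing to induce specific internal states in users. Moreover, systems trained via long-horizon optimization will have direct incentives to manipulate users: in this work, we focus on the incentive to shift user preferences so that they are easier to satisfy. We argue that, before deployment, system designers should: estimate the shifts a recommender would induce; evaluate whether such shifts would be undesirable; and perhaps even actively optimize to avoid problematic shifts. These steps involve two challenging ingredients. Estimation requires anticipating how hypothetical algorithms would influence user preferences if deployed; we do this by using historical user interaction data to train a predictive user model that implicitly contains their preference dynamics. Evaluation and optimization additionally require metrics to assess whether such influences are manipulative or otherwise unwanted; we use the notion of "safe shifts", which define a trust region within which behavior is safe: for instance, the natural way in which users would shift without interference from the system could be deemed "safe". In simulated experiments, we show that our learned preference dynamics model is effective in estimating user preferences and how they would respond to new recommenders. Additionally, we show that recommenders that optimize for staying in the trust region can avoid manipulative behaviors while still generating engagement.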
Vision transformers (ViTs) are quickly becoming the de-facto architecture for computer vision, yet we understand very little about why they work and what they learn. While existing studies visually analyze the mechanisms of convolutional neural networks, an analogous exploration of ViTs remains challenging. In this paper, we first address the obstacles to performing visualizations on ViTs. Assisted by these solutions, we observe that neurons in ViTs trained with language model supervision (e.g., CLIP) are activated by semantic concepts rather than visual features. We also explore the underlying differences between ViTs and CNNs, and we find that transformers detect image background features, just like their convolutional counterparts, but their predictions depend far less on high-frequency information. On the other hand, both architecture types behave similarly in the way features progress from abstract patterns in early layers to concrete objects in late layers. In addition, we show that ViTs maintain spatial information in all layers except the final layer. In contrast to previous works, we show that the last layer most likely discards the spatial information and behaves as a learned global pooling operation. Finally, we conduct large-scale visualizations on a wide range of ViT variants, including DeiT, CoaT, ConViT, PiT, Swin, and Twin, to validate the effectiveness of our method.
Cutting-edge diffusion models produce images with high quality and customizability, enabling them to be used for commercial art and graphic design purposes. But do diffusion models create unique works of art, or are they stealing content directly from their training sets? In this work, we study image retrieval frameworks that enable us to compare generated images with training samples and detect when content has been replicated. Applying our frameworks to diffusion models trained on multiple datasets including Oxford flowers, Celeb-A, ImageNet, and LAION, we discuss how factors such as training set size impact rates of content replication. We also identify cases where diffusion models, including the popular Stable Diffusion model, blatantly copy from their training data.
Reservoir computing is a recurrent neural network paradigm in which only the output layer is trained. Recently, it was demonstrated that adding time-shifts to the signals generated by a reservoir can provide large improvements in performance accuracy. In this work, we present a technique to choose the optimal time shifts. Our technique maximizes the rank of the reservoir matrix using a rank-revealing QR algorithm and is not task dependent. Further, our technique does not require a model of the system, and therefore is directly applicable to analog hardware reservoir computers. We demonstrate our time-shift optimization technique on two types of reservoir computer: one based on an opto-electronic oscillator and the traditional recurrent network with a $\tanh$ activation function. We find that our technique provides improved accuracy over random time-shift selection in essentially all cases.
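As a rough illustration of rank-driven time-shift selection, the sketch below builds candidate columns from delayed copies of a signal and picks a subset by greedy column pivoting, which is the selection rule used in QR with column pivoting. The helper names `shifted_columns` and `pivoted_selection`, the toy signal, and the use of a single signal instead of a full reservoir state matrix are all assumptions made for this example; it is not the authors' implementation. In practice a library routine such as `scipy.linalg.qr(..., pivoting=True)` would likely be used for the pivoting step.

```python
import math


def shifted_columns(signal, shifts, length):
    """Build candidate columns; column j is `signal` delayed by shifts[j]."""
    start = max(shifts)
    return [[signal[start + t - s] for t in range(length)] for s in shifts]


def pivoted_selection(columns, k, tol=1e-12):
    """Greedy column pivoting: repeatedly pick the column with the largest
    residual norm, then remove that direction from every column.
    Returns the indices of the chosen columns, in pivot order."""
    residual = [list(c) for c in columns]
    chosen = []
    for _ in range(k):
        norms = [math.sqrt(sum(x * x for x in c)) for c in residual]
        j = max(range(len(residual)), key=lambda i: norms[i])
        if norms[j] < tol:
            break  # remaining columns are numerically dependent
        chosen.append(j)
        q = [x / norms[j] for x in residual[j]]
        for c in residual:
            dot = sum(a * b for a, b in zip(q, c))
            for i in range(len(c)):
                c[i] -= dot * q[i]
    return chosen


if __name__ == "__main__":
    # Toy signal standing in for one reservoir node's output (assumption).
    signal = [math.sin(0.3 * n) + 0.5 * math.sin(1.1 * n) for n in range(80)]
    shifts = list(range(8))
    cols = shifted_columns(signal, shifts, length=60)
    picked = pivoted_selection(cols, k=3)
    print("selected time shifts:", [shifts[j] for j in picked])
```

Because the selection depends only on the recorded signals and not on any target output, a procedure of this kind is task independent and needs no model of the system, matching the properties claimed above.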
Deep neural networks are susceptible to shortcut learning, using simple features to achieve low training loss without discovering essential semantic structure. Contrary to prior belief, we show that generative models alone are not sufficient to prevent shortcut learning, despite an incentive to recover a more comprehensive representation of the data than discriminative approaches. However, we observe that shortcuts are preferentially encoded with minimal information, a fact that generative models can exploit to mitigate shortcut learning. In particular, we propose Chroma-VAE, a two-pronged approach where a VAE classifier is initially trained to isolate the shortcut in a small latent subspace, allowing a secondary classifier to be trained on the complementary, shortcut-free latent subspace. In addition to demonstrating the efficacy of Chroma-VAE on benchmark and real-world shortcut learning tasks, our work highlights the potential for manipulating the latent space of generative classifiers to isolate or interpret specific correlations.
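Standard diffusion models involve an image transform (adding Gaussian noise) and an image restoration operator that inverts this degradation. We observe that the generative behavior of diffusion models is not strongly dependent on the choice of image degradation, and in fact an entire family of generative models can be constructed by varying this choice. Even when using completely deterministic degradations (e.g., blur, masking, and more), the training and test-time update rules that underlie diffusion models can be easily generalized to create generative models. The success of these fully deterministic models calls into question the community's understanding of diffusion models, which relies on noise in either gradient Langevin dynamics or variational inference, and paves the way for generalized diffusion models that invert arbitrary processes. Our code is available at https://github.com/arpitbansal297/cold-diffusion-models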
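Recent work on deep learning for tabular data demonstrates the strong performance of deep tabular models, often bridging the gap between gradient boosted decision trees (GBDT) and neural networks. Accuracy aside, a major advantage of neural models is that they learn reusable features and are easily fine-tuned in new domains. This property is often exploited in computer vision and natural language applications, where transfer learning is indispensable when task-specific training data is scarce. In this work, we demonstrate that upstream data gives tabular neural networks a decisive advantage over widely used GBDT models. We propose a realistic medical diagnosis benchmark for tabular transfer learning, and we present a how-to guide for using upstream data to boost performance with a variety of tabular neural network architectures. Finally, we propose a pseudo-feature method for cases where the upstream and downstream feature sets differ, a tabular-specific problem that is widespread in real-world applications. Our code is available at https://github.com/levinroman/tabular-transfer-learning.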
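New astronomical tasks are often related to earlier tasks for which labels have already been collected. We adapt the contrastive framework BYOL to leverage those labels as a pretraining task while also enforcing augmentation invariance. For large-scale pretraining, we introduce GZ-Evo v0.1, a set of 96.5M volunteer responses for 552K galaxy images, plus a further 1.34M comparable unlabelled galaxies. Most of the 206 GZ-Evo answers are unknown for any given galaxy, so our pretraining task uses a Dirichlet loss that naturally handles unknown answers. GZ-Evo pretraining, with or without hybrid learning, improves on direct training even with plentiful downstream labels (+4% accuracy with 44K labels). Our hybrid pretraining/contrastive method further improves downstream accuracy compared with pretraining or contrastive learning alone, especially in the low-label transfer regime (+6% accuracy with 750 labels).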