Prevailing methods for assessing and comparing generative AIs incentivize responses that serve a hypothetical representative individual. Evaluating models in these terms presumes homogeneous preferences across the population and engenders selection of agglomerative AIs, which fail to represent the diverse range of interests across individuals. We propose an alternative evaluation method that instead prioritizes inclusive AIs, which provably retain the requisite knowledge not only for subsequent response customization to particular segments of the population but also for utility-maximizing decisions.
translated by 谷歌翻译
We study the compute-optimal trade-off between model and training data set sizes for large neural networks. Our result suggests a linear relation similar to that supported by the empirical analysis of Chinchilla. That work studies transformer-based large language models trained on the MassiveText corpus (gopher); as a starting point for developing a mathematical theory, we instead focus on a simpler learning model and data generating process, each based on a neural network with a sigmoidal output unit and a single hidden layer of ReLU activation units. We establish an upper bound on the minimal information-theoretically achievable expected error as a function of model and data set sizes. We then derive allocations of computation that minimize this bound. We present empirical results which suggest that minimizing the bound, as an approximation to minimizing the error itself, correctly identifies an asymptotic linear compute-optimal scaling. This approximation can also generate new insights. Among other things, it suggests that, as the input space dimension or latent space complexity grows, as might be the case if a longer history of tokens is taken as input to a language model, a larger fraction of the compute budget should be allocated to growing the learning model rather than the training data set.
We develop an extension of posterior sampling for reinforcement learning (PSRL) that is suited for a continuing agent-environment interface and integrates naturally into agent designs that scale to complex environments. The approach maintains a statistically plausible model of the environment and follows a policy that maximizes expected $\gamma$-discounted return in that model. At each time, with probability $1-\gamma$, the model is replaced by a sample from the posterior distribution over environments. For a suitable schedule of $\gamma$, we establish an $\tilde{O}(\tau S \sqrt{A T})$ bound on the Bayesian regret, where $S$ is the number of environment states, $A$ is the number of actions, and $\tau$ denotes the reward averaging time, which is a bound on the duration required to accurately estimate the average reward of any policy.
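The resampling rule described above can be illustrated with a minimal tabular sketch. It is not the authors' implementation: it assumes a small finite MDP with known rewards, an independent uniform Dirichlet prior over each row of the transition matrix, a fixed $\gamma$ rather than a schedule, and plain value iteration as the planner. All function names (`sample_model`, `greedy_policy`, `continuing_psrl`) are hypothetical.

```python
import random

def sample_dirichlet(counts, rng):
    # Dirichlet sample obtained by normalizing independent Gamma draws.
    draws = [rng.gammavariate(c, 1.0) for c in counts]
    total = sum(draws)
    return [d / total for d in draws]

def sample_model(counts, rng):
    # One transition model P[s][a] drawn from the Dirichlet posterior.
    return [[sample_dirichlet(row, rng) for row in state] for state in counts]

def q_value(P, R, V, gamma, s, a):
    return R[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V))

def greedy_policy(P, R, gamma, sweeps=200):
    # Value iteration for the gamma-discounted problem in the sampled model.
    S, A = len(P), len(P[0])
    V = [0.0] * S
    for _ in range(sweeps):
        V = [max(q_value(P, R, V, gamma, s, a) for a in range(A)) for s in range(S)]
    return [max(range(A), key=lambda a: q_value(P, R, V, gamma, s, a)) for s in range(S)]

def continuing_psrl(true_P, R, gamma, steps, seed=0):
    # Assumptions: rewards R are known, gamma is fixed, prior is uniform Dirichlet.
    rng = random.Random(seed)
    S, A = len(true_P), len(true_P[0])
    counts = [[[1.0] * S for _ in range(A)] for _ in range(S)]
    policy = greedy_policy(sample_model(counts, rng), R, gamma)
    state, total_reward, resamples = 0, 0.0, 0
    for _ in range(steps):
        # With probability 1 - gamma, replace the model by a fresh posterior sample.
        if rng.random() < 1.0 - gamma:
            policy = greedy_policy(sample_model(counts, rng), R, gamma)
            resamples += 1
        action = policy[state]
        next_state = rng.choices(range(S), weights=true_P[state][action])[0]
        total_reward += R[state][action]
        counts[state][action][next_state] += 1.0  # posterior update
        state = next_state
    return total_reward / steps, resamples
```

Calling `continuing_psrl(true_P, R, gamma=0.9, steps=2000)` on a two-state, two-action environment returns the average reward per step and the number of times the model was resampled; there are no episode boundaries, so resampling times are driven only by the $1-\gamma$ coin flips.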
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
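The success of neural networks over the past decade has established them as effective models for many relevant data generating processes. Statistical theory of neural networks indicates graceful scaling of sample complexity. For example, Joen & van Roy (arXiv:2203.00246) demonstrate that, when data is generated by a ReLU teacher network with $W$ parameters, an optimal learner needs only $\tilde{O}(W/\epsilon)$ samples to attain expected error $\epsilon$. However, existing computational theory suggests that, even for single-hidden-layer teacher networks, the computation required to achieve this sample complexity is intractable if small error is to be attained for all such teacher networks. In this work, we fit single-hidden-layer neural networks to data generated by single-hidden-layer ReLU teacher networks with parameters drawn from a natural distribution. We demonstrate that stochastic gradient descent (SGD) with automated width selection attains small expected error with a number of samples and a total number of queries that are both nearly linear in the input dimension and width. This suggests that SGD nearly achieves the information-theoretic sample complexity bounds of Joen & van Roy (arXiv:2203.00246) in a computationally efficient manner. An important difference between our positive empirical results and the negative theoretical results is that the latter address the worst-case error of deterministic algorithms, while our analysis centers on the expected error of a stochastic algorithm.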
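In machine learning, an agent needs to estimate uncertainty in order to explore and adapt efficiently and to make effective decisions. A common approach to uncertainty estimation maintains an ensemble of models. In recent years, several approaches have been proposed for training ensembles, and conflicting views prevail regarding the importance of the various ingredients of these approaches. In this paper, we aim to address the benefits of two ingredients that have come into question: prior functions and bootstrapping. We show that prior functions can significantly improve an ensemble agent's joint predictions across inputs, and that bootstrapping affords additional benefits if the signal-to-noise ratio varies across inputs. Our claims are justified by both theoretical and experimental results.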
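To make the two ingredients concrete, here is a minimal sketch of an ensemble that combines additive prior functions with bootstrapping. It is an illustration under stated assumptions, not the paper's method: each member is a linear model, its prior function is a fixed random linear map that is never trained, bootstrapping is done by drawing exponential per-example loss weights, and training is plain SGD on squared error. The names `train_ensemble`, `predict`, and `spread` are hypothetical.

```python
import random

def train_ensemble(xs, ys, members=10, prior_scale=1.0, bootstrap=True,
                   epochs=300, lr=0.1, seed=0):
    rng = random.Random(seed)
    dim = len(xs[0])
    ensemble = []
    for _ in range(members):
        # Prior function: fixed random linear map, added to the output, never updated.
        prior = [rng.gauss(0.0, prior_scale) for _ in range(dim)]
        # Bootstrapping (assumed form): random exponential weight per training example.
        weights = [rng.expovariate(1.0) if bootstrap else 1.0 for _ in xs]
        w = [0.0] * dim  # trainable part
        for _ in range(epochs):
            for x, y, c in zip(xs, ys, weights):
                pred = sum((wj + pj) * xj for wj, pj, xj in zip(w, prior, x))
                err = pred - y
                # SGD step on the weighted squared error; only w is updated.
                w = [wj - lr * c * err * xj for wj, xj in zip(w, x)]
        ensemble.append((w, prior))
    return ensemble

def predict(ensemble, x):
    # One prediction per ensemble member at input x.
    return [sum((wj + pj) * xj for wj, pj, xj in zip(w, prior, x))
            for w, prior in ensemble]

def spread(ensemble, x):
    # Crude uncertainty proxy: range of member predictions at x.
    preds = predict(ensemble, x)
    return max(preds) - min(preds)
```

If the training inputs only vary along the first coordinate, the members agree closely there, while along the unobserved second coordinate their predictions differ by exactly their prior functions, so `spread` stays large where no data was seen.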
Thompson sampling has proven effective across a wide range of stationary bandit environments. However, as we demonstrate in this paper, it can perform poorly when applied to nonstationary environments. We show that such failures are attributable to the fact that, when exploring, the algorithm does not differentiate actions based on how quickly the information acquired loses its usefulness due to nonstationarity. Building upon this insight, we propose predictive sampling, which extends Thompson sampling to make this distinction. We establish a Bayesian regret bound and show that, in nonstationary bandit environments, the regret incurred by Thompson sampling can far exceed that of predictive sampling. We also present implementations of predictive sampling that scale to complex bandit environments of practical interest in a computationally tractable manner. Through simulations, we demonstrate that predictive sampling outperforms Thompson sampling and other state-of-the-art algorithms across a wide range of nonstationary bandit environments.
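For reference, the baseline that predictive sampling extends can be sketched as follows. This is standard Thompson sampling for a stationary Bernoulli bandit with independent Beta(1, 1) priors, not predictive sampling itself, whose details are not given here; the function name `thompson_sampling` and its parameters are hypothetical.

```python
import random

def thompson_sampling(success_probs, steps, seed=0):
    # Assumes a stationary Bernoulli bandit with a Beta(1, 1) prior on each arm.
    rng = random.Random(seed)
    k = len(success_probs)
    alpha = [1.0] * k  # 1 + number of observed successes per arm
    beta = [1.0] * k   # 1 + number of observed failures per arm
    pulls = [0] * k
    for _ in range(steps):
        # Sample a mean reward for each arm from its posterior and act greedily on the samples.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < success_probs[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return pulls
```

Note that the posterior here treats every past observation as equally informative forever, which is exactly the assumption that breaks down when arm means drift over time.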