Context: The identification of bugs among the issues reported in an issue tracker is crucial for the triage of issues. Machine learning models have shown promising results regarding the performance of automated issue type prediction. However, we have only limited knowledge beyond our assumptions of how such models identify bugs. LIME and SHAP are popular techniques for explaining the predictions of classifiers. Objective: We want to understand if machine learning models provide explanations for the classification that are reasonable to us as humans and align with our expectations of what the models should learn. We also want to know if the quality of the predictions is correlated with the quality of the explanations. Method: We conduct a study in which we rate LIME and SHAP explanations based on how well they explain the outcome of an issue type prediction model. For this, we rate the quality of the explanations themselves, i.e., whether they align with our expectations and help us understand the underlying machine learning model.
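As a minimal sketch of the kind of explanation being rated here, the snippet below runs LIME on a toy issue type classifier. The issue titles, labels, and pipeline are illustrative stand-ins, not the study's data or model; the paper also evaluates SHAP, which is used analogously.

```python
# Minimal sketch: LIME explanations for a toy bug/non-bug issue classifier.
from lime.lime_text import LimeTextExplainer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

issues = [
    "App crashes with NullPointerException on startup",
    "Add dark mode to the settings page",
    "Login fails with error 500 after update",
    "Please document the new REST endpoints",
]
labels = [1, 0, 1, 0]  # 1 = bug, 0 = non-bug (illustrative labels)

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(issues, labels)

explainer = LimeTextExplainer(class_names=["non-bug", "bug"])
exp = explainer.explain_instance(
    "Crash when saving a file with unicode name",
    clf.predict_proba,  # classifier function over raw text
    num_features=5,
)
# Word-level weights a human rater can check against expectations,
# e.g. that "crash" pushes the prediction towards "bug".
print(exp.as_list())
```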
Context: Differential testing is a useful approach that uses different implementations of the same algorithms and compares the results for software testing. In recent years, this approach has been used successfully for testing deep learning frameworks. Objective: There is little knowledge about the application of differential testing beyond deep learning. Within this article, we want to close this gap for classification algorithms. Method: We conduct a case study using Scikit-learn, Weka, Spark MLlib, and Caret, in which we determine the potential of differential testing by considering which algorithms are available in multiple frameworks, the feasibility by identifying pairs of algorithms that should exhibit the same behavior, and the effectiveness by executing tests for the identified pairs and analyzing the deviations. Results: While we found a large potential for popular algorithms, the feasibility seems limited because it is often not possible to determine configurations that are the same across frameworks. The execution of the feasible tests revealed large deviations in both scores and classes. Only a lenient approach based on the statistical significance of classes avoids a huge number of test failures. Conclusions: The potential of differential testing beyond deep learning seems limited for research into the quality of machine learning libraries. Practitioners may still use the approach if they have deep knowledge of the implementations, especially if a coarse oracle that only considers significant differences between classes is sufficient.
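A minimal sketch of the differential-testing oracle idea follows. Since Weka or Caret cannot be called from a single Python snippet, two scikit-learn configurations of the same algorithm stand in for two framework implementations; the strict and lenient oracles are simplified illustrations of the comparison logic described above.

```python
import numpy as np
from scipy.stats import chi2_contingency
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test = X[:400], X[400:]
y_train = y[:400]

# "Implementation A" and "implementation B": nominally the same algorithm,
# but with different solvers, as often happens across frameworks.
a = LogisticRegression(solver="lbfgs").fit(X_train, y_train)
b = LogisticRegression(solver="liblinear").fit(X_train, y_train)

# Strict oracle: scores and classes must match exactly.
score_dev = np.max(np.abs(a.predict_proba(X_test) - b.predict_proba(X_test)))
class_match = np.mean(a.predict(X_test) == b.predict(X_test))
print(f"max score deviation: {score_dev:.4f}, class agreement: {class_match:.2%}")

# Lenient oracle: only fail if the class distributions differ significantly.
counts = np.array([np.bincount(a.predict(X_test), minlength=2),
                   np.bincount(b.predict(X_test), minlength=2)])
_, p_value, _, _ = chi2_contingency(counts)
print(f"lenient oracle {'fails' if p_value < 0.05 else 'passes'} (p={p_value:.3f})")
```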
This paper presents an expert decision support system for the identification of time-invariant, aeroacoustic source types. The system comprises two steps: first, acoustic properties are calculated based on spectral and spatial information; second, clustering is performed based on these properties. The clustering aims at helping and guiding an expert to quickly identify the different source types and to understand how the sources differ from each other. This supports the expert in determining similar or atypical behavior. A variety of features is proposed for capturing the characteristics of the sources. These features represent aeroacoustic properties that can be interpreted by both the machine and the expert. The features are independent of the absolute Mach number, which enables the proposed method to cluster data measured at different flow configurations. The method is evaluated on deconvolved beamforming data from two scaled airframe half-model measurements. For this exemplary data, the proposed support system approach results in clusters that mostly correspond to the source types identified by the authors. The clustering also provides the mean feature values and the cluster hierarchy for each cluster, as well as a clustering confidence for each cluster member. This additional information makes the results transparent and allows the expert to understand the clustering choices.
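The sketch below illustrates the clustering step only, assuming a feature matrix whose rows are sources and whose columns are Mach-number-independent aeroacoustic features (the paper's feature definitions are not reproduced here, and random data stands in for real measurements). The silhouette value is used as a stand-in for the per-member clustering confidence.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.metrics import silhouette_samples
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
features = rng.normal(size=(60, 5))  # placeholder for real source features

X = StandardScaler().fit_transform(features)
Z = linkage(X, method="ward")          # cluster hierarchy for the expert
cluster_labels = fcluster(Z, t=4, criterion="maxclust")

confidence = silhouette_samples(X, cluster_labels)  # per-member confidence proxy
for c in np.unique(cluster_labels):
    members = cluster_labels == c
    print(f"cluster {c}: mean features {X[members].mean(axis=0).round(2)}, "
          f"mean confidence {confidence[members].mean():.2f}")
```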
Algorithms that involve both forecasting and optimization are at the core of solutions to many difficult real-world problems, such as supply chains (inventory optimization), traffic, and battery/load/production scheduling for sustainable energy systems in the transition towards carbon-free energy generation. Typically, in these scenarios we want to solve an optimization problem that depends on unknown future values, which therefore need to be forecast. As both forecasting and optimization are difficult problems in their own right, relatively little research has been done in this area. This paper presents the findings of the "IEEE-CIS Technical Challenge on Predict+Optimize for Renewable Energy Scheduling", held in 2021. We present a comparison and evaluation of the seven highest-ranked solutions in the competition, to provide researchers with a benchmark problem and to establish the state of the art for this benchmark, with the aim of fostering and facilitating research in this area. The competition used data from the Monash Microgrid, as well as weather data and energy market data. It focused on two main challenges: forecasting renewable energy production and demand, and obtaining an optimal schedule for the activities (lectures) and on-site batteries that leads to the lowest energy cost. The most accurate forecasts were obtained by gradient-boosted tree and random forest models, and optimization was mostly performed using mixed integer linear and quadratic programming. The winning method predicted different scenarios and optimized over all scenarios jointly using a sample average approximation method.
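A minimal sketch of the winning sample-average-approximation idea: instead of optimizing against a single point forecast, sample several demand scenarios and pick the schedule that minimizes the average cost over all of them. Prices, demand distributions, the battery model, and the grid-search over candidate schedules are toy simplifications, not the competition's MILP formulation.

```python
import numpy as np

rng = np.random.default_rng(42)
T = 24                                  # hourly schedule
price = 20 + 10 * np.sin(np.linspace(0, 2 * np.pi, T))     # $/MWh, toy curve
scenarios = rng.gamma(shape=5.0, scale=1.0, size=(100, T))  # sampled demand

def cost(discharge, demand):
    # Energy bought from the grid = demand minus battery discharge (>= 0).
    return np.sum(price * np.maximum(demand - discharge, 0.0))

# Candidate schedules: discharge a fixed budget in the k most expensive hours.
candidates = []
order = np.argsort(-price)
for k in range(1, T + 1):
    d = np.zeros(T)
    d[order[:k]] = 10.0 / k             # battery holds 10 MWh in total (toy)
    candidates.append(d)

# SAA: evaluate each candidate on *all* scenarios jointly, keep the best mean.
avg_costs = [np.mean([cost(d, s) for s in scenarios]) for d in candidates]
best = candidates[int(np.argmin(avg_costs))]
print(f"best k={int(np.count_nonzero(best))}, mean cost={min(avg_costs):.1f}")
```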
We consider the end-to-end abstract-to-title generation problem, exploring seven recent transformer-based models (including ChatGPT) fine-tuned on more than 30k abstract-title pairs from NLP and machine learning venues. As an extension, we also consider the harder problem of generating humorous paper titles. For the latter, we compile the first large-scale humor-annotated dataset for scientific papers in the NLP/ML domains, comprising almost 2.5k titles. We evaluate all models using human and automatic metrics. Our human evaluation suggests that our best end-to-end system performs similarly to human authors (though arguably slightly worse). Generating funny titles is more difficult, however, and our automatic systems clearly underperform relative to humans and often learn dataset artefacts of humor. Finally, ChatGPT, without any fine-tuning, performs at the level of our best fine-tuned system.
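A minimal sketch of abstract-to-title fine-tuning with a seq2seq transformer follows. The "t5-small" checkpoint, the two toy pairs, and the short training loop are placeholders; the paper fine-tunes seven recent models on the full 30k+ corpus.

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

abstracts = [
    "summarize: We study end-to-end title generation from paper abstracts...",
    "summarize: We present a byte-level language model for poetry...",
]
titles = ["Abstract-to-Title Generation", "Byte-Level Poetry Generation"]

inputs = tokenizer(abstracts, return_tensors="pt", padding=True, truncation=True)
labels = tokenizer(titles, return_tensors="pt", padding=True).input_ids
labels[labels == tokenizer.pad_token_id] = -100  # ignore padding in the loss

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
for _ in range(3):  # a real run trains for epochs over the full corpus
    loss = model(**inputs, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

model.eval()
generated = model.generate(**tokenizer(abstracts[:1], return_tensors="pt"),
                           max_length=32)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```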
State-of-the-art poetry generation systems are often complex. They either consist of task-specific model pipelines, incorporate prior knowledge in the form of manually created constraints, or both. In contrast, end-to-end models would not suffer from the overhead of having to model prior knowledge and could learn the nuances of poetry from data alone, reducing the degree of human supervision required. In this work, we investigate end-to-end poetry generation conditioned on styles such as rhyme, meter, and alliteration. We identify and address a lack of training data and mismatching tokenization algorithms as possible limitations of past attempts. In particular, we successfully pre-train and release ByGPT5, a new token-free decoder-only language model, and fine-tune it on a large custom corpus of English and German quatrains annotated with our styles. We show that ByGPT5 outperforms other models such as mT5, ByT5, GPT-2 and ChatGPT, while also being more parameter efficient and performing favorably compared to humans. In addition, we analyze its runtime performance and introspect the model's understanding of style conditions. We make our code, models, and datasets publicly available.
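The snippet below is a minimal sketch of the "token-free" idea behind ByGPT5: text is modeled as a sequence of raw UTF-8 bytes (vocabulary size 256, plus special tokens), so sub-word tokenization can never split or hide the characters that rhyme, meter, and alliteration depend on. The style-prefix format shown is illustrative, not the paper's exact conditioning scheme.

```python
def to_byte_ids(text: str) -> list[int]:
    # One id per UTF-8 byte: no learned vocabulary, no tokenizer mismatch.
    return list(text.encode("utf-8"))

def from_byte_ids(ids: list[int]) -> str:
    return bytes(ids).decode("utf-8", errors="replace")

# A quatrain prompt conditioned on style tags prepended as plain text.
prompt = "[rhyme:ABAB][meter:iambic][alliteration:high]\n"
ids = to_byte_ids(prompt + "Shall I compare thee")
print(len(ids), ids[:10])          # one id per byte, no tokenizer needed
print(from_byte_ids(ids))          # lossless round trip, umlauts included
```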
State-of-the-art machine translation evaluation metrics are based on black-box language models. Hence, recent works consider their explainability with the goals of better understandability for humans and better metric analysis, including failure cases. In contrast, we explicitly leverage explanations to boost the metrics' performance. In particular, we perceive explanations as word-level scores, which we convert, via power means, into sentence-level scores. We combine this sentence-level score with the original metric to obtain a better metric. Our extensive evaluation and analysis across 5 datasets, 5 metrics and 4 explainability techniques show that some configurations reliably improve the original metrics' correlation with human judgment. On two held-out test datasets, we obtain improvements in 15/18 and 4/4 cases, respectively. The gains in Pearson correlation are up to 0.032 and 0.055, respectively. We make our code available.
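A minimal sketch of the core trick: the power mean $M_p(w) = (\frac{1}{n}\sum_i w_i^p)^{1/p}$ collapses word-level explanation scores into one sentence-level score, which is then mixed with the original metric. The scores, the exponent p, and the mixing weight below are illustrative values, not the paper's tuned configuration.

```python
import numpy as np

def power_mean(scores, p):
    scores = np.asarray(scores, dtype=float)
    if p == 0:                       # limit case: geometric mean
        return float(np.exp(np.mean(np.log(scores))))
    return float(np.mean(scores ** p) ** (1.0 / p))

word_scores = [0.9, 0.7, 0.2, 0.8]   # e.g. per-word importances from SHAP
sentence_from_words = power_mean(word_scores, p=-1)  # small p stresses errors

original_metric = 0.75               # e.g. a sentence-level metric score
combined = 0.5 * original_metric + 0.5 * sentence_from_words
print(f"word-based score {sentence_from_words:.3f} -> combined {combined:.3f}")
```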
We apply the vision transformer, a deep machine learning model built around the attention mechanism, to mel-spectrogram representations of raw audio recordings. When adding mel-based data augmentation techniques and sample weighting, we achieve comparable performance on both tasks (the PRS and CCS challenges) of ComParE21, outperforming most single-model baselines. We further introduce overlapping vertical patching and evaluate the influence of parameter configurations. Index Terms: audio classification, attention, mel-spectrogram, unbalanced datasets, computational paralinguistics
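A minimal sketch of overlapping vertical patching follows. The mel array is random stand-in data and the exact patch geometry in the paper may differ; the point is that each patch spans the full frequency axis and overlaps its neighbor in time because the stride is smaller than the patch width.

```python
import numpy as np

n_mels, n_frames = 128, 512
mel = np.random.rand(n_mels, n_frames)   # stand-in for a real mel-spectrogram

patch_width, stride = 16, 8              # 50% temporal overlap
patches = [
    mel[:, t:t + patch_width]            # full-height "vertical" slice
    for t in range(0, n_frames - patch_width + 1, stride)
]
tokens = np.stack(patches).reshape(len(patches), -1)  # flatten for the ViT
print(tokens.shape)  # (num_patches, n_mels * patch_width) -> (63, 2048)
```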
We introduce organism networks, which function like a single neural network but are composed of several neural particle networks; while each particle network fulfils the role of a single weight application within the organism network, it is also trained to self-replicate its own weights. As organism networks feature vastly more parameters than simpler architectures, we perform our initial experiments on an arithmetic task as well as on simplified MNIST-dataset classification as a collective. We observe that individual particle networks tend to specialise in one of the two tasks and that the ones fully specialised in the secondary task may be dropped from the network without hindering the computational accuracy of the primary task. This leads to the discovery of a novel pruning strategy for sparse neural networks.
Common to all different kinds of recurrent neural networks (RNNs) is the intention to model relations between data points through time. When there is no immediate relationship between subsequent data points (e.g., when the data points are generated at random), we show that RNNs are still able to remember a few data points back into the sequence by memorizing them by heart using standard backpropagation. However, we also show that for classical RNNs, LSTM, and GRU networks, the distance between recurrent calls at which data points can be reproduced this way is highly limited (compared to even a loose connection between data points) and subject to various constraints imposed by the type and size of the RNN in question. This implies the existence of a hard limit (far below the information-theoretic one) on the distance between related data points within which RNNs are still able to recognize said relation.
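A minimal sketch of this memorization probe: inputs are i.i.d. random values (so there is no relation between neighbors to infer), and the network must output the value seen k steps earlier, forcing it to memorize "by heart". The sizes, k, and step count are toy choices; the paper sweeps such parameters to locate the hard limit.

```python
import torch
import torch.nn as nn

k, seq_len, hidden = 5, 20, 32
lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
head = nn.Linear(hidden, 1)
opt = torch.optim.Adam(list(lstm.parameters()) + list(head.parameters()), lr=1e-3)

for step in range(2000):
    x = torch.rand(64, seq_len, 1)       # random sequence: nothing to infer
    target = x[:, :-k, 0]                # the value from k steps back
    out, _ = lstm(x)
    pred = head(out[:, k:, :]).squeeze(-1)
    loss = nn.functional.mse_loss(pred, target)
    opt.zero_grad(); loss.backward(); opt.step()

# Low final loss means the LSTM can still bridge distance k; increasing k
# (or shrinking the hidden size) eventually makes the loss stay high.
print(f"k={k}, final mse={loss.item():.4f}")
```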