A classification scheme of a scientific subject gives an overview of its body of knowledge. It can also be used to facilitate access to research articles and other materials related to the subject. For example, the ACM Computing Classification System (CCS) is used in the ACM Digital Library search interface and for indexing computer science papers. We observe that for Computational Linguistics (CL) and Natural Language Processing (NLP), no comprehensive classification system like CCS or the Mathematics Subject Classification (MSC) exists. We propose a classification scheme, CLICKER for CL/NLP, based on an analysis of the online lectures from 77 university courses on the subject. The currently proposed taxonomy includes 334 topics and focuses on educational aspects of CL/NLP; it is based mainly, but not exclusively, on lecture notes from NLP courses. We discuss how such a classification system can help in various real-world applications, including tutoring platforms, resource retrieval, resource recommendation, prerequisite chain learning, and survey generation.
When beginners learn to speak a non-native language, it is difficult for them to judge for themselves whether they are speaking well. Therefore, computer-assisted pronunciation training systems are used to detect learner mispronunciations. These systems typically compare the user's speech with that of a specific native speaker as a model in units of rhythm, phonemes, or words and calculate the differences. However, they require extensive speech data with detailed annotations or can only compare with one specific native speaker. To overcome these problems, we propose a new language learning support system that calculates speech scores and detects mispronunciations by beginners based on a small amount of unannotated speech data without comparison to a specific person. The proposed system uses deep learning-based speech processing to display the pronunciation score of the learner's speech and the difference/distance between the learner's and a group of models' pronunciation in an intuitively visual manner. Learners can gradually improve their pronunciation by eliminating differences and shortening the distance from the model until they become sufficiently proficient. Furthermore, since the pronunciation score and difference/distance are not calculated compared to specific sentences of a particular model, users are free to study the sentences they wish to study. We also built an application to help non-native speakers learn English and confirmed that it can improve users' speech intelligibility.
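The scoring idea above can be sketched as comparing a learner's utterance not against one speaker but against the centroid of a group of native-speaker embeddings. The snippet below is a minimal illustration under assumed names: the embeddings and the cosine-to-score mapping are hypothetical stand-ins for the paper's deep-learning features, not the authors' actual model.

```python
import numpy as np

def pronunciation_score(learner_emb, native_embs):
    """Map cosine similarity between a learner's utterance embedding and
    the centroid of a group of native-speaker embeddings to a 0-100 score.
    (Illustrative scoring rule, assumed rather than taken from the paper.)"""
    centroid = native_embs.mean(axis=0)
    cos = np.dot(learner_emb, centroid) / (
        np.linalg.norm(learner_emb) * np.linalg.norm(centroid))
    return 50.0 * (cos + 1.0)  # cosine in [-1, 1] -> score in [0, 100]

# Synthetic embeddings standing in for the group of model speakers.
rng = np.random.default_rng(0)
natives = rng.normal(size=(5, 16))
learner = natives.mean(axis=0) + 0.1 * rng.normal(size=16)
score = pronunciation_score(learner, natives)
```

A learner whose embedding sits close to the native centroid scores near 100; the distance shown to the user would shrink as pronunciation improves.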
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
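As a minimal illustration of the decoder-only architecture mentioned above, the sketch below builds the causal attention mask such a model uses, under which each token attends only to itself and earlier positions. This is illustrative only; BLOOM's actual implementation adds ALiBi positional biases and much more.

```python
import numpy as np

def causal_mask(seq_len):
    """Lower-triangular boolean mask for decoder-only self-attention:
    entry [i, j] is True iff position i may attend to position j (j <= i)."""
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

mask = causal_mask(4)
```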
Effective human learning depends on a wide selection of educational materials that align with the learner's current understanding of the topic. While the Internet has revolutionized human learning, a substantial barrier to resource accessibility remains: the surplus of online information can make it laborious to navigate and discover high-quality learning materials. In this paper, we propose the Educational Resource Discovery (ERD) pipeline, which automates web resource discovery for novel domains. The pipeline consists of three main steps: data collection, feature extraction, and resource classification. We start with a known source domain and conduct resource discovery on two unseen target domains via transfer learning. We first collect frequent queries from a set of seed documents and search the web to obtain candidate resources, such as lecture slides and introductory blog posts. We then introduce a novel pretrained information-retrieval deep neural network model, Query-Document Masked Language Modeling (QD-MLM), to extract deep features of these candidate resources. We apply a tree-based classifier to decide whether a candidate is a positive learning resource. The pipeline achieves F1 scores of 0.94 and 0.82 when evaluated on the two similar but novel target domains. Finally, we show how this pipeline can benefit an application: leading-paragraph generation for surveys. To the best of our knowledge, this is the first study that considers such a variety of web resources. We also release a corpus of 39,728 manually labeled web resources and 659 queries from NLP, Computer Vision (CV), and Statistics (STATS).
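The final classification step above can be sketched as a tree-based model over extracted features. In the toy example below, random feature vectors stand in for QD-MLM deep features (synthetic data, not the released corpus), and a gradient-boosted tree ensemble decides whether a candidate is a positive learning resource.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic stand-ins for QD-MLM deep features of candidate resources.
rng = np.random.default_rng(1)
X_pos = rng.normal(1.0, 0.3, size=(40, 8))   # positive learning resources
X_neg = rng.normal(-1.0, 0.3, size=(40, 8))  # non-resource candidates
X = np.vstack([X_pos, X_neg])
y = np.array([1] * 40 + [0] * 40)

# Tree-based classifier for the resource-classification step.
clf = GradientBoostingClassifier(random_state=0).fit(X, y)
```

In the real pipeline the positive/negative labels come from the manually labeled web resources, and evaluation on unseen target domains measures how well the features transfer.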
Rapidly developing fields such as Artificial Intelligence (AI) often outpace the efforts of encyclopedic sources such as Wikipedia, which either incompletely cover recently introduced topics or lack such content entirely. As a result, methods for automatically producing content are valuable tools to address this information overload. We show that recent advances in pretrained language modeling can be combined into a two-stage extractive and abstractive approach for Wikipedia lead-paragraph generation. We extend this approach to generate longer Wikipedia-style summaries and study how such methods struggle in this application through detailed testing against 100 reference human-collected surveys. To our knowledge, this is the first study on utilizing web resources for long Wikipedia-style summaries.
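The extractive stage of such a two-stage approach can be sketched as selecting the source sentences most relevant to the target topic. Below, term-overlap scoring is a crude, hypothetical stand-in for the learned extractor; the abstractive stage would then rewrite the selected sentences with a pretrained sequence-to-sequence model.

```python
from collections import Counter

def extract_sentences(sentences, query_terms, k=2):
    """Extractive stage sketch: rank sentences by how often they mention
    the query terms and keep the top k. (Overlap scoring is an assumed
    proxy for the paper's trained extractive model.)"""
    def score(sentence):
        words = Counter(w.lower().strip(".,") for w in sentence.split())
        return sum(words[t] for t in query_terms)
    return sorted(sentences, key=score, reverse=True)[:k]

sents = [
    "Transformers power modern NLP.",
    "The weather is nice.",
    "NLP uses transformers widely.",
]
top = extract_sentences(sents, ["transformers", "nlp"], k=2)
```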