Smart Paper Notes

CEFER: A Four Facets Framework based on Context and Emotion embedded features for Implicit and Explicit Emotion Recognition

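Fereshteh Khoshnam, Ahmad Baraani-Dastjerdi, M. J. Liaghatdar

Category: Natural Language Processing

2022-09-28

People's behavior and reactions are driven by their emotions. Online social media is becoming a great instrument for expressing emotions in written form. Paying attention to the context and the entire sentence helps us detect emotion from text. However, this perspective keeps us from noticing some emotional words or phrases in the text, particularly when the words express an emotion implicitly rather than explicitly. On the other hand, focusing only on the words and ignoring the context results in a distorted understanding of the sentence's meaning and feeling. In this paper, we propose a framework that analyzes text at both the sentence and word levels. We name it CEFER (Context and Emotion embedded Framework for Emotion Recognition). Our four facets extract data by considering the entire sentence and each individual word simultaneously, as well as implicit and explicit emotions. The knowledge gained from these data not only mitigates the impact of the flaws in the preceding approaches but also strengthens the feature vector. We evaluate several feature spaces using the BERT family and design CEFER based on them. CEFER combines the emotional vector of each word, including explicit and implicit emotions, with the context-based feature vector of each word. CEFER performs better than the BERT family. The experimental results demonstrate that identifying implicit emotions is more challenging than detecting explicit emotions, and CEFER improves the accuracy of implicit emotion recognition. According to the results, CEFER performs 5% better than the BERT family in recognizing explicit emotions and 3% better in recognizing implicit emotions.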

Translated by Google Translate
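
The CEFER abstract above says that each word's emotion vector is combined with that word's context-based feature vector. A minimal sketch of one way to do this is shown below, assuming plain concatenation as the combination step. The lexicon `TOY_LEXICON`, the emotion labels, the helper names, and the two-dimensional context vectors are all hypothetical stand-ins chosen for illustration; they are not taken from the paper, and a real system would obtain the context vectors from a BERT-family model.

```python
# Toy sketch: fuse a per-word emotion vector with a per-word context vector.
# ASSUMPTION: the fusion is plain concatenation; the paper may do otherwise.
# The lexicon, labels, and context vectors below are invented for illustration.

EMOTIONS = ["joy", "anger", "fear", "sadness"]

# Hypothetical emotion lexicon: word -> intensity per emotion, in EMOTIONS order.
TOY_LEXICON = {
    "happy": [0.9, 0.0, 0.0, 0.0],
    "furious": [0.0, 0.95, 0.1, 0.0],
    "funeral": [0.0, 0.0, 0.2, 0.7],  # implicit emotion: no emotion word as such
}


def emotion_vector(word):
    """Return the lexicon entry for a word, or all zeros if it is unknown."""
    return list(TOY_LEXICON.get(word.lower(), [0.0] * len(EMOTIONS)))


def fuse(tokens, context_vectors):
    """Concatenate each token's context vector with its emotion vector."""
    if len(tokens) != len(context_vectors):
        raise ValueError("need exactly one context vector per token")
    return [list(ctx) + emotion_vector(tok)
            for tok, ctx in zip(tokens, context_vectors)]


tokens = ["the", "funeral", "was", "yesterday"]
# Stand-in for contextual embeddings (hypothetical 2-dimensional values).
context = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
fused = fuse(tokens, context)
```

Each fused vector here has six entries: two context dimensions followed by four emotion dimensions, so a downstream classifier sees both kinds of signal for every word.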

Computational Sarcasm Analysis on Social Media: A Systematic Review

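Faria Binte Kader, Nafisa Hossain Nujat, Tasmia Binte Sogir, Mohsinul Kabir, Hasan Mahmud, Kamrul Hasan

Category: Natural Language Processing

2022-09-13

Sarcasm can be defined as saying or writing the opposite of what one truly wants to express, usually to insult, irritate, or amuse someone. Because of the obscure nature of sarcasm in textual data, detecting it is difficult and of great interest to the sentiment analysis research community. Although research on sarcasm detection spans more than a decade, some significant advances have been made recently, including the use of unsupervised pre-trained transformers in multimodal environments and the integration of context to identify sarcasm. In this study, we aim to provide a brief overview of recent advances and trends in computational sarcasm research for the English language. We describe relevant datasets, methodologies, trends, issues, challenges, and tasks relating to sarcasm that go beyond detection. Our study provides tables of sarcasm datasets, sarcastic features and their extraction methods, and performance analyses of various approaches, which can help researchers in related domains understand current state-of-the-art practices in sarcasm detection.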

Improving the Generalizability of Text-Based Emotion Detection by Leveraging Transformers with Psycholinguistic Features

Sourabh Zanwar, Daniel Wiechmann, Yu Qiao, Elma Kerz

Category: Natural Language Processing

2022-12-19

In recent years, there has been increased interest in building predictive models that harness natural language processing and machine learning techniques to detect emotions from various text sources, including social media posts, micro-blogs or news articles. Yet, deployment of such models in real-world sentiment and emotion applications faces challenges, in particular poor out-of-domain generalizability. This is likely due to domain-specific differences (e.g., topics, communicative goals, and annotation schemes) that make transfer between different models of emotion recognition difficult. In this work we propose approaches for text-based emotion detection that leverage transformer models (BERT and RoBERTa) in combination with Bidirectional Long Short-Term Memory (BiLSTM) networks trained on a comprehensive set of psycholinguistic features. First, we evaluate the performance of our models within-domain on two benchmark datasets: GoEmotion and ISEAR. Second, we conduct transfer learning experiments on six datasets from the Unified Emotion Dataset to evaluate their out-of-domain robustness. We find that the proposed hybrid models improve the ability to generalize to out-of-distribution data compared to a standard transformer-based approach. Moreover, we observe that these models perform competitively on in-domain data.

DeepEmotex: Classifying Emotion in Text Messages using Deep Transfer Learning

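Maryam Hasan, Elke Rundensteiner, Emmanuel Agu

Category: Machine Learning

2022-06-12

Transfer learning has been widely used in natural language processing through deep pretrained language models, such as Bidirectional Encoder Representations from Transformers (BERT) and the Universal Sentence Encoder. Despite this great success, language models overfit when applied to small datasets and are prone to forgetting when fine-tuned with a classifier. To address this problem of forgetting when transferring deep pretrained language models from one domain to another, existing efforts explore fine-tuning methods that forget less. We propose DeepEmotex, an effective sequential transfer learning method for detecting emotion in text. To avoid the forgetting problem, the fine-tuning step is instrumented by a large amount of emotion-labeled data collected from Twitter. We conduct an experimental study using both curated Twitter datasets and benchmark datasets. DeepEmotex models achieve over 91% accuracy for multi-class emotion classification on the test dataset. We evaluate the performance of the fine-tuned DeepEmotex models in classifying emotions in the EmoInt and Stimulus benchmark datasets. The models correctly classify emotions in 73% of the instances in the benchmark datasets. The proposed DeepEmotex-BERT model outperforms a Bi-LSTM on the benchmark datasets by 23%. We also study the effect of the size of the fine-tuning dataset on model accuracy. Our evaluation results show that fine-tuning with a large set of emotion-labeled data improves both the robustness and the effectiveness of the resulting target task model.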

A Comprehensive Review of Visual-Textual Sentiment Analysis from Social Media Networks

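Israa Khalaf Salman Al-Tameemi, Mohammad-Reza Feizi-Derakhshi, Saeed Pashazadeh, Mohammad Asadpour

Category: Natural Language Processing | Artificial Intelligence

2022-07-05

Social media networks have become a significant aspect of people's lives, serving as a platform for their ideas, opinions, and emotions. Consequently, automated sentiment analysis (SA) is critical for recognizing people's feelings in ways that other information sources cannot. The analysis of these feelings has revealed various applications, including brand evaluations, YouTube film reviews, and healthcare applications. As social media continues to develop, people post a massive amount of information in different forms, including text, photos, audio, and video. Thus, traditional SA algorithms have become limited, as they do not consider the expressiveness of other modalities. By including such characteristics from various material sources, these multimodal data streams provide new opportunities for optimizing the expected results beyond text-based SA. Our study focuses on the forefront field of multimodal SA, which examines visual and textual data posted on social media networks. Many people are more likely to use this information to express themselves on these platforms. To serve as a resource for academics in this rapidly growing field, we present a comprehensive overview of textual and visual SA, including data pre-processing, feature extraction techniques, sentiment benchmark datasets, and the efficacy of multiple classification methodologies suited to each field. We also briefly introduce the most frequently used data fusion strategies and summarize existing research on visual-textual SA. Finally, we highlight the most significant challenges and investigate several important sentiment applications.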

Detection of Hate Speech using BERT and Hate Speech Word Embedding with Deep Model

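Hind Saleh, Areej Alhothali, Kawthar Moria

Category: Natural Language Processing

2021-11-02

The enormous amount of data generated on the web and social media has increased the demand for detecting online hate speech. Detecting hate speech will reduce its negative impact and influence on others. Many efforts in the natural language processing (NLP) domain have aimed to detect hate speech in general or to detect specific kinds of hate speech, such as that targeting religion, race, gender, or sexual orientation. Hate communities tend to use abbreviations, intentional spelling mistakes, and coded words in their communication to evade detection, which adds more challenges to hate speech detection tasks. Thus, word representation will play an increasingly pivotal role in detecting hate speech. This paper investigates the feasibility of leveraging domain-specific word embeddings in a bidirectional LSTM-based deep model to automatically detect and classify hate speech. Furthermore, we investigate the use of a transfer learning language model (BERT) on the hate speech problem as a binary classification task. The experiments show that domain-specific word embeddings with the bidirectional LSTM-based deep model achieved a 93% F1-score, while BERT achieved up to a 96% F1-score on a combined balanced dataset built from available hate speech datasets.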

ArmanEmo: A Persian Dataset for Text-based Emotion Detection

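Hossein Mirzaee, Javad Peymanfard, Hamid Habibzadeh Moshtaghin, Hossein Zeinali

Category: Natural Language Processing | Artificial Intelligence

2022-07-24

With the recent proliferation of open textual data on social media platforms, emotion detection (ED) from text has received more attention over the past few years. It has many applications, especially for businesses and online service providers, where emotion detection techniques can help them make informed commercial decisions by analyzing how customers and users feel about their products and services. In this study, we introduce ArmanEmo, a human-labeled emotion dataset of more than 7000 Persian sentences labeled with seven categories. The dataset was collected from different resources, including Twitter, Instagram, and comments on Digikala (an Iranian e-commerce company). Labels are based on Ekman's six basic emotions (Anger, Fear, Happiness, Hatred, Sadness, Wonder) and another category (Other) to cover any emotion not included in Ekman's model. Along with the dataset, we provide several baseline models for emotion classification, focusing on state-of-the-art transformer-based language models. Our best model achieves a macro-averaged score of 75.39% on our test dataset. Moreover, we conduct transfer learning experiments to compare the generalization of our proposed dataset against other Persian emotion datasets. The results of these experiments suggest that our dataset generalizes better than the existing Persian emotion datasets. ArmanEmo is publicly available at https://github.com/arman-rayan-sharif/arman-text-emotion.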

Emotion Detection From Tweets Using a BERT and SVM Ensemble Model

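Ionuţ-Alexandru Albu, Stelian Spînu

Category: Natural Language Processing

2022-08-09

Automatic identification of emotions expressed in Twitter data has a wide range of applications. We create a well-balanced dataset by adding a neutral class to a benchmark dataset consisting of four emotions: fear, sadness, joy, and anger. On this extended dataset, we investigate the use of Support Vector Machine (SVM) and Bidirectional Encoder Representations from Transformers (BERT) for emotion recognition. We propose a novel ensemble model by combining the BERT and SVM models. Experiments show that the proposed model achieves a state-of-the-art accuracy of 0.91 on emotion recognition in tweets.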

Emotion Analysis using Multi-Layered Networks for Graphical Representation of Tweets

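Anna Nguyen, Antonio Longa, Massimiliano Luca, Joe Kaul, Gabriel Lopez

Category: Artificial Intelligence

2022-07-02

Anticipating audience reaction to a certain text is integral to several facets of society, including politics, research, and commercial industries. Sentiment analysis (SA) is a useful natural language processing (NLP) technique that uses lexical/statistical and deep learning methods to determine whether texts of different sizes exhibit positive, negative, or neutral emotions. However, there is currently a lack of tools that can analyze groups of independent texts and extract the primary emotion from the whole set. Therefore, the current paper proposes a novel algorithm, the Multi-Layered Tweet Analyzer (MLTA), which models social media text graphically using multi-layered networks (MLNs) in order to better encode relationships across independent sets of tweets. Compared to other representation methods, graph structures are capable of capturing meaningful relationships in complex ecosystems. State-of-the-art Graph Neural Networks (GNNs) are used to extract information from the Tweet-MLN and make predictions based on the extracted graph features. Results show that MLTA not only predicts from a larger set of possible emotions, delivering a more accurate sentiment than the standard positive, negative, or neutral, but also allows for accurate group-level predictions on Twitter data.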

Text-based automatic personality prediction: A bibliographic review

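Ali-Reza Feizi-Derakhshi, Mohammad-Reza Feizi-Derakhshi, Majid Ramezani, Narjes Nikzad-Khasmakhi, Meysam Asgari-Chenaghlu, Taymaz Akan, Mehrdad Ranjbar-Khadivi, Elnaz Zafarni-Moattar, Zoleikha Jahanbakhsh-Naghadeh

Category: Natural Language Processing | Artificial Intelligence

2021-10-04

Personality detection is an old topic in psychology, and Automatic Personality Prediction (or Perception) (APP) is the automated (computational) forecasting of personality from different types of human-generated/exchanged content (such as text, speech, image, and video). The principal objective of this study is to offer a shallow (overall) review of natural language processing approaches to APP since 2010. With the advent of deep learning and the subsequent rise of transfer learning and pre-trained models in NLP, APP research has become a hot topic, so in this review methods are classified into three groups: pre-trained independent, pre-trained model based, and multimodal approaches. In addition, to allow a comprehensive comparison, the reported results are presented together with information on the datasets.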

Troll Tweet Detection Using Contextualized Word Representations

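Seyhmus Yilmaz, Sultan Zavrak

Category: Natural Language Processing | Artificial Intelligence

2022-07-17

In recent years, many troll accounts have emerged to manipulate opinion on social media. Detecting and eradicating trolling is a critical issue for social networking platforms because businesses, abusers, and nation-state-sponsored troll farms use fake and automated accounts. NLP techniques are used to extract data from social networking text, such as Twitter tweets. In many text processing applications, word embedding representation methods such as BERT have performed better than prior NLP techniques, offering novel breakthroughs in various tasks that require precisely comprehending and categorizing social networking information. This paper implements and compares nine deep learning-based troll tweet detection architectures, three for each of the BERT, ELMo, and GloVe word embedding models. Precision, recall, F1 score, AUC, and classification accuracy are used to evaluate each architecture. In the experimental results, most architectures using BERT models improved troll tweet detection. A customized ELMo-based architecture with a GRU classifier has the highest AUC for detecting troll messages. The proposed architectures can be used by various social-based systems to detect troll messages in the future.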

Survey of Generative Methods for Social Media Analysis

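Stan Matwin, Aristides Milios, Paweł Prałat, Amilcar Soares, François Théberge

Category: Machine Learning

2021-12-13

This survey draws a broad panoramic picture of the state of the art (SoTA) of research on generative methods for the analysis of social media data. It fills a gap, as the existing survey articles are either limited in scope or dated. We include two important aspects that are currently gaining importance in mining and modeling social media: dynamics and networks. Social dynamics are important for understanding the spread of influence or diseases, the formation of friendships, and so on. Networks, on the other hand, can capture various complex relationships, providing additional insight and identifying important patterns that would otherwise go unnoticed.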

ReDDIT: Regret Detection and Domain Identification from Text

Fazlourrahman Balouchzahi, Sabur Butt, Grigori Sidorov, Alexander Gelbukh

Category: Natural Language Processing | Artificial Intelligence | Machine Learning

2022-12-14

In this paper, we present a study of regret and its expression on social media platforms. Specifically, we present a novel dataset of Reddit texts that have been classified into three classes: Regret by Action, Regret by Inaction, and No Regret. We then use this dataset to investigate the language used to express regret on Reddit and to identify the domains of text that are most commonly associated with regret. Our findings show that Reddit users are most likely to express regret for past actions, particularly in the domain of relationships. We also found that deep learning models using GloVe embedding outperformed other models in all experiments, indicating the effectiveness of GloVe for representing the meaning and context of words in the domain of regret. Overall, our study provides valuable insights into the nature and prevalence of regret on social media, as well as the potential of deep learning and word embeddings for analyzing and understanding emotional language in online text. These findings have implications for the development of natural language processing algorithms and the design of social media platforms that support emotional expression and communication.

Incorporating Emotions into Health Mention Classification Task on Social Media

Olanrewaju Tahir Aduragba, Jialin Yu, Alexandra I. Cristea

Category: Natural Language Processing | Machine Learning

2022-12-09

The health mention classification (HMC) task is the process of identifying and classifying mentions of health-related concepts in text. This can be useful for identifying and tracking the spread of diseases through social media posts. However, this is a non-trivial task. Here we build on recent studies suggesting that using emotional information may improve upon this task. Our study results in a framework for health mention classification that incorporates affective features. We present two methods, an intermediate task fine-tuning approach (implicit) and a multi-feature fusion approach (explicit) to incorporate emotions into our target task of HMC. We evaluated our approach on 5 HMC-related datasets from different social media platforms including three from Twitter, one from Reddit and another from a combination of social media sources. Extensive experiments demonstrate that our approach results in statistically significant performance gains on HMC tasks. By using the multi-feature fusion approach, we achieve at least a 3% improvement in F1 score over BERT baselines across all datasets. We also show that considering only negative emotions does not significantly affect performance on the HMC task. Additionally, our results indicate that HMC models infused with emotional knowledge are an effective alternative, especially when other HMC datasets are unavailable for domain-specific fine-tuning. The source code for our models is freely available at https://github.com/tahirlanre/Emotion_PHM.

Multi-task Learning for Personal Health Mention Detection on Social Media

Olanrewaju Tahir Aduragba, Jialin Yu, Alexandra I. Cristea

Category: Natural Language Processing | Artificial Intelligence

2022-12-09

Detecting personal health mentions on social media is essential to complement existing health surveillance systems. However, annotating data for detecting health mentions at a large scale is a challenging task. This research employs a multitask learning framework to leverage available annotated data from a related task to improve the performance on the main task to detect personal health experiences mentioned in social media texts. Specifically, we focus on incorporating emotional information into our target task by using emotion detection as an auxiliary task. Our approach significantly improves a wide range of personal health mention detection tasks compared to a strong state-of-the-art baseline.

Knowledge Graph-Enabled Text-Based Automatic Personality Prediction

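Majid Ramezani, Mohammad-Reza Feizi-Derakhshi, Mohammad-Ali Balafar

Category: Natural Language Processing | Artificial Intelligence

2022-03-17

How people think, feel, and behave primarily represents their personality characteristics. By being aware of the personality characteristics of the individuals we are dealing with or have decided to deal with, one can competently improve the relationship, regardless of its type. With the rise of Internet-based communication infrastructures (social networks, forums, etc.), a considerable amount of human communication takes place there. The most prominent tool in such communication is language in written and spoken form, which faithfully encodes all the essential personality characteristics of individuals. Text-based Automatic Personality Prediction (APP) is the automated forecasting of an individual's personality based on generated/exchanged text content. This paper presents a novel knowledge graph-enabled approach to text-based APP that relies on the Big Five personality traits. To this end, given a text, a knowledge graph, which is a set of interlinked descriptions of concepts, is built by matching the concepts in the input text with DBpedia knowledge base entries. Then, to achieve a more powerful representation, the graph is enriched with the DBpedia ontology, the NRC Emotion Intensity Lexicon, and MRC psycholinguistic database information. Afterwards, the knowledge graph, which is now a knowledgeable alternative to the input text, is embedded to yield an embedding matrix. Finally, to perform personality prediction, the resulting embedding matrix is fed to four suggested deep learning models, which are based on a convolutional neural network (CNN), a simple recurrent neural network (RNN), long short-term memory (LSTM), and bidirectional long short-term memory (BiLSTM). The results indicate considerable improvements in prediction accuracy across all the suggested classifiers.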

Contextual Sentence Analysis for the Sentiment Prediction on Financial Data

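Elvys Linhares Pontes, Mohamed Benjannet

Category: Natural Language Processing | Machine Learning

2021-12-27

Newsletters and social networks can reflect opinion about the market and specific stocks from the perspective of analysts and of the general public on the products and/or services provided by a company. Therefore, sentiment analysis of these texts can provide useful information to help investors trade in the market. In this paper, we propose to identify the sentiment associated with companies and stocks by predicting a score (of data type real) in the range between -1 and +1. Specifically, we fine-tuned a RoBERTa model to process headlines and microblogs and combined it with additional transformer layers that process the sentence analysis with sentiment dictionaries, in order to improve the sentiment analysis. We evaluated it on the financial data released by SemEval-2017 Task 5, and our proposal outperformed the best systems of SemEval-2017 Task 5 as well as strong baselines. Indeed, combining contextual sentence analysis with financial and general sentiment dictionaries provided useful information to our model and allowed it to generate more reliable sentiment scores.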

PolyHope: Two-Level Hope Speech Detection from Tweets

Fazlourrahman Balouchzahi, Grigori Sidorov, Alexander Gelbukh

Category: Natural Language Processing | Artificial Intelligence | Machine Learning

2022-10-25

Hope is characterized as openness of spirit toward the future, a desire, expectation, and wish for something to happen or to be true that remarkably affects human's state of mind, emotions, behaviors, and decisions. Hope is usually associated with concepts of desired expectations and possibility/probability concerning the future. Despite its importance, hope has rarely been studied as a social media analysis task. This paper presents a hope speech dataset that classifies each tweet first into "Hope" and "Not Hope", then into three fine-grained hope categories: "Generalized Hope", "Realistic Hope", and "Unrealistic Hope" (along with "Not Hope"). English tweets in the first half of 2022 were collected to build this dataset. Furthermore, we describe our annotation process and guidelines in detail and discuss the challenges of classifying hope and the limitations of the existing hope speech detection corpora. In addition, we reported several baselines based on different learning approaches, such as traditional machine learning, deep learning, and transformers, to benchmark our dataset. We evaluated our baselines using weighted-averaged and macro-averaged F1-scores. Observations show that a strict process for annotator selection and detailed annotation guidelines enhanced the dataset's quality. This strict annotation process resulted in promising performance for simple machine learning classifiers with only bi-grams; however, binary and multiclass hope speech detection results reveal that contextual embedding models have higher performance in this dataset.
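
The abstract above reports both weighted-averaged and macro-averaged F1-scores. The sketch below, in plain Python, shows how the two averages differ on an imbalanced toy example. The labels and predictions are invented for illustration and are not drawn from the PolyHope dataset; the function names are likewise hypothetical.

```python
# Macro vs. weighted F1 on a toy binary task (invented labels, not PolyHope data).

def per_class_f1(y_true, y_pred):
    """Return ({label: F1}, {label: support}) computed one-vs-rest per label."""
    labels = sorted(set(y_true) | set(y_pred))
    scores, support = {}, {}
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
        support[label] = tp + fn  # number of true instances of this label
    return scores, support


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: every class counts equally."""
    scores, _ = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Mean of per-class F1 weighted by each class's support."""
    scores, support = per_class_f1(y_true, y_pred)
    total = sum(support.values())
    return sum(scores[label] * support[label] for label in scores) / total


y_true = ["Not Hope"] * 6 + ["Hope"] * 2
y_pred = ["Not Hope"] * 6 + ["Hope", "Not Hope"]

macro = macro_f1(y_true, y_pred)
weighted = weighted_f1(y_true, y_pred)
```

In this toy case the minority class has the lower F1, so the macro average comes out below the weighted average. That gap is one reason to report both figures for imbalanced data.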

Improving Sentiment Analysis By Emotion Lexicon Approach on Vietnamese Texts

An Long Doan, Son T. Luu

Category: Natural Language Processing

2022-10-05

The sentiment analysis task has various applications in practice. In the sentiment analysis task, words and phrases that represent positive and negative emotions are important. Finding out the words that represent the emotion from the text can improve the performance of the classification models for the sentiment analysis task. In this paper, we propose a methodology that combines the emotion lexicon with the classification model to enhance the accuracy of the models. Our experimental results show that the emotion lexicon combined with the classification model improves the performance of models.

Domain Adaptation of Transformer-Based Models using Unlabeled Data for Relevance and Polarity Classification of German Customer Feedback

Ahmad Idrissi-Yaghir, Henning Schäfer, Nadja Bauer, Christoph M. Friedrich

Category: Natural Language Processing | Machine Learning

2022-12-12

Understanding customer feedback is becoming a necessity for companies to identify problems and improve their products and services. Text classification and sentiment analysis can play a major role in analyzing this data by using a variety of machine and deep learning approaches. In this work, different transformer-based models are utilized to explore how efficient these models are when working with a German customer feedback dataset. In addition, these pre-trained models are further analyzed to determine if adapting them to a specific domain using unlabeled data can yield better results than off-the-shelf pre-trained models. To evaluate the models, two downstream tasks from the GermEval 2017 are considered. The experimental results show that transformer-based models can reach significant improvements compared to a fastText baseline and outperform the published scores and previous models. For the subtask Relevance Classification, the best models achieve a micro-averaged F1-score of 96.1% on the first test set and 95.9% on the second one, and a score of 85.1% and 85.3% for the subtask Polarity Classification.
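
The scores above are micro-averaged F1 values. As a small illustration of that metric, the sketch below pools true positives, false positives, and false negatives over all classes before computing F1. The labels and predictions are invented for illustration and are not GermEval 2017 data; `micro_f1` is a hypothetical helper name. For single-label classification where every class is included, the pooled value equals plain accuracy, which the example also checks.

```python
# Micro-averaged F1 on toy labels (invented, not GermEval 2017 data).

def micro_f1(y_true, y_pred):
    """Pool TP/FP/FN across all labels, then compute a single F1 value."""
    labels = set(y_true) | set(y_pred)
    tp = fp = fn = 0
    for label in labels:
        for t, p in zip(y_true, y_pred):
            if p == label and t == label:
                tp += 1  # correct prediction of this label
            elif p == label and t != label:
                fp += 1  # predicted this label, but the truth differs
            elif t == label and p != label:
                fn += 1  # missed a true instance of this label
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


y_true = ["relevant", "relevant", "irrelevant", "relevant", "irrelevant"]
y_pred = ["relevant", "irrelevant", "irrelevant", "relevant", "relevant"]

score = micro_f1(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```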
