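Supervised by: Chinese Academy of Sciences
Sponsored by: Chinese Society of Optimization, Overall Planning and Economical Mathematics
   Institutes of Science and Development, Chinese Academy of Sciences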

Chinese Journal of Management Science ›› 2024, Vol. 32 ›› Issue (6): 140-150. doi: 10.16381/j.cnki.issn1003-207x.2021.1159  cstr: 32146.14.j.cnki.issn1003-207x.2021.1159

Multi-class Text Sentiment Recognition Based on the BERT Model and Dynamic Ensemble Selection

张忠良, 费秦君, 陈愉予, 雒兴刚

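  1. School of Management, Hangzhou Dianzi University, Hangzhou 310018, Zhejiang, China
  • Received: 2021-06-09  Revised: 2021-08-27  Online: 2024-06-25  Published: 2024-07-03
  • Corresponding author: Xinggang Luo  E-mail: xgluo@hdu.edu.cn
  • Funding:
    National Natural Science Foundation of China, Young Scientists Fund (71801065); Zhejiang Provincial Philosophy and Social Sciences Planning Project (21NDJC072YB); Zhejiang Provincial Natural Science Foundation, Key Project (LZ20G010001); National Natural Science Foundation of China, General Program (71831006)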

Research on Multi-class Sentiment Classification Based on BERT and Dynamic Ensemble Selection

Zhongliang Zhang, Qinjun Fei, Yuyu Chen, Xinggang Luo

  1. School of Management, Hangzhou Dianzi University, Hangzhou 310018, China
  • Received: 2021-06-09  Revised: 2021-08-27  Online: 2024-06-25  Published: 2024-07-03
  • Contact: Xinggang Luo  E-mail: xgluo@hdu.edu.cn

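Abstract (translated from the Chinese):

To address the semantic loss in text feature vectors extracted by traditional methods, and the fact that some text sentiment recognition tasks are multi-class problems, a new multi-class text sentiment recognition strategy based on BERT (bidirectional encoder representations from transformers) and dynamic ensemble selection is proposed. First, BERT is used to vectorize the text, and the multi-class sentiment recognition task is split into multiple binary sub-tasks with the one-vs-one (OVO) decomposition strategy. Second, a dynamic ensemble selection strategy is used to build a classifier ensemble model for each sub-task. Finally, the final prediction is obtained with an aggregation strategy. The proposed method is evaluated empirically on a public movie review data set. The results show that: (1) compared with the traditional TF-IDF and Word2Vec methods, BERT-based vectorization helps improve the accuracy of text sentiment recognition; (2) for each sub-problem in the multi-class sentiment recognition task, the dynamic ensemble selection strategy effectively improves recognition performance; (3) the prediction model built in this paper significantly outperforms other existing sentiment recognition models.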

Keywords: text sentiment recognition, BERT, multi-class, dynamic ensemble selection, decomposition strategy

Abstract:

To handle the semantic deficiency of text feature vectors extracted by classic methods and the multi-class nature of some text sentiment recognition tasks, a novel multi-class sentiment classification strategy based on Bidirectional Encoder Representations from Transformers (BERT) and dynamic ensemble selection (DES) is proposed. First, BERT is used to vectorize the text. Then, the one-vs-one (OVO) strategy is used to divide the multi-class sentiment classification problem into multiple binary classification sub-problems. Next, the DES strategy is used to construct a binary classifier ensemble for each sub-problem. Finally, the final prediction is obtained with an aggregation strategy. A public movie review data set is used for the experimental analysis. The experimental results indicate that (1) the BERT model improves multi-class sentiment classification performance relative to the traditional TF-IDF and Word2Vec methods, (2) the DES strategy is effective for each sub-problem in multi-class sentiment classification, and (3) the proposed method performs significantly better than existing well-known methods for multi-class sentiment analysis.

Key words: text sentiment analysis, BERT, multi-class, dynamic ensemble selection, decomposition strategy
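
The pipeline outlined in the abstract (vectorize, decompose with OVO, apply dynamic ensemble selection to each binary sub-problem, aggregate) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy 2-D vectors stand in for BERT embeddings, the classifier pool (nearest centroid, 1-NN, 3-NN) is arbitrary, the selection rule keeps the classifiers with the best accuracy on the k nearest validation points, the training data is reused as the validation set, and the aggregation step is plain majority voting. All function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def make_centroid_clf(X, y):
    """Nearest-centroid classifier: predict the label of the closest class mean."""
    cents = {}
    for label in set(y):
        pts = [x for x, t in zip(X, y) if t == label]
        cents[label] = [sum(c) / len(pts) for c in zip(*pts)]
    return lambda x: min(cents, key=lambda lab: math.dist(x, cents[lab]))


def make_knn_clf(X, y, k):
    """k-nearest-neighbour classifier with majority vote."""
    def predict(x):
        near = sorted(zip(X, y), key=lambda p: math.dist(x, p[0]))[:k]
        return Counter(t for _, t in near).most_common(1)[0][0]
    return predict


def des_predict(x, pool, X_val, y_val, k=3):
    """Dynamic ensemble selection (assumed rule): keep the classifiers that are
    most accurate on the k validation points nearest to x, then let them vote."""
    region = sorted(zip(X_val, y_val), key=lambda p: math.dist(x, p[0]))[:k]
    scores = [sum(clf(xv) == yv for xv, yv in region) for clf in pool]
    best = max(scores)
    chosen = [clf for clf, s in zip(pool, scores) if s == best]
    return Counter(clf(x) for clf in chosen).most_common(1)[0][0]


def fit_ovo(X, y):
    """One-vs-one decomposition: one classifier pool per pair of classes."""
    models = {}
    for a, b in combinations(sorted(set(y)), 2):
        Xs = [x for x, t in zip(X, y) if t in (a, b)]
        ys = [t for t in y if t in (a, b)]
        pool = [make_centroid_clf(Xs, ys),
                make_knn_clf(Xs, ys, 1),
                make_knn_clf(Xs, ys, 3)]
        # Simplification: the pair's training data doubles as its validation set.
        models[(a, b)] = (pool, Xs, ys)
    return models


def predict_ovo(models, x):
    """Aggregation (assumed): majority vote over the binary DES predictions."""
    votes = Counter(des_predict(x, pool, Xv, yv)
                    for pool, Xv, yv in models.values())
    return votes.most_common(1)[0][0]


# Toy 2-D "embeddings" standing in for BERT vectors of three sentiment classes.
X_train = [(0, 0), (0, 1), (1, 0),
           (5, 5), (5, 6), (6, 5),
           (10, 0), (10, 1), (11, 0)]
y_train = ["neg"] * 3 + ["neu"] * 3 + ["pos"] * 3

models = fit_ovo(X_train, y_train)
for query in [(0.2, 0.3), (5.4, 5.2), (10.5, 0.5)]:
    print(query, predict_ovo(models, query))
```

With three classes, OVO yields three binary sub-problems, so each query receives three votes. In a realistic setting the toy vectors would be replaced by sentence embeddings from a BERT model, and the validation set used for the region of competence would be held out from training.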

CLC number: