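Supervised by: Chinese Academy of Sciences
Sponsored by: Chinese Society of Optimization, Overall Planning and Economic Mathematics
   Institutes of Science and Development, Chinese Academy of Sciences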

Chinese Journal of Management Science ›› 2015, Vol. 23 ›› Issue (10): 162-169. doi: 10.16381/j.cnki.issn1003-207x.2015.10.019

• Articles •

Research on a Customer Targeting Model Based on Improved GMDH

XIAO Jin1, TANG Jing2,3, LIU Dun-hu4, XIE Ling1, WANG Shou-yang5

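  1. Business School, Sichuan University, Chengdu 610064, Sichuan, China;
    2. School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China;
    3. Research Center on Fictitious Economy & Data Science, Chinese Academy of Sciences, Beijing 100190, China;
    4. Management Faculty, Chengdu University of Information Technology, Chengdu 610225, Sichuan, China;
    5. Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China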
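  • Received: 2014-10-30 Revised: 2015-01-08 Online: 2015-10-20 Published: 2015-10-24
  • About the author: XIAO Jin (1983-), male (Han ethnicity), born in Guang'an, Sichuan; associate professor at the Business School, Sichuan University; PhD in management; postdoctoral researcher at the Academy of Mathematics and Systems Science, Chinese Academy of Sciences. Research interests: big data analysis, business intelligence, customer relationship management.
  • Funding:

    National Natural Science Foundation of China (71471124, 71101100, 71273036); Sichuan Province Social Science Planning Project (SC14C019); Sichuan University Excellent Young Scholars Fund (2013SCU04A08); Sichuan Province Youth Fund Project (2015RZ0056); Basic Research Project of the Science and Technology Department of Sichuan Province (2015JY0022)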

Customer Targeting Model Based on Improved GMDH

XIAO Jin1, TANG Jing2,3, LIU Dun-hu4, XIE Ling1, WANG Shou-yang5   

  1. Business School, Sichuan University, Chengdu 610064, China;
    2. School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China;
    3. Research Center on Fictitious Economy & Data Science, Chinese Academy of Sciences, Beijing 100190, China;
    4. Management Faculty, Chengdu University of Information Technology, Chengdu 610225, China;
    5. Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
  • Received: 2014-10-30 Revised: 2015-01-08 Online: 2015-10-20 Published: 2015-10-24

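Chinese abstract: In recent years, customer targeting modeling has become a research hotspot in customer relationship management. To address the highly imbalanced class distribution of the training samples used for customer targeting modeling, this paper first proposes a hybrid sampling method. It then introduces the group method of data handling (GMDH) neural network into customer feature selection and proposes a new feature selection algorithm, Log-GMDH, which improves the traditional GMDH network model in two respects: the choice of transfer function and the construction of a new external criterion. Finally, the proposed hybrid sampling method, Log-GMDH, and the Logistic regression classification algorithm are combined to build the customer targeting model LogGMDH-Logistic. An empirical analysis on the customer targeting dataset of a car insurance company from the CoIL2000 prediction competition shows that the LogGMDH-Logistic model not only outperforms some existing customer targeting models but is also highly interpretable.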
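
The abstract names hybrid sampling but does not specify how it works. As a rough illustration only, the sketch below assumes it means undersampling the majority class and oversampling the minority class until both reach a common size. The function name `hybrid_sample` and the midpoint target are hypothetical choices made for this example, not the authors' procedure.

```python
import random


def hybrid_sample(rows, labels, seed=0):
    """Balance a binary training set (labels 0/1) by combining
    undersampling of the majority class with oversampling of the minority.

    Assumption: both classes are resampled to the midpoint of the two
    class counts; the paper's actual rule is not given in the abstract.
    """
    rng = random.Random(seed)
    pos = [r for r, y in zip(rows, labels) if y == 1]
    neg = [r for r, y in zip(rows, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")

    if len(pos) <= len(neg):
        minority, majority, minority_label = pos, neg, 1
    else:
        minority, majority, minority_label = neg, pos, 0

    target = (len(minority) + len(majority)) // 2

    # Undersample the majority class without replacement.
    majority_part = rng.sample(majority, target)
    # Oversample the minority class: keep every original row and
    # add random duplicates until the target size is reached.
    extra = [rng.choice(minority) for _ in range(target - len(minority))]
    minority_part = minority + extra

    balanced = [(r, minority_label) for r in minority_part]
    balanced += [(r, 1 - minority_label) for r in majority_part]
    rng.shuffle(balanced)
    return [r for r, _ in balanced], [y for _, y in balanced]


# Toy data: 6 responders among 100 customers (rows are just indices here).
rows = list(range(100))
labels = [1] * 6 + [0] * 94
balanced_rows, balanced_labels = hybrid_sample(rows, labels)
print(balanced_labels.count(1), balanced_labels.count(0))  # 50 50
```

Keeping every original minority row before adding duplicates means no responder is dropped, which matters when responders are scarce.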

Chinese keywords: customer targeting, GMDH neural network, feature selection, hybrid sampling, Logistic regression

Abstract: In recent years, database marketing has become a hot topic in customer relationship management (CRM), and customer targeting modeling is one of its most important issues. Customer targeting modeling is essentially a binary classification problem: customers are divided into those who respond to a company's marketing activities and those who do not. This study combines group method of data handling (GMDH) neural networks, a resampling technique, and the Logistic regression classification algorithm to construct the customer targeting model LogGMDH-Logistic. The model consists of three phases: (1) To address the highly imbalanced class distribution of the training set used for customer targeting modeling, a new resampling method (hybrid sampling) is proposed to balance the class distribution; (2) To select key features from the large number of characteristics describing customers, the GMDH neural network is introduced and a new feature selection algorithm, Log-GMDH, is presented, which improves the traditional GMDH neural network model in both the choice of transfer function and the construction of a new external criterion. For the transfer function, it replaces the linear transfer function of the traditional GMDH neural network with the non-linear Logistic regression function; for the external criterion, it replaces the regularization criterion of the traditional GMDH neural network with the hit rate, which is suited to customer targeting modeling; (3) The training set is mapped onto the selected feature subset, the Logistic regression classification algorithm is trained on it, and the response probability of potential customers is predicted.
The experiment is carried out on a customer targeting dataset of a car insurance company from the CoIL2000 prediction competition, and the results show that the LogGMDH-Logistic model is superior to some existing customer targeting models in both performance and interpretability. CRM involves many customer classification problems, such as customer churn prediction and customer credit scoring, that are similar to customer targeting modeling. The proposed model can therefore also be applied to these problems and is expected to achieve satisfactory classification performance.
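
The hit rate used as the external criterion can be illustrated with a short sketch. It assumes the hit rate is the share of actual responders among the customers ranked highest by predicted response probability. The function name `hit_rate` and the default `top_fraction` of 0.2 are hypothetical choices made for this example; the abstract does not give the paper's exact definition.

```python
def hit_rate(scores, labels, top_fraction=0.2):
    """Share of actual responders (label 1) among the top-scored customers.

    Assumption: customers are ranked by predicted response probability and
    the top `top_fraction` of them are targeted; the cut-off used in the
    paper is not stated in the abstract.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not scores:
        raise ValueError("at least one customer is required")
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")

    # Number of customers to target, at least one.
    k = max(1, int(len(scores) * top_fraction))
    # Rank customers from highest to lowest predicted probability.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:k]) / k


# Toy example: 10 customers, 3 of whom actually responded.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 0, 1, 0, 0, 0, 0, 0, 1, 0]
print(hit_rate(scores, labels, top_fraction=0.2))  # 0.5
```

Unlike a squared-error criterion, a ranking measure of this kind rewards a model only for placing responders near the top of the list, which is the decision a marketer faces when only a fraction of customers can be contacted.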

Key words: customer targeting, GMDH neural network, feature selection, hybrid sampling, logistic regression

CLC number: