Customer credit scoring is one of the most important problems in customer relationship management (CRM). In many real credit scoring problems, a large number of customer samples without class labels are discarded and only a few labeled samples can be used to train classification models, because labeling samples consumes considerable manpower, money, and material resources. Furthermore, because customer credit scoring data are typically class-imbalanced, a single classification model struggles to classify the whole sample space accurately. To address these two problems, this study introduces semi-supervised learning, combines it with the random subspace (RSS) method for multiple classifier ensembles, and proposes an RSS-based semi-supervised co-training model for class imbalance (RSSCI). The model consists of three phases: 1) Obtain a number of base classifiers by RSS; 2) Selectively label the most suitable samples in U, the set containing a large number of samples without class labels. First, the three best-performing base classifiers are selected to classify the samples in U; the samples to which the classifiers assign the same predicted class are placed in a candidate set, and the label confidence of each candidate is calculated. To account for the class imbalance of the training data, the candidate set is divided into positive and negative subsets, and the samples with the highest confidence are selected from the two subsets according to the ratio of the two classes in the original training set and added to the original training set; 3) Train the classification model on the final training set and classify the test set.
An empirical analysis is conducted on three credit scoring datasets (German, Australia, and UK-thomas; all three have imbalanced class distributions, and German and Australia come from the public UCI repository). The results show that the RSSCI model outperforms commonly used supervised ensemble credit scoring models and some existing semi-supervised co-training credit scoring models, demonstrating the advantage of RSSCI's selective sample-labeling mechanism. CRM involves many customer classification problems, such as customer churn prediction and customer targeting, that are similar to customer credit scoring. The proposed model can therefore also be applied to these problems and is expected to achieve satisfactory classification performance.
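The three phases described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the base learner, the confidence measure, or how the "best" classifiers are judged, so the sketch assumes a nearest-centroid base learner, a distance-based confidence, and accuracy on the labeled set as the ranking criterion. All names (`train_centroid`, `predict_with_confidence`, `accuracy`, `rssci`, `n_add`) are hypothetical.

```python
import random

def train_centroid(X, y, feats):
    """Assumed base learner: nearest centroid on the feature subset `feats`."""
    cents = {}
    for c in set(y):
        rows = [x for x, lab in zip(X, y) if lab == c]
        cents[c] = [sum(r[f] for r in rows) / len(rows) for f in feats]
    return feats, cents

def predict_with_confidence(model, x):
    """Return (label, confidence); the confidence formula is an assumption."""
    feats, cents = model
    dist = {c: sum((x[f] - m) ** 2 for f, m in zip(feats, cen)) ** 0.5
            for c, cen in cents.items()}
    label = min(dist, key=dist.get)
    total = sum(dist.values())
    conf = 1.0 - dist[label] / total if total > 0 else 1.0
    return label, conf

def accuracy(model, X, y):
    hits = sum(predict_with_confidence(model, x)[0] == lab for x, lab in zip(X, y))
    return hits / len(y)

def rssci(L_X, L_y, U_X, n_models=10, subspace=2, n_add=4, seed=0):
    rng = random.Random(seed)
    n_feat = len(L_X[0])
    # Phase 1: train base classifiers on random feature subspaces (RSS).
    models = [train_centroid(L_X, L_y, rng.sample(range(n_feat), subspace))
              for _ in range(n_models)]
    # Phase 2: the three best classifiers (ranked here on the labeled set,
    # an assumption) label U; samples on which they agree become candidates.
    top3 = sorted(models, key=lambda m: accuracy(m, L_X, L_y), reverse=True)[:3]
    candidates = {}
    for x in U_X:
        preds = [predict_with_confidence(m, x) for m in top3]
        labels = {p[0] for p in preds}
        if len(labels) == 1:
            mean_conf = sum(p[1] for p in preds) / len(preds)
            candidates.setdefault(labels.pop(), []).append((mean_conf, x))
    # Pick the most confident candidates per class, keeping the class ratio
    # of the original training set.
    new_X, new_y = list(L_X), list(L_y)
    for c, items in candidates.items():
        quota = round(n_add * L_y.count(c) / len(L_y))
        items.sort(key=lambda t: t[0], reverse=True)
        for _, x in items[:quota]:
            new_X.append(x)
            new_y.append(c)
    # Phase 3: train the final model on the enlarged training set.
    final = train_centroid(new_X, new_y, list(range(n_feat)))
    return final, new_X, new_y

# Toy usage: 6 labeled samples of class 0, 2 of class 1, and 7 unlabeled samples.
L_X = [[0, 0, 0], [1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1], [1, 1, 1],
       [9, 9, 9], [10, 9, 10]]
L_y = [0, 0, 0, 0, 0, 0, 1, 1]
U_X = [[0, 1, 1], [1, 0, 0], [0.5, 0.5, 0.5], [2, 1, 1],
       [9, 10, 9], [10, 10, 10], [8, 9, 9]]
final, new_X, new_y = rssci(L_X, L_y, U_X)
print(len(new_y), predict_with_confidence(final, [9, 9, 10])[0])
```

Splitting the per-class quota by the original class ratio is what keeps the enlarged training set from drifting further toward the majority class; a real study would replace the toy base learner with the classifiers used in the paper.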
XIAO Jin, XUE Shu-tian, HUANG Jiing, XIE Ling, GU Xin. A Semi-Supervised Co-Training Model for Customer Credit Scoring[J]. Chinese Journal of Management Science, 2016, 24(6): 124-131.
DOI: 10.16381/j.cnki.issn1003-207x.2016.06.015