Predicting stock behavior from social media information has become an active research topic in finance and knowledge management in recent years. Because social media participants and discussion topics are highly diverse, traditional methods that analyze social media information only at the aggregate level are too coarse to predict stock behavior accurately. In this paper, we exploit differences in the writing style and content features of social media messages. We use text feature extraction, principal component analysis (PCA) and EM clustering to identify the stakeholders who take part in a firm's social media discussions and the topics they focus on. For each stakeholder group and each topic, we then extract four social media variables covering two aspects, information activity intensity and sentiment inclination. We use these variables to build regression models of stock behavior, which we use to analyze how the social media activity of each stakeholder group and topic affects the firm's stock behavior. Finally, we run experiments on the Bank of America message board of the Yahoo! Finance forum, and the results demonstrate the validity and practicality of the proposed method.
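The stakeholder/topic identification step summarized above (style features, then PCA, then EM clustering) can be illustrated with a minimal, dependency-free Python sketch. This is not the paper's implementation. The three style features, the toy data, keeping only the first principal component, and fixing a two-component 1-D Gaussian mixture are all illustrative assumptions. The abstract does not state the actual feature set, the number of retained components or the number of clusters.

```python
import math


def standardize(rows):
    """Z-score each column so features on large scales do not dominate PCA."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
            for j in range(d)]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]


def first_principal_component(x, iters=200):
    """Leading eigenvector of the covariance of centered data x (power iteration)."""
    n, d = len(x), len(x[0])
    cov = [[sum(r[i] * r[j] for r in x) / n for j in range(d)] for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(c * c for c in w)) or 1.0
        v = [c / norm for c in w]
    return v


def normal_pdf(s, mu, var):
    return math.exp(-(s - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_two_gaussians(scores, iters=100):
    """EM for a two-component 1-D Gaussian mixture; returns hard labels and means."""
    mu = [min(scores), max(scores)]  # spread the initial means across the data
    var = [1.0, 1.0]
    weight = [0.5, 0.5]
    resp = []
    for _ in range(iters):
        # E-step: posterior probability that each score came from each component
        resp = []
        for s in scores:
            p = [weight[k] * normal_pdf(s, mu[k], var[k]) for k in range(2)]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        # M-step: re-estimate mixture weights, means and variances
        for k in range(2):
            nk = sum(r[k] for r in resp) or 1e-12
            weight[k] = nk / len(scores)
            mu[k] = sum(r[k] * s for r, s in zip(resp, scores)) / nk
            var[k] = max(sum(r[k] * (s - mu[k]) ** 2
                             for r, s in zip(resp, scores)) / nk, 1e-6)
    labels = [0 if r[0] >= r[1] else 1 for r in resp]
    return labels, mu


def cluster_messages(rows):
    """Standardize, project onto the first PC, then EM-cluster the projected scores."""
    x = standardize(rows)
    pc = first_principal_component(x)
    scores = [sum(a * b for a, b in zip(r, pc)) for r in x]
    return em_two_gaussians(scores)[0]


# Toy, made-up style vectors: [avg word length, "!" rate, ticker-symbol rate]
EXAMPLE_FEATURES = [
    [4.1, 0.30, 0.02], [3.9, 0.28, 0.00], [4.0, 0.35, 0.01], [4.2, 0.31, 0.03],
    [6.1, 0.02, 0.40], [5.9, 0.01, 0.45], [6.0, 0.03, 0.42], [6.2, 0.00, 0.38],
]

if __name__ == "__main__":
    print(cluster_messages(EXAMPLE_FEATURES))
```

Running the script prints one cluster label per message. The first four toy messages fall into one group and the last four into the other. In a realistic setting, one would likely use several retained components, a multivariate mixture from an established library, and a model-selection criterion for the number of clusters in place of these hand-written loops. Each resulting group would then feed its own regression model.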