Predicting stock behavior via social media has attracted a great deal of attention in the finance and knowledge management disciplines. Because social media participants and discussion topics are diverse, traditional methods that treat social media information as an undifferentiated whole struggle to improve the accuracy of stock behavior prediction. In this paper, text feature extraction, principal component analysis, and EM clustering are used to identify the stakeholders and topics related to a specific firm from the similar writing styles and content features of social media messages. Furthermore, four types of social media variables, derived from information activity intensity and sentiment inclination, are extracted to build stock behavior regression models for each stakeholder group and topic. Finally, the Bank of America message board on the Yahoo! Finance forum is chosen as the experimental platform. The experimental results support the validity and practicability of the proposed method.
JIANG Cui-qing, LIANG Kun, DING Yong, LIU Shi-xi, LIU Yao. Predicting Stock Behavior via Social Media[J]. Chinese Journal of Management Science, 2015, 23(1): 17-24.
DOI: 10.16381/j.cnki.issn1003-207x.2015.01.003
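
The clustering step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two writing-style features, the sample messages, and the function names `style_features` and `em_cluster` are all hypothetical, the principal component analysis step is omitted for brevity, and the mixture model is a plain diagonal-covariance Gaussian mixture fitted by EM using only the Python standard library.

```python
import math

def style_features(message):
    """Return a tiny, hypothetical writing-style vector for one message."""
    words = message.split()
    avg_word_len = sum(len(w) for w in words) / max(len(words), 1)
    upper_ratio = sum(ch.isupper() for ch in message) / max(len(message), 1)
    return [avg_word_len, upper_ratio]

def em_cluster(points, k=2, iters=50, var_floor=1e-3):
    """Fit a diagonal-covariance Gaussian mixture by EM; return hard labels."""
    n, d = len(points), len(points[0])
    # Deterministic start: pick k points spread across the sorted data.
    order = sorted(range(n), key=lambda i: points[i])
    means = [list(points[order[(2 * j + 1) * n // (2 * k)]]) for j in range(k)]
    # Every component starts with the global per-feature variance.
    centre = [sum(p[t] for p in points) / n for t in range(d)]
    spread = [max(sum((p[t] - centre[t]) ** 2 for p in points) / n, var_floor)
              for t in range(d)]
    variances = [list(spread) for _ in range(k)]
    weights = [1.0 / k] * k
    resp = []
    for _ in range(iters):
        # E-step: responsibilities, computed in log space for stability.
        resp = []
        for x in points:
            logp = []
            for j in range(k):
                lp = math.log(weights[j])
                for t in range(d):
                    v = variances[j][t]
                    lp -= 0.5 * (math.log(2 * math.pi * v)
                                 + (x[t] - means[j][t]) ** 2 / v)
                logp.append(lp)
            top = max(logp)
            probs = [math.exp(v - top) for v in logp]
            total = sum(probs)
            resp.append([p / total for p in probs])
        # M-step: re-estimate weights, means, and (floored) variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = max(nj / n, 1e-12)
            for t in range(d):
                mu = sum(r[j] * x[t] for r, x in zip(resp, points)) / nj
                var = sum(r[j] * (x[t] - mu) ** 2
                          for r, x in zip(resp, points)) / nj
                means[j][t] = mu
                variances[j][t] = max(var, var_floor)
    # Hard assignment: the component with the highest responsibility.
    return [max(range(k), key=lambda j: r[j]) for r in resp]

# Invented sample messages: three casual posts, then three analytic posts.
messages = [
    "BUY NOW!!! BAC to the moon!!!",
    "SELL SELL SELL!!! this is junk!!",
    "LOL shorts get burned!!! GO BAC!!",
    "Quarterly earnings exceeded consensus estimates because provisioning declined.",
    "Regulatory capital requirements constrain dividend increases throughout upcoming quarters.",
    "Management guidance indicates moderate improvement regarding mortgage litigation expenses.",
]
labels = em_cluster([style_features(m) for m in messages], k=2)
print(labels)
```

In a fuller pipeline, the resulting cluster labels would stand in for stakeholder groups, and a separate regression model would be fitted per group. The variance floor is a design choice of this sketch that keeps very small clusters from collapsing to zero variance.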