《Constructing Financial Sentimental Factors in Chinese Market Using
Natural Language Processing》
---
Authors:
Junfeng Jiang, Jiahao Li
---
Year of latest submission:
2018
---
Abstract:
In this paper, we design an integrated algorithm to evaluate the sentiment of the Chinese market. First, with the help of web browser automation, we automatically crawl a large volume of news and comments from several influential financial websites. Second, we use Natural Language Processing (NLP) techniques in the Chinese context, including tokenization, Word2vec word embedding, and the semantic database WordNet, to compute Senti-scores of these news items and comments, and then construct the sentimental factor. Here, we build a finance-specific sentimental lexicon so that the sentimental factor reflects the sentiment of the financial market rather than general sentiments such as happiness or sadness. Third, we also implement an adjustment of the standard sentimental factor. Our experimental results show that there is a significant correlation between our standard sentimental factor and the Chinese market, and that the adjusted factor is even more informative, having a stronger correlation with the Chinese market. Therefore, our sentimental factors can be important references when making investment decisions. In particular, during the Chinese market crash in 2015, the Pearson correlation coefficient of the adjusted sentimental factor with the SSE is 0.5844, which suggests that our model can provide solid guidance, especially in periods when the market is strongly influenced by public sentiment.
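The pipeline the abstract describes (lexicon-based Senti-scores, a daily sentimental factor, and a Pearson correlation against the market) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the lexicon entries, the function names `senti_score`, `daily_factor`, and `pearson`, and the simple averaging scheme are all assumptions, and the input is assumed to be already tokenized. The Word2vec and WordNet steps and the factor adjustment are not reproduced here.

```python
from math import sqrt

# Hypothetical finance-specific lexicon: token -> polarity
# (+1.0 bullish, -1.0 bearish). Illustrative entries only; the
# paper constructs its own lexicon.
LEXICON = {
    "上涨": 1.0, "利好": 1.0, "反弹": 1.0,
    "下跌": -1.0, "利空": -1.0, "暴跌": -1.0,
}


def senti_score(tokens):
    """Mean polarity of the tokens found in the lexicon (0.0 if none match)."""
    hits = [LEXICON[t] for t in tokens if t in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0


def daily_factor(docs_by_day):
    """Average document score per day, returned in date order.

    docs_by_day maps a date string to a non-empty list of token lists.
    """
    return [
        sum(senti_score(doc) for doc in docs) / len(docs)
        for _, docs in sorted(docs_by_day.items())
    ]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)
```

With a real dataset, the list returned by `daily_factor` would be passed to `pearson` together with an index series of the same length covering the same dates.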
---
Classification:
Primary category: Quantitative Finance
Secondary category: Computational Finance
Category description: Computational methods, including Monte Carlo, PDE, lattice and other numerical methods with applications to financial modeling
--
Primary category: Computer Science
Secondary category: Computation and Language
Category description: Covers natural language processing. Roughly includes material in ACM Subject Class I.2.7. Note that work on artificial languages (programming languages, logics, formal systems) that does not explicitly address natural-language issues broadly construed (natural-language processing, computational linguistics, speech, text retrieval, etc.) is not appropriate for this area.
--
---