A big data study on the sentiment analysis of social networks and nonlinear system modelling
University of Sheffield, 2018
Online
Thesis
In the big data age, the development of social network services has changed people's way of life. Twitter, one of the most popular microblogging services, has profoundly influenced daily life. Twitter users discuss many kinds of topics, including celebrities, movies, economics, the military and politics. Given the number of Twitter users, Twitter may contain a great deal of useful information. According to behavioural psychology, these sentiment-rich data can easily affect other people, especially in consumption behaviour, investment and political issues. Therefore, extracting and analysing Twitter interaction data may help researchers to investigate political issues and economic systems. This thesis introduces an original programme based on the Twitter API and the R programming language. The programme uses Twitter's keyword search function to obtain related tweets; opinion-rich datasets covering tweet content, tweet authors and posting times can then be extracted through the Twitter API and R. In order to collect more comprehensive Twitter sentiment about political and economic issues, the programme has been extended to search by geographic location and by posting time. This Twitter data extraction method is applied throughout the thesis: over 3 million tweets about the 2016 US presidential election, 23332 tweets about the 2016 UK Brexit referendum, and around 90000 daily tweets related to the FTSE100 are extracted. A novel text pre-processing method for Twitter data is proposed and discussed. The extracted tweets may contain a variety of interfering information, such as other languages, links, @-mentions and garbled text. The text pre-processing method includes: keeping English tweets and filtering out tweets in other languages; obtaining the frequency of key sentiment words; and reducing interference from garbled text, links and @-mentions.
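The pre-processing steps listed above can be illustrated with a minimal sketch. It is written in Python for illustration only (the thesis itself uses R), and it is not the author's implementation. The record layout (a dict with `text` and `lang` keys), the function names, the regular expressions and the treatment of non-ASCII characters as "garbled" text are all assumptions made for this example.

```python
import re
from collections import Counter

# Assumed patterns for the interference named in the abstract.
URL_RE = re.compile(r"https?://\S+|www\.\S+")   # links
MENTION_RE = re.compile(r"@\w+")                # @-mentions
NON_ASCII_RE = re.compile(r"[^\x00-\x7F]+")     # crude stand-in for garbled text
WORD_RE = re.compile(r"[a-z']+")                # lower-case word tokens


def clean_tweet(text):
    """Strip links, @-mentions and non-ASCII runs, then collapse whitespace."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_ASCII_RE.sub(" ", text)
    return " ".join(text.split())


def is_english(tweet):
    """Assumes each record carries a language code under the key 'lang'."""
    return tweet.get("lang") == "en"


def sentiment_word_counts(tweets, sentiment_words):
    """Count how often each key sentiment word occurs in the English tweets."""
    counts = Counter()
    for tweet in tweets:
        if not is_english(tweet):
            continue  # filter out tweets in other languages
        cleaned = clean_tweet(tweet["text"]).lower()
        for word in WORD_RE.findall(cleaned):
            if word in sentiment_words:
                counts[word] += 1
    return counts
```

Calling `sentiment_word_counts(tweets, {"love", "great"})` on a list of such records returns a `Counter` of the matched words. The non-ASCII filter is deliberately blunt and would also strip legitimate accented characters, so a real pipeline would need a more careful definition of garbled text.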
The NRC lexicon for sentiment analysis is applied to real-world problems to explore: the daily change in Twitter sentiment and emotion indices about Hillary Clinton and Donald Trump during the US presidential election period; Twitter sentiment in different parts of the UK towards the Brexit referendum; and a daily Twitter sentiment index about the UK stock market. Using these datasets, we investigate whether collective sentiment on Twitter can help to visualize, model and predict these political issues. For the first time, this thesis proposes a hybrid model for Twitter sentiment classification. A novel feature selection method based on the NRC lexicon is combined with the classic classification algorithms KNN and Naïve Bayes to improve the performance of Twitter polarity classification. The results are evaluated and validated. Furthermore, this thesis applies wavelet-based nonlinear models to stock market systems. Two case studies are discussed: the first concerns the crude oil price and FTSE100 system; the second concerns the Twitter sentiment and FTSE100 system. Although the use of crude oil prices and Twitter sentiment indices to model stock market changes has been studied with the Granger causality test and ANN-related algorithms, this thesis is the first to use a wavelet-based NARX model for these processes.
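As a rough illustration of a lexicon-based daily sentiment index, the sketch below scores tweets against a word-to-category lexicon and aggregates the result per day. The abstract does not define how the index is computed, so the formula used here, (positive hits minus negative hits) divided by total polarity hits, is an assumption. The tiny lexicon is a hypothetical stand-in and does not reproduce entries from the actual NRC lexicon, and the function and variable names are invented for the example.

```python
import re
from collections import defaultdict

WORD_RE = re.compile(r"[a-z']+")

# Hypothetical toy lexicon: word -> set of categories (NOT real NRC entries).
TOY_LEXICON = {
    "great": {"positive"},
    "win": {"positive"},
    "bad": {"negative"},
    "loss": {"negative"},
}


def daily_sentiment_index(tweets, lexicon):
    """Return {date: index}, with index = (pos - neg) / (pos + neg).

    `tweets` is an iterable of (date_string, text) pairs. Days without any
    polarity hits get an index of 0.0. The formula is an assumption made
    for this sketch, not the definition used in the thesis.
    """
    pos = defaultdict(int)
    neg = defaultdict(int)
    for date, text in tweets:
        for word in WORD_RE.findall(text.lower()):
            categories = lexicon.get(word, set())
            if "positive" in categories:
                pos[date] += 1
            if "negative" in categories:
                neg[date] += 1
    index = {}
    for date in sorted(set(pos) | set(neg) | {d for d, _ in tweets}):
        total = pos[date] + neg[date]
        index[date] = (pos[date] - neg[date]) / total if total else 0.0
    return index
```

Passing a list rather than a one-shot iterator for `tweets` matters here, because the function walks the input twice. The same counting pattern extends to the eight emotion categories of the NRC lexicon by tracking one counter per category.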
Title: | A big data study on the sentiment analysis of social networks and nonlinear system modelling |
---|---|
Author / contributor: | Wang, Youchen ; Wei, Hualiang ; Robert, Harrison |
Publication: | University of Sheffield, 2018 |
Media type: | Thesis |