NLP for Stock Market Prediction with Reddit Data

Reddit, and the WallStreetBets subreddit in particular, has been a prominent topic in capital markets since the beginning of 2021. Discussions on these forums have shown the potential to influence the stock market. In this project, I build a model that forecasts market movement from Reddit's rich text data. Specifically, I explore sentence embeddings, document embeddings, CNN-based models, and sentiment analysis to extract signals from the text of posts and comments. I have tested and compared several model architectures. So far, the results show a slight improvement over a naive forecasting baseline.
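
To make the comparison against a naive forecast concrete, here is a minimal sketch of how directional accuracy could be measured. It rests on assumptions not stated above: the naive method is taken to be a persistence baseline (tomorrow's direction repeats today's), the prices are made-up toy values, and `model_predictions` is a hypothetical stand-in for a text-based model's output. It is not the project's actual evaluation code.

```python
from typing import List


def directions(prices: List[float]) -> List[int]:
    """Return 1 for an up move and 0 otherwise, for each consecutive price pair."""
    return [1 if later > earlier else 0 for earlier, later in zip(prices, prices[1:])]


def naive_forecast(moves: List[int]) -> List[int]:
    """Persistence baseline (an assumption): predict that each move repeats the previous one."""
    return moves[:-1]


def accuracy(predicted: List[int], actual: List[int]) -> float:
    """Fraction of predictions that match the realized direction."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("predicted and actual must be non-empty and of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Toy closing prices, invented for illustration only (not real market data).
prices = [100.0, 101.0, 102.0, 101.5, 102.5, 103.0, 102.0]

moves = directions(prices)        # realized up/down moves
targets = moves[1:]               # moves that can be forecast from a prior move
baseline = naive_forecast(moves)  # persistence forecast for each target

# Hypothetical outputs of a text-based model, one per target (placeholder values).
model_predictions = [1, 0, 0, 1, 0]

baseline_acc = accuracy(baseline, targets)
model_acc = accuracy(model_predictions, targets)
print(f"naive baseline accuracy: {baseline_acc:.2f}")
print(f"model accuracy:          {model_acc:.2f}")
```

Scoring both forecasts on the same targets keeps the comparison fair: any gain reported for a model is relative to what the baseline achieves on the same days.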