Sentiment analysis integrated into a stock price prediction model using machine learning techniques

Abstract
The stock market has a long history, and a wide variety of techniques have therefore been developed and tested to predict the behavior of stock prices. These predictions, however, rely mostly on variables such as the price history and leave aside other kinds of information such as sentiment analysis. This project sought to contribute to research on what this less-used type of variable can add to stock price prediction, and to that end it integrated two techniques that have been explored to different degrees. For the sentiment analysis part, a dataset of tweets about Apple was used; Natural Language Processing techniques were applied to it for preprocessing, two semi-supervised learning algorithms were used to help label the sentiment of all the tweets, and three machine learning models were trained to label new tweets, the best of which was the Support Vector Machine. For the price history part, a dataset of different values of Apple's stock was used to train three machine learning models, of which the LSTM obtained the best results. The sentiments obtained were then integrated with the price history; the best model in this setting was the Random Forest, although it did not outperform the best model that used only the price history. The Random Forest model was then tested in a simulated market, which showed that using the price history together with sentiment analysis is feasible and gives acceptable results; nevertheless, the buy/sell strategy must be examined more rigorously before these models can be put to use in the real world.
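The integration step mentioned in the abstract can be pictured as joining a per-day sentiment summary onto the price history. The following is a minimal sketch, not the project's actual pipeline: the field names (`date`, `close`, `label`), the label encoding (-1 negative, 1 positive), the daily mean as the aggregate, the neutral default for days without tweets, and the sample values are all assumptions made for illustration.

```python
from collections import defaultdict


def daily_sentiment(tweets):
    """Average the sentiment labels of all tweets posted on the same day."""
    by_day = defaultdict(list)
    for tweet in tweets:
        by_day[tweet["date"]].append(tweet["label"])
    return {day: sum(labels) / len(labels) for day, labels in by_day.items()}


def integrate(prices, tweets, default=0.0):
    """Attach the daily mean sentiment to each price row.

    Days without any tweet receive `default` (assumed to mean neutral).
    """
    sentiment = daily_sentiment(tweets)
    return [
        {**row, "sentiment": sentiment.get(row["date"], default)}
        for row in prices
    ]


# Made-up sample data, for illustration only.
prices = [
    {"date": "2020-01-02", "close": 100.0},
    {"date": "2020-01-03", "close": 101.5},
    {"date": "2020-01-06", "close": 99.0},
]
tweets = [
    {"date": "2020-01-02", "label": 1},
    {"date": "2020-01-02", "label": -1},
    {"date": "2020-01-03", "label": 1},
]

features = integrate(prices, tweets)
for row in features:
    print(row)
```

Each resulting row carries both the price value and a sentiment feature, which is the kind of combined input a regressor such as a Random Forest could be trained on. How the sentiment is actually aggregated and aligned with trading days is a design choice the abstract does not specify.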
Description
The stock market has a long history, and over the years techniques have been designed and tested to predict the behavior of stock values. Some of these use only the historical price as a basis, while others include indices associated with the companies behind the stock. All of this is done to obtain a trend with which to decide whether it is a good time to buy or sell shares. In the literature, the application of machine learning to the stock market has been explored mostly using price history. Other methods have been used less, sentiment analysis being one example. For this reason, this project sought to integrate two techniques that have been explored to different degrees: price history and sentiment analysis. For the sentiment analysis part, Natural Language Processing techniques were used for data processing. Semi-supervised learning algorithms such as Label Propagation and Label Spreading were explored to label all records in the dataset from only a few known sentiment labels. Three machine learning models were then trained and evaluated to find the best one for classifying the sentiment of current tweets; the Support Vector Machine gave the best results. To test the performance of the models used in this project, the R2-score, Mean Squared Error, F1-score and Accuracy metrics were used. Tests were also performed with a broker platform to interact with a simulated market using a buy/sell strategy. Before integration with sentiment analysis, the best model was the LSTM; once sentiment analysis was integrated, the best model was the Random Forest.
The tests with the broker platform showed that one of the models with sentiment analysis had acceptable results. However, they also exposed weaknesses in the selected buy/sell strategy: even with a model that performs well, buy and sell strategies need further review before the models can be put to practical use in real life.
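For readers unfamiliar with the four evaluation metrics named in the description, the sketch below computes them from their standard definitions in plain Python. It is illustrative only: the sample values are made up, and treating F1 as the binary score for a single positive class is an assumption, since the description does not say which F1 variant was used.

```python
def mean_squared_error(y_true, y_pred):
    """Mean of the squared differences between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive):
    """Binary F1 for the class `positive` (assumed variant)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up regression values (price prediction side).
price_true = [1.0, 2.0, 3.0, 4.0]
price_pred = [1.0, 2.0, 3.0, 5.0]
print("MSE:", mean_squared_error(price_true, price_pred))
print("R2:", r2_score(price_true, price_pred))

# Made-up classification labels (sentiment side).
label_true = ["pos", "neg", "pos", "neg", "pos"]
label_pred = ["pos", "pos", "pos", "neg", "neg"]
print("Accuracy:", accuracy(label_true, label_pred))
print("F1 (pos):", f1_score(label_true, label_pred, "pos"))
```

The R2-score and Mean Squared Error apply to the price regressors, while F1-score and Accuracy apply to the sentiment classifiers.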
Keywords
Stock market, Machine learning, Sentiment analysis, Natural language processing, Semi-supervised learning
Citation