Development of an unsupervised machine learning model to select relevant news

Hernández Saavedra, Juan Camilo



dc.contributor.advisor	Linares Ospina, Diego Luis
dc.contributor.advisor	Álvarez Vargas, Gloria Inés
dc.contributor.author	Hernández Saavedra, Juan Camilo
dc.date.accessioned	2024-08-22T15:27:57Z
dc.date.available	2024-08-22T15:27:57Z
dc.date.issued	2024
dc.description.abstract	The way people get their news has evolved steadily as technology has spread. Most news outlets have partially abandoned their print formats to adapt to the digital world, and more specifically to the web. This shift has greatly increased readership, benefiting both news outlets and readers. One of the many benefits is the ease and speed with which information reaches readers, who can access the news wherever and whenever they want with just a couple of clicks. Although the benefits were evident at first, challenges emerged over time that have affected outlets publishing news on the web. One of the most common problems is that irrelevant news is mixed with news that is relevant to the reader, which can shape what readers think about and pay attention to during important events. This matters because news strongly influences the population's perception and decision-making and is therefore a fundamental part of society. This work presents a solution that combines unsupervised machine learning models and text representation based on natural language processing techniques with a strategy of collecting news from several news web portals: if a news item appears on several portals, it is considered relevant. Text representation models extract the meaning and context of each headline, and clustering models then group the headlines. Finally, the clustering models are tuned through hyperparameter search to achieve the highest possible precision.
As a result, two clustering models were built that, by combining text representation models, natural language processing techniques, and hyperparameter search to maximize their precision, can determine which news items in a set are relevant. To demonstrate how they work, a small prototype news web portal that runs these clustering models was designed.
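As a rough illustration of the cross-portal relevance rule described in the abstract (a news item counts as relevant when it appears on several portals), the following minimal Python sketch groups headlines and keeps only the groups that span at least two distinct portals. It is not the thesis's implementation. A bag-of-words cosine similarity stands in for the text-representation models, single-linkage threshold clustering stands in for the clustering models, and the names `relevant_groups`, `threshold`, `min_portals`, and the portal labels are hypothetical.

```python
import math
import re
from collections import Counter

# ASSUMPTION: bag-of-words term counts stand in for the thesis's
# text-representation models; they capture shared words, not meaning or context.
def vectorize(headline: str) -> Counter:
    return Counter(re.findall(r"\w+", headline.lower()))

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors
    # (a missing term in a Counter reads as 0).
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def cluster(headlines: list[tuple[str, str]], threshold: float) -> list[list[int]]:
    # ASSUMPTION: single-linkage threshold clustering (union-find) stands in for
    # the thesis's clustering models; any pair at or above `threshold` is merged.
    parent = list(range(len(headlines)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    vectors = [vectorize(text) for _, text in headlines]
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(i)] = find(j)

    groups: dict[int, list[int]] = {}
    for i in range(len(headlines)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

def relevant_groups(headlines, threshold=0.5, min_portals=2):
    # Relevance rule from the abstract: a story is relevant if its cluster
    # contains headlines from at least `min_portals` distinct portals.
    relevant = []
    for group in cluster(headlines, threshold):
        portals = {headlines[i][0] for i in group}
        if len(portals) >= min_portals:
            relevant.append([headlines[i][1] for i in group])
    return relevant

# Hypothetical input: (portal, headline) pairs.
headlines = [
    ("portal_a", "Central bank raises interest rates"),
    ("portal_b", "Central bank raises interest rates again"),
    ("portal_c", "Local team wins derby"),
]
print(relevant_groups(headlines))
# [['Central bank raises interest rates', 'Central bank raises interest rates again']]
```

Only the central-bank story is returned, because the third headline is reported by a single portal. With real embeddings the similarity would reflect meaning rather than word overlap, but the portal-counting step would stay the same.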
dc.description.abstracteng	The way people get their news has evolved constantly as technology has proliferated. Most media outlets have partially abandoned their physical formats to adapt to the digital world, more precisely to the web. This change has led to a large increase in readership, benefiting both news outlets and readers. One of the many benefits is the ease and speed with which information reaches readers, allowing them to access the news wherever and whenever they want with just a few clicks. Although the benefits were initially evident, challenges arose over time that have affected media outlets that publish news on the web. Among the most common problems is the mixing of irrelevant news with news that is relevant to the reader, which can influence what readers think about and pay attention to during important events. This matters because news, through its strong influence on the population's perception and decision-making, is a fundamental part of society. This work presents a solution using unsupervised machine learning models and text representation with natural language processing techniques, along with a strategy that consists of obtaining news from various news web portals: if a news item appears on several portals, it is considered relevant. Text representation models extract the meaning and context of each headline, and the headlines are then grouped using clustering models. These clustering models are then tuned using hyperparameter search to obtain the highest possible precision. As a result, two clustering models were built that, using text representation models, natural language processing techniques, and hyperparameter search to maximize their precision, can discern which news items in a group are relevant.
To demonstrate how they work, a small prototype news web portal running these clustering models was designed.
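The abstract says the clustering models were tuned by hyperparameter search, but this record does not state the search space, the scoring metric, or the labeled data used. The following minimal Python sketch therefore rests on assumptions: a grid search over a single similarity threshold, scored with pairwise F1 against a small hand-labeled set (F1 rather than pure precision, because pairwise precision alone is trivially maximized by splitting every headline into its own cluster). The names `threshold_clusters`, `pairwise_f1`, `grid_search`, and the toy similarity matrix are hypothetical.

```python
from itertools import combinations

def threshold_clusters(sim: list[list[float]], threshold: float) -> list[int]:
    # ASSUMPTION: clusters are the connected components of the graph that
    # links items whose similarity is at or above `threshold`.
    n = len(sim)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and sim[i][j] >= threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

def pairwise_f1(pred: list[int], gold: list[int]) -> float:
    # Compare "same cluster" decisions over every pair of items.
    tp = fp = fn = 0
    for i, j in combinations(range(len(gold)), 2):
        same_pred = pred[i] == pred[j]
        same_gold = gold[i] == gold[j]
        if same_pred and same_gold:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_gold:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def grid_search(sim, gold, thresholds):
    # Return (best_threshold, best_score); ties keep the first threshold tried.
    best = (None, -1.0)
    for t in thresholds:
        score = pairwise_f1(threshold_clusters(sim, t), gold)
        if score > best[1]:
            best = (t, score)
    return best

# Hypothetical headline similarities and hand labels
# (items 0 and 1 are one story, items 2 and 3 another).
sim = [
    [1.0, 0.8, 0.1, 0.2],
    [0.8, 1.0, 0.15, 0.1],
    [0.1, 0.15, 1.0, 0.6],
    [0.2, 0.1, 0.6, 1.0],
]
gold = [0, 0, 1, 1]
print(grid_search(sim, gold, [0.3, 0.5, 0.7, 0.9]))  # (0.3, 1.0)
```

At a threshold of 0.7 the second story is split in two, so its F1 drops to about 0.67. At 0.9 every headline stands alone and the score is 0. A real search would typically cover more hyperparameters, such as the clustering algorithm and its settings, in the same loop.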
dc.format.extent	52 p.
dc.format.mimetype	application/pdf
dc.identifier.uri	http://hdl.handle.net/11522/3875
dc.language.iso	spa
dc.publisher	Pontificia Universidad Javeriana Cali
dc.publisher.faculty	Facultad de Ingeniería y Ciencias
dc.rights.accessrights	http://purl.org/coar/access_right/c_14cb
dc.rights.creativecommons	https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.rights.uri	https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject	News
dc.subject	Text representation
dc.subject	Relevant and irrelevant
dc.subject	Clustering
dc.thesis.discipline	Facultad de Ingeniería y Ciencias. Ingeniería de Sistemas y Computación
dc.thesis.grantor	Pontificia Universidad Javeriana Cali
dc.thesis.level	Undergraduate
dc.thesis.name	Ingeniero(a) de Sistemas y Computación
dc.title	Desarrollo de un modelo de aprendizaje automático no supervisado para seleccionar noticias relevantes	spa
dc.type.coar	http://purl.org/coar/resource_type/c_7a1f
dc.type.local	Thesis/Degree project - Monograph - Undergraduate
dc.type.redcol	https://purl.org/redcol/resource_type/TP

Files

Original bundle

Name:: Modelo_noticias_relevantes.pdf.pdf
Size:: 1.58 MB
Format:: Adobe Portable Document Format

Name:: Articulo_científico.pdf.pdf
Size:: 168.64 KB
Format:: Adobe Portable Document Format

Name:: Licencia_autorizacion.pdf.pdf
Size:: 187.43 KB
Format:: Adobe Portable Document Format

License bundle

Name:: license.txt
Size:: 1.71 KB
Format:: Item-specific license agreed to upon submission

Collections

Ingeniería de Sistemas y Computación