Model Comparison for the Classification of Comments Containing Suicidal Traits from Reddit via NLP and Supervised Learning

Camila Mantilla-Saavedra, Juan Gutiérrez-Cárdenas

Research output: Chapter in book/report/conference proceedings, Conference contribution, Peer-reviewed

Abstract

In recent years, suicide has become one of the most critical public health issues among teenagers and adults. Meanwhile, the growth and spread of social networks and mobile devices have made it possible to compile relevant information from these platforms that helps us understand users' thoughts, feelings, and emotions. The detection of suicidal traits on social media has therefore become a relevant research topic: examining users' posts on well-known social networks such as Reddit makes it possible to identify probable suicidal traits among them. For that reason, the purpose of the present research is to compare different supervised classification models, such as Logistic Regression, Support Vector Machines (SVM), Random Forest, AdaBoost, Gradient Boosting, and XGBoost, combined with the feature extraction techniques TF-IDF and GloVe. The results of our experiments show that the best model is SVM with TF-IDF, which obtained 91.50% Accuracy, 92.40% Precision, 90.30% Recall, and a 91.50% F1-score. This study also shows that TF-IDF outperforms GloVe as a feature extraction technique across the different models tested.
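The pipeline summarized above (TF-IDF features that feed a classifier, evaluated with Accuracy, Precision, Recall, and F1) can be illustrated with a minimal pure-Python sketch. It is not the authors' code: the tokenizer, the toy posts, and the binary labels (1 = suicidal traits, 0 = none) are hypothetical, and the smoothed IDF formula mirrors scikit-learn's `TfidfVectorizer` defaults rather than anything stated in the paper. The classifiers themselves (SVM, Random Forest, and so on) are omitted; the TF-IDF rows would be their input features.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Simple lowercase word tokenizer (an assumption; the paper's preprocessing is not described here).
    return re.findall(r"\w+", text.lower())


def tfidf_vectors(docs: list[str]) -> tuple[list[str], list[list[float]]]:
    """Return (vocabulary, L2-normalised TF-IDF rows) for a list of documents.

    Uses raw term counts and a smoothed IDF, idf = ln((1 + n) / (1 + df)) + 1,
    the same weighting as scikit-learn's TfidfVectorizer defaults.
    """
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n = len(docs)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}

    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        row = [counts[t] * idf[t] for t in vocab]
        # L2-normalise each document vector so long posts do not dominate.
        norm = math.sqrt(sum(v * v for v in row))
        rows.append([v / norm for v in row] if norm > 0 else row)
    return vocab, rows


def binary_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Accuracy, precision, recall and F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical toy posts, not data from the paper.
    posts = ["I feel hopeless and alone", "great game last night", "I feel alone"]
    vocab, features = tfidf_vectors(posts)
    print(len(vocab), "terms")  # prints: 9 terms
    # Hypothetical gold labels vs. classifier predictions (1 = suicidal traits).
    scores = binary_metrics([1, 0, 1, 1], [1, 0, 0, 1])
    print({name: round(value, 3) for name, value in scores.items()})
    # prints: {'accuracy': 0.75, 'precision': 1.0, 'recall': 0.667, 'f1': 0.8}
```

Here F1 is computed for the positive class only; the abstract does not state how its reported F1-score was averaged, so the paper's figure may use a different averaging scheme.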

Original language: English
Host publication title: Information Management and Big Data - 8th Annual International Conference, SIMBig 2021, Proceedings
Editors: Juan Antonio Lossio-Ventura, Jorge Valverde-Rebaza, Eduardo Díaz, Denisse Muñante, Carlos Gavidia-Calderon, Alan Demétrius Valejo, Hugo Alatrista-Salas
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 253-263
Number of pages: 11
ISBN (electronic): 978-3-031-04447-2
ISBN (print): 978-3-031-04446-5
DOI
Status: Published, 20 Apr 2022
Event: 8th Annual International Conference on Information Management and Big Data, SIMBig 2021 - Virtual, Online
Duration: 1 Dec 2021 to 3 Dec 2021

Publication series

Name: Communications in Computer and Information Science
Volume: 1577 CCIS
ISSN (print): 1865-0929
ISSN (electronic): 1865-0937

COAR

  • Conference article