
Optimizing Credit Risk Prediction in the Financial Sector Using Boosting Algorithms: A Comparative Study with Financial Datasets

Research output: Contribution to journal; Article (Contribution to Journal); peer-reviewed

Abstract

This paper studies credit risk, a significant concern for financial institutions. Despite advances in predictive models, there is still room for improvement in assessing credit risk accurately. The study develops a methodological process for predicting credit risk in the financial sector using boosting-based algorithms such as XGBoost, LightGBM, and Boosted Random Forest. We found that the UCI Machine Learning Repository provides readily accessible datasets with suitable variable distributions, and that these datasets offer the potential for improved results on metrics such as the F-score and the Area Under the Curve (AUC). The datasets used are Statlog German Credit Data, Statlog Australian Credit Approval, Bank Marketing, Credit Approval, and South German Credit Data. The approach involves feature engineering, exploratory data analysis, and hyperparameter tuning. Furthermore, we propose a new strategy that adds a column derived from an unsupervised algorithm such as K-means. Our results indicate that XGBoost performs better than LightGBM and Boosted Random Forest in different scenarios. Finally, these boosting-based models outperform the Boosted Decision Trees and Factorization Machine models reported in previous studies. These findings are important for financial institutions seeking an effective methodology to improve credit risk prediction.
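The strategy of adding a column derived from K-means can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names (`fit_kmeans`, `nearest`, `add_cluster_column`), the toy feature rows, and the choice of `k=2` are all hypothetical, and the abstract does not say whether the clustering is fitted on the training split only (assumed here). A plain-Python Lloyd's algorithm keeps the sketch self-contained; in practice a library implementation would normally be used, and the augmented rows would then be passed to a boosting model such as XGBoost.

```python
import random
from typing import List, Sequence

Row = List[float]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(row: Sequence[float], centroids: List[Row]) -> int:
    """Index of the centroid closest to `row` (ties go to the lowest index)."""
    return min(range(len(centroids)), key=lambda i: _sq_dist(row, centroids[i]))


def fit_kmeans(rows: List[Row], k: int, n_iter: int = 50, seed: int = 0) -> List[Row]:
    """Plain Lloyd's algorithm; returns k centroids (hypothetical helper)."""
    rng = random.Random(seed)
    # Initialise centroids from k distinct training rows.
    centroids = [list(r) for r in rng.sample(rows, k)]
    for _ in range(n_iter):
        # Assignment step: group rows by their nearest centroid.
        groups: List[List[Row]] = [[] for _ in range(k)]
        for r in rows:
            groups[nearest(r, centroids)].append(r)
        # Update step: mean of each group; an empty group keeps its old centroid.
        new_centroids = [
            [sum(col) / len(g) for col in zip(*g)] if g else centroids[i]
            for i, g in enumerate(groups)
        ]
        if new_centroids == centroids:  # assignments are stable
            break
        centroids = new_centroids
    return centroids


def add_cluster_column(rows: List[Row], centroids: List[Row]) -> List[Row]:
    """Append the cluster index as one extra numeric feature per row."""
    return [list(r) + [float(nearest(r, centroids))] for r in rows]


# Toy, made-up feature rows (not taken from any of the UCI datasets).
train = [[1.0, 1.1], [0.9, 1.0], [1.2, 0.8], [8.0, 8.2], [7.9, 8.1], [8.3, 7.7]]
test = [[1.0, 0.9], [8.1, 8.0]]

# Assumption: centroids are learned on the training split only, then reused
# for the test split so no test information leaks into the new feature.
centroids = fit_kmeans(train, k=2)
train_aug = add_cluster_column(train, centroids)
test_aug = add_cluster_column(test, centroids)
```

The cluster index is appended here as a single numeric column; because the index carries no ordering, a one-hot encoding of it may be preferable for some models.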

Original language: English
Pages (from-to): 793-808
Number of pages: 16
Journal: Computacion y Sistemas
Volume: 29
Issue number: 2
State: Published - 2025
