TY - JOUR
T1 - Classification of Breast Cancer and Breast Neoplasm Scenarios Based on Machine Learning and Sequence Features from lncRNAs–miRNAs-Diseases Associations
AU - Gutiérrez-Cárdenas, Juan
AU - Wang, Zenghui
N1 - Funding Information:
This research is partially supported by the South African National Research Foundation (Grant Nos. 114911 and 132797) and the Tertiary Education Support Programme (TESP) of South African ESKOM.
Publisher Copyright:
© 2021, International Association of Scientists in the Interdisciplinary Areas.
PY - 2021/12
Y1 - 2021/12
AB - Non-coding RNAs, such as long non-coding RNAs (lncRNAs) and microRNAs (miRNAs), clearly influence several diseases, including the formation of neoplasms and cancer. However, studying these associations is hampered by the scarcity of validated datasets and by imbalance in the available data. We found that research on associations between miRNAs, lncRNAs, and diseases is limited or treats the two RNA classes separately. In addition, few studies combine Machine Learning models with genomic sequence features extracted from miRNAs and lncRNAs, compared with studies based on methods such as expression data or Deep Learning techniques. In this paper, we propose a framework that combines supervised and unsupervised Machine Learning models with genomic sequence features, such as k-mers, sequence alignments, and folding energy values, to validate the association of miRNAs and lncRNAs with breast cancer and breast neoplasm scenarios. Using a One-Class SVM for outlier detection and comparing two supervised models, SVM and Random Forest, we obtained accuracies of 95.44% for the One-Class model and of 88.79% and 99.65% for the SVM and Random Forest models, respectively. These results, which are comparable to those reported in the existing literature, suggest a promising path for studying sequence-feature interactions with Machine Learning models.
KW - Breast cancer
KW - Breast neoplasms
KW - Long non-coding RNAs
KW - microRNAs
KW - One-class SVM
KW - Supervised learning
UR - http://www.scopus.com/inward/record.url?scp=85117284645&partnerID=8YFLogxK
UR - https://www.mendeley.com/catalogue/f0e9fa57-4329-385c-9ad1-3ce6b865bb5d/
U2 - 10.1007/s12539-021-00451-6
DO - 10.1007/s12539-021-00451-6
M3 - Article (Contribution to Journal)
AN - SCOPUS:85117284645
SN - 1913-2751
VL - 13
SP - 572
EP - 581
JO - Interdisciplinary Sciences: Computational Life Sciences
JF - Interdisciplinary Sciences: Computational Life Sciences
IS - 4
ER -