Developing a Machine Learning Model Using Gene Expression for Breast Cancer Prediction

Babatunde Abdulrauph Olarewaju, Alausa Babatunde Mubarak, Oke Afeez Adeshina

Abstract

Recent advances in genomics have generated vast gene expression datasets that offer detailed insight into cancer biology. This study investigates an ensemble machine learning model that integrates K-Nearest Neighbors (KNN), a Support Vector Classifier (SVC), and XGBoost to predict and classify breast cancer subtypes from gene expression profiles. The methodology comprised data preprocessing, including one-hot encoding, followed by model training and evaluation using standard metrics. The ensemble model achieved an overall accuracy of 90.32% and a precision of 0.9240; high precision limits false positives, which is a key consideration for clinical diagnostics. The model also showed balanced performance, with an F1-score of 0.9015. In a comparative analysis, individual baseline models (SVM, RF) reported higher raw accuracy of approximately 99%; the proposed ensemble nonetheless provides a robust and interpretable framework optimized for reliable multi-class discrimination.
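
The abstract does not state how the three classifiers are combined or how precision and F1 are averaged across subtypes. As a minimal sketch, assuming hard (majority) voting and macro averaging, the combination and evaluation steps might look like the following. The subtype labels "A", "B", "C", the toy predictions, and the function names majority_vote and macro_scores are hypothetical and are not taken from the article.

```python
from collections import Counter


def majority_vote(*model_predictions):
    """Combine per-model label lists by hard (majority) voting.

    Assumption: ties go to the earliest-listed model's label.
    """
    combined = []
    for votes in zip(*model_predictions):
        counts = Counter(votes)
        top = max(counts.values())
        # Pick the first label, in model order, that reaches the top count.
        combined.append(next(v for v in votes if counts[v] == top))
    return combined


def macro_scores(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy data (hypothetical): true subtypes and each model's predictions.
y_true = ["A", "A", "B", "B", "C", "C"]
knn_pred = ["A", "B", "B", "B", "C", "A"]
svc_pred = ["A", "A", "B", "C", "C", "A"]
xgb_pred = ["A", "A", "A", "B", "C", "C"]

ensemble_pred = majority_vote(knn_pred, svc_pred, xgb_pred)
scores = macro_scores(y_true, ensemble_pred)
print(ensemble_pred)
print(scores)
```

In this toy case the vote corrects three single-model errors but passes through the one sample on which two models agree on the wrong label, which is why an ensemble's accuracy need not exceed that of its strongest member.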

Article Details

How to Cite
OLAREWAJU, Babatunde Abdulrauph; MUBARAK, Alausa Babatunde; ADESHINA, Oke Afeez. Developing a Machine Learning Model Using Gene Expression for Breast Cancer Prediction. Zambia Journal of Library & Information Science (ZAJLIS), ISSN: 2708-2695, [S.l.], v. 9, n. 2, p. 10-21, dec. 2025. ISSN 2708-2695. Available at: <https://zajlis.unza.zm/index.php/journal/article/view/208>. Date accessed: 01 jan. 2026. doi: https://doi.org/10.53974/unza.zajlis.9.2.208.
Section
Information and Communication Technologies (ICTs)