Developing a Machine Learning Model Using Gene Expression for Breast Cancer Prediction
Abstract
Recent advances in genomics have produced large gene expression datasets that offer detailed insight into cancer biology. This study investigates an ensemble machine learning model that combines K-Nearest Neighbors (KNN), a Support Vector Classifier (SVC), and XGBoost to predict and classify breast cancer subtypes from gene expression profiles. The methodology comprised data preprocessing, including one-hot encoding, followed by model training and evaluation using standard metrics. The ensemble achieved an overall accuracy of 90.32%, a precision of 0.9240, and an F1-score of 0.9015. The high precision indicates few false positives, which is a key consideration for clinical diagnostics, and the F1-score indicates balanced performance. In a comparative analysis, individual baseline models (support vector machine, SVM; random forest, RF) reported higher raw accuracy of about 99%; the ensemble is nonetheless proposed as a robust and interpretable framework optimized for reliable multi-class discrimination.
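To make the reported metrics concrete, the following is a minimal sketch of how the predictions of three classifiers could be combined and scored. The abstract does not state the voting scheme or the averaging used for precision and F1, so hard majority voting and macro averaging are assumptions here. The class labels `A`, `B`, `C`, the prediction lists, and the function names `majority_vote` and `macro_metrics` are hypothetical placeholders, not data or code from the study.

```python
def majority_vote(predictions_per_model):
    """Hard voting (an assumption): per sample, pick the most frequent label.

    Ties resolve to the label that appears first among the models.
    """
    voted = []
    for sample_preds in zip(*predictions_per_model):
        counts = {}
        for label in sample_preds:
            counts[label] = counts.get(label, 0) + 1
        # max() returns the first key with the highest count.
        voted.append(max(counts, key=counts.get))
    return voted


def macro_metrics(y_true, y_pred):
    """Return (accuracy, macro precision, macro F1) for multi-class labels."""
    pairs = list(zip(y_true, y_pred))
    classes = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions, f1_scores = [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        f1_scores.append(f1)

    n = len(classes)
    return accuracy, sum(precisions) / n, sum(f1_scores) / n


# Hypothetical subtype labels and per-model predictions (illustrative only).
y_true = ["A", "A", "B", "B", "C", "C"]
knn_pred = ["A", "B", "B", "B", "C", "A"]
svc_pred = ["A", "A", "B", "C", "C", "A"]
xgb_pred = ["A", "A", "B", "B", "B", "C"]

ensemble_pred = majority_vote([knn_pred, svc_pred, xgb_pred])
accuracy, precision, f1 = macro_metrics(y_true, ensemble_pred)
print(ensemble_pred)
print(f"accuracy={accuracy:.4f} precision={precision:.4f} f1={f1:.4f}")
```

In this toy example the macro precision comes out higher than the accuracy, which is the same ordering the abstract reports. That ordering can arise whenever errors concentrate in one class, so it should not be read as evidence about the study's actual error distribution.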