Breast Cancer Prediction Project
I completed this project as the capstone for the Machine Learning course in my HarvardX Data Science Certificate. Using biopsy samples from breast tumors diagnosed as benign or malignant, I trained machine learning models to predict whether a tumor is malignant or benign. The dataset has 569 samples and 30 features. I ran a principal component analysis (PCA) for practice, but its results were not used in the models. A natural next step for optimization would be to make use of those PCA results.
The data are available as the 'brca' dataset in the 'dslabs' package for R.
I trained the following models:
Logistic Regression
Linear Discriminant Analysis (LDA)
Quadratic Discriminant Analysis (QDA)
LOESS Regression
Random Forest
K-Nearest Neighbors
K-Means Clustering
I also built an ensemble that combines these models and predicts, for each sample, the class chosen by a majority vote ("majority rules").
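The majority vote can be illustrated with a minimal sketch. This is not the project's R code: it is a standalone Python example, and the function names, the "B"/"M" labels, the tie-break rule, and the toy predictions are all assumptions made for illustration.

```python
from collections import Counter


def majority_vote(predictions, tie_break="M"):
    """Return the most common label among one sample's model predictions.

    `tie_break` is a hypothetical choice: if two labels are tied for first
    place, the vote falls back to this label.
    """
    counts = Counter(predictions)
    (top_label, top_count), *rest = counts.most_common()
    if rest and rest[0][1] == top_count:
        return tie_break
    return top_label


def ensemble_predict(model_predictions):
    """Combine per-model prediction lists into one list of ensemble labels."""
    # zip(*...) regroups the predictions so each tuple holds one sample's votes.
    per_sample = zip(*model_predictions.values())
    return [majority_vote(votes) for votes in per_sample]


# Made-up predictions for four samples from three models (illustration only).
model_predictions = {
    "logistic_regression": ["B", "M", "M", "B"],
    "random_forest":       ["B", "M", "B", "B"],
    "knn":                 ["M", "M", "B", "B"],
}

ensemble = ensemble_predict(model_predictions)
print(ensemble)
```

With seven models and two classes, as listed above, a tie cannot occur, so the tie-break only matters if the number of voting models is even.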
Models and Accuracy
The ensemble, which combines the predictions of all the models listed above, also performed well:
Accuracy: 98.3%
Sensitivity: 100.0%
Specificity: 95.3%
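For reference, the three metrics above are derived from the counts of correct and incorrect predictions per class. The sketch below is a minimal Python illustration, not the project's R code. The function name and the toy labels are hypothetical, and which class counts as "positive" is an assumption, since this write-up does not state it.

```python
def classification_metrics(actual, predicted, positive="M"):
    """Compute accuracy, sensitivity, and specificity for binary labels.

    `positive` is an assumed choice of positive class. Swapping it swaps
    sensitivity and specificity. Both classes must appear in `actual`.
    """
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)  # true positives
    tn = sum(a != positive and p != positive for a, p in pairs)  # true negatives
    fp = sum(a != positive and p == positive for a, p in pairs)  # false positives
    fn = sum(a == positive and p != positive for a, p in pairs)  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # share of positives correctly identified
        "specificity": tn / (tn + fp),  # share of negatives correctly identified
    }


# Made-up labels for eight samples (illustration only, not project data).
actual    = ["M", "M", "M", "B", "B", "B", "B", "B"]
predicted = ["M", "M", "B", "B", "B", "B", "B", "M"]

metrics = classification_metrics(actual, predicted)
print(metrics)
```

Because the choice of positive class decides which error type each metric penalizes, it is worth stating explicitly whenever sensitivity and specificity are reported.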
Overall, this was a very fun project in which I got to use many different ML models and compare their advantages and disadvantages.