Breast Cancer Prediction Project

I completed this project as the capstone for the Machine Learning course in my HarvardX Data Science Certificate. Using breast cancer biopsy samples from tumors diagnosed as benign or malignant, I developed machine learning models to predict whether a tumor is malignant or benign. The dataset has 569 samples and 30 features. I also ran a principal component analysis (PCA) for practice, but did not use its results in the models. Using those results would be a natural next step for further optimization.

The data can be obtained by installing the 'dslabs' package in R and loading the 'brca' dataset.

I fit the following models:

  • Logistic Regression

  • Linear Discriminant Analysis (LDA)

  • Quadratic Discriminant Analysis (QDA)

  • LOESS Regression

  • Random Forest

  • K-Nearest Neighbors

  • K-Means Clustering

I then combined these models into an ensemble that selects its output by majority vote (“majority rules”).
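As a rough illustration of majority voting, here is a minimal Python sketch. The project itself was done in R, so this is not the original code. The function name `majority_vote`, the model keys, and the predictions for three samples are all hypothetical, with "M" and "B" standing for malignant and benign.

```python
from collections import Counter


def majority_vote(predictions_by_model):
    """Return one label per sample: the label most models predicted."""
    per_model = list(predictions_by_model.values())
    n_samples = len(per_model[0])
    ensemble = []
    for i in range(n_samples):
        # Count how many models voted for each label on sample i.
        votes = Counter(preds[i] for preds in per_model)
        # most_common(1) returns the label with the highest vote count.
        label, _count = votes.most_common(1)[0]
        ensemble.append(label)
    return ensemble


# Hypothetical predictions (not project results) from seven models.
model_predictions = {
    "logistic":      ["M", "B", "B"],
    "lda":           ["M", "B", "M"],
    "qda":           ["M", "B", "B"],
    "loess":         ["M", "B", "M"],
    "random_forest": ["M", "B", "B"],
    "knn":           ["B", "B", "M"],
    "kmeans":        ["M", "M", "B"],
}

ensemble_predictions = majority_vote(model_predictions)
print(ensemble_predictions)  # ['M', 'B', 'B']
```

With seven models and two classes, a tie cannot occur, which is one practical reason to vote over an odd number of models.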

Models and Accuracy

The ensemble model, which combines the predictions of all the models described above, performed well:

  • Accuracy: 98.3%

  • Sensitivity: 100.0%

  • Specificity: 95.3%
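For reference, these three metrics can be computed from the true and predicted labels as in the minimal Python sketch below. It is not the project's R code. The function name `binary_metrics` and the five-sample data are hypothetical, and treating malignant ("M") as the positive class is an assumption, since the write-up does not say which class was treated as positive.

```python
def binary_metrics(actual, predicted, positive="M"):
    """Compute accuracy, sensitivity, and specificity for two-class labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)  # true positives
    tn = sum(a != positive and p != positive for a, p in pairs)  # true negatives
    fp = sum(a != positive and p == positive for a, p in pairs)  # false positives
    fn = sum(a == positive and p != positive for a, p in pairs)  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # share of positives correctly identified
        "specificity": tn / (tn + fp),  # share of negatives correctly identified
    }


# Hypothetical labels (not project data): one benign sample is misclassified.
actual = ["M", "M", "B", "B", "B"]
predicted = ["M", "M", "B", "M", "B"]

metrics = binary_metrics(actual, predicted)
print(metrics)
```

In this toy example every malignant sample is caught, so sensitivity is 1.0, while one of the three benign samples is flagged as malignant, so specificity is 2/3 and accuracy is 0.8.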

Overall, this was a very fun project where I got to use many different ML models and understand the advantages and disadvantages of each.

GitHub Link
