Featured · Solo · modeling and analysis · 2024 · Model

Diabetes Prediction Model Analysis

Predicting diabetes risk from clinical features with multiple ML approaches

3 models · compared head-to-head
Stratified · cross-validated
Calibrated · threshold-tuned per use case

The problem

Public diabetes-prediction tutorials usually report a single accuracy number from a single train/test split, which is of little use for understanding when a model will fail in deployment.

My contribution

Built a comparative diabetes risk pipeline (logistic regression, random forest, gradient boosting) with stratified cross-validation, threshold tuning, and per-class performance breakdown. Surfaced where each model under- and over-predicts across the population.
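
A minimal sketch of what such a comparison can look like, not the project's actual code. It assumes synthetic data from `make_classification` in place of the real clinical features, default-ish hyperparameters, and 5 folds; none of those details come from the project itself.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, recall_score
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic stand-in for the clinical data, roughly 35% positives.
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=5,
    weights=[0.65, 0.35], random_state=0,
)

# The three model families named above; hyperparameters are illustrative.
models = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

# Stratified folds keep the class ratio the same in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Per-class view: recall on the positive class (sensitivity) and
# recall on the negative class (specificity), plus ROC AUC.
scoring = {
    "sensitivity": "recall",
    "specificity": make_scorer(recall_score, pos_label=0),
    "roc_auc": "roc_auc",
}

results = {}
for name, model in models.items():
    scores = cross_validate(model, X, y, cv=cv, scoring=scoring)
    results[name] = {
        metric: float(scores[f"test_{metric}"].mean()) for metric in scoring
    }

for name, metrics in results.items():
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

Reporting sensitivity and specificity side by side, averaged over folds, is what separates this from a single accuracy number on a single split.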

Outcome

Calibrated risk model with explainable feature importance and a confusion-matrix breakdown by demographic group. Shows the trade-offs between sensitivity and specificity that matter for actual clinical screening.
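
The calibration, threshold-tuning, and per-group steps could be sketched roughly as follows. This is an illustration under stated assumptions: the data is synthetic, the `group` column is a randomly assigned hypothetical demographic label, sigmoid calibration is one of several possible choices, and the 0.85 sensitivity target is made up for the example.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Assumption: synthetic data and a random, hypothetical demographic label.
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=5,
    weights=[0.65, 0.35], random_state=0,
)
rng = np.random.default_rng(0)
group = rng.choice(["group_a", "group_b"], size=len(y))

# Calibrate inside each outer fold, then collect out-of-fold probabilities.
calibrated = CalibratedClassifierCV(
    GradientBoostingClassifier(random_state=0), method="sigmoid", cv=3
)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
proba = cross_val_predict(calibrated, X, y, cv=cv, method="predict_proba")[:, 1]


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) from a 2x2 confusion matrix."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return tp / (tp + fn), tn / (tn + fp)


def pick_threshold(y_true, proba, min_sensitivity):
    """Highest threshold on a 0.05 grid that still meets the sensitivity target."""
    for t in np.linspace(0.95, 0.05, 19):
        sens, _ = sensitivity_specificity(y_true, (proba >= t).astype(int))
        if sens >= min_sensitivity:
            return float(t)
    return 0.0  # fall back to flagging everyone


# Screening use case (illustrative): miss few true cases, accept more false alarms.
threshold = pick_threshold(y, proba, min_sensitivity=0.85)
y_pred = (proba >= threshold).astype(int)

overall = sensitivity_specificity(y, y_pred)
by_group = {
    str(g): sensitivity_specificity(y[group == g], y_pred[group == g])
    for g in np.unique(group)
}

print("threshold:", threshold)
print("overall (sensitivity, specificity):", overall)
for g, (sens, spec) in by_group.items():
    print(g, "sensitivity:", round(sens, 3), "specificity:", round(spec, 3))
```

Raising `min_sensitivity` pushes the threshold down and trades specificity away, which is the trade-off a screening setting has to decide on explicitly.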

What I learned

Different model families latch onto different feature subsets even when their accuracy is similar, so looking at feature importance alone, without checking calibration and per-class metrics, can be misleading.
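
One way to see this effect is to rank features for each model with the same yardstick and compare the orderings. The sketch below assumes permutation importance on a held-out split as that yardstick, which may not be what the project used, and the data and `feature_i` names are synthetic placeholders.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic data with placeholder feature names.
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=5,
    weights=[0.65, 0.35], random_state=0,
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

rankings = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Drop in held-out ROC AUC when each feature is shuffled, averaged over repeats.
    result = permutation_importance(
        model, X_test, y_test, scoring="roc_auc", n_repeats=10, random_state=0
    )
    order = np.argsort(result.importances_mean)[::-1]
    rankings[name] = [feature_names[i] for i in order]

# Compare the top features each model family relies on.
for name, ranked in rankings.items():
    print(name, ranked[:3])
```

If the top-ranked features differ across models, that is a cue to check calibration and per-class metrics before trusting any single importance chart.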

Type
Model
Role
Solo · modeling and analysis
Timeframe
2024
Stack
Python · Scikit-learn · Pandas · Matplotlib · Seaborn
Tags
Classification · Diabetes · Risk Analysis · Public Health · Predictive Analytics