Diabetes Prediction Model Analysis

Predicting diabetes risk from clinical features with multiple ML approaches


3 models: compared head-to-head

Stratified: cross-validated

Calibrated: threshold-tuned per use case

The problem

Public diabetes-prediction tutorials usually report a single accuracy number on a single train/test split, which says little about when a model will fail in deployment.

My contribution

Built a comparative diabetes risk pipeline (logistic regression, random forest, gradient boosting) with stratified cross-validation, threshold tuning, and per-class performance breakdown. Surfaced where each model under- and over-predicts across the population.
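
A minimal sketch of what such a head-to-head comparison could look like. The data here is a synthetic stand-in generated with `make_classification`, and the fold count, hyperparameters, and metric names are illustrative assumptions rather than the project's actual settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, recall_score
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the clinical dataset (assumption): 8 features, ~35% positives.
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=5,
    weights=[0.65, 0.35], random_state=0,
)

# The three model families named in the write-up; hyperparameters are placeholders.
models = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

# Stratified folds keep the positive rate roughly equal in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Per-class view: recall on the positive class (sensitivity) and on the
# negative class (specificity), plus a threshold-free ranking metric.
scoring = {
    "sensitivity": make_scorer(recall_score, pos_label=1),
    "specificity": make_scorer(recall_score, pos_label=0),
    "roc_auc": "roc_auc",
}


def compare_models(models, X, y, cv, scoring):
    """Return {model_name: {metric: (mean, std)}} over the CV folds."""
    summary = {}
    for name, model in models.items():
        scores = cross_validate(model, X, y, cv=cv, scoring=scoring)
        summary[name] = {
            metric: (scores[f"test_{metric}"].mean(), scores[f"test_{metric}"].std())
            for metric in scoring
        }
    return summary


summary = compare_models(models, X, y, cv, scoring)
for name, metrics in summary.items():
    line = ", ".join(f"{m}={mean:.3f}±{std:.3f}" for m, (mean, std) in metrics.items())
    print(f"{name}: {line}")
```

Reporting the fold-to-fold standard deviation next to each mean is what separates this from a single-split accuracy number: two models with similar means can differ noticeably in how stable they are.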

Outcome

Calibrated risk model with explainable feature importance and confusion-matrix breakdown by demographic. Shows the trade-offs between sensitivity and specificity that matter for actual clinical screening.
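
The calibration and threshold-tuning step can be sketched roughly as follows. This assumes a screening use case that wants at least 90% sensitivity; the target, the sigmoid calibration method, the synthetic data, and the helper `threshold_for_sensitivity` are all hypothetical choices for illustration.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import brier_score_loss, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the clinical dataset (assumption).
X, y = make_classification(
    n_samples=1000, n_features=8, n_informative=5,
    weights=[0.65, 0.35], random_state=0,
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Wrap the classifier so predicted probabilities are calibrated via internal CV.
calibrated = CalibratedClassifierCV(
    GradientBoostingClassifier(random_state=0), method="sigmoid", cv=5
)
calibrated.fit(X_train, y_train)
proba = calibrated.predict_proba(X_val)[:, 1]

# Brier score: mean squared error of the predicted probabilities (lower is better).
brier = brier_score_loss(y_val, proba)


def threshold_for_sensitivity(y_true, proba, target):
    """Return the highest threshold whose sensitivity is at least `target`."""
    candidates = np.unique(proba)[::-1]  # distinct scores, highest first
    for t in candidates:
        predicted_positive = proba >= t
        sensitivity = predicted_positive[y_true == 1].mean()
        if sensitivity >= target:
            return float(t)
    return float(candidates[-1])


# Hypothetical screening requirement: miss at most 10% of true cases.
threshold = threshold_for_sensitivity(y_val, proba, target=0.90)
y_pred = (proba >= threshold).astype(int)

tn, fp, fn, tp = confusion_matrix(y_val, y_pred, labels=[0, 1]).ravel()
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
print(f"brier={brier:.3f} threshold={threshold:.3f} "
      f"sensitivity={sensitivity:.3f} specificity={specificity:.3f}")
```

Choosing the highest threshold that still meets the sensitivity target keeps specificity as high as the constraint allows. In practice the threshold should be picked on a validation split and confirmed on separate held-out data, and the same confusion-matrix computation can be repeated per demographic group.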

What I learned

Different model families latch onto different feature subsets even when their accuracy is similar, so looking at feature importance alone (without checking calibration and per-class metrics) can be misleading.
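
One way to see this effect is to rank features for each model family with the same model-agnostic measure. The sketch below uses permutation importance on a validation split; that choice, the synthetic data, and the helper `importance_ranking` are assumptions, since the write-up does not say which importance method was used.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with some redundant features (assumption), so that
# different models have room to lean on different columns.
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=4, n_redundant=2, random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

models = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}


def importance_ranking(model):
    """Fit the model, then return feature indices ordered from most to least important."""
    model.fit(X_train, y_train)
    result = permutation_importance(
        model, X_val, y_val, scoring="roc_auc", n_repeats=5, random_state=0
    )
    return np.argsort(result.importances_mean)[::-1]


rankings = {name: importance_ranking(model) for name, model in models.items()}
for name, order in rankings.items():
    print(f"{name}: top features {order[:3].tolist()}")
```

Comparing the top-ranked indices across the three models shows whether they agree on what drives the prediction; any disagreement is a cue to check calibration and per-class metrics before trusting a single model's explanation.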

Type: Model
Role: Solo · modeling and analysis
Timeframe: 2024
Stack: Python, Scikit-learn, Pandas, Matplotlib, Seaborn
Tags: Classification, Diabetes, Risk Analysis, Public Health, Predictive Analytics

