Diabetes Prediction Model Analysis
Predicting diabetes risk from clinical features with multiple ML approaches
The problem
Public diabetes-prediction tutorials usually report a single accuracy number on a single train/test split, which says little about when a model will fail in deployment.
My contribution
Built a comparative diabetes risk pipeline (logistic regression, random forest, gradient boosting) with stratified cross-validation, threshold tuning, and per-class performance breakdown. Surfaced where each model under- and over-predicts across the population.
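A minimal sketch of this kind of comparison, not the project's actual code. It assumes synthetic data from `make_classification` as a stand-in for the clinical features, default-ish hyperparameters, and Youden's J (sensitivity + specificity - 1) as the threshold-tuning criterion; the helper names `sensitivity_specificity` and `tune_threshold` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic, imbalanced data standing in for the clinical dataset.
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=5,
    weights=[0.65, 0.35], random_state=0,
)

models = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

# Stratified folds keep the positive rate roughly equal in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)


def sensitivity_specificity(y_true, proba, threshold):
    """Per-class recall at a given probability threshold."""
    pred = (proba >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, pred, labels=[0, 1]).ravel()
    return tp / (tp + fn), tn / (tn + fp)


def tune_threshold(y_true, proba):
    """Pick the threshold that maximises Youden's J on the given scores."""
    thresholds = np.linspace(0.1, 0.9, 17)
    scores = [
        sum(sensitivity_specificity(y_true, proba, t)) - 1 for t in thresholds
    ]
    return float(thresholds[int(np.argmax(scores))])


results = {}
for name, model in models.items():
    # Out-of-fold probabilities: each row is scored by a model that never saw it.
    proba = cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]
    threshold = tune_threshold(y, proba)
    sens, spec = sensitivity_specificity(y, proba, threshold)
    results[name] = {
        "threshold": threshold, "sensitivity": sens, "specificity": spec,
    }
    print(f"{name}: threshold={threshold:.2f} "
          f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

Choosing the threshold on the same out-of-fold predictions that are then reported is slightly optimistic; a nested split or a held-out set would give a cleaner estimate.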
Outcome
Calibrated risk model with explainable feature importance and a confusion-matrix breakdown by demographic group. Shows the sensitivity/specificity trade-offs that matter for clinical screening.
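One way calibration and a per-group confusion-matrix breakdown could be sketched is shown here. This is an illustration under stated assumptions, not the project's code: the data are synthetic, the `age_band` grouping is a made-up demographic column assigned at random, isotonic calibration is just one option, and 0.5 is an arbitrary threshold.

```python
import numpy as np
import pandas as pd
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import brier_score_loss, confusion_matrix
from sklearn.model_selection import train_test_split

# Assumption: synthetic data plus a hypothetical, randomly assigned age band.
X, y = make_classification(
    n_samples=800, n_features=8, n_informative=5,
    weights=[0.65, 0.35], random_state=0,
)
rng = np.random.default_rng(0)
age_band = rng.choice(["under_40", "40_to_59", "60_plus"], size=len(y))

X_tr, X_te, y_tr, y_te, _, band_te = train_test_split(
    X, y, age_band, test_size=0.3, stratify=y, random_state=0
)

# Uncalibrated baseline and an isotonic-calibrated version of the same model.
raw = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
calibrated = CalibratedClassifierCV(
    RandomForestClassifier(n_estimators=100, random_state=0),
    method="isotonic", cv=5,
).fit(X_tr, y_tr)

# Brier score: mean squared error of the predicted probabilities (lower is better).
brier = {
    "raw": brier_score_loss(y_te, raw.predict_proba(X_te)[:, 1]),
    "calibrated": brier_score_loss(y_te, calibrated.predict_proba(X_te)[:, 1]),
}
print(brier)

# Confusion-matrix counts per group at an illustrative 0.5 threshold.
pred = (calibrated.predict_proba(X_te)[:, 1] >= 0.5).astype(int)
rows = []
for band in sorted(set(band_te)):
    mask = band_te == band
    tn, fp, fn, tp = confusion_matrix(
        y_te[mask], pred[mask], labels=[0, 1]
    ).ravel()
    rows.append({"group": str(band), "tn": int(tn), "fp": int(fp),
                 "fn": int(fn), "tp": int(tp)})
breakdown = pd.DataFrame(rows)
print(breakdown)
```

Because the groups here are random, the per-group counts should look alike; on real data, differences in false negatives between groups are the thing to look for in a screening context.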
What I learned
Different model families latch onto different feature subsets even when their accuracy is similar, so feature importance on its own, without calibration and per-class metrics, can be misleading.
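A rough way to check this observation is to fit each model family on the same split and compare accuracy alongside the top-ranked features. The sketch below assumes synthetic data with placeholder feature names (`feature_0`, `feature_1`, ...) and uses permutation importance as a model-agnostic measure; the original analysis may have used a different importance method.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic data and placeholder feature names.
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=5,
    weights=[0.65, 0.35], random_state=0,
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

models = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

accuracy = {}
top_features = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    accuracy[name] = model.score(X_te, y_te)
    # Permutation importance: drop in held-out score when one feature is shuffled.
    imp = permutation_importance(
        model, X_te, y_te, n_repeats=10, random_state=0
    )
    order = np.argsort(imp.importances_mean)[::-1][:3]
    top_features[name] = [feature_names[i] for i in order]
    print(f"{name}: accuracy={accuracy[name]:.3f} top3={top_features[name]}")
```

If the accuracies are close but the top-three lists differ, that is the situation described above, and calibration and per-class metrics are needed before trusting any one importance ranking.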
- Type
- Model
- Role
- Solo · modeling and analysis
- Timeframe
- 2024
- Stack
- Python · Scikit-learn · Pandas · Matplotlib · Seaborn
- Tags
- Classification · Diabetes · Risk Analysis · Public Health · Predictive Analytics


