Predicting PHQ-9 Depression Scores from Multi-Clinic EHR Data
Benchmarking and predicting patient-reported depression severity at scale
Problem
Mental-health clinicians administer the PHQ-9 questionnaire to track patient depression severity over time, but the scores arrive irregularly and don't always map cleanly to other clinical signals already captured in the EHR. The company wanted to know two things: how well existing baseline approaches can actually predict PHQ-9 from the rest of the clinical record, and whether there is room for a temporal model to do better than per-visit snapshots.
Approach
Pulled multi-clinic patient data from the company’s Snowflake warehouse across roughly fourteen related tables — demographics, visit history, diagnostic codes, prior questionnaire responses, and prescription patterns. Built the feature pipeline in Python with pandas, with feature engineering specifically designed to preserve temporal ordering. Trained a Random Forest as the benchmark model, then a Temporal Neural Network that explicitly modeled the sequence of visits per patient. Evaluated both with held-out patients (not held-out visits) to avoid leakage from same-patient correlation.
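The temporal feature engineering can be sketched roughly as below. Column names (`patient_id`, `visit_date`, `phq9_score`) and the specific derived features are illustrative assumptions, not the project's actual schema; the key idea is that every feature is shifted so a visit's own score never leaks into its own inputs.

```python
import pandas as pd

def build_temporal_features(visits: pd.DataFrame) -> pd.DataFrame:
    """Add per-patient temporal features while preserving visit order.

    Assumes (hypothetically) columns: patient_id, visit_date, phq9_score.
    """
    df = visits.sort_values(["patient_id", "visit_date"]).copy()
    grp = df.groupby("patient_id")
    # Last observed PHQ-9, shifted one visit back so the current
    # visit's target never appears in its own feature row.
    df["prev_phq9"] = grp["phq9_score"].shift(1)
    # Rolling mean of up to three *prior* scores (current visit excluded).
    df["phq9_roll3"] = grp["phq9_score"].transform(
        lambda s: s.shift(1).rolling(3, min_periods=1).mean()
    )
    # Gap since the previous visit captures the irregular sampling.
    df["days_since_prev"] = grp["visit_date"].diff().dt.days
    # Position of the visit within the patient's history.
    df["visit_number"] = grp.cumcount()
    return df
```

The shift-before-aggregate pattern is what "feature engineering designed to preserve temporal ordering" usually cashes out to in pandas.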
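The Random Forest benchmark step might look like the following minimal sketch. PHQ-9 is a 0-27 score, so it is treated here as regression with MAE as the metric; the hyperparameters and metric choice are assumptions for illustration, not the project's actual configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

def benchmark_random_forest(X_train, y_train, X_test, y_test, seed: int = 0):
    """Fit the Random Forest baseline and score it on held-out patients.

    Returns (test MAE, fitted model); hyperparameters are illustrative.
    """
    model = RandomForestRegressor(n_estimators=200, random_state=seed)
    model.fit(X_train, y_train)
    mae = mean_absolute_error(y_test, model.predict(X_test))
    return mae, model
```

A benchmark like this also yields `feature_importances_`, which is useful when the downstream question is which clinical signals are worth surfacing.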
Outcome
The Temporal Neural Network improved PHQ-9 prediction accuracy by approximately 30% over the Random Forest benchmark on the held-out patient set. Delivered an executive summary, a detailed technical report, and a presentation to my supervising professors and the company’s CEO; the findings fed into subsequent product-design decisions about which clinical signals are worth surfacing in the platform’s clinician-facing views.
Lessons
Validation strategy matters more than model architecture in clinical ML: the early Random Forest looked deceptively strong until the split switched from held-out visits to held-out patients. The biggest practical win was investing in the data pipeline before the model: most of the accuracy gain came from feature engineering on the warehouse side, not from the temporal architecture itself.
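The patient-level split behind the first lesson can be sketched with no ML dependencies; this is a minimal stand-in for something like scikit-learn's `GroupShuffleSplit` with the patient identifier as the group key, and `patient_id` is an assumed column name.

```python
import numpy as np
import pandas as pd

def split_by_patient(df: pd.DataFrame, test_frac: float = 0.2, seed: int = 0):
    """Split visit-level rows into train/test by patient, not by visit.

    A visit-level split scatters one patient's correlated visits across
    both sets and inflates validation scores; holding out whole patients
    avoids that leakage.
    """
    rng = np.random.default_rng(seed)
    patients = df["patient_id"].unique()
    rng.shuffle(patients)
    n_test = max(1, int(round(test_frac * len(patients))))
    test_ids = set(patients[:n_test])
    test_mask = df["patient_id"].isin(test_ids)
    return df[~test_mask], df[test_mask]
```

The invariant worth asserting in any pipeline like this is that the train and test patient-id sets are disjoint.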