Predictive Modeling
High-Cost Claim Classifier
Prospective healthcare risk model using medallion data engineering, calibrated scoring, lift curves, and top-k capture diagnostics.
Data Science | Machine Learning | Actuarial Analytics
A deployed healthcare risk application that uses distributed data engineering, statistically defensible model selection, and calibrated decision support to identify beneficiaries likely to become high-cost next year.
This portfolio page centers the deployed Project 2 application while making the broader engineering story visible: model development, cloud pipeline design, and operational serving.
Predictive Modeling
Prospective healthcare risk model using medallion data engineering, calibrated scoring, lift curves, and top-k capture diagnostics.
Policy Prototype
A simulated intervention-policy prototype that maps scored beneficiaries into discrete MDP states and compares long-run action values.
Cloud Pipeline
Databricks bronze, silver, and gold layers feeding MLflow artifacts, FastAPI serving, and public deployment.
The modeling frame is prospective: year-t beneficiary features predict year-t+1 high-cost status. Logistic regression is retained as an interpretable actuarial benchmark, while random forest, gradient boosting, and XGBoost are evaluated using PR-AUC, top-k capture, lift, calibration, and Brier score rather than raw accuracy alone.
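The ranking and calibration diagnostics named above can be illustrated with a minimal sketch. It assumes binary labels (1 = high-cost next year) and predicted probabilities; the function names and the toy data are hypothetical and are not taken from the project code.

```python
from typing import Sequence


def _top_labels(y_true: Sequence[int], scores: Sequence[float], k_frac: float):
    """Return the labels of the top k_frac of cases, ranked by descending score."""
    n_top = max(1, int(round(len(scores) * k_frac)))
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    return [label for _, label in ranked[:n_top]]


def top_k_capture(y_true: Sequence[int], scores: Sequence[float], k_frac: float) -> float:
    """Share of all true high-cost cases that land in the top k_frac of scores."""
    total_positives = sum(y_true)
    if total_positives == 0:
        return 0.0
    return sum(_top_labels(y_true, scores, k_frac)) / total_positives


def lift_at_k(y_true: Sequence[int], scores: Sequence[float], k_frac: float) -> float:
    """Positive rate in the top k_frac divided by the overall positive rate."""
    base_rate = sum(y_true) / len(y_true)
    if base_rate == 0:
        return 0.0
    top = _top_labels(y_true, scores, k_frac)
    return (sum(top) / len(top)) / base_rate


def brier_score(y_true: Sequence[int], probs: Sequence[float]) -> float:
    """Mean squared error between predicted probability and outcome (lower is better)."""
    return sum((p - y) ** 2 for p, y in zip(probs, y_true)) / len(y_true)


# Toy example: 10 beneficiaries, 2 of whom become high-cost.
y = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
p = [0.9, 0.8, 0.1, 0.7, 0.2, 0.3, 0.05, 0.15, 0.25, 0.1]

capture = top_k_capture(y, p, k_frac=0.2)
lift = lift_at_k(y, p, k_frac=0.2)
brier = brier_score(y, p)
```

Top-k capture and lift describe how well the highest scores concentrate true high-cost cases, while the Brier score also penalizes probabilities that are poorly calibrated, which is one reason to report them together.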
The application combines a Databricks medallion pipeline, model artifacts served through FastAPI, and a public static interface deployed through Cloudflare Pages.
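As a rough illustration of the medallion idea, the sketch below uses plain Python as a stand-in for the Databricks layers. The field names (`bene_id`, `year`, `paid`), the sample rows, and the 10,000 high-cost threshold are all assumptions made for the example; the actual pipeline's schema and cutoff are not stated here.

```python
from collections import defaultdict

# Bronze: raw claim rows as ingested (hypothetical fields, all strings).
bronze = [
    {"bene_id": "A", "year": "2020", "paid": "1200.50"},
    {"bene_id": "A", "year": "2020", "paid": "300"},
    {"bene_id": "A", "year": "2021", "paid": "25000"},
    {"bene_id": "B", "year": "2020", "paid": "n/a"},  # malformed amount
    {"bene_id": "B", "year": "2020", "paid": "80"},
    {"bene_id": "B", "year": "2021", "paid": "150"},
]


def to_silver(rows):
    """Silver: typed, validated rows; malformed records are dropped."""
    clean = []
    for r in rows:
        try:
            clean.append(
                {"bene_id": r["bene_id"], "year": int(r["year"]), "paid": float(r["paid"])}
            )
        except (KeyError, ValueError):
            continue
    return clean


def to_gold(rows, high_cost_threshold):
    """Gold: one row per beneficiary-year, pairing year-t features with the year-t+1 label."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for r in rows:
        key = (r["bene_id"], r["year"])
        totals[key] += r["paid"]
        counts[key] += 1

    gold = []
    for bene_id, year in sorted(totals):
        next_key = (bene_id, year + 1)
        if next_key not in totals:
            # Simplification: with no observed following year, no label is built.
            continue
        gold.append(
            {
                "bene_id": bene_id,
                "feature_year": year,
                "total_paid": totals[(bene_id, year)],
                "claim_count": counts[(bene_id, year)],
                "high_cost_next_year": int(totals[next_key] >= high_cost_threshold),
            }
        )
    return gold


silver = to_silver(bronze)
gold = to_gold(silver, high_cost_threshold=10_000.0)
```

Keeping the label strictly in the following year is what makes the gold table prospective: no year-t+1 information leaks into the year-t features.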
Enter current-year utilization and cost signals. The app estimates next-year high-cost risk and returns a simulated policy recommendation.
The risk engine is trained on observed data, whereas the intervention policy layer is a simulated decision prototype.
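One way such a simulated policy layer could work is sketched below with value iteration over a small MDP. Everything numeric here is invented for illustration: the risk-tier cutoffs, the two actions, the transition probabilities, the rewards, and the discount factor are assumptions, not the prototype's actual parameters.

```python
STATES = ["low", "medium", "high"]
ACTIONS = ["no_action", "outreach"]
GAMMA = 0.9  # assumed discount factor

# Hypothetical transitions: P[action][state] -> {next_state: probability}
P = {
    "no_action": {
        "low": {"low": 0.9, "medium": 0.1},
        "medium": {"low": 0.2, "medium": 0.6, "high": 0.2},
        "high": {"medium": 0.2, "high": 0.8},
    },
    "outreach": {
        "low": {"low": 0.95, "medium": 0.05},
        "medium": {"low": 0.4, "medium": 0.5, "high": 0.1},
        "high": {"medium": 0.4, "high": 0.6},
    },
}

# Hypothetical rewards: negative expected cost, with outreach costing one extra unit.
R = {
    "no_action": {"low": -1.0, "medium": -5.0, "high": -20.0},
    "outreach": {"low": -2.0, "medium": -6.0, "high": -21.0},
}


def score_to_state(risk_score: float) -> str:
    """Map a calibrated risk score to a discrete state (cutoffs are assumptions)."""
    if risk_score < 0.1:
        return "low"
    if risk_score < 0.3:
        return "medium"
    return "high"


def q_values(V):
    """Long-run value of each action in each state, given state values V."""
    return {
        s: {
            a: R[a][s] + GAMMA * sum(prob * V[s2] for s2, prob in P[a][s].items())
            for a in ACTIONS
        }
        for s in STATES
    }


def value_iteration(tol: float = 1e-8, max_iter: int = 10_000):
    """Iterate the Bellman optimality update until state values stop changing."""
    V = {s: 0.0 for s in STATES}
    for _ in range(max_iter):
        Q = q_values(V)
        V_new = {s: max(Q[s].values()) for s in STATES}
        converged = max(abs(V_new[s] - V[s]) for s in STATES) < tol
        V = V_new
        if converged:
            break
    return V, q_values(V)


V, Q = value_iteration()
policy = {s: max(Q[s], key=Q[s].get) for s in STATES}
recommendation = policy[score_to_state(0.42)]
```

Because the transitions and rewards are simulated rather than estimated from intervention data, the resulting action values only compare options within the assumed model.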