Data Science: A Lesson About Predictive Models

Ever had a model that looked great on paper but underperformed in the real world?

Last semester, I worked on the ASU Displaced Voice project with a team of five, analyzing eviction cases in Maricopa County. We used demographic data—such as ethnic composition, median income, and homeownership rates—mapped to ZIP-code-level eviction counts in 2024.

Our first approach used classification models to predict whether an eviction occurred in a given area. The results were underwhelming: the confusion matrix showed a high false positive rate, which pointed to class imbalance and weak predictive accuracy.
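For anyone who wants to see how a false positive rate is read off a confusion matrix, here is a minimal Python sketch. The labels are made up for illustration and are not the project's data, and the "predict eviction everywhere" model is only one assumed way that class imbalance can produce this pattern.

```python
# Hypothetical labels for ten ZIP codes (1 = eviction occurred, 0 = none).
# Illustrative values only, not the project's data.
y_true = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]

# A model that leans on the majority class and predicts "eviction" everywhere.
y_pred = [1] * 10


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def false_positive_rate(fp, tn):
    """Share of true negatives that were wrongly flagged as positive."""
    negatives = fp + tn
    return fp / negatives if negatives else 0.0


tp, fp, tn, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
fpr = false_positive_rate(fp, tn)
print(f"accuracy={accuracy:.2f}, false positive rate={fpr:.2f}")
```

In this toy case accuracy looks acceptable while every area without an eviction is misclassified, which is why accuracy alone can hide the problem.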

We pivoted to clustering and regression analysis. The initial regression model explained 88% of the variance in eviction counts. In the final phase, I applied manual feature reduction to stabilize the model and improve generalization. This raised R² from 88% to 92%, a gain of 4 percentage points (about 4.5% in relative terms) that cut the unexplained variance by a third, from 12% to 8%.
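A change in R² can be expressed in more than one way, and the numbers differ quite a bit. Here is a small Python sketch using the 88% and 92% figures above; the function name `r2_change` is just a helper made up for this illustration.

```python
def r2_change(r2_before, r2_after):
    """Describe an R² change three ways.

    Returns (absolute gain, relative gain, relative drop in unexplained variance),
    all as fractions.
    """
    absolute_gain = r2_after - r2_before
    relative_gain = absolute_gain / r2_before
    unexplained_before = 1 - r2_before
    unexplained_after = 1 - r2_after
    unexplained_drop = (unexplained_before - unexplained_after) / unexplained_before
    return absolute_gain, relative_gain, unexplained_drop


absolute_gain, relative_gain, unexplained_drop = r2_change(0.88, 0.92)
print(f"absolute gain: {absolute_gain * 100:.1f} percentage points")
print(f"relative gain in explained variance: {relative_gain * 100:.1f}%")
print(f"reduction in unexplained variance: {unexplained_drop * 100:.1f}%")
```

Which figure to report depends on the question being asked: the percentage-point gain is the most direct, while the drop in unexplained variance shows how much of the remaining error was removed.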

I learned that building a predictive model isn’t just about choosing the right algorithm—it’s equally important to have quality training data and to ask the right questions.

Picture from ChatGPT