672. Evaluating traditional ML models in predicting uncertainty of real estate prices
Invited abstract in session WC-34: Business Applications of Knowledge and Technology, stream Advancements of OR-analytics in statistics, machine learning and data science.
Wednesday, 12:30-14:00
Room: Michael Sadler LG10
Authors (first author is the speaker)
| 1. | Jose A. Rodriguez-Serrano | Operations, Innovation and Data Sciences, Universitat Ramon Llull Esade |
Abstract
A body of OR literature addresses the prediction of real estate prices: some firms rely on accurate predictions for revenue management or portfolio optimization, and price prediction is a basic component of operations in specific sectors (e.g. insurance, market research).
Recently, machine learning (ML) has been replacing older models such as ordinary least squares (OLS) in valuation. Models such as random forests typically score better on metrics such as root mean squared error (RMSE) or R² in prediction tasks.
One drawback is that standard ML regression models provide only point estimates (the expectation of the price conditioned on the predictors). In realistic valuation tasks, however, the point estimate alone is not enough because it ignores the uncertainty of the price, and traditional ML models are not designed to provide the conditional distribution.
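The distinction can be written compactly. The notation here is an assumption for illustration and does not come from the abstract: x denotes the predictors, y the price, and 1 − α a chosen coverage level. A point estimator targets the conditional mean, whereas a range prediction targets an interval [l(x), u(x)] that contains the price with the chosen probability; conditional quantiles are one common way to build such an interval.

```latex
\hat{y}(x) = \mathbb{E}[\, y \mid x \,],
\qquad
\Pr\bigl( l(x) \le y \le u(x) \mid x \bigr) \ge 1 - \alpha,
\qquad
\text{e.g. } l(x) = q_{\alpha/2}(x), \; u(x) = q_{1-\alpha/2}(x).
```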
In this work we propose a novel methodology to evaluate the quality of an ML model in predicting ranges rather than point estimates. We select five standard ML models (linear regression, decision tree, K-NN, random forest, LightGBM), adapt them to “recover” the conditional price distribution, and compare them with new metrics on data from three real estate markets in Spain. We find that random forest generally predicts ranges best, while K-NN is well suited when constrained to small prediction ranges.
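The abstract does not say how the models are adapted or which metrics are used, so the following is only a minimal sketch of one plausible reading, using K-NN as the example. It assumes that the conditional distribution is approximated by the prices of the k nearest training neighbours, that a range is read off their empirical quantiles, and that range quality is summarised by empirical coverage and mean width. The function names, the synthetic data and both metrics are hypothetical stand-ins, not the author's method or results.

```python
import math
import random


def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted list, 0 <= q <= 1."""
    pos = q * (len(sorted_vals) - 1)
    lower, upper = math.floor(pos), math.ceil(pos)
    frac = pos - lower
    return sorted_vals[lower] * (1 - frac) + sorted_vals[upper] * frac


def knn_interval(train_X, train_y, x, k=20, alpha=0.2):
    """Assumed adaptation: treat the k nearest neighbours' prices as a sample
    from the conditional distribution and return its central (1 - alpha) range."""
    def sq_dist(i):
        return sum((a - b) ** 2 for a, b in zip(train_X[i], x))

    nearest = sorted(range(len(train_X)), key=sq_dist)[:k]
    prices = sorted(train_y[i] for i in nearest)
    return quantile(prices, alpha / 2), quantile(prices, 1 - alpha / 2)


def coverage_and_width(intervals, y_true):
    """Stand-in metrics: share of true prices inside the range, and mean range width."""
    hits = sum(lo <= y <= hi for (lo, hi), y in zip(intervals, y_true))
    mean_width = sum(hi - lo for lo, hi in intervals) / len(intervals)
    return hits / len(y_true), mean_width


# Synthetic, illustrative data only: price grows with floor area,
# and so does the noise, so uncertainty depends on the predictor.
rng = random.Random(0)


def make_data(n):
    X, y = [], []
    for _ in range(n):
        area = rng.uniform(40, 200)
        X.append([area])
        y.append(2000 * area + rng.gauss(0, 300 * area))
    return X, y


train_X, train_y = make_data(500)
test_X, test_y = make_data(100)

intervals = [knn_interval(train_X, train_y, x) for x in test_X]
coverage, mean_width = coverage_and_width(intervals, test_y)
print(f"coverage={coverage:.2f}, mean width={mean_width:.0f}")
```

In this sketch, lowering `alpha` widens the ranges and raises coverage, so any comparison between models would need to fix one of the two quantities and compare the other.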
This work sheds light on using ML for valuation and other OR tasks in settings where prediction risk is critical.
Keywords
- Machine Learning
Status: accepted