Operations Research 2025
Abstract Submission

2317. Feature selection for Neural Network Forecasting - an empirical evaluation of partial dependence, perturbation and gradient techniques for railway revenue forecasting

Invited abstract in session WE-6: Predictive Analytics: Forecasting II, stream Analytics, Data Science, and Forecasting.

Wednesday, 16:30-18:00
Room: H9

Authors (first author is the speaker)

1. Sven F. Crone
Department of Management Science, Lancaster University Management School

Abstract

Feature selection is considered of preeminent importance for specifying accurate, robust and efficient machine learning methods, attracting over 22,000 academic papers. While some prominent methods include feature selection in their methodology, such as the decision-tree-based methods of XGBoost, others including neural networks (NN) leave feature selection to the modeller. For time series data, most papers employ simple heuristics such as p = n*s autoregressive lags y_{t-p}, with s = seasonal length, and n typically set between 1 and 3 (e.g. nnetar, Hyndman and Caceres, 2024). Crone and Kourentzes (2010) propose established statistical techniques, such as ACF, PACF, and stepwise regression, for NN feature selection. Zimmermann et al. (2020) suggest unfolding partial dependence plots across time series and checking importance, consistency and nonlinearity. More recent methods determine feature importance based on the predictive performance of the output, including perturbation-based feature importance, which is applicable across ML methods, and gradient-based feature importance, which utilises the learning information of neural networks during parameterisation.
However, despite their prominence in classification, clustering and regression research, only a few papers consider these feature selection methods on time series data for forecasting. This paper seeks to address this gap by comparing statistical approaches, heuristics and three ML feature selection approaches (perturbation, partial dependence and gradient weight-based) on their empirical accuracy for a real-world railway demand dataset with many features. The results indicate the promise of more advanced approaches of feature importance over simpler methods, and suggest computationally efficient trade-offs between accuracy and speed.
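
As an illustration only, the lag heuristic p = n*s and a perturbation-based importance score might be sketched as follows. This is a minimal sketch under stated assumptions, not the method evaluated in the paper: the series is synthetic, a least-squares fit stands in for the neural network, and the helper names make_lag_matrix and perturbation_importance are hypothetical.

```python
import numpy as np

def make_lag_matrix(y, n_lags):
    """Return (X, target), where column j-1 of X holds y[t-j] for j = 1..n_lags."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack(
        [y[n_lags - j : len(y) - j] for j in range(1, n_lags + 1)]
    )
    return X, y[n_lags:]

def perturbation_importance(predict, X, target, n_repeats=10, seed=0):
    """Mean increase in MSE when each column of X is shuffled in turn."""
    rng = np.random.default_rng(seed)
    base = np.mean((predict(X) - target) ** 2)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            Xp = X.copy()
            # Shuffling breaks the link between lag j+1 and the target.
            Xp[:, j] = rng.permutation(Xp[:, j])
            scores[j] += np.mean((predict(Xp) - target) ** 2) - base
    return scores / n_repeats

# Synthetic monthly series (assumption): seasonal length s = 12, n = 1.
rng = np.random.default_rng(42)
s, n = 12, 1
t = np.arange(240)
y = 10 * np.sin(2 * np.pi * t / s) + rng.normal(0, 1, t.size)

# Heuristic: use p = n*s autoregressive lags as candidate features.
X, target = make_lag_matrix(y, n_lags=n * s)

# Stand-in model (assumption): ordinary least squares with an intercept.
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, target, rcond=None)

def predict(M):
    return coef[0] + M @ coef[1:]

importance = perturbation_importance(predict, X, target)
ranking = np.argsort(importance)[::-1] + 1  # lag numbers, most important first
```

Lags whose shuffling barely changes the error are candidates for removal. Because the score only needs a predict function, the same loop could be applied to a trained neural network; the gradient-based and partial dependence approaches named in the abstract are not covered by this sketch.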

Keywords

Status: accepted

