8 September 2023 – Improving quality of OR data sets

Presented by Dr. Inci Yüksel-Ergün, Zuse Institute Berlin (ZIB)

Data is ubiquitous in the age of analytics. The reliability of decisions based on OR studies depends on the quality of the underlying data. However, identifying pertinent data and assessing its quality are challenging. Modeling complex decisions inevitably requires highly connected and consistent real-world data sets. When disruptive changes render expert knowledge obsolete, we need more complex models to understand the impact of these changes.

In industry projects involving highly connected data, we encountered several cases in which our analysis detected data errors too complex for humans to understand. Examples of such analyses include computing irreducible infeasible subsystems (IIS) of large mixed-integer programs (MIP) and identifying bottlenecks in highly nonlinear networks. While detecting such errors is a significant achievement, removing them is extremely difficult.

In this presentation, we share our insights into improving data quality. We report results obtained by applying methods from data preprocessing and mathematical optimization to data from the German high-pressure gas transport network.

