EURO 2024 Copenhagen

2723. Simulating Data Envelopment Analysis with Machine Learning: A Clustering-Based Data Preprocessing Technique for Training Set Selection

Invited abstract in session MB-48: DEA and Machine Learning, stream Data Envelopment Analysis and its Application.

Monday, 10:30-12:00
Room: 60 (building: 324)

Authors (first author is the speaker)

1. Barbara Kaminska
Department of Management Systems and Organization Development, Wroclaw University of Science and Technology
2. Dimitrios-Georgios Sotiros
Department of Operations Research and Business Intelligence, Wroclaw University of Science and Technology

Abstract

Data Envelopment Analysis (DEA) is a non-parametric technique for measuring the relative efficiency of a set of decision-making units (DMUs) on the basis of multiple inputs and multiple outputs. A typical DEA analysis requires solving a series of linear programs, one for each DMU. DEA therefore suffers from the curse of dimensionality, i.e., on big data the computational load is very high. This issue is commonly addressed in the literature by adopting Machine Learning (ML) algorithms. However, although the selection of the training dataset is crucial for such algorithms, the DEA literature neglects this factor, and existing methods rely on random sampling. In this paper, we build on the existing literature and introduce a clustering-based data preprocessing technique that selects the training dataset so that it represents the entire dataset as closely as possible. We use simulated data to test this technique against random sampling under different ML algorithms, numbers of netputs, and standard DEA models. We further test it on a network DEA model for two-stage series structures, in which the efficiency scores are represented as a two-dimensional vector. In all cases, the results show that the proposed technique increases the accuracy of the ML algorithms and may even decrease the required computational load.
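
To make the idea concrete, the following is a minimal Python sketch of clustering-based training set selection followed by DEA labelling. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not name the clustering algorithm or the DEA model, so plain k-means (keeping the DMU nearest each centroid) and the input-oriented CCR envelopment model are assumed here. The function names `ccr_efficiency` and `kmeans_representatives`, the uniform simulated data, and the choice to score training DMUs against the full reference set are likewise hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def ccr_efficiency(X, Y, o):
    """Input-oriented CCR score of DMU o (envelopment form); one LP per DMU."""
    n, m = X.shape
    s = Y.shape[1]
    # Decision variables: [theta, lambda_1, ..., lambda_n], all >= 0 by default.
    c = np.concatenate(([1.0], np.zeros(n)))          # minimise theta
    A_in = np.column_stack((-X[o], X.T))              # sum_j lambda_j x_ij - theta x_io <= 0
    A_out = np.column_stack((np.zeros(s), -Y.T))      # -sum_j lambda_j y_rj <= -y_ro
    res = linprog(c, A_ub=np.vstack((A_in, A_out)),
                  b_ub=np.concatenate((np.zeros(m), -Y[o])), method="highs")
    return res.fun


def kmeans_representatives(Z, k, n_iter=50, seed=0):
    """Assumed selection rule: run k-means, keep the point nearest each centroid."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):                   # leave empty clusters unchanged
                centers[j] = Z[labels == j].mean(axis=0)
    dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    # Two centroids may share a nearest point, so fewer than k indices can be returned.
    return np.unique(dist.argmin(axis=0))


# Simulated data (hypothetical sizes): n DMUs, m inputs, s outputs.
rng = np.random.default_rng(42)
n, m, s, k = 200, 2, 1, 20
X = rng.uniform(1.0, 10.0, size=(n, m))
Y = rng.uniform(1.0, 10.0, size=(n, s))

# Cluster on the netput vectors, then solve LPs only for the selected DMUs.
train_idx = kmeans_representatives(np.hstack((X, Y)), k)
train_scores = np.array([ccr_efficiency(X, Y, o) for o in train_idx])

# Random-sampling baseline of the same size, for comparison.
random_idx = rng.choice(n, size=len(train_idx), replace=False)
```

The pairs formed by the netputs of the selected DMUs and `train_scores` would then serve as the training set for whichever regressor is used to predict the scores of the remaining DMUs; only `len(train_idx)` linear programs are solved instead of `n`.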

Status: accepted
