EURO 2024 Copenhagen
Abstract Submission

EURO-Online login

2302. Hierarchical clustering with Mixed Integer Linear Optimization Models

Invited abstract in session WD-27: Machine Learning for and with Mathematical Optimization, stream Mathematical Optimization for XAI.

Wednesday, 14:30-16:00
Room: 047 (building: 208)

Authors (first author is the speaker)

1. Lorena Nácher
University Miguel Hernández of Elche
2. Mercedes Landete
Departamento de Estadística y Matemática Aplicada, University Miguel Hernández of Elche
3. Marina Leal Palazón
Statistics, Mathematics and Informatics, Universidad Miguel Hernandez
4. Vanesa Guerrero
Department of Statistics, Universidad Carlos III de Madrid
5. Martine Labbé
computer Science, Université Libre de Bruxelles

Abstract

Dendrograms are graphical representations of hierarchical clustering that illustrate how individual elements can be grouped into clusters. Depending on the chosen distance measure, different dendrograms can be obtained. The purity measure is used to evaluate the quality of the dendrograms. Purity is defined in terms of the homogeneity in the distribution of observations in the dendrogram with respect to their classes. The first part of this work focuses on single-linkage dendrograms and on purity measure. We introduce the first mathematical optimization model to calculate the purity of single-linkage dendrograms in hierarchical clustering with labeled data. This mixed integer linear model is based on the ordered median problem. The second part addresses complete-linkage dendrograms. We propose a system of linear inequalities whose solutions are complete-linkage dendrograms. We analyze the impact of ties in the cost matrix in the system and prove that when there are no ties the solution of the system is unique and is the complete-linkage dendrogram while when there are ties the system has as many solutions as alternatives complete-linkage dendrograms the matrix has. This study contributes to understanding the construction and evaluation of dendrograms in hierarchical clustering.

Keywords

Status: accepted


Back to the list of papers