2875. A Mathematical Programming Approach to Hierarchical Clustering
Invited abstract in session TD-34: Advancements of OR-analytics in statistics, machine learning and data science 5, stream Advancements of OR-analytics in statistics, machine learning and data science.
Tuesday, 14:30-16:00Room: Michael Sadler LG10
Authors (first author is the speaker)
| 1. | Lavinia Amorosi
|
| Statistical Sciences, Sapienza | |
| 2. | Justo Puerto
|
| Estadistica e I.O., Universidad de Sevilla | |
| 3. | Carlos Valverde
|
| University of Seville |
Abstract
Hierarchical clustering is a statistical technique to study the occurring groups (clusters) within a dataset creating a hierarchy of clusters. This is represented by a rooted tree (dendrogram) whose leaves correspond to the data points, and each internal node represents the cluster containing its descendant leaves. Among methods to perform hierarchical clustering, the agglomerative ones are based on greedy procedures that return a sequence of nested partitions, where each level up joins two clusters of the lower partition relying on a local criterion. In this work, motivated by the lack of exact approaches that guarantee global optimality, we focus on a unified mathematical programming formalisation that embeds single and complete linkage procedures. Through preliminary experiments, we evaluate, according to different measures commonly used in this context, the dendrograms obtained from the exact resolution of the formulations and those produced by the greedy approach. Furthermore, by exploiting the mathematical formulation, we also present a scalable matheuristic algorithm capable of generating better quality dendrograms than those produced by the greedy approach, even for large-sized datasets.
Keywords
- Mathematical Programming
- Analytics and Data Science
- Programming, Mixed-Integer
Status: accepted
Back to the list of papers