EUROPT 2025
Abstract Submission

289. Similarity-based fuzzy clustering scientific articles: potentials and challenges from mathematical and computational perspectives

Invited abstract in session MB-9: Generalized convexity and monotonicity 1, stream Generalized convexity and monotonicity.

Monday, 10:30-12:30
Room: B100/8013

Authors (first author is the speaker)

1. Thi Huong Vu
Digital Data and Information for Society, Science, and Culture, Zuse Institute Berlin
2. Ida Litzel
Digital Data and Information for Society, Science, and Culture, Zuse Institute Berlin
3. Thorsten Koch
Applied Algorithmic Intelligence Methods, ZIB / TU Berlin

Abstract

Fuzzy clustering, which allows an article to belong to several clusters with soft membership degrees, plays a vital role in analyzing large-scale publication data. The problem can be formulated as a constrained optimization model whose goal is to minimize the discrepancy between the similarity observed in the data and the similarity derived from a predicted distribution. While this formulation can draw on state-of-the-art optimization algorithms, tailoring them to real, massive databases such as OpenAlex or Web of Science, which contain about 70 million articles and a billion citations, poses significant challenges. In this talk, we discuss potentials and challenges of the approach from both mathematical and computational perspectives. Among other things, we establish second-order optimality conditions, which provide new theoretical insights, and we propose practical solution methods that exploit the problem’s structure. Specifically, we accelerate the gradient projection method with GPU-based parallel computing to handle large-scale data efficiently.
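
To make the gradient projection idea concrete, the following is a minimal CPU sketch in plain Python. It is not the authors' model or implementation. It assumes, purely for illustration, that the discrepancy is the squared Frobenius error between a symmetric observed similarity matrix S and the predicted similarity U U^T, where each row of the membership matrix U lies on the probability simplex. The function names (project_to_simplex, fit_memberships, and so on), the step-size rule, and the toy data are all hypothetical.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def similarity_from_memberships(U):
    """Predicted similarity under the assumed model: S_ij = <u_i, u_j>."""
    return [[dot(ui, uj) for uj in U] for ui in U]


def project_to_simplex(v):
    """Euclidean projection of v onto {x : x >= 0, sum(x) = 1} (sort-based)."""
    theta, cumulative = 0.0, 0.0
    for k, u_k in enumerate(sorted(v, reverse=True), start=1):
        cumulative += u_k
        t = (cumulative - 1.0) / k
        if u_k - t > 0:
            theta = t  # the last k that passes the check gives the threshold
    return [max(x - theta, 0.0) for x in v]


def objective(S, U):
    """Assumed discrepancy: sum over i, j of (S_ij - <u_i, u_j>)^2."""
    n = len(S)
    return sum((S[i][j] - dot(U[i], U[j])) ** 2
               for i in range(n) for j in range(n))


def gradient(S, U):
    """Gradient of the objective w.r.t. U, valid when S is symmetric."""
    n, c = len(U), len(U[0])
    G = [[0.0] * c for _ in range(n)]
    for i in range(n):
        for j in range(n):
            r = S[i][j] - dot(U[i], U[j])
            for k in range(c):
                G[i][k] -= 4.0 * r * U[j][k]
    return G


def fit_memberships(S, n_clusters, n_iters=200, seed=0):
    """Projected gradient descent with a simple backtracking step size."""
    rng = random.Random(seed)
    n = len(S)
    U = [project_to_simplex([rng.random() for _ in range(n_clusters)])
         for _ in range(n)]
    history = [objective(S, U)]
    step = 1.0
    for _iteration in range(n_iters):
        G = gradient(S, U)
        accepted = False
        for _attempt in range(40):
            # Each row is projected independently of the others.
            U_new = [project_to_simplex([u - step * g for u, g in zip(U[i], G[i])])
                     for i in range(n)]
            f_new = objective(S, U_new)
            dist2 = sum((a - b) ** 2
                        for row_new, row_old in zip(U_new, U)
                        for a, b in zip(row_new, row_old))
            # Sufficient-decrease test along the projection arc.
            if f_new <= history[-1] - 1e-4 * dist2 / step:
                accepted = True
                break
            step *= 0.5
        if not accepted:
            break
        U = U_new
        history.append(f_new)
        step = min(2.0 * step, 1.0)  # cautiously try a larger step next time
    return U, history


if __name__ == "__main__":
    # Toy data: four "articles", two clusters (made-up memberships).
    U_true = [[1.0, 0.0], [0.9, 0.1], [0.1, 0.9], [0.0, 1.0]]
    S = similarity_from_memberships(U_true)
    U_fit, history = fit_memberships(S, n_clusters=2)
    print("initial objective:", history[0])
    print("final objective:", history[-1])
```

In this sketch the simplex projection acts on each row separately, so the per-article updates are independent of one another; that independence is the kind of structure that parallel hardware can exploit, although the sketch itself runs serially and makes no claim about how the talk's GPU implementation is organized.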

Status: accepted