289. Similarity-based fuzzy clustering scientific articles: potentials and challenges from mathematical and computational perspectives
Invited abstract in session MB-9: Generalized convexity and monotonicity 1, stream Generalized convexity and monotonicity.
Monday, 10:30-12:30
Room: B100/8013
Authors (first author is the speaker)
| 1. | Thi Huong Vu | Digital Data and Information for Society, Science, and Culture, Zuse Institute Berlin |
| 2. | Ida Litzel | Digital Data and Information for Society, Science and Culture, Zuse Institute Berlin |
| 3. | Thorsten Koch | Applied Algorithmic Intelligence Methods, ZIB / TU Berlin |
Abstract
Fuzzy clustering, which allows an article to belong to multiple clusters with soft membership degrees, plays a vital role in analyzing large-scale publication data. The problem can be formulated as a constrained optimization model whose goal is to minimize the discrepancy between the similarity observed in the data and the similarity derived from a predicted distribution. While this approach can leverage state-of-the-art optimization algorithms, tailoring them to real, massive databases such as OpenAlex or Web of Science, which contain about 70 million articles and a billion citations, poses significant challenges. In this talk, we discuss the potential and the challenges of the approach from both mathematical and computational perspectives. Among other things, we establish second-order optimality conditions, which provide new theoretical insights, and propose practical solution methods that exploit the problem's structure. Specifically, we accelerate the gradient projection method with GPU-based parallel computing to handle large-scale data efficiently.
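To make the gradient projection idea concrete, the following is a minimal sketch and not the authors' model or implementation. It assumes one plausible reading of the formulation: memberships are stored in a matrix `U` whose rows lie on the probability simplex, the predicted similarity is `U @ U.T`, and the discrepancy is the squared Frobenius distance to an observed symmetric similarity matrix `S`. All function names, the objective, and the backtracking rule are hypothetical choices for illustration. The code runs on the CPU with NumPy; a GPU version would replace the array operations with a GPU array library.

```python
import numpy as np


def project_rows_to_simplex(V):
    """Euclidean projection of each row of V onto the probability simplex
    (sort-based method; rows of the result are nonnegative and sum to 1)."""
    n, k = V.shape
    sorted_desc = -np.sort(-V, axis=1)              # each row in descending order
    css = np.cumsum(sorted_desc, axis=1) - 1.0      # shifted cumulative sums
    idx = np.arange(1, k + 1)
    cond = sorted_desc - css / idx > 0              # true for the first rho+1 entries
    rho = k - 1 - np.argmax(cond[:, ::-1], axis=1)  # last index where cond holds
    theta = css[np.arange(n), rho] / (rho + 1)      # per-row shift
    return np.maximum(V - theta[:, None], 0.0)


def objective(U, S):
    """Assumed discrepancy: 0.5 * ||S - U U^T||_F^2."""
    R = S - U @ U.T
    return 0.5 * np.sum(R * R)


def gradient(U, S):
    """Gradient of the assumed objective for symmetric S."""
    return 2.0 * (U @ U.T - S) @ U


def projected_gradient(S, k, iters=200, step=1.0, seed=0):
    """Gradient projection with simple step halving.

    A step is accepted only if it strictly decreases the objective,
    so the returned value never exceeds the starting value."""
    rng = np.random.default_rng(seed)
    U = project_rows_to_simplex(rng.random((S.shape[0], k)))
    f = objective(U, S)
    for _ in range(iters):
        G = gradient(U, S)
        t = step
        while t > 1e-12:
            U_new = project_rows_to_simplex(U - t * G)
            f_new = objective(U_new, S)
            if f_new < f:
                U, f = U_new, f_new
                break
            t *= 0.5
        else:
            break  # no decreasing step found; stop early
    return U, f
```

For a similarity matrix `S` of shape `(n, n)`, calling `projected_gradient(S, k=2)` returns an `n`-by-2 membership matrix and the final objective value. The dominant cost per iteration is the dense matrix products in `objective` and `gradient`, which is the kind of operation that parallelizes well on a GPU; at the scale mentioned in the abstract, a dense `S` would not fit in memory, so a sparse or structured representation would be needed.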
Keywords
- Computational mathematical optimization
- First-order optimization
- Large-scale optimization
Status: accepted