Operations Research 2025
Abstract Submission

2257. Clustering scientific publications: lessons learned through experiments with a real citation network

Invited abstract in session TC-12: Insights through Unsupervised Learning, stream Artificial Intelligence, Machine Learning and Optimization.

Thursday, 11:45-13:15
Room: H10

Authors (first author is the speaker)

1. Thi Huong Vu
Digital Data and Information for Society, Science, and Culture, Zuse Institute Berlin
2. Thorsten Koch
Applied Algorithmic Intelligence Methods, ZIB / TU Berlin

Abstract

Clustering scientific publications helps uncover research structures within bibliographic databases. Graph-based methods such as spectral, Louvain, and Leiden clustering are commonly used because they can model citation networks directly. However, their effectiveness can diminish when they are applied to real-world data. This study evaluates these clustering algorithms on a Web of Science citation graph of about 700,000 articles and 4.6 million citations. The results show that while scalable methods such as Louvain and Leiden run efficiently, their default settings often yield poor partitions. Meaningful results require careful parameter tuning, especially for large networks with uneven structure, such as a dense core alongside loosely connected papers. These findings offer practical lessons on the challenges of large-scale data and on selecting and tuning methods according to the specific structure of each bibliometric clustering task.
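The tuning issue can be illustrated with the resolution parameter γ that modularity-based implementations of Louvain and Leiden typically expose (NetworkX's `louvain_communities`, for instance, accepts a `resolution` argument that defaults to 1). The sketch below is not the authors' pipeline and does not run Louvain or Leiden. It scores three hand-picked partitions of a hypothetical six-node graph (two dense groups of papers joined by a single citation) with modularity including resolution, Q = Σ_c [L_c/m − γ(d_c/2m)²], where m is the number of edges, L_c the number of edges inside community c, and d_c the total degree of c. Citation direction is ignored, which is a simplifying assumption.

```python
from collections import defaultdict


def modularity(edges, partition, gamma=1.0):
    """Modularity with resolution gamma for an unweighted, undirected graph.

    edges: list of (u, v) pairs without self-loops (citation direction ignored).
    partition: list of sets of nodes that together cover every node.
    Q = sum over communities c of [ L_c / m - gamma * (d_c / (2m))**2 ].
    """
    m = len(edges)
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    # Map each node to the index of its community.
    community_of = {node: idx for idx, comm in enumerate(partition) for node in comm}

    # Count edges whose endpoints fall in the same community (L_c).
    internal = defaultdict(int)
    for u, v in edges:
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += 1

    q = 0.0
    for idx, comm in enumerate(partition):
        d_c = sum(degree[n] for n in comm)  # total degree of community c
        q += internal[idx] / m - gamma * (d_c / (2 * m)) ** 2
    return q


# Hypothetical toy graph: two triangles of papers joined by one citation (2, 3).
EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]

# Hand-picked candidate partitions at three different scales.
CANDIDATES = {
    "one cluster": [{0, 1, 2, 3, 4, 5}],
    "two clusters": [{0, 1, 2}, {3, 4, 5}],
    "singletons": [{n} for n in range(6)],
}


def best_partition(gamma):
    """Return the name of the highest-scoring candidate and all scores."""
    scores = {name: modularity(EDGES, p, gamma) for name, p in CANDIDATES.items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    for gamma in (0.1, 1.0, 3.0):
        name, scores = best_partition(gamma)
        print(f"gamma={gamma}: best = {name}")
```

With these numbers the preferred partition moves from one cluster (γ = 0.1) to two clusters (γ = 1.0) to singletons (γ = 3.0), so the choice of γ alone decides which cluster scale the objective rewards.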


Status: accepted

