2257. Clustering scientific publications: lessons learned through experiments with a real citation network
Invited abstract in session TC-12: Insights through Unsupervised Learning, stream Artificial Intelligence, Machine Learning and Optimization.
Thursday, 11:45-13:15, Room: H10
Authors (first author is the speaker)
1. Thi Huong Vu (Digital Data and Information for Society, Science, and Culture, Zuse Institute Berlin)
2. Thorsten Koch (Applied Algorithmic Intelligence Methods, ZIB / TU Berlin)
Abstract
Clustering scientific publications helps uncover research structures within bibliographic databases. Graph-based methods such as spectral, Louvain, and Leiden clustering are commonly used because they can model citation networks directly. However, their effectiveness can diminish on real-world data. This study evaluates these clustering algorithms on a Web of Science citation graph of about 700,000 articles and 4.6 million citations. The results show that while scalable methods such as Louvain and Leiden run efficiently, their default settings often yield poor partitions. Meaningful results require careful parameter tuning, especially for large networks with uneven structure, including a dense core and loosely connected papers. These findings offer practical lessons on handling large-scale data and on selecting and tuning methods to fit the specific structure of bibliometric clustering tasks.
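The abstract does not say which parameters were tuned. A common knob in Louvain and Leiden implementations is the resolution parameter γ of the modularity objective (for example, the `resolution` argument of NetworkX's `louvain_communities`). The following is a minimal, self-contained sketch, assuming an unweighted, undirected graph. It shows how γ changes which partition of a small toy graph scores best. The helper names (`modularity`, `best_partition`) and the toy graph are hypothetical illustrations. This is not the authors' pipeline, and it scores fixed candidate partitions rather than running Louvain or Leiden.

```python
from collections import defaultdict


def modularity(edges, partition, resolution=1.0):
    """Resolution-weighted modularity of a partition (hypothetical helper).

    Assumes an undirected, unweighted graph without self-loops:
    Q = sum over communities c of [L_c / m - resolution * (d_c / (2m))**2],
    where m is the edge count, L_c the number of edges inside c,
    and d_c the total degree of the nodes in c.
    """
    m = len(edges)
    # Map every node to the index of its community.
    community_of = {}
    for idx, nodes in enumerate(partition):
        for node in nodes:
            community_of[node] = idx
    internal = defaultdict(int)    # L_c: edges with both endpoints in c
    degree_sum = defaultdict(int)  # d_c: sum of degrees of nodes in c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(
        internal[c] / m - resolution * (degree_sum[c] / (2 * m)) ** 2
        for c in range(len(partition))
    )


# Hypothetical toy graph: two triangles joined by a single "bridge" edge.
EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
CANDIDATES = {
    "one cluster": [{0, 1, 2, 3, 4, 5}],
    "two triangles": [{0, 1, 2}, {3, 4, 5}],
    "singletons": [{n} for n in range(6)],
}


def best_partition(resolution):
    """Return the name of the candidate partition with the highest Q."""
    return max(
        CANDIDATES,
        key=lambda name: modularity(EDGES, CANDIDATES[name], resolution),
    )


if __name__ == "__main__":
    for gamma in (0.2, 1.0, 5.0):
        print(gamma, best_partition(gamma))
```

On this toy graph, γ = 0.2 favors a single cluster, γ = 1.0 favors the two triangles, and γ = 5.0 favors singletons. A low resolution merges groups, and a high one fragments them. This is one reason a default γ can produce poor partitions on a network that combines a dense core with loosely connected papers.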
Keywords
- Machine Learning
- Graphs and Networks
- Big Data
Status: accepted