Treat abstract

2257. Clustering scientific publications: lessons learned through experiments with a real citation network

Invited abstract in session TC-12: Insights through Unsupervised Learning, stream Artificial Intelligence, Machine Learning and Optimization.

Thursday, 11:45-13:15
Room: H10

Authors (first author is the speaker)

1.	Thi Huong Vu
	Digital Data and Information for Society, Science, and Culture, Zuse Institute Berlin
2.	Thorsten Koch
	Applied Algorithmic Intelligence Methods, ZIB / TU Berlin

Abstract

Clustering scientific publications helps uncover research structures within bibliographic databases. Graph-based methods such as spectral, Louvain, and Leiden clustering are commonly used because they can model citation networks. However, their effectiveness can diminish when they are applied to real-world data. This study evaluates these clustering algorithms on a citation graph of about 700,000 articles and 4.6 million citations from the Web of Science. The results show that although scalable methods such as Louvain and Leiden run efficiently, their default settings often yield poor partitions. Meaningful outcomes require careful parameter tuning, especially for large networks with uneven structure, such as a dense core surrounded by loosely connected papers. These findings offer practical lessons on the challenges of large-scale data and on selecting and tuning methods to fit the specific structure of a bibliometric clustering task.
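
As an illustration of why default settings can matter, the sketch below evaluates the generalized modularity objective that Louvain and Leiden commonly optimize, on a toy graph rather than on the Web of Science data. The abstract does not say which parameters were tuned; the `resolution` parameter, the ring-of-cliques graph, and the helper names `ring_of_cliques` and `modularity` are assumptions chosen for this example only.

```python
from collections import defaultdict
from itertools import combinations


def ring_of_cliques(n_cliques, clique_size):
    """Toy graph (hypothetical): cliques joined in a ring by single edges."""
    edges = []
    for i in range(n_cliques):
        members = [(i, j) for j in range(clique_size)]
        edges.extend(combinations(members, 2))  # edges inside the clique
        # one bridge edge to the next clique in the ring
        edges.append(((i, 0), ((i + 1) % n_cliques, 1)))
    return edges


def modularity(edges, partition, resolution=1.0):
    """Generalized modularity of an undirected, unweighted graph.

    Q = sum over communities c of  L_c / m - resolution * (d_c / (2 m))**2
    with L_c the intra-community edge count, d_c the total degree of the
    community, and m the number of edges.
    """
    m = len(edges)
    intra = defaultdict(int)
    degree = defaultdict(int)
    for u, v in edges:
        degree[partition[u]] += 1
        degree[partition[v]] += 1
        if partition[u] == partition[v]:
            intra[partition[u]] += 1
    return sum(
        intra[c] / m - resolution * (degree[c] / (2 * m)) ** 2 for c in degree
    )


edges = ring_of_cliques(n_cliques=30, clique_size=3)
nodes = {node for edge in edges for node in edge}

# "fine": every clique is its own cluster; "coarse": adjacent cliques merged in pairs.
fine = {node: node[0] for node in nodes}
coarse = {node: node[0] // 2 for node in nodes}

for resolution in (1.0, 5.0):
    q_fine = modularity(edges, fine, resolution)
    q_coarse = modularity(edges, coarse, resolution)
    preferred = "fine" if q_fine > q_coarse else "coarse"
    print(f"resolution={resolution}: fine={q_fine:.4f}, coarse={q_coarse:.4f}, preferred={preferred}")
```

On this toy graph the default resolution of 1.0 scores the merged partition higher than the one that keeps each clique separate, while a larger resolution reverses the preference. This is only one way a default setting can hide small, well-defined groups; it is not a reproduction of the study's experiments.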

Keywords

Machine Learning
Graphs and Networks
Big Data

Status: accepted

