ECCO 2024
Abstract Submission

33. A combinatorial optimization approach to query-guided document set expansion

Invited abstract in session TC-1: Algorithms, stream Algorithms.

Thursday, 11:30 - 13:00
Room: L226

Authors (first author is the speaker)

1. Arne Deloose
Data Analysis and Mathematical Modelling, UGent
2. Jan Verwaeren
Data Analysis and Mathematical Modelling, Ghent University
3. Bernard De Baets
Ghent University

Abstract

Document set expansion is a fundamental problem in Information Retrieval: an initial document set is augmented with additional relevant documents retrieved from a larger corpus. Query-based techniques, such as query expansion and refinement, are effective at modifying an initial query to retrieve supplementary documents, but they depend on the availability of an initial query and relevance feedback. Embedding methods instead represent documents in a feature space, so that a set can be expanded or contracted based on similarity measures. However, existing approaches often fail to combine the advantages of query reformulation and embedding-based models.

To address this shortcoming, we propose a novel method that integrates query reformulation and embedding-based techniques into a unified framework. The method modifies an initial document set, allowing both expansion and contraction, so that the resulting set has three desirable properties: high embedding similarity among its documents, fidelity to the initial document set, and a simple description in the form of a low-complexity query. We formalize the task as a combinatorial optimization problem that can be solved using Mixed Integer Linear Programming (MILP). Because solving the resulting MILP is computationally expensive, we also implement and validate a computationally efficient heuristic search algorithm.
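
The abstract does not give the objective function or the search procedure, so the following is only an illustrative sketch of the kind of trade-off described above, not the authors' formulation. It assumes a disjunctive keyword query, Jaccard overlap as the fidelity measure, mean pairwise cosine similarity as the embedding criterion, and the number of query terms as the complexity penalty. All names, weights (`alpha`, `beta`, `gamma`), and the toy corpus are hypothetical, and the greedy loop stands in for whatever heuristic the authors actually use.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieved(query, docs):
    """Documents matched by a disjunctive query (any query term occurs)."""
    return {d for d, (terms, _) in docs.items() if terms & query}


def score(query, docs, initial, alpha=1.0, beta=1.0, gamma=0.1):
    """Assumed objective: fidelity + cohesion - complexity penalty."""
    result = retrieved(query, docs)
    if not result:
        return float("-inf")
    # Fidelity to the initial set, measured here as Jaccard overlap.
    fidelity = len(result & initial) / len(result | initial)
    # Intra-set similarity: mean pairwise cosine (1.0 for a single document).
    pairs = list(combinations(sorted(result), 2))
    cohesion = (
        sum(cosine(docs[a][1], docs[b][1]) for a, b in pairs) / len(pairs)
        if pairs
        else 1.0
    )
    # Query complexity: one penalty unit per term.
    return alpha * fidelity + beta * cohesion - gamma * len(query)


def greedy_query(docs, initial, **weights):
    """Add the best-scoring term until no term improves the objective."""
    vocabulary = set().union(*(terms for terms, _ in docs.values()))
    query, best = frozenset(), float("-inf")
    while True:
        candidates = [
            (score(query | {t}, docs, initial, **weights), t)
            for t in sorted(vocabulary - query)
        ]
        if not candidates:
            break
        cand_score, term = max(candidates)
        if cand_score <= best:
            break
        query, best = query | {term}, cand_score
    return query, retrieved(query, docs), best


# Hypothetical toy corpus: document id -> (term set, 2-d embedding).
docs = {
    "d1": ({"milp", "solver"}, (1.0, 0.0)),
    "d2": ({"milp", "heuristic"}, (0.9, 0.1)),
    "d3": ({"milp", "branching"}, (0.8, 0.2)),
    "d4": ({"cooking", "solver"}, (0.0, 1.0)),
}
initial = {"d1", "d2"}

query, expanded, value = greedy_query(docs, initial)
print(sorted(query), sorted(expanded), round(value, 3))
```

On this toy corpus the search settles on the single-term query `milp`, which expands the initial set {d1, d2} with d3 and leaves out the dissimilar d4. An exact approach would instead encode term selection and document membership as binary variables in a MILP, which is the route the abstract describes as computationally expensive.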

Status: accepted

