A Comparative Study of Compressed, Learned, and Traditional Indexing Methods for Integer Data
Bellomo, Lorenzo; Ferragina, Paolo
2025
Abstract
The rapid evolution of learned data structures has revolutionized database indexing, particularly for sorted integer datasets. While learned indexes excel in static scenarios due to their low memory footprint, reduced storage requirements, and fast lookup times, benchmarks like SOSD and TLI have largely overlooked compressed indexes and SIMD-based implementations of traditional indexes. This paper addresses this gap by introducing a comprehensive benchmarking framework that (i) evaluates traditional, learned, and compressed indexes across 12 datasets (real and synthetic) of varying types and sizes; (ii) integrates state-of-the-art SIMD-enhanced B-Tree variants; and (iii) measures critical performance metrics such as memory usage, construction time, and lookup efficiency. Our findings reveal that while learned indexes minimize memory usage, a feature that is useful when internal memory is tightly constrained, SIMD-enhanced B-Trees consistently achieve superior lookup times with comparable extra space. On the other hand, compressed indexes like LA-vector and EliasFano provide very effective compression of the indexed data at the cost of 2x-3x slower access. Another contribution of this paper is a publicly available benchmarking framework (composed of code and datasets) that makes our experiments reproducible and extensible to other indexes and datasets.
| File | Type | License | Size | Format |
|---|---|---|---|---|
| LIPIcs.SEA.2025.5.pdf (open access) | Published version | Creative Commons | 1.04 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.



