Beam-search SIEVE for low-memory speech recognition

Ciaperoni, Martino; Katsamanis, Athanasios; Gionis, Aristides; Karras, Panagiotis

doi:10.21437/Interspeech.2024-2457

A capacity to recognize speech offline eliminates privacy concerns and the need for an internet connection. Despite efforts to reduce the memory demands of speech recognition systems, these demands remain formidable and thus popular tools such as Kaldi run best via cloud computing. The key bottleneck arises from the fact that a bedrock of such tools, the Viterbi algorithm, requires memory that grows linearly with utterance length even when contained via beam search. A recent recasting of the Viterbi algorithm, SIEVE, eliminates the path length factor from space complexity, but with a significant practical runtime overhead. In this paper, we develop a variant of SIEVE that lessens this runtime overhead via beam search, retains the decoding quality of standard beam search, and waives its linearly growing memory bottleneck. This space-complexity reduction is orthogonal to decoding quality and complementary to memory savings in model representation and training.
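To illustrate the bottleneck the abstract refers to, the following is a minimal sketch of standard Viterbi decoding with beam pruning on a toy hidden Markov model. It is not SIEVE and not the authors' variant. The function name `beam_viterbi`, its parameters, and the toy probabilities are assumptions introduced here for illustration only. The point is that one set of backpointers is stored per frame, so the stored total grows with the number of frames even though the beam bounds the work per frame.

```python
import math


def beam_viterbi(init, trans, emit, obs, beam_width):
    """Standard Viterbi with beam pruning on a toy HMM (hypothetical helper).

    init[s], trans[s][t] and emit[s][o] are probabilities; obs is a list of
    observation indices. Returns (best_path, number_of_stored_backpointers).
    """
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    n_states = len(init)
    # Score every state on the first frame, then keep only the best beam_width.
    scores = {s: log(init[s]) + log(emit[s][obs[0]]) for s in range(n_states)}
    scores = dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:beam_width])

    backpointers = []  # one dict per frame: this list grows with len(obs)
    for o in obs[1:]:
        new_scores, pointers = {}, {}
        for prev, prev_score in scores.items():
            for nxt in range(n_states):
                cand = prev_score + log(trans[prev][nxt]) + log(emit[nxt][o])
                if cand > new_scores.get(nxt, float("-inf")):
                    new_scores[nxt] = cand
                    pointers[nxt] = prev
        # Beam pruning: keep the beam_width best hypotheses for this frame.
        kept = sorted(new_scores.items(), key=lambda kv: kv[1], reverse=True)[:beam_width]
        scores = dict(kept)
        backpointers.append({s: pointers[s] for s in scores})

    # Backtrace from the best final state through the stored backpointers.
    state = max(scores, key=scores.get)
    path = [state]
    for frame_pointers in reversed(backpointers):
        state = frame_pointers[state]
        path.append(state)
    path.reverse()
    return path, sum(len(p) for p in backpointers)


# Toy two-state model with made-up probabilities (illustration only).
init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]

for n_frames in (3, 30, 300):
    obs = [0, 1, 2] * (n_frames // 3)
    path, stored = beam_viterbi(init, trans, emit, obs, beam_width=2)
    print(n_frames, stored)  # stored backpointers grow with the frame count
```

In this sketch the beam caps the number of hypotheses per frame, but the backtrace still needs every frame's backpointers, which is the linear-in-length memory term the abstract describes.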



Year of publication: 2024
Scientific Disciplinary Sector (valid from 09/05/2024): Settore INFO-01/A - Informatica (Computer Science)
Conference title: 25th Interspeech Conference
Conference location: Kos, Greece
Conference dates: 1-5 September 2024
Volume title: 25th Annual Conference of the International Speech Communication Association (INTERSPEECH 2024) : Kos, Greece, 1-5 September 2024
Publisher: Curran Associates, Inc.
ISBN: 979-8-3313-0506-2
DOI: https://dx.doi.org/10.21437/Interspeech.2024-2457
Keywords: Memory efficient algorithms; speech recognition
Research funding information: Amazon Science
Appears in types: 4.1 Conference proceedings contribution