Working Memory Connections for LSTM

Landi, Federico; Baraldi, Lorenzo; Cornia, Marcella; Cucchiara, Rita
2021

Abstract

Recurrent Neural Networks with Long Short-Term Memory (LSTM) use gating mechanisms to mitigate exploding and vanishing gradients when learning long-term dependencies. For this reason, LSTMs and other gated RNNs are widely adopted and have become the de facto standard for many sequence modeling tasks. Although the memory cell inside the LSTM contains essential information, it is not allowed to influence the gating mechanism directly. In this work, we improve the expressiveness of the gates by including information coming from the internal cell state. The proposed modification, named Working Memory Connection, consists of adding a learnable nonlinear projection of the cell content into the network gates. This modification fits into the classical LSTM gates without any assumption on the underlying task, and it is particularly effective when dealing with longer sequences. Previous research efforts in this direction, dating back to the early 2000s, could not achieve a consistent improvement over the vanilla LSTM. As part of this paper, we identify a key issue affecting previous cell-to-gate connections that heavily limits their effectiveness, preventing a successful integration of the knowledge coming from the internal cell state. We show through extensive experimental evaluation that Working Memory Connections consistently improve the performance of LSTMs on a variety of tasks. Numerical results suggest that the cell state contains useful information that is worth including in the gate structure.
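To make the mechanism described in the abstract concrete, below is a minimal, hypothetical PyTorch sketch of an LSTM cell augmented with Working Memory Connections. It assumes the "learnable nonlinear projection of the cell content" takes the form tanh(Wc) added to each gate pre-activation; the projection layers (c2i, c2f, c2o), the choice of tanh, and feeding the updated cell state to the output gate are illustrative assumptions, not necessarily the paper's exact parameterization.

    import torch
    import torch.nn as nn

    class WMCLSTMCell(nn.Module):
        """Sketch of an LSTM cell with Working Memory Connections.

        Assumption: the learnable nonlinear projection of the cell
        content is modeled as tanh(W @ c) added to each gate
        pre-activation; the paper's exact formulation may differ.
        """

        def __init__(self, input_size, hidden_size):
            super().__init__()
            # Standard input/recurrent projections for the four
            # pre-activations (input, forget, candidate, output).
            self.x2h = nn.Linear(input_size, 4 * hidden_size)
            self.h2h = nn.Linear(hidden_size, 4 * hidden_size, bias=False)
            # Working Memory Connections: learnable projections of the
            # cell state feeding the input, forget, and output gates.
            self.c2i = nn.Linear(hidden_size, hidden_size, bias=False)
            self.c2f = nn.Linear(hidden_size, hidden_size, bias=False)
            self.c2o = nn.Linear(hidden_size, hidden_size, bias=False)

        def forward(self, x, state):
            h, c = state
            gates = self.x2h(x) + self.h2h(h)
            i, f, g, o = gates.chunk(4, dim=-1)
            # Nonlinear (tanh) projection of the cell content enters
            # the input and forget gates before the sigmoid.
            i = torch.sigmoid(i + torch.tanh(self.c2i(c)))
            f = torch.sigmoid(f + torch.tanh(self.c2f(c)))
            c_new = f * c + i * torch.tanh(g)
            # Assumption: the output gate sees the updated cell state.
            o = torch.sigmoid(o + torch.tanh(self.c2o(c_new)))
            h_new = o * torch.tanh(c_new)
            return h_new, c_new

The cell steps over a sequence exactly like a standard nn.LSTMCell (e.g., h, c = cell(x_t, (h, c)) at each timestep). Note the design contrast with classic peephole connections, which add the raw (or diagonally scaled) cell state to the gates: squashing a learned projection through tanh keeps the gate pre-activations bounded regardless of how large the cell state grows, which is one plausible reading of the "key issue" with earlier cell-to-gate connections mentioned in the abstract.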
Year: 2021
Volume: 144
Pages: 334-341
Working Memory Connections for LSTM / Landi, Federico; Baraldi, Lorenzo; Cornia, Marcella; Cucchiara, Rita. - In: NEURAL NETWORKS. - ISSN 0893-6080. - 144:(2021), pp. 334-341. [10.1016/j.neunet.2021.08.030]
Files in this record:

2021_NN_LSTM.pdf
  Access: Open access since 05/09/2023
  Type: AAM - Author's revised version, accepted for publication
  Size: 433.86 kB
  Format: Adobe PDF

1-s2.0-S0893608021003439-main (1).pdf
  Access: Restricted
  Type: VOR - Version published by the publisher
  Size: 1.01 MB
  Format: Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/11380/1252276
Citations
  • PubMed Central: 12
  • Scopus: 214
  • Web of Science (ISI): 160