Embeddings hidden layers learning for neural network compression / Hu, Jia Cheng; Cavicchioli, R.; Capotondi, A. - In: NEURAL NETWORKS. - ISSN 0893-6080. - 191 (2025), art. no. 107794. [DOI: 10.1016/j.neunet.2025.107794]
Embeddings hidden layers learning for neural network compression
Hu J. C.; Cavicchioli R.; Capotondi A.
2025
Abstract
Sequence modeling neural networks are being deployed in a growing number of applications, a trend partially driven by the advent of Large Language Models (LLMs): neural networks often characterized by billions of parameters. Such a size poses an obstacle to deployment on constrained devices, which motivates the development of compression methods. In this work, we introduce a new parameter-sharing method that leverages the embedding matrix to learn the model's hidden layers. To demonstrate its effectiveness, we present a new architecture family called ShareBERT, which preserves up to 95.5% of BERT's accuracy while using only 5M parameters (21.9x fewer) and without the help of Knowledge Distillation. Evaluation on multiple linguistic benchmarks shows that our compression method does not harm the model's learning capabilities; on the contrary, it can be beneficial for representation learning. The method is robust and flexible across neural architecture types (recurrent, convolutional, and Transformer networks), layers (e.g., encoder, decoder, autoregressive, and non-autoregressive modules), and tasks (e.g., translation, captioning, and language modeling). Our proposal pushes model compression to a new level by enabling the design of near-zero architectures; moreover, it is orthogonal to most existing approaches, which can be applied on top of it to further ease deployment on low-power and embedded devices. Code is available at https://github.com/jchenghu/sharebert.
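To make the core idea of embedding-based parameter sharing more concrete, below is a minimal PyTorch sketch of a projection layer whose weight matrix is a view over the shared token-embedding table rather than a separately allocated tensor. The class name `EmbeddingTiedLinear`, the slicing scheme, and the chosen shapes are illustrative assumptions only; the actual ShareBERT formulation is described in the paper and may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EmbeddingTiedLinear(nn.Module):
    """Hypothetical sketch: a linear layer whose weights are a slice of a
    shared embedding matrix, so it adds almost no new parameters."""

    def __init__(self, embedding: nn.Embedding, out_features: int):
        super().__init__()
        vocab_size, d_model = embedding.weight.shape
        assert out_features <= vocab_size, "not enough embedding rows to share"
        self.embedding = embedding                # shared reference, not a copy
        self.out_features = out_features
        self.bias = nn.Parameter(torch.zeros(out_features))  # only new weights

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Reuse the first `out_features` rows of the embedding table as the
        # projection weights; gradients flow back into the embedding matrix.
        weight = self.embedding.weight[: self.out_features]
        return F.linear(x, weight, self.bias)

# Usage: the token embedding and the hidden projection draw from the same
# parameter tensor, so the extra layer is nearly parameter-free.
emb = nn.Embedding(num_embeddings=30522, embedding_dim=768)
proj = EmbeddingTiedLinear(emb, out_features=768)
tokens = torch.randint(0, 30522, (2, 16))
hidden = proj(emb(tokens))                        # shape: (2, 16, 768)
```

Because the projection reuses existing embedding parameters, stacking such layers grows the model's parameter count only by the small bias terms, which is the kind of near-zero overhead the abstract refers to.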
| File | Type | Size | Format | Access |
|---|---|---|---|---|
| NN_Sharebert.pdf | VOR - Version published by the publisher | 3.06 MB | Adobe PDF | Open access |