
Embeddings hidden layers learning for neural network compression / Hu, Jia Cheng; Cavicchioli, R.; Capotondi, A.. - In: NEURAL NETWORKS. - ISSN 0893-6080. - 191:(2025), pp. 107794-107794. [10.1016/j.neunet.2025.107794]

Embeddings hidden layers learning for neural network compression

Hu J. C.; Cavicchioli R.; Capotondi A.
2025

Abstract

Sequence modeling neural networks are being deployed in a growing number of applications, a trend driven in part by the advent of Large Language Models (LLMs), neural networks often characterized by billions of parameters. Such a size hinders their deployment on constrained devices, which motivates the development of compression methods. In this work, we introduce a new parameter-sharing method that leverages the embedding matrix to learn the model's hidden layers. To demonstrate its effectiveness, we present a new architecture family called ShareBERT, which preserves up to 95.5% of BERT's accuracy using only 5M parameters (21.9x fewer) without the help of Knowledge Distillation. Evaluation on multiple linguistic benchmarks shows that our compression method does not harm the model's learning capabilities; on the contrary, it can be beneficial for representation learning. The method is robust and flexible across different neural architecture types (e.g., recurrent, convolutional, and Transformer), layers (e.g., encoder, decoder, autoregressive, and non-autoregressive modules), and tasks (e.g., translation, captioning, and language modeling). Our proposal pushes model compression to a new level by enabling the design of near-zero architectures, and it is orthogonal to most existing approaches, which can be applied on top of it to further ease deployment on low-powered and embedded devices. Code is available at https://github.com/jchenghu/sharebert.
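For illustration, below is a minimal sketch (assuming PyTorch; the class, parameter names, and construction are hypothetical, not the paper's actual ShareBERT implementation, which is available in the linked repository) of the general idea the abstract describes: reusing the shared embedding matrix to parameterize a hidden-layer projection so that the layer itself adds only a handful of new parameters.

import torch
import torch.nn as nn


class EmbeddingSharedLinear(nn.Module):
    """Hypothetical d x d projection whose weights are borrowed rows of a shared embedding matrix."""

    def __init__(self, embedding: nn.Embedding, row_offset: int = 0):
        super().__init__()
        self.embedding = embedding                      # shared module, not a copy
        self.d = embedding.embedding_dim
        self.row_offset = row_offset
        # Only d extra scalars are learned here, instead of a full d x d weight matrix.
        self.scale = nn.Parameter(torch.ones(self.d))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Borrow d rows of the embedding matrix as the (d, d) projection weight.
        rows = self.embedding.weight[self.row_offset:self.row_offset + self.d]
        weight = rows * self.scale.unsqueeze(1)
        return x @ weight.t()


# Usage: the embedding is the only large parameter block; each shared layer
# contributes just d additional parameters, which is where the savings come from.
emb = nn.Embedding(num_embeddings=30522, embedding_dim=128)
layer = EmbeddingSharedLinear(emb, row_offset=0)
tokens = emb(torch.randint(0, 30522, (2, 16)))          # (batch, seq, d)
out = layer(tokens)                                      # (batch, seq, d)
print(out.shape)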
Files in this record:
NN_Sharebert.pdf — Open access — Type: VOR (publisher's published version) — Size: 3.06 MB — Format: Adobe PDF

Creative Commons License
Metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while publication files are released under the Attribution 4.0 International (CC BY 4.0) license, unless otherwise indicated.
In case of copyright infringement, please contact Supporto Iris.

Use this identifier to cite or link to this document: https://hdl.handle.net/11380/1389408
Citations
  • PMC: not available
  • Scopus: 0
  • Web of Science (ISI): 0