Embeddings hidden layers learning for neural network compression / Hu, Jia Cheng; Cavicchioli, R.; Capotondi, A. - In: NEURAL NETWORKS. - ISSN 0893-6080. - 191 (2025), art. no. 107794. [DOI: 10.1016/j.neunet.2025.107794]
Embeddings hidden layers learning for neural network compression
Hu J. C.; Cavicchioli R.; Capotondi A.
2025
Abstract
Sequence modeling neural networks are being deployed in a growing number of applications, a trend partially driven by the advent of Large Language Models (LLMs): neural networks often characterized by billions of parameters. Such a size poses an obstacle to deployment on constrained devices, which motivates the development of compression methods. In this work, we introduce a new parameter-sharing method that leverages the embedding matrix to learn the model's hidden layers. To demonstrate its effectiveness, we present a new architecture family called ShareBERT, which preserves up to 95.5% of BERT's accuracy while using only 5M parameters (21.9x fewer) and without the help of Knowledge Distillation. Evaluation on multiple linguistic benchmarks shows that our compression method does not harm the model's learning capabilities; on the contrary, it can be beneficial for representation learning. The method is robust and flexible across neural architecture types (recurrent, convolutional, and Transformer networks), layers (e.g., encoder, decoder, autoregressive, and non-autoregressive modules), and tasks (e.g., translation, captioning, and language modeling). Our proposal pushes model compression to a new level by enabling the design of near-zero architectures; moreover, it is orthogonal to most existing approaches, which can be applied on top of it to further ease deployment on low-power and embedded devices. Code is available at https://github.com/jchenghu/sharebert.
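To make the core idea of embedding-based parameter sharing more concrete, below is a minimal PyTorch sketch of a projection layer whose weight matrix is a view over the shared token-embedding table rather than a separately allocated tensor. The class name `EmbeddingTiedLinear`, the slicing scheme, and the chosen shapes are illustrative assumptions only; the actual ShareBERT formulation is described in the paper and may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EmbeddingTiedLinear(nn.Module):
    """Hypothetical sketch: a linear layer whose weights are a slice of a
    shared embedding matrix, so it adds almost no new parameters."""

    def __init__(self, embedding: nn.Embedding, out_features: int):
        super().__init__()
        vocab_size, d_model = embedding.weight.shape
        assert out_features <= vocab_size, "not enough embedding rows to share"
        self.embedding = embedding                # shared reference, not a copy
        self.out_features = out_features
        self.bias = nn.Parameter(torch.zeros(out_features))  # only new weights

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Reuse the first `out_features` rows of the embedding table as the
        # projection weights; gradients flow back into the embedding matrix.
        weight = self.embedding.weight[: self.out_features]
        return F.linear(x, weight, self.bias)

# Usage: the token embedding and the hidden projection draw from the same
# parameter tensor, so the extra layer is nearly parameter-free.
emb = nn.Embedding(num_embeddings=30522, embedding_dim=768)
proj = EmbeddingTiedLinear(emb, out_features=768)
tokens = torch.randint(0, 30522, (2, 16))
hidden = proj(emb(tokens))                        # shape: (2, 16, 768)
```

Because the projection reuses existing embedding parameters, stacking such layers grows the model's parameter count only by the small bias terms, which is the kind of near-zero overhead the abstract refers to.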
| File | Type | Size | Format | Access |
|---|---|---|---|---|
| NN_Sharebert.pdf | VOR - Version published by the publisher | 3.06 MB | Adobe PDF | Open access |