Efficient Compression of Large Language Models with Distillation and Fine-Tuning

Anda Kai
Lin Zhu
Jiangchuan Gong

Abstract

With the widespread adoption of large language models (LLMs), their massive parameter counts and high computational cost pose significant challenges for practical deployment. To address this, this study proposes a method that integrates knowledge distillation with parameter-efficient fine-tuning (PEFT) to reduce computational overhead while preserving strong performance. In the knowledge distillation phase, experiments with different temperature parameters analyze their impact on how the student model learns, and the role of distilling features from different layers in model compression is also explored. The results indicate that a moderate temperature improves the distillation effect, and that selecting an appropriate feature layer for distillation improves the student model's generalization. In the fine-tuning phase, LoRA (Low-Rank Adaptation) is compared with full-parameter fine-tuning: LoRA offers clear advantages in inference speed and computational efficiency, whereas full-parameter fine-tuning achieves higher accuracy and stronger language understanding. Taken together, the experiments confirm that a well-designed combination of knowledge distillation and fine-tuning can compress a model effectively while maintaining its performance. Future work can integrate additional compression techniques, such as pruning and quantization, to further improve adaptability and computational efficiency. The approach offers a promising path to deploying large-scale language models in low-resource environments.
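To make the temperature parameter concrete, the sketch below shows a standard temperature-scaled distillation loss of the kind the abstract describes. It is a minimal illustration rather than the authors' training code; the default values of `T` and `alpha` are assumptions, since the abstract reports only that moderate temperatures work best.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Temperature-scaled knowledge distillation loss.

    Blends a soft KL-divergence term (teacher vs. student distributions,
    both softened at temperature T) with the usual hard-label cross-entropy.
    T=2.0 and alpha=0.5 are illustrative defaults, not values from the paper.
    """
    soft_teacher = F.softmax(teacher_logits / T, dim=-1)
    log_soft_student = F.log_softmax(student_logits / T, dim=-1)
    # Multiplying by T**2 keeps the soft-target gradient magnitude
    # comparable across different temperatures.
    kd = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * (T ** 2)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce
```

A higher T flattens the teacher's output distribution, exposing more of its "dark knowledge" about relative class similarities; a very high T washes out the signal, which is consistent with the finding that moderate temperatures work best.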
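Similarly, a minimal LoRA sketch clarifies why it is cheaper than full-parameter fine-tuning: the frozen base weight matrix is left untouched and only two small low-rank factors A and B are trained. The rank `r` and scaling values below are hypothetical choices for illustration, not reported by the paper.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen nn.Linear with a trainable low-rank update:
    y = W x + (B A) x * (lora_alpha / r). Only A and B receive gradients,
    so trainable parameters drop from d_out*d_in to r*(d_in + d_out).
    """
    def __init__(self, base: nn.Linear, r: int = 8, lora_alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # base weights stay frozen
        # A starts small and B starts at zero, so training begins
        # exactly from the base model's behavior.
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = lora_alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scaling
```

Because the low-rank update can be merged back into the base weight after training (W ← W + scaling · B A), the adapted layer incurs no extra cost at inference time, which fits the abstract's observation that LoRA is advantageous in inference speed.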

Article Details

How to Cite
Kai, A., Zhu, L., & Gong, J. (2023). Efficient Compression of Large Language Models with Distillation and Fine-Tuning. Journal of Computer Science and Software Applications, 3(4), 30–38. https://doi.org/10.5281/zenodo.15165118