Efficient Compression of Large Language Models with Distillation and Fine-Tuning
Abstract
With the widespread adoption of large language models (LLMs), their extensive parameter scale and high computational cost pose significant challenges for practical deployment. To address this issue, this study proposes a method that integrates Knowledge Distillation and Parameter-Efficient Fine-Tuning (PEFT) to reduce computational overhead while preserving high performance. In the knowledge distillation phase, experiments with different temperature parameters analyze their impact on student model learning, and the role of distilling from different feature layers in model compression is also explored. Experimental results indicate that moderate temperature values enhance the distillation effect, and that selecting an appropriate feature layer for distillation improves the generalization ability of the student model. In the fine-tuning phase, LoRA (Low-Rank Adaptation) is compared with full-parameter fine-tuning. Results show that LoRA offers significant advantages in inference speed and computational efficiency, whereas full-parameter fine-tuning achieves superior accuracy and language understanding. Taken together, the experiments confirm that a well-designed combination of knowledge distillation and fine-tuning can achieve effective model compression while maintaining performance. Future research could integrate additional compression techniques, such as pruning and quantization, to further enhance model adaptability and computational efficiency. This approach provides a promising solution for deploying large-scale language models in low-resource environments.
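
The abstract refers to two concrete mechanisms: temperature-scaled soft targets for knowledge distillation and LoRA's low-rank weight updates. The sketch below is a minimal PyTorch illustration of both ideas, not the paper's actual implementation; the temperature, alpha, rank, and scaling values are assumed placeholders, and the LoRALinear wrapper is a simplified stand-in for a full PEFT integration.

```python
import torch
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
    """Temperature-scaled KD loss blended with hard-label cross-entropy.

    The temperature**2 factor keeps the soft-target gradients on a comparable
    scale across temperatures. The values of `temperature` and `alpha` are
    illustrative, not the settings reported in the paper.
    """
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_loss = F.kl_div(log_student, soft_targets, reduction="batchmean") * temperature**2
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1 - alpha) * hard_loss


class LoRALinear(torch.nn.Module):
    """Minimal LoRA wrapper: freezes the base weight and learns a low-rank update B·A."""

    def __init__(self, base: torch.nn.Linear, r: int = 8, lora_alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # only the low-rank factors below are trained
        self.lora_a = torch.nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = torch.nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = lora_alpha / r

    def forward(self, x):
        # Base projection plus scaled low-rank correction.
        return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scaling
```

In this setup only the A and B factors receive gradients, which is why LoRA reduces the number of trainable parameters and the optimizer memory footprint relative to full-parameter fine-tuning, at some cost in attainable accuracy, consistent with the trade-off described in the abstract.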
Article Details

This work is licensed under a Creative Commons Attribution 4.0 International License.
Mind forge Academia also operates under the Creative Commons Licence CC-BY 4.0. This allows you to copy and redistribute the material in any medium or format for any purpose, even commercially, provided that you give appropriate attribution.