Improving Neural Machine Translation with Dynamic Data Expansion and Pseudo-Parallel Sentence Generation

Zhongjie Gong

Abstract

This paper proposes a data augmentation method based on dynamic data expansion to address data sparsity in neural machine translation (NMT), where insufficient training corpora limit performance and leave a gap between NMT and human translation. The method injects noise into target-side sentences and pairs each noised sentence with its source-side sentence to form new pseudo-parallel sentence pairs, which are then used to train the translation model. To diversify the final translations, pseudo-sentence pairs are also constructed with a sampling decoding strategy during the decoding stage. Experiments compare several baseline models and decoding strategies. The results indicate that the proposed data augmentation method mitigates the limited generalization ability of NMT models, improving sentence representation and overall NMT performance.
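The two ideas in the abstract, target-side noising and sampling-based decoding, can be illustrated with a minimal Python sketch. The abstract does not specify the noise type or the sampling scheme, so everything here is an assumption made for illustration: the noise is token dropout plus light local reordering, the sampling is temperature sampling over a vector of scores, and the names `add_target_noise`, `sample_token`, `expand_pairs`, `drop_prob`, `swap_window`, and `temperature` are hypothetical rather than taken from the article.

```python
import math
import random


def add_target_noise(tokens, rng, drop_prob=0.1, swap_window=3):
    """Return a noisy copy of a target sentence.

    Assumed noise scheme (not from the article): random token dropout
    followed by a light local reordering.
    """
    if not tokens:
        return []
    # Drop each token with probability drop_prob; never return an empty sentence.
    kept = [t for t in tokens if rng.random() >= drop_prob]
    if not kept:
        kept = [rng.choice(tokens)]
    # Local shuffle: perturb each position by a random offset below swap_window,
    # then re-sort, so tokens only move a short distance.
    keys = [i + rng.uniform(0, swap_window) for i in range(len(kept))]
    order = sorted(range(len(kept)), key=lambda i: keys[i])
    return [kept[i] for i in order]


def expand_pairs(pairs, rng, copies=1):
    """Build pseudo-parallel pairs: unchanged source plus noised target."""
    expanded = []
    for src, tgt in pairs:
        for _ in range(copies):
            expanded.append((src, add_target_noise(tgt, rng)))
    return expanded


def sample_token(logits, rng, temperature=1.0):
    """Sample one index from softmax(logits / temperature).

    Stands in for a single step of sampling-based decoding; a real decoder
    would repeat this step conditioned on the tokens generated so far.
    """
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    corpus = [(["ich", "mag", "tee"], ["i", "like", "tea"])]
    for src, noisy_tgt in expand_pairs(corpus, rng, copies=3):
        print(src, "->", noisy_tgt)
    print("sampled index:", sample_token([0.0, 1.0, 2.0], rng))
```

One way to read "dynamic" under these assumptions is that `expand_pairs` is called again at every epoch, so the model sees freshly noised targets each time instead of a fixed augmented corpus. Lower `temperature` values push sampling toward the highest-scoring token, while higher values yield more varied outputs.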

Article Details

How to Cite
Gong, Z. (2023). Improving Neural Machine Translation with Dynamic Data Expansion and Pseudo-Parallel Sentence Generation. Journal of Computer Science and Software Applications, 3(4), 18–22. Retrieved from https://mfacademia.org/index.php/jcssa/article/view/142
Section: Articles