Improving Neural Machine Translation with Dynamic Data Expansion and Pseudo-Parallel Sentence Generation
Abstract
This work proposes a data augmentation method based on dynamic data expansion. It addresses data sparsity in machine translation, in particular the shortage of parallel training data, with the aim of narrowing the performance gap between neural machine translation (NMT) and human translation. The method introduces noise into target-side sentences and pairs each noisy sentence with its source-side sentence to form new pseudo-parallel sentence pairs, which are then used to train the translation model. To diversify the final translations, pseudo-sentence pairs are also constructed with a sampling decoding strategy during the decoding stage. Experiments compare the method against several baseline approaches and decoding strategies. The results demonstrate that the proposed data augmentation method effectively mitigates the limited generalization ability of NMT models, improving sentence representations and overall translation performance.
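The abstract does not say which kinds of noise are injected or which sampling strategy is used. The Python sketch below therefore assumes common choices: word dropout, word blanking and local shuffling for the target-side noise, and top-k sampling with temperature for one decoding step. All names and parameters (add_target_noise, make_pseudo_pairs, sample_next_token, drop_prob, blank_prob, shuffle_window, k, temperature) are hypothetical illustrations rather than the authors' implementation. The example shows only how a clean source sentence can be paired with several noisy target variants, and how sampling (unlike greedy or beam search) yields varied outputs.

```python
# Hypothetical sketch: the noise types and sampling scheme are assumptions,
# not details taken from the paper.
import math
import random
from typing import Dict, List, Optional, Sequence, Tuple

Tokens = List[str]


def add_target_noise(
    tokens: Sequence[str],
    drop_prob: float = 0.1,
    blank_prob: float = 0.1,
    shuffle_window: float = 3.0,
    blank_token: str = "<blank>",
    rng: Optional[random.Random] = None,
) -> Tokens:
    """Return a noisy copy of a target-side sentence (assumed noise scheme)."""
    rng = rng or random.Random()
    noisy: Tokens = []
    for tok in tokens:
        r = rng.random()
        if r < drop_prob:
            continue  # word dropout: remove the token
        if r < drop_prob + blank_prob:
            noisy.append(blank_token)  # word blanking: replace with a placeholder
        else:
            noisy.append(tok)  # keep the token unchanged
    # Local shuffle: add uniform jitter to each index and re-sort, so every
    # token stays within roughly `shuffle_window` positions of its start.
    keys = [i + rng.uniform(0, shuffle_window) for i in range(len(noisy))]
    order = sorted(range(len(noisy)), key=lambda i: keys[i])
    return [noisy[i] for i in order]


def make_pseudo_pairs(
    corpus: Sequence[Tuple[Tokens, Tokens]], copies: int = 2, seed: int = 0
) -> List[Tuple[Tokens, Tokens]]:
    """Pair each unchanged source sentence with `copies` noisy targets."""
    rng = random.Random(seed)
    pairs: List[Tuple[Tokens, Tokens]] = []
    for src, tgt in corpus:
        pairs.append((list(src), list(tgt)))  # keep the original pair
        for _ in range(copies):
            pairs.append((list(src), add_target_noise(tgt, rng=rng)))
    return pairs


def sample_next_token(
    logits: Dict[str, float],
    k: int = 5,
    temperature: float = 1.0,
    rng: Optional[random.Random] = None,
) -> str:
    """Top-k sampling with temperature for a single decoding step."""
    rng = rng or random.Random()
    # Keep only the k highest-scoring candidate tokens.
    top = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)[:k]
    scaled = [score / temperature for _, score in top]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scaled]  # unnormalized softmax
    return rng.choices([tok for tok, _ in top], weights=weights, k=1)[0]


if __name__ == "__main__":
    corpus = [(["das", "ist", "gut"], ["this", "is", "good"])]
    for src, tgt in make_pseudo_pairs(corpus, copies=2, seed=42):
        print(src, "->", tgt)
    print(sample_next_token({"good": 2.0, "fine": 1.5, "bad": -1.0}, k=2))
```

Keeping the source side clean while perturbing only the target side matches the abstract's description; the specific perturbations and the use of top-k rather than, say, unrestricted sampling are design choices of this sketch.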
This work is licensed under a Creative Commons Attribution 4.0 International License.
Mind forge Academia also operates under the Creative Commons Licence CC-BY 4.0. This allows anyone to copy and redistribute the material in any medium or format for any purpose, even commercially, provided that appropriate citation information is given.