Abstract
Statistical Machine Translation (SMT) has made good progress in recent years. Because SMT systems are data-driven, they learn from millions or even billions of words of human-translated text. The quality of an SMT system depends heavily on its training data: not only on the quality and amount of that data, but also on how relevant it is to the texts to be translated. However, human labeling is very costly and time-consuming. In this article, we propose a round-trip training scenario as a reliable retraining approach, built on a communication framework, that makes effective use of monolingual text to tackle training data scarcity and improve translation quality. We present detailed experimental results for Spanish-English as a high-resource language pair and Persian-Spanish as a low-resource language pair, and we show that translation quality improves in all cases.
Original language | English |
---|---|
Pages (from-to) | 167-185 |
Number of pages | 19 |
Journal | International Journal of Artificial Intelligence |
Volume | 17 |
Issue number | 1 |
Publication status | Published - 1 Mar 2019 |
Keywords
- Low-resource language pairs
- Natural language processing
- Round-tripping approach
- Statistical machine translation