Round-trip training approach for bilingually low-resource statistical machine translation systems

Benyamin Ahmadnia, Gholamreza Haffari, Javier Serrano

Research output: Contribution to journal / Article / Research / peer-review

2 Citations (Scopus)

Abstract

Statistical Machine Translation (SMT) has made good progress in recent years. Because SMT systems follow a data-driven approach, they learn from millions or even billions of words of human-translated text. The quality of an SMT system depends heavily on its training data. Quality and quantity both matter, and so does the relevance of that data to the texts we wish to translate. However, human labeling is costly and time-consuming. In this article, we propose a round-trip training scenario as a reliable retraining approach. It uses a communication framework to make effective use of monolingual text, tackling training-data scarcity and improving translation quality. We present detailed experimental results on Spanish-English as a high-resource language pair and Persian-Spanish as a low-resource language pair. Translation quality improves in all cases.
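The abstract describes round-tripping only at a high level. The sketch below is a rough illustration of the general idea, not the authors' communication framework. It translates monolingual sentences forward and then back, and it scores how well each round trip reconstructs the original sentence. Well-reconstructed (sentence, translation) pairs are kept as synthetic parallel data for retraining. Several pieces are assumptions made for illustration: the toy dictionary translators, the unigram-F1 score standing in for a metric such as BLEU, the threshold value, and the helper names `overlap_f1`, `round_trip_select`, and `word_by_word`.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# A "translator" is any function from a sentence to a sentence (hypothetical interface).
Translator = Callable[[str], str]


def overlap_f1(reference: str, hypothesis: str) -> float:
    """Unigram F1 between two sentences; a crude stand-in for BLEU (assumption)."""
    ref = Counter(reference.split())
    hyp = Counter(hypothesis.split())
    common = sum((ref & hyp).values())  # clipped count of shared tokens
    if common == 0:
        return 0.0
    precision = common / sum(hyp.values())
    recall = common / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def round_trip_select(
    monolingual: List[str],
    forward: Translator,
    backward: Translator,
    threshold: float,
) -> List[Tuple[str, str]]:
    """Translate each sentence forward and back; keep (source, forward translation)
    pairs whose reconstruction scores at least `threshold` against the source."""
    selected = []
    for sentence in monolingual:
        middle = forward(sentence)
        reconstructed = backward(middle)
        if overlap_f1(sentence, reconstructed) >= threshold:
            selected.append((sentence, middle))
    return selected


def word_by_word(lexicon: Dict[str, str]) -> Translator:
    """Hypothetical toy translator: word-for-word lookup, unknown words copied through."""
    return lambda s: " ".join(lexicon.get(w, w) for w in s.split())


# Toy Spanish->English and English->Spanish lexicons; "fish" is missing from en_es.
es_en = {"el": "the", "gato": "cat", "come": "eats", "pescado": "fish"}
en_es = {"the": "el", "cat": "gato", "eats": "come"}

pairs = round_trip_select(
    ["el gato come", "el gato come pescado"],
    forward=word_by_word(es_en),
    backward=word_by_word(en_es),
    threshold=0.9,
)
print(pairs)  # [('el gato come', 'the cat eats')]
```

The second sentence is rejected because "pescado" does not survive the round trip. Its reconstruction "el gato come fish" scores 0.75, below the threshold. In a real setting the forward and backward functions would be trained SMT models, and the selected pairs would be added to the parallel corpus before both directions are retrained.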

Original language: American English
Pages (from-to): 167-185
Number of pages: 19
Journal: International Journal of Artificial Intelligence
Volume: 17
Issue number: 1
Publication status: Published, 1 Mar 2019

Keywords

  • Low-resource language pairs
  • Natural language processing
  • Round-tripping approach
  • Statistical machine translation
