Training an NMT system for legal texts of a low-resource language variety (South Tyrolean German - Italian)

Antoni Oliver, Sergi Álvarez, Egon W. Stemle, Elena Chiocchetti

Research output: Book chapter; Chapter; Research; Peer-reviewed

Abstract

This paper illustrates the process of training and evaluating NMT systems for a language pair that includes a low-resource language variety. A parallel corpus of legal texts for Italian and South Tyrolean German has been compiled, with South Tyrolean German being the low-resource language variety. As the compiled corpus is too small for training, we combined it with several parallel corpora using data weighting at the sentence level. We then evaluated each combination, as well as two popular commercial systems.
Original language: English
Title of host publication: Research and Implementations and Case Studies
Editors: Carolina Scarton, Charlotte Prescott, Chris Bayliss, Chris Oakley, Joanna Wright, Stuart Wrigley, Xingyi Song, Edward Gow-Smith, Rachel Bawden, Víctor M. Sanchez-Cartagena, Patrick Cadwell, Ekaterina Lapshinova-Koltunski, Vera Cabarrao, Konstantinos Chatzitheodorou, Mary Nurminen, Diptesh Kanojia, Helena Moniz
Publisher: European Association for Machine Translation
Pages: 573-579
Number of pages: 7
ISBN (electronic): 9781068690709
Publication status: Published - 2024

Publication series

Name: Proceedings of the 25th Annual Conference of the European Association for Machine Translation, EAMT 2024
Volume: 1
