This article reports the process of building a bilingual (Spanish-Catalan) text corpus that is balanced in parallel across both languages with respect to prosodic features. We propose an expert guideline for text manipulation that, in combination with greedy algorithms, significantly improves the quality of the selected corpus. Applying this methodology to a radio news corpus provides empirical support for the proposed strategy.
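The abstract does not spell out which greedy algorithm is used, so the following is only a minimal sketch of one common form of greedy corpus selection, not the authors' method. It assumes each candidate sentence has already been annotated with a list of prosodic feature labels; the function name `greedy_select`, the `target` parameter, the sentence ids, and the labels such as `es:oxytone` are all hypothetical and chosen purely for illustration.

```python
from collections import Counter


def greedy_select(sentences, target=1, max_sentences=None):
    """Greedily pick sentences until every feature label is seen `target` times.

    sentences: dict mapping a sentence id to a list of feature labels
               (hypothetical annotation, e.g. "es:interrogative").
    Returns (selected ids in pick order, set of labels still below target).
    """
    counts = Counter()          # how many selected sentences carry each label
    selected = []
    remaining = dict(sentences)
    all_units = {u for feats in sentences.values() for u in feats}

    def gain(sid):
        # Number of distinct labels in this sentence that are still under target.
        return sum(1 for u in set(remaining[sid]) if counts[u] < target)

    while remaining and (max_sentences is None or len(selected) < max_sentences):
        # Sorting the ids makes tie-breaking deterministic (first id wins).
        best = max(sorted(remaining), key=gain)
        if gain(best) == 0:
            break               # no remaining sentence improves coverage
        selected.append(best)
        counts.update(set(remaining.pop(best)))

    uncovered = {u for u in all_units if counts[u] < target}
    return selected, uncovered


# Toy bilingual candidate pool; ids and labels are made up for the example.
corpus = {
    "es1": ["es:declarative", "es:oxytone"],
    "es2": ["es:interrogative", "es:oxytone", "es:paroxytone"],
    "es3": ["es:declarative"],
    "ca1": ["ca:declarative", "ca:paroxytone"],
    "ca2": ["ca:interrogative"],
}
selected, uncovered = greedy_select(corpus, target=1)
```

Because the labels carry a language prefix, features for Spanish and Catalan compete in the same coverage objective, which is one simple way to keep the two halves of the selection in step. A real system would likely weight rare features more heavily and apply the text-manipulation guideline before selection; neither is modelled here.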
|Title of host publication|5th International Conference on Speech Prosody 2010|
|Publication status|Published - 2010|
|Name|Proceedings of the International Conference on Speech Prosody|