SIIA Público

Book title: Proceedings of the 1st Workshop on Natural Language Processing for Indigenous Languages of the Americas, AmericasNLP 2021
Chapter title: Ayuuk-Spanish Neural Machine Translator

UNAM authors:
IVAN VLADIMIR MEZA RUIZ;
External authors:

Language:

Year of publication:
2021
Keywords:

Neural machine translation; African languages; Automatic alignment; Current performance; Machine translator; Neural architectures; Parallel corpora; State of Mexico; Sub words; Tokenization; Word level; Computational linguistics


Abstract:

This paper presents the first neural machine translation system for the Ayuuk language. In our experiments we translate from Ayuuk to Spanish and from Spanish to Ayuuk. Ayuuk is a language spoken in the state of Oaxaca, Mexico, by the Ayuukjä'äy people (commonly known in Spanish as Mixes). We use different sources to create a low-resource parallel corpus of more than 6,000 phrases; for some of these resources we rely on automatic alignment. The proposed system is based on the Transformer neural architecture and uses sub-word-level tokenization of the input. We report the current performance given the resources we have collected for the San Juan Güichicovi variant; the results are promising, reaching up to 5 BLEU. We based our development on the Masakhane project for African languages. © 2021 Association for Computational Linguistics
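The abstract says the system tokenizes its input at the sub-word level but does not name the method. The following is a minimal Python sketch that assumes byte-pair encoding (BPE, Sennrich et al., 2016), one widely used sub-word scheme. It shows how merge operations are learned from word frequencies and then replayed to segment a word that was not seen in training. The function names (`learn_bpe`, `segment`, `merge_symbols`) and the toy Spanish word list are hypothetical illustrations, not taken from the paper or its corpus.

```python
from collections import Counter

END = "</w>"  # end-of-word marker (assumed BPE convention, not from the paper)


def merge_symbols(symbols, pair):
    """Merge every non-overlapping occurrence of `pair`, scanning left to right."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merges; each merge joins the most frequent adjacent pair."""
    # Each word starts as a sequence of characters plus the end-of-word marker.
    vocab = Counter(tuple(w) + (END,) for w in words)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq  # weight pair counts by word frequency
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)  # ties go to the first pair seen
        merges.append(best)
        vocab = Counter({merge_symbols(s, best): f for s, f in vocab.items()})
    return merges


def segment(word, merges):
    """Split a (possibly unseen) word into sub-word units by replaying merges in order."""
    symbols = tuple(word) + (END,)
    for pair in merges:
        symbols = merge_symbols(symbols, pair)
    return list(symbols)


# Hypothetical toy training words (illustrative only).
merges = learn_bpe(["casa", "casas", "casa", "cosa", "cosas"], num_merges=4)
print(merges)                      # [('a', 's'), ('c', 'as'), ('a', '</w>'), ('cas', 'a</w>')]
print(segment("casas", merges))    # ['cas', 'as', '</w>']
print(segment("cosita", merges))   # ['c', 'o', 's', 'i', 't', 'a</w>']
```

Because an unseen word like "cosita" falls back to smaller units, no input is ever out of vocabulary. This is the usual motivation for sub-word tokenization when the parallel corpus is small.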


Cited UNAM entities: