Development of a Syntax-Based Model for English-Igbo Statistical Machine Translation
Keywords:
Syntax, Religious, Language model, Translation model, Domain and Corpora
Abstract
Existing statistical machine translators produce semantic errors because of syntactic differences between English and Igbo. This research therefore developed a syntax-based model for English-Igbo statistical machine translation.
A parallel corpus was obtained from the religious domain, and the English and Igbo corpora were word-aligned with GIZA++. A Hidden Markov Model used the word alignments produced by GIZA++ to estimate a maximum-likelihood translation table. The language model for the target language was built with the IRSTLM toolkit, and the system was tuned using minimum error rate training (MERT). The developed SMT system was evaluated using BLEU and NIST, and the results were compared with an existing related work. The developed model outperformed the previous system by up to 0.3 in BLEU score and 3.0 in NIST score.
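As an illustration of the maximum-likelihood step mentioned in the abstract, the following is a minimal Python sketch of how a word-level translation table could be estimated by relative frequency from word-aligned sentence pairs. It is not the authors' implementation: the function name `estimate_translation_table`, the input layout, and the two toy sentence pairs are assumptions made for this example, and a real SMT pipeline would normally work on phrases extracted from GIZA++ output rather than on hand-written alignments.

```python
from collections import Counter, defaultdict


def estimate_translation_table(sentence_pairs):
    """Estimate p(target_word | source_word) by relative frequency.

    sentence_pairs: iterable of (source_tokens, target_tokens, alignment),
    where alignment is a list of (i, j) pairs linking source_tokens[i]
    to target_tokens[j]. This layout is an assumption for the sketch,
    not the GIZA++ file format.
    """
    pair_counts = defaultdict(Counter)
    for source_tokens, target_tokens, alignment in sentence_pairs:
        for i, j in alignment:
            pair_counts[source_tokens[i]][target_tokens[j]] += 1

    table = {}
    for source_word, counts in pair_counts.items():
        total = sum(counts.values())
        # Maximum-likelihood estimate: count(s, t) / count(s)
        table[source_word] = {t: c / total for t, c in counts.items()}
    return table


# Toy, hand-aligned pairs made up for illustration only
# (not taken from the paper's corpus).
toy_pairs = [
    (["God", "is", "love"], ["Chineke", "bụ", "ịhụnanya"], [(0, 0), (1, 1), (2, 2)]),
    (["God", "is", "good"], ["Chukwu", "dị", "mma"], [(0, 0), (1, 1), (2, 2)]),
]

table = estimate_translation_table(toy_pairs)
for source_word, distribution in table.items():
    print(source_word, distribution)
```

Each source word's probabilities sum to one because every count is divided by the total number of alignment links for that word. With so little data the estimates are crude, which is why the abstract's pipeline adds a target-side language model and MERT tuning on top of the translation table.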