TY - GEN
T1 - Using sub-word n-gram models for dealing with OOV in large vocabulary speech recognition for Latvian
AU - Salimbajevs, Askars
AU - Strigins, Jevgenijs
N1 - Publisher Copyright:
© NODALIDA 2015. All rights reserved.
PY - 2015
Y1 - 2015
N2 - In the Latvian language, one word can have tens or even hundreds of surface forms. This is a serious problem for large vocabulary speech recognition. Including every form in the vocabulary would make it intractable, but, on the other hand, even with a vocabulary of 400K, the out-of-vocabulary (OOV) rate will be very high. In this paper, the authors investigate the possibility of using sub-word vocabularies where words are split into frequent and common parts. The results of the experiment show that this significantly reduces the OOV rate.
AB - In the Latvian language, one word can have tens or even hundreds of surface forms. This is a serious problem for large vocabulary speech recognition. Including every form in the vocabulary would make it intractable, but, on the other hand, even with a vocabulary of 400K, the out-of-vocabulary (OOV) rate will be very high. In this paper, the authors investigate the possibility of using sub-word vocabularies where words are split into frequent and common parts. The results of the experiment show that this significantly reduces the OOV rate.
UR - https://www.scopus.com/pages/publications/84992577137
M3 - Conference paper
AN - SCOPUS:84992577137
SN - 9789175190983
T3 - Proceedings of the 20th Nordic Conference of Computational Linguistics, NODALIDA 2015
SP - 281
EP - 285
BT - Proceedings of the 20th Nordic Conference of Computational Linguistics, NODALIDA 2015
PB - Association for Computational Linguistics (ACL)
T2 - 20th Nordic Conference of Computational Linguistics, NODALIDA 2015
Y2 - 11 May 2015 through 13 May 2015
ER -