TY - GEN
T1 - Review of non-English corpora annotated for emotion classification in text
AU - Ļeonova, Viktorija
N1 - Publisher Copyright: © Springer Nature Switzerland AG 2020.
PY - 2020
Y1 - 2020
AB - In this paper, we try to systematize the information about the available corpora for emotion classification in text for languages other than English, with the goal of finding which approaches could be used for low-resource languages with close to no existing work in the field. We analyze the volume, emotion classification schema, and language of each corpus, as well as the methods employed for data preparation and annotation automation. We have systematized twenty-four papers representing the corpora and found that the corpora were mostly for the most widely spoken world languages: Hindi, Chinese, Turkish, Arabic, Japanese, etc. A typical corpus contained several thousand manually annotated entries collected from a social network, each annotated by three annotators, and was processed with a few machine learning methods, such as linear SVM and Naïve Bayes, and, in more recent works, a couple of neural network methods, such as CNN.
KW - Emotion annotation
KW - Emotion classification
KW - Machine learning
KW - Review
KW - Text corpus
UR - https://link.springer.com/chapter/10.1007/978-3-030-57672-1_8
UR - https://www.scopus.com/pages/publications/85089720649
U2 - 10.1007/978-3-030-57672-1_8
DO - 10.1007/978-3-030-57672-1_8
M3 - Conference paper
SN - 9783030576714
VL - 1243 CCIS
T3 - Communications in Computer and Information Science
SP - 96
EP - 108
BT - Databases and Information Systems - 14th International Baltic Conference, DB and IS 2020, Proceedings
A2 - Robal, Tarmo
A2 - Haav, Hele-Mai
A2 - Penjam, Jaan
A2 - Matulevičius, Raimundas
ER -