Evaluating Open-Source LLMs in Low-Resource Languages: Insights from Latvian High School Exams

Research output: Chapter in Book/Report/Conference proceeding › Conference paper › Research › peer-review

4 Citations (Scopus)

Abstract

The latest large language models (LLMs) have significantly advanced natural language processing (NLP) capabilities across various tasks. However, their performance in low-resource languages, such as Latvian, with 1.5 million native speakers, remains substantially underexplored due to both limited training data and the absence of comprehensive evaluation benchmarks. This study addresses this gap by conducting a systematic assessment of prominent open-source LLMs on natural language understanding (NLU) and natural language generation (NLG) tasks in Latvian. We use standardized, centralized high school graduation exams as a benchmark dataset, offering relatable and diverse evaluation scenarios that encompass multiple-choice questions and complex text analysis tasks. Our experimental setup involves testing models from the leading LLM families, including Llama, Qwen, Gemma, and Mistral, with OpenAI's GPT-4 serving as a performance reference. The results reveal that certain open-source models demonstrate competitive performance in NLU tasks, narrowing the gap with GPT-4. However, all models exhibit notable deficiencies in NLG tasks, specifically in generating coherent and contextually appropriate text analyses, highlighting persistent challenges in NLG for low-resource languages. These findings contribute to efforts to develop robust multilingual benchmarks and to improve LLM performance in diverse linguistic contexts.

Original language: English
Title of host publication: NLP4DH 2024 - 4th International Conference on Natural Language Processing for Digital Humanities, Proceedings of the Conference
Editors: Mika Hamalainen, Emily Ohman, So Miyagawa, Khalid Alnajjar, Yuri Bizzoni
Place of publication: Stroudsburg
Publisher: Association for Computational Linguistics
Pages: 289-293
Number of pages: 5
ISBN (Electronic): 9798891761810
ISBN (Print): 979-889176181-0, 9798891761810
Publication status: Published - 2024

Publication series

Name: NLP4DH 2024 - 4th International Conference on Natural Language Processing for Digital Humanities, Proceedings of the Conference
