
Vairākvārdu leksēmu klasificēšana elektroniskajā vārdnīcā „Tēzaurs”

Translated title of the contribution: Classification of multi-word units in the electronic dictionary „Tēzaurs”
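  • Institute of Mathematics and Computer Science (Matemātikas un informātikas institūts)
  • Institute of Mathematics and Computer Science, University of Latvia (LU Matemātikas un informātikas institūts)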

Research output: Chapter in Book/Report/Conference proceeding › Conference paper › Research › peer-review

1 Citation (Scopus)

Abstract

„Tēzaurs” is the largest Latvian electronic dictionary, with more than 400,000 entries, including more than 71,000 multi-word expressions (MWEs) drawn from a wide range of sources and highly varied in form and content. In recent years, work has been ongoing to sort the MWEs into several groups: phrasemes, idioms, collocations, complex terms, taxonomic group names, and multi-word nouns. This article describes the current results of this sorting and the challenges posed by borderline cases, overlapping categories, and unclear criteria for dividing MWEs. The study concludes that additional criteria are needed to distinguish phrasemes from idioms and phrasemes from collocations. Dividing MWEs into smaller groups has several advantages. For dictionary users, the additional information has made the dictionary contents easier to browse. For the developers of „Tēzaurs”, the process has given more insight into the MWE data, enabling them to analyse an entire group of MWEs at once, prevent duplicates and inconsistencies, adjust where MWEs are placed within dictionary entries, and improve the overall quality of the dictionary. For linguists, the newly assembled and structured language material allows more in-depth study of each MWE group and points to new directions for research. The current MWE classification system is the first step towards an organised system of MWE description and classification, and it will require further revision and improvement.

Original language: Latvian
Title of host publication: Valoda: nozīme un forma 16. Gramatika un valodas elektroniskie resursi = Language: Meaning and Form 16. Grammar and Electronic Resources of Language
Editors: Andra Kalnaca, Ilze Lokmane, Daiki Horiguchi
Place of publication: Rīga
Publisher: LU Akadēmiskais apgāds
Pages: 119–132
Volume: 16
ISBN (electronic): 978-993436494-5
Publication status: Published, 2025
Externally published: Yes
Event: VALODA: Nozīme un Forma 16. Gramatika un Valodas Elektroniskie Resursi / LANGUAGE: Meaning and Form 16. Grammar and Electronic Resources of Language, Riga, Latvia
Duration: 26 Dec 2025 – 26 Dec 2025

Publication series

Name: Valoda: Nozīme un Forma
Volume: 16
ISSN (print): 2255-9256
ISSN (electronic): 2256-0602


OECD Field of Science

  • 6.2 Languages and Literature

