
Latviešu valodas morfēmu un vārddarināšanas modeļu datubāzes lemmu atlase

Research output: Chapter in book/encyclopedia/conference proceedings, Conference paper, Research, peer-reviewed

1 Citation (Scopus)

Abstract

The article gives an overview of the first working stage of the project “Database of Latvian Morphemes and Derivational Models (DLMDM)” (No. LZP-2022/1-0013), during which the lemma database was created. The lemma register was compiled from The Balanced Corpus of Modern Latvian, dated 2018. Initially, 165 090 lemmas were obtained from the corpus texts; by the end of data revision, 77 124 lemmas had been declared valid. The lemmas were analysed in three steps: (1) automated selection of the lemma database, (2) manual processing of the lemma database, and (3) a further automated check of the lemma database. A total of 30 009 lemmas were invalidated during automated selection (steps 1 and 3). These were words containing characters or symbols that are not letters of the Latvian alphabet, as well as various duplicate forms. During manual processing, 78 518 lemmas were selected and checked for spelling and context of use. At this step, 57 957 lemmas were declared invalid, including abbreviations and various words that do not exist in Latvian. The remaining 20 561 selected lemmas were divided into three groups: (1) lemmas that were corrected, (2) lemmas that were left with parallel forms, and (3) lemmas that were not corrected. These lemmas were included in the database. The final count is 77 124 lemmas, but this number may change because data revision continues during the next steps of the project.
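The automated steps described in the abstract can be sketched roughly as follows. This is only an illustration, not the project's actual code: the function names, the exact-match treatment of duplicates, and the rejection of every non-letter character (including hyphens) are assumptions. The counts are the ones reported in the abstract.

```python
import unicodedata

# The 33 letters of the modern Latvian alphabet (lowercase).
LATVIAN_ALPHABET = set("aābcčdeēfgģhiījkķlļmnņoprsštuūvzž")

# Counts reported in the abstract.
INITIAL = 165_090          # lemmas obtained from the corpus
AUTO_INVALID = 30_009      # invalidated in automated steps 1 and 3
MANUAL_CHECKED = 78_518    # lemmas reviewed manually in step 2
MANUAL_INVALID = 57_957    # invalidated in step 2


def is_latvian_word(lemma: str) -> bool:
    """Return True if every character is a letter of the Latvian alphabet."""
    # NFC normalisation makes sure 'ā' is a single code point, not 'a' + macron.
    normalized = unicodedata.normalize("NFC", lemma).lower()
    return bool(normalized) and all(ch in LATVIAN_ALPHABET for ch in normalized)


def automated_selection(lemmas):
    """Split lemmas into kept and rejected lists (hypothetical helper).

    Assumption: a duplicate is an exact repeat of an earlier lemma.
    """
    seen = set()
    kept, rejected = [], []
    for lemma in lemmas:
        key = unicodedata.normalize("NFC", lemma)
        if not is_latvian_word(key) or key in seen:
            rejected.append(lemma)
        else:
            seen.add(key)
            kept.append(lemma)
    return kept, rejected


# Consistency check of the reported figures.
final_count = INITIAL - AUTO_INVALID - MANUAL_INVALID
retained_after_manual = MANUAL_CHECKED - MANUAL_INVALID
print(final_count, retained_after_manual)
```

The two subtractions reproduce the figures given in the abstract: the 77 124 valid lemmas and the 20 561 manually reviewed lemmas that were retained.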

Translated title of the contribution: Selection of lemmas for the database of Latvian morphemes and derivational models
Original language: Latvian
Title of host publication: Valoda: nozīme un forma
Subtitle of host publication: Gramatika un valodas elektroniskie resursi
Editors: Andra Kalnača, Ilze Lokmane, Daiki Horiguchi
Place of publication: Rīga
Pages: 225-237
Volume: 16
ISBN (Electronic): 978-9934-36-494-5
DOIs
Publication status: Published - 2025

Publication series

Name: Valoda: Nozīme un Forma
Volume: 16
ISSN (Print): 2255-9256
ISSN (Electronic): 2256-0602

Keywords

  • datubāzes izstrāde (database development)
  • lemmu atlase (lemma selection)
  • database’s design
  • database’s startup data
  • lemma selection
  • word formation
  • parallelism of lemmas
  • print error

OECD field of science

  • 6.2 Languages and literature
  • 1.2 Computer and information sciences
