Abstract
The Latvian National Corpora Collection (LNCC), accessible through Korpuss.lv, is an extensive and diverse collection of about 40 text and spoken corpora, totalling 2.8 billion tokens. These corpora represent a wide range of text types, such as news articles, blogs, scientific texts, parliamentary debates, and essays. Importantly, almost all the corpora in the LNCC have been re-annotated with a uniform morpho-syntactic annotation scheme, enabling federated search and consistent linguistic analysis across different text types and genres. This feature is especially valuable for computational linguistics and language technology development, offering objective data for studies in lexicography, terminology, grammar, semantics, and language learning. Thus, Korpuss.lv emerges as a critical tool in the digital humanities, helping to develop and refine language technologies and research methodologies.
| Original language | English |
|---|---|
| Pages (from-to) | 636-645 |
| Number of pages | 10 |
| Journal | Baltic Journal of Modern Computing |
| Volume | 12 |
| Issue number | 4 |
| DOIs | |
| Publication status | Published - 2024 |
OECD field of science
- 1.2 Computer and information sciences
Fingerprint
Dive into the research topics of “Korpuss.lv - a Versatile Platform for Digital Humanities”. Together they form a unique fingerprint.