
Korpuss.lv - a Versatile Platform for Digital Humanities

  • Roberts Darģis
  • Baiba Saulīte
  • University of Latvia

Research output: Contribution to journal, Article, peer-review

2 Citations (Scopus)

Abstract

The Latvian National Corpora Collection (LNCC), accessible through Korpuss.lv, is an extensive and diverse collection of about 40 text and spoken corpora, totalling 2.8 billion tokens. These corpora represent a wide range of text types, such as news articles, blogs, scientific texts, parliamentary debates, and essays. Importantly, almost all the corpora in the LNCC have been re-annotated with a uniform morpho-syntactic annotation scheme, enabling federated search and consistent linguistic analysis across different text types and genres. This feature is especially valuable for computational linguistics and language technology development, offering objective data for studies in lexicography, terminology, grammar, semantics, and language learning. Thus, Korpuss.lv emerges as a critical tool in the digital humanities, helping to develop and refine language technologies and research methodologies.
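The abstract attributes federated search to the shared morpho-syntactic annotation scheme. The minimal Python sketch below illustrates that idea under stated assumptions: each corpus is modelled as a list of (word form, lemma, tag) tuples, and one unchanged query is applied to all of them. The function `federated_count`, the toy corpora, and the tag strings are hypothetical. They are not the LNCC tagset, and they do not reproduce the Korpuss.lv or noSketch Engine query interface.

```python
from typing import Dict, List, Tuple

# A token is (word form, lemma, morpho-syntactic tag).
# The tag strings used below are HYPOTHETICAL placeholders, not the real LNCC tagset.
Token = Tuple[str, str, str]


def federated_count(corpora: Dict[str, List[Token]], lemma: str, tag_prefix: str) -> Dict[str, int]:
    """Apply one query (lemma + tag prefix) unchanged to every corpus.

    This works only because all corpora are assumed to share a single
    annotation scheme; with per-corpus tagsets the query would have to be
    rewritten for each corpus.
    """
    return {
        name: sum(1 for _, lem, tag in tokens if lem == lemma and tag.startswith(tag_prefix))
        for name, tokens in corpora.items()
    }


# Toy corpora (illustrative data only, not taken from the LNCC).
corpora = {
    "news": [("valoda", "valoda", "N-nom"), ("valodas", "valoda", "N-gen"), ("runā", "runāt", "V-pres")],
    "debates": [("valodu", "valoda", "N-acc"), ("valoda", "valoda", "N-nom")],
    "essays": [("runāja", "runāt", "V-past")],
}

per_corpus = federated_count(corpora, "valoda", "N")
total = sum(per_corpus.values())
print(per_corpus, total)  # {'news': 2, 'debates': 2, 'essays': 0} 4
```

Because the tags are assumed to be uniform, per-corpus hit counts can be summed directly into a cross-genre total. If a corpus used a different tagset, its count would silently drop to zero rather than fail, and that is exactly the inconsistency that re-annotation with one scheme is meant to prevent.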

Original language: English
Pages (from-to): 636-645
Number of pages: 10
Journal: Baltic Journal of Modern Computing
Volume: 12
Issue number: 4
Publication status: Published - 2024

Keywords

  • corpora collection
  • corpus linguistics
  • federated search
  • noSketch Engine
  • timeline

