Abstract
The variable writing system of early Latvian texts is an obstacle for non-linguists wishing to explore SENIE, the Corpus of early written Latvian texts, and it poses many challenges for linguists as well. The Unicode version of SENIE, launched on the NoSketchEngine platform (https://nosketch.korpuss.lv/#dashboard?corpname=senie_unicode) in 2022, offers significant new possibilities. With the historical spelling normalized, access to the Corpus has become more user-friendly. Queries made in the Latvian National Corpus Collection (LNCC) (https://korpuss.lv/search) also return results from the early texts.
| Original language | English |
|---|---|
| Pages (from-to) | 548-559 |
| Number of pages | 12 |
| Journal | Baltic Journal of Modern Computing |
| Volume | 12 |
| Issue number | 4 |
| Publication status | Published - 2024 |
Keywords
- diachronic corpora
- digitization
- historical writing
- Latvian
- normalization
- NoSketchEngine
- the Corpus of early written Latvian
OECD Field of Science
- 6.2 Languages and Literature
Fingerprint
Dive into the research topics of 'New Possibilities for Exploring Early Latvian Texts: Switching to the NoSketchEngine'. Together they form a unique fingerprint.