
Latvian Newswire Information Extraction System and Entity Knowledge Base

  • University of Latvia

Research output: Chapter in Book/Report/Conference proceeding, Conference paper, Research, peer-review

6 Citations (Scopus)

Abstract

This paper describes an information extraction system that obtains CV-style structured information about publicly mentioned persons, organizations and their relations by analyzing newswire archives in the Latvian language. The text analysis pipeline consists of morphosyntactic analysis, named entity recognition (NER) and coreference resolution, and a semantic role labeling system based on FrameNet principles. We also implement an entity linking process that matches the entity mentions in each document to an entity knowledge base, which is initially seeded with authoritative information on relevant people and organizations. The accuracy of automated frame extraction varies with the specifics of each frame type; the current average is 53% F-score for frame target identification and 61% for frame element role classification. The currently targeted volume of text is the complete archives of Latvian newspapers, magazines and news portals, about 3.5 million articles.
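For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how an F-score is typically computed from match counts. It assumes the balanced F1 variant (beta = 1); the abstract does not state which variant the authors used. The function name `f_score` and the counts in the usage line are hypothetical and are not taken from the paper.

```python
def f_score(true_positives, false_positives, false_negatives, beta=1.0):
    """F-score from raw counts; beta=1.0 gives the balanced F1 (an assumption here)."""
    predicted = true_positives + false_positives   # items the system proposed
    actual = true_positives + false_negatives      # items in the gold annotation
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    # Weighted harmonic mean of precision and recall.
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Illustrative counts only (not results from the paper):
# 3 correct frame targets, 1 spurious, 3 missed -> precision 0.75, recall 0.5
example = f_score(3, 1, 3)
```

With these made-up counts the result is 0.6, which shows how a system with reasonable precision but low recall ends up with a moderate F-score.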

Original language: English
Title of host publication: Human Language Technologies - The Baltic Perspective
Subtitle of host publication: Proceedings of the 6th International Conference Baltic HLT 2014
Editors: Andrius Utka, Gintare Grigonyte, Jurgita Kapociute-Dzikiene, Jurgita Vaicenoniene
Publisher: IOS Press BV
Pages: 119-125
Number of pages: 7
ISBN (Electronic): 9781614994411
Publication status: Published - 2014
Externally published: Yes
Event: 6th International Conference on Human Language Technologies - The Baltic Perspective, Baltic HLT 2014 - Kaunas, Lithuania
Duration: 26 Sept 2014 to 27 Sept 2014

Publication series

Name: Frontiers in Artificial Intelligence and Applications
Volume: 268
ISSN (Print): 0922-6389
ISSN (Electronic): 1879-8314

Conference

Conference: 6th International Conference on Human Language Technologies - The Baltic Perspective, Baltic HLT 2014
Country/Territory: Lithuania
City: Kaunas
Period: 26/09/14 to 27/09/14

Keywords

  • Information extraction
  • knowledge base
  • text summarization
