TY - GEN
T1 - Latvian Newswire Information Extraction System and Entity Knowledge Base
AU - Paikens, Peteris
N1 - Publisher Copyright: © 2014 The Authors and IOS Press.
PY - 2014
Y1 - 2014
N2 - This paper describes an information extraction system designed for obtaining CV-style structured information about publicly mentioned persons, organizations and their relations by analyzing newswire archives in the Latvian language. The described text analysis pipeline consists of morphosyntactic analysis, NER and coreference resolution, and a semantic role labeling system based on FrameNet principles. We also implement an entity linking process, matching the entity mentions in each document to an entity knowledge base that is initially seeded with authoritative information on relevant people and organizations. The accuracy of automated frame extraction varies depending on specifics of each frame type, but the average accuracy currently is 53% F-score for frame target identification, and 61% for frame element role classification. The currently targeted volume of text is the total archives of Latvian newspapers, magazines and news portals, consisting of about 3.5 million articles.
KW - Information extraction
KW - knowledge base
KW - text summarization
UR - https://www.scopus.com/pages/publications/84948687631
U2 - 10.3233/978-1-61499-442-8-119
DO - 10.3233/978-1-61499-442-8-119
M3 - Conference paper
AN - SCOPUS:84948687631
T3 - Frontiers in Artificial Intelligence and Applications
SP - 119
EP - 125
BT - Human Language Technologies - The Baltic Perspective
A2 - Utka, Andrius
A2 - Grigonyte, Gintare
A2 - Kapociute-Dzikiene, Jurgita
A2 - Vaicenoniene, Jurgita
PB - IOS Press BV
T2 - 6th International Conference on Human Language Technologies - The Baltic Perspective, Baltic HLT 2014
Y2 - 26 September 2014 through 27 September 2014
ER -