Multilingual clustering of streaming news

  • Priberam Labs
  • University of Edinburgh
  • Innovation Labs LETA

Research output: Chapter in Book/Report/Conference proceeding › Conference paper › Research › peer-review

33 Citations (Scopus)

Abstract

Clustering news across languages enables efficient media monitoring by aggregating articles from multilingual sources into coherent stories. Doing so in an online setting allows scalable processing of massive news streams. To this end, we describe a novel method for clustering an incoming stream of multilingual documents into monolingual and crosslingual story clusters. Unlike typical clustering approaches that consider a small and known number of labels, we tackle the problem of discovering an ever-growing number of cluster labels in an online fashion, using real news datasets in multiple languages. Our method is simple to implement, computationally efficient, and produces state-of-the-art results on datasets in German, English and Spanish.
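
The abstract does not spell out the algorithm, so the following is only a generic illustration of what "discovering an ever-growing number of cluster labels in an online fashion" can look like, not the authors' method. It is a minimal sketch that assumes bag-of-words vectors, cosine similarity, and a fixed similarity threshold; the class name `OnlineClusterer`, the `threshold` parameter, and the sample headlines are all hypothetical. It handles a single language only and does not attempt the crosslingual linking described in the paper.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class OnlineClusterer:
    """Hypothetical single-pass clusterer: the number of clusters is not fixed."""

    def __init__(self, threshold=0.3):
        # Assumed hyperparameter: minimum similarity needed to join a cluster.
        self.threshold = threshold
        # One accumulated term-count vector per cluster discovered so far.
        self.centroids = []

    def add(self, text):
        """Assign one incoming document to a cluster and return the cluster id."""
        vec = Counter(text.lower().split())
        best_id, best_sim = -1, 0.0
        for cid, centroid in enumerate(self.centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best_id, best_sim = cid, sim
        if best_id >= 0 and best_sim >= self.threshold:
            # Close enough to an existing story: merge the counts into it.
            self.centroids[best_id].update(vec)
            return best_id
        # No sufficiently similar story yet: open a new cluster.
        self.centroids.append(vec)
        return len(self.centroids) - 1


# Made-up headlines, processed one at a time as if arriving on a stream.
clusterer = OnlineClusterer(threshold=0.3)
stream = [
    "earthquake hits coastal city",
    "coastal city earthquake damage reported",
    "central bank raises interest rates",
]
labels = [clusterer.add(doc) for doc in stream]
print(labels)  # [0, 0, 1]
```

Each document is seen once and is either merged into its most similar existing cluster or used to start a new one, so the set of labels grows as new stories appear. A real system would likely need better document representations and some way to compare documents across languages, which this sketch leaves out.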

Original language: English
Title of host publication: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018
Editors: Ellen Riloff, David Chiang, Julia Hockenmaier, Jun'ichi Tsujii
Place of Publication: Stroudsburg
Publisher: Association for Computational Linguistics
Pages: 4535-4544
ISBN (Print): 9781948087841
Publication status: Published - 2018

Publication series

Name: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018

OECD Field of Science

  • 1.2 Computer and Information Sciences
