Residual Shuffle-Exchange Network for Fast Processing of Long Sequences

  • University of Latvia

Research output: Chapter in book/encyclopedia/conference proceedings › Conference paper › Research › peer-reviewed

5 Citations (Scopus)

Abstract

Attention is a commonly used mechanism in sequence processing, but it is of O(n^2) complexity which prevents its application to long sequences. The recently introduced neural Shuffle-Exchange network offers a computation-efficient alternative, enabling the modelling of long-range dependencies in O(n log n) time. The model, however, is quite complex, involving a sophisticated gating mechanism derived from the Gated Recurrent Unit. In this paper, we present a simple and lightweight variant of the Shuffle-Exchange network, which is based on a residual network employing GELU and Layer Normalization. The proposed architecture not only scales to longer sequences but also converges faster and provides better accuracy. It surpasses the Shuffle-Exchange network on the LAMBADA language modelling task and achieves state-of-the-art performance on the MusicNet dataset for music transcription while being efficient in the number of parameters. We show how to combine the improved Shuffle-Exchange network with convolutional layers, establishing it as a useful building block in long sequence processing applications.
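
The O(n log n) figure in the abstract can be illustrated with a small sketch of the classic shuffle-exchange wiring. This is not the paper's model: the function names (`perfect_shuffle`, `switch_layer`, `shuffle_exchange_pass`) are hypothetical, and the switch unit is passed in as a stand-in for whatever learned unit (gated or residual) a real network would use. The sketch only assumes that each layer applies a unit to adjacent pairs and then riffles the two halves of the sequence, for log2(n) layers.

```python
import math


def perfect_shuffle(seq):
    """Interleave the first and second half of seq (a riffle shuffle)."""
    half = len(seq) // 2
    out = []
    for a, b in zip(seq[:half], seq[half:]):
        out.extend((a, b))
    return out


def switch_layer(seq, unit):
    """Apply `unit` to every adjacent pair (2k, 2k+1); unit returns a pair."""
    out = []
    for k in range(0, len(seq), 2):
        out.extend(unit(seq[k], seq[k + 1]))
    return out


def shuffle_exchange_pass(seq, unit):
    """Run log2(n) switch+shuffle layers; return the result and unit-call count."""
    n = len(seq)
    depth = int(math.log2(n))
    if 2 ** depth != n:
        raise ValueError("sequence length must be a power of two")
    calls = 0
    for _ in range(depth):
        seq = switch_layer(seq, unit)
        calls += n // 2  # one unit call per adjacent pair
        seq = perfect_shuffle(seq)
    return seq, calls


if __name__ == "__main__":
    # Stand-in unit: each output records which inputs it depends on.
    def union_unit(a, b):
        return a | b, a | b

    inputs = [frozenset([i]) for i in range(8)]
    outputs, calls = shuffle_exchange_pass(inputs, union_unit)
    print(calls)                                            # (n/2) * log2(n) unit calls
    print(all(o == frozenset(range(8)) for o in outputs))   # every output sees every input
```

With the dependency-tracking stand-in, one pass over 8 elements uses (8/2) * 3 = 12 unit calls, and each output position ends up depending on all 8 inputs, which is the sense in which long-range dependencies are reachable with O(n log n) work rather than the O(n^2) pairwise comparisons of attention.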

Original language: English
Title of host publication: 35th AAAI Conference on Artificial Intelligence, AAAI 2021
Pages: 7245-7253
Number of pages: 9
Volume: 8B
ISBN (Electronic): 9781713835974
DOIs
Publication status: Published - 2021

Publication series

Name: 35th AAAI Conference on Artificial Intelligence, AAAI 2021
Volume: 8B

OECD field of science

  • 1.2 Computer and information sciences
