TY - GEN
T1 - Residual Shuffle-Exchange Networks for Fast Processing of Long Sequences
AU - Draguns, Andis
AU - Ozoliņš, Emīls
AU - Šostaks, Agris
AU - Apinis, Matīss
AU - Freivalds, Kārlis
N1 - Publisher Copyright:
Copyright © 2021, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
PY - 2021
Y1 - 2021
AB - Attention is a commonly used mechanism in sequence processing, but it is of O(n²) complexity which prevents its application to long sequences. The recently introduced neural Shuffle-Exchange network offers a computation-efficient alternative, enabling the modelling of long-range dependencies in O(n log n) time. The model, however, is quite complex, involving a sophisticated gating mechanism derived from the Gated Recurrent Unit. In this paper, we present a simple and lightweight variant of the Shuffle-Exchange network, which is based on a residual network employing GELU and Layer Normalization. The proposed architecture not only scales to longer sequences but also converges faster and provides better accuracy. It surpasses the Shuffle-Exchange network on the LAMBADA language modelling task and achieves state-of-the-art performance on the MusicNet dataset for music transcription while being efficient in the number of parameters. We show how to combine the improved Shuffle-Exchange network with convolutional layers, establishing it as a useful building block in long sequence processing applications.
UR - https://ojs.aaai.org/index.php/AAAI/article/view/16890
UR - https://www.scopus.com/pages/publications/85130090401
DO - 10.1609/aaai.v35i8.16890
M3 - Conference paper
SN - 9781713835974
VL - 8B
T3 - 35th AAAI Conference on Artificial Intelligence, AAAI 2021
SP - 7245
EP - 7253
BT - 35th AAAI Conference on Artificial Intelligence, AAAI 2021
ER -
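
For a concrete picture of the architecture the abstract above describes, the following is a minimal NumPy sketch of a residual shuffle-exchange layer: adjacent sequence elements are paired and processed by a switch unit, then routed apart by a perfect-shuffle permutation, so that about log2(n) layers connect every pair of positions. The switch unit here is a simplified residual block (LayerNorm, a linear map, GELU, a second linear map, and a scaled residual connection); the shapes, initializations, and constants (scale, h) are illustrative assumptions, not the authors' exact parameterization, for which see the paper at the DOI above.

import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each element's feature vector to zero mean, unit variance.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

def residual_switch_unit(pairs, Z, W, scale=0.9, h=0.25):
    # pairs: (n/2, 2m); each row concatenates two adjacent sequence elements.
    # Candidate update via LayerNorm -> linear -> GELU -> linear, combined
    # with the input through a scaled residual connection (scale, h assumed).
    c = gelu(layer_norm(pairs) @ Z) @ W
    return scale * pairs + h * c

def perfect_shuffle(x):
    # Riffle the two halves: [a1..ak, b1..bk] -> [a1, b1, a2, b2, ...]
    n, m = x.shape
    return x.reshape(2, n // 2, m).transpose(1, 0, 2).reshape(n, m)

def shuffle_exchange_layer(x, Z, W):
    n, m = x.shape                         # n must be a power of two
    pairs = x.reshape(n // 2, 2 * m)       # exchange: process adjacent pairs
    out = residual_switch_unit(pairs, Z, W).reshape(n, m)
    return perfect_shuffle(out)            # shuffle: route information apart

rng = np.random.default_rng(0)
n, m = 16, 8                               # toy sequence length and feature width
x = rng.standard_normal((n, m))
Z = 0.1 * rng.standard_normal((2 * m, 2 * m))
W = 0.1 * rng.standard_normal((2 * m, 2 * m))
for _ in range(int(np.log2(n))):           # log2(n) layers let any two positions interact
    x = shuffle_exchange_layer(x, Z, W)    # weights shared across layers only for brevity
print(x.shape)                             # (16, 8)

Each layer touches every element exactly once in fixed-size pairs, which is where the O(n log n) total cost comes from: n/2 switch applications per layer across log n layers, in contrast to the O(n²) pairwise interactions of attention.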