Last Fridays Talks: Signals and Decoding
Last Fridays Talks
On the last Friday of each month, we host the Last Fridays Talks, where one of our seven Collaboratories presents insights from its current work. Join us for a discussion of results and recent papers, followed by socializing for everyone who wishes to attend.
Alternatives to global self-attention for self-supervised audio representation learning
Transformers, enabled by global self-attention, have become the deep learning architecture of choice for self-supervised representation learning, spanning multiple modalities and domains, such as vision, language, and audio. In this talk, we discuss:
- How explicitly modelling local-global attention using the Multi-Window Multi-Head Attention module enabled us to learn better audio representations within a Masked Autoencoder framework, as evaluated on 10 diverse audio tasks (featured in our recent ICLR 2024 paper).
- Our recent (under review) work on Structured State Space Models for audio feature representation learning in a masked spectrogram modelling framework. We proposed Self-Supervised Audio Mamba (SSAM), which consistently yielded ~40% better performance than comparable transformer baselines across 10 diverse audio tasks.
Sarthak Yadav is a PhD Fellow at the Department of Electronic Systems, Aalborg University and the Pioneer Centre for Artificial Intelligence, Copenhagen. His research is focused on self-supervised audio representation learning, with an emphasis on approaches beyond Transformers and self-attention for modelling sequences.
The talk will also be streamed at DTU:
Building 321, room 227
Richard Petersens Plads, 2800 Lyngby
Join us online on Zoom via this link (Meeting ID: 648 7986 6417).