This reading group is led by Paola and Najim. We read papers related to ASR, speaker identification/verification, and other speech-related areas. The papers/topics covered each week are listed below in reverse chronological order.

Starting Spring 2019, the reading group is listed as a 1-credit course, 520.671 ("Speech Technologies Reading Group"). Contact Paola to get on the mailing list.

Spring 2019

Monday 10-11:30am, Barton 225.

Apr 29 (Desh, Nanxin)
van den Oord et al. Representation Learning with Contrastive Predictive Coding (2018)
Child et al. Generating Long Sequences with Sparse Transformers (2019)
Apr 22 (Zili, Jiamin, Fei)
Zhang et al. Fully Supervised Speaker Diarization (2019)
Cai et al. Exploring the Encoding Layer and Loss Function in End-to-End Speaker and Language Recognition System (2018)
Apr 15 (Jonathan, Ashish)
Zeyer et al. Improved training of end-to-end attention models for speech recognition (2018)
Sennrich et al. Neural Machine Translation of Rare Words with Subword Units (2016)
Zhou et al. EAST: An Efficient and Accurate Scene Text Detector (2017)
Kozielski et al. Open vocabulary handwriting recognition using combined word-level and character-level language models (2013)
Cai et al. An Open Vocabulary OCR System with Hybrid Word-Subword Language Models (2017)
Arora et al. Using ASR methods for OCR (2019)
Smit et al. Improved subword modeling for WFST-based speech recognition (2017)
Apr 8 (Mahsa)
Introduction to WFSTs for speech recognition
Apr 1 (Saurabh, Raghu)
Rethage et al. A WaveNet for speech denoising
Baevski and Auli. Adaptive input representations for neural language modeling
Mar 25 (Ruizhe, Jeff)
van den Oord et al. WaveNet: A generative model for raw audio (2016)
Chorowski et al. Unsupervised speech representation learning using WaveNet autoencoders (2019)
Mar 18 (Jinyi)
Kim et al. Joint CTC-Attention based end-to-end speech recognition using multi-task learning (2016)
Xiao et al. Hybrid CTC-Attention based end-to-end speech recognition using subword units (2018)
Yuan et al. An improved hybrid CTC-Attention model for speech recognition (2018)
Mar 11 (Desh)
Povey et al. A time-restricted self-attention layer for ASR (2018)
Sperber et al. Self-attentional acoustic models (2018)
Mar 4 (Vimal, Hossein)
Povey et al. Purely sequence-trained neural networks for ASR based on lattice-free MMI (2016)
Hadian et al. End-to-end speech recognition using lattice-free MMI (2018)
Feb 25 (Raghu)
A walk through the back-propagation algorithm
Hypothesis testing
Feb 18
A Primer on GANs (Phani)
Goodfellow et al. Generative Adversarial Networks (2014)
Radford et al. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks (2015)
Mao et al. Least Squares Generative Adversarial Networks (2016)
Introduction to speech enhancement (Saurabh)
Feb 11
Introduction to OCR (Jonathan, Ashish)
Chung et al. Unsupervised Cross-Modal Alignment of Speech and Text Embedding Spaces (2018) (Jaejin)
Feb 4
Introduction to ASR (Jinyi, Desh, Ruizhe)
Introduction to speaker verification (Fei, Jiamin, Zili)