List of Speech Corpora

From CLSP Wiki

This page is not an exhaustive list of the LDC corpora available at /export/corpora*/LDC. To check whether a particular LDC corpus is available, see How to find LDC corpora on the CLSP network.
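As a rough illustration of checking the /export/corpora*/LDC locations mentioned above, the following minimal Python sketch scans every directory matching that pattern for entries whose names contain a given catalog ID. It is not the procedure documented on the "How to find LDC corpora" page. It assumes that corpora are stored in directories named after their catalog ID, which may not hold for every corpus, and the function name find_ldc_corpus is hypothetical.

```python
import glob
import os


def find_ldc_corpus(catalog_id, pattern="/export/corpora*/LDC"):
    """Return directories under the LDC roots whose name contains catalog_id.

    Assumption: each corpus lives in a directory named after its catalog ID
    (for example .../LDC/LDC93S1). The match is case-insensitive.
    """
    matches = []
    for root in sorted(glob.glob(pattern)):
        if not os.path.isdir(root):
            continue
        for entry in sorted(os.listdir(root)):
            path = os.path.join(root, entry)
            if catalog_id.lower() in entry.lower() and os.path.isdir(path):
                matches.append(path)
    return matches


if __name__ == "__main__":
    # Example: look for TIMIT (LDC93S1) on the CLSP network.
    for path in find_ldc_corpus("LDC93S1"):
        print(path)
```

An empty result only means that no directory name matched. The corpus may still exist under a different name or only on CD.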

Speech samples with phonetic transcriptions

Telephone speech

LDC96S46
LDC96S47
LDC96S57
LDC96S58
LDC96S49
LDC96S34
LDC96S35
LDC96S37
LDC97S43
LDC97S45
NIST Speech Disc R63_1_1
NIST Speech Disc R76_1_1
LDC98T27
LDC98S69
LDC95S22
LDC93S2 NTIMIT
LDC94S18
LDC95S27
LDC93S8

LDC96S41

Broadcast

Broadcast Speech
All corpora are on CDs and can be found in Barton 320.

LDC97S44
LDC2001S91
LDC98S74
LDC2000S86
LDC2000S89

Microphone speech

Microphone Speech
All corpora are on CDs and can be found in Barton 320.


LDC93S3B
LDC93S3C
LDC93S1

LDC93S6A
LDC94S13A

Mobile-radio speech

Mobile-Radio Speech
All corpora are on CDs and can be found in Barton 320.

LDC94S14B
LDC94S14C
LDC94S14D

Uncategorized speech corpora

  • AMI Meeting Corpus
/export/corpora4/ami
  • AURORA
/export/corpora5/AURORA
  • Automatic Speech Recognition In Reverberant Environments (ASpIRE)
/export/corpora5/ASpIRE
  • Buckeye corpus
/export/corpora5/buckeye
  • CHiME Speech Separation and Recognition Challenge data
/export/corpora5/CHiME
/export/corpora/CHiME4
  • CSJ Corpus of Spontaneous Japanese
  • LILA Cellular Telephone Speech Databases
/export/corpora5/Appen
  • LRE 2009 and 2011 (Language Recognition Evaluation)
/export/corpora5/NIST
  • JHU Music Speech and Noise Corpus (MUSAN)
/export/corpora/JHU/musan
  • OpenSAT
/export/corpora5/
  • REVERB
/export/corpora5/REVERB_2014
  • SRE
/export/corpora5/SRE
  • Switchboard data
/export/corpora/MSU
  • TED speeches corpus
/export/corpora/TEDSPEE
  • TEDLIUM
/export/corpora5/TEDLIUM_release1
/export/corpora5/TEDLIUM_release2
  • TIDIGITS (Studio Quality Speaker-Independent Connected-Digit Corpus)
/export/corpora4/NIST
  • VoxCeleb speaker ID corpus
/export/corpora/VoxCeleb
  • TransTac speech-to-speech translation
/export/corpora2/TRANSTAC