SYnthesizing SPeech in INdian languages (SYSPIN)

Empowering voice technology in nine Indian languages to foster AI innovation and accessibility.

The SYSPIN project offers publicly available, studio-recorded text-to-speech (TTS) datasets in nine Indian languages. The validated speech and text files are intended for academic research, industrial development, and innovation in speech synthesis. The corpus, created by IISc Bengaluru, is released under a CC-BY-4.0 license, which keeps it open to researchers and developers working to advance speech technology in Indian languages.

Total Duration (hr)

920

Total Languages

9

Total Speakers

Total Sentences

462311

Recording Specs

48 kHz, 24-bit
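
As a minimal sketch, the recording specs above can be checked on a downloaded file with Python's standard `wave` module. This assumes the audio is distributed as uncompressed PCM WAV files, which the page does not state; the function name and the file path in the usage note are hypothetical.

```python
import wave

# Specs stated on this page: 48 kHz sampling rate, 24-bit samples.
EXPECTED_RATE_HZ = 48_000
EXPECTED_SAMPLE_WIDTH_BYTES = 3  # 24 bits = 3 bytes per sample


def matches_recording_specs(path):
    """Return True if the PCM WAV file at `path` is 48 kHz, 24-bit.

    Assumption: the file is a plain PCM WAV that the `wave` module can read.
    """
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == EXPECTED_RATE_HZ
            and wav.getsampwidth() == EXPECTED_SAMPLE_WIDTH_BYTES
        )
```

For example, `matches_recording_specs("sample.wav")` (a hypothetical path) returns `True` only when both the sampling rate and the sample width match. If the files turn out to use a different container or encoding, a different reader would be needed.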

Citations:

When this corpus is used, please cite the following reference:

Abhayjeet et al. 'SYSPIN_S1.0 Corpus - A TTS Corpus of 900+ hours in nine Indian Languages', 2025.

