SYnthesizing SPeech in INdian languages (SYSPIN)

Empowering voice technology in nine Indian languages to foster AI innovation and accessibility.

The SYSPIN project offers publicly available, studio-recorded text-to-speech (TTS) datasets in nine Indian languages. The validated speech and text files are intended for academic research, industrial development, and innovation in TTS. The corpus, created by IISc Bengaluru, is released under a CC-BY-4.0 license, giving open access to researchers and developers working to advance speech technology in Indian languages.

Total Duration (hr): 920
Total Languages: 9
Total Speakers: 18
Total Sentences: 462311
Recording Specs: 48 kHz, 24-bit

Citations:

When this corpus is used, please cite the following reference:

Abhayjeet et al. ‘SYSPIN_S1.0 Corpus - A TTS Corpus of 900+ hours in nine Indian Languages’, 2025.

Funded by GIZ

© 2024 SYnthesizing SPeech in INdian languages
