Vaani Atypical Speech Corpus
Speech technology doesn't work for everyone yet.
The Gap in Speech Technology
Today's ASR systems are built on the assumption of standard speech. But a large number of people speak in ways that differ from that assumption.
For them, voice interfaces break. Dictation fails. Assistive tools fall short.
This gap becomes even sharper in multilingual contexts like India.
A New Kind of Speech Dataset
The Vaani Atypical Speech Corpus is an early effort to address this gap. Built under Project Vaani, in collaboration with Project Euphonia (Google), this dataset focuses on atypical speech in Indic languages — a space that remains largely underrepresented.
- Real-world speech, not lab-controlled recordings
- Diverse speech patterns across conditions such as autism, cerebral palsy, Down syndrome, and speech & hearing impairments
- No reliance on clinical labels — focused on how people actually speak
For anyone working on ASR robustness, accessibility, or personalization, this kind of data is hard to find elsewhere.
Project Euphonia
Built in collaboration with Project Euphonia, a Google Research initiative working to make speech recognition work for everyone.
How the Data Was Collected
Data Collection
- Participants describe images in their own words
- ~20 recordings per participant
- 20–40 seconds each
- Data collected with Karya
Transcription
- Done by people familiar with the speaker
- Captures intended meaning, not just verbatim text
Validation
- Automated + manual checks
- Ensures audio quality
- Verifies natural speech and safe content
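The page does not describe what the automated checks actually do. As one illustration only, the sketch below shows a duration check based on the 20–40 second recording guideline listed under Data Collection. The function names, the bounds as constants, and the assumption that clips are stored as PCM WAV files are all hypothetical and not part of the Vaani pipeline.

```python
import wave

# Assumed bounds, taken from the "20–40 seconds each" collection guideline.
MIN_SECONDS = 20.0
MAX_SECONDS = 40.0


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def duration_in_range(seconds, low=MIN_SECONDS, high=MAX_SECONDS):
    """Hypothetical automated check: is the clip within the target length?"""
    return low <= seconds <= high
```

A clip would be flagged for manual review when `duration_in_range(wav_duration_seconds(path))` is false. A real pipeline would likely combine this with other signals, such as clipping or silence detection, which are not sketched here.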
Expanding What "Speech Diversity" Means
Project Vaani has focused on capturing linguistic diversity across India.
This dataset expands that vision — from how language varies to how speech itself varies.
It is an early step toward building speech technologies that work not just across languages, but across people.
Explore and Build
Early-stage dataset — small today, but highly relevant for research and experimentation.
- Test robustness beyond standard benchmarks
- Fine-tune models for real-world inclusivity
- Explore personalization and adaptive systems
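For the robustness use case above, a common starting point is word error rate (WER): the word-level edit distance between a reference transcript and an ASR hypothesis, divided by the number of reference words. The following is a minimal, dependency-free sketch. The function name is illustrative, it does no text normalization, and whitespace tokenization is an assumption that may not suit every Indic language or script.

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current

    return previous[-1] / len(ref)
```

For example, `word_error_rate("the cat sat on the mat", "the cat sat on mat")` counts one deleted word out of six reference words, giving roughly 0.167. Because the transcripts here capture intended meaning rather than strictly verbatim text, WER against them is best read as distance from what the speaker meant to say.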
If you are working on similar problems or collecting related data, you may reach out to us at vaanicontact@gmail.com.