How Startups Are Building on Vaani
Discover how innovators are leveraging the Vaani dataset to solve real-world problems in speech, AI, and beyond.
What Startups Are Saying
“Vaani's dataset, with its multiple languages, real-life scenarios, background noises, and multi-dialect conversations, is very well-suited for the speech models we train. We have seen good results from fine-tuning our models on these datasets. The work done by the Vaani team in gathering datasets to support real-life use cases for Bharat is truly commendable.”
Convozen.ai
Zaher Abdul, Senior Director AI & ML
“The Vaani Datasets have been invaluable in improving our Speech Models. The quality is excellent, with a great balance of gender variation, detailed metadata, and highly accurate transcripts with precise noise tagging.”
Reverie Language Technology LTD
Pranjal Nayak, Head of R&D
“At SandLogic, we believe India’s AI future must be sovereign, inclusive, and representative of our people. The Vaani dataset captures the richness of Indian speech and has helped us benchmark and enhance our models for stronger performance in both research and enterprise use cases.”
SandLogic Technologies
Dr. Kruthika K R, Founding Researcher
Case Studies
SandLogic
Fine-Tuning Hindi ASR for Real-World Call Analytics Using the Vaani Dataset
Problem Statement:
Generic ASR systems fail to accurately transcribe spoken Hindi in call centers, with baseline Word Error Rates (WER) above 55%. At that error rate, call analytics become unreliable, which hurts compliance, customer experience, and agent performance.
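For readers unfamiliar with the metric, WER is conventionally the number of word-level substitutions, deletions, and insertions needed to turn the ASR output into the reference transcript, divided by the number of words in the reference. The sketch below is a minimal illustration of that standard definition, not SandLogic's evaluation code; the function name `word_error_rate` and the sample sentences are hypothetical, and text normalization (punctuation, casing, numerals) is assumed to have been done beforehand.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (classic Levenshtein recurrence).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one of four reference words was dropped by the ASR.
wer = word_error_rate("mera account balance batao", "mera account batao")
print(f"WER = {wer:.2%}")
```

Under this definition, a WER above 55% means that more than half of the reference words need an edit to be recovered. Because insertions count as errors, WER can also exceed 100%.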
Solution:
To solve this, SandLogic fine-tuned its proprietary 769M-parameter ASR model using a curated, multi-accent Hindi subset of the ARTPARK-IISc/Vaani dataset.