Latest news with #SoketAILabs


Mint
5 hours ago
- Business
Beyond text: Why voice is emerging as India's next frontier for AI interaction
Voice is fast becoming the defining layer of human-AI interaction in India, despite being the most challenging to train. Artificial intelligence (AI) startups are sharpening their focus on sculpting this interaction with design, authentic emotion, and intent. Yet, India presents a unique challenge: the sheer diversity of its accents, languages, and tonalities. Unlike text, which is relatively uniform, spoken language is richly layered—with cultural nuances, colloquialisms and emotion. Startups building voice-first AI models are now doubling down on one thing above all else: the depth and diversity of datasets.

Why voice is emerging as the frontline interface

In India, where oral tradition plays a pivotal role in communication, voice isn't just a convenience—it's a necessity. "We're not an English-first or even a text-first country. Even when we type in Hindi, we often use the English script instead of Devanagari. That's exactly why we need to build voice-first models—because oral tradition plays such a vital role in our culture," said Abhishek Upperwal, chief executive officer (CEO) of Soket AI Labs.

Voice is also proving critical for customer service and accessibility. "Voice plays a crucial role in bridging accessibility gaps, particularly for users with disabilities," said Mahesh Makhija, leader, technology consulting, at EY. "Many customers even prefer voicing complaints over typing, simply because talking feels more direct and human. Moreover, voice is far more frictionless than navigating mobile apps or interfaces—especially for users who are digitally illiterate, older, or not fluent in English," said Makhija, adding that "communicating in vernacular languages opens access to the next half a billion consumers, which is a major focus for enterprises."

Startups such as Gnani.ai are already deploying voice systems across banking and financial services to streamline customer support, assist with loan applications, and eliminate virtual queues. "The best way to reach people—regardless of literacy levels or demographics—is through voice in the local language, so it's very important to capture the tonality of the conversations," said Ganesh Gopalan, CEO of Gnani.ai.

The hunt for rich, real-world data

As of mid-2025, India's AI landscape shows a clear tilt toward text-based AI, with over 90 Indian companies active in the space, compared to 57 in voice-based AI. Text-based platforms tend to focus on document processing, chat interfaces, and analytics. In contrast, voice-based companies are more concentrated in customer service, telephony, and regional language access, according to data from Tracxn.

In terms of funding, voice-first AI startups have attracted larger funding rounds at later stages, while text AI startups show broader distribution, especially at earlier stages. One voice-first AI firm, for example, has raised a total of $47.6 million across five funding rounds, while another has cumulatively secured around $102 million, including a major $78.15 million Series C round in 2021, making it one of the top-funded startups in voice AI, data from Tracxn shows.

However, data remains the foundational challenge for voice models. Voice AI systems need massive, diverse datasets that cover not only different languages but also regional accents, slang and emotional tonality. Chaitanya C., co-founder and chief technology officer of Ozonetel Communications, put it simply: "The datasets matter the most—speaking as an AI engineer, I can say it's not about anything else; it's all about the data."
The IndiaAI Mission has allocated ₹199.55 crore for datasets—just about 2% of the mission's total ₹10,300 crore budget—while 44% has gone to compute. "Investments solely in compute are inherently transient—their value fades once consumed. On the other hand, investments in datasets build durable, reusable assets that continue to deliver value over time," said Chaitanya.

He also emphasized the scarcity of rich, culturally relevant data in regional languages like Telugu and Kannada. "The amount of data easily available in English, when compared with Telugu and Kannada or Hindi, it's not even comparable," he said. "Somewhere it's just not perfect, it wouldn't be as good as an English story, which is why I wouldn't want it to tell a Telugu story for my kid." "Some movie comes out, nobody's going to write it in government documents, but people are going to talk about it, and that is lost," he added, pointing out that government datasets often lack cultural nuance and everyday language.

Gopalan of Gnani.ai agreed. "The colloquial language is often very different from the written form. Language experts have a great career path ahead of them because they not only understand the language technically, but also know how to converse naturally and grasp colloquial nuances."

Startups are now employing creative methods to fill these gaps. "First, we collect data directly from the field using multiple methods—and we're careful with how we handle that data. Second, we use synthetic data in some cases. Third, we augment that synthetic data further. In addition, we also leverage a substantial amount of open-source data available from universities and other sources," Gopalan said. Synthetic data is artificially generated data that mimics real-world data for use in training, testing, or validating models.

Upperwal added that Soket AI uses a similar approach: "We start by training smaller AI models with the limited real voice data we have. Once these smaller models are reasonably accurate, we use them to generate synthetic voice data—essentially creating new, artificial examples of speech."
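What "augmenting" voice data looks like in practice can be sketched in a few lines of Python. The snippet below is a minimal, generic illustration using only NumPy; the specific transforms (added noise, gain shifts and speed perturbation) are assumptions chosen for illustration, not a description of any company's actual pipeline.

```python
# Minimal, generic audio-augmentation sketch (illustrative assumption, not any
# company's actual pipeline): perturb one utterance into several training variants.
import numpy as np

def add_noise(wave: np.ndarray, snr_db: float = 20.0) -> np.ndarray:
    """Mix in white noise at a target signal-to-noise ratio (in dB)."""
    signal_power = np.mean(wave ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = np.random.normal(0.0, np.sqrt(noise_power), size=wave.shape)
    return wave + noise

def change_gain(wave: np.ndarray, gain_db: float) -> np.ndarray:
    """Make the recording louder or quieter by a given number of decibels."""
    return wave * (10 ** (gain_db / 20))

def speed_perturb(wave: np.ndarray, factor: float) -> np.ndarray:
    """Resample the waveform to simulate faster (>1.0) or slower (<1.0) speech."""
    old_idx = np.arange(len(wave))
    new_idx = np.linspace(0, len(wave) - 1, int(len(wave) / factor))
    return np.interp(new_idx, old_idx, wave)

def augment(wave: np.ndarray) -> list[np.ndarray]:
    """Return a handful of perturbed copies of a single utterance."""
    return [
        add_noise(wave, snr_db=15.0),
        change_gain(wave, gain_db=-6.0),
        speed_perturb(wave, factor=0.9),   # slower speech
        speed_perturb(wave, factor=1.1),   # faster speech
    ]

if __name__ == "__main__":
    # Stand-in for a real recording: one second of a 440 Hz tone at 16 kHz.
    t = np.linspace(0, 1, 16000, endpoint=False)
    utterance = 0.1 * np.sin(2 * np.pi * 440 * t)
    print([len(v) for v in augment(utterance)])
```

In a real pipeline, perturbed copies like these would be fed back into training alongside the original recordings, stretching limited field and synthetic data further.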
However, some intend to consciously stay away from synthetic data. Ankush Sabarwal, CEO and founder of CoRover AI, said the company relies exclusively on real data, deliberately avoiding synthetic data: "If I am a consumer and I am interacting with an AI bot, the AI bot will become intelligent by the virtue of it interacting with a human like me."

The ethical labyrinth of voice AI

As companies begin to scale their data pipelines, the new Digital Personal Data Protection (DPDP) Act will shape how they collect and use voice data. "The DPDP law emphasizes three key areas: it mandates clear, specific, and informed consent before collecting data. Second, it enforces purpose limitation—data can only be used for legitimate, stated purposes like KYC or employment, not unrelated model training. Third, it requires data localization, meaning critical personal data must reside on servers in India," said Makhija. He added, "Companies have begun including consent notices at the start of customer calls, often mentioning AI training. However, the exact process of how this data flows into model training pipelines is still evolving and will become clearer as DPDP rules are fully implemented."

Outsourcing voice data collection raises red flags, too. "For a deep-tech company like ours, voice data is one of the most powerful forms of IP (intellectual property) we have, and outsourcing it could compromise its integrity and ownership. What if someone is using copyrighted material?" said Gopalan.


Mint
16-06-2025
- Business
Inside the making of India's default internet interface
Bengaluru: Geeta Nikam, 38, speaks into her smartphone in Marathi as she makes her way through a bustling vegetable market in Hiware Bazar, a village in Maharashtra, looking for seeds for her farm. A first-time internet and mobile phone user, Nikam has never typed a word. Keyboards, especially in Indic scripts, feel alien.

In Ludhiana, a large textile manufacturer with crores of rupees in revenue spends his entire working day talking to people on his phone to get tasks done. A computer system loaded with the business software of the world is useless to him.

Nikam and the textile manufacturer are part of the "non-typing majority" among India's 900 million internet users. These are primarily people from Tier II, Tier III cities and villages, where English is uncommon and digital literacy is just emerging. Their preference for voice communication highlights a fundamental need for new interaction methods. "Nobody's typing in Gujarati or Marathi," says Abhishek Upperwal, founder of Soket AI Labs.

Founded in 2019 by Upperwal, Soket AI Labs is an AI research company developing multilingual large language models such as Pragna-1B for Indian languages. It is among the four startups selected by the government under its IndiaAI initiative to co-develop indigenous AI systems. Voice remains the "primary interface" for most Indians, says Upperwal. These users, including rural entrepreneurs, gig workers and homemakers, are reshaping India's internet, demanding tools that listen and respond in their native languages.

Those demands are slowly being addressed by AI startups. In the drought-prone villages of Maharashtra, for instance, farmers can now learn about crop insurance, credit eligibility and weather-resilient agriculture without reading a single word. They receive three-minute voice calls from bots deployed by a local non-banking financial company (NBFC), in partnership with Bengaluru-based conversational AI firm Gnani.ai. The bot speaks in their own dialect, poses simple questions and delivers tailored advice. In one pilot, more than 15,000 farmers across 120 villages received weekly updates, and 38% of them adopted new crop diversification strategies using this information, according to the company.

Voice AI is improving accessibility and also reshaping how information is delivered at scale. In Tamil Nadu, the same technology was used to deliver financial literacy and health education to over 12,000 women in 85 villages. Following the initiative, 59% of the participants opened their first savings accounts and 41% reported improved medical savings behaviour, the company states.

These pilots, in some ways, demonstrate how a voice-native internet might function. "India's voice-first internet will likely be a dynamic blend of multilingual, context-aware, and highly personalized experiences," says Ganesh Gopalan, CEO and cofounder of Gnani.ai.

From GUIs to voice

In 2019, a Google report said that Hindi had become the second-most used language globally on Google Assistant, just behind English, indicating how voice was gaining ground in multilingual markets such as India. Around 60% of Indian users interact with voice assistants on their smartphones, making voice a core part of everyday digital life. A subsequent report by WATConsult found that 76% of Indian users were familiar with speech and voice-recognition technology, which reflects a natural shift in a country where smartphone access is high, but digital and linguistic literacy is not evenly distributed.
A growing reliance on voice has emerged as a workaround to the limitations of graphical user interface (GUI)-based systems, which require users to be comfortable navigating English-language menus. A GUI lets you interact with computers and devices using visuals such as buttons, icons and menus. This method of using computers started evolving in the 1970s with Xerox PARC's Alto. GUIs gained popularity with Apple's Macintosh in 1984 and then expanded through Microsoft Windows, transforming computing from complex text-based command systems into a more visual experience.

"GUIs ruled for decades because Apple and Microsoft made screens and clicks the global standard, but they're a poor fit for India's chaotic and voice-driven markets," says Tushar Shinde, founder of Vaani Research, an enterprise voice AI startup. "With millions of non-typing users juggling dialects and high-volume businesses, voice is the natural interface. It's how we've always connected," he adds. "Many entrepreneurs are earning crores in revenue but barely use any software, because they spend most of their day just talking to people. The systems built for them were never in a format they found acceptable," says Shinde, referring to enterprise software built specifically for small and medium businesses. "That's where voice comes in." His startup builds voice agents for insurance, banking and healthcare clients.

Voice as an interface is now gaining prominence with advanced AI, as it offers hands-free convenience and challenges the long-standing dominance of GUIs in many everyday interactions. This is especially relevant in markets such as India, where GUI-based systems designed with dropdown menus, buttons, toggles and input fields require a level of literacy and linguistic ease that many users simply don't have. Though Indic keyboards were introduced as an alternative for users like Nikam, they remain clunky and unreliable. Autocorrect often yields incorrect results, and filling out forms in Hindi or Tamil becomes a frustrating ordeal. Even targeted solutions such as Indus OS, founded by an IIT Bombay alumnus to create a multilingual app ecosystem, struggled to take off due to their heavy reliance on text navigation. In contrast, voice, which is rooted in India's oral culture, is now emerging as a more intuitive bridge to digital access. Unlike GUI-based apps, voice systems deliver content naturally in the user's own words. A 2025 study cited by Gnani suggests that educational content delivered in local dialects results in 47% higher retention than standardized language formats.

Government push

India's voice AI sector is experiencing massive growth, driven by the country's linguistic diversity and increasing demand for voice-first digital interactions. Under the government's IndiaAI mission, launched in 2024 with a five-year budget of ₹10,372 crore, four startups, including Sarvam and Soket Labs, have been selected to build foundational AI models in India. Sarvam AI has developed Sarvam-M, a 24-billion-parameter multilingual large language model trained in 10 Indian languages, aiming to enhance reasoning tasks such as mathematics, coding and multilingual comprehension. Despite early criticism, the model is recognized for its technical achievements in building AI infrastructure within India. Gnani.ai, another startup selected under the mission, specializes in voice-first agentic AI solutions and supports over 40 languages, including 12 Indian languages. The platform handles more than 30 million voice interactions daily, serving over 150 enterprises across India and the US.
In Gurugram, Soket AI Labs is commercializing its Realtime Speech API, enabling AI agents to augment call centres with support for Hindi, Tamil and Marathi, addressing India's non-typing users. Last year, in one of its more ambitious endeavours, Soket AI Labs developed its foundational AI model, Pragna-1B (like OpenAI's ChatGPT), with a focus on Indian languages. But in the absence of venture capital funding for research efforts, Soket pivoted to building monetizable voice APIs for customer support, marketing and sales, which proved to be a more immediate route to revenue, one that still serves India's non-typing majority. Startups building voice-based applications in India lack the kind of institutional support and infrastructure readily available to their western counterparts. As a result, most are building voice-first applications for business use cases to continue to fund their foundational research journey and attract investments.

Tech chops and challenges

Building voice AI for India is as much a linguistic challenge as a technological one. "If there are five tokens for English, there will be almost 15 to 20 for Hindi," says Upperwal, pointing to the extra compute required for Indic languages. A token is a unit of text, such as a word, character or subword, which language models use to process and generate language. The more tokens a sentence requires, the more memory and compute power the model needs to handle it, making Indian languages more resource-intensive to train. Soket introduced a novel tokenization method that significantly reduces this burden, making AI systems faster and more cost-effective for Indian languages.
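The token inflation Upperwal describes can be seen with an off-the-shelf, English-centric byte-level BPE tokenizer. The short Python sketch below counts tokens for an English sentence and a rough Hindi equivalent using GPT-2's tokenizer loaded via the Hugging Face transformers library; the tokenizer and sample sentences are assumptions chosen for illustration and are not Soket's tokenizer or data.

```python
# Illustrative sketch (not Soket's tokenizer): count tokens for an English
# sentence and a rough Hindi equivalent under an English-centric byte-level
# BPE vocabulary such as GPT-2's.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

samples = {
    "English": "Where can I buy seeds for my farm?",
    "Hindi": "मुझे अपने खेत के लिए बीज कहाँ मिलेंगे?",  # rough Hindi equivalent
}

for language, sentence in samples.items():
    token_count = len(tokenizer.tokenize(sentence))
    print(f"{language}: {token_count} tokens")

# Because the vocabulary contains few Devanagari merges, the Hindi sentence
# is split into several times more tokens than the English one, which means
# more memory and compute per sentence during training and inference.
```

Tokenizers whose vocabularies are built on Indian-language corpora merge Devanagari sequences into larger units and shrink this gap, which is the kind of burden reduction described above.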
Upperwal says that, apart from core infrastructure, India's voice-first internet also relies on domain-specific intelligence tailored to Indian needs. "Sectors like education and law are especially underserved. Western LLM models like ChatGPT don't work well with Indian legal systems. It often mixes up Indian and US laws," he says, recalling how a legal AI startup ran into challenges while fine-tuning an open-source model after it began hallucinating hybrid jurisprudence. Soket is now exploring collaborations with domain experts to build models from scratch, rooted in Indian context and vernacular data. "We're not domain experts, we build systems. But if startups come in with their expertise, we can train models together and open-source them for the ecosystem," he explains.

Indian startups still rely heavily on models from OpenAI and Deepgram, trained on Western datasets. These models often misinterpret names, accents or local nuances, especially in sectors such as healthcare or banking, where clarity is critical. To close this gap, the IndiaAI mission has allocated subsidies and compute access to startups such as Soket and Vaani, encouraging them to build speech systems trained on Indian datasets. "These infrastructure breakthroughs don't just make AI cheaper. They make it accessible to speakers of Hindi, Gujarati or Marathi," says Shinde. "When an AI-enabled system can understand a manufacturer in a remote village asking for a loan, that's inclusion at the infrastructure level."

For Global South

Startups, meanwhile, are also rethinking how voice might reshape user behaviour. "Imagine booking a cab or shopping online with a single voice command," says Shinde. In low-bandwidth towns where GUIs fail to load, voice interfaces will allow people to interact with the web through language and not literacy. "We could go back to how people actually used to shop conversationally, through discovery, without filters and dropdowns," he adds.

According to Prashanth Prakash, founding partner at Accel, voice will become the default interface for India's most critical sectors and redefine how people interact with information. "In healthcare and education, voice won't replace doctors or teachers, but it will radically change explainability and access," he says. From appointment bookings to post-discharge summaries and classroom companions that personalize learning, voice interfaces offer a lower-friction, highly intuitive alternative. "Every doctor will have a co-pilot. Every workflow will start with voice," he adds.

The government thinks voice tools will play a key role in frontline service delivery. From crop advisories to citizen helplines, voice is being positioned as the cornerstone of public digital infrastructure, says Abhishek Singh, CEO of the IndiaAI mission. "If a voice-enabled advisory tool trained on Indian datasets can help a farmer here, it'll likely work anywhere in the Global South," Singh adds.

Limitations with voice

Voice-based applications offer a natural bridge to India's non-typing internet users, but the path is riddled with complexity. India's diverse linguistic landscape acts as a hurdle to accurate speech recognition. While voice bots are changing habits, making digital interaction easier for those unfamiliar with typing, accuracy drops sharply in noisy environments and with dialectal variations. "Replacing GUIs with voice at scale in India faces major design and infrastructure challenges, particularly for first-time digital users," says Gopalan of Gnani.ai, pointing out that ensuring accurate voice recognition across diverse accents, dialects and languages remains a challenge.

Farmer Geeta Nikam admits that she uses voice commands to browse ecommerce websites but has never made a purchase using her device. India needs a system that enables confidence, ease and convenience for someone like Nikam to not just browse but place an order successfully using voice commands. When a billion Indians like her are able to navigate the web effortlessly by speaking, just as others do by typing, India's internet will look and sound different.

Read more on IndiaAI Mission in tomorrow's Long Story.