Voice-To-Voice Models And Beyond Meat: Still Not Ready For Mass Consumption


Forbes, 18 June 2025
Arkadiy Telegin is the cofounder and CTO of Leaping AI, a conversational AI platform supporting customer experience departments worldwide.
I'm vegan. So when plant-based meat started going mainstream, I was elated. The tech was impressive, the marketing confident and, for a moment, it felt like we were on the cusp of a food revolution. Impossible Burgers hit Burger King. Beyond was everywhere. Investors poured in. The future, it seemed, had arrived.
Except it hadn't.
Today, plant-based meat remains a niche product. Prices are high, availability is inconsistent and adoption is slower than expected. It's not that the products disappeared. They just haven't integrated into everyday life the way we imagined. This is a classic case of psychological distance: a cognitive bias in which things that feel close because they're exciting or well-promoted turn out to be farther off than we think.
In voice AI, voice-to-voice model development is going through the same thing. Despite recent latency, reasoning and sound quality improvements, there's been a stubborn insistence on using older, more established technologies to build conversational AI platforms. Why is that?
After LLMs appeared, the first commercial voice AI applications all used a 'cascading' approach following a three-step sequence:
• Speech-To-Text (STT): Transcribe the user's speech to text.
• Large Language Model (LLM): Use an LLM to respond to the transcribed user's speech.
• Text-To-Speech (TTS): Synthesize speech from your response and play it back.
This is a standard, time-tested approach that was in use even before LLMs came around, primarily for language translation.
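The three-step cascade above can be sketched in a few lines. This is a minimal illustration only: the `transcribe`, `respond` and `synthesize` functions are hypothetical stand-ins for real STT, LLM and TTS providers, not actual APIs.

```python
# Minimal sketch of the cascading STT -> LLM -> TTS pipeline.
# All three step functions are hypothetical stand-ins; a real system
# would call an STT service, a chat-completion API and a TTS engine.

def transcribe(audio_in: bytes) -> str:
    """STT step: stand-in that pretends the audio said 'hello'."""
    return "hello"

def respond(user_text: str) -> str:
    """LLM step: stand-in for a chat-completion call."""
    return f"You said: {user_text}"

def synthesize(reply_text: str) -> bytes:
    """TTS step: stand-in returning the reply as raw bytes."""
    return reply_text.encode("utf-8")

def handle_turn(audio_in: bytes) -> bytes:
    text = transcribe(audio_in)   # 1. Speech-To-Text
    reply = respond(text)         # 2. LLM response
    return synthesize(reply)      # 3. Text-To-Speech
```

One practical consequence of this design is visible even in the sketch: every step passes plain text across its boundary, so each intermediate result can be logged, searched or edited before the next step runs.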
Then, last fall, OpenAI launched its Realtime API, which promised a one-step speech-to-speech AI model capable of parsing audio directly to generate real-time responses, resulting in agents that sound much more human, can natively detect emotions and can be more 'tone aware.' OpenAI's entry into the space was the most commercially significant development yet, leading many to anticipate a new era for single-step voice-to-voice AI models that could feasibly be used in real-world applications.
Over six months later, Realtime API's launch has created a lot of excitement around direct speech-to-speech AI models; the recently announced Nova Sonic model from Amazon and Sesame's base model for its Maya assistant are just two examples. Yet when it comes to production-level applications, my industry colleagues and customers alike remain more comfortable with the status quo of multi-step pipelines, with no plans to change that any time soon.
There are a few key reasons why that is the case.
Working with audio presents inherent difficulties. Text is clean, modular and easily manipulated. It allows for storage, searchability and mid-call edits. Audio, in contrast, is less forgiving. Even post-call tasks like analysis and summarization often necessitate transcription. In-call operations, such as managing state or editing messages, are more cumbersome with audio.
Function calling is crucial in production use-cases—fetching data, triggering workflows, querying APIs. Currently, one-step voice-to-voice models lag in this area. Stanford computer science professor and DeepLearning.ai founder Andrew Ng, who also cofounded the Google Brain project, has publicly shared some of these limitations.
It is much easier to create and curate a good function-calling dataset for a text-based model than for a multimodal model. As a result, the function-calling capabilities of text-first models will continue to outperform those of voice-to-voice models. Considering that function calling is not yet perfect even for text models and is a crucial requirement for commercial applications, it will take some time before voice-to-voice models catch up to production standards.
Ng shares the example of gut-checking responses like "Yes, I can issue you a refund" to ensure refunds are allowable under current company policy before an API is called to issue the refund. That kind of check is straightforward to build into a cascading workflow but far less reliable in one-step pipelines, for the reasons stated above.
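The gut-check pattern Ng describes can be sketched as follows. Because a cascading pipeline works in text, the model's proposed action can be validated against policy before any API is invoked. All names here (`propose_action`, `REFUND_LIMIT`, the payload shape) are illustrative assumptions, not a real system.

```python
# Hypothetical sketch of a policy "gut check" in a cascading pipeline:
# intercept the model's proposed action as structured text and validate
# it against company policy before executing anything.

REFUND_LIMIT = 100.00  # assumed policy: larger refunds need a human

def propose_action(user_text: str) -> dict:
    """Stand-in for an LLM function call; a real system would parse
    the model's structured tool-call output here."""
    if "refund" in user_text.lower():
        return {"action": "issue_refund", "amount": 42.50}
    return {"action": "none"}

def policy_allows(action: dict) -> bool:
    """Deterministic check, independent of the model's own judgment."""
    if action["action"] != "issue_refund":
        return True
    return action["amount"] <= REFUND_LIMIT

def handle(user_text: str) -> str:
    action = propose_action(user_text)
    if action["action"] == "issue_refund":
        if policy_allows(action):
            # issue_refund_api(action["amount"])  # real API call goes here
            return f"Refund of ${action['amount']:.2f} issued."
        return "Let me connect you with a human agent."
    return "How can I help?"
```

In a one-step voice-to-voice model there is no equivalent text checkpoint between "the model decided to refund" and "the refund was spoken aloud," which is precisely why this pattern favors the cascading approach.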
Since OpenAI launched its Realtime API, developers have reported issues that make them uneasy about using it in production, including audio cutting off unexpectedly and hallucinations interrupting live conversations. Some hallucinations don't even get captured in the transcript, making them especially hard to catch and debug.
This isn't to say one-step voice-to-voice AI is a dead end. Far from it. The potential for enhanced user experience—handling interruptions, conveying emotion, capturing tone—is immense. Many in the industry, our team included, are actively experimenting, preparing for the moment when it matures. Startups and major players alike continue to invest in speech-native approaches as they anticipate a more emotionally resonant, real-time future.
In other words: It's a matter of when, not if.
In the meantime, multi-step pipelines for voice-to-voice AI models continue to win on reliability and production-readiness. With steady improvements, particularly in behavior and function calling, the moment for single-step models will come. Until then, the trusted cascading approach will carry the load, and I'm still not eating at Burger King.
Forbes Technology Council is an invitation-only community for world-class CIOs, CTOs and technology executives. Do I qualify?