Why AI Is A Double-Edged Sword—And What Companies Can Do About It
Shivam Shorewala, CEO of Rimble, is a globally sought-after speaker and business advisor specializing in AI and analytics.
Business leaders around the globe are touting the benefits of AI on earnings calls, and few enterprises aren't at least thinking about incorporating AI into their workflows. For all the hype around AI, it does deliver some real, outstanding value. Around 30% of code at Microsoft is now written by AI.
Great engineers can now be more productive, and some business leaders are slowing down hiring to keep their teams leaner. At the same time, call centers—long a difficult function to automate—now have AI agents triaging incoming calls and escalating only those that require human intervention, reducing costs by around 50% in some situations.
While these benefits are promising and there is real value to be delivered, there is a storm brewing beneath the surface that, if not addressed, could lead to a loss of trust among end users.
Hallucinations
We know LLMs hallucinate or get things wrong—sometimes misstating a fact, sometimes making up stories that could have national impact. The non-deterministic nature of LLMs is what makes them feel personalized, but it is also what can make them dangerous. It might seem intuitive that as models become more powerful, hallucinations become rarer and the models can be trusted more, but that is not always what the data says.
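To make that non-determinism concrete, here is a toy sketch (in Python, with made-up token scores, not any real model's API) of how an LLM samples its next word: whenever the sampling temperature is above zero, the same prompt can produce different continuations.

```python
# Toy illustration of non-deterministic token sampling. The vocabulary and
# logits are invented purely for illustration; real models work over tens of
# thousands of tokens, but the sampling idea is the same.
import math
import random


def sample_next_token(logits: dict, temperature: float = 1.0) -> str:
    # Temperature-scaled softmax: higher temperature flattens the distribution,
    # so less likely tokens get picked more often.
    scaled = {tok: score / temperature for tok, score in logits.items()}
    max_s = max(scaled.values())
    weights = {tok: math.exp(s - max_s) for tok, s in scaled.items()}
    r = random.random() * sum(weights.values())
    for tok, w in weights.items():
        r -= w
        if r <= 0:
            return tok
    return tok  # fallback for floating-point edge cases


# Fake scores for completing "The capital of France is ..."
logits = {"Paris": 2.0, "Lyon": 0.5, "Berlin": 0.1}
print([sample_next_token(logits, temperature=1.2) for _ in range(5)])   # varies run to run
print([sample_next_token(logits, temperature=0.01) for _ in range(5)])  # nearly always "Paris"
```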
In OpenAI's own evaluations on the PersonQA benchmark, the o3 model hallucinated 33% of the time when asked questions about public figures, while o4-mini hallucinated 48% of the time. The report noted that even OpenAI needs to conduct further research to better understand this behavior.
The entire point of LLM-driven chatbots is to reduce repetitive work, allow users to quickly get the information they need and enable businesses to operate more efficiently. But if every single output of AI needs to be cross-validated and rechecked for accuracy, business leaders might actually find significant bottlenecks in regular business processes.
Software engineering—a poster child for LLM-generated output—is not spared from this onslaught of hallucinations either. Beyond the familiar hallucinated variable name or the pesky bug caused by incompatibility with the existing codebase, a recent report covered by Ars Technica suggests LLMs consistently hallucinate package names. These aren't one-off errors but persistent patterns, and attackers can exploit them—for example, by publishing malicious packages under the names models reliably invent and waiting for developers to install them unchecked.
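One practical guard, sketched below, is to vet any dependency an LLM suggests before installing it. The helper name and the age threshold are illustrative assumptions; the only real piece is PyPI's public package index.

```python
# A minimal sketch: before installing a dependency an LLM suggests, check that
# the package actually exists on PyPI and has some history (a rough guard
# against freshly registered squatted names). Not a vetted security tool.
import sys
from datetime import datetime, timezone

import requests  # assumed available: pip install requests


def vet_package(name: str, min_age_days: int = 90) -> bool:
    resp = requests.get(f"https://pypi.org/pypi/{name}/json", timeout=10)
    if resp.status_code == 404:
        print(f"'{name}' does not exist on PyPI -- possibly hallucinated.")
        return False
    resp.raise_for_status()
    data = resp.json()

    # Find the earliest upload time across all published releases.
    upload_times = [
        datetime.fromisoformat(f["upload_time_iso_8601"].replace("Z", "+00:00"))
        for files in data["releases"].values()
        for f in files
    ]
    if not upload_times:
        print(f"'{name}' exists but has no published releases.")
        return False

    age_days = (datetime.now(timezone.utc) - min(upload_times)).days
    if age_days < min_age_days:
        print(f"'{name}' was first published {age_days} days ago -- review before installing.")
        return False

    print(f"'{name}' exists and has {age_days} days of history.")
    return True


if __name__ == "__main__":
    vet_package(sys.argv[1] if len(sys.argv) > 1 else "requests")
```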
The stakes are serious enough that insurers at Lloyd's of London have introduced policies specifically covering losses caused by hallucinations and mistakes from AI tools. These hallucinations can't simply be chalked up to quirks—they can lead to a loss of customer trust. Recently, a small claims court ruled that Air Canada must compensate a customer who was misled by the airline's chatbot.
What Companies Can Do
AI hallucinations are certainly scary, but they are also top of mind for LLM makers, with active research and new benchmarks emerging to track and mitigate them. As the AI wave hits every department, from engineering to customer support, leaders need to walk the fine line between scaling intelligently and maintaining trust.
To truly walk that fine line, leaders need to build in guardrails from day one. That means setting up strong human-in-the-loop review systems—not after problems show up but right from the start. Just as no team ships a product without QA, no team should rely on AI outputs without layers of validation.
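What such a guardrail might look like, in very rough terms: a gate that only lets high-confidence AI drafts reach the customer and routes everything else to a human reviewer. The draft object, confidence score and threshold below are hypothetical, meant only to show the shape of the control.

```python
# A minimal human-in-the-loop sketch: low-confidence drafts never reach the
# customer unreviewed. The confidence source and threshold are assumptions.
from dataclasses import dataclass
from typing import Optional


@dataclass
class DraftAnswer:
    text: str
    confidence: float  # assumed to come from the model or a separate scorer


REVIEW_QUEUE = []  # drafts waiting for a human to approve or rewrite


def handle_query(draft: DraftAnswer, threshold: float = 0.85) -> Optional[str]:
    if draft.confidence >= threshold:
        return draft.text        # auto-approved path
    REVIEW_QUEUE.append(draft)   # routed to a human reviewer instead
    return None                  # caller shows a "we'll follow up" message


# Example: a shaky refund answer gets held for review rather than sent.
print(handle_query(DraftAnswer("Your refund was approved.", confidence=0.62)))  # None
print(len(REVIEW_QUEUE))  # 1
```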
Leaders should push for tracking hallucination rates across real workflows, not just in benchmark tests but in live business environments. If an AI assistant answers hundreds of customer queries a day, how often is it getting things wrong, and more importantly, what's the cost when it does?
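One lightweight way to answer those questions is to audit a small sample of live answers and keep a running tally. The sketch below uses made-up figures for reviewer verdicts, query volume and cost per error; the point is the habit of measuring, not the specific numbers.

```python
# A rough sketch of measuring hallucination rates in live traffic rather than
# only on benchmarks, then converting the rate into a cost estimate.
from collections import Counter

verdicts = Counter()  # reviewer verdicts on sampled answers: "correct" / "hallucinated"


def record_verdict(verdict: str) -> None:
    verdicts[verdict] += 1


def hallucination_rate() -> float:
    total = sum(verdicts.values())
    return verdicts["hallucinated"] / total if total else 0.0


# Example: reviewers graded 50 sampled answers and flagged 4 as wrong.
for v in ["correct"] * 46 + ["hallucinated"] * 4:
    record_verdict(v)

rate = hallucination_rate()
daily_queries = 500    # assumed daily volume for the assistant
cost_per_error = 25.0  # assumed average cost of one wrong answer, in dollars
print(f"Audited hallucination rate: {rate:.1%}")                                # 8.0%
print(f"Rough daily exposure: ${rate * daily_queries * cost_per_error:,.0f}")   # $1,000
```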
Another thing leaders need to think about is transparency. If a system is AI-powered, users should know. If there's a confidence score or uncertainty measure behind the scenes, teams should explore ways to surface that. Giving users the right signals helps them decide whether to trust the output or double-check it.
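Surfacing that signal can be as simple as attaching a provenance label and a plain-language confidence band to every AI answer. The bands and wording in this sketch are assumptions, not a standard.

```python
# A small sketch of surfacing provenance and uncertainty to the end user:
# the internal confidence score is translated into a label the user can act on.
def present_answer(text: str, confidence: float) -> str:
    if confidence >= 0.9:
        band = "High confidence"
    elif confidence >= 0.6:
        band = "Medium confidence -- consider verifying"
    else:
        band = "Low confidence -- please verify with an agent"
    return f"[AI-generated answer | {band}]\n{text}"


print(present_answer("Checked bags up to 23 kg fly free on this fare.", 0.55))
```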
And inside the company, AI governance can't sit in one corner. It needs buy-in from engineering, product, legal, operations and leadership.
This isn't just about using AI. It's about using it with intention. I think the companies that figure this out won't just move faster. They'll build something much harder to copy: trust.