Mint Explainer: Is OpenAI exaggerating the powers of its new ChatGPT Agent?

Mint, 5 hours ago
Leslie D'Monte
OpenAI has flagged the agent as high-risk under its safety framework. Is this just marketing hype or a sign that AI is genuinely becoming more powerful and autonomous?
OpenAI CEO Sam Altman. Photo: AFP
On Thursday, OpenAI launched its autonomous ChatGPT Agent, a tool that's capable of finding and buying things online, managing your calendar, and booking you an appointment with a doctor. It's essentially a digital assistant that doesn't just provide information but completes actual tasks.
That being said, OpenAI has flagged the agent as high-risk under its safety framework, warning it could potentially be used to create dangerous biological or chemical substances. Is this just marketing hype, timed to build momentum for the launch of GPT-5, or a sign that AI agents are genuinely becoming more powerful and autonomous, akin to the agents who protect the computer-generated world of The Matrix?

What is ChatGPT Agent?
Say you want to rearrange your calendar, find a doctor and schedule an appointment, or research competitors and deliver a report. ChatGPT Agent can now do it for you.
The agent can browse websites, run code, analyse data, and even create slide decks or spreadsheets, all based on your instructions. It combines the strengths of OpenAI's earlier tools, Operator (which could navigate the web) and deep research (which could analyse and summarise information), into a single system. You stay in control throughout: ChatGPT asks for permission before doing anything important, and you can stop or take over at any time. This new capability is available to Pro, Plus, and Team users through the tools dropdown.

How does it work?
ChatGPT Agent uses a powerful set of tools to complete tasks, including a visual browser to interact with websites like a human, a text-based browser for reasoning-heavy searches, a terminal for code execution, and direct application programming interface (API) access.
It can also connect to apps such as Gmail or GitHub to fetch relevant information. You can log in to websites within the agent's browser, allowing it to dig deeper into personalised content. All of this runs on its own virtual computer, which keeps track of context even across multiple tools.
The agent can switch between browsers, download and edit files, and adapt its methods to complete tasks quickly and accurately. It's built for back-and-forth collaboration: you can step in anytime to guide or change the task, and ChatGPT can ask for more input when needed. If a task takes time, you'll get updates and a notification on your phone once it's done.

Has OpenAI tested its performance?
OpenAI said that on Humanity's Last Exam (HLE), which tests expert-level reasoning across subjects, ChatGPT Agent achieved a new high score of 41.6, rising to 44.4 when multiple attempts were run in parallel and the most confident response was selected. On FrontierMath, the toughest known math benchmark, the agent scored 27.4% using tools such as a code-executing terminal, far ahead of previous models.
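The "multiple attempts in parallel, most confident response selected" procedure can be sketched roughly as follows. This is only an illustration of the general idea, not OpenAI's actual method: the `Attempt` type, the `run_attempt` stub and its canned confidence scores are all hypothetical stand-ins, and no real model is called.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Attempt:
    answer: str
    confidence: float  # hypothetical self-reported score in [0, 1]


def run_attempt(seed: int) -> Attempt:
    """Stand-in for one independent model run (hypothetical; no real API is called)."""
    canned = [
        Attempt("A", 0.55),
        Attempt("B", 0.80),
        Attempt("A", 0.62),
    ]
    return canned[seed % len(canned)]


def best_of_n(n: int) -> Attempt:
    """Run n attempts in parallel and keep the one with the highest confidence."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        attempts = list(pool.map(run_attempt, range(n)))
    return max(attempts, key=lambda a: a.confidence)


result = best_of_n(3)
print(result.answer, result.confidence)
```

Note that this picks the single most confident attempt rather than the most common answer, so a lone high-confidence response can outvote several agreeing but less confident ones.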
In real-world tasks, ChatGPT Agent performs at or above human levels in about half of the cases, based on OpenAI's internal evaluations. These tasks include building financial models, analysing competitors, and identifying suitable sites for green hydrogen projects.
ChatGPT Agent also outperforms others on specialised tests such as DSBench for data science and SpreadsheetBench for spreadsheet editing (45.5% vs Copilot Excel's 20.0%). On BrowseComp and WebArena, which test browsing skills, the agent achieves the highest scores to date, according to OpenAI.

What are some of the things it can do?
Consider the case of travel planning. The agent won't just suggest ideas but will navigate booking websites, fill out forms, and even make reservations once you give it permission.
You can also ask it to read your emails, find meeting invitations, and automatically schedule appointments in your calendar, or even draft and send follow-up emails. This level of coordination typically required juggling between apps, but the agent manages it in a single conversational flow.
Another example involves shopping and price comparison. You can tell the agent to 'order the best-reviewed smartphone under ₹15,000', and it can search online stores, compare prices and reviews, and proceed to checkout on a preferred platform. Customer support and task automation are other examples, where the agent is used to troubleshoot an issue, log into support portals, and even file return or refund requests.

How are AI agents typically built?
Unlike basic chatbots, AI agents are autonomous systems that can plan, reason, and complete complex, multi-step tasks with minimal input, such as coding, data analysis, or generating reports.
They are built by combining ways to take in information, think, and take action. Developers begin by deciding what the agent should do, after which the agent collects data, such as images, from its environment. AI agents use large language models (LLMs) like GPT-4 as their core 'brain', which allows them to understand and respond to natural language instructions.
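The "take in information, think, take action" cycle can be illustrated with a toy loop. This is a minimal sketch under stated assumptions: `fake_llm` is a rule-based stub standing in for a real language model, and the `calculator` tool and `TOOLS` registry are hypothetical names chosen for illustration, not part of any real framework.

```python
from typing import Callable, Dict, List, Tuple


def calculator(expression: str) -> str:
    """Hypothetical tool: evaluates 'a op b' for +, - and * only."""
    a, op, b = expression.split()
    x, y = float(a), float(b)
    results = {"+": x + y, "-": x - y, "*": x * y}
    return str(results[op])


# Registry of tools the agent may call (a real agent might wrap a browser or an API).
TOOLS: Dict[str, Callable[[str], str]] = {"calculator": calculator}


def fake_llm(goal: str, observations: List[str]) -> Tuple[str, str]:
    """Rule-based stand-in for the LLM 'brain' (an assumption, not a real model call).

    Returns (action, argument), where action is a tool name or 'finish'.
    """
    if not observations:
        return "calculator", "12 * 4"
    return "finish", f"{goal}: {observations[-1]}"


def run_agent(goal: str, max_steps: int = 5) -> str:
    observations: List[str] = []  # information the agent has taken in so far
    for _ in range(max_steps):
        action, arg = fake_llm(goal, observations)  # think
        if action == "finish":
            return arg
        observations.append(TOOLS[action](arg))  # act, then record the result
    return "stopped: step limit reached"


print(run_agent("Total cost of 12 items at 4 each"))
```

The step limit is a deliberate design choice in this sketch: it keeps a confused planner from looping indefinitely.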
To allow AI agents to take action, developers connect the LLM to things like a web browser, code editor, calculator, and APIs for services such as Gmail or Slack. Frameworks like LangChain help integrate these parts and keep track of information. Some AI agents learn from experience and get better over time. Testing and careful setup make sure they work well and follow rules.

Does ChatGPT Agent have credible competition?
Google's Project Astra, part of its Gemini AI line, is developing a multimodal assistant that can see, hear, and respond in real time. Gemini CLI is an open-source AI agent that brings Google's Gemini model directly to the terminal for fast, lightweight access. It integrates with Gemini Code Assist, offering developers on all plans AI-powered coding in both VS Code and the command line.
Microsoft is embedding Copilot into Windows, Office, and Teams, giving its agent access to workflows, system controls, and productivity tools, soon enhanced by a dedicated Copilot Runtime.
Meta is building more socially focused agents within messaging and the metaverse, which could evolve into utility tools.
Apple is revamping Siri through Apple Intelligence, combining GPT-level reasoning with strict privacy features and deep on-device integration.
Other smart agents include Oracle's Miracle Agent, IBM's Watson tools, Agentforce from Salesforce, Anthropic's Claude 3.5, and Perplexity AI's action-oriented agents through its Comet project, blending search with agentic behaviour.
The competitive advantage, though, may go to companies that can integrate these AI agents into everyday applications and call for action with a single, unified tool, a task that ChatGPT Agent has demonstrated.

Why did OpenAI warn that ChatGPT Agent could be used to trigger biological warfare?
OpenAI claimed ChatGPT Agent's superior capabilities could, in theory, be misused to help someone create dangerous biological or chemical substances. However, it clarified that there was no solid evidence it could actually do so.
Regardless, OpenAI is activating the highest level of safety measures under its internal 'preparedness framework'. These include thorough threat modeling to anticipate potential misuse, special training to ensure the model refuses harmful requests, and constant monitoring using automated systems that watch for risky behaviour. There are also clear procedures in place for suspicious activity.

Should we take this risk seriously?
Ja-Nae Duane, an AI expert, MIT Research Fellow and co-author of SuperShifts, said the more autonomous the agent, the more permissions and access rights it would require. For example, buying a dress requires wallet access; scheduling an event requires calendar and contact list access.
"While standard ChatGPT already presents privacy risks, the risks from ChatGPT Agent are exponentially higher because people will be granting it access rights to external tools containing personal information (like calendar, email, wallet, and more). There's a significant gap between the pace of AI development and AI literacy; many people haven't even fully understood ChatGPT's existing privacy risks, and now they're being introduced to a feature with exponentially more risks," she said.
Duane added that the key risks included data leaks, mistaken actions, prompt injection, and account compromise, especially when handling sensitive information. Malicious actors, she warned, could exploit these agents by manipulating inputs, abusing tool access, stealing credentials, or poisoning data to bias outputs. Poor third-party integrations and over-reliance on them could worsen the impact, while the agent's 'black box' nature would make it hard to trace errors, she added. In the wrong hands, these agents could be weaponised for fraud, phishing, or even to generate malware.

What are the other concern areas for enterprises?
Developers are increasingly deploying AI agents across IT, customer service, and enterprise workflows. According to Nasscom, 46% of Indian firms are experimenting with these agents, particularly in IT, HR, and finance, while manufacturing leads in robotics, quality control, and automation.
Beyond concerns around hallucinations, security, privacy, and copyright or intellectual property (IP) violations, a key challenge for businesses is ensuring a return on investment. Gartner noted that many so-called agentic use cases could be handled by simpler tools and predicted that more than 40% of such projects would be scrapped by 2027 over high costs, unclear value, or inadequate risk controls.
Of the thousands of vendors in this space, only around 130 are seen as credible; many engage in 'agent washing' by repackaging chatbots, robotic process automation (RPA), or basic assistants as autonomous agents. Nasscom corroborated these concerns, highlighting that 62% of enterprises were still only testing agents in-house.

Why is 'humans-in-the-loop' a must?
OpenAI CEO Sam Altman advised granting agents only the minimum access needed for each task, not blanket permissions. Nasscom believes that to scale responsibly, enterprises must prioritise human-AI collaboration, trust, and data readiness. It has recommended firms adopt AI agents with a 'human-in-the-loop' approach, reflecting the need for oversight and contextual judgment.
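The two ideas in this advice, per-task minimum access and a human sign-off on sensitive steps, can be combined in a small sketch. Everything here is hypothetical: the scope names (such as `wallet:pay`), the `TaskGrant` type and the `perform` function are illustrative inventions, not any vendor's actual permission model.

```python
from dataclasses import dataclass, field
from typing import Callable, Set


@dataclass
class TaskGrant:
    """Permissions granted for one task only (scope names are hypothetical)."""
    scopes: Set[str] = field(default_factory=set)


# Actions that always need explicit human approval, even when granted.
SENSITIVE = {"wallet:pay", "email:send"}


def perform(action: str, grant: TaskGrant, approve: Callable[[str], bool]) -> str:
    """Allow an action only if it is in the task's grant and, if sensitive, approved."""
    if action not in grant.scopes:
        return f"denied: {action} not granted for this task"
    if action in SENSITIVE and not approve(action):
        return f"blocked: human declined {action}"
    return f"done: {action}"


# A shopping task gets calendar read and payment scopes, but not email.
grant = TaskGrant({"calendar:read", "wallet:pay"})
print(perform("calendar:read", grant, lambda a: False))
print(perform("wallet:pay", grant, lambda a: False))
print(perform("email:send", grant, lambda a: True))
```

In this sketch the grant check runs before the approval prompt, so a human is never asked to approve something the task was not scoped for in the first place.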
According to Duane, users must understand both the tool's strengths and its limits, especially when handling sensitive data. Caution is key, as misuse could have serious consequences. She also emphasised the importance of AI literacy, noting that AI was evolving far faster than most people's understanding of how to use it responsibly.