OpenAI's ex-policy lead accuses the company of 'rewriting' its AI safety history
A former policy lead at OpenAI is accusing the company of rewriting its history with a new post about the company's approach to safety and alignment.
Miles Brundage, the former head of policy research at OpenAI, criticized a recent post published by the company titled "How we think about safety and alignment."
In it, the company described the road to artificial general intelligence (AGI), an AI system that can perform all cognitive tasks as well as or better than a person, as a continuous evolution rather than a sudden leap. It also emphasized the value of "iterative deployment," which involves releasing AI systems, learning from how users interact with them, and then refining safety measures based on this evidence.
While Brundage praised the "bulk" of the post, he criticized the company for rewriting the "history of GPT-2 in a concerning way."
GPT-2, released in February 2019, was the second iteration of OpenAI's flagship large language model. At the time, it represented a much larger and more capable model than its predecessor, GPT-1, and was trained on a much broader dataset. But compared to subsequent GPT models, particularly GPT-3.5, the model that powered ChatGPT, GPT-2 was not particularly capable. It could write poetry and several coherent paragraphs of prose, but ask it to generate more than that, and its outputs often descended into strange non sequiturs or gibberish. It was not particularly good at answering factual questions, summarizing, coding, or most of the other tasks that people now address using LLMs.
Nonetheless, OpenAI initially withheld GPT-2's full release and source code, citing concerns about the potential for dangerous misuse of the model. Instead, it gave a select number of news outlets limited access to a demo version of the model.
At the time, critics, including many AI researchers in academia, argued that OpenAI's claims that the model presented a significantly increased risk of misuse were overblown or disingenuous. Some questioned whether OpenAI's claims were a publicity stunt: an underhanded way of hyping the unreleased model's capabilities and of ensuring that OpenAI's announcement would generate lots of headlines.
One AI-focused publication even penned an open letter urging OpenAI to release GPT-2, arguing its importance outweighed the risks. Eventually, OpenAI rolled out a partial version, followed by a full release months later.
In its recent safety post, OpenAI said the company didn't release GPT-2 due to "concerns about malicious applications." But it then essentially argued that some of OpenAI's former critics had been right and that the company's concerns about misuse had proved overblown and unnecessary. And it tried to argue that some of that excess of concern came from the fact that many of the company's AI safety researchers and policy staff assumed AGI would emerge suddenly, with one model suddenly leaping over the threshold to human-like intelligence, instead of emerging gradually.
"In a discontinuous world, practicing for the AGI moment is the only thing we can do, and safety lessons come from treating the systems of today with outsized caution relative to their apparent power. This is the approach we took for GPT‑2," OpenAI wrote.
However, Brundage, who was at the company when the model was released and was intimately involved in discussions about how the company would handle its release, argued that GPT-2's launch "was 100% consistent + foreshadowed OpenAI's current philosophy of iterative deployment."
"The model was released incrementally, with lessons shared at each step. Many security experts at the time thanked us for this caution," Brundage wrote on X.
He dismissed the idea that OpenAI's caution with GPT-2 was unnecessary or based on outdated assumptions about AGI. "What part of that was motivated by or premised on thinking of AGI as discontinuous? None of it," he wrote.
Brundage argued that the post's revisionist history serves to subtly bias the company in the direction of dismissing the concerns of AI safety researchers and releasing AI models, unless there is incontrovertible evidence that they present an immediate danger.
"It feels as if there is a burden of proof being set up in this section where concerns are alarmist + you need overwhelming evidence of imminent dangers to act on them - otherwise, just keep shipping," he said. "That is a very dangerous mentality for advanced AI systems."
"If I were still working at OpenAI, I would be asking why this blog post was written the way it was, and what exactly OpenAI hopes to achieve by poo-pooing caution in such a lop-sided way," he wrote.
OpenAI's blog post introduced two new ideas: the importance of iterative deployment and a slightly different approach to testing its AI models.
Robert Trager, the co-director of the Oxford Martin AI Governance Initiative, told Fortune that the company appeared to be distancing itself from relying heavily on theory when testing its models.
"It was like they were saying, we're not going to rely on math proving that the system is safe. We're going to rely on testing the system in a secure environment," he said.
"It makes sense to rely on all the tools that we have," he added. "So it's strange to say we're not going to rely so much on that tool."
Trager also said that iterative deployment works best when models are being deployed very often with minor changes between each release. However, he noted that this kind of approach may not be practical for OpenAI as some systems could be significantly different from what was deployed in the past.
"Their argument that there really won't be much of an impact, or a differential impact, from one system to the next, it doesn't seem quite to be convincing," he said.
Hamza Chaudhry, the AI and National Security lead at the Future of Life Institute, a non-profit that has raised concerns about AI's potential risk to humanity, also said that "relying on gradual rollouts may mean that potentially harmful capabilities and behaviors are exposed to the real world before being fully mitigated."
OpenAI also did not mention "staged deployment" in its blog post, which generally means releasing a model in various stages and evaluating it along the way: for example, allowing a small group of internal testers to access an AI model and assessing the results before releasing it to a larger set of users.
"The impression that it makes is that they're offering potential future justifications for actions that aren't necessarily consistent with what their safety standards have been in the past. And I would say that overall, they haven't made the case that new standards are better than earlier standards," Trager said.
Chaudhry said that OpenAI's approach to safety amounted to "reckless experimenting on the public," something that would not be allowed in any other industry. He also said this was "part and parcel of a broader push from OpenAI to minimize real government oversight over advanced high-stakes AI systems."
The post has been criticized by other prominent figures in the industry. Gary Marcus, professor emeritus of psychology and neural science at New York University, told Fortune the blog felt like "marketing" rather than an attempt to explain any new safety approaches.
"It's a way to hype AGI," he said. "And it's an excuse to dump stuff in the real world rather than properly sandboxing it before releasing and making sure it is actually ok. The blog is certainly not an actual solution to the many challenges of AI safety."
Over the past year, OpenAI has faced criticism from some AI experts for prioritizing product development over safety.
Several former OpenAI employees have quit over internal AI safety disputes, including prominent AI researcher Jan Leike.
Leike left the company last year at the same time as OpenAI co-founder Ilya Sutskever. He openly blamed the lack of safety prioritization at the company for his departure, claiming that over the past few years, "safety culture and processes have taken a backseat to shiny products." Leike and Sutskever were co-leading the company's Superalignment team at the time, which was focused on the long-term risks of superpowerful artificial intelligence that would be more capable than all of humanity. After the pair parted ways with the company, the team was dissolved.
Internally, employees said that OpenAI had failed to give safety teams the compute it had promised. In May last year a half-dozen sources familiar with the functioning of the Superalignment team told Fortune that OpenAI never fulfilled an earlier commitment to provide the safety team with 20% of its computing power.
The internal disagreements over AI safety have also resulted in an exodus of safety-focused employees. Daniel Kokotajlo, a former OpenAI governance researcher, told Fortune in August that nearly half of the company's staff that once focused on the long-term risks of superpowerful AI had left the company.
Marcus said that OpenAI had failed to live up to its purported principles and "instead they have repeatedly prioritized profit over safety (which is presumably part of why so many safety-conscious employees left)."
"For years, OpenAI has been pursuing a 'black box' technology that probably can't ever be properly aligned, and done little to seriously consider alternative, more transparent technologies that might be less short-term profitable but safer for humanity in the long run," he said.
Representatives for OpenAI did not respond to Fortune's request for comment. Brundage declined to provide further comments.
This story was originally featured on Fortune.com

Related Articles
Yahoo
AI referrals to top websites were up 357% year-over-year in June, reaching 1.13B
AI referrals to websites still have a way to go to catch up to the traffic that Google Search provides, but they're growing quickly. According to new data from market intelligence provider Similarweb, AI platforms in June generated over 1.13 billion referrals to the top 1,000 websites globally, a figure that's up 357% since June 2024. However, Google Search still accounts for the majority of traffic to these sites, accounting for 191 billion referrals during the same period of June 2025. One particular category of interest these days is news and media. Online publishers are seeing traffic declines and are preparing for a day they're calling 'Google Zero,' when Google stops sending traffic to websites. For instance, The Wall Street Journal recently reported on data that showed how AI overviews were killing traffic to news sites. Plus, a Pew Research Center study out this week found that in a survey of 900 U.S. Google users, 18% of some 69,000 searches showed AI Overviews, which led to users clicking links 8% of the time. When there was no AI summary, users clicked links nearly twice as much, or 15% of the time. Similarweb found that June's AI referrals to news and media websites were up 770% since June 2024. Some sites will naturally rank higher than others that are blocking access to AI platforms, as The New York Times does, as a result of its lawsuit with OpenAI over the use of its articles to train its models. In the news media category, Yahoo led with 2.3 million AI referrals in June 2025, followed by Yahoo Japan (1.9M), Reuters (1.8M), The Guardian (1.7M), India Times (1.2M), and Business Insider (1.0M). In terms of methodology, Similarweb counts AI referrals as web referrals to a domain from an AI platform like ChatGPT, Gemini, DeepSeek, Grok, Perplexity, Claude, and Liner. ChatGPT dominates here, accounting for more than 80% of the AI referrals to the top 1,000 domains. 
The company's analysis also looked at other categories beyond news, like e-commerce, science and education, tech/search/social media, arts and entertainment, business, and others. In e-commerce, Amazon was followed by Etsy and eBay when it came to those sites seeing the most referrals, at 4.5M, 2.0M, and 1.8M, respectively, during June. Among the top tech and social sites, Google, not surprisingly, was at the top of the list, with 53.1 million referrals in June, followed by Reddit (11.1M), Facebook (11.0M), GitHub (7.4M), Microsoft (5.1M), Canva (5.0M), Instagram (4.7M), LinkedIn (4.4M), Bing (3.1M), and Pinterest (2.5M). The analysis excluded the OpenAI website because so many of its referrals were from ChatGPT, pointing to its services. Across all other domains, the No. 1 site by AI referrals for each category included YouTube (31.2M), ResearchGate (3.6M), Zillow (776.2K), (992.9K), Wikipedia (10.8M), (5.2M), (1.2M), Home Depot (1.2M), Kayak (456.5K), and Zara (325.6K).


Tom's Guide
GPT-5 could be OpenAI's most powerful model yet — here's what early testing reveals
The next major language model for ChatGPT may be closer than we think, and early feedback suggests GPT-5 could be a serious upgrade. According to a new report from The Information, someone who's tested the unreleased model described it as a significant step forward in performance. While OpenAI hasn't confirmed when GPT-5 will launch inside ChatGPT or its API platform, CEO Sam Altman recently acknowledged using the model and enjoying the experience. That alone hints that OpenAI is preparing to roll out a more powerful assistant, one designed to improve in areas where earlier versions have started to plateau. The report suggests GPT-5 blends OpenAI's traditional GPT architecture with elements from its reasoning-focused 'o' models. That would give it the flexibility to adjust how much effort it puts into different tasks, doing quick work on easy queries, but applying deeper reasoning to complex problems. This approach mirrors Anthropic's Claude models, which already let users fine-tune how much 'thinking' the model does. In GPT-5's case, this could mean faster responses when you're asking something simple, and more thoughtful output for challenges like debugging code or solving abstract math problems. One of GPT-5's biggest reported strengths is software engineering. According to The Information, the model handles both academic coding challenges and real-world tasks, such as editing complex, outdated codebases, more effectively than previous GPT versions. That could make it especially appealing to developers, many of whom currently rely on competitors like Anthropic's Claude. A person who tested GPT-5 told The Information it outperformed Claude Sonnet 4 in side-by-side comparisons. That's just one data point and Claude Opus 4 is still considered Anthropic's most advanced model, but it signals OpenAI is serious about reclaiming ground in this space.
Here's where things get a little murky. Some researchers speculate GPT-5 might not be a single, brand-new model, but instead a routing system that dynamically selects the best model, GPT-style or reasoning-based, depending on your prompt. If that's true, it could signal a shift away from scaling traditional LLMs toward optimizing post-training performance through reinforcement learning and synthetic data. That's where models are fine-tuned using expert feedback after training and it's an area where OpenAI has been investing heavily. If GPT-5 lives up to early reports, it could help OpenAI win back developer mindshare and chip away at Anthropic's dominance in coding assistants, a market that could be worth hundreds of millions annually. It would also strengthen OpenAI's pitch to enterprise users and give its chip suppliers, like Nvidia, another reason to celebrate. For users of ChatGPT, the biggest change could be more efficient and accurate answers across the board, especially for bigger tasks that current models still struggle with. We'll have to wait and see what OpenAI officially announces in the coming weeks, but if GPT-5 is as strong as it sounds, the next wave of AI tools could be the most capable yet.


TechCrunch
Meta names Shengjia Zhao as chief scientist of AI superintelligence unit
Meta CEO Mark Zuckerberg announced Friday that former OpenAI researcher Shengjia Zhao will lead research efforts at the company's new AI unit, Meta Superintelligence Labs (MSL). Zhao contributed to several of OpenAI's largest breakthroughs, including ChatGPT, GPT-4, and the company's first AI reasoning model, o1. 'I'm excited to share that Shengjia Zhao will be the Chief Scientist of Meta Superintelligence Labs,' Zuckerberg said in a post on Threads Friday. 'Shengjia co-founded the new lab and has been our lead scientist from day one. Now that our recruiting is going well and our team is coming together, we have decided to formalize his leadership role.' Zhao will set a research agenda for MSL under the leadership of Alexandr Wang, the former CEO of Scale AI who was recently hired to lead the new unit. 'We are excited to announce that @shengjia_zhao will be the Chief Scientist of Meta Superintelligence Labs! Shengjia is a brilliant scientist who most recently pioneered a new scaling paradigm in his research. He will lead our scientific direction for our team. Let's go 🚀' (Alexandr Wang, @alexandr_wang, July 25, 2025) Wang, who does not have a research background, was viewed as a somewhat unconventional choice to lead an AI lab. The addition of Zhao, who is a reputable research leader known for developing frontier AI models, rounds out the leadership team. To further fill out the unit, Meta has hired several high-level researchers from OpenAI, Google DeepMind, Safe Superintelligence, Apple, and Anthropic, as well as pulling researchers from Meta's existing FAIR and GenAI units. Zuckerberg notes in his post that Zhao has pioneered several breakthroughs, including a 'new scaling paradigm.' The Meta CEO is likely referencing Zhao's work on OpenAI's reasoning model, o1, in which he is listed as a foundational contributor alongside OpenAI co-founder Ilya Sutskever. Meta currently doesn't offer a competitor to o1, so AI reasoning models are a key area of focus for MSL.
The Information reported in June that Zhao would be joining Meta Superintelligence Labs, alongside three other influential OpenAI researchers: Jiahui Yu, Shuchao Bi, and Hongyu Ren. Meta has also recruited Trapit Bansal, another OpenAI researcher who worked on AI reasoning models with Zhao, as well as three employees from OpenAI's Zurich office who worked on multimodality. Zuckerberg has gone to great lengths to set MSL up for success. The Meta CEO has been on a recruiting spree to staff up his AI superintelligence labs, which has entailed sending personal emails to researchers and inviting prospects to his Lake Tahoe estate. Meta has reportedly offered some researchers eight- and nine-figure compensation packages, some of which are 'exploding offers' that expire in a matter of days. Meta has also upped its investment in cloud computing infrastructure, which should help MSL conduct the massive training runs required to create competitive frontier AI models. By 2026, Zhao and MSL's researchers should have access to Meta's one gigawatt cloud computing cluster, Prometheus, located in Ohio.
Once online, Meta will be one of the first technology companies with an AI training cluster of Prometheus' size — one gigawatt is enough energy to power more than 750,000 homes. That should help Meta conduct the massive training runs required to create frontier AI models. With the addition of Zhao, Meta now has two chief AI scientists, including Yann LeCun, the leader of Meta's FAIR. Unlike MSL, FAIR is designed to focus on long-term AI research — techniques that may be used five to 10 years from now. How exactly Meta's three AI units will work together remains to be seen. Nevertheless, Meta now seems to have a formidable AI leadership team to compete with OpenAI and Google.