Thanks to ChatGPT, the pure internet is gone. Did anyone save a copy?
After the first atomic bombs were detonated in 1945, radioactive fallout spread through the atmosphere and seeped into newly produced steel, because the steelmaking process drew in contaminated air. This made most new steel useless for radiation-sensitive equipment such as Geiger counters and other delicate sensors. The solution? Salvage old steel from sunken pre-war battleships resting deep on the ocean floor, shielded from the nuclear fallout. This material, known as low-background steel, became prized for its purity and rarity.
Fast forward to 2025, and a similar story is unfolding — not under the sea, but across the internet.
Since the launch of ChatGPT in late 2022, AI-generated content has exploded across blogs, search engines, and social media. The digital realm is increasingly infused with content not written by humans but synthesized by models and chatbots. And just like radiation, this content is pervasive, tricky for regular folks to detect, and it alters the environment in which it exists.
This phenomenon poses a particularly thorny problem for AI researchers and developers. Most AI models are trained on vast datasets collected from the web. Historically, that meant learning from human data: messy, insightful, biased, poetic, and occasionally brilliant. But if today's AI is trained on yesterday's AI-generated text, which was itself trained on last week's AI content, then models risk folding in on themselves, diluting originality and nuance in what's been dubbed "model collapse."
Put another way: AI models are supposed to learn how humans think. If they're trained mostly on their own outputs, they may end up just mimicking themselves. Like photocopying a photocopy, each generation becomes a little blurrier until nuance, outliers, and genuine novelty disappear.
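The photocopy effect is easy to demonstrate with a toy experiment (a minimal sketch, not how real language models are trained): fit a simple statistical model to some data, sample the next generation's "training data" from that model, and repeat. On average, each generation loses a little of the original spread, and rare values vanish first.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20  # a deliberately small "training set" per generation

# Generation 0: a stand-in for human-written data, with a wide spread.
data = rng.normal(loc=0.0, scale=1.0, size=n)

for gen in range(1, 201):
    # "Train" a model: estimate the distribution from the current data.
    mu, sigma = data.mean(), data.std()
    # The next generation trains only on the model's own outputs.
    data = rng.normal(loc=mu, scale=sigma, size=n)
    if gen % 40 == 0:
        print(f"generation {gen:3d}: spread (std) = {sigma:.4f}")
```

Run long enough, the distribution narrows toward a single point, the statistical equivalent of every photocopy looking a little more like the last one.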
This makes human-generated content from before 2022 more valuable, because it grounds AI models, and society in general, in a shared reality, according to Will Allen, a vice president at Cloudflare, which operates one of the largest networks on the internet.
That grounding becomes especially important as AI models spread into technical fields such as medicine, law, and tax. Allen, for instance, wants his doctor to rely on research written by human experts and based on real human trials, not on AI-generated sources.
"The data that has that connection to reality has always been critically important and will be even more crucial in the future," Allen said. "If you don't have that foundational truth, it just becomes so much more complicated."
Paul Graham's problem
This isn't just theoretical. Problems are already cropping up in the real world.
Almost a year after ChatGPT launched, venture capitalist Paul Graham described searching online for how hot to set a pizza oven. He found himself looking at the dates of the content to find older information that wasn't "AI-generated SEO-bait," he said in a post on X.
Malte Ubl, CTO of AI startup Vercel and a former Google Search engineer, replied, saying Graham was filtering the internet for content that was "pre-AI-contamination."
"The analogy I've been using is low background steel, which was made before the first nuclear tests," Ubl said.
Matt Rickard, another former Google engineer, concurred. In a blog post from June 2023, he wrote that modern datasets are getting contaminated.
"AI models are trained on the internet. More and more of that content is being generated by AI models," Rickard explained. "Output from AI models is relatively undetectable. Finding training data unmodified by AI will be tougher and tougher."
The digital version of low-background steel
The answer, some argue, lies in preserving digital versions of low-background steel: human-generated data from before the AI boom. Think of it as the internet's digital bedrock, created not by machines but by people with intent and context.
One such preservationist is John Graham-Cumming, a Cloudflare board member and the company's CTO.
His project, LowBackgroundSteel.ai, catalogs datasets, websites, and media that existed before 2022, the year ChatGPT sparked the generative AI content explosion. For instance, there's GitHub's Arctic Code Vault, an archive of open-source software buried in a decommissioned coal mine in Norway. It was captured in February 2020, about a year before the AI-assisted coding boom got going.
Graham-Cumming's initiative is an effort to archive content that reflects the web in its raw, human-authored form, uncontaminated by LLM-generated filler and SEO-optimized sludge.
Another source he lists is "wordfreq," a project that tracked the frequency of words used online. Linguist Robyn Speer maintained it until 2021, when she stopped updating the data.
"Generative AI has polluted the data," she wrote in a 2024 update on coding platform GitHub.
This pollution skews internet data, making it a less reliable guide to how humans write and think. Speer cited one example showing that ChatGPT is obsessed with the word "delve" in a way that people never have been, which has caused the word to appear far more often online in recent years. (A more recent example is ChatGPT's love of the em dash — don't ask me why!)
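Spikes like "delve" are straightforward to surface once you can compare word frequencies across time, which is roughly what wordfreq did at scale. A minimal sketch (the yearly samples below are invented stand-ins, not real corpus data):

```python
import re

def per_million(texts: list[str], word: str) -> float:
    """Relative frequency of `word`, in occurrences per million tokens."""
    tokens = [t for text in texts for t in re.findall(r"[a-z']+", text.lower())]
    return 1e6 * tokens.count(word) / len(tokens) if tokens else 0.0

# Invented yearly samples; a real analysis would use a large web corpus.
samples = {
    2021: ["we examine the results in detail", "let us look more closely"],
    2024: ["let's delve into the results", "we now delve deeper into the data"],
}
for year, texts in samples.items():
    print(year, f"'delve': {per_million(texts, 'delve'):.0f} per million tokens")
```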
Our shared reality
As Cloudflare's Allen explained, AI models trained partly on synthetic content can accelerate productivity and remove tedium from creative work and other tasks. He's a fan and regular user of ChatGPT, Google's Gemini, and other chatbots such as Claude.
And the analogy to low-background steel is not perfect. Scientists have since developed steelmaking methods that use pure oxygen rather than ambient air, so new steel can once again be made free of contamination.
Still, Allen says, "you always want to be grounded in some level of truth."
The stakes go beyond model performance. They reach into the fabric of our shared reality. Just as scientists trusted low-background steel for precise measurements, we may come to rely on carefully preserved pre-AI content to gauge the true state of the human mind — to understand how we think, reason, and communicate before the age of machines that mimic us.
The pure internet is gone. Thankfully, some people are saving copies. And like the divers salvaging steel from the ocean floor, they remind us: Preserving the past may be the only way to build a trustworthy future.
