Can ChatGPT pass the Turing Test yet?
Artificial intelligence chatbots like ChatGPT are getting a whole lot smarter, a whole lot more natural, and a whole lot more…human-like. It makes sense — humans are the ones creating the large language models that underpin AI chatbots' systems, after all. But as these tools get better at "reasoning" and mimicking human speech, are they smart enough yet to pass the Turing Test?
For decades, the Turing Test has been held up as a key benchmark in machine intelligence. Now, researchers are actually putting LLMs like ChatGPT to the test. If ChatGPT can pass, the accomplishment would be a major milestone in AI development.
So, can ChatGPT pass the Turing Test? According to some researchers, yes. However, the results aren't definitive: the Turing Test isn't a simple pass/fail exam, so the outcomes aren't black and white. And even if ChatGPT could pass, that may not tell us much about how 'human' an LLM really is.
Let's break it down.
The concept of the Turing Test is actually pretty simple.
The test was originally proposed by British mathematician Alan Turing, the father of modern computer science and a hero to nerds around the world. In his 1950 paper "Computing Machinery and Intelligence," he proposed the Imitation Game — a test for machine intelligence that has since been named for him. The Turing Test involves a human judge holding a conversation with both a human and a machine without knowing which is which (or who is who, if you believe in AGI). If the judge can't tell the machine from the human, the machine passes the Turing Test. In a research context, the test is performed many times with multiple judges.
Of course, the test can't necessarily determine if a large language model is actually as smart as a human (or smarter) — just if it's able to pass for a human.
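To make that research setup concrete, here's a minimal Python sketch of how such a study might be scored. The judge verdicts below are entirely hypothetical; the point is simply that a "pass" comes down to tallying how often judges mistake the machine for the human.

```python
# Minimal sketch (hypothetical data): scoring a research-style Turing Test.
# Each trial records whether a judge labeled the machine as the human.
from dataclasses import dataclass


@dataclass
class Trial:
    judge_id: int
    labeled_machine_as_human: bool


def pass_rate(trials: list[Trial]) -> float:
    """Fraction of trials in which the judge mistook the machine for the human."""
    return sum(t.labeled_machine_as_human for t in trials) / len(trials)


# Hypothetical verdicts from three judges, two conversations each.
verdicts = [
    Trial(1, True), Trial(1, False),
    Trial(2, True), Trial(2, True),
    Trial(3, False), Trial(3, True),
]

print(f"Machine judged human in {pass_rate(verdicts):.0%} of trials")
```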
Large language models, of course, do not have a brain, consciousness, or world model. They're not aware of their own existence. They also lack true opinions or beliefs.
Instead, large language models are trained on massive datasets of information — books, internet articles, documents, transcripts. When a user enters text, the AI model uses its "reasoning" to determine the most likely meaning and intent of the input. Then, the model generates a response.
At the most basic level, LLMs are word prediction engines. Using their vast training data, they calculate probabilities for the first "token" (roughly a word or piece of a word) of the response, then repeat the process, token by token, until a complete response is generated. That's an oversimplification, of course, but let's keep it simple: LLMs generate responses based on probability and statistics. So, an LLM's response is rooted in mathematics, not an actual understanding of the world.
So, no, LLMs don't actually think in any sense of the word.
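For the curious, here's a toy Python sketch of that prediction loop. The probability table is made up, and real models compute these probabilities with a neural network over a vocabulary of tens of thousands of tokens, but the one-token-at-a-time generation loop has the same basic shape.

```python
# Toy sketch of next-token prediction. The probability table is hypothetical;
# real LLMs learn these probabilities from massive training data.
import random

# Hypothetical probabilities for the next token given the previous one.
next_token_probs = {
    "<start>": {"The": 0.6, "A": 0.4},
    "The":     {"cat": 0.5, "dog": 0.5},
    "A":       {"cat": 0.7, "dog": 0.3},
    "cat":     {"sleeps": 0.8, "<end>": 0.2},
    "dog":     {"barks": 0.9, "<end>": 0.1},
    "sleeps":  {"<end>": 1.0},
    "barks":   {"<end>": 1.0},
}


def generate(max_tokens: int = 10) -> str:
    token, output = "<start>", []
    for _ in range(max_tokens):
        candidates = next_token_probs[token]
        # Sample the next token according to its probability: no understanding,
        # just statistics about what tends to follow what.
        token = random.choices(list(candidates), weights=candidates.values())[0]
        if token == "<end>":
            break
        output.append(token)
    return " ".join(output)


print(generate())
```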
There have been quite a few studies to determine if ChatGPT has passed the Turing Test, and many of them have had positive findings. That's why some computer scientists argue that, yes, large language models like GPT-4 and GPT-4.5 can now pass the famous Turing Test.
Most tests focus on OpenAI's GPT-4 model, the one that's used by most ChatGPT users. Using that model, a study from UC San Diego found that in many cases, human judges were unable to distinguish GPT-4 from a human. In the study, GPT-4 was judged to be a human 54% of the time. However, this still lagged behind actual humans, who were judged to be human 67% of the time.
Then, GPT-4.5 was released, and the UC San Diego researchers performed the study again. This time, the large language model was identified as human 73% of the time, outperforming actual humans. The test also found that Meta's LLaMa-3.1-405B was able to pass the test.
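As a side note on interpreting those percentages, a pass rate is usually weighed against the 50% that judges would hit by guessing at random. Here's a minimal sketch of that comparison using SciPy; the trial counts are hypothetical, not figures taken from the studies above.

```python
# Minimal sketch: compare a pass rate with the 50% chance baseline.
# The counts below are hypothetical, not data from the UC San Diego studies.
from scipy.stats import binomtest

n_trials = 300        # hypothetical number of judge decisions
judged_human = 162    # hypothetical number of "that's the human" verdicts (54%)

result = binomtest(judged_human, n_trials, p=0.5)
print(f"Pass rate: {judged_human / n_trials:.0%}, "
      f"p-value vs. chance: {result.pvalue:.3f}")
```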
Other studies outside of UC San Diego have given GPT passing grades, too. A 2024 University of Reading study had GPT-4 write answers for take-home assessments in undergraduate courses. The graders weren't told about the experiment, and they flagged only one of 33 entries; ChatGPT received above-average grades on the other 32.
So, are these studies definitive? Not quite. Some critics (and there are a lot of them) say these research studies aren't as impressive as they seem. That's why we aren't ready to definitively say that ChatGPT passes the Turing Test.
We can say that while previous-gen LLMs like GPT-4 sometimes passed the Turing Test, passing grades are becoming more common as LLMs get more advanced. And as cutting-edge models like GPT-4.5 come out, we're quickly heading toward models that can easily pass the Turing Test every time.
OpenAI itself certainly envisions a world in which it's impossible to tell human from AI. That's why OpenAI CEO Sam Altman has invested in a human verification project with an eyeball-scanning machine called The Orb.
We decided to ask ChatGPT if it could pass the Turing Test, and it told us yes, with the same caveats we've already discussed. When we posed the question, "Can ChatGPT pass the Turing Test?" to the AI chatbot (using the 4o model), it told us, "ChatGPT can pass the Turing Test in some scenarios, but not reliably or universally." The chatbot concluded, "It might pass the Turing Test with an average user under casual conditions, but a determined and thoughtful interrogator could almost always unmask it."
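If you want to reproduce that kind of query programmatically rather than through the ChatGPT interface, a minimal sketch with OpenAI's Python SDK looks like the following. It assumes the openai package is installed and an OPENAI_API_KEY environment variable is set, and the exact wording of the reply will vary from run to run.

```python
# Minimal sketch: ask the GPT-4o model the same question via OpenAI's Python SDK.
# Assumes OPENAI_API_KEY is set in the environment.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Can ChatGPT pass the Turing Test?"}],
)

print(response.choices[0].message.content)
```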
Some computer scientists now believe the Turing Test is outdated, and that it's not all that helpful in judging large language models. Gary Marcus, an American psychologist, cognitive scientist, author, and popular AI prognosticator, summed it up best in a recent blog post, where he wrote, "as I (and many others) have said for years, the Turing Test is a test of human gullibility, not a test of intelligence."
It's also worth keeping in mind that the Turing Test is about the perception of intelligence rather than actual intelligence, and that's an important distinction. A model like GPT-4o might be able to pass simply by mimicking human speech. What's more, whether a large language model passes the test will vary depending on the topic and the tester. ChatGPT could easily ape small talk, but it could struggle with conversations that require true emotional intelligence. And modern AI systems are used for much more than chatting, especially as we head toward a world of agentic AI.
None of that is to say that the Turing Test is irrelevant. It's a neat historical benchmark, and it's certainly interesting that large language models are able to pass it. But the Turing Test is hardly the gold-standard benchmark of machine intelligence. What would a better benchmark look like? That's a whole other can of worms that we'll have to save for another story.
Disclosure: Ziff Davis, Mashable's parent company, in April filed a lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.
