New Grok 4 Takes on 'Humanity's Last Exam' as the AI Race Heats Up

Elon Musk released the newest artificial intelligence model from his company xAI on Wednesday night. In an hour-long public reveal session, he called the model, Grok 4, 'the smartest AI in the world' and claimed it was capable of getting perfect SAT scores and near-perfect GRE results in every subject, from the humanities to the sciences.
During the online launch, Musk and members of his team described testing Grok 4 on a metric called Humanity's Last Exam (HLE)—a 2,500-question benchmark designed to evaluate an AI's academic knowledge and reasoning skill. Created by nearly 1,000 human experts across more than 100 disciplines and released in January 2025, the test spans topics from the classics to quantum chemistry and mixes text with images. Grok 4 reportedly scored 25.4 percent on its own. But given access to tools (such as external aids for code execution or Web searches), it hit 38.6 percent. That jumped to 44.4 percent with a version called Grok 4 Heavy, which uses multiple AI agents to solve problems. The two next best-performing AI models are Google's Gemini-Pro (which achieved 26.9 percent with the tools) and OpenAI's o3 model (which got 24.9 percent, also with the tools). The results from xAI's internal testing have yet to appear on the leaderboard for HLE, however, and it remains unclear whether this is because xAI has yet to submit the results or because those results are pending review. Manifold, a social prediction market platform where users bet play money (called 'Mana') on future events in politics, technology and other subjects, predicted a 1 percent chance, as of Friday morning, that Grok 4 would debut on HLE's leaderboard with a 45 percent score or greater on the exam within a month of its release. (Meanwhile xAI has claimed a score of only 44.4.)
During the launch, the xAI team also ran live demonstrations showing Grok 4 crunching baseball odds, determining which xAI employee has the 'weirdest' profile picture on X and generating a simulated visualization of a black hole. Musk suggested that the system may discover entirely new technologies by later this year—and possibly 'new physics' by the end of next year. Games and movies are on the horizon, too, with Musk predicting that Grok 4 will be able to make playable titles and watchable films by 2026. Grok 4 also has new audio capabilities, including a voice that sang during the launch, and Musk said new image generation and coding tools are soon to be released. The regular version of Grok 4 costs $30 a month; SuperGrok Heavy—the deluxe package with multiple agents and research tools—runs at $300.
Artificial Analysis, an independent benchmarking platform that ranks AI models, now lists Grok 4 as highest on its Artificial Analysis Intelligence Index, slightly ahead of Gemini 2.5 Pro and OpenAI's o4-mini-high. And Grok 4 appears as the top-performing publicly available model on the leaderboards for the Abstraction and Reasoning Corpus, or ARC-AGI-1, and its second edition, ARC-AGI-2 —benchmarks that measure progress toward 'humanlike' general intelligence. Greg Kamradt, president of ARC Prize Foundation, a nonprofit organization that maintains the two leaderboards, says that when the xAI team contacted the foundation with Grok 4's results, the organization then independently tested Grok 4 on a dataset to which the xAI team did not have access and confirmed the results. 'Before we report performance for any lab, it's not verified unless we verify it,' Kamradt says. 'We approved the [testing results] slide that [the xAI team] showed in the launch.'
According to xAI, Grok 4 also outstrips other AI systems on a number of additional benchmarks that suggest its strength in STEM subjects. Alex Olteanu, a senior data science editor at AI education platform DataCamp, has tested it. 'Grok has been strong on math and programming in my tests, and I've been impressed by the quality of its chain-of-thought reasoning, which shows an ingenious and logically sound approach to problem-solving,' Olteanu says. 'Its context window, however, isn't very competitive, and it may struggle with large code bases like those you encounter in production. It also fell short when I asked it to analyze a 170-page PDF, likely due to its limited context window and weak multimodal abilities.' (Multimodal abilities refer to a model's capacity to analyze more than one kind of data at the same time, such as a combination of text, images, audio and video.)
On a more nuanced front, issues with Grok 4 have surfaced since its release. Several posters on X —owned by Musk himself—as well as tech-industry news outlets have reported that when Grok 4 was asked questions about the Israeli-Palestinian conflict, abortion and U.S. immigration law, it often searched for Musk's stance on these issues by referencing his X posts and articles written about him. And the release of Grok 4 comes after several controversies with Grok 3, the previous model, which issued outputs that included antisemitic comments, praise for Hitler and claims of 'white genocide'—incidents that xAI publicly acknowledged, attributing them to unauthorized manipulations and stating that the company was implementing corrective measures.
At one point during the launch, Musk commented on how making an AI smarter than humans is frightening, though he said he believes the ultimate result will be good—probably. 'I somewhat reconciled myself to the fact that, even if it wasn't going to be good, I'd at least like to be alive to see it happen,' he said.
Related Articles

Tesla disappoints on earnings but wins on one major front
Yahoo


Tesla disappoints on earnings but wins on one major front originally appeared on TheStreet.

Tesla didn't mention Bitcoin once in its second-quarter 2025 financial filing, even as investors and analysts scanned the company's balance sheet for any sign of movement in its crypto treasury.

The silence isn't new. Tesla hasn't added or sold any Bitcoin for eight straight quarters, and the company's digital asset holdings remain unchanged at $184 million, according to the 10-Q form filed with the SEC on July 23. That's the same value it reported in the first quarter of 2024, with no impairment losses or gains noted this time either.

Tesla had initially bought $1.5 billion worth of Bitcoin in early 2021. Since then, it's sold off the majority, with the last major sale happening in Q2 2022 when it offloaded roughly 75% of its BTC stash. Tesla holds 9,720 BTC as of its last disclosure. At today's Bitcoin price of $118,000, that stash is worth approximately $1.15 billion.

Beyond crypto, Tesla's earnings disappointed on several fronts. The company reported revenue of $22.5 billion, missing analyst estimates of $22.74 billion, and adjusted earnings per share of $0.40, below the expected $0.43. Automotive revenue fell 16% year-over-year, the second straight quarterly decline. In early July, Tesla had already reported a 14% drop in Q2 vehicle deliveries to 384,000 units.

The stock is down roughly 18% this year, marking the worst performance among big tech names. By comparison, the Nasdaq Composite is up about 9% in 2025. Meanwhile, Tesla has delayed its affordable 'Model 2' EV, leaving the field open for rivals. Chinese EV makers are aggressively pushing cheaper, tech-laden vehicles that are eating into Tesla's global market share.

Still holding the bag

Despite the financial and political turbulence, Tesla appears to be holding firm on its crypto position, for now. But with mounting pressure from declining revenues and reputational hits, investors are watching closely for any future changes to the company's digital asset strategy. As of now, though, the Bitcoin line in Tesla's earnings reports remains quiet. No buys. No sells. Just HODLing.

This story was originally reported by TheStreet on Jul 23, 2025.
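
The roughly $1.15 billion valuation quoted above follows from multiplying the reported holdings by the quoted spot price. A minimal sketch of that arithmetic, using only the two figures given in the story; the function name `holdings_value_usd` is illustrative and not taken from any filing or library:

```python
# Figures as quoted in the story; the price is the article's, not a live quote.
BTC_HELD = 9_720
PRICE_USD = 118_000


def holdings_value_usd(btc: int, price_usd: int) -> int:
    """Market value of a Bitcoin position at a given spot price."""
    return btc * price_usd


value = holdings_value_usd(BTC_HELD, PRICE_USD)
# Show the exact product and the rounded figure in billions.
print(f"${value:,} (about ${value / 1e9:.2f} billion)")
```

At the quoted price this comes to $1,146,960,000, which rounds to the $1.15 billion cited.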

Tesla says it started building initial versions of an affordable car, posts a steep sales decline
Yahoo


WASHINGTON (Reuters) - Tesla said on Wednesday it has built initial versions of an affordable car, a move likely meant to stem the steep decline in sales the company has experienced in markets across the world. Elon Musk's electric vehicle maker posted the worst quarterly sales decline in more than a decade and profit that missed Wall Street targets, but its profit margin on making cars was better than many feared.

MARKET REACTION: Tesla shares were down nearly 5% in after-hours trading.

COMMENTS:

JACOB BOURNE, ANALYST, EMARKETER, NEW YORK: "Tesla's disappointing results aren't surprising given the rocky road it's traveled recently. But the company maintains a strong foundation in the key growth sectors of energy storage, robotics, and AI-powered transportation. While traditional automakers like GM are gaining EV market share, we can expect that Tesla will continue pushing the needle on innovation if it can get a better handle on its current leadership distractions. Key challenges that will linger are supply chain risks related to its reliance on China and fierce competition from Chinese EV makers in that market. Tesla doesn't need to choose between cars and future tech. There are technical synergies between EVs, robotaxis, energy systems, and robotics that could accelerate innovation across all fronts. The question is whether leadership can execute on this integrated vision in this fast-moving market."

ANDREW ROCCO, STOCK STRATEGIST, ZACKS INVESTMENT RESEARCH, CHICAGO: "Tesla delivered earnings Wednesday night that fell short of top-and-bottom line expectations slightly. Despite the earnings miss, Tesla shares were buoyed early by the fact that the bleeding in gross margins has seemingly come to a halt, with gross margins coming in at 17.2% versus Wall Street estimates of 16.5%. The gross margin beat is especially impressive when investors consider that Tesla has been offering generous incentives and lower prices to its consumers. Coming in to the report, Wall Street expectations were dire amid a slowing core EV business and CEO Elon Musk's reputational damage on both sides of the political aisle. Nevertheless, Tesla has limped over the low bar that was a dramatic earnings miss and delivered earnings that were better than most feared. Additionally, news broke right before the market closed that Tesla is in "early talks" with the state of Nevada to expand its robotaxi service. If Tesla can convince investors that it will scale its robotaxi service rapidly, shareholders will be more forgiving regarding the core EV business, as Musk and his team look to expand into new verticals, transitioning and diversifying the business."

THOMAS MONTEIRO, SENIOR ANALYST, "Although still far from what fundamentals would suggest for a trillion-dollar company, Tesla's latest numbers do spark some optimism, indicating that the worst is likely behind it, at least in terms of the core auto business. Given the plethora of headwinds faced during the difficult Q2 - both internally and on the macro front - margin deterioration appears to have come in at the lower end of the curve. When combined with improving cyclical demand dynamics in markets like China and parts of the US, this suggests that full-year results might not be as dire as previously expected following the disastrous first half of the year. This also shows that the company has weathered the tariff storm somewhat better than initially projected, optimizing the production/delivery equation in the US. While it remains unclear how much of a hit regulatory credits will take in Q3, it's evident the company will need to continue refining its production strategy elsewhere to better navigate the second half. From this perspective, recent product announcements are aligned with those strategic needs, particularly around the highly anticipated Model 2. With Tesla entering the Indian market and working to regain ground in China, we view this as a potential game-changer for H2. All things considered, and while we're still a ways off from seeing true fundamental support for the current share price, the outlook for the core business is looking somewhat better. This could continue to support Tesla's long-term transition into a fully AI/robotics-driven company, which appears to be where Musk is placing his bets."

(Compiled by Reuters News; Editing by Matthew Lewis)

Trump's AI Action Plan Is Here: 5 Key Takeaways
CNET


The Trump administration on Wednesday laid out the steps it plans to take to ensure "global AI dominance" for the US, with an AI Action Plan that calls for cutting regulations to speed up the development of artificial intelligence tools and the infrastructure to power them. Critics said the plan is a handout to tech and fossil fuel companies, slashing rules that could protect consumers, prevent pollution and fight climate change.

Though the plan itself isn't binding (it includes dozens of policy recommendations), Trump did sign three executive orders to put some of these steps into action. The changes and proposals follow how the Trump administration has approached AI and technology over the past six months -- giving tech companies a largely free hand; focusing on beating China; and prioritizing the construction of data centers, factories and fossil fuel power plants over environmental regulations. It's seizing on the moment created by the arrival of ChatGPT less than three years ago and the ensuing wave of generative AI efforts by Google, Meta and others.

"My administration will use every tool at our disposal to ensure that the United States can build and maintain the largest and most powerful and advanced AI infrastructure anywhere on the planet," Trump said during remarks Wednesday evening at a summit presented by the Hill and Valley Forum and the All-In Podcast. He signed the three executive orders at the event.

The administration and tech industry groups touted the plan as a framework for US success in a race against China. "President Trump's AI Action Plan presents a blueprint to usher in a new era of US AI dominance," Jason Oxman, president and CEO of the tech industry trade group ITI, said in a statement.

Consumer groups said the plan focuses on deregulation and would hurt consumers by reducing the rules that could protect them. "Whether it's promoting the use of federal land for dirty data centers, giving the FTC orders to question past cases, or attempting to revive some version of the soundly defeated AI moratorium by tying federal funds to not having 'onerous regulation' according to the FCC, this is an unwelcome distraction at a critical time for government to get consumer protection right with increasing AI use and abuse," Ben Winters, director of AI and privacy at the Consumer Federation of America, said in a statement.

Here's a look at the proposals in the plan.

Slashing regulations for AI infrastructure

The plan says AI growth will require infrastructure, including chip factories, data centers and more energy generation. And it blames environmental regulations for getting in the way. In response, it proposes exemptions for AI-related construction from certain environmental regulations, including those aimed at protecting clean water and air. It also suggests making federal lands available for data center construction and related power plants.

To provide energy for all those data centers, the plan calls for steps to prevent the "premature decommissioning of critical power generation resources." This likely refers to keeping coal-fired power plants and other mostly fossil-fuel-driven infrastructure online for longer. In his remarks, Trump specifically touted his support for coal and nuclear power plants. The administration also called to prioritize the connection of new "reliable, dispatchable power sources" to the grid and specifically named nuclear fission and fusion and advanced geothermal generation.

Earlier this month, the president signed a bill that would end many tax credits and incentives for renewable energy -- wind and solar -- years earlier than planned. Wind and solar make up the bulk of the new energy generation being added to the US grid right now.

"This US AI Action Plan doesn't just open the door for Big Tech and Big Oil to team up, it unhinges and removes any and all doors -- it opens the floodgates, continuing to kneecap our communities' rights to protect ourselves," KD Chavez, executive director of the Climate Justice Alliance, said in a statement. "With tech and oil's track records on human rights and their role in the climate crisis, and what they are already doing now to force AI dominance, we need more corporate and environmental oversight, not less."

Fewer rules around AI technology

Congress ended up not including a moratorium on state AI rules in the recently passed tax and spending bill, but efforts to cut regulations around AI continue from the executive branch in the action plan. "AI is far too important to smother in bureaucracy at this early stage, whether at the state or Federal level," the plan says.

The plan recommends that several federal agencies review whether existing or proposed rules would interfere with the development and deployment of AI. The feds would consider whether states' regulatory climate is favorable for AI when deciding to award funding. Federal Trade Commission investigations and orders would be reviewed to determine that they don't "advance theories of liability that unduly burden AI innovation."

Those rule changes could undermine efforts to protect consumers from problems caused by AI, critics said. "Companies -- including AI companies -- have a legal obligation to protect their products from being used for harm," Justin Brookman, director of tech policy at Consumer Reports, said in a statement. "When a company makes design choices that increase the risk their product will be used for harm, or when the risks are particularly serious, companies should bear legal responsibility."

Ideology and large language models

The plan proposes some steps around ensuring AI "protects free speech and American values," further steps in the Trump administration's efforts to roll back federal policies around what it refers to as "diversity, equity and inclusion," along with references to the problems of misinformation and climate change. It calls for eliminating references to those items in the National Institute of Standards and Technology's AI Risk Management Framework. Federal agencies would only be allowed to contract with AI developers who "ensure that their systems are objective and free from top-down ideological bias."

The Trump administration has recently announced contracts of up to $200 million each to developers Anthropic, Google, OpenAI and xAI. Grok, the model from Elon Musk's xAI, has recently come under fire for spouting antisemitism and hate speech.

Dealing with workforce challenges

The plan acknowledges that AI will "transform how work gets done across all industries and occupations, demanding a serious workforce response to help workers navigate that transition" and recommends actions by federal agencies including the Department of Labor intended to mitigate the harms of AI-driven job displacement.

The plan calls for the Bureau of Labor Statistics, Census Bureau and Bureau of Economic Analysis to monitor how AI affects the labor market using data already collected. An AI Workforce Research Hub under the Department of Labor would lead monitoring and issue policy recommendations.

Most of the actual plans to help workers displaced by AI involve retraining those workers for other jobs or to help states do the same. Other jobs-related recommendations are aimed at boosting the kinds of jobs needed for all those data centers and chip manufacturing plants -- like electricians and HVAC technicians.

These plans and others to encourage AI literacy and AI use in education drew praise from the Software & Information Industry Association, a tech industry trade group. "These are key components for building trust and ensuring all communities can participate in and benefit from AI's potential," Paul Lekas, SIIA's senior vice president of global public policy, said in a statement.

More AI in government

The plan envisions more use of AI by the federal government. A talent exchange program would allow employees with experience or talent in AI to be detailed to other agencies in need. The General Services Administration would create a toolbox of AI models that would help agencies see models to choose from and use cases in other parts of the government. Every government agency would also be required to ensure employees who could use AI in their jobs have access to and training for AI tools.

Many recommendations focus specifically on the Department of Defense, including creating a virtual proving ground for AI and autonomous systems. AI companies have already been signing contracts with the DOD to develop AI tools for the military.
