Humans beat AI at annual math Olympiad, but the machines are catching up


Yahoo, 2 days ago
Sydney - Humans beat generative AI models made by Google and OpenAI at a top international mathematics competition, but the programs reached gold-level scores for the first time, and the rate at which they are improving may be cause for some human introspection.
Neither of the AI models scored full marks, unlike five young people at the International Mathematical Olympiad (IMO), a prestigious annual competition where participants must be under 20 years old.
Google said Monday that an advanced version of its Gemini chatbot had solved five out of the six math problems set at the IMO, held in Australia's Queensland this month.
"We can confirm that Google DeepMind has reached the much-desired milestone, earning 35 out of a possible 42 points - a gold medal score," the U.S. tech giant cited IMO president Gregor Dolinar as saying. "Their solutions were astonishing in many respects. IMO graders found them to be clear, precise and most of them easy to follow."
Around 10% of human contestants won gold-level medals, and five received perfect scores of 42 points.
U.S. ChatGPT maker OpenAI said its experimental reasoning model had also scored a gold-level 35 points on the test. The result "achieved a longstanding grand challenge in AI" at "the world's most prestigious math competition," OpenAI researcher Alexander Wei said in a social media post.
"We evaluated our models on the 2025 IMO problems under the same rules as human contestants," he said. "For each problem, three former IMO medalists independently graded the model's submitted proof."
Google achieved a silver-medal score at last year's IMO in the city of Bath, in southwest England, solving four of the six problems. That took two to three days of computation, far longer than this year, when its Gemini model solved the problems within the 4.5-hour time limit, it said.
The IMO said tech companies had "privately tested closed-source AI models on this year's problems," the same ones faced by 641 competing students from 112 countries. "It is very exciting to see progress in the mathematical capabilities of AI models," said IMO president Dolinar. Contest organizers could not verify how much computing power had been used by the AI models or whether there had been human involvement, he noted.
In an interview with CBS' 60 Minutes earlier this year, one of Google's leading AI researchers predicted that within just five to 10 years, computers would be made that have human-level cognitive abilities — a landmark known as "artificial general intelligence."
Google DeepMind CEO Demis Hassabis predicted that AI technology was on track to understand the world in nuanced ways, and to not only solve important problems, but even to develop a sense of imagination, within a decade, thanks to an increase in investment.
"It's moving incredibly fast," Hassabis said. "I think we are on some kind of exponential curve of improvement. Of course, the success of the field in the last few years has attracted even more attention, more resources, more talent. So that's adding to the, to this exponential progress."

Related Articles

I tried two new TECNO phones for the first time, and I was pleasantly surprised

Android Authority

25 minutes ago

Ryan Haines / Android Authority
As a US-based tech reviewer, I might get access to a lot of different devices, but I often find myself changing between the same few companies because of the limited number of brands that launch here. I'll jump from Samsung to Motorola to Google and back again, over and over, only sometimes getting lucky with a new launch from Nothing to spice things up.
So, when I was offered the chance to check out a few recent launches from TECNO, a company I've never explored before, I jumped. I waited (mostly patiently) for the TECNO Spark 40 Pro Plus and Pova 7 Ultra to arrive, and I'm glad I did. For less than $250 each, they're more clever than I thought they'd be, and they live up to the old Android slogan, 'Be together, not the same,' much better than I expected, even if there are some concessions to hit their low, low price tags.
The Spark 40 Pro Plus is a simple, smooth, cheap starting point
The tricky part about hopping into a pair of new TECNO launches is that I didn't know what to expect. They showed up, I unboxed them, and I reached for what looked like the more familiar phone first. That turned out to be the TECNO Spark 40 Pro Plus, a 4G-only budget model with shades of everything from the Galaxy S25 to the Motorola Edge (2022) baked into its slim plastic shell and a software experience that reminds me of OxygenOS from just a few years back.
At a glance, none of those borrowed ideas might sound all that exciting, but hear me out: those phones all cost significantly more than the $150-180 Spark 40 Pro Plus (price depends on region). So, for TECNO to find ways to bring everything to a budget segment is exciting. The Spark 40 Pro Plus also brings back a waterfall display, a 6.78-inch AMOLED panel with a crisp 144Hz refresh rate, that's almost good enough for me to question why everyone abandoned it in the first place.
Just kidding, I still struggled with covering it in fingerprints and accidental presses until I put the phone in its included silicone case, but it adds what I remember of a flagship touch without a flagship price, as this one starts at the equivalent of about $150.
The Spark 40 Pro Plus looks a little bit like a lot of my favorite phones without the price tag.
I'm also perhaps slightly surprised by how many AI-powered features TECNO packed into its Spark 40 Pro Plus. It essentially has its own version of many of Google's photo editing tools, from an AI Eraser 2.0 to an AI Extender, and supports Circle to Search, suggesting that it's not just TECNO's tools at play. I still prefer Google's editing tools, which have more punch to work with from the recent Tensor chips, but I like that TECNO is trying to bring tools to a more accessible price point.
That said, there are a few bits of the Spark 40 Pro Plus that remind me that this is… well, a very cheap Android phone. Its 4G-only Helio G200 processor is only a slight upgrade over the previous Helio G100, and its single 50MP rear sensor feels like a dated choice, mainly because it's flanked by two other rings that look like additional camera sensors, but don't house anything. I would have loved even a simple ultrawide sensor for a bit more flexibility, but I've been pretty pleased by the results at 1x and 2x zoom.
Of course, the bright side of the Spark 40 Pro Plus's power-sipping processor is that it can make the most out of the 5,200mAh battery. I've only had to reach for a charger once or twice while exploring the phone, and I've been pleased with the peak 45W speeds. They're the same as Samsung pushes to its flagship Galaxy S25 Ultra, while 30W wireless charging is quicker than I'm used to from most flagship phones, though I wouldn't have hit top speeds without TECNO's proprietary charging pad.
I might actually keep using the Pova 7 Ultra as a cheap gaming phone
Once I felt like I had a pretty good feel for the Spark 40 Pro Plus, I decided it was time to switch gears to the TECNO phone I was more excited about in the first place: the Pova 7 Ultra. Dedicated gaming phones are few and far between here in the US (I usually have to hope for the best with something like the OnePlus 13 or Pixel 9 Pro), so I was curious how TECNO's dedicated hardware at a budget-friendly price point of around $230 would handle a few of my favorite titles.
Put simply, it handled them pretty well with help from its 12-layer cooling architecture. Unfortunately, due to limited band support, I couldn't pop my personal Verizon SIM into the Pova 7 Ultra to make it a go-to option for Pokémon Go, but I had no problems racing through games while I was at home on Wi-Fi. It conducted its way through Railbound without a problem, let me take control of Warhammer 40,000 Tacticus as smoothly as I'm used to, and kept me from getting wrecked in PUBG Mobile. I'm still not good at the game, but at least it ran pretty well.
However, what caught me about the Pova 7 Ultra was its design. I know what they say about imitation and flattery, and I'm not even mad about it this time. There's no way to avoid the fact that the Pova 7 Ultra looks like a distant Nothing cousin, from its pseudo-transparent back panel to the Status Light that wraps around its triangular camera bump. Like the Spark 40 Pro Plus, that camera bump plays a trick on the eye by housing two cameras (a 108MP primary sensor and an 8MP ultrawide backup) while looking like it has room for a third.
Although I'm a little disappointed that I couldn't give the Pova 7 Ultra its full run due to band support, I'm still impressed by what it packs under the hood.
It pairs the Dimensity 8350 Ultimate with 256GB of storage and either 8GB or 12GB of RAM, though TECNO likes to claim it has 16GB or 24GB by converting a dash of its storage to serve as extended RAM.
And then, there's the battery. In true gaming phone fashion, TECNO packed its Pova 7 Ultra with a hefty 6,000mAh cell backed by 70W wired Ultra Charge and the same 30W wireless charging as the Spark 40 Pro Plus. I've had a tough time draining the cell through most of my in-home gaming sessions, but I've done my best to get it there so I could use the included 3,000mAh magnetic power bank. Before you get your Qi2 hopes up, though, know that you need a magnetic case to use the power bank, but thankfully, one comes in the box.
Honestly, I didn't realize that TECNO was such a chameleon
After about a week with the TECNO Pova 7 Ultra and the Spark 40 Pro Plus, I still have to say I'm impressed. No, I won't say that either phone is about to replace the Pixel that has a permanent place in my pocket, but I can see how they punch above their price tags. More impressively, they do it in different ways.
Although they both run a very slightly Pixel-like HiOS 15 based on Android 15, the overall software experience is entirely different. It's a pretty standard affair on the Spark, pairing a somewhat iOS-like quick settings menu with a very colorful app drawer full of a mix of Google apps and in-house versions like Game Space, Hi Translate, and the Hola Browser. I'll still probably skew towards Google Translate and Chrome since I've been using them for years, but the Hola Browser interface is cuter than expected.
On the Pova 7 Ultra, the same HiOS experience is completely different. Taking another page out of Nothing's book, the team at TECNO rebuilt a few hundred custom icons in a white, black, and orange color scheme and adopted a customized new font that reminds me of a certain other Android skin.
It still has essentially the same slate of in-house TECNO apps for you to explore; they just have a bit more of a gaming edge.
Both of these phones run HiOS, but the day-to-day experience is different as can be.
Of course, we still have to talk about the budget-minded elephant in the room. Although I've had a lot of fun exploring TECNO for the last little while, and I appreciate that it still includes goodies like cases, wireless power banks, and even chargers in the box (with UK pins), it's still tough to hop on board in the US. Limited band support means that you'll mostly have to hunt for a Wi-Fi connection to use your Pova 7 Ultra or Spark 40 Pro Plus, which is a hard sell for a smartphone.
I'm also wary of TECNO's update commitment. I wasn't expecting it to rival Google or Samsung with a seven-year promise. Still, the Pova 7 Ultra's two Android updates and three years of security support, and the Spark 40 Pro Plus's two years of security patches are more than a little behind the times.
But, like I said, the Pova 7 Ultra starts at just $210 while the Spark 40 Pro Plus is even more approachable at $150 (though prices may vary by region), so either of these devices could find a role as your backup phone once it's run out of updates. Then again, Nothing finally has solid carrier support after three generations of launches, so maybe TECNO is next.

A new type of dealmaking is unnerving startup employees. Here are the questions to ask to make sure you don't get left out.

Business Insider

27 minutes ago

A new kind of dealmaking is sweeping Silicon Valley, forcing employees to be vigilant about how much trust they are willing to put in startup founders.
Over the past two years, instead of acquiring AI startups outright, Big Tech companies have been licensing their technology or making deals for top talent, with startup employees sometimes getting divided into separate camps of haves and have-nots. Those with the most desirable AI skills reap a windfall while those who remain are shrouded in uncertainty.
That recently happened to Windsurf employees after the AI coding company was on the verge of being acquired by OpenAI for $3 billion, but was instead split in half. Google paid billions to hire Windsurf's CEO and top talent, and the hundreds of employees who remained were bought by another startup, Cognition.
Unfortunately for startup employees, many investors expect these kinds of novel transactions to continue as the velocity of developments in AI makes companies unlikely to want to wait months or years for regulatory approval.
Candidates need to ask tough questions about the founder
Given that traditional M&A has mostly gone out the window, it is more important than ever for startup employees to do their homework, advises Steve Brotman, managing partner at Alpha Partners.
"In light of what we just saw with Windsurf, it's crucial to understand the ownership dynamics," Brotman said. "You don't want to be working 100-hour weeks only to realize your options are underwater or your exit upside is capped. And remember: companies that are transparent and deliberate about governance tend to be better long-term bets, both for your career and your equity."
"Ask hard questions about runway, revenue, burn, and investor syndicate quality," Brotman continued. "Who's on the board? Are they structured for long-term growth or a quick flip?"
The most important thing candidates should assess is how much they trust the founder, according to Deedy Das, an investor at Menlo Ventures.
"Nobody wants to talk about the fact that founders control almost everything that happens in a company, including how you get paid, when you get paid, how the equity vests, and when you can sell the equity," said Das. "It's everything, so having trust in your founder to do the right thing by the team is extremely important." Just as investors would typically research a founder before writing a check to one of their many portfolio companies, prospective employees should ask around about founders whom they could be tied to for years, said Hari Raghavan, cofounder and CEO of Autograph. "They should be doing diligence on whether this is a standup person," said Raghavan. "Do your best to suss out, 'Are these guys going to take care of me?'" Raghavan suggests that founders should sign a written pledge agreeing to treat employees well in terms of stock options and exit scenarios. "These are things that any good founder should be doing, and the vast majority of good ones do, but I think even just establishing that set of rules is a good idea," he said. Prospective employees should not be afraid to "interrogate" a founder on how they are thinking about an exit, according to Jake Saper, a general partner at Emergence Capital. "Ask founders how they would weigh staying independent, a classic acquisition, or a licensing deal that carves out key people," Saper said. "Their answer tells you a lot about the journey you're signing up for." Scrutinizing the fine print has also become more important, said Saper. "Make sure offer letters and stock agreements spell out vesting acceleration, treatment of options, and retention bonuses if only 'substantially all' of the team moves," Saper said. "Those clauses mattered at Inflection and Windsurf, and they will matter again." In 2024, Microsoft hired the founder of Inflection AI, Mustafa Suleyman, and some of the startup's staff to help lead its AI efforts. 
In June, Meta paid $14 billion for a 49 percent stake in the data labeling company Scale AI and hired its founder, Alexandr Wang, to run its Superintelligence group. Meta also hired some of the startup's researchers. Last week, Scale AI laid off 14% of its workforce, or 200 employees, and revealed it is unprofitable. Finally, Saper says to take a hard look at the underlying business model of a startup to make sure it can last. "Startups with unique data feeds, embedded distribution or clear recurring revenue have leverage to stay independent," Saper said. "If a company's main asset is a brilliant but portable research team, you should assume Big Tech will come knocking."

I used Alexa+ vs ChatGPT to generate 5 AI images — and the results surprised me

Tom's Guide

an hour ago

You probably know Alexa as the voice that sets timers, dims the lights and plays music, but after testing Alexa+ for a few weeks, I'm discovering new features nearly every day. I discovered that the upgraded Alexa has a creative side and can now generate realistic images on the fly. In my testing, I was so impressed by the speed and ease of generating the images that I couldn't help but do a side-by-side image generation comparison with ChatGPT.
Whether you ask for a watercolor of a cozy cabin, a photorealistic puppy or a design mockup, Alexa+ responds in seconds, then texts the image directly to your phone. It's surprisingly intuitive, shockingly fast and potentially a game-changer for everyday use. Here's what happened when I tested Alexa+ versus ChatGPT and the images each bot generated.
Prompt: 'A friendly robot mom, dad and two robot kids sitting at a kitchen table eating spaghetti, surrounded by retro 1950s-style decor, with warm lighting and a dog robot under the table.'
ChatGPT automatically went with a cartoonish image of a robot family. It nailed almost everything except for the robot dog and opted for a realistic-looking dog instead. Alexa+ presented a modern-looking image of a robot family, also with a 'real' dog. It missed aspects of the prompt such as 'two robot kids' and the style of the decor. Winner: ChatGPT wins for accurately following the prompt.
Prompt: 'People browsing colorful fruit and vegetable stands at an outdoor farmers market. A woman holds a bouquet of sunflowers, a man samples cheese, and kids eat popsicles. Background includes food trucks and string lights.'
ChatGPT once again hit every detail of the prompt, even if it's hard to tell the man is sampling cheese. Alexa+ offered a more diverse and more realistic look at a farmer's market, though it's not completely obvious if the man in the image is sampling cheese. Winner: Alexa+ wins for an image that better captures the energy and diversity of a farmer's market.
Prompt: 'Cars lined up outside an elementary school on a gray rainy morning. Parents holding umbrellas hustle their kids to the entrance. Backpacks, rain boots, and puddles all around. View from a car window with raindrops.'
ChatGPT created a storybook-style image that highlights the mood of a rainy school day. The backpacks in the puddles are a glaringly unrealistic detail. Alexa+ delivered a photorealistic image that also captures the mood, but looks less like a view from a car and doesn't capture the hustle that the prompt asks for. Winner: ChatGPT wins for following the prompt and better storytelling with this image.
Prompt: "A golden retriever sitting next to a toddler on a cozy living room rug. The toddler is offering a cracker, and the dog gently takes it. Toys and a sippy cup are scattered around. Natural window light, soft and heartwarming."
ChatGPT used soft, warm lighting in an image that evokes a heartwarming, almost storybook-like feel, which best fits the prompt's request for 'soft and heartwarming.' Alexa+ made a great photo that looks very natural and realistic, but it's missing details such as the sippy cup, and the dog is not taking the cracker. While it may arguably be a better image, it loses because of missed aspects of the prompt. Winner: ChatGPT wins for the best composition match, setting and details.
Prompt: "A diverse group of people crossing a city street in the early morning: a woman in heels with a coffee, a jogger with headphones, a dad pushing a stroller, and a teen on a scooter. Background includes traffic lights, brick buildings, and steam rising from a manhole. Overcast sky, everyday realism."
ChatGPT captured all of the details within the prompt but delivered a messy and cluttered image that does not look real and one I wouldn't use. Alexa+ delivered a polished image that highlights the prompt, but misses several details. Winner: Draw. ChatGPT hit every detail of the prompt, but at the cost of an unrealistic crosswalk situation. Alexa+ made an image that better captures the vibe of a busy city but misses out on key details of the prompt.
After five rounds of head-to-head testing, one thing is clear: ChatGPT might take a more whimsical or storybook approach, but it consistently nails the specifics. However, I was pleasantly surprised by Alexa+. It generated refined, instantly shareable images that often looked more realistic than ChatGPT's. And it was faster than ChatGPT, too. But in most cases, Alexa+ fell short when it came to following the actual prompt. Whether it's missing a sippy cup, skipping key characters, or glossing over a specific setting, the details matter, especially when you're generating visuals with a purpose. So, while Alexa+ has potential and impressive speed, it still loses out to ChatGPT when prompt accuracy counts. My suggestion to users is to use Alexa+ for speed and realism, but be ready to tweak any resulting image with follow-up prompts.
