The LIGO Lab Is Pushing the Boundaries of Gravitational-Wave Research

Rachel Feltman: For Scientific American's Science Quickly, I'm Rachel Feltman.
Today we're leaving the podcast studio to take you on a field trip to the LIGO Lab at the Massachusetts Institute of Technology. We're going to chat with Matthew Evans, MIT's MathWorks professor of physics, all about the hunt for gravitational waves.
You'll notice that the sound quality isn't up to our usual standard, but that's because we were right there in the lab, surrounded by big, loud science machines. If you want to see all that cool stuff for yourself, head over to our YouTube channel for an extended video version of this episode.
Here's our conversation with Matt.
Thanks so much for joining us.
Matt Evans: Thank you for having me.
Feltman: So a few years ago we heard a lot about gravitational waves all of a sudden—many of us had not heard of them before that.
Evans: Mm-hmm.
Feltman: Could you remind us what they are and what happened that was so exciting?
Evans: Yeah, so I guess that was almost 10 years ago now, so ...
Feltman: Well, that's wild. I don't want to think about that [laughs].
Evans: [Laughs] 2016 was when the announcement was made; 2015 was the discovery. And that was the first time that we had detected gravitational waves, despite the fact that we'd been working for many years on the detectors. That was the moment when we were upgrading to the Advanced LIGO detectors, and our first detection of gravitational waves was back in 2015.
Feltman: And what is a gravitational wave?
Evans: What is a gravitational wave? Well, the, the, like, really concise answer is: it's a ripple in spacetime. And then one could ask, 'Why would we care about a ripple in spacetime? How can we even detect such a thing?' You don't think of your life as going around measuring spacetime. But it turns out that for us that just means that things move around, and so our detectors are made with big mirrors, which are heavy masses, and when these gravitational waves pass by they move the mirrors in our detectors. So fundamentally, it's a wiggling of, of space, a wiggling of our detector, that we don't explain by anything else going on around.
Feltman: And so what is LIGO? How did it make it possible for us to finally detect gravitational waves?
Evans: So LIGO is an interferometer. It's based on a concept from, what, the 1800s of interferometry, where you can make a very sensitive measurement of the position of some object by using light waves, and the LIGO gravitational-wave detectors are basically gigantic interferometers. And what we're interfering, in our case, are two laser beams, and they look for a change in the position of the mirrors that are far away from a beam splitter—so far away in this case is two and a half miles, or four kilometers—and a passing gravitational wave will move our mirrors around, and we're looking for that motion.
So we start out with a laser, which is at our corner building—it's sort of the, sort of central location of LIGO—and we send that laser down to two buildings that are far away; these are the end stations. They're each two and a half miles away from the corner, and they're L-shaped, like this vacuum system you see behind us.
Those two laser beams return back to the central station, and the two laser beams are made of electromagnetic waves, and those waves interfere on a beam splitter when they meet on that mirror. This mirror reflects half of the light in this direction and half of the light in that direction. And depending on the relative phase, or relative timing, of these two waves, the light will either go that way or go this way. And we're just detecting the amount of light that comes out one side of our detector, and that's our interferometer allowing us to measure the distance, but that measurement is on the scale of the wavelength of light, so micron scale.
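The interference Evans describes can be sketched numerically. This is a minimal illustration of an idealized interferometer, not LIGO's actual readout: the 1064-nanometer wavelength and the function name `output_fraction` are assumptions that do not come from the interview, and the model ignores losses and every refinement a real detector adds.

```python
import math

# Assumed laser wavelength in meters (not stated in the interview).
WAVELENGTH_M = 1064e-9


def output_fraction(arm_length_difference_m, wavelength_m=WAVELENGTH_M):
    """Fraction of the input light leaving one output port of an ideal
    two-arm interferometer, given the difference in arm lengths.

    Light travels down each arm and back, so a length difference dL
    produces a relative phase of 4*pi*dL / wavelength between the beams.
    """
    relative_phase = 4 * math.pi * arm_length_difference_m / wavelength_m
    return math.sin(relative_phase / 2) ** 2


# Equal arms: the beams cancel at this port and no light comes out.
print(output_fraction(0.0))
# A quarter-wavelength difference sends essentially all the light out this port.
print(output_fraction(WAVELENGTH_M / 4))
```

In this simplified picture the output swings from dark to bright as one arm changes by a quarter of a wavelength, which is why the basic measurement scale is set by the wavelength of the light.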
Feltman: And so what are we in front of right now?
Evans: Yeah, so this is a prototype here, here at MIT, where we test components before they go to the LIGO observatories, and this is like a little mini LIGO here. So we have a large chamber for putting our isolation systems and our mirrors; that's where we test out the first suspension systems. These tubes [are] where we propagate our laser beams. We have a smaller chamber down there, which you'll see is not very small, but it's for testing the smaller suspension systems where we hang mirrors.
Our suspensions and isolation systems are all to keep our mirrors from moving by the ground shaking, essentially, 'cause we want them to be as still as possible so that when they do move we'll know that it's from a gravitational wave and not from a truck or the Red Line or whatever else.
Feltman: Yeah, can you give us a sense of how sensitive these instruments need to be to avoid picking up noise and actually find gravitational-wave ripples in spacetime?
Evans: Yeah, so the answer is mind-blowingly sensitive, and I'll try to put this in, in scale.
So the LIGO detectors should be able to measure a motion of the, the mirrors that are four kilometers away from the central building on a scale of about 1,000th the size of a proton, so this is—10⁻¹⁸ meters is roughly the, the scale here. And it's beyond microscopic; it's [a] subatomic level of measurement.
The only way that we get away with that is [we're] measuring a large surface of the mirror and we're averaging over many, many atoms, and that's how we can measure the average position to a level that's much smaller than the atomic size.
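To put the two numbers from this answer side by side, here is a small sketch that divides the quoted mirror motion by the quoted arm length. Both figures come from the interview; the variable and function names are illustrative only.

```python
ARM_LENGTH_M = 4_000.0     # four-kilometer arm, as quoted above
MIRROR_MOTION_M = 1e-18    # rough scale of mirror motion, as quoted above


def fractional_length_change(displacement_m, arm_length_m):
    """Return the displacement as a fraction of the arm length."""
    return displacement_m / arm_length_m


fraction = fractional_length_change(MIRROR_MOTION_M, ARM_LENGTH_M)
print(f"{fraction:.1e}")  # prints 2.5e-22
```

In other words, the measurement resolves a change of a few parts in 10²² of the arm's length.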
Feltman: And the MIT LIGO is not the only LIGO. Can you remind us why that is?
Evans: Ah, yeah, so, so first, just to be super clear, this is a place where we prototype stuff ...
Feltman: Right, yeah.
Evans: We don't detect gravitational waves here. So the same sort of operation is at Caltech; there's the Caltech LIGO Lab. And it's where a lot of the engineering and administrative staff are. They also have a big research staff there. And again, the idea is to build up systems, which then get delivered to the observatories. There are two of those: one is in Washington State, and one is in Louisiana.
Feltman: So speaking of prototypes, what has LIGO been up to since that big detection news 10 years ago?
Evans: So the big detection happened after we had gotten—some of the things you see here are the prototypes that went in to make Advanced LIGO possible, and that's what made that first detection possible.
Since then we've been working on—I think the highlight for MIT is quantum technologies, so we've been working on squeezed light sources. And the idea here is that if we modify the quantum state of our interferometer, we can lower the noise at the readout and detect gravitational waves from more distant sources.
Feltman: Cool, and what would that allow us to do?
Evans: The farther away you can detect a source, like a binary black hole system coalescing, the more of them you can see. And we have this feature that our detection rate goes with the volume of space we're sensitive to, so if we make the detectors twice as sensitive, they also see twice as far, which gives us eight times larger volume, and we get a lot more events to look at.
So right now we're at roughly an event per week, whereas when we first started we were at one event, if you're lucky, in a year.
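The "twice as far, eight times the volume" arithmetic follows from the volume of a sphere growing with the cube of its radius. A minimal sketch, assuming sources are spread evenly through ordinary flat space (a simplification that stops holding at very large distances); the function name `volume_gain` is made up for this illustration.

```python
def volume_gain(range_gain):
    """Factor by which the surveyed volume grows when the detection
    range grows by `range_gain`, for evenly spread sources in flat space."""
    return range_gain ** 3


print(volume_gain(2))    # 8: twice the range, eight times the volume
print(volume_gain(1.5))  # 3.375: even a 50 percent gain more than triples it
```

Under the same assumption, the expected number of detections scales by the same factor, which is why modest sensitivity gains pay off so strongly.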
Feltman: And so for, you know, the average person who's maybe interested in space but doesn't know a ton about gravitational waves, why is it important that we look for these events?
Evans: So we are detecting, right now, binary systems, and these can be pairs of, of black holes, pairs of neutron stars or a mix-and-match black hole-neutron star system, so a mixed pair. And the interesting thing about these sources is that these are the remnants of big stars ...
So large stars that have burned their fuel and collapsed make neutron stars and black holes. And we can detect individual sources from very far away, so 'high redshift' in astro-speak. And with future detectors we'll be able to get really to the edge of the known universe in terms of our ability to detect these sources.
These are essentially the stellar graveyard—so the place where big stars go to die. And by detecting these sources, individual sources, we can actually learn about the stellar graveyard and in, in that way about the stars that exist and existed in the universe.
Feltman: Very cool. So what's next for LIGO?
Evans: So LIGO is working on the next upgrade. We upgrade these detectors regularly; it's really still a new technology—it's only 10 years since the first detection. And we work on making the detectors better as a matter of course. We're always trying to make them better.
The next upgrade will be to put in better mirrors. Essentially, again, we're averaging over the surface, over the mirror, to make this measurement. We need a really good surface, and that comes down to the coatings we put on the mirrors, so we're putting in better mirrors with better coatings. That's the next thing. We'll be working on improving our squeezed light source to lower the quantum noise in the detector. So basically incremental improvements to the current detectors.
We'll then be working on a relatively large upgrade on a timescale of five years from now and from there incremental upgrades, essentially, for the lifetime of those detectors. And that lifetime is really until we get a next-generation detector going.
Feltman: Mm.
Evans: And I'm wearing the shirt of Cosmic Explorer here, which is the—our idea for the next generation of detectors.
Feltman: Yeah, tell me about Cosmic Explorer. What's gonna be different about those detectors?
Evans: Well, over 10 years ago now—and this is in 2014—we realized that we were never gonna be clever enough to really do everything we wanted to do with the current facilities ...
Feltman: Mm.
Evans: And we were going to have to build bigger detectors at some point. And so over the last—a little more than a decade we've been developing the idea of what these new, bigger detectors would look like, and that's developing this thing called Cosmic Explorer. It's like a supersized LIGO—factor of 10 larger, so 25 miles [about 40 kilometers] on a side.
Feltman: Wow.
Evans: And as things go roughly a factor of 10 more sensitive. With these detectors we could detect events from throughout the universe.
Feltman: Wow, and what's ...
Evans: Yeah, wow [laughs].
Feltman: The timeline looking at [laughs]—looking like for that?
Evans: At this particular moment in history it's hard to say.
Feltman: Sure.
Evans: I will go ahead and be optimistic, and I'll say early 2030s we could be building and mid- to late 2030s we could be detecting. And we hope that the LIGO detectors will still be operating and turning out great results into sort of 2040 ...
Feltman: Yeah.
Evans: So we'd have a, a good handoff to the new detectors as they come online in the late 2030s.
Feltman: What's on your wish list for, you know, the kinds of science that might become possible with Cosmic Explorer?
Evans: So once we're detecting sources out to high redshift—so we really get a sample of everything that's out there in the universe—we get to learn about how, you know, stars have evolved not just around us, the local universe, but even at the peak of star formation, so z of 2, and then farther out towards the beginnings of star formation, when the first stars were being formed. The heaviest of stars came from those times. So we really get to have a kind of cross section of the evolution of the universe going back in time.
And in astronomy there's always this feature that the farther away you look, the farther back in time you're looking.
Feltman: Yeah.
Evans: So we get to look back towards the beginning of the universe, in some sense, with gravitational waves as we look at these sources that are farther and farther away. With Cosmic Explorer we'll have not just one or two but hundreds of thousands of sources from the distant universe. So it's a really exciting way to explore the universe as a whole by looking at this stellar graveyard.
Feltman: And for you personally, you know, what questions really motivate you? Why are you so curious about this?
Evans: So my history is instrument science. I've always worked with the lasers and the electronics and the mechanical systems; that's where my love of the thing began. And I see Cosmic Explorer as really an extension of our first attempt. The LIGO detectors are the first attempt—first successful attempt, at least, to detect gravitational waves, and Cosmic Explorer is the natural [next] iteration of that, where we get to apply all the lessons we've learned from these detectors to make the next generation, which is a much better detector technologically and, and incorporates now decades' worth of, of learning in—on, on the instrument side ...
Feltman: Yeah.
Evans: And of course, I'm also excited about the astrophysics we do, but for me the first love of that is really the instrument side. So it's a natural extension of everything we've learned over the last decade.
Feltman: Yeah, well, and speaking of, you know, the instrument side, the data, the astrophysics, one of the things that I remember most about that initial gravitational-wave detection were just how many people were involved in the paper tied to the announcement—I think there were more than 1,000 co-authors of, of that paper. How many people are, are working on LIGO, on average?
Evans: So it's a very interesting question 'cause if you go to the, the number of people you saw on the author list of that first paper, that's the LIGO Scientific Collaboration ...
Feltman: Right.
Evans: And also Virgo, so the detector in, in Italy. And you get a, a large group of, of scientists—the whole community, essentially, of gravitational-wave scientists is really a global affair, and we're at something like 2,000 people now in that community, depending on how you draw the, the boundaries.
The, the people working on the LIGO detector is a smaller group, maybe about 200 people, and many of those are at MIT or Caltech. So the next cut-down would be: 'How many people are actually at the observatories?' And there you get an even smaller number, maybe 50 at each observatory.
Feltman: Mm.
Evans: And then you say: 'Who's really, like, in the control room, turning the screws, making it better, doing the instrument science in the observatories?' Oftentimes those are graduate students and postdocs.
Feltman: Yeah.
Evans: So there you get to an even smaller number—five or 10. And of course, all the rest of the community is necessary for that work to be fruitful, but the number of people who are, are there actually with their hands on the machine is relatively small. And I, I point this out because often people think that the—you know, the graduate students will come in and say, 'What can I ever do that's impactful in such a large organization?'
Feltman: Yeah.
Evans: Well, the truth is that our students and our postdocs are very impactful, and, and they're the ones who are often the ones there, you know, really with their hands on the machine doing the work.
Feltman: That's really cool.
So obviously, it's really exciting to think about, you know, detecting more of the kinds of phenomena we've seen, seeing them farther out. Is there also any hope of detecting stuff we've never seen before?
Evans: Yeah, so let me first say that I'm super excited about the stuff that we already know exists, and we can calculate rates for them, and for every binary black hole system we detect we find some interesting feature. And as we go from 100 detections to 100,000 detections there'll be really fun corner cases that we get to explore, so there will be new things even in our current population.
Of course, we also would love to detect something that we've never seen before, but I have no idea how often they happen out in the universe, right? Maybe these are, you know, some strange kinds of supernova that emit copious gravitational waves or cosmic strings or any number of other things that we have not observed. I don't know what the rate will be, but they're very exciting sources, and we'd love to detect them.
Feltman: So for folks who are like, 'I'm down here on Earth; what are these gravitational waves and their detection gonna do for me?'
Evans: Mm-hmm.
Feltman: Are there any exciting things that we might be able to learn from gravitational waves that'll have applications on Earth, besides just the awesome science we're figuring out?
Evans: Yeah, so I'm, I'm sad to say we won't be making your cell phones better anytime soon, and I don't think that we'll be transmitting or receiving gravitational waves from your radio devices or using them for wireless or anything like that.
However, first, I would say: learning about the universe is, in and of itself, for me, a great objective, and I think that's true for a lot of people ...
Feltman: Sure, yeah.
Evans: That learning about the universe is a, is a wonderful thing in its own right. However, we also do look at the, the spin-offs that could come from our technology. And we do work on high-precision lasers; we have helped companies develop higher-precision lasers that we then use, but they're used in other applications. Our squeezed light sources are sort of broadly applicable in quantum information and quantum computing. And so we see these spin-offs as interesting things, which are not our primary objective, but yeah, there are technological spin-offs that come from the development we do to make our detectors better.
Feltman: Well, thank you so much for sitting down to chat with us and for showing us around. This has been really cool, and I'm really excited to, you know, see what happens when we can look back to the beginning of the universe.
Evans: Thanks for the opportunity to talk about this really exciting science.
Feltman: That's all for today's episode, but it doesn't have to be. We've posted an extended version over on our YouTube channel, so take a few minutes to go check that out. We'll be back on Friday with an episode I'm super excited to share with you. It's all about Dungeons and Dragons—and also science, I promise.
Science Quickly is produced by me, Rachel Feltman, along with Fonda Mwangi, Kelso Harper, Naeem Amarsy and Jeff DelViscio. This episode was edited by Alex Sugiura. Shayna Posses and Aaron Shattuck fact-check our show. Our theme music was composed by Dominic Smith. Subscribe to Scientific American for more up-to-date and in-depth science news.
For Scientific American, this is Rachel Feltman. See you on Friday!