
Latest news with #VentureBeat


Forbes

07-07-2025

  • Business
  • Forbes

Autonomous Swarms: What Happens When AI Agents Collaborate

Paul Kovalenko, Langate CTO, SaaS Consultant. Helping enterprise SaaS companies optimize their development costs.

AI has entered a new era, one defined not by isolated bots or task-specific scripts but by autonomous agent swarms: intelligent, collaborative systems that work like cross-functional digital teams. Each agent in a swarm is generally designed for a specific function—for example, data retrieval, analysis, decision making or execution—but together, they can dynamically coordinate and adapt in real time, reshaping how work gets done. Unlike traditional automation, which follows static rules, agent swarms can learn, adjust and self-organize, delivering efficiency, agility and innovation. Here's how autonomous AI swarms could transform core business functions and what it will take for organizations to begin working with them.

Marketing And Sales

Rather than relying on siloed tools, marketers are experimenting with AI agent swarms that dynamically coordinate tasks like campaign analysis, audience targeting, A/B testing and real-time content generation. For example, in an analysis of OpenAI's "Swarm" framework, VentureBeat notes that these agents can work together to handle marketing tasks like analyzing trends, adjusting strategies, identifying sales leads and providing support to customers with little human intervention. Another example is the open-source AutoGen framework developed by Microsoft, which enables multiple AI agents to interact with each other and with humans in a structured workflow.

Legal And Compliance

Legal and compliance departments are often inundated with documentation, contract reviews and ever-evolving regulatory requirements. Autonomous AI agents can step in to streamline these tasks. Potential use cases include deploying agents to review contracts for key clauses, track global regulatory updates and flag inconsistencies or compliance risks in real time. A multinational enterprise might assign one agent to monitor GDPR updates, another to assess non-disclosure agreements for red flags and another to draft policy updates, while syncing findings with the human legal team for final approval. This could help organizations prevent costly legal missteps before they happen and shift their legal operations from reactive to proactive.

Finance And Accounting

Finance and accounting are also being rapidly transformed by collaborative AI agent systems that manage tasks like forecasting, invoice processing, expense classification and fraud detection. These agents can continuously learn from historical and real-time data to analyze financial transactions, cross-check across systems and detect anomalies with increasing accuracy. Speaking with PYMNTS, Sunil Rao, CEO and co-founder of the AI company Tribble, pointed out that agent swarms could also improve risk models by evaluating "different types of risks—credit, market and operational—simultaneously, integrating insights for a comprehensive risk profile." Likewise, by integrating with enterprise resource planning (ERP) platforms, autonomous agents could reduce reconciliation times from days to hours, providing CFOs with up-to-the-minute dashboards, capabilities that once required significant manual effort.
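The coordination pattern in these examples is the same: specialist agents registered for narrow capabilities, plus a router that dispatches work and escalates to a human when no specialist fits. Below is a minimal, framework-agnostic sketch of that routing; the class names and marketing roles are illustrative assumptions, not the actual APIs of OpenAI's Swarm or Microsoft's AutoGen.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Agent:
    """A specialist with one narrow job, e.g. trend analysis or lead scoring."""
    name: str
    handle: Callable[[str], str]  # takes a task description, returns a result

@dataclass
class Swarm:
    """Routes each task to the specialist registered for that capability."""
    agents: dict = field(default_factory=dict)

    def register(self, capability: str, agent: Agent) -> None:
        self.agents[capability] = agent

    def run(self, capability: str, task: str) -> str:
        agent = self.agents.get(capability)
        if agent is None:
            # No specialist for this capability: escalate rather than guess.
            return f"escalated to human: no agent handles '{capability}'"
        return agent.handle(task)

# Wire up a small marketing swarm like the one described above.
swarm = Swarm()
swarm.register("trends", Agent("analyst", lambda t: f"trend report: {t}"))
swarm.register("leads", Agent("prospector", lambda t: f"ranked leads: {t}"))

print(swarm.run("trends", "Q3 campaign performance"))
print(swarm.run("budget", "reallocate ad spend"))  # falls back to a human
```

A production system would replace the lambdas with LLM-backed workers and add the access controls and audit trails discussed later in this piece, but the routing-and-escalation shape stays the same.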
Internal Operations

From onboarding to IT ticket resolution, internal operations often involve repetitive tasks that consume time and resources. Autonomous agent swarms have the potential to take over these routine workflows, improving both operational efficiency and the employee experience. A swarm of such agents could work in tandem: one handling HR documentation, another resolving payroll issues and a third offering IT support—each learning from employee interactions to become more helpful and proactive over time. Google, for instance, recently revealed updates that will allow agents to find and synthesize digital information from anywhere within an organization and to communicate with internal and external agents to help users complete tasks.

Customer Support

Unlike legacy chatbots that answer basic FAQs, AI teams consisting of specialized agents working in tandem can verify a customer's identity, retrieve purchase history and deliver a tailored solution or escalate to human support, if needed. A collaborative, multi-agent setup mirrors the efficiency of a well-trained human support team. Dr. Lance B. Eliot, an AI scientist, explains in Forbes that part of what's exciting about these agents is their ability both to ensure they have the necessary information to handle the customer's concern and to hand off work between agents, such as from a customer service agent to a refunds and returns agent.

What To Know About Implementing AI Agent Swarms

If they live up to their potential, autonomous agent swarms will mark a profound shift in how organizations operate. These systems could replicate the dynamics of high-functioning teams, enabling them to communicate, learn and adapt in real time to meet evolving goals. To begin exploring how AI agent swarms could impact their organization, executives should start by:

  • Pinpointing friction-heavy workflows.
  • Investing in modular AI development.
  • Setting clear return on investment (ROI) benchmarks.
  • Preparing internal teams for the transition.

However, while the opportunity is massive, so are the responsibilities. Interconnected agents also broaden the attack surface for bad actors. Without strong access controls, encrypted communication channels and audit trails, agent swarms can introduce new vulnerabilities, ranging from unauthorized access to data leakage. IT leaders must treat these agents like human users: providing them with roles, limiting their permissions and continuously monitoring their behavior.

Likewise, it will be important to communicate to your teams that, rather than replacing jobs wholesale, agent swarms will reshape their roles. Many routine, rule-based tasks can be shifted to autonomous systems, freeing employees to focus on creative, interpersonal and strategic work. Forward-thinking organizations will prioritize reskilling their teams, treating AI not as a threat but as a collaborator.

The future belongs to companies that stop treating AI as a one-off tool and start managing it like a strategic workforce. Swarms are here. The question is: Will you lead them, or race to catch up?


Forbes

04-07-2025

  • Business
  • Forbes

Post-Transformer Model Systems Can Drive Change

What if you could have conventional large language model output with 10 times to 20 times less energy consumption? And what if you could put a powerful LLM right on your phone? It turns out there are new design concepts powering a new generation of AI platforms that will conserve energy and unlock all sorts of new and improved functionality, along with, importantly, capabilities for edge computing.

What is Edge Computing?

Edge computing occurs when data processing and other workloads take place close to the point of origin—in other words, an endpoint, like a piece of data collection hardware or a user's personal device. Another way to describe it is that edge computing starts to reverse us back away from the cloud era, in which people realized that data could be housed centrally. Yes, you can have these kinds of vendor services to relieve clients of the need to handle on-premises systems, but then you have the costs of transfer and, typically, less control. If you can simply run operations locally on a hardware device, that creates all kinds of efficiencies, including some related to energy consumption and fighting climate change.

Enter the rise of new Liquid Foundation Models, which innovate away from a traditional transformer-based LLM design to something else. A September 2024 piece in VentureBeat by Carl Franzen covers some of the design that's relevant here. I'll include the usual disclaimer: I have been listed as a consultant with Liquid AI, and I know a lot of the people at the MIT CSAIL lab where this is being worked on. But don't take my word for it; check out what Franzen has to say.

'The new LFM models already boast superior performance to other transformer-based ones of comparable size such as Meta's Llama 3.1-8B and Microsoft's Phi-3.5 3.8B,' he writes. 'The models are engineered to be competitive not only on raw performance benchmarks but also in terms of operational efficiency, making them ideal for a variety of use cases, from enterprise-level applications specifically in the fields of financial services, biotechnology, and consumer electronics, to deployment on edge devices.'

More from a Project Leader

Then there's this interview at IIA this April with Will Knight and Ramin Hasani of Liquid AI. Hasani talks about how the Liquid AI teams developed models using the brain of a worm: C. elegans, to be exact. He talked about the use of these post-transformer models on devices, cars, drones, and planes, and applications to predictive finance and predictive healthcare. LFMs, he said, can do the job of a GPT, running locally on devices. 'They can hear, and they can talk,' he said.

More New Things

Since a recent project launch, Hasani said, Liquid AI has been having commercial discussions with big companies about how to apply this technology well to enterprise. 'People care about privacy, people care about secure applications of AI, and people care about low latency applications of AI,' he said. 'These are the three places where enterprise does not get the value from the other kinds of AI companies that are out there.' Talking about how an innovator should be a 'scientist at heart,' Hasani went over some of the basic value propositions of having an LLM running offline.

Look, No Infrastructure

One of the main points that came out of this particular conversation around LFMs is that if they're running offline on a device, you don't need the extended infrastructure of connected systems.
You don't need a data center or cloud services, or any of that. In essence, these systems can be low-cost, high-performance, and that's just one aspect of how people talk about applying a 'Moore's law' concept to AI. It means systems are getting cheaper, more versatile, and easier to manage – quickly. So keep an eye out for this kind of development as we see smarter AI emerging.
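To see why 'no infrastructure' is the headline benefit, consider what on-device inference looks like in practice. The sketch below uses the open-source llama.cpp Python bindings purely as a stand-in: Liquid AI's LFMs are not distributed through this toolchain, and the model path is hypothetical, but the shape is the point. One quantized model file, local hardware, no network calls.

```python
# A minimal sketch of offline, on-device inference, assuming llama.cpp's
# Python bindings (llama-cpp-python) and any small quantized GGUF model.
from llama_cpp import Llama

# Load a local model file; nothing here touches a data center or the cloud.
llm = Llama(model_path="./models/small-model.gguf", n_ctx=2048)

result = llm(
    "Summarize this maintenance log entry: pump #3 vibration above threshold.",
    max_tokens=64,
)
print(result["choices"][0]["text"])
```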


Express Tribune

07-06-2025

  • Entertainment
  • Express Tribune

Scott Pilgrim EX announced by Tribute Games as new beat-em-up adventure

Scott Pilgrim EX, a brand-new beat-em-up game, has been officially announced by Tribute Games during the 2025 Summer Game Fest, as reported by Mike Minotti of VentureBeat on June 6. Known for developing the fan-favorite Teenage Mutant Ninja Turtles: Shredder's Revenge and the upcoming Marvel Cosmic Invasion, Tribute Games is returning to the retro-inspired action genre with a fresh take on the beloved Scott Pilgrim franchise.

Though not a direct sequel or remake, Scott Pilgrim EX draws inspiration from the legacy of Scott Pilgrim vs. The World: The Game, a 2010 side-scrolling brawler that garnered critical acclaim for its pixel art style, nostalgic gameplay, and soundtrack. The Scott Pilgrim universe originally began as a comic book series by Bryan Lee O'Malley and has since expanded into a cult-classic film and animated adaptations.

While specific details remain limited, the announcement trailer showcased classic beat-em-up action, cooperative multiplayer, and stylistic callbacks to both the comics and earlier game. Tribute Games, known for its polished pixel art and fluid gameplay mechanics, aims to reintroduce Scott Pilgrim to a new generation while appealing to longtime fans of the series. The original game, re-released in 2021 after years of fan demand, became a cult classic. With Scott Pilgrim EX, Tribute Games signals a return to that beloved style—potentially with new characters, features, and levels. More information about platforms, release dates, and gameplay specifics for Scott Pilgrim EX is expected in the coming months as Tribute Games continues development.


Business Mayor

19-05-2025

  • Business
  • Business Mayor

Microsoft just launched an AI that discovered a new chemical in 200 hours instead of years

Microsoft launched a new enterprise platform that harnesses artificial intelligence to dramatically accelerate scientific research and development, potentially compressing years of laboratory work into weeks or even days. The platform, called Microsoft Discovery, leverages specialized AI agents and high-performance computing to help scientists and engineers tackle complex research challenges without requiring them to write code, the company announced Monday at its annual Build developer conference.

'What we're doing is really taking a look at how we can apply advancements in agentic AI and compute work, and then on to quantum computing, and apply it in the really important space, which is science,' said Jason Zander, Corporate Vice President of Strategic Missions and Technologies at Microsoft, in an exclusive interview with VentureBeat.

The system has already demonstrated its potential in Microsoft's own research, where it helped discover a novel coolant for immersion cooling of data centers in approximately 200 hours — a process that traditionally would have taken months or years. 'In 200 hours with this framework, we were able to go through and screen 367,000 potential candidates that we came up with,' Zander explained. 'We actually took it to a partner, and they actually synthesized it.'

Microsoft Discovery represents a significant step toward democratizing advanced scientific tools, allowing researchers to interact with supercomputers and complex simulations using natural language rather than requiring specialized programming skills. 'It's about empowering scientists to transform the entire discovery process with agentic AI,' Zander emphasized. 'My PhD is in biology. I'm not a computer scientist, but if you can unlock that power of a supercomputer just by allowing me to prompt it, that's very powerful.'

The platform addresses a key challenge in scientific research: the disconnect between domain expertise and computational skills. Traditionally, scientists would need to learn programming to leverage advanced computing tools, creating a bottleneck in the research process. This democratization could prove particularly valuable for smaller research institutions that lack the resources to hire computational specialists to augment their scientific teams. By allowing domain experts to directly query complex simulations and run experiments through natural language, Microsoft is effectively lowering the barrier to entry for cutting-edge research techniques.

'As a scientist, I'm a biologist. I don't know how to write computer code. I don't want to spend all my time going into an editor and writing scripts and stuff to ask a supercomputer to do something,' Zander said. 'I just wanted, like, this is what I want in plain English or plain language, and go do it.'

Microsoft Discovery operates through what Zander described as a team of AI 'postdocs' — specialized agents that can perform different aspects of the scientific process, from literature review to computational simulations. 'These postdoc agents do that work,' Zander explained. 'It's like having a team of folks that just got their PhD. They're like residents in medicine — you're in the hospital, but you're still finishing.'
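Zander's 'postdoc' analogy suggests a simple shape for the orchestration: a planning model decomposes a research goal, and domain-specialist agents execute each step. The sketch below is a hypothetical illustration of that split; the function names and hard-coded outputs are assumptions made for this article, not Microsoft Discovery's actual interface.

```python
# Hypothetical sketch of a planner/specialist split, not Microsoft's API.

def planner(goal: str) -> list:
    """Stands in for a foundational model that emits (domain, subtask) pairs."""
    return [
        ("literature", f"survey prior work on {goal}"),
        ("chemistry", f"screen candidate molecules for {goal}"),
        ("simulation", f"simulate the top candidates for {goal}"),
    ]

# Each specialist stands in for a domain-trained model (physics, chemistry...).
SPECIALISTS = {
    "literature": lambda task: f"[lit agent] {task}: 120 relevant papers found",
    "chemistry": lambda task: f"[chem agent] {task}: candidates screened",
    "simulation": lambda task: f"[sim agent] {task}: 3 viable candidates remain",
}

def run_discovery(goal: str) -> None:
    """The orchestration loop: plan once, then dispatch to specialists."""
    for domain, subtask in planner(goal):
        print(SPECIALISTS[domain](subtask))

run_discovery("a PFAS-free immersion coolant")
```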
The platform combines two key components: foundational models that handle planning and specialized models trained for particular scientific domains like physics, chemistry, and biology. What makes this approach unique is how it blends general AI capabilities with deeply specialized scientific knowledge. 'The core process, you'll find two parts of this,' Zander said. 'One is we're using foundational models for doing the planning. The other piece is, on the AI side, a set of models that are designed specifically for particular domains of science, that includes physics, chemistry, biology.'

According to a company statement, Microsoft Discovery is built on a 'graph-based knowledge engine' that constructs nuanced relationships between proprietary data and external scientific research. This allows it to understand conflicting theories and diverse experimental results across disciplines, while maintaining transparency by tracking sources and reasoning processes. At the center of the user experience is a Copilot interface that orchestrates these specialized agents based on researcher prompts, identifying which agents to leverage and setting up end-to-end workflows. This interface essentially acts as the central hub where human scientists can guide their virtual research team.

To demonstrate the platform's capabilities, Microsoft used Microsoft Discovery to address a pressing challenge in data center technology: finding alternatives to coolants containing PFAS, so-called 'forever chemicals' that are increasingly facing regulatory restrictions. Current data center cooling methods often rely on harmful chemicals that are becoming untenable as global regulations push to ban these substances. Microsoft researchers used the platform to screen hundreds of thousands of potential alternatives.

'We did prototypes on this. Actually, when I owned Azure, I did a prototype eight years ago, and it works super well, actually,' Zander said. 'It's actually like 60 to 90% more efficient than just air cooling. The big problem is that coolant material that's on market has PFAS in it.'

After identifying promising candidates, Microsoft synthesized the coolant and demonstrated it cooling a GPU running a video game. While this specific application remains experimental, it illustrates how Microsoft Discovery can compress development timelines for companies facing regulatory challenges. The implications extend far beyond Microsoft's own data centers. Any industry facing similar regulatory pressure to replace established chemicals or materials could potentially use this approach to accelerate their R&D cycles dramatically. What once would have been multi-year development processes might now be completed in a matter of months.

Daniel Pope, founder of Submer, a company focused on sustainable data centers, was quoted in the press release saying: 'The speed and depth of molecular screening achieved by Microsoft Discovery would've been impossible with traditional methods. What once took years of lab work and trial-and-error, Microsoft Discovery can accomplish in just weeks, and with greater confidence.'

Microsoft is building an ecosystem of partners across diverse industries to implement the platform, indicating its broad applicability beyond the company's internal research needs. Pharmaceutical giant GSK is exploring the platform for its potential to transform medicinal chemistry. The company stated an intent to partner with Microsoft to advance 'GSK's generative platforms for parallel prediction and testing, creating new medicines with greater speed and precision.' In the consumer space, Estée Lauder plans to harness Microsoft Discovery to accelerate product development in skincare, makeup, and fragrance.
'The Microsoft Discovery platform will help us to unleash the power of our data to drive fast, agile, breakthrough innovation and high-quality, personalized products that will delight our consumers,' said Kosmas Kretsos, PhD, MBA, Vice President of R&D and Innovation Technology at Estée Lauder Companies.

Microsoft is also expanding its partnership with Nvidia to integrate Nvidia's ALCHEMI and BioNeMo NIM microservices with Microsoft Discovery, enabling faster breakthroughs in materials and life sciences. This partnership will allow researchers to leverage state-of-the-art inference capabilities for candidate identification, property mapping, and synthetic data generation. 'AI is dramatically accelerating the pace of scientific discovery,' said Dion Harris, senior director of accelerated data center solutions at Nvidia. 'By integrating Nvidia ALCHEMI and BioNeMo NIM microservices into Azure Discovery, we're giving scientists the ability to move from data to discovery with unprecedented speed, scale, and efficiency.'

In the semiconductor space, Microsoft plans to integrate Synopsys' industry solutions to accelerate chip design and development. Sassine Ghazi, President and CEO of Synopsys, described semiconductor engineering as 'among the most complex, consequential and high-stakes scientific endeavors of our time,' making it 'an extremely compelling use case for artificial intelligence.' System integrators Accenture and Capgemini will help customers implement and scale Microsoft Discovery deployments, bridging the gap between Microsoft's technology and industry-specific applications.

Microsoft Discovery also represents a stepping stone toward the company's broader quantum computing ambitions. Zander explained that while the platform currently uses conventional high-performance computing, it's designed with future quantum capabilities in mind. 'Science is a hero scenario for a quantum computer,' Zander said. 'If you ask yourself, what can a quantum computer do? It's extremely good at exploring complicated problem spaces that classic computers just aren't able to do.'

Microsoft recently announced advancements in quantum computing with its Majorana 1 chip, which the company claims could potentially fit a million qubits 'in the palm of your hand' — compared to competing approaches that might require 'a football field worth of equipment.' 'General generative chemistry — we think the hero scenario for high-scale quantum computers is actually chemistry,' Zander explained. 'Because what it can do is take a small amount of data and explore a space that would take millions of years for a classic, even the largest supercomputer, to do.'

This connection between today's AI-driven discovery platform and tomorrow's quantum computers reveals Microsoft's long-term strategy: building the software infrastructure and user experience today that will eventually harness the revolutionary capabilities of quantum computing when the hardware matures. Zander envisions a future where quantum computers design their own successors: 'One of the first things that I want to do when I get the quantum computer that does that kind of work is I'm going to go give it my material stack for my chip. I'm going to basically say, "Okay, go simulate that sucker. Tell me how I build a new, a better, new version of you."'

With the powerful capabilities Microsoft Discovery offers, questions about potential misuse naturally arise.
Zander emphasized that the platform incorporates Microsoft's responsible AI framework. 'We have the responsible AI program, and it's been around, actually I think we were one of the first companies to actually put that kind of framework into place,' Zander said. 'Discovery absolutely is following all responsible AI guidelines.'

These safeguards include ethical use guidelines and content moderation similar to those implemented in consumer AI systems, but tailored for scientific applications. The company appears to be taking a proactive approach to identifying potential misuse scenarios. 'We already look for particular types of algorithms that could be harmful and try and flag those in content moderation style,' Zander explained. 'Again, the analogy would be very similar to what a consumer kind of bot would do.'

This focus on responsible innovation reflects the dual-use nature of powerful scientific tools — the same platform that could accelerate lifesaving drug discovery could potentially be misused in other contexts. Microsoft's approach attempts to balance innovation with appropriate safeguards, though the effectiveness of these measures will only become clear as the platform is adopted more widely.

Microsoft's entry into scientific AI comes at a time when the field of accelerated discovery is heating up. The ability to compress research timelines could have profound implications for addressing urgent global challenges, from drug discovery to climate change solutions. What differentiates Microsoft's approach is its focus on accessibility for non-computational scientists and its integration with the company's existing cloud infrastructure and future quantum ambitions. By allowing domain experts to directly leverage advanced computing without intermediaries, Microsoft could potentially remove a significant bottleneck in scientific progress.

'The big efficiencies are coming from places where, instead of me cramming additional domain knowledge, in this case, a scientist having learned to code, we're basically saying, "Actually, we'll let the agentic AI do that, you can do what you do, which is use your PhD and get forward progress,"' Zander explained.

This democratization of advanced computational methods could lead to a fundamental shift in how scientific research is conducted globally. Smaller labs and institutions in regions with less computational infrastructure might suddenly gain access to capabilities previously available only to elite research institutions. However, the success of Microsoft Discovery will ultimately depend on how effectively it integrates into complex existing research workflows and whether its AI agents can truly understand the nuances of specialized scientific domains. The scientific community is notoriously rigorous and skeptical of new methodologies; Microsoft will need to demonstrate consistent, reproducible results to gain widespread adoption.

The platform enters private preview today, with pricing details yet to be announced. Microsoft indicates that smaller research labs will be able to access the platform through Azure, with costs structured similarly to other cloud services. 'At the end of the day, our goal, from a business perspective, is that it's all about enabling that core platform, as opposed to you having to stand up,' Zander said. 'It'll just basically ride on top of the cloud and make it much easier for people to do.'
As Microsoft builds out its ambitious scientific AI platform, it positions itself at a unique juncture in the history of both computing and scientific discovery. The scientific method – a process refined over centuries – is now being augmented by some of the most advanced artificial intelligence ever created. Microsoft Discovery represents a bet that the next era of scientific breakthroughs won't come from either brilliant human minds or powerful AI systems working in isolation, but from their collaboration – where AI handles the computational heavy lifting while human scientists provide the creativity, intuition, and critical thinking that machines still lack.

'If you think about chemistry, materials sciences, materials actually impact about 98% of the world,' Zander noted. 'Everything, the desks, the displays we're using, the clothing that we're wearing. It's all materials.'

The implications of accelerating discovery in these domains extend far beyond Microsoft's business interests or even the tech industry. If successful, platforms like Microsoft Discovery could fundamentally alter the pace at which humanity can innovate in response to existential challenges – from climate change to pandemic prevention. The question now isn't whether AI will transform scientific research, but how quickly and how deeply. As Zander put it: 'We need to start working faster.' In a world facing increasingly complex challenges, Microsoft is betting that the combination of human scientific expertise and agentic AI might be exactly the acceleration we need.


Business Mayor

17-05-2025

  • Business
  • Business Mayor

Google's AlphaEvolve: The AI agent that reclaimed 0.7% of Google's compute – and how to copy it

Google's new AlphaEvolve shows what happens when an AI agent graduates from lab demo to production work, with one of the most talented technology companies driving it. Built by Google's DeepMind, the system autonomously rewrites critical code and already pays for itself inside Google. It shattered a 56-year-old record in matrix multiplication (the core of many machine learning workloads) and clawed back 0.7% of compute capacity across the company's global data centers.

Those headline feats matter, but the deeper lesson for enterprise tech leaders is how AlphaEvolve pulls them off. Its architecture – controller, fast-draft models, deep-thinking models, automated evaluators and versioned memory – illustrates the kind of production-grade plumbing that makes autonomous agents safe to deploy at scale.

Google's AI technology is arguably second to none. So the trick is figuring out how to learn from it, or even use it directly. Google says an Early Access Program is coming for academic partners and that 'broader availability' is being explored, but details are thin. Until then, AlphaEvolve is a best-practice template: If you want agents that touch high-value workloads, you'll need comparable orchestration, testing and guardrails.

Consider just the data center win. Google won't put a price tag on the reclaimed 0.7%, but its annual capex runs tens of billions of dollars. Even a rough estimate puts the savings in the hundreds of millions annually — enough, as independent developer Sam Witteveen noted on our recent podcast, to pay for training one of the flagship Gemini models, estimated to cost upwards of $191 million for a version like Gemini Ultra.

VentureBeat was the first to report about the AlphaEvolve news earlier this week. Now we'll go deeper: how the system works, where the engineering bar really sits and the concrete steps enterprises can take to build (or buy) something comparable.

AlphaEvolve runs on what is best described as an agent operating system – a distributed, asynchronous pipeline built for continuous improvement at scale. Its core pieces are a controller, a pair of large language models (Gemini Flash for breadth; Gemini Pro for depth), a versioned program-memory database and a fleet of evaluator workers, all tuned for high throughput rather than just low latency.

[Figure: A high-level overview of the AlphaEvolve agent structure. Source: AlphaEvolve paper.]

This architecture isn't conceptually new, but the execution is. 'It's just an unbelievably good execution,' Witteveen says. The AlphaEvolve paper describes the orchestrator as an 'evolutionary algorithm that gradually develops programs that improve the score on the automated evaluation metrics' (p. 3); in short, an 'autonomous pipeline of LLMs whose task is to improve an algorithm by making direct changes to the code' (p. 1).

Takeaway for enterprises: If your agent plans include unsupervised runs on high-value tasks, plan for similar infrastructure: job queues, a versioned memory store, service-mesh tracing and secure sandboxing for any code the agent produces.
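In pseudocode terms, the loop the paper describes (propose, evaluate, store, select) might look like the sketch below. The draft, refine and evaluate functions are toy stand-ins for Gemini Flash, Gemini Pro and the automated evaluators; nothing here is AlphaEvolve's actual code.

```python
import random

def draft(parent: str) -> str:
    """Fast model's role: generate many cheap variants of the best program."""
    return parent + random.choice(["+a", "+b", "+c"])

def refine(candidate: str) -> str:
    """Strong model's role: rework a promising draft into a better candidate."""
    return candidate.replace("+", "*")

def evaluate(program: str) -> float:
    """User-supplied, machine-gradable score (a toy metric here)."""
    return program.count("*") - 0.1 * len(program)

# Versioned program memory: every scored attempt is kept, never overwritten.
memory = [(0.0, "seed")]

for generation in range(20):
    _, parent = max(memory)                     # select the best program so far
    drafts = [draft(parent) for _ in range(4)]  # breadth: Gemini Flash's job
    best_draft = max(drafts, key=evaluate)      # cheap triage of the drafts
    candidate = refine(best_draft)              # depth: Gemini Pro's job
    memory.append((evaluate(candidate), candidate))

print(max(memory))  # highest-scoring program found by the loop
```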
A key element of AlphaEvolve is its rigorous evaluation framework. Every iteration proposed by the pair of LLMs is accepted or rejected based on a user-supplied 'evaluate' function that returns machine-gradable metrics. This evaluation system begins with ultrafast unit-test checks on each proposed code change – simple, automatic tests (similar to the unit tests developers already write) that verify the snippet still compiles and produces the right answers on a handful of micro-inputs – before passing the survivors on to heavier benchmarks and LLM-generated reviews. This runs in parallel, so the search stays fast and safe. In short: Let the models suggest fixes, then verify each one against tests you trust.

AlphaEvolve also supports multi-objective optimization (optimizing latency and accuracy simultaneously), evolving programs that hit several metrics at once. Counter-intuitively, balancing multiple goals can improve a single target metric by encouraging more diverse solutions.

Takeaway for enterprises: Production agents need deterministic scorekeepers, whether that's unit tests, full simulators or canary traffic analysis. Automated evaluators are both your safety net and your growth engine. Before you launch an agentic project, ask: 'Do we have a metric the agent can score itself against?'
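That cascade, cheap checks first and expensive benchmarks only for survivors, is a pattern enterprise teams can reproduce directly. A minimal sketch with stand-in check functions follows; the thresholds and metrics are invented for illustration, not taken from the paper.

```python
def fast_checks(patch: str) -> bool:
    """Ultrafast gate: would the change compile and pass micro-input tests?"""
    return "syntax error" not in patch  # stand-in for real compile/unit tests

def heavy_benchmark(patch: str) -> dict:
    """Expensive stage, run only on survivors: measure several metrics at once."""
    return {"latency_ms": 12.5, "accuracy": 0.97}  # stand-in measurements

def accept(patch: str, baseline: dict) -> bool:
    if not fast_checks(patch):
        return False  # reject early and cheaply
    scores = heavy_benchmark(patch)
    # Multi-objective gate: a patch must not regress on either metric.
    return (scores["latency_ms"] <= baseline["latency_ms"]
            and scores["accuracy"] >= baseline["accuracy"])

print(accept("candidate diff", {"latency_ms": 15.0, "accuracy": 0.95}))  # True
```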
AlphaEvolve tackles every coding problem with a two-model rhythm. First, Gemini Flash fires off quick drafts, giving the system a broad set of ideas to explore. Then Gemini Pro studies those drafts in more depth and returns a smaller set of stronger candidates. Feeding both models is a lightweight 'prompt builder,' a helper script that assembles the question each model sees. It blends three kinds of context: earlier code attempts saved in a project database, any guardrails or rules the engineering team has written and relevant external material such as research papers or developer notes. With that richer backdrop, Gemini Flash can roam widely while Gemini Pro zeroes in on quality.

Unlike many agent demos that tweak one function at a time, AlphaEvolve edits entire repositories. It describes each change as a standard diff block – the same patch format engineers push to GitHub – so it can touch dozens of files without losing track. Afterward, automated tests decide whether the patch sticks. Over repeated cycles, the agent's memory of success and failure grows, so it proposes better patches and wastes less compute on dead ends.

Takeaway for enterprises: Let cheaper, faster models handle brainstorming, then call on a more capable model to refine the best ideas. Preserve every trial in a searchable history, because that memory speeds up later work and can be reused across teams. Accordingly, vendors are rushing to provide developers with new tooling around things like memory. Products such as OpenMemory MCP, which provides a portable memory store, and the new long- and short-term memory APIs in LlamaIndex are making this kind of persistent context almost as easy to plug in as logging.

OpenAI's Codex-1 software-engineering agent, also released today, underscores the same pattern. It fires off parallel tasks inside a secure sandbox, runs unit tests and returns pull-request drafts — effectively a code-specific echo of AlphaEvolve's broader search-and-evaluate loop.

AlphaEvolve's tangible wins – reclaiming 0.7% of data center capacity, cutting Gemini training kernel runtime 23%, speeding FlashAttention 32% and simplifying TPU design – share one trait: They target domains with airtight metrics. For data center scheduling, AlphaEvolve evolved a heuristic that was evaluated using a simulator of Google's data centers based on historical workloads. For kernel optimization, the objective was to minimize actual runtime on TPU accelerators across a dataset of realistic kernel input shapes.

Takeaway for enterprises: When starting your agentic AI journey, look first at workflows where 'better' is a quantifiable number your system can compute, be it latency, cost, error rate or throughput. This focus allows automated search and de-risks deployment, because the agent's output (often human-readable code, as in AlphaEvolve's case) can be integrated into existing review and validation pipelines. This clarity allows the agent to self-improve and demonstrate unambiguous value.

While AlphaEvolve's achievements are inspiring, Google's paper is also clear about its scope and requirements. The primary limitation is the need for an automated evaluator; problems requiring manual experimentation or 'wet-lab' feedback are currently out of scope for this specific approach. The system can also consume significant compute – 'on the order of 100 compute-hours to evaluate any new solution' (AlphaEvolve paper, page 8) – necessitating parallelization and careful capacity planning.

Before allocating significant budget to complex agentic systems, technical leaders must ask critical questions:

  • Machine-gradable problem? Do we have a clear, automatable metric against which the agent can score its own performance?
  • Compute capacity? Can we afford the potentially compute-heavy inner loop of generation, evaluation and refinement, especially during the development and training phase?
  • Codebase and memory readiness? Is your codebase structured for iterative, possibly diff-based, modifications? And can you implement the instrumented memory systems vital for an agent to learn from its evolutionary history?

Takeaway for enterprises: The increasing focus on robust agent identity and access management, as seen with platforms like Frontegg, Auth0 and others, also points to the maturing infrastructure required to deploy agents that interact securely with multiple enterprise systems.

AlphaEvolve's message for enterprise teams is manifold. First, your operating system around agents is now far more important than model intelligence. Google's blueprint shows three pillars that can't be skipped:

  • Deterministic evaluators that give the agent an unambiguous score every time it makes a change.
  • Long-running orchestration that can juggle fast 'draft' models like Gemini Flash with slower, more rigorous models – whether that's Google's stack or a framework such as LangChain's LangGraph.
  • Persistent memory so each iteration builds on the last instead of relearning from scratch.

Enterprises that already have logging, test harnesses and versioned code repositories are closer than they think. The next step is to wire those assets into a self-serve evaluation loop so multiple agent-generated solutions can compete, and only the highest-scoring patch ships.

As Cisco's Anurag Dhingra, VP and GM of Enterprise Connectivity and Collaboration, told VentureBeat in an interview this week: 'It's happening, it is very, very real,' he said of enterprises using AI agents in manufacturing, warehouses, customer contact centers. 'It is not something in the future. It is happening there today.'
He warned that as these agents become more pervasive, doing 'human-like work,' the strain on existing systems will be immense: 'The network traffic is going to go through the roof,' Dhingra said. Your network, budget and competitive edge will likely feel that strain before the hype cycle settles. Start proving out a contained, metric-driven use case this quarter – then scale what works.

Watch the video podcast I did with developer Sam Witteveen, where we go deep on production-grade agents and how AlphaEvolve is showing the way.
