Latest news with #Llama4Scout

Who Needs Big AI Models?

Forbes

4 days ago

Cerebras Systems CEO and Founder Andrew Feldman

The AI world continues to evolve rapidly, especially since the introduction of DeepSeek and its followers. Many have concluded that enterprises don't really need the large, expensive AI models touted by OpenAI, Meta, and Google, and are focusing instead on smaller models, such as DeepSeek-V2-Lite with 2.4B active parameters, or Llama 4 Scout and Maverick with 17B active parameters, which can provide decent accuracy at a lower cost. It turns out that this is not the case for coders, or more accurately, for the models that can and will replace many coders. Nor does the smaller-is-better mantra apply to reasoning or agentic AI, the next big thing. AI code generators require large models that can handle a wider context window, capable of accommodating approximately 100,000 lines of code. The mixture-of-experts (MoE) models supporting agentic and reasoning AI are also large. But these massive models are typically quite expensive, costing around $10 to $15 per million output tokens on modern GPUs. Therein lies an opportunity for novel AI architectures to encroach on GPUs' territory.

Cerebras Systems Launches Big AI with Qwen3-235B

Cerebras Systems (a client of Cambrian-AI Research) has announced support for the large Qwen3-235B model with a 131K-token context length (about 200 to 300 pages of text), four times what was previously available. At the RAISE Summit in Paris, Cerebras touted Alibaba's Qwen3-235B, which uses a highly efficient mixture-of-experts architecture to deliver exceptional compute efficiency. But the real news is that Cerebras can run the model at only $0.60 per million input tokens and $0.60 per million output tokens, less than one-tenth the cost of comparable closed-source models. While many consider the Cerebras wafer-scale engine expensive, this data turns that perception on its head.

Agents are a use case that frequently requires very large models. One question I frequently get is: if Cerebras is so fast, why doesn't it have more customers? One reason is that it has not supported large context windows and larger models. Those seeking to develop code, for example, do not want to break the problem into smaller fragments to fit, say, a 32K-token context. Now, that barrier to sales has evaporated.

"We're seeing huge demand from developers for frontier models with long context, especially for code generation," said Cerebras Systems CEO and Founder Andrew Feldman. "Qwen3-235B on Cerebras is our first model that stands toe-to-toe with frontier models like Claude 4 and DeepSeek R1. And with full 131K context, developers can now use Cerebras on production-grade coding applications and get answers back in less than a second instead of waiting for minutes on GPUs."

Cerebras is not just 30 times faster; it is 92% cheaper than GPUs. The company has quadrupled its context length support from 32K to 131K tokens, the maximum supported by Qwen3-235B. This expansion directly affects the model's ability to reason over large codebases and complex documentation. While a 32K context is sufficient for simple code generation use cases, a 131K context enables the model to process dozens of files and tens of thousands of lines of code simultaneously, allowing for production-grade application development.
Cerebras is 15 to 100 times more affordable than GPUs when running Qwen3-235B

Qwen3-235B excels at tasks requiring deep logical reasoning, advanced mathematics, and code generation, thanks to its ability to switch between "thinking mode" (for high-complexity tasks) and "non-thinking mode" (for efficient, general-purpose dialogue). The 131K context length allows the model to ingest and reason over large codebases (tens of thousands of lines), supporting tasks such as code refactoring, documentation, and bug detection.

Cerebras also announced the further expansion of its ecosystem, adding Amazon AWS to its cloud portfolio alongside support from DataRobot, Docker, Cline, and Notion. The addition of AWS is huge.

Where is this heading? Big AI has constantly been downsized and optimized, with orders-of-magnitude gains in performance and reductions in model size and price. This trend will undoubtedly continue, but it will be constantly offset by increases in capabilities, accuracy, intelligence, and entirely new features across modalities. So, if you want last year's AI, you're in great shape, as it continues to get cheaper. But if you want the latest features and functions, you will require the largest models and the longest input context length. It's the yin and yang of AI.
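
To make the cost and context arithmetic above concrete, here is a minimal back-of-the-envelope sketch in Python. All constants are assumptions drawn from the figures quoted in the article, not vendor list prices, so the computed savings bracket rather than exactly reproduce the 92% headline figure, and the tokens-per-line ratio is a rough guess that varies widely by programming language.

```python
# Back-of-the-envelope check of the cost and context claims quoted above.
# All constants are assumptions taken from the article, not vendor list prices.

CEREBRAS_PER_M_TOKENS = 0.60        # $ per million tokens (input or output), as quoted
GPU_PER_M_OUTPUT = (10.00, 15.00)   # $ per million output tokens, GPU range quoted

def savings_pct(ours: float, theirs: float) -> float:
    """Percentage saved by paying `ours` instead of `theirs` per million tokens."""
    return (1.0 - ours / theirs) * 100.0

for gpu_price in GPU_PER_M_OUTPUT:
    pct = savings_pct(CEREBRAS_PER_M_TOKENS, gpu_price)
    print(f"vs ${gpu_price:.2f}/M GPU pricing: {pct:.0f}% cheaper")

# Context arithmetic: how much source code fits in a 131K-token window?
CONTEXT_TOKENS = 131_000
TOKENS_PER_LINE = 8                 # assumed average for source code; varies widely
print(f"~{CONTEXT_TOKENS // TOKENS_PER_LINE:,} lines of code fit in 131K tokens")
```

Under these assumptions the savings come out at roughly 94% to 96%, and a 131K window holds on the order of 16,000 lines of code, consistent with the "tens of thousands of lines" claim above.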

Multiverse Computing raises $215M for tech that could radically slim AI costs

Yahoo

12-06-2025

On Thursday, Spanish startup Multiverse Computing announced that it raised an enormous Series B round of €189 million (about $215 million) on the strength of a technology it calls "CompactifAI." CompactifAI is a quantum-computing-inspired compression technology capable of reducing the size of LLMs by up to 95% without impacting model performance, the company says.

Specifically, Multiverse offers compressed versions of well-known open source LLMs, primarily small models, such as Llama 4 Scout, Llama 3.3 70B, Llama 3.1 8B, and Mistral Small 3.1. It will soon release a version of DeepSeek R1, with more open source and reasoning models to follow. Proprietary models from OpenAI and others are not supported. Its "slim" models, as the company calls them, are available on Amazon Web Services or can be licensed for on-premises use.

The company says its models are 4x to 12x faster than the comparable uncompressed versions, which translates to a 50% to 80% reduction in inference costs. For instance, Multiverse says that its Llama 4 Scout Slim costs 10 cents per million tokens on AWS, compared to 14 cents for Llama 4 Scout. The company says that some of its models can be made so small and energy efficient that they could run on PCs, phones, cars, drones, and even the DIY enthusiast's favorite tiny computer, the Raspberry Pi. (We are suddenly imagining those fantastical Raspberry Pi Christmas-light houses upgraded with LLM-powered interactive talking Santas.)

Multiverse has some technical might behind it. It was co-founded by CTO Román Orús, a professor at the Donostia International Physics Center in San Sebastián, Spain. Orús is known for his pioneering work on tensor networks (not to be confused with all the AI-related things named Tensor at Google). Tensor networks are computational tools that mimic quantum computers but run on classical computers; one of their primary uses these days is compressing deep learning models. Multiverse's co-founder and CEO, Enrique Lizaso Olmos, also holds multiple mathematics degrees and has been a college professor. He spent most of his career in banking and is best known as the former deputy CEO of Unnim Bank.

The Series B was led by Bullhound Capital (which has backed companies like Spotify, Revolut, Delivery Hero, Avito, and Discord), with participation from HP Tech Ventures, SETT, Forgepoint Capital International, CDP Venture Capital, Santander Climate VC, Toshiba, and Capital Riesgo de Euskadi – Grupo SPRI. Multiverse says it has 160 patents and 100 customers globally, including Iberdrola, Bosch, and the Bank of Canada. With this funding, it has raised about $250 million to date.
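
Multiverse has not published CompactifAI's internals, but the simplest relative of the tensor-network compression it describes is low-rank factorization of a weight matrix. The sketch below is a generic illustration of that idea, not Multiverse's actual method; the matrix, its synthetic low-rank structure, and the retained rank are all arbitrary choices for demonstration.

```python
# Generic low-rank weight compression, the simplest relative of the
# tensor-network methods Multiverse cites. This is NOT CompactifAI itself;
# the matrix and retained rank below are arbitrary demo choices.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "weight matrix" with low-rank structure plus noise. Trained LLM
# weights tend to have decaying singular spectra, which is what makes this
# kind of compression work; a purely random matrix would not compress well.
W = rng.standard_normal((1024, 64)) @ rng.standard_normal((64, 1024))
W += 0.01 * rng.standard_normal((1024, 1024))

# Factor W ~= A @ B by keeping only the top-k singular values of its SVD.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
k = 64                               # retained rank: the compression knob
A = U[:, :k] * s[:k]                 # shape (1024, k)
B = Vt[:k, :]                        # shape (k, 1024)

kept = (A.size + B.size) / W.size
err = np.linalg.norm(W - A @ B) / np.linalg.norm(W)
print(f"Parameters kept: {kept:.1%}")               # 12.5% of the original
print(f"Relative reconstruction error: {err:.4f}")  # small, since W is near low-rank
```

Real tensor-network approaches factor weights into networks of smaller tensors rather than a single pair of matrices, typically followed by brief retraining to recover accuracy, but the accounting is the same: fewer stored parameters in exchange for a bounded loss in fidelity.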

Bhavish Agarwal's AI company Krutrim to launch AI assistant and chatbot 'Kruti' on June 12 – all we know so far

Mint

11-06-2025

Krutrim, the artificial intelligence arm of Bhavish Agarwal's Ola Group, is set to launch its own agentic AI assistant, dubbed 'Kruti', on June 12, the tech startup said in a post on X. 'Ask Kruti what to eat, where to order, or how to cook it. Your own food agent will now be just a prompt away. Coming on 12th June!' Krutrim posted on the social media platform, with a video visualising the tool.

Billed as 'India's first AI assistant', the system, Krutrim said, 'not only responds to user prompts, but also takes initiative, adapts to user needs, and acts proactively to fulfil tasks or objectives'. Unlike standard chatbots that wait for explicit instructions, agentic AI assistants can anticipate needs, make decisions, and execute actions autonomously within defined boundaries, a report by PTI noted.

Krutrim had also teased the launch in an earlier post on X, writing: 'Excited to introduce Kruti, India's first agentic AI assistant. Reimagined from the core, Kruti listens, adapts, and acts proactively, purposefully, and in your language. This is a leap beyond chatbots. More updates on 12th June. Stay tuned!'

In April, Krutrim said it had started hosting Meta's Llama 4 open source models on its cloud platform, claiming: 'Krutrim became one of the world's first AI companies to deploy Meta's Llama 4 models on domestic servers. Krutrim presently hosts Llama 4 Scout and Llama 4 Maverick at prices ranging from ₹7 to ₹17 per million tokens.' A token in AI generally refers to a unit of text, such as a word, character, or phrase, that platforms process when handling queries. Krutrim currently hosts China-based DeepSeek AI models ranging from 8 billion to 700 billion parameters, at prices ranging from ₹10 to ₹60 per million tokens.
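
As a rough illustration of how per-million-token pricing translates into per-query cost, the sketch below estimates a token count with a crude whitespace split (real tokenizers, which are typically BPE-based, usually produce more tokens than words) and applies the ₹7 to ₹17 range quoted above. The prompt text is invented for the example.

```python
# Rough illustration of per-token pricing. The whitespace split is a crude
# stand-in for a real tokenizer, the prices are the ranges quoted above, and
# the prompt itself is an invented example.

def estimate_tokens(text: str) -> int:
    """Very rough token estimate: one token per whitespace-separated word."""
    return len(text.split())

prompt = "Suggest a simple dinner recipe using paneer, spinach, and rice."
n_tokens = estimate_tokens(prompt)

for rupees_per_million in (7, 17):  # ₹ per million tokens, as quoted for Llama 4
    cost = n_tokens * rupees_per_million / 1_000_000
    print(f"~{n_tokens} tokens at ₹{rupees_per_million}/M tokens = ₹{cost:.6f}")
```

At these rates a short prompt costs a tiny fraction of a rupee; the per-million pricing only becomes material at high query volumes or with long contexts.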

Apple's upgraded AI models underwhelm on performance

Yahoo

10-06-2025

Apple has announced updates to the AI models that power its suite of Apple Intelligence features across iOS, macOS, and more. But according to the company's own benchmarks, the models underperform older models from rival tech firms, including OpenAI.

Apple said in a blog post Monday that human testers rated the quality of text generated by its newest "Apple On-Device" model, which runs offline on products including the iPhone, "comparably" to, but not better than, text from similarly sized Google and Alibaba models. Meanwhile, those same testers rated Apple's more capable new model, called "Apple Server" and designed to run in the company's data centers, behind OpenAI's year-old GPT-4o.

In a separate test evaluating the ability of Apple's models to analyze images, human raters preferred Meta's Llama 4 Scout model over Apple Server, according to Apple. That's a bit surprising: on a number of tests, Llama 4 Scout performs worse than leading models from AI labs like Google, Anthropic, and OpenAI.

The benchmark results lend credence to reports suggesting Apple's AI research division has struggled to catch up to competitors in the cutthroat AI race. Apple's AI capabilities in recent years have underwhelmed, and a promised Siri upgrade has been delayed indefinitely. Some customers have sued Apple, accusing the firm of marketing AI features for its products that it hasn't yet delivered.

In addition to generating text, Apple On-Device, which is roughly 3 billion parameters in size, drives features like summarization and text analysis. (Parameters roughly correspond to a model's problem-solving skills, and models with more parameters generally perform better than those with fewer.) As of Monday, third-party developers can tap into it via Apple's Foundation Models framework.

Apple says both Apple On-Device and Apple Server boast improved tool use and efficiency compared to their predecessors, and can understand around 15 languages. That's thanks in part to an expanded training dataset that includes image data, PDFs, documents, manuscripts, infographics, tables, and charts.

This article originally appeared on TechCrunch.

Meta delays release of its 'Behemoth' AI model

RTÉ News

16-05-2025

Meta Platforms is delaying the release of its flagship "Behemoth" AI model due to concerns about its capabilities, the Wall Street Journal has reported, citing people familiar with the matter.

Company engineers are struggling to significantly improve the capabilities of the Behemoth large language model, prompting staff to question whether the improvements over earlier versions are significant enough to justify a public release, the report said. Early in its development, Behemoth was internally scheduled for release in April to coincide with Meta's inaugural AI conference for developers, but the company later pushed its internal target for the model's launch to June, according to the report. It has now been delayed to fall or later, the report said.

Meta had said in April it was previewing Llama 4 Behemoth, which it called "one of the smartest LLMs in the world and our most powerful yet to serve as a teacher for our new models". It released the latest versions of its Llama LLM, Llama 4 Scout and Llama 4 Maverick, that month.
