
Red Hat leads launch of llm-d to scale generative AI in clouds
Red Hat has introduced llm-d, an open source project aimed at enabling large-scale distributed generative AI inference across hybrid cloud environments.
The llm-d initiative is the result of collaboration between Red Hat and a group of founding contributors comprising CoreWeave, Google Cloud, IBM Research and NVIDIA, with additional support from AMD, Cisco, Hugging Face, Intel, Lambda, Mistral AI, and academic partners from the University of California, Berkeley, and the University of Chicago.
The new project combines vLLM-based distributed inference, a native Kubernetes architecture, and AI-aware network routing to enable robust, scalable AI inference clouds that can meet demanding production service-level objectives. Red Hat asserts that this will support any AI model, on any hardware accelerator, in any cloud environment.
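For readers unfamiliar with vLLM, the inference engine llm-d builds on, a minimal offline-inference sketch looks like the following. The model name and sampling settings here are illustrative placeholders, not llm-d defaults:

```python
# Minimal vLLM usage sketch; the model and parameters are illustrative only.
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")  # any Hugging Face-hosted model
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["What is distributed inference?"], params)
for out in outputs:
    print(out.outputs[0].text)
```

llm-d's contribution is to run this kind of engine as many coordinated workers on Kubernetes rather than as a single process.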
Brian Stevens, Senior Vice President and AI CTO at Red Hat, stated, "The launch of the llm-d community, backed by a vanguard of AI leaders, marks a pivotal moment in addressing the need for scalable gen AI inference, a crucial obstacle that must be overcome to enable broader enterprise AI adoption. By tapping the innovation of vLLM and the proven capabilities of Kubernetes, llm-d paves the way for distributed, scalable and high-performing AI inference across the expanded hybrid cloud, supporting any model, any accelerator, on any cloud environment and helping realize a vision of limitless AI potential."
Addressing the scaling needs of generative AI, Red Hat points to a Gartner forecast that suggests by 2028, more than 80% of data centre workload accelerators will be principally deployed for inference rather than model training. This projected shift highlights the necessity for efficient and scalable inference solutions as AI models become larger and more complex.
The llm-d project's architecture is designed to overcome the practical limitations of centralised AI inference, such as prohibitive costs and latency. Its main features include vLLM for rapid model support, Prefill and Decode Disaggregation for distributing computational workloads, KV Cache Offloading based on LMCache to shift memory loads onto standard storage, and AI-Aware Network Routing for optimised request scheduling. Further, the project supports Google Cloud's Tensor Processing Units (TPUs) and NVIDIA's Inference Xfer Library (NIXL) for high-performance data transfer.
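Prefill and decode disaggregation splits the two phases of inference, the compute-bound processing of the prompt and the memory-bound generation of output tokens, across separate worker pools, with the KV cache handed over between them. The following is a simplified, hypothetical sketch of that idea; all class and method names are invented for illustration and do not reflect llm-d's actual interfaces:

```python
# Hypothetical sketch of prefill/decode disaggregation. All names here are
# invented for illustration; llm-d's real components and APIs differ.
from dataclasses import dataclass


@dataclass
class KVCache:
    """Key/value attention states produced during prefill."""
    tokens: list[int]
    blocks: bytes  # serialized KV tensors, e.g. transferred via a library like NIXL


class PrefillWorker:
    def prefill(self, prompt_tokens: list[int]) -> KVCache:
        # Compute-heavy pass over the full prompt; runs on one accelerator pool.
        return KVCache(tokens=prompt_tokens, blocks=b"...")


class DecodeWorker:
    def decode(self, cache: KVCache, max_new_tokens: int) -> list[int]:
        # Memory-bound token-by-token generation on a separate pool,
        # reusing the KV cache handed over from the prefill worker.
        generated: list[int] = []
        for _ in range(max_new_tokens):
            next_token = 0  # placeholder for a real model forward pass
            generated.append(next_token)
        return generated


# An AI-aware scheduler can route each phase to the pool best suited to it.
cache = PrefillWorker().prefill(prompt_tokens=[1, 2, 3])
tokens = DecodeWorker().decode(cache, max_new_tokens=8)
```

The design rationale is that prefill and decode stress hardware differently, so separating them lets each pool be sized and scheduled independently.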
The community formed around llm-d comprises both technology vendors and academic institutions, all seeking to address efficiency, cost, and performance at scale for AI-powered applications. Several of these partners provided statements regarding their involvement and the intended impact of the project.
Ramine Roane, Corporate Vice President, AI Product Management at AMD, said, "AMD is proud to be a founding member of the llm-d community, contributing our expertise in high-performance GPUs to advance AI inference for evolving enterprise AI needs. As organisations navigate the increasing complexity of generative AI to achieve greater scale and efficiency, AMD looks forward to meeting this industry demand through the llm-d project."
Shannon McFarland, Vice President, Cisco Open Source Program Office & Head of Cisco DevNet, remarked, "The llm-d project is an exciting step forward for practical generative AI. llm-d empowers developers to programmatically integrate and scale generative AI inference, unlocking new levels of innovation and efficiency in the modern AI landscape. Cisco is proud to be part of the llm-d community, where we're working together to explore real-world use cases that help organisations apply AI more effectively and efficiently."
Chen Goldberg, Senior Vice President, Engineering, CoreWeave, commented, "CoreWeave is proud to be a founding contributor to the llm-d project and to deepen our long-standing commitment to open source AI. From our early partnership with EleutherAI to our ongoing work advancing inference at scale, we've consistently invested in making powerful AI infrastructure more accessible. We're excited to collaborate with an incredible group of partners and the broader developer community to build a flexible, high-performance inference engine that accelerates innovation and lays the groundwork for open, interoperable AI."
Mark Lohmeyer, Vice President and General Manager, AI & Computing Infrastructure, Google Cloud, stated, "Efficient AI inference is paramount as organisations move to deploying AI at scale and deliver value for their users. As we enter this new age of inference, Google Cloud is proud to build upon our legacy of open source contributions as a founding contributor to the llm-d project. This new community will serve as a critical catalyst for distributed AI inference at scale, helping users realise enhanced workload efficiency with increased optionality for their infrastructure resources."
Jeff Boudier, Head of Product, Hugging Face, said, "We believe every company should be able to build and run their own models. With vLLM leveraging the Hugging Face transformers library as the source of truth for model definitions, a wide diversity of models large and small is available to power text, audio, image and video AI applications. Eight million AI Builders use Hugging Face to collaborate on over two million AI models and datasets openly shared with the global community. We are excited to support the llm-d project to enable developers to take these applications to scale."
Priya Nagpurkar, Vice President, Hybrid Cloud and AI Platform, IBM Research, commented, "At IBM, we believe the next phase of AI is about efficiency and scale. We're focused on unlocking value for enterprises through AI solutions they can deploy effectively. As a founding contributor to llm-d, IBM is proud to be a key part of building a differentiated hardware agnostic distributed AI inference platform. We're looking forward to continued contributions towards the growth and success of this community to transform the future of AI inference."
Bill Pearson, Vice President, Data Center & AI Software Solutions and Ecosystem, Intel, said, "The launch of llm-d will serve as a key inflection point for the industry in driving AI transformation at scale, and Intel is excited to participate as a founding supporter. Intel's involvement with llm-d is the latest milestone in our decades-long collaboration with Red Hat to empower enterprises with open source solutions that they can deploy anywhere, on their platform of choice. We look forward to further extending and building AI innovation through the llm-d community."
Eve Callicoat, Senior Staff Engineer, ML Platform, Lambda, commented, "Inference is where the real-world value of AI is delivered, and llm-d represents a major leap forward. Lambda is proud to support a project that makes state-of-the-art inference accessible, efficient, and open."
Ujval Kapasi, Vice President, Engineering AI Frameworks, NVIDIA, stated, "The llm-d project is an important addition to the open source AI ecosystem and reflects NVIDIA's support for collaboration to drive innovation in generative AI. Scalable, highly performant inference is key to the next wave of generative and agentic AI. We're working with Red Hat and other supporting partners to foster llm-d community engagement and industry adoption, helping accelerate llm-d with innovations from NVIDIA Dynamo such as NIXL."
Ion Stoica, Professor and Director of Sky Computing Lab, University of California, Berkeley, remarked, "We are pleased to see Red Hat build upon the established success of vLLM, which originated in our lab to help address the speed and memory challenges that come with running large AI models. Open source projects like vLLM, and now llm-d anchored in vLLM, are at the frontier of AI innovation tackling the most demanding AI inference requirements and moving the needle for the industry at large."
Junchen Jiang, Professor at the LMCache Lab, University of Chicago, added, "Distributed KV cache optimisations, such as offloading, compression, and blending, have been a key focus of our lab, and we are excited to see llm-d leveraging LMCache as a core component to reduce time to first token as well as improve throughput, particularly in long-context inference."