Latest news with #NPUs


Forbes
13-06-2025
- Business
- Forbes
Data Centers 2.0: How AI Is Transforming Operations
By Dr. Steven Woo, fellow and distinguished inventor at Rambus.

Rapid advancements in AI are becoming commonplace, driven by large language models (LLMs) that now exceed 1 trillion parameters. While these AI models are revolutionizing many industries, their increasing demand for computational power is driving the need for more specialized and higher-performance infrastructure. AI workloads—which include model training, inference and real-time analytics—depend heavily on processing and moving data quickly, pushing the semiconductor industry to develop specialized hardware that delivers high levels of scalability, power efficiency and robust data security.

As AI's role in business and society continues to grow, the infrastructure that supports these workloads—including computing power, storage systems, networking capabilities and security frameworks—must adapt to handle the increasing complexity and scale of AI applications. In particular, future AI-specific data centers must address challenges related to performance, energy consumption, thermal management and memory requirements.

At their core, AI-specific servers share many components with traditional servers, including processors, memory and storage. However, these components are highly optimized for parallel processing, and AI servers often feature dedicated AI accelerators, such as neural processing units (NPUs) and AI processing units (APUs), alongside high-performance graphics processing units (GPUs). These components are specifically designed to handle the intensive computational demands of AI or high-performance computing (HPC) workloads. Unlike traditional servers, which are designed for general-purpose computing, AI servers are purpose-built for the specialized processes used to train and run inference with complex models and to process large datasets.

Training and inference with AI models require growing amounts of memory and power. McKinsey & Company reports that training AI models like ChatGPT and Copilot can demand over 80 kilowatts (kW) per rack, a stark contrast to traditional servers, which typically use 17 kW per rack. This rapid growth in model complexity, combined with the need for high data throughput and near-real-time results, places a heavy burden on data center resources and electricity needs. AI-specific servers are designed to address these needs with specialized memory configurations and high-performance accelerators that ensure optimal performance at scale.

One key innovation in this memory architecture is high-bandwidth memory (HBM), which stacks multiple memory dies together to increase storage capacity and connects these devices to processing engines through an interposer that increases data transfer speeds. This enables faster data access, reducing a key bottleneck in training large AI models. As model sizes increase, memory systems must incorporate parallelism and scaling techniques to keep up with AI workload demands.

Stacked dynamic random access memory (DRAM) configurations like HBM offer the high bandwidth required by AI workloads, but they come with challenges, including increased power consumption, difficulties in scaling memory and complexities in managing heat dissipation. To mitigate these challenges, data centers are exploring advanced cooling mechanisms, such as liquid cooling systems, and more efficient power delivery strategies. These solutions aim to improve overall energy efficiency without sacrificing AI model performance.
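The bandwidth advantage of HBM comes largely from the very wide interface that the stacked-die-plus-interposer approach makes practical. As a rough back-of-envelope illustration (the figures below are representative HBM3- and DDR5-class numbers chosen for the example, not taken from this article), peak bandwidth is simply interface width multiplied by per-pin data rate:

```python
# Back-of-envelope: peak bandwidth of a wide, stacked memory interface.
# The interface widths and per-pin rates below are representative, assumed figures.

def peak_bandwidth_gbs(interface_width_bits: int, pin_rate_gbit_s: float) -> float:
    """Peak bandwidth in GB/s: interface width (bits) times per-pin rate (Gb/s), divided by 8."""
    return interface_width_bits * pin_rate_gbit_s / 8

hbm_stack = peak_bandwidth_gbs(1024, 6.4)    # one HBM3-class stack over an interposer
dimm_channel = peak_bandwidth_gbs(64, 6.4)   # one conventional 64-bit DIMM channel

print(f"HBM3-class stack:        ~{hbm_stack:.0f} GB/s")      # ~819 GB/s
print(f"DDR5-class DIMM channel: ~{dimm_channel:.0f} GB/s")   # ~51 GB/s
```

Under these assumptions the roughly sixteenfold difference comes almost entirely from the 1,024-bit versus 64-bit interface width, which is exactly what stacking dies and routing through an interposer enables.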
At the same time, optimizing the interface between processors and memory has become another crucial challenge. Memory systems account for a significant portion of a data center's energy use, consuming between 25% and 40% of total power. Advances in memory-to-processor communication are helping reduce inefficiencies, lower latency and improve overall bandwidth. These improvements contribute to more efficient energy use and can reduce the amount of cooling required in a system.

Another key design challenge involves securing sensitive data in AI data centers as workloads continue to grow in both scale and value. Given the vast amounts of data being processed, security is critical—not only because the data and insights are highly valuable, but also because an AI model itself could be tampered with or compromised without proper security measures, potentially leading to wasted resources and business losses.

As AI workloads continue to evolve, new solutions are being developed to improve the performance, scalability and efficiency of AI data centers. Innovations in chip packaging, such as 3D chip stacking—where multiple chips are stacked vertically to form a single integrated circuit (IC)—are particularly exciting, as they allow tighter integration of more components in smaller footprints, reducing latency and enhancing overall system performance and compute density. Technologies like chiplets and 3D stacking are enabling the development of more advanced AI-specific hardware with improved scalability and lower energy consumption.

Another promising innovation is processing-in-memory (PIM), which moves processing capabilities away from accelerators and compute engines and closer to memory. In existing systems, data is read from memory and sent to the processing engines, introducing latency and high power consumption. With PIM, processing units are integrated directly within or near the memory, minimizing the distance data must travel. Because data is processed in or near memory, less resulting data has to be transmitted to the AI engines, further reducing the power spent moving data (see the illustrative sketch below). This significantly improves both performance and energy efficiency, offering a transformative shift in processing architectures.

While advances like chiplets, 3D stacking, PIM and liquid cooling boost efficiency and improve performance for AI-specific workloads, they also introduce new challenges:

• Chiplet-based architectures depend on advanced packaging and low-latency interconnects that are difficult to design and standardize across vendors.
• 3D stacking, while improving compute density, introduces challenges in delivering increasing amounts of power and cooling to these structures.
• PIM architectures require standardization and substantial changes to software stacks and ecosystems to fully realize their benefits.
• Liquid cooling, though effective in enhancing energy efficiency, increases system complexity, introduces new mechanical systems and raises infrastructure costs.

In addition, broader questions remain around scaling these innovations sustainably, managing material supply chains and ensuring system reliability under relentless AI workloads. As organizations invest in AI-specific infrastructure, they must stay attuned not only to technical breakthroughs but also to the evolving risks and standards that will shape the next generation of data centers.
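To put the data-movement argument for PIM in perspective, here is a minimal, hypothetical sketch. The per-bit energy values, the 1 GB scan size and the 1% reduction ratio are all illustrative assumptions introduced for this example, not figures from the article; the point is only that shipping less data off-chip dominates the savings.

```python
# Illustrative sketch: energy spent moving data off-chip vs. reducing it near memory first.
# All constants below are hypothetical, order-of-magnitude assumptions for illustration.

PJ_PER_BIT_OFF_CHIP = 20.0   # assumed cost to move one bit from memory to the accelerator
PJ_PER_BIT_NEAR_MEM = 2.0    # assumed cost for a PIM unit to touch one bit in place

def movement_energy_j(bytes_moved: float, pj_per_bit: float) -> float:
    """Energy in joules to move (or locally access) the given number of bytes."""
    return bytes_moved * 8 * pj_per_bit * 1e-12

data_scanned = 1e9   # 1 GB scanned in a filtering/reduction step (assumed)
reduction = 0.01     # assume PIM returns only 1% of the scanned data to the accelerator

baseline = movement_energy_j(data_scanned, PJ_PER_BIT_OFF_CHIP)
with_pim = (movement_energy_j(data_scanned, PJ_PER_BIT_NEAR_MEM)
            + movement_energy_j(data_scanned * reduction, PJ_PER_BIT_OFF_CHIP))

print(f"All data shipped to the accelerator: {baseline:.3f} J")
print(f"Reduced near memory first:           {with_pim:.3f} J")
```

Under these assumed numbers, the near-memory path spends roughly a tenth of the data-movement energy, which is the intuition behind the performance and efficiency claims made for PIM.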
Addressing these gaps will help enterprises position themselves as leaders in the future of AI computing.


Express Tribune
17-05-2025
- Business
- Express Tribune
Equal1 launches first silicon quantum computer for standard data centres
Irish quantum computing firm Equal1 has launched what it calls the world's first silicon-based, rack-mountable quantum computer, bringing scalable quantum processing into standard high-performance computing (HPC) environments.

Named Bell-1, the system can be deployed like a traditional server—no cleanrooms, complex infrastructure, or cryogenic labs required. It fits in standard 19-inch server racks, weighs roughly 200 kilograms, and draws only 1600 watts of power, comparable to an enterprise GPU server.

At the core of Bell-1 is Equal1's UnityQ 6-qubit chip, based on silicon spin qubits manufactured using conventional semiconductor processes. The system integrates quantum, classical (Arm CPUs), and AI (NPUs) components into a single chip, eliminating latency issues typically found in hybrid quantum-classical systems.

Bell-1 also features a closed-cycle cryo-cooling unit that cools the system to 0.3 Kelvin—colder than outer space—without relying on liquid helium or bulky dilution refrigerators. This self-contained design removes one of the major barriers to practical quantum deployment.

Equal1 calls this new phase "Quantum Computing 2.0", aiming to shift the technology from isolated research labs into everyday commercial data centres. Industries such as finance, materials science, artificial intelligence, and pharmaceuticals are expected to benefit from real-time quantum acceleration on workloads like simulations and optimisation.

'Our vision with Bell-1 was to make quantum computing accessible, scalable, and practical,' said Equal1 CEO Jason Lynch. 'This is the first system designed for real-world use without compromising performance or ease of deployment.'

The modular design allows for future upgrades as qubit counts rise. Rather than replacing full systems, users can swap in new chips as the UnityQ platform evolves, making Bell-1 a long-term investment for early adopters.

The launch follows peer-reviewed research published by Equal1 in late 2024 that demonstrated industry-leading silicon qubit fidelity and gate speeds. The company says Bell-1 builds directly on this research, moving from laboratory prototype to production-ready quantum computing. With Bell-1, Equal1 eliminates the traditional trade-offs of quantum hardware: it is powerful, practical, scalable, and available now.