logo
#

Latest news with #SGLang

KAYTUS Unveils Upgraded MotusAI to Accelerate LLM Deployment
KAYTUS Unveils Upgraded MotusAI to Accelerate LLM Deployment

Business Wire

time12-06-2025

  • Business
  • Business Wire

KAYTUS Unveils Upgraded MotusAI to Accelerate LLM Deployment

SINGAPORE--(BUSINESS WIRE)-- KAYTUS, a leading provider of end-to-end AI and liquid cooling solutions, today announced the release of the latest version of its MotusAI AI DevOps Platform at ISC High Performance 2025. The upgraded MotusAI platform delivers significant enhancements in large model inference performance and offers broad compatibility with multiple open-source tools covering the full lifecycle of large models. Engineered for unified and dynamic resource scheduling, it dramatically improves resource utilization and operational efficiency in large-scale AI model development and deployment. This latest release of MotusAI is set to further accelerate AI adoption and fuel business innovation across key sectors such as education, finance, energy, automotive, and manufacturing. As large AI models become increasingly embedded in real-world applications, enterprises are deploying them at scale, to generate tangible value across a wide range of sectors. Yet, many organizations continue to face critical challenges in AI adoption, including prolonged deployment cycles, stringent stability requirements, fragmented open-source tool management, and low compute resource utilization. To address these pain points, KAYTUS has introduced the latest version of its MotusAI AI DevOps Platform, purpose-built to streamline AI deployment, enhance system stability, and optimize AI infrastructure efficiency for large-scale model operations. Enhanced Inference Performance to Ensure Service Quality Deploying AI inference services is a complex undertaking that involves service deployment, management, and continuous health monitoring. These tasks require stringent standards in model and service governance, performance tuning via acceleration frameworks, and long-term service stability, all of which typically demand substantial investments in manpower, time, and technical expertise. The upgraded MotusAI delivers robust large-model deployment capabilities that bring visibility and performance into perfect alignment. By integrating optimized frameworks such as SGLang and vLLM, MotusAI ensures high-performance, distributed inference services that enterprises can deploy quickly and with confidence. Designed to support large-parameter models, MotusAI leverages intelligent resource and network affinity scheduling to accelerate time-to-launch while maximizing hardware utilization. Its built-in monitoring capabilities span the full stack—from hardware and platforms to pods and services—offering automated fault diagnosis and rapid service recovery. MotusAI also supports dynamic scaling of inference workloads based on real-time usage and resource monitoring, delivering enhanced service stability. Comprehensive Tool Support to Accelerate AI Adoption As AI model technologies evolve rapidly, the supporting ecosystem of development tools continues to grow in complexity. Developers require a streamlined, universal platform to efficiently select, deploy, and operate these tools. The upgraded MotusAI provides extensive support for a wide range of leading open-source tools, enabling enterprise users to configure and manage their model development environments on demand. With built-in tools such as LabelStudio, MotusAI accelerates data annotation and synchronization across diverse categories, improving data processing efficiency and expediting model development cycles. MotusAI also offers an integrated toolchain for the entire AI model lifecycle. This includes LabelStudio and OpenRefine for data annotation and governance, LLaMA-Factory for fine-tuning large models, Dify and Confluence for large model application development, and Stable Diffusion for text-to-image generation. Together, these tools empower users to adopt large models quickly and boost development productivity at scale. Hybrid Training-Inference Scheduling on the Same Node to Maximize Resource Efficiency Efficient utilization of computing resources remains a critical priority for AI startups and small to mid-sized enterprises in the early stages of AI adoption. Traditional AI clusters typically allocate compute nodes separately for training and inference tasks, limiting the flexibility and efficiency of resource scheduling across the two types of workloads. The upgraded MotusAI overcomes traditional limitations by enabling hybrid scheduling of training and inference workloads on a single node, allowing for seamless integration and dynamic orchestration of diverse task types. Equipped with advanced GPU scheduling capabilities, MotusAI supports on-demand resource allocation, empowering users to efficiently manage GPU resources based on workload requirements. MotusAI also features multi-dimensional GPU scheduling, including fine-grained partitioning and support for Multi-Instance GPU (MIG), addressing a wide range of use cases across model development, debugging, and inference. MotusAI's enhanced scheduler significantly outperforms community-based versions, delivering a 5× improvement in task throughput and 5× reduction in latency for large-scale POD deployments. It enables rapid startup and environment readiness for hundreds of PODs while supporting dynamic workload scaling and tidal scheduling for both training and inference. These capabilities empower seamless task orchestration across a wide range of real-world AI scenarios. About KAYTUS KAYTUS is a leading provider of end-to-end AI and liquid cooling solutions, delivering a diverse range of innovative, open, and eco-friendly products for cloud, AI, edge computing, and other emerging applications. With a customer-centric approach, KAYTUS is agile and responsive to user needs through its adaptable business model. Discover more at and follow us on LinkedIn and X.

KAYTUS Unveils Upgraded MotusAI to Accelerate LLM Deployment
KAYTUS Unveils Upgraded MotusAI to Accelerate LLM Deployment

Yahoo

time12-06-2025

  • Business
  • Yahoo

KAYTUS Unveils Upgraded MotusAI to Accelerate LLM Deployment

Streamlined inference performance, tool compatibility, resource scheduling, and system stability to fast-track large AI model deployment. SINGAPORE, June 12, 2025--(BUSINESS WIRE)--KAYTUS, a leading provider of end-to-end AI and liquid cooling solutions, today announced the release of the latest version of its MotusAI AI DevOps Platform at ISC High Performance 2025. The upgraded MotusAI platform delivers significant enhancements in large model inference performance and offers broad compatibility with multiple open-source tools covering the full lifecycle of large models. Engineered for unified and dynamic resource scheduling, it dramatically improves resource utilization and operational efficiency in large-scale AI model development and deployment. This latest release of MotusAI is set to further accelerate AI adoption and fuel business innovation across key sectors such as education, finance, energy, automotive, and manufacturing. As large AI models become increasingly embedded in real-world applications, enterprises are deploying them at scale, to generate tangible value across a wide range of sectors. Yet, many organizations continue to face critical challenges in AI adoption, including prolonged deployment cycles, stringent stability requirements, fragmented open-source tool management, and low compute resource utilization. To address these pain points, KAYTUS has introduced the latest version of its MotusAI AI DevOps Platform, purpose-built to streamline AI deployment, enhance system stability, and optimize AI infrastructure efficiency for large-scale model operations. Enhanced Inference Performance to Ensure Service Quality Deploying AI inference services is a complex undertaking that involves service deployment, management, and continuous health monitoring. These tasks require stringent standards in model and service governance, performance tuning via acceleration frameworks, and long-term service stability, all of which typically demand substantial investments in manpower, time, and technical expertise. The upgraded MotusAI delivers robust large-model deployment capabilities that bring visibility and performance into perfect alignment. By integrating optimized frameworks such as SGLang and vLLM, MotusAI ensures high-performance, distributed inference services that enterprises can deploy quickly and with confidence. Designed to support large-parameter models, MotusAI leverages intelligent resource and network affinity scheduling to accelerate time-to-launch while maximizing hardware utilization. Its built-in monitoring capabilities span the full stack—from hardware and platforms to pods and services—offering automated fault diagnosis and rapid service recovery. MotusAI also supports dynamic scaling of inference workloads based on real-time usage and resource monitoring, delivering enhanced service stability. Comprehensive Tool Support to Accelerate AI Adoption As AI model technologies evolve rapidly, the supporting ecosystem of development tools continues to grow in complexity. Developers require a streamlined, universal platform to efficiently select, deploy, and operate these tools. The upgraded MotusAI provides extensive support for a wide range of leading open-source tools, enabling enterprise users to configure and manage their model development environments on demand. With built-in tools such as LabelStudio, MotusAI accelerates data annotation and synchronization across diverse categories, improving data processing efficiency and expediting model development cycles. MotusAI also offers an integrated toolchain for the entire AI model lifecycle. This includes LabelStudio and OpenRefine for data annotation and governance, LLaMA-Factory for fine-tuning large models, Dify and Confluence for large model application development, and Stable Diffusion for text-to-image generation. Together, these tools empower users to adopt large models quickly and boost development productivity at scale. Hybrid Training-Inference Scheduling on the Same Node to Maximize Resource Efficiency Efficient utilization of computing resources remains a critical priority for AI startups and small to mid-sized enterprises in the early stages of AI adoption. Traditional AI clusters typically allocate compute nodes separately for training and inference tasks, limiting the flexibility and efficiency of resource scheduling across the two types of workloads. The upgraded MotusAI overcomes traditional limitations by enabling hybrid scheduling of training and inference workloads on a single node, allowing for seamless integration and dynamic orchestration of diverse task types. Equipped with advanced GPU scheduling capabilities, MotusAI supports on-demand resource allocation, empowering users to efficiently manage GPU resources based on workload requirements. MotusAI also features multi-dimensional GPU scheduling, including fine-grained partitioning and support for Multi-Instance GPU (MIG), addressing a wide range of use cases across model development, debugging, and inference. MotusAI's enhanced scheduler significantly outperforms community-based versions, delivering a 5× improvement in task throughput and 5× reduction in latency for large-scale POD deployments. It enables rapid startup and environment readiness for hundreds of PODs while supporting dynamic workload scaling and tidal scheduling for both training and inference. These capabilities empower seamless task orchestration across a wide range of real-world AI scenarios. About KAYTUS KAYTUS is a leading provider of end-to-end AI and liquid cooling solutions, delivering a diverse range of innovative, open, and eco-friendly products for cloud, AI, edge computing, and other emerging applications. With a customer-centric approach, KAYTUS is agile and responsive to user needs through its adaptable business model. Discover more at and follow us on LinkedIn and X. View source version on Contacts Media Contacts media@ Error in retrieving data Sign in to access your portfolio Error in retrieving data Error in retrieving data Error in retrieving data Error in retrieving data

Atlas Cloud Launches High-Efficiency AI Inference Platform, Outperforming DeepSeek
Atlas Cloud Launches High-Efficiency AI Inference Platform, Outperforming DeepSeek

Miami Herald

time28-05-2025

  • Business
  • Miami Herald

Atlas Cloud Launches High-Efficiency AI Inference Platform, Outperforming DeepSeek

Developed with SGLang, Atlas Inference surpasses leading AI companies in throughput and cost, running DeepSeek V3 & R1 faster than DeepSeek themselves. NEW YORK CITY, NEW YORK / ACCESS Newswire / May 28, 2025 / Atlas Cloud, the all-in-one AI competency center for training and deploying AI models, today announced the launch of Atlas Inference, an AI inference platform that dramatically reduces GPU and server requirements, enabling faster, more cost-effective deployment of large language models (LLMs). Atlas Inference, co-developed with SGLang, an AI inference engine, maximizes GPU efficiency by processing more tokens faster and with less hardware. When comparing DeepSeek's published performance results, Atlas Inference's 12-node H100 cluster outperformed DeepSeek's reference implementation of their DeepSeek-V3 model while using two-thirds of the servers. Atlas' platform reduces infrastructure requirements and operational costs while addressing hardware costs, which represent up to 80% of AI operational expenses. "We built Atlas Inference to fundamentally break down the economics of AI deployment," said Jerry Tang, Atlas CEO. "Our platform's ability to process 54,500 input tokens and 22,500 output tokens per second per node means businesses can finally make high-volume LLM services profitable instead of merely break-even. I believe this will have a significant ripple effect throughout the industry. Simply put, we're surpassing industry standards set by hyperscalers by delivering superior throughput with fewer resources." Atlas Inference's performance also exceeds major players like Amazon, NVIDIA and Microsoft, delivering up to 2.1 times greater throughput using 12 nodes compared to competitors' larger setups. It maintains sub-5-second first-token latency and 100-millisecond inter-token latency with more than 10,000 concurrent sessions, ensuring a scaled, superior experience. The platform's performance is driven by four key innovations: Prefill/Decode Disaggregation: Separates compute-intensive operations from memory-bound processes to optimize efficiencyDeepExpert (DeepEP) Parallelism with Load Balancers: Ensures over 90% GPU utilizationTwo-Batch OverlapTechnology: Increases throughput by enabling larger batches and utilization of both compute and communication phases simultaneouslyDisposableTensor Memory Models: Prevents crashes during long sequences for reliable operation "This platform represents a significant leap forward for AI inference," said Yineng Zhang, Core Developer at SGLang. "What we built here may become the new standard for GPU utilization and latency management. We believe this will unlock capabilities previously out of reach for the majority of the industry regarding throughput and efficiency." Combined with a lower cost per token, linear scaling behavior, and reduced emissions compared to leading vendors, Atlas Inference provides a cost-efficient and scalable AI deployment. Atlas Inference works with standard hardware and supports custom models, giving customers complete flexibility. Teams can upload fine-tuned models and keep them isolated on dedicated GPUs, making the platform ideal for organizations requiring brand-specific voice or domain expertise. The platform is available immediately for enterprise customers and early-stage startups. About Atlas Cloud Atlas Cloud is your all-in-one AI competency center, powering leading AI teams with safe, simple, and scalable infrastructure for training and deploying models. Atlas Cloud also offers an on-demand GPU platform that delivers fast, serverless compute. Backed by Dell, HPE, and Supermicro, Atlas delivers near instant access to up to 5,000 GPUs across a global SuperCloud fabric with 99% uptime and baked-in compliance. Learn more at SOURCE: Atlas Cloud press release

DOWNLOAD THE APP

Get Started Now: Download the App

Ready to dive into a world of global content with local flavor? Download Daily8 app today from your preferred app store and start exploring.
app-storeplay-store