Latest news with #DongJiang

iFLYTEK wins CNCF award for AI model training with Volcano

Techday NZ

10-06-2025

iFLYTEK has been named the winner of the Cloud Native Computing Foundation's End User Case Study Contest for advancements in scalable artificial intelligence infrastructure using the Volcano project. The selection recognises iFLYTEK's deployment of Volcano to address operational inefficiencies and resource management issues that arose as the company expanded its AI workloads.

iFLYTEK, which specialises in speech and language artificial intelligence, reported underutilised GPUs, increasingly complex workflows, and competition among teams for resources as its computing demands grew. These problems slowed development and placed additional strain on infrastructure.

With Volcano, iFLYTEK introduced elastic scheduling, directed acyclic graph (DAG)-based workflows, and multi-tenant isolation into its AI model training operations. This transition allowed the business to improve the efficiency of its infrastructure and simplify the management of large-scale training projects. Key improvements cited include higher resource utilisation and fewer system disruptions.

DongJiang, Senior Platform Architect at iFLYTEK, said, "Before Volcano, coordinating training under large-scale GPU clusters across teams meant constant firefighting, from resource bottlenecks and job failures to debugging tangled training pipelines. Volcano gave us the flexibility and control to scale AI training reliably and efficiently. We're honoured to have our work recognized by CNCF, and we're excited to share our journey with the broader community at KubeCon + CloudNativeCon China."

Volcano is a cloud native batch system built on Kubernetes and is designed to support performance-focused workloads such as artificial intelligence and machine learning training, big data processing, and scientific computing. The platform's features include job orchestration, resource fairness, and queue management, intended to support the efficient management of distributed workloads. Volcano was accepted into the CNCF Sandbox in 2020 and reached Incubating maturity level in 2022, reflecting increasing adoption for compute-intensive operations.

iFLYTEK's engineering team cited the need for an infrastructure that could adapt to the rising scale and complexity of AI model training. Their objectives were to improve allocation of computing resources, manage multi-stage workflows efficiently, and limit disruptions to jobs while ensuring equitable resource access among multiple internal teams.

The adoption of Volcano yielded several measurable outcomes for iFLYTEK's AI infrastructure. The company reported a 40% increase in GPU utilisation, contributing to lower infrastructure costs and reduced idle periods. Recovery from training job failures was 70% faster, which contributed to more consistent and uninterrupted AI development. Hyperparameter searches, a process integral to AI model optimisation, were accelerated by 50%, allowing the company's teams to test and refine models more swiftly.

Chris Aniszczyk, Chief Technology Officer at CNCF, said, "iFLYTEK's case study shows how open source can solve complex, high-stakes challenges at scale. By using Volcano to boost GPU efficiency and streamline training workflows, they've cut costs, sped up development, and built a more reliable AI platform on top of Kubernetes, which is essential for any organization striving to lead in AI."

As artificial intelligence workloads become increasingly complex and reliant on large-scale compute resources, the use of tools like Volcano has expanded among organisations seeking more effective operational strategies.

iFLYTEK will present its case study, titled "Scaling Large Model Training in Kubernetes Clusters with Volcano," at KubeCon + CloudNativeCon China, where company representatives will outline approaches to managing distributed model training within Kubernetes-based environments and share technical and practical insights with participants seeking to optimise large-scale artificial intelligence training infrastructure.

iFLYTEK Wins CNCF End User Case Study Contest for Scalable AI Infrastructure Breakthroughs with Volcano

Yahoo

10-06-2025

Company to present large-scale Kubernetes model training success at KubeCon + CloudNativeCon China 2025

HONG KONG, June 9, 2025 /PRNewswire/ -- The Cloud Native Computing Foundation® (CNCF®), which builds sustainable ecosystems for cloud native software, today announced iFLYTEK as the winner of the CNCF End User Case Study Contest. Selected for its impactful implementation of Volcano, iFLYTEK will present its success scaling large AI model training at KubeCon + CloudNativeCon China 2025, 10–11 June in Hong Kong.

iFLYTEK, a Chinese tech firm focused on speech and language AI, faced scaling issues as its workloads grew. Inefficient scheduling left GPUs underused, workflows became harder to manage, and teams competed for resources. These challenges slowed progress and strained infrastructure. With Volcano, iFLYTEK adopted elastic scheduling, DAG-based workflows, and multi-tenant isolation, resulting in simplified operations and improved resource usage.

"Before Volcano, coordinating training under large-scale GPU clusters across teams meant constant firefighting, from resource bottlenecks and job failures to debugging tangled training pipelines," said DongJiang, senior platform architect, iFLYTEK. "Volcano gave us the flexibility and control to scale AI training reliably and efficiently. We're honored to have our work recognized by CNCF, and we're excited to share our journey with the broader community at KubeCon + CloudNativeCon China."

Volcano is a cloud native batch system built on Kubernetes, designed for high-performance workloads such as AI/ML training, big data processing, and scientific computing. It offers advanced scheduling capabilities such as job orchestration, resource fairness, and queue management, which are essential for managing large-scale, distributed tasks efficiently. Accepted into the CNCF Sandbox in 2020 and promoted to Incubating maturity level in 2022, Volcano has become a foundational tool for organizations running compute-intensive workloads.

As AI demand increased, iFLYTEK turned to Volcano to support the growing complexity and scale of its training infrastructure. The engineering team was looking for a way to allocate resources more efficiently, manage complex multi-stage training workflows, and minimize job disruptions, all while ensuring fair access for different teams. With Volcano, the team is now able to streamline operations, better utilize GPUs, and stabilize long-running jobs:

  • 40% increase in GPU utilization, cutting infrastructure costs and reducing idle compute.
  • 70% faster recovery from job failures, ensuring uninterrupted training processes.
  • 50% acceleration in hyperparameter search, enabling faster iteration and innovation.

"iFLYTEK's case study shows how open source can solve complex, high-stakes challenges at scale," said Chris Aniszczyk, CTO of CNCF. "By using Volcano to boost GPU efficiency and streamline training workflows, they've cut costs, sped up development, and built a more reliable AI platform on top of Kubernetes, which is essential for any organization striving to lead in AI."

As AI workloads grow more complex and resource-intensive, iFLYTEK's experience shows how cloud native tools like Volcano can help teams simplify operations and improve scalability. The company's upcoming KubeCon + CloudNativeCon China presentation will share practical insights on managing distributed training more effectively in Kubernetes environments. The full event schedule includes iFLYTEK's session, "Scaling Large Model Training in Kubernetes Clusters with Volcano," on 11 June.

About Cloud Native Computing Foundation

Cloud native computing empowers organizations to build and run scalable applications with an open source software stack in public, private, and hybrid clouds. The Cloud Native Computing Foundation (CNCF) hosts critical components of the global technology infrastructure, including Kubernetes, Prometheus, and Envoy. CNCF brings together the industry's top developers, end users, and vendors and runs the largest open source developer conferences in the world. Supported by more than 800 members, including the world's largest cloud computing and software companies, as well as over 200 innovative startups, CNCF is part of the nonprofit Linux Foundation.

The Linux Foundation has registered trademarks and uses trademarks. For a list of trademarks of The Linux Foundation, please see our trademark usage page. Linux is a registered trademark of Linus Torvalds.

Media Contact
Kaitlin Thornhill
The Linux Foundation
pr@

SOURCE Cloud Native Computing Foundation
