
Latest news with #开源 (open source)

Baidu the latest to join open-source movement with Ernie 4.5 models publicly available

South China Morning Post

30-06-2025

  • Business

Baidu the latest to join open-source movement with Ernie 4.5 models publicly available

Chinese tech giant Baidu on Monday marked its entry into the highly competitive field of Chinese open-source artificial intelligence (AI) systems by making its flagship Ernie 4.5 models available for download on the AI platform Hugging Face. Baidu open-sourced 10 variants from its Ernie 4.5 multimodal model family, ranging from lightweight 0.3 billion-parameter models to heavyweight 424 billion-parameter ones, according to a statement.

Beijing-based Baidu, one of the earliest tech firms in China to develop large language models (LLMs) following the release of ChatGPT in November 2022, has made a U-turn by open-sourcing its models. A year ago, founder and CEO Robin Li Yanhong said publicly that its Ernie series, like OpenAI's ChatGPT models, would be more powerful than open-source ones. However, the release of open-source models by Chinese start-up DeepSeek, which took the AI world by storm at the start of this year, triggered an accelerated shift to open source among China's Big Tech firms. The Qwen models developed by Alibaba Group Holding, for example, are the world's most popular open-source models among developers. Alibaba owns the South China Morning Post.

[Image caption: The logo of Baidu's Ernie Bot displayed near a screen showing the Baidu logo, in an illustration picture taken on June 28, 2023. Photo: Reuters]

Citing a range of benchmark tests that evaluate an AI system's general and domain knowledge, coding and maths skills, as well as reasoning capabilities, Baidu said that its 300B Ernie 4.5 model outperformed DeepSeek's V3, which was twice the size of the Ernie model.

The benchmark results showcase the progress Baidu has made in improving its models in recent months, after the company announced earlier this year that it would shift to an open-source approach. The move followed Hangzhou-based DeepSeek's emergence into the global spotlight with its open-source V3 and R1 models, which were built cost-efficiently for high-performance tasks.
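For readers who want to try the released weights, the sketch below shows the general pattern for fetching an open-sourced model from Hugging Face with the huggingface_hub library. The repo id shown is a placeholder assumption for illustration, not a confirmed Ernie 4.5 identifier; check Baidu's Hugging Face organisation page for the actual repos.

# Minimal sketch: downloading open-sourced model weights from Hugging Face.
from huggingface_hub import snapshot_download

# Placeholder repo id (an assumption for this example, not a confirmed name).
local_dir = snapshot_download(repo_id="baidu/ERNIE-4.5-example")
print(f"Model files downloaded to: {local_dir}")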

DeepSeek paper offers new details on how it used 2,048 Nvidia chips to take on OpenAI

South China Morning Post

16-05-2025

  • Business

DeepSeek paper offers new details on how it used 2,048 Nvidia chips to take on OpenAI

Chinese artificial intelligence (AI) research lab DeepSeek has released a new research paper revealing in detail for the first time how it built one of the world's most powerful open-source AI systems at a fraction of the cost of its competitors. 'Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures', co-authored by DeepSeek founder Liang Wenfeng and released on Wednesday, attributes the start-up's breakthrough in training high-performance, cost-efficient AI systems to a hardware-software co-design approach.

'DeepSeek-V3, trained on 2,048 Nvidia H800 GPUs, demonstrates how hardware-aware model co-design can effectively address these challenges, enabling cost-efficient training and inference at scale,' the researchers wrote. DeepSeek and its hedge fund owner High-Flyer had previously stockpiled the H800, a chip Nvidia originally designed for the China market to comply with US export restrictions but which was banned from export to the country in 2023.

The start-up's training approach stemmed from the team's awareness of hardware constraints and the 'exorbitant costs' of training large language models (LLMs), the technology behind AI chatbots such as OpenAI's ChatGPT, according to the paper. The paper details technical optimisations that boost memory efficiency, streamline inter-chip communication, and enhance overall AI infrastructure performance: key advancements for reducing operational costs while scaling capabilities. These offer a 'practical blueprint for innovation in next-generation AI systems', the researchers said.

DeepSeek also highlighted its use of a mixture-of-experts (MoE) model architecture, a machine-learning approach that divides an AI model into separate sub-networks, or experts, each focused on a subset of the input data while working collaboratively.
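To make the MoE idea concrete, here is a minimal toy sketch of top-k expert routing in PyTorch. It illustrates the general technique only; the names (ToyMoELayer, num_experts, top_k) are assumptions for this example and do not reflect DeepSeek-V3's actual implementation.

# Toy mixture-of-experts layer: a router picks the top-k experts per token,
# so only a fraction of the total parameters run for any given input.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    def __init__(self, dim: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Each "expert" is a small feed-forward sub-network.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )
        # The router scores each token against every expert.
        self.router = nn.Linear(dim, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, dim). Choose the top-k experts for each token.
        scores = self.router(x)                          # (tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)   # per-token expert choice
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        # Only the chosen experts process each token; their outputs are
        # combined with the router's weights.
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k : k + 1] * expert(x[mask])
        return out

if __name__ == "__main__":
    layer = ToyMoELayer(dim=64)
    tokens = torch.randn(10, 64)
    print(layer(tokens).shape)  # torch.Size([10, 64])

The sparsity is the point of the design: the layer holds num_experts full feed-forward blocks' worth of parameters, but each token only pays the compute cost of top_k of them, which is how MoE models keep training and inference costs down as parameter counts grow.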
