Alibaba Cloud unveils datacenter design, homebrew network for LLM training #cloudcomputing

Alibaba Cloud reveals its datacenter design, homebrew network used for LLM training • The Register

Alibaba Cloud has developed a specialized Ethernet-based network for training large language models, which has been in production for eight months. The decision to use Ethernet was made to avoid vendor lock-in and benefit from the evolution of the Ethernet Alliance. The network design, called High Performance Network (HPN), addresses issues like hash polarization and single points of failure in AI infrastructure.

Each host used for training contains eight GPUs and nine network interface cards, with a dedicated network for intra-host communication. The design aims to maximize GPU capabilities and network throughput. Alibaba Cloud prefers single-chip switches for their stability and lower failure rates compared to multi-chip switches.

The network design includes a DIY heatsink to prevent switches from overheating, and pods that each house 15,000 GPUs within a single datacenter building. The company is already planning its next-generation network architecture, built on higher-capacity switches.

Alibaba Cloud’s training of large language models relies on a distributed training cluster with millions of GPUs. The company’s Qwen model, available in a variant with 110 billion parameters, indicates the scale of its operations and the need for continued expansion. The network design and infrastructure improvements aim to support the growing demands of AI workloads.

Source link: https://www.theregister.com/AMP/2024/06/27/alibaba_network_datacenter_designs_revealed/
