After Chinese companies lost access to Nvidia’s leading-edge A100 and H100 compute GPUs, which can be used to train various AI models, they had to find ways to train those models without the most advanced hardware. To compensate for the lack of powerful GPUs, Chinese AI model developers are simplifying their programs to reduce compute requirements and combining whatever compute hardware they can obtain, the Wall Street Journal reports.
Nvidia cannot sell its A100 and H100 compute GPUs to Chinese entities like Alibaba or Baidu without getting an export license from the U.S. Department of Commerce (and any application would almost certainly be denied). So Nvidia has developed A800 and H800 processors that offer reduced performance and come with handicapped NVLink capabilities, which limit the ability to build the high-performance multi-GPU systems traditionally required to train large-scale AI models.
For example, the large-scale language model behind OpenAI’s ChatGPT requires 5,000 to 10,000 of Nvidia’s A100 GPUs to train, according to estimates by UBS analysts, reports the WSJ. Since Chinese developers do not have access to A100s, they use the less capable A800 and H800 in combination to achieve something akin to the performance of Nvidia’s higher-performance GPUs, according to Yang You, a professor at the National University of Singapore and founder of HPC-AI Tech. In April, Tencent introduced a new computing cluster using Nvidia’s H800s for large-scale AI model training. This approach can be expensive, as Chinese firms might need three times as many H800s as their U.S. counterparts would need H100s for similar results.
Due to high costs and the inability to physically get all the GPUs they need, Chinese companies have designed methods to train large-scale AI models across different chip types, something that U.S.-based companies rarely do due to technical challenges and reliability concerns. For example, companies like Alibaba, Baidu, and Huawei have explored using combinations of Nvidia’s A100s, V100s, and P100s, and Huawei’s Ascends, according to research papers reviewed by WSJ.
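One reason mixing chip types is hard is that faster and slower devices must be kept equally busy. The sketch below is not taken from the papers the WSJ reviewed; it only illustrates one simple idea, splitting a global training batch in proportion to each device's relative throughput. The function name `split_batch`, the device labels, and the integer weights are all hypothetical.

```python
from typing import Dict


def split_batch(global_batch: int, weights: Dict[str, int]) -> Dict[str, int]:
    """Split a global batch across devices in proportion to integer weights.

    `weights` maps a device label to an assumed relative throughput.
    Both the labels and the numbers are illustrative, not measured values.
    """
    if global_batch < 0:
        raise ValueError("global_batch must be non-negative")
    if not weights or any(w <= 0 for w in weights.values()):
        raise ValueError("weights must be non-empty and strictly positive")

    total = sum(weights.values())

    # Integer floor of each device's proportional share.
    shares = {name: global_batch * w // total for name, w in weights.items()}

    # Flooring can leave a few samples unassigned (fewer than the number
    # of devices); give them to the fastest devices first.
    remainder = global_batch - sum(shares.values())
    fastest_first = sorted(weights, key=lambda name: weights[name], reverse=True)
    for name in fastest_first[:remainder]:
        shares[name] += 1

    return shares


if __name__ == "__main__":
    # Hypothetical cluster: three device classes with made-up relative speeds.
    print(split_batch(1024, {"fast": 4, "mid": 2, "slow": 1}))
```

A proportional split like this only balances compute time per step. It does not address the differences in memory capacity, interconnect bandwidth, or software support that make heterogeneous training unreliable in practice.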
Although there are numerous companies in China developing processors for AI workloads, their hardware is not supported by robust software platforms like Nvidia’s CUDA, which is why machines based on such chips are reportedly ‘prone to crashing.’
Chinese firms have also been more aggressive in combining various software techniques to reduce the computational requirements of training large-scale AI models, an approach that has yet to gain traction globally. Despite the challenges and ongoing refinements, Chinese researchers have seen some success with these methods.
In a recent paper, Huawei researchers demonstrated training their latest-generation large language model, PanGu-Σ, using only Ascend processors and without Nvidia compute GPUs. While there were some shortcomings, the model achieved state-of-the-art performance in a few Chinese-language tasks, such as reading comprehension and grammar tests.
Analysts warn that Chinese researchers will face increased difficulties without access to Nvidia’s new H100 chip, which includes an additional performance-enhancing feature particularly useful for training ChatGPT-like models. Meanwhile, a paper published last year by Baidu and Peng Cheng Laboratory demonstrated that researchers were training large language models using a method that could render the additional feature irrelevant.
“If it works well, they can effectively circumvent the sanctions,” Dylan Patel, chief analyst at SemiAnalysis, is reported to have said.