Fujitsu uses Fugaku supercomputer to train LLM: 13 billion parameters

Although Fujitsu’s Fugaku supercomputer is no longer the fastest machine on the Top500 supercomputer list, it remains a very capable system, and the versatility of its A64FX processor allows it to handle a variety of workloads, including AI. This week Fujitsu released Fugaku-LLM, a large language model with advanced Japanese language processing capabilities that is designed for both research and commercial applications.

Fujitsu’s Fugaku-LLM was trained on 380 billion tokens using 13,824 nodes of the Fugaku supercomputer, which is based on the A64FX processor supporting FP64, FP32, FP16, and INT8 modes for a variety of AI and conventional supercomputing applications. The training naturally took advantage of distributed parallel training techniques optimized for the supercomputer’s architecture and its Tofu Interconnect D.
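To give a sense of the scale involved, the figures above can be turned into a back-of-the-envelope compute estimate using the common ~6 × parameters × tokens FLOPs rule of thumb for transformer training. This is an illustrative sketch, not a figure published by Fujitsu:

```python
# Back-of-the-envelope training-compute estimate for a 13B-parameter
# model trained on 380B tokens, using the ~6*N*D FLOPs heuristic.
# Illustrative only; not an official Fujitsu figure.

def training_flops(params: float, tokens: float) -> float:
    """Approximate total training FLOPs via the 6*N*D rule of thumb."""
    return 6.0 * params * tokens

N = 13e9    # 13 billion parameters
D = 380e9   # 380 billion training tokens

flops = training_flops(N, D)
print(f"~{flops:.2e} FLOPs (~{flops / 1e21:.1f} zettaFLOPs)")
```

The heuristic counts roughly two FLOPs per parameter for the forward pass and four for the backward pass per token, so it only bounds the arithmetic work; it says nothing about how efficiently the 13,824 nodes were utilized.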

The Fugaku-LLM features 13 billion parameters, which looks modest next to the 175 billion of OpenAI's GPT-3, but it is the largest LLM ever trained in Japan. Fujitsu says that a 13-billion-parameter model does not require vast compute resources for inference, making it optimal for businesses and researchers in Japan. Approximately 60% of the training data was Japanese; the remaining 40% was English, mathematics, and code data.
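The claim that a 13-billion-parameter model is comparatively cheap to serve follows from simple arithmetic on its weight footprint. The sketch below estimates weight memory at different precisions; it is an assumption-laden illustration that ignores activation and KV-cache memory, not a Fujitsu-published requirement:

```python
# Rough inference-time memory footprint for the weights of a
# 13B-parameter model at different precisions. Illustrative sketch;
# ignores activations, KV cache, and runtime overhead.

def weight_memory_gib(params: int, bytes_per_param: float) -> float:
    """Memory needed to hold the model weights, in GiB."""
    return params * bytes_per_param / 2**30

PARAMS = 13_000_000_000
for name, nbytes in [("FP32", 4), ("FP16", 2), ("INT8", 1)]:
    print(f"{name}: ~{weight_memory_gib(PARAMS, nbytes):.1f} GiB")
```

At FP16 the weights alone come to roughly 24 GiB, which fits on a single high-end accelerator or a modest multi-GPU server, whereas a 175-billion-parameter model would need an order of magnitude more.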

This extensive Japanese-centric training sets it apart from other Japanese models that were trained primarily on English datasets. As a result, Fugaku-LLM boasts superior proficiency in Japanese, achieving an average score of 5.5 on the Japanese MT-Bench, the top score among openly available models trained with original data from Japan. It particularly excels in humanities and social sciences, achieving an impressive benchmark score of 9.18, according to Fujitsu.

The Fugaku-LLM initiative has been driven by collaborations among leading Japanese institutions, including Tokyo Institute of Technology, Tohoku University, Fujitsu Limited, RIKEN, Nagoya University, CyberAgent, and Kotoba Technologies. One of the reasons they collaborated was a shortage of the GPUs typically used to train AI models and run inference. Another reason is that the model could be used with Fujitsu's next-generation 150-core Monaka datacenter CPU, which is optimized for both AI and HPC workloads.

Fugaku-LLM is now available for both academic and commercial purposes under specified licensing terms from GitHub and Hugging Face (though Fujitsu did not provide any links). It will also be offered via the Fujitsu Research Portal starting May 10, 2024.