
#TokyoTech releases Fugaku-LLM, a large language model trained on the supercomputer Fugaku #AIResearch


A team of researchers in Japan has developed Fugaku-LLM, a large language model with enhanced Japanese language capabilities, using the RIKEN supercomputer Fugaku. The model has 13 billion parameters and outperforms comparable models in Japanese language proficiency. It was trained on a combination of Japanese, English, mathematics, and code data; to make this feasible, the team optimized the performance of Transformers on Fugaku and accelerated communication for distributed training.

Fugaku-LLM is available for both research and commercial use, with the source code published on GitHub and the model on Hugging Face. The development is part of a joint research project involving Tokyo Institute of Technology, Tohoku University, Fujitsu, RIKEN, Nagoya University, CyberAgent, and Kotoba Technologies, and was supported by the Fugaku policy-supporting proposal. The research aims to strengthen Japan’s advantage in AI and to contribute to next-generation research and business applications. The model’s transparency, safety, and performance make it a valuable asset for natural dialogue processing and other applications, and the research outcomes are being shared openly so that other researchers and engineers can build on them.


Source link: https://agenparl.eu/2024/05/10/%E3%80%90tokyo-tech-release%E3%80%91release-of-fugaku-llm-a-large-language-model-trained-on-the-supercomputer-fugaku/

