Researchers at the University of California, Santa Cruz have developed a way to make large language models about 50 times more energy efficient by combining alternative arithmetic with custom hardware. In their paper, "Scalable MatMul-free Language Modeling," the authors describe how AI's energy consumption can be cut by eliminating matrix multiplication and running the resulting model on a custom field-programmable gate array (FPGA) accelerator.
The energy demands of AI have raised concerns about environmental sustainability, with the datacenters powering AI services contributing to rising CO2 emissions. The researchers' approach could deliver roughly 50x energy savings; their prototype demonstrates this by running a billion-parameter language model on the custom FPGA hardware at significantly lower power than a comparable GPU setup.
The researchers replaced conventional matrix multiplication with operations on binary or ternary values, sharply reducing computational cost. This approach aligns with other efforts to simplify neural network architectures for energy efficiency. By constraining weights to the ternary values {-1, 0, +1} and swapping self-attention for a simpler, MatMul-free recurrent token-mixing unit, the researchers achieved comparable performance with reduced energy consumption.
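To make the arithmetic concrete, here is a minimal sketch (in NumPy, not the authors' code) of why ternary weights eliminate multiplication: when every weight is -1, 0, or +1, a matrix-vector product collapses into selective additions and subtractions. The function name and toy dimensions below are illustrative.

```python
import numpy as np

def ternary_matvec(W, x):
    """Apply a ternary weight matrix {-1, 0, +1} to a vector using only
    additions and subtractions -- no multiplications needed."""
    out = np.zeros(W.shape[0], dtype=x.dtype)
    for i, row in enumerate(W):
        # +1 weights add the input element, -1 weights subtract it,
        # and 0 weights skip it entirely.
        out[i] = x[row == 1].sum() - x[row == -1].sum()
    return out

# Toy check against an ordinary matrix-vector product.
rng = np.random.default_rng(0)
W = rng.choice([-1, 0, 1], size=(4, 8)).astype(np.int8)
x = rng.standard_normal(8).astype(np.float32)
assert np.allclose(ternary_matvec(W, x), W.astype(np.float32) @ x, atol=1e-6)
```

Hardware can exploit this further: an adder circuit is far cheaper in silicon and energy than a multiplier, which is part of what makes the FPGA implementation so frugal.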
The paper also describes fused GPU kernels that accelerate training and reduce memory consumption (a sketch of the idea follows below). The researchers believe their work demonstrates how scalable, lightweight language models can rein in the computational and energy demands of real-world applications. Overall, their findings offer a promising answer to the energy challenges posed by large language models.
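As an illustration of the kernel-fusion idea, here is a sketch in Triton (a Python GPU-kernel DSL), not the paper's actual kernel: the pairing of RMSNorm with activation quantization and all names here are assumptions made for the example. Fusing the two steps into one kernel keeps intermediate values in registers instead of writing them back to GPU memory between steps.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def fused_rmsnorm_quant_kernel(x_ptr, q_ptr, scale_ptr, N, eps,
                               BLOCK: tl.constexpr):
    # Each program instance normalizes and quantizes one row, entirely
    # in registers -- the normalized values never touch global memory.
    row = tl.program_id(0)
    cols = tl.arange(0, BLOCK)
    mask = cols < N
    x = tl.load(x_ptr + row * N + cols, mask=mask, other=0.0).to(tl.float32)

    # Step 1: RMSNorm.
    y = x / tl.sqrt(tl.sum(x * x, axis=0) / N + eps)

    # Step 2: absmax int8 quantization of the normalized row.
    scale = 127.0 / tl.maximum(tl.max(tl.abs(y), axis=0), eps)
    q = tl.minimum(tl.maximum(y * scale, -127.0), 127.0)

    # The cast truncates toward zero; proper rounding is omitted for brevity.
    tl.store(q_ptr + row * N + cols, q.to(tl.int8), mask=mask)
    tl.store(scale_ptr + row, scale)

def fused_rmsnorm_quant(x: torch.Tensor, eps: float = 1e-6):
    """Launch one kernel instance per row of a 2-D activation tensor."""
    x = x.contiguous()
    rows, N = x.shape
    q = torch.empty_like(x, dtype=torch.int8)
    scale = torch.empty(rows, device=x.device, dtype=torch.float32)
    fused_rmsnorm_quant_kernel[(rows,)](x, q, scale, N, eps,
                                        BLOCK=triton.next_power_of_2(N))
    return q, scale
```

An unfused version would write the normalized tensor out to global memory after step 1 and read it back in step 2; fusing removes that round trip, which is where the memory-traffic and training-speed savings come from.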
Source link: https://www.theregister.com/AMP/2024/06/26/ai_model_fpga/