Researchers cut down on AI’s carbon footprint with new optimization framework

Zeus automatically adapts the power usage of deep learning models to chase clean electricity sources throughout the day

Researchers at the University of Michigan have devised a system to slash the energy consumption and carbon footprint of AI systems, particularly those using state-of-the-art deep learning (DL) models. Called Zeus, the optimization framework learns about DL models as they are trained, pinpointing the best tradeoff between power use and performance to offer a “free win” in reducing energy use and carbon emissions.

“Deep-learning models consume as much as 1,287 MWh during training—enough to supply an average US household for 120 years,” the researchers write. With Zeus, that figure could be reduced by up to 75% in some AI applications, with no new hardware and no major performance hit during model training. The framework was used to demonstrate carbon-aware deep learning training at the Carbon Hack ’22 hackathon, winning second place and a $25,000 prize. The submission, Carbon-Aware Zeus, was led by associate professor Mosharaf Chowdhury, doctoral student Jae-Won Chung, and master’s students Zhenning Yang and Luoxi Meng.

As mainstream uses for deep learning models explode, ranging from popular creative applications to 3D understanding, natural language processing, and recommendation systems, so too do the costs of training these hefty models. Chief among them is the growing carbon footprint of AI systems, a cost that researchers are still working to fully come to grips with.

“Existing works primarily focus on optimizing DL training for faster completion,” the researchers say, “often without considering the impact on energy and carbon efficiency.” 

DL models — like the pervasive Transformer architecture underpinning DALL-E image generators and GPT large language models — require intense up-front training that burns through power on GPU clusters. It’s the technology’s reliance on GPUs that enables the team’s energy optimizations: GPUs allow users to set a power limit at any time through software. Lowering a GPU’s power limit reduces its energy use while making it run slightly slower until the setting is adjusted again.
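
For readers curious about the underlying mechanism, here is a minimal sketch of capping an NVIDIA GPU’s power limit from Python via NVML, using the pynvml bindings. The 200 W target is an arbitrary example, changing the limit generally requires administrator privileges, and this is the raw GPU interface rather than Zeus’s own API.

```python
# Minimal sketch: capping an NVIDIA GPU's power draw via NVML.
# Requires the pynvml bindings (pip install nvidia-ml-py) and usually
# root/admin privileges to change the limit. Values are in milliwatts.
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU in the machine

# Query the range of power limits this GPU supports.
min_mw, max_mw = pynvml.nvmlDeviceGetPowerManagementLimitConstraints(handle)
print(f"Supported power limits: {min_mw // 1000} W to {max_mw // 1000} W")

# Ask for a 200 W cap (an arbitrary example), clamped to the legal range.
# The GPU now draws less energy but runs somewhat slower until it is raised.
target_mw = max(min_mw, min(200_000, max_mw))
pynvml.nvmlDeviceSetPowerManagementLimit(handle, target_mw)

pynvml.nvmlShutdown()
```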

Carbon-Aware Zeus can tune this setting in real time, taking into account a number of factors that shape the model’s carbon footprint. At Carbon Hack ’22, the team demonstrated how this tuning can take fuller advantage of green energy sources by ramping a model’s power consumption up or down as the data center’s energy mix changes over time.

“When clean electricity is available, we try to make a lot of training progress quickly,” explained Yang. “When it’s not, we can slightly slow down the GPU and have it consume less electricity during that period.”
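
As a rough illustration of that idea (an assumption-laden sketch, not the team’s implementation), a small controller could poll a grid carbon-intensity signal and pick a power limit accordingly. Here, fetch_carbon_intensity, the 200 gCO2/kWh threshold, and the two power limits are all hypothetical; a real deployment would wire in a data source such as the Electricity Maps or WattTime APIs.

```python
# Hedged sketch of a carbon-aware power controller: run the GPU fast when
# the grid is clean, throttle it when electricity is carbon-intensive.
# All thresholds and limits below are illustrative, not Zeus's defaults.
import time
import pynvml

CLEAN_THRESHOLD = 200.0    # assumed cutoff, in gCO2/kWh
HIGH_POWER_MW = 300_000    # "clean grid" limit: make training progress fast
LOW_POWER_MW = 150_000     # "dirty grid" limit: consume less electricity

def fetch_carbon_intensity() -> float:
    """Stub for a real grid signal (e.g., Electricity Maps or WattTime)."""
    return 250.0  # placeholder gCO2/kWh; replace with a real API call

def control_loop(gpu_index: int = 0, poll_seconds: int = 300) -> None:
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(gpu_index)
    try:
        while True:
            dirty = fetch_carbon_intensity() >= CLEAN_THRESHOLD
            limit = LOW_POWER_MW if dirty else HIGH_POWER_MW
            pynvml.nvmlDeviceSetPowerManagementLimit(handle, limit)
            time.sleep(poll_seconds)  # re-check as the energy mix shifts
    finally:
        pynvml.nvmlShutdown()
```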

Existing efforts to improve data center sustainability rely on moving training jobs to greener geographic regions or on delaying training entirely until clean energy is available. Neither option helps the many cases where model training is time-sensitive, however, or where the training data cannot be relocated because of its sheer scale or regulations such as GDPR. Those jobs have been left to run with few options for improving their carbon footprint.

“Our aim is to design and implement a solution that does not conflict with these realistic constraints, while still reducing the carbon footprint of deep learning model training,” the researchers write.

Zeus is the first such framework designed to work across a variety of deep learning workloads and GPU types to reduce energy consumption and carbon emissions. In addition to tuning the GPU power limit, Zeus automatically searches for the training batch size that best balances sustainability and performance. Future work on the project includes integration with Kubeflow, a widely used open-source MLOps platform.
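
To make that tradeoff concrete, the toy sketch below grid-searches batch sizes and power limits and scores each pair with a weighted energy-versus-time cost. The profile function, its synthetic measurements, and the eta weighting are illustrative assumptions of this sketch; Zeus itself learns these characteristics as recurring training jobs run rather than exhaustively profiling every configuration up front.

```python
# Toy sketch of the knob search behind Zeus: pick the (batch size,
# power limit) pair that best trades energy against training time.
# profile() fakes its measurements here; in practice you would time and
# meter a few real training iterations per configuration.
from itertools import product

def profile(batch_size: int, power_limit_w: int) -> tuple[float, float]:
    """Return synthetic (energy_joules, seconds) to reach target accuracy."""
    # Synthetic model: more power or a bigger batch finishes sooner, but
    # extra power shows diminishing returns, so energy rises with the limit.
    seconds = 3600.0 * (300.0 / power_limit_w) ** 0.7 * (64.0 / batch_size) ** 0.5
    energy = 0.8 * power_limit_w * seconds  # assume ~80% average draw
    return energy, seconds

def best_knobs(batch_sizes, power_limits, eta: float = 0.5):
    """Grid search minimizing eta*energy + (1-eta)*max_power*time.
    Scaling time by the largest power limit puts both terms in joules;
    eta=1 minimizes energy alone, eta=0 minimizes time alone."""
    max_power = max(power_limits)
    best, best_cost = None, float("inf")
    for bs, pl in product(batch_sizes, power_limits):
        energy, seconds = profile(bs, pl)
        cost = eta * energy + (1.0 - eta) * max_power * seconds
        if cost < best_cost:
            best, best_cost = (bs, pl), cost
    return best

print(best_knobs([32, 64, 128], [150, 200, 250, 300], eta=0.7))
```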

Zeus will be presented at NSDI 2023 in the paper “Zeus: Understanding and Optimizing GPU Energy Consumption of DNN Training,” by co-lead authors Jie You and Jae-Won Chung as well as Prof. Mosharaf Chowdhury. The project “Carbon-Aware DNN Training with Zeus” was presented at Carbon Hack ’22, developed by Zhenning Yang, Luoxi Meng, Jae-Won Chung, and Mosharaf Chowdhury.