Measuring the CO2 footprint of an AI model

Training AI models, or running AI experiments, can consume a lot of power. But not always! Some experiments are large and some are small. This week I've been using CodeCarbon, a tool for measuring the CO2 emissions of your code.

CodeCarbon tracks how much power your computer's CPU, RAM, GPU and so on draw during an experiment, and adds this up into a total energy figure. It then performs an online lookup to find out how carbon-intensive your local electricity supply is (since the CO2 impact of electricity generation varies throughout the day, throughout the year, and from place to place). From that, it calculates a total CO2 impact.
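
In outline, that calculation is just energy multiplied by carbon intensity. Here's a toy version of the arithmetic (my own sketch with made-up figures, not CodeCarbon's actual internals):

# Toy version of the calculation: component power draw, summed over time,
# multiplied by the grid's carbon intensity. All figures are hypothetical.
cpu_w, ram_w, gpu_w = 28.0, 6.0, 0.0   # average power draw in watts
hours = 3.5
energy_kwh = (cpu_w + ram_w + gpu_w) * hours / 1000
intensity = 0.35                        # kg CO2eq per kWh (illustrative)
print(f"{energy_kwh:.3f} kWh -> {energy_kwh * intensity:.3f} kg CO2eq")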

Using CodeCarbon in your Python script is easy:

from codecarbon import EmissionsTracker
tracker = EmissionsTracker(project_name='waffleogram')
# ...
tracker.start()
try:
    pass  # the training loop goes here
finally:
    tracker.stop()
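
When the tracker stops, CodeCarbon writes its measurements out: by default it appends a row to an emissions.csv file, and stop() also returns the estimated emissions in kg of CO2-equivalent. A minimal sketch of reading that file back, assuming the default output location and column names (which can vary between versions):

import csv

# Each run appends one row to emissions.csv. 'energy_consumed' is in kWh
# and 'emissions' is in kg CO2-equivalent.
with open("emissions.csv") as f:
    for row in csv.DictReader(f):
        print(row["project_name"],
              f"{float(row['energy_consumed']):.6f} kWh",
              f"{float(row['emissions']):.6f} kg CO2eq")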

I've used this code to compare running the same experiment (training a small CNN) on three different machines I have available: my work laptop, my home server, and a dedicated GPU server at Naturalis. Each machine runs the same insect sound classification task: 2 epochs of EfficientNet training on the InsectSet66 dataset. I'll quote the total electricity used for each training run, as calculated by CodeCarbon:

  1. My laptop (no GPU, Intel i7 CPU):
    About 3h30 to run 2 epochs.
    0.048196 kWh of electricity used.
  2. My home server (no GPU, Intel Pentium CPU):
    About 18h to run 2 epochs (!).
    0.247716 kWh of electricity used.
  3. Our Naturalis GPU server (A40 GPU [2 present, but only 1 used here], Intel Xeon CPU):
    About 7 minutes to run 2 epochs.
    0.045225 kWh of electricity used.

My home server is the least efficient of the three, primarily because its CPU is old and power-hungry.

The laptop and the GPU server apparently use a similar amount of energy for this task, despite many differences! The GPU server is much more power-hungry (its RAM alone draws 188 W of power, whereas my laptop's RAM draws 6 W), but it completes the task quickly.
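
A quick back-of-the-envelope makes this trade-off concrete. Dividing each energy total by the (approximate) run time gives the average power draw implied by CodeCarbon's figures:

# Average power draw implied by the energy totals above. Durations are
# approximate, so treat these as rough figures.
runs = {
    "laptop (i7, no GPU)":        (0.048196, 3.5),     # kWh, hours
    "home server (Pentium)":      (0.247716, 18.0),
    "Naturalis GPU server (A40)": (0.045225, 7 / 60),
}
for name, (kwh, hours) in runs.items():
    print(f"{name}: ~{1000 * kwh / hours:.0f} W average")

The GPU server averages a few hundred watts against the other machines' roughly 14 W, but finishes about 30 times faster than the laptop, so the energy totals end up close.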

The analysis that CodeCarbon gives you is incomplete. It's still useful! But there are a few extra factors worth thinking about, which a tool like this cannot know about. Firstly, it doesn't know that my home machines were powered by our solar panels: I ran these tests on a bright summer's day, when our home was generating excess energy, so the true carbon footprint was effectively zero, and certainly much lower than electricity from the general Dutch grid. Secondly, it doesn't know whether you're using a machine that was already running for other reasons, or whether you bought or powered up the machine specially for your experiment. In the latter case you should also count the carbon cost of running the base system, not just the marginal load of the experiment.
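
If you do know your actual supply, it's easy to re-derive emissions from CodeCarbon's energy total using your own carbon-intensity figure. A sketch (the grid intensity below is illustrative only, not an official number):

# Re-derive emissions from a measured energy total using your own
# carbon-intensity figure for the electricity supply.
energy_kwh = 0.048196                      # laptop run, from CodeCarbon
intensities = {
    "own solar panels (excess generation)": 0.0,   # effectively zero
    "grid electricity (illustrative)":      0.35,  # kg CO2eq per kWh
}
for source, kg_per_kwh in intensities.items():
    print(f"{source}: {1000 * energy_kwh * kg_per_kwh:.1f} g CO2eq")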

I also tried running the same experiment on the GPU server, but swapping the simple CNN-based architecture for a slightly more complicated one with an adaptive feature extractor. Without changing anything else at all, this made a big difference: the adaptive feature extractor makes the training task more complicated, and slower, to compute. It took 50 minutes (rather than 7) and its power usage was approximately 10 times higher.

So: it makes a big difference what machine-learning model you're training; it makes a big difference what machine you're running it on. Factors of 5 or 10 are really significant, especially when multiplied up to the scale of a whole research project. The important thing is to measure it. Hence tools like CodeCarbon.

See also this other recent blog from me: Is using LLMs bad for the environment?
