I am a research fellow, working on automatic analysis of bird sounds using machine learning.
→ Click here for more about my research.
It looks like the bigger consequences of the Brexit vote are about to hit. Everyone thought "no deal" was a laughable extreme back in 2016, and now our government seems to be sailing deliberately towards it.
Do we blame David Cameron, who naively called an ill-prepared vote? Theresa May, who dogmatically stuck to extreme "red lines" in the negotiations, failed to get a majority in a General Election, failed to get the agreement voted through, and yet refused to consider compromise positions, again and again? Or do we blame incompetent cartoon character Boris Johnson - whose main achievement as London Mayor was the foolish "garden bridge" plan that went nowhere after wasting tons of our money - now arrogantly pushing us towards no-deal despite the danger for ordinary people, as well as the democratic deficit?
It's crucial to remember that none of these people is at the root of all this. The Conservative Party as a whole should carry the blame. (Or the credit, if no-deal is a success - sure, why not.) They are currently putting a heck of a lot of effort into trying to blame the EU for any no-deal Brexit: pretending the EU is refusing to negotiate, when in fact the EU is refusing to reopen negotiations it's just spent two years on, or at least refusing to reopen them unless the UK government proposes a way forwards. The Conservative Party have got us into this mess - not just the British public! There were many different routes the government could have taken from 2016 onwards, and this embarrassing mismanagement comes from the Conservative Party, again and again. Their MPs and leaders, the government ministers, the membership. We can consider blaming Corbyn for not being an EU cheerleader, but frankly, his subdued triangulating is a minor footnote in this shambles of bad tactics.
I might be tempted to go deeper and say our First-Past-The-Post (FPTP) voting system is on the hook, since that's the reason that the Conservative Party (and Labour) sticks together in the shape it does, and why they feel the need to pander to extremists. But that might be a little too indirect.
We're already suffering the consequences of the Brexit vote. (Personally: I've lost multiple colleagues who have gone overseas, lost opportunities, etc. Collectively, we've lost a lot of influence, and we've wasted a heck of a lot of time we could have been spending fixing the climate crisis.) I'm particularly concerned about what happens next, because a crisis such as no-deal carries a perverse incentive: it gives the sitting government a rare opportunity to ram through emergency legislation which might reshape the British settlement more radically than is normally achievable. The Conservative Party has already shown simple-minded cruelty in voluntarily imposing a cost-slashing "austerity" agenda on the UK for a decade - an agenda which may have led to over a hundred thousand excess deaths. They have demonstrated that they don't worry about whether poorer people suffer the side-effects of their big ideas. What more would they like to do?
We see some hints of changes that Brexit might enable: today the food industry claimed it's going to need exemption from the laws of fair competition in order to keep everybody fed. That argument wouldn't get anywhere near the table in ordinary times.
I'm opposed to Brexit, but I certainly think naively "cancelling" it would be as harmful as going forward. The right way to proceed, clearly, is to take more time to find a true compromise outcome (which may well turn out to be a soft-ish Brexit, although no-one on any side wants to admit that). The rush now is for selfish reasons: we don't yet have a plan. And for our current government, perhaps that's more than just a necessary evil - perhaps not-having-a-plan is an opportunity they are looking forward to?
We've sampled LOTS of alcohol-free beer in the past year. Why? Well - you might not believe me about this - some of it's getting really good. And it's great to be able to have a lovely beer even if you don't want to be woozy afterwards.
So WE HAVE DONE THE RESEARCH and YOU can benefit from it! We've drunk beers good and bad from all over Europe. Here are six fabulous alcohol-free beers that are truly excellent and you should DRINK THEM!
Others which are ALSO recommended:
Plus there are plenty I don't recommend, of course :) but I won't bore you with them. Some more of my tasting notes are in this twitter thread.
While writing this, I found a nice list of top 70 low-alcohol beers from "Steady Drinker". However... it looks like he hasn't been to Sweden! He hasn't reviewed Mariestads, Värmdö or Sigtuna, which is why he's missed some of my top ones. In Sweden they have quite hefty taxes on alcohol, which may be why it's become a great place for innovative low-alcohol beers.
Our holiday this year was great "grounded travel" - we went from the UK to Sweden, going all the way by train! We stopped in multiple cities on the way, in Germany and Denmark as well as Sweden.
I want to tell you how we did it. But before all that there's one handy thing you need to know:
We met LOTS of people on our travels who said "Oh, I thought Interrail was just for under-25s". It's not. There are some extra-cheap offers for young people, but even without those it was the most economical way for us to do it.
I'm not going to tell you the details about Interrail passes, because I don't need to: the magnificent Seat 61 Interrail guide is all you need. We bought ourselves Interrail passes, and then added a couple of reservations: on some services, in particular Eurostar (the Channel Tunnel), you'll need a reserved seat in addition to the pass. I used the UK phone line for Deutsche Bahn to book my Eurostar and other reservations, and it was all really easy and friendly.
Taking the train in Europe is great. The trains are generally more modern, spacious and relaxing than UK trains, at least in the countries we've seen. You get to see some great countryside - fields, mountains, lakes, rivers, little town centres - from your seat. And of course there's none of the hassle of flying (getting to the airport; going through security; hanging round after security). We only had to show our passports at two points: the Eurostar, and at the Danish border when we got off a boat.
Oh yes, a boat: we didn't 100% exactly take the train all the way. There was one point in Denmark where we took a rail-replacement bus. And in order to get from Germany to Denmark we took the train that goes on a ferry, woo!
We met lots of lovely people on the way. We shared food with people, we got some excellent local tips for things to do. We even played Yahtzee with some strangers, and a memory game with a six-year-old Swedish girl :)
How far did we get? Stockholm. It takes two days to get from London to Stockholm (stopover in Hamburg or Cologne) and seat61 has some tips for other ways to do it.
We then went into the Swedish countryside and stayed in a... converted train! In a beautiful setting by a lake.
We spent about £350 each on the Interrail pass that lets you travel on 10 different days (over a stretch of two months), plus about £60 extra on reservations (mainly the Eurostar). In the end we only travelled on seven of the days, meaning we could have gone for a cheaper (£300) ticket, but we weren't sure which we'd need.
You can do it much cheaper if you don't want to visit other places on the way. We deliberately wanted to hop around.
Here's our route:
Some random tips for you:
And enjoy it! We did :)
I've been enjoying using Mopidy and ncmpcpp as my music player.
On Ubuntu, though, I encountered a persistent problem. I wanted to do two things at once: run Mopidy as a system service, and have it output audio via PulseAudio.
However, it kept failing to connect to PulseAudio. It seemed to be unable to make any sound play back, unless I manually killed pulse, after which it could start playing.
Yes I had followed all the instructions. I had changed Mopidy's config as well as PulseAudio's config. I tried everything, setting PulseAudio to be as permissive as possible (allowing remote connections etc).
The solution, as far as I can tell, for my standard Ubuntu 18.04 is: you should not do both of these things. This is because if you run Mopidy as a service, you're running something that starts up as soon as the computer boots, and doesn't use your login user ID. PulseAudio, however, does not run on startup, but is part of your login session. So it simply isn't there for Mopidy to connect to, until you log in. (I still don't know why that should mean it's unable to connect even after logging in. I suspect it's because it's running under a different user ID.)
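If you want to check this diagnosis on your own machine, here's a minimal sketch, assuming a systemd-style Ubuntu where PulseAudio's per-user socket lives under /run/user/&lt;uid&gt; and the pactl tool is installed:

```python
# Quick diagnostic sketch: is there a PulseAudio daemon for THIS user session?
# (Assumes a systemd-style Ubuntu; the per-user socket path may differ elsewhere.)
import os
import subprocess

socket_path = "/run/user/{}/pulse/native".format(os.getuid())
print("Per-user PulseAudio socket exists:", os.path.exists(socket_path))

# `pactl info` connects to the daemon the same way a client such as Mopidy would,
# so if this fails, Mopidy will fail too.
result = subprocess.run(["pactl", "info"],
                        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
print("PulseAudio reachable:", result.returncode == 0)
```

Run it once from your desktop session and once as the service's user (e.g. via sudo -u) and you should see the difference.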
So here's what I've got now: instead of running Mopidy as a service (using mopidyctl to control it), I'm running it as a "startup application", i.e. something that runs on my own user account as part of my graphical desktop login. On Ubuntu there's a program called "Startup Applications" that you use to add/remove things easily. The one small drawback with this approach is that the music player won't be running if the machine reboots and I haven't logged in yet. However, there's usually a logged-in session running.
You might think that "running as a service" and "outputting audio via PulseAudio" would be compatible, especially as they're both covered on the same page of documentation on the Mopidy website. Perhaps there's an extra trick (e.g. some permissions) that my system needs. But this non-service setup works fine.
The OSM UK community is great, but it's hard to guarantee we can do the detective work to spot all 800,000 solar PV installations. There's a "long tail" of solar panels tucked away down side-streets. It's very much a needle in a haystack, and we would benefit from as many hints as possible.
(We're already working with (a) machine learning and (b) official data sources. They're good sources of hints too, but not the full picture.)
We could ask the general public for help with this. Almost everyone must pass a solar panel during their daily commute, their weekend stroll, or suchlike. But we can't expect the general public to use map editing tools, or indeed anything that requires technical commitment or expertise - nor anything that requires login or user registration.
Can we make a tool that makes it so simple, that many thousands of people can send in just one or two sightings each?
We don't have the resources to make a fancy phone app. (And would people use it if we did?)
Option one: the "drop a pin in a map" approach. Provide a simple webpage which lets people, with no login required, put a pin at a location where they think there's a solar panel. This is fairly easy to code (and could use the OSM Notes API, e.g. with a specific pre-agreed template for the Note text). However, it's not ideal for people out-and-about with a smartphone, since it's all about the top-down bird's-eye view.
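For what it's worth, the coding here really is small. A minimal sketch of the reporting call, assuming the OSM Notes API's anonymous note-creation endpoint and an invented template text:

```python
# Sketch: file a "possible solar panel" report as an OSM Note.
# The Notes API allows anonymous note creation; the template text is hypothetical.
import requests

OSM_NOTES_URL = "https://api.openstreetmap.org/api/0.6/notes"

def report_panel(lat, lon):
    """POST one sighting as an OSM Note at the given coordinates."""
    params = {
        "lat": lat,
        "lon": lon,
        "text": "Possible solar panel here #solarspotting",  # pre-agreed template
    }
    resp = requests.post(OSM_NOTES_URL, params=params)
    resp.raise_for_status()  # the API returns the created note as XML
    return resp.text
```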
Option two: people can take a photo of a solar panel they see, and post it to a service they already use. (Smartphones often record GPS location along with photos.) Posting to Twitter (with a particular hashtag) would be easy to set up and to scrape. However not everyone uses Twitter. Loads of people use WhatsApp. Can they report their solar spottings directly through a WhatsApp number? It would need someone to set up a number, fine - and then the coding required is to create a system that can slurp whatever photos were sent in, do a bit of sanity-checking and maybe some basic "bot" interactivity, and output a dataset of suggested-geolocations for panels. (These suggested-geolocations would not go into OSM directly, they're not appropriate for that. We can simply provide them as a dataset for mappers to refer to.)
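The geolocation step of that slurping system is also small, provided the EXIF survives delivery (worth checking: some messaging services strip photo metadata). A sketch using the Pillow library:

```python
# Sketch: pull a suggested geolocation out of a submitted photo's EXIF tags.
# Assumes the delivery service hasn't stripped the metadata (worth checking!).
from PIL import Image
from PIL.ExifTags import TAGS, GPSTAGS

def _to_degrees(dms, ref):
    """Convert EXIF (degrees, minutes, seconds) rationals to signed decimal degrees."""
    deg = float(dms[0]) + float(dms[1]) / 60.0 + float(dms[2]) / 3600.0
    return -deg if ref in ("S", "W") else deg

def photo_location(path):
    """Return (lat, lon) from a JPEG's GPS tags, or None if they're absent."""
    exif = Image.open(path)._getexif() or {}
    gps_raw = next((v for k, v in exif.items() if TAGS.get(k) == "GPSInfo"), None)
    if not gps_raw:
        return None
    gps = {GPSTAGS.get(k, k): v for k, v in gps_raw.items()}
    try:
        return (_to_degrees(gps["GPSLatitude"], gps["GPSLatitudeRef"]),
                _to_degrees(gps["GPSLongitude"], gps["GPSLongitudeRef"]))
    except KeyError:
        return None
```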
Thanks to Max+Esther at 10:10 for suggesting the second idea. I approached them because 10:10 has previously been involved in something a little bit similar (a mobile app for spotting roofs that would be good places for new panels). They've clearly got the right idea about the kind of simple everyday interaction that's needed.
The H-index is one of my favourite publication statistics. It's really simple to define: a person's H-index is the largest number H such that they have H publications cited at least H times each. It's robust to outliers: if you have a million publications with no citations, or one publication with a million citations, this barely influences the outcome - it's the "core" of your H most-cited publications that matters. This makes it quite a nice heuristic for the academic impact of a body of work. A common source of the H-index is Google Scholar, which automatically calculates it for each scholar who has an account, and influential academics with long publication records typically have a high H-index.
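The definition is compact enough to compute in a few lines. A quick sketch, with made-up citation counts:

```python
def h_index(citations):
    """Largest h such that h publications each have at least h citations."""
    h = 0
    for rank, count in enumerate(sorted(citations, reverse=True), start=1):
        if count >= rank:
            h = rank  # the rank-th most-cited paper still has >= rank citations
        else:
            break
    return h

print(h_index([10, 8, 5, 4, 3]))   # 4
print(h_index([1000000]))          # 1: a single mega-hit barely registers
print(h_index([0] * 1000000))      # 0: uncited papers don't count at all
```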
However, the H-index should not be used as a primary measure for evaluating academics, e.g. for recruitment or promotion.
The main reason is that it's straightforward, in fact almost trivial, to manipulate your own H-index. You can make it artificially high.
Google Scholar doesn't exclude self-citations from its counting. It even counts self-citations in preprints, so the citations might not even be peer-reviewed. You could chuck a handful of hastily-written preprints into arXiv just before you apply for a job. (Should Google exclude self-citations? Yes, in my opinion: it would be trivially easy, given that they have ground truth of which academic "owns" which paper. However, that wouldn't remove the vulnerability, because pairs of authors could go one level beyond and conspire to cross-cite each other, etc.) Self-citations are often valid, but they're also often used by academics to promote their own previous papers, so it's a grey area.
Google Scholar often automatically adds papers to a person's profile, using text matching to guess if the author matches. I've seen real examples in which an academic's profile included extremely highly-cited papers... that were not by them. In fact they were from completely different research topics! Google's text-matching isn't perfect, and like most text-matching it often has a problem with working out which names are actually the same author.
You can further manipulate your H-index by choosing how to publish: you can divide research outputs into multiple smaller publications rather than single integrated papers.
Or you can do that after the fact, by tweaking your options in Google about whether two particular publications should be merged into one record or not. (Google has this option, since it often picks up two slightly-different versions of the same publication.)
Most of the vulnerabilities I've listed relate to Google's chosen way of implementing the H-index; however, at least some of them will apply however it is counted.
The H-index is a heuristic. It's OK to look at it as a quick superficial statistic, or even to use it as part of a general assessment making use of other stats and other evidence. But I'm increasingly seeing academic job adverts that say "please submit your Google Scholar H-index". This should not be done: it sends a public signal that this number is considered potentially decisive for recruitment (which it shouldn't be), creating a strong incentive to game the value. It also enforces a new monopoly position for a private company, demanding that academics create Google accounts in order to be eligible for a job. Academia is too important to have single points of failure centred on single companies (witness the recent debates around Elsevier!).
When trying to sift a large pile of applications, people like to have simple heuristics to help them make a start. That's understandable. It's naive to think that one's opinion isn't influenced by the first-pass heuristics - and so it's vital that you use heuristics that aren't so trivially gameable.
New journal article from us!
"Automatic acoustic identification of individuals in multiple species: improving identification across recording conditions" - a collaboration published in the Journal of the Royal Society Interface.
For machine learning, the main takeaway is that data augmentation is not just a way to create bigger training sets: used judiciously, it can mitigate the effect of confounds in the training data. It can also be used at test time to check a classifier's robustness.
For bioacoustics, the main takeaway is that previous automatic acoustic individual ID studies may have been overconfident in their claimed accuracy, due to dataset confounds - and we provide methods to try and quantify such issues, even without gathering new data.
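To make the augmentation idea concrete, here's an illustrative sketch - not the exact pipeline from the paper - of mixing background noise into training clips at a controlled signal-to-noise ratio, so that a classifier can't latch onto recording-condition cues:

```python
import numpy as np

def mix_at_snr(signal, noise, snr_db):
    """Mix a background-noise clip into a training clip at a chosen
    signal-to-noise ratio (in dB), returning the augmented clip."""
    noise = np.resize(noise, signal.shape)     # loop/trim noise to match length
    sig_power = np.mean(signal ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12  # avoid division by zero
    # Scale noise so that sig_power / (scale**2 * noise_power) == 10**(snr_db/10)
    scale = np.sqrt(sig_power / (noise_power * 10 ** (snr_db / 10)))
    return signal + scale * noise
```

The same function can be reused at test time: progressively degrade the test clips and watch how gracefully the classifier's accuracy holds up.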
This journal article is the output of a nice collaboration we've been working on, trying to bring machine learning closer to solving the problems zoologists really need solved. It's been very pleasant working on these ideas with Pavel Linhart and Tereza Petrusková (I didn't actually meet Martin Šálek!). The problem of detecting individual animals' vocal signatures is not yet solved, but I hope this paper nudges us part of the way there, and helps the field get there more efficiently through careful use of audio datasets.
Where are all the solar panels in Britain? Are they in the south? The sunny east? The countryside, the city?
The UK's energy regulator, Ofgem (the Office of Gas and Electricity Markets), publishes open data about the solar PV installations that they know about. In the latest "feed-in tariff" (FiT) data, there are about 800,000 of them. The "installed capacity" adds up to about 4.9 gigawatts, about half of which comes from big industrial field-scale installations and half from domestic rooftop solar.
It would be handy to know where the solar panels are - for example, if you're searching for solar panels to map...
For privacy purposes, Ofgem don't publish exact locations, nor unique IDs, in their big spreadsheet. So the data aren't perfect for mapping, but they do give us the postcode district for 90% of these 800 thousand. So, using that postcode info, I've taken their data and simply plotted them on a choropleth. Let's take a look!
Before you look, please note that I'm plotting the raw numbers per postcode district, and NOT normalising the data to account for the size of the district. This partly explains why the plots look "dark" in the regions (such as London) which are chopped up into lots of small districts. Smaller districts should have fewer things in... but on the other hand, smaller districts are supposed to equate to higher density of households, so maybe the postcode district is a good unit of analysis after all.
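The processing behind these maps is simple: group the spreadsheet rows by postcode district, then join onto district boundary polygons. A sketch using pandas and geopandas - the file and column names here are hypothetical stand-ins, so check them against the actual Ofgem download:

```python
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt

# Hypothetical file/column names -- check them against the actual Ofgem download.
fit = pd.read_csv("fit_installations.csv")
solar = fit[fit["Technology"] == "Photovoltaic"]

# One row per postcode district: how many installations, and how much capacity.
per_district = solar.groupby("Postcode district").agg(
    installations=("Postcode district", "size"),
    capacity_kw=("Installed capacity (kW)", "sum"),
)
per_district.to_csv("fit_by_district.csv")

# Join onto district boundary polygons and draw the choropleth.
districts = gpd.read_file("postcode_districts.geojson")
merged = districts.merge(per_district, left_on="name", right_index=True, how="left")
merged.plot(column="installations", cmap="viridis", legend=True)
plt.show()
```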
Here are the plots: three maps showing, respectively, the raw number of installations per district, the total installed capacity in each district, and - to give an idea of household density - the number of households in each district according to census data:
And here's a CSV spreadsheet of the summary FiT numbers I used to plot these. Sorry for not showing (Northern) Ireland, it's not in the data I found.
(The CSV and the images are all derived from Ofgem's FiT data which are published under the Open Government Licence.)
Note that there are A LOT of caveats about this data. About 10% of the solar installations (80 thousand!) have their postcode district listed as "unknown". Also, some postcodes are allegedly not quite right (e.g. some of them are the postcode of the person who registered, not the location of the thing itself). Some of the installations listed might have been discontinued, and we don't really have much way of knowing. Oh, and... the postcode area data I'm using seems to have some omissions, hence the occasional white gap in Britain. But notwithstanding all that, this gives us some indication of the distribution.
One thing that pops out to me is that these three plots don't seem very correlated - I'd have expected them all to be highly correlated. For some reason there seems to be a relatively high number of small-capacity installations stretching from Yorkshire down into Essex. There's plenty of regional variation and clustering, which may be due to geographical/weather differences, or perhaps to local initiatives.