Real-time audio software and multicore processing

We've been thinking about how best to incorporate multicore processing into SuperCollider's audio engine. A bit of background: the trend in computing is that although computers used to have one single CPU to do all the thinking, the latest computers tend to have multiple CPUs (each with access to shared memory). Furthermore, it's now even possible to make use of the number-crunching power lying unused on many graphics chips - although that doesn't use the same shared memory so it's a slightly different situation.

This all means that most software, which runs on an "old-fashioned" single-core model, might not be using the full power available. There are libraries available to help programmers easily move into this multicore world, such as the well-established and very easy-to-use OpenMP.

How does OpenMP work? It's very much like a traditional threading model, where if you want multiple things to happen at once, you launch as many separate "threads" as you need. OpenMP simplifies this by automatically creating the threads as needed (e.g. it can automatically parallelise the separate iterations of a for-loop), and also by automatically distributing the threads over the CPUs. It's often called a "fork-and-join" model: when the program reaches a block of code which could be parallelised, it divides itself up into many parallel threads - and then when the parallel bit is over, the program logic all joins back to the single thread that started it all.
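As a rough illustration of the for-loop case, here is a minimal C++ sketch. The function name `fill_sine` and its parameters are made up for this example and are not taken from SuperCollider. It assumes an OpenMP-capable compiler (e.g. GCC with `-fopenmp`); without that flag the pragma is ignored and the loop runs on a single thread, giving the same result.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical example: fill `out` with samples of a sine wave.
// The pragma asks OpenMP to fork a team of threads, share the loop
// iterations out among them, and join back to one thread at the end
// of the loop. Each iteration writes only out[i], so the iterations
// are independent and safe to run in parallel.
void fill_sine(std::vector<float>& out, float freq_hz, float sample_rate_hz) {
    const float two_pi = 6.28318530718f;
    const long n = static_cast<long>(out.size());
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        out[i] = std::sin(two_pi * freq_hz * static_cast<float>(i) / sample_rate_hz);
    }
}
```

Calling `fill_sine(buffer, 440.0f, 44100.0f)` fills the whole buffer. The fork and the join both happen inside that one call, which is exactly the cost that matters if you do it for every audio block.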

With real-time audio processing there's a complication. We want the software to take some chunks of input audio (if used), do some processing, and create some chunks of output audio, all within a very tight timeframe. This has a few implications:

  • Performing a fork-and-join procedure at every audio "block", typically around a hundred times a second, is computationally expensive. I know because I tried it, and a highly efficient sine-wave generator suddenly became extremely heavy...
  • Multicore programming libraries often don't guarantee how fast they will do their job. Plus, there's an overhead involved in dividing tasks up. Plus, there may be added overhead because of the very nature of parallel processing (e.g. transferring data from main memory to GPU memory). All of which means that certain interesting-looking APIs (e.g. GPGPU systems such as Nvidia's CUDA; Apple's Grand Central Dispatch) are unlikely to be particularly helpful for realtime audio.
  • More prosaically, my experiments find that CoreAudio (the Mac audio infrastructure) and OpenMP don't play well together, which is a shame - it makes it harder for anyone trying to parallelise audio software on Mac. Luckily I didn't have this problem on Linux.
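To put a rough number on the "very tight timeframe": the time available per block is just the block length divided by the sample rate. The sketch below does that arithmetic in C++. The function names are invented for illustration, and the figures are assumed typical values rather than measurements: a 512-frame block at 44.1 kHz leaves about 11.6 ms per block, which means about 86 blocks per second.

```cpp
#include <cstddef>

// Time available to compute one block, in milliseconds.
// Both arguments are illustrative; real values depend on how the
// audio driver is configured.
double block_budget_ms(std::size_t frames_per_block, double sample_rate_hz) {
    return 1000.0 * static_cast<double>(frames_per_block) / sample_rate_hz;
}

// How many blocks per second the audio callback has to deliver.
double blocks_per_second(std::size_t frames_per_block, double sample_rate_hz) {
    return sample_rate_hz / static_cast<double>(frames_per_block);
}
```

Any fork-and-join overhead, thread wake-up delay or memory transfer has to fit inside that budget along with the actual audio processing, every single block.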

So the question remains. Do we want to make our realtime audio apps multicore, and if so, how? You don't always improve things by spreading them over more cores, because of the inherent overheads I mentioned. However, on an 8-core system it certainly seems a shame to be limited to a maximum of 1/8 of the computer's thinking power.

SuperCollider has a nice aspect which helps here. The audio engine ("scsynth") is a separate application, and you can have multiple instances. So you could quite easily launch multiple audio engines, and have each one of them handle different parts of your audio scene. Great - nice and easy - although with some limitations. The different audio engine instances wouldn't be able to share memory, so sharing data between them is a bit of a pain. Also, it seems that you can't really guarantee which CPU core is used to run which process (the "affinity") - typically they would tend to be distributed over the cores, but it'd be nicer if we could guarantee that.

So, an approach to within-process parallelisation? Maybe we need to launch a thread for each core, and have these threads do a kind of busy-waiting until the audio callback wants some work to be done. Busy-waiting would be hard to get right though, since it means compromising between responsiveness and the CPU cycles wasted on active waiting.
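Here is a minimal C++ sketch of that busy-waiting idea, just to make it concrete. It is not how scsynth works: the class `SpinPool`, its `process_block` method and the "multiply by a gain" job are all invented for illustration. Worker threads are created once and spin on an atomic counter. The caller (standing in for the audio callback) bumps the counter to hand out a block, then waits until every worker has finished its slice.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical sketch: a fixed pool of workers that busy-wait for work.
class SpinPool {
public:
    explicit SpinPool(std::size_t num_workers) : num_workers_(num_workers) {
        assert(num_workers_ > 0);
        for (std::size_t w = 0; w < num_workers_; ++w) {
            workers_.emplace_back([this, w] { run(w); });
        }
    }

    ~SpinPool() {
        quit_.store(true);  // workers notice this while spinning
        for (std::thread& t : workers_) t.join();
    }

    // Stand-in for the audio callback: scale every sample by `gain`,
    // with each worker handling one contiguous slice of the block.
    void process_block(std::vector<float>& block, float gain) {
        block_ = &block;
        gain_ = gain;
        pending_.store(num_workers_);
        generation_.fetch_add(1);  // publishes the job and releases the workers
        while (pending_.load() != 0) std::this_thread::yield();
    }

private:
    void run(std::size_t index) {
        std::uint64_t seen = 0;
        for (;;) {
            std::uint64_t gen;
            // Busy-wait until a new block is published or we are told to quit.
            // yield() gives up the core briefly: fewer wasted cycles, but
            // a slower reaction than a pure spin.
            while ((gen = generation_.load()) == seen) {
                if (quit_.load()) return;
                std::this_thread::yield();
            }
            seen = gen;
            const std::size_t n = block_->size();
            const std::size_t begin = n * index / num_workers_;
            const std::size_t end = n * (index + 1) / num_workers_;
            for (std::size_t i = begin; i < end; ++i) (*block_)[i] *= gain_;
            pending_.fetch_sub(1);  // report this slice as done
        }
    }

    const std::size_t num_workers_;
    std::atomic<std::uint64_t> generation_{0};
    std::atomic<std::size_t> pending_{0};
    std::atomic<bool> quit_{false};
    std::vector<float>* block_ = nullptr;
    float gain_ = 1.0f;
    std::vector<std::thread> workers_;
};
```

The `yield()` calls are where the compromise shows up. Remove them and the workers react as fast as possible but keep every core fully busy even when idle. Keep them (or sleep instead) and you save cycles at the risk of waking up too late for the block deadline. The sketch uses the default sequentially consistent atomics for simplicity.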
