Nuno Trocado

Make Music with SuperCollider and Lisp (Part I)

2021-01-11

SuperCollider and Common Lisp

SuperCollider is a platform for audio synthesis and algorithmic composition. It has three major components:

scsynth: the audio server, which produces sound according to the commands that we send it;
sclang: a programming language, including abstractions to control the server and for common tasks in the domain of music;
scide: an environment for editing sclang programs, accessing documentation and using graphical tools.

The client/server architecture is a main feature of SuperCollider. Among other advantages, it allows us to control the server using a client and programming language other than sclang. This is what we'll explore in this tutorial.

Of the three major components provided by SuperCollider, we'll keep the audio server (scsynth), but we'll replace the other two with our own tools. Instead of sclang, we'll use Common Lisp together with cl-collider, a Lisp library that helps us communicate with the audio server, using the Open Sound Control (OSC) protocol. The development environment is more of a personal preference, but see below for recommendations.

Prerequisites

It is assumed that you know at least some basic Common Lisp. Fortunately, there's a range of resources available to learn Lisp if you're just starting out. Musicians experienced with Lisp-based musical systems enjoy a head start.

It is also assumed that you have some notions of how digital audio works.

You don't need to know anything about SuperCollider. Later on you may find it's useful to grasp some sclang basics. This will allow you to better understand the example code that accompanies the documentation, follow tutorials, check out other people's projects, etc.

I find it very instructive to take SuperCollider resources, like books and online tutorials, and translate the demonstration code into Common Lisp. I've been keeping these translations in a GitHub repository of cl-collider examples.

Setup

Download and install the latest SuperCollider release for your system.

On the Lisp side of things, you'll need an implementation (I recommend SBCL), an editor (a classic choice is Emacs with either Slime or Sly, along with other packages to help write Lisp code), and Quicklisp. If you don't have all this up and running already, just install Portacle and be done with it: Portacle has everything configured and ready to go.

Now load the cl-collider library:

(ql:quickload "cl-collider")
(in-package :sc-user)

Check if the variables *sc-synth-program*, *sc-plugin-paths*, and *sc-synthdefs-path* are set to the correct paths for your SuperCollider installation, and setf them if they aren't.
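A minimal sketch of that check, assuming the two forms above have already been evaluated so that the three variable names resolve in the current package. The path in the commented-out setf is a placeholder I made up for illustration, not a real location:

```lisp
;; Print the current values of the three variables named in the text.
(format t "~&scsynth program: ~s~%" *sc-synth-program*)
(format t "~&plugin paths:    ~s~%" *sc-plugin-paths*)
(format t "~&synthdefs path:  ~s~%" *sc-synthdefs-path*)

;; Placeholder path (an assumption): substitute the location on your machine
;; before uncommenting.
;; (setf *sc-synth-program* "/path/to/your/scsynth")
```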

According to a SuperCollider naming tradition, we'll use *s* to store an object representing the audio server, which we want to run on the same machine as the client (localhost), on the arbitrarily selected port number 4444:

(setf *s* (make-external-server "localhost" :port 4444))

Finally, let's boot the server:

(server-boot *s*)

This runs scsynth and keeps it in the background, ready to accept OSC messages. It should print some information about your audio interface and drivers, along with other configuration options, and if all goes well we'll be ready to start.

Making noise

Try:

(play (white-noise.ar 0.1))

The way SuperCollider works is by connecting unit generators, also known as UGens. In a simplified description, UGens produce signals, which can be distributed to other UGens, and finally sent over to the audio interface's outputs, so we can hear the result. WhiteNoise is one such generator. It produces white noise: a random signal having equal intensity at all frequencies. In our code, its name appears translated to the naming conventions in Common Lisp: white-noise.

What about the .ar part? Each UGen can operate at different rates. .ar stands for audio rate, which means that values are calculated often enough for the generation and processing of an audio signal. Most UGens have two versions: the audio rate one, that ends with .ar, and the control rate one, that ends with .kr. Control rate is useful for signals that can be updated less frequently, thus needing less computer processing power. Some UGens also provide a .ir version, which stands for initialization rate, meaning that the output value is calculated only once, when the UGen is created.
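To get a feel for how much less work the control rate implies, here is a small arithmetic sketch in plain Common Lisp. It assumes a sample rate of 44100 Hz and a control block of 64 samples. Both numbers are assumptions for illustration (64 is the usual scsynth default, and the sample rate depends on your audio interface), and control-rate is a hypothetical helper, not part of cl-collider:

```lisp
(defun control-rate (sample-rate block-size)
  "Control-rate values computed per second: one value per block of audio samples."
  (/ sample-rate block-size))

(control-rate 44100 64) ; => 11025/16, about 689 values per second
```

Under these assumptions an audio rate UGen computes 44100 values per second, while its control rate counterpart computes roughly 689.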

On the SuperCollider documentation pages we can browse the available UGens and read their descriptions, inputs, examples, etc. On the Lisp side, each UGen also has a docstring with a short description.

white-noise.ar has two optional arguments: mul and add. They multiply and add to the output signal, respectively. Many UGens have these two arguments, which are used for scaling the output. In the example, we multiply the signal by 0.1 to reduce its amplitude and avoid blasting noise through the loudspeakers. The same thing could be achieved, in an arguably clearer way (especially when dealing with UGens with many arguments), by:

(play (* 0.1 (white-noise.ar)))

Finally, we use play to prepare and send the appropriate messages to the audio server. play returns a node object:

#<CL-COLLIDER::NODE :server #<CL-COLLIDER::EXTERNAL-SERVER localhost-127.0.0.1:4444> :id 1000 :name "temp-synth">

SuperCollider structures synthesis processes in a directed graph, in which we just created a new node. Note that a unique numeric id (1000) was automatically created, as well as the name "temp-synth".

We can call (server-query-all-nodes) to get a tree representation of all the active nodes.

To remove the synthesis process from the server and stop the sound we call free with either the node object or its id. It's often practical to store the node in a variable when using play, which makes it easier to free it later. Like this:

(defparameter *white-noise* (play (white-noise.ar 0.1)))

;;; Later:
(free *white-noise*)

Another way of stopping things is with the appropriately named (stop). This immediately frees all the nodes in the default group (nodes can be organized into groups, but this is a more advanced topic).

Making pitches

Let's use a sinusoidal oscillator to make a pitched sound:

(play (sin-osc.ar 220 0.0 0.1))

The arguments to sin-osc.ar are the frequency (220), the phase (0.0), and like before mul (0.1) and add (not used, the default is 0). They are all optional arguments.

A UGen can be routed to modulate the input of another UGen:

(play (sin-osc.ar (sin-osc.kr 1 0.0 110 330) 0.0 0.1))

The output of sin-osc oscillates between -1.0 and +1.0. We take the control rate version of it (ends in .kr), and multiply the output by 110, so the limits become -110 and +110. Finally, we add 330, so the signal swings between 220 and 440. This is fed into another sin-osc, now in audio rate (ends in .ar), to produce an audible signal whose frequency oscillates between 220 Hz and 440 Hz.
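The scaling arithmetic can be checked in plain Common Lisp, without the audio server. This is only a sketch of the mul and add idea; scale-bipolar and mul-add-for-range are hypothetical helper names, not cl-collider functions:

```lisp
(defun scale-bipolar (x mul add)
  "Apply mul/add scaling to a value X taken from the range -1..+1."
  (+ (* x mul) add))

(defun mul-add-for-range (lo hi)
  "Return, as two values, the MUL and ADD that map -1..+1 onto LO..HI."
  (values (/ (- hi lo) 2)    ; half the width of the target range
          (/ (+ hi lo) 2)))  ; the midpoint of the target range

(scale-bipolar -1 110 330)  ; => 220
(scale-bipolar  1 110 330)  ; => 440
(mul-add-for-range 220 440) ; => 110, 330
```

Going the other way, from the desired limits to mul and add, is just half the width and the midpoint of the target range.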

If we already know what the minimum and maximum values will be, we can avoid the arithmetic by using range:

(play (sin-osc.ar (range (sin-osc.kr 1) 220 440) 0.0 0.1))

It's possible to create complex textures just by nesting modulators. The following example only uses instances of sin-osc and lf-pulse, a pulse oscillator.

(play (sin-osc.ar (+ (lf-pulse.kr (lf-pulse.kr 0.5 0.5 0.8 1.5 0.5)
                                  0.0
                                  (range (sin-osc.kr 0.15) 0.10 0.45)
                                  (range (sin-osc.kr 0.2) 330 660))
                     (lf-pulse.kr (lf-pulse.kr 0.25 0.0 0.2 2.5 1)
                                  0.0
                                  (range (sin-osc.kr 0.25) 0.15 0.35)
                                  (range (sin-osc.kr 0.1) 110 770))))
      :gain 0.2)

This also shows how we can add two control signals. Try playing with the values—little changes yield a lot of variation. Also note that we use yet another way of adjusting the output volume, with play's keyword gain.

Multichannel audio and expansion

So far everything we've done is in mono and comes out on the left speaker only (assuming a stereo monitoring system). SuperCollider offers a number of UGens to deal with multichannel audio. One of the simplest ones is pan2, a two channel equal power panner.

(play (pan2.ar (ringz.ar (white-noise.ar 0.1)
                         (range (lf-pulse.kr 2 0.0 0.1)
                                300 1300)
                         0.5)
               (lf-tri.kr 0.05 3.0)
               0.1))

In this example, white noise is passed through a resonant filter (ringz), whose center frequency is modulated with a low frequency pulse oscillator, alternating between 300 Hz and 1300 Hz. The resulting signal is fed into pan2, as its first argument.

The second argument is the pan position, which ranges between -1 (left) and +1 (right). A triangular oscillator (lf-tri) with a low frequency of 0.05 Hz provides the constantly moving positions—its output also ranges between -1 and +1. A phase offset is also provided, to make the oscillator start at the lowest value. Phase offset is defined as a value ranging from 0 to 4. The correct value for the intended effect is 3.0. This way, we hear the sound coming from the left speaker first, then slowly moving to the right, and back again indefinitely.
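To see why 3.0 is the right offset, we can model the oscillator as an idealized triangle wave whose full cycle spans phase values 0 to 4. This is a plain Common Lisp sketch under that assumption, not the actual lf-tri implementation, and triangle-at-phase is a hypothetical name:

```lisp
(defun triangle-at-phase (phase)
  "Value of an idealized triangle wave at PHASE, where one full cycle spans 0 to 4."
  (let ((p (mod phase 4)))
    (cond ((<= p 1) p)        ; rising from 0 to +1
          ((<= p 3) (- 2 p))  ; falling from +1 to -1
          (t (- p 4)))))      ; rising from -1 back to 0

(triangle-at-phase 0) ; => 0
(triangle-at-phase 1) ; => 1
(triangle-at-phase 3) ; => -1
```

In this model a phase of 3 is the bottom of the wave, which corresponds to the leftmost pan position.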

Finally, pan2's third argument (0.1) is the final output level—yet another way of setting the volume.
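As a side note on what "equal power" means, here is the textbook equal-power panning law in plain Common Lisp. The text doesn't describe how pan2 is implemented internally, so treat this as an illustration of the principle; equal-power-gains is a hypothetical helper:

```lisp
(defun equal-power-gains (pos)
  "Return, as two values, the left and right gains for pan position POS in -1..+1."
  (let ((angle (* (+ pos 1) (/ pi 4)))) ; 0 at hard left, pi/2 at hard right
    (values (cos angle) (sin angle))))

(equal-power-gains -1) ; left gain 1, right gain 0
(equal-power-gains 0)  ; both gains about 0.707
```

The squares of the two gains always add up to 1, so the perceived loudness stays roughly constant as the sound moves across the stereo field.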

An important, powerful and sometimes confusing concept is multichannel expansion. Simply put, when a list is given as an input to a UGen, it causes multiple copies of that UGen to be created, each using a different value from the input list. Let's take the first example from the previous section, but with two frequencies for the oscillator, 660 and 770, instead of a single one:

(play (sin-osc.ar '(660 770) 0.0 0.1))

Two tones come out, one on the left and the other one on the right channel.
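The idea can be modeled in plain Common Lisp, with no UGens involved. The sketch below only computes the argument list that each copy would receive; expand-args is a hypothetical name, and letting shorter lists wrap around is an assumption of this model:

```lisp
(defun expand-args (&rest args)
  "Model multichannel expansion: return one argument list per channel.
List arguments supply one value per channel; other arguments are repeated."
  (let ((channels (reduce #'max args
                          :key (lambda (a) (if (consp a) (length a) 1))
                          :initial-value 1)))
    (loop :for ch :below channels
          :collect (mapcar (lambda (a)
                             (if (consp a)
                                 (nth (mod ch (length a)) a) ; assumed: shorter lists wrap
                                 a))
                           args))))

(expand-args '(660 770) 0.0 0.1)
;; => ((660 0.0 0.1) (770 0.0 0.1))
```

Each inner list stands for one copy of the oscillator: the frequency differs, while the phase and mul are shared.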

With a list of more than two values, the resulting UGens will be sequentially assigned to the next output channels on the audio device. If there are only two available, the expansion won't be heard beyond this limit. We can, however, reduce the expansion back to a single channel using mix, like in the following eight-note chord:

(play (mix (sin-osc.ar (mapcar #'midicps
                               '(60 64 67 71 78 81 85 87.5))
                       0.0 0.05)))

Here, instead of entering the frequency values directly, we use midicps to convert MIDI note values (60 is C4) to frequencies.
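For reference, the standard equal-tempered conversion can be written in a couple of lines of plain Common Lisp. I'm assuming midicps follows this usual formula, with A4 (note 69) tuned to 440 Hz; midi-note->hz is a hypothetical name chosen to avoid clashing with the library function:

```lisp
(defun midi-note->hz (note)
  "Equal-tempered frequency in Hz of MIDI note NOTE, with A4 (note 69) at 440 Hz."
  (* 440 (expt 2 (/ (- note 69) 12))))

(midi-note->hz 69) ; => 440
(midi-note->hz 81) ; => 880, one octave higher
(midi-note->hz 60) ; about 261.63, C4
```

Fractional note numbers, like the 87.5 in the chord above, also work and land between two semitones.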

A practical way of spreading a multichannel signal across the stereo field is with splay:

(play (splay.ar (loop :for i :from 1 :upto 3
                      :collect (* (sin-osc.ar (* 220 i 1.6) 0)
                                  (lf-pulse.kr (+ (/ i 3) (sin i))
                                               3 0.1 0.2)))))

The first tone, with the lowest frequency, plays on the left side, the second in the center, and the third and highest-pitched one on the right side.
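The placement described above matches an even spread of pan positions between -1 and +1. Here is a sketch of that calculation in plain Common Lisp, ignoring any other parameters that splay may accept; spread-positions is a hypothetical helper:

```lisp
(defun spread-positions (n)
  "Pan positions for N channels spaced evenly from -1 (left) to +1 (right)."
  (if (= n 1)
      (list 0) ; a single channel sits in the center
      (loop :for i :below n
            :collect (- (* i (/ 2 (1- n))) 1))))

(spread-positions 3) ; => (-1 0 1)
(spread-positions 5) ; => (-1 -1/2 0 1/2 1)
```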

Defining and controlling synths

So far we've been using play to make sounds. Another more involved and flexible way of working is by first defining audio processes with defsynth. Then they'll silently live on the server until instantiated by synth.

(defsynth tone ((freq 440) (amp 0.2))
  (out.ar 0 (saw.ar freq amp)))

In the example above, we define a synth named tone, and give it two parameters, freq and amp, each with a default value (respectively 440 and 0.2).

Next comes a body of expressions. Unlike before, if we want to hear the sound that our synth produces, we now have to use a new UGen, out, to send the signal to an output bus. Buses will be dealt with later, but for now it suffices to note that SuperCollider reserves the bus indexes starting with 0 for the hardware output channels. Our left channel is 0, and the right channel is 1. The first argument to out.ar is the index of the bus that we want to write to: in this case it's 0, the left channel.

Finally the signal chain has only one UGen, saw.ar, a sawtooth wave generator, called with the parameters that we defined before.

The above code, however, won't play any sound. Upon evaluating the synth definition, it will be constructed on the server, and silently live there until instantiated. To do so, we need:

(synth 'tone)

When we've had enough:

(stop)

We can also store the resulting node in a variable, *tone*, and free it later. Also, we can provide values for the synth's parameters, like so:

(defparameter *tone*
  (synth 'tone
         :freq (midicps 42)
         :amp 0.3))

;;; When we've had enough:
(free *tone*)

Beyond freeing, this also allows us to change the synth's parameters while it's playing, through ctrl:

(ctrl *tone* :freq (midicps 48))

Note that what play was doing behind the scenes was essentially defining a temporary synth and running it.

If the signal chain consists of multiple channels, the bus index provided to out.ar is the first one, and the others are mapped to successive indexes. The next version of our synth plays slightly detuned sawtooth waves in the left and right channels.

(defsynth tone ((freq 440) (amp 0.2))
  (out.ar 0 (saw.ar (let ((detune (* freq 0.01)))
                      (list (- freq detune) (+ freq detune)))
                    (/ amp 2))))

(let ((node (synth 'tone)))
  (sleep 2)
  (free node))

In the above code, we used sleep and then free to stop the synth automatically after two seconds. But there must be a better way, and of course there is one: envelopes.

(defsynth tone ((freq 440) (amp 0.2))
  (out.ar 0 (* (saw.ar (let ((detune (* freq 0.01)))
                         (list (- freq detune) (+ freq detune))))
               (env-gen.kr (perc 0.1 1.8)
                           :level-scale amp
                           :act :free))))

(synth 'tone)

In the example above, a sawtooth generator (saw) produces the audible sound, which is multiplied by the envelope generator env-gen, in order to dynamically control its amplitude. The first argument of env-gen defines the envelope itself. For convenience, some frequently used envelope shapes are already predefined, among them perc (for "percussive"), here called with an attack time of 0.1 seconds and a release time of 1.8 seconds.

Since we have an amplitude envelope, it seems appropriate that amp is moved to its keyword argument level-scale.

The final keyword argument, act, controls what happens when the envelope is finished playing. In this case, it will free the synth automatically. If we hadn't done this, and even though the synth is now silent, because the envelope ran its course completely, the corresponding node would still be kept on the server. But we don't want to waste the server's resources, and therefore it's a good practice to free what we don't plan to keep using.

Now let's look at a more involved example:

(defsynth tone ((freq 440) (amp 0.2) (gate 1))
  (out.ar 0
          (* (rlpf.ar (+ (saw.ar (let ((detune (* freq 0.01)))
                                   (list (- freq detune)
                                         (+ freq detune))))
                         (* (brown-noise.ar)
                            (env-gen.kr (asr 1.5 0.175 0)
                                        :gate gate)))
                      (env-gen.kr (env '(80 2400 200 2400)
                                       '(0.05 0.2 5))))
             (env-gen.kr (adsr 0.05 0.4 0.4 1.2)
                         :gate gate
                         :level-scale amp
                         :act :free))))

(defparameter *tone*
  (synth 'tone
         :freq (midicps 42)))

This shows other envelope recipes. Instead of perc, the volume envelope is now defined with adsr, which follows the standard shape of attack (0.05), decay (0.4), sustain level (0.4), and release (1.2). Since we now have a sustain segment, we also need to tell the envelope when to leave it and proceed to the release. We use gate for that. When gate is 1 (or any value above 0) the envelope starts, and it is held open until gate is set to 0, which moves the envelope to the release portion. Also, gate is now a synth parameter, with 1 as its default value, so the envelope triggers as soon as the synth is instantiated, but it's possible to later instruct the running synth to change this value to 0.

Our synth now also has a resonant low pass filter (rlpf), whose frequency is modulated by an envelope. We use env to define the envelope. env is the basic specification for envelopes. First, we give it a list of levels, then a list of times (optionally also a list of curves, not used in the example). So the envelope starts at level 80, takes 0.05 seconds to increase to level 2400, then takes 0.2 seconds to move down to level 200, and finally takes 5 seconds to move back up to 2400, holding the last value indefinitely.
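To double-check the timeline just described, we can pair each level with the absolute time at which it is reached. This is plain Common Lisp arithmetic, not a cl-collider call; envelope-breakpoints is a hypothetical helper, and the times are written as exact rationals (1/20 and 1/5 instead of 0.05 and 0.2) to avoid floating-point rounding in the printed result:

```lisp
(defun envelope-breakpoints (levels times)
  "Pair each level with the absolute time, in seconds, at which it is reached."
  (loop :with clock := 0
        :for level :in levels
        :for duration :in (cons 0 times) ; the first level is reached at time 0
        :do (incf clock duration)
        :collect (list clock level)))

(envelope-breakpoints '(80 2400 200 2400) '(1/20 1/5 5))
;; => ((0 80) (1/20 2400) (1/4 200) (21/4 2400))
```

So the filter sweep reaches its final value 5.25 seconds after the synth starts.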

There's also a new sound source: a brown noise generator. The amount of noise that gets mixed in with the sawtooth oscillator is controlled with an asr envelope, which has three segments: attack, sustain, and release.

To trigger the release portion of the envelopes, and ultimately stop the sound, we change the gate to 0:

(ctrl *tone* :gate 0)

The next example uses our synth and its gated envelopes together with sleep as a crude way of sequencing notes, in this case a kind of spectral arpeggio. To avoid blocking the main thread while this is playing, we move the synth logic to a function and create a new thread to run it.

(defun rising-arp (&optional (start-freq 80) (inharmonicity 0.425))
  (loop :for i :from 1 :upto 50
        :for mult := (if (zerop (mod i 3))
                         i
                         (/ i (- 1 inharmonicity)))
        :collect (synth 'tone
                        :freq (* mult start-freq inharmonicity)
                        :amp (* 0.007 i))
          :into nodes
        :do (sleep (alexandria:random-elt '(1/6 1/7 1/8 1/9 1/10)))
        :finally (mapc (lambda (node) (ctrl node :gate 0)) nodes)))

(bt:make-thread
 #'rising-arp
 :name "rising-arp")
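To preview the pitches without making any sound, the frequency calculation can be pulled out into its own function. This sketch repeats the arithmetic of rising-arp in plain Common Lisp; rising-arp-freqs and its third parameter n are hypothetical additions for illustration:

```lisp
(defun rising-arp-freqs (&optional (start-freq 80) (inharmonicity 0.425) (n 50))
  "Frequencies, in Hz, that the first N steps of the arpeggio would request."
  (loop :for i :from 1 :upto n
        :for mult := (if (zerop (mod i 3))
                         i
                         (/ i (- 1 inharmonicity)))
        :collect (* mult start-freq inharmonicity)))

(rising-arp-freqs 80 0.425 3)
;; roughly (59.1 118.3 102.0)
```

Every third step skips the division, so its frequency dips below the preceding one, which contributes to the irregular contour of the arpeggio.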

Proxy

Yet another way of creating, starting and controlling audio processes is with proxy. Unlike play, where if we evaluate the same code again we create a new node on the server, redefining a proxy replaces (crossfades) the currently running node with a new one. This makes it especially useful for experimenting and building up our sounds, and also for live coding.

(proxy :fm-hits
       (let ((trig (impulse.kr .5)))
         (apply #'freeverb2.ar
                (splay.ar
                 (loop :repeat 10
                       :collect (* (pm-osc.ar (+ 100 (random 1600))
                                              (+ 100 (random 700))
                                              (random 10)
                                              0 0.3)
                                   (env-gen.kr (perc 0.01 2)
                                               :gate trig))))))
       :fade 0.1)

When we reevaluate the proxy, new random values are calculated for the synth's parameters and the timbre changes in the time given in :fade (that's what I'm doing in the audio recording).

Up next

We've already covered a good deal of introductory information to get going with cl-collider. But many topics are left. In the next installments we'll see: buses and routing, sampling and audio manipulation, scheduling and more sophisticated ways of sequencing events, setting up OSC responders for bidirectional communication between client and server and for interfacing with external devices, some useful Emacs configuration options, and more.