what happens when computer

There was a recent blog post by Eevee called Music Theory For Nerds, which is an excellent whirlwind introduction to a vast swath of music theory. It's very good, and you should go read it.
When I first read it, it reminded me that one topic I've wanted to write about on this blog was microtonal music. As it turns out, writing about the motivation and theory behind microtonal music involves delving into some of the more esoteric foundations of Western music, to the degree that this post doesn't even get to microtonal tunings. Eevee at one point remarks, on the topic of our twelve-note octave:
I don't know why twelve in particular has this effect, or if other roots do as well, but it's probably why Western music settled on twelve.
This blog post is a post about how we settled on twelve.

Consonance and Dissonance

It helps us to think of sounds as waves, which correspond to the vibrations of sound-carrying materials like air. For demonstration here, I'll be using basic sine waves, which sound like this:
We can visualize a sine wave as looking more or less like this. (I drew this in Inkscape and not a graphing tool, so it's not totally accurate, but it doesn't need to be a perfect sine wave to be an okay teaching tool.)
When we talk about the frequency of this wave, we're talking about how often the peaks in that wave show up. (The frequency is related to the period of the wave, which measures the distance between peaks in the waveform: the bigger the period, the less often peaks will show up.) Frequency is measured in hertz (abbreviated as Hz), which in this context just means "peaks of the wave per second". (It's also my least favorite kind of donut.)
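To make "peaks per second" concrete, here's a minimal Python sketch that samples a sine wave and counts its peaks. The function names and the 44,100 samples-per-second rate are my own assumptions for illustration; nothing about the audio clips in this post depends on them.

```python
import math

def sine_samples(freq_hz, duration_s=1.0, sample_rate=44100):
    """Sample sin(2*pi*f*t) at evenly spaced times (sample_rate is an assumed value)."""
    count = int(duration_s * sample_rate)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(count)]

def count_peaks(samples):
    """Count samples that are strictly higher than both of their neighbors."""
    return sum(
        1
        for i in range(1, len(samples) - 1)
        if samples[i - 1] < samples[i] > samples[i + 1]
    )

one_second = sine_samples(300)
print(count_peaks(one_second))  # one peak per cycle, so this tracks the frequency in Hz
```

Counting peaks over exactly one second is the whole definition of hertz here: a 300 Hz wave should show 300 of them.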
In reality, notes played by a physical instrument or produced by a human voice are a lot more complex than just this simple sine wave: they are often made up of several simultaneous waves combined together! However, even with very complicated musical sounds, we can still pick out the pitch of the sound, which is the frequency of the note that is perceptually dominant. The pitch is heavily related to the fundamental frequency of a sound, but the two aren't identical: the pitch of a sound is a subjective perceptual property, which may not be identical to the fundamental frequency due to complexities in the sounds or the way we perceive them. Even so, for our purposes here, we can treat every note as if it's associated with a single frequency, by which I will mean the note's pitch. I'll often say "the frequency of the note" in this post when I in fact mean "the frequency of the pitch of the note".
Finally, let's delve into perception a little bit here: this whole section is very much handwavey, but bear with me. When notes that have the same pitch are played simultaneously, they are perceived by human beings as being "the same note". Additionally, when notes whose pitches are related by simple ratios are played simultaneously, they are perceived as somehow "pleasing" or "complementary" to each other, while notes whose frequencies are related by more complicated ratios are perceived as "less pleasing".
For example, this will play two frequencies in sequence and then together, one at 300 Hz, and one at 600 Hz, which are related by a simple 2:1 ratio. The two notes will sound appropriate to one another:
In comparison, a less simple ratio—here, 300 Hz and 573 Hz, related by a 100:191 ratio—will sound somewhat less pleasant when played together:
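The ratios quoted for these two examples are just the pairs of frequencies reduced to lowest terms. Here's a quick way to check them, assuming whole-number frequencies (the helper name `interval_ratio` is mine, purely for illustration):

```python
from fractions import Fraction

def interval_ratio(higher_hz: int, lower_hz: int) -> Fraction:
    """Reduce a pair of whole-number frequencies to the simplest equivalent ratio."""
    return Fraction(higher_hz, lower_hz)

for higher, lower in [(600, 300), (573, 300)]:
    ratio = interval_ratio(higher, lower)
    print(f"{higher} Hz and {lower} Hz -> {ratio.numerator}:{ratio.denominator}")
```

The smaller the two numbers in the reduced ratio, the "simpler" the interval in the sense used here.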
In music and music theory, notes which "go together" are said to be consonant and notes which "don't go together" are called dissonant. When talking about two notes, we can refer to their difference as an interval, so consonance and dissonance are generally a property of intervals.
There isn't a strictly-defined separation between consonance and dissonance: some pairs of sounds are clearly consonant (such as notes related by the 2:1 ratio) and some are clearly dissonant, but there is no well-defined cutoff point where an interval stops being consonant and starts being dissonant. It's best to think of them as relative to one another: a pair of sounds can be more consonant or more dissonant than another pair, rather than being consonant or dissonant on an absolute scale. Also note that the terms consonant and dissonant are old, and have been informally and sometimes contradictorily defined for centuries: some people define them in terms of frequencies, some in terms of perception, some in terms of both. Defining them as pleasant and unpleasant is reductive, but not necessarily a bad intuition.

Picking Points in Sound-Space

When we write music, we generally want to put pleasant sounds together, which suggests that we'd like to take the space of all possible sounds (many of which are not going to sound pleasant together at all) and carve it up so that we can have a nice grab-bag of consonant frequencies to work with while omitting dissonant ones. Let's start in a very simple way: we know that intervals composed of simple ratios of frequencies are consonant, so let's start with two of the simplest possible ratios, the 2:1 ratio and the 3:2 ratio, and use them to build up a set of "compatible" sounds, a bunch of sounds we can pull from that are guaranteed to have some consonant neighbors in the same set.
Let's start with 400 Hz as our first tone:
Let's stipulate that every frequency which can be generated from 400 Hz using the 2:1 ratio (that is, frequencies like 100 Hz, 200 Hz, 800 Hz, 1600 Hz) is also included, because all of those frequencies are always going to be consonant with each other. Whenever we add a new frequency \(f\) to our scale, we are also going to add \(2^n \times f\) for every integer \(n\).
Now, let's generate another sound using the 3:2 ratio. \(400 \times \frac{3}{2}\) produces \(600\). So, let's add 600 Hz (again including every frequency of the form \(2^n \times 600\), most of which fall outside the range of our graphic).
Next, \(600 \times \frac{3}{2}\) produces \(900\). That's outside the range I've chosen to display here, but, like before, we're going to be including not just 900 Hz, but also all the doubled and halved frequencies we can reach from 900 Hz, and one of those (450 Hz) is within the range of our graphic. Luckily, no matter which new tone we choose to add, we're going to also be adding one corresponding tone in the 400 Hz to 800 Hz range, so whenever the new tone generated by application of the 3:2 ratio is outside this range, I'll go ahead and halve it so we only work with frequencies in this range.
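The "halve it until it's back in range" step can be sketched as a tiny helper. This is only an illustration under my own naming (`fold_into_octave` and its `low_hz` parameter are hypothetical), and it also doubles frequencies that fall below the range, which the text allows since every \(2^n \times f\) is in the scale anyway:

```python
def fold_into_octave(freq_hz: float, low_hz: float = 400.0) -> float:
    """Halve or double freq_hz until it lies in [low_hz, 2 * low_hz)."""
    if freq_hz <= 0 or low_hz <= 0:
        raise ValueError("frequencies must be positive")
    while freq_hz >= 2 * low_hz:
        freq_hz /= 2
    while freq_hz < low_hz:
        freq_hz *= 2
    return freq_hz

print(fold_into_octave(900.0))  # 450.0
print(fold_into_octave(100.0))  # 400.0
```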
Okay, let's keep going: \(450 \times \frac{3}{2}\) is 675:
You know the drill now, so I'm just going to list a bunch more:
  • \(675.00 \times \frac{3}{2} \times \frac{1}{2} = 506.25\)
  • \(506.25 \times \frac{3}{2}\ \ \ \ \ \ \ \ \approx 759.38\)
  • \(759.38 \times \frac{3}{2} \times \frac{1}{2} \approx 569.53\)
  • \(569.53 \times \frac{3}{2} \times \frac{1}{2} \approx 427.15\)
  • \(427.15 \times \frac{3}{2}\ \ \ \ \ \ \ \ \approx 640.72\)
  • \(640.72 \times \frac{3}{2} \times \frac{1}{2} \approx 480.54\)
  • \(480.54 \times \frac{3}{2}\ \ \ \ \ \ \ \ \approx 720.81\)
  • \(720.81 \times \frac{3}{2} \times \frac{1}{2} \approx 540.61\)
I'm going to stop there—I'll explain why in a moment—but here's what all the tones we've created so far look like:
If you play them in the order we created them, they sound like this:
If you play them from left to right, they sound like this:
These tones might seem familiar if you've played on (or with) a piano before: we've recreated the tones on a piano keyboard! …or, well, we've gotten very close.
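The whole procedure from this section (multiply by 3:2, halve when we leave the 400 Hz to 800 Hz range, repeat) fits in a few lines. This is a sketch with names of my own choosing, not a canonical implementation:

```python
def pythagorean_tones(root_hz: float = 400.0, count: int = 12) -> list[float]:
    """Stack 3:2 steps on root_hz, folding each result back into [root_hz, 2 * root_hz)."""
    tones = []
    freq = root_hz
    for _ in range(count):
        tones.append(freq)
        freq *= 3 / 2
        if freq >= 2 * root_hz:
            # One halving is always enough: freq was below 2 * root_hz before the step.
            freq /= 2
    return tones

in_order_created = pythagorean_tones()
left_to_right = sorted(in_order_created)

print([round(f, 2) for f in in_order_created])
print([round(f, 2) for f in left_to_right])
```

The first list is the order we generated the tones in; sorting it gives the left-to-right order from the graphic.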

Pythagorean Comma

I've been avoiding lumping in too much new jargon, but I might as well do a few reveals: the set of notes I've been building up is generally called Pythagorean Tuning; an interval with a 2:1 ratio is called an octave; an interval with a 3:2 ratio is generally called a perfect fifth, due to a note-numbering system I'm going to gloss over here.
You'll recall I stopped generating notes after we accumulated twelve of them. We could try to keep going, but a funny thing happens when we add the next note, which is that we get almost but not quite back to where we started: \(540.61 \times \frac{3}{2} \times \frac{1}{2} \approx 405.45\).
If you play 405.45 Hz and 400 Hz together, they sound close but not quite the same. They're certainly a lot closer together than any other pair of notes we've produced so far! If you play them next to each other, you can barely hear the difference:
It turns out that if you keep using the 3:2 ratio to generate tones, you'll never exactly get back to where you started: every step multiplies by 3, every fold divides by 2, and no power of 3 is ever equal to a power of 2. Instead, you'll start generating tones that differ just slightly from the ones you already generated. You can keep going, and you'll keep generating more frequencies that will all be slightly different than the ones you generated before, and get back to yet another note that's a bit above 405.45 Hz, and so forth. Within the first 300 applications of the 3:2 ratio, the closest we'll get to the original 400 Hz is after the 53rd, but even then, we're not going to hit 400 Hz exactly.
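Here's a rough way to check both claims with exact arithmetic. The helper names are mine, and the cutoff of 300 steps is an arbitrary assumption for this sketch (go far enough past it and you will eventually find even nearer misses, though never an exact hit):

```python
import math
from fractions import Fraction

def folded_ratio(n_fifths: int) -> Fraction:
    """Exact ratio to the root after n_fifths steps of 3:2, folded into [1, 2)."""
    ratio = Fraction(3, 2) ** n_fifths
    while ratio >= 2:
        ratio /= 2
    return ratio

def octave_offset(n_fifths: int) -> float:
    """Signed distance from the root as a fraction of an octave, in [-0.5, 0.5)."""
    position = math.log2(folded_ratio(n_fifths))  # 0 <= position < 1
    return position - 1 if position >= 0.5 else position

steps = range(1, 301)
never_exact = all(folded_ratio(n) != 1 for n in steps)
best = min(steps, key=lambda n: abs(octave_offset(n)))

print(folded_ratio(12))  # the not-quite-the-root note, as an exact fraction of the root
print(never_exact, best)
```

Measuring distance as a fraction of an octave (rather than in hertz) keeps the comparison fair between notes that land just above the root and notes that land just below its octave.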
The difference between that original root frequency (400 Hz) and the almost-but-not-quite-the-root frequency (405.45 Hz) is referred to as the Pythagorean comma. The word comma derives from a Greek word for 'cutting': in typography, it originally referred to the part of the sentence after a punctuation mark, which was 'cut off' from the beginning of the sentence, and gradually came to signify the mark itself. In this musical sense, it was originally used because the Pythagorean comma is a 'cut-off portion' of the musical scale. It's not really very useful to measure the Pythagorean comma in hertz, because the difference between the root value and the not-quite-the-root value depends on which frequency we choose as the root. For example, if we were working in the range of 300 Hz to 600 Hz, then it would be smaller (the difference between 304.09 Hz and 300 Hz), but if we were working in the range of 500 Hz to 1000 Hz, then it would be larger (the difference between 506.82 Hz and 500 Hz). So let's move away from hertz and start using a different unit of measurement, one where we cancel out the logarithmic nature of frequencies: the cent.
The cent is a unit defined in terms of ratios of frequencies, so that the space of an octave—that is, a frequency and its double or half—is always expressed as exactly 1200 cents. Cents turn out to be wonderful for expressing tunings. I'm going to gloss over the conversion formulae between frequencies and cents—you can find them on Wikipedia, of course—and convert all the frequencies we've generated into cents. I'll include the not-quite-the-root frequency at the end of the table:
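For the curious, the standard conversion is \(1200 \times \log_2(f / f_\text{root})\) to get cents, and its inverse to get back to hertz. A minimal sketch, with function names that are my own:

```python
import math

def cents_above(freq_hz: float, root_hz: float) -> float:
    """Distance from root_hz up to freq_hz in cents (1200 cents per octave)."""
    return 1200 * math.log2(freq_hz / root_hz)

def freq_from_cents(cents: float, root_hz: float) -> float:
    """Inverse conversion: the frequency that sits `cents` above root_hz."""
    return root_hz * 2 ** (cents / 1200)

print(round(cents_above(800.0, 400.0), 2))  # an octave
print(round(cents_above(600.0, 400.0), 2))  # a perfect fifth
print(round(freq_from_cents(1200.0, 400.0), 2))
```

Because the formula only looks at the ratio of the two frequencies, the same interval gets the same number of cents no matter which root we pick.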
| frequency | cents |
|---|---|
| 400.00 Hz | 0.00 cents |
| 427.15 Hz | 113.68 cents |
| 450.00 Hz | 203.91 cents |
| 480.54 Hz | 317.59 cents |
| 506.25 Hz | 407.82 cents |
| 540.61 Hz | 521.50 cents |
| 569.53 Hz | 611.73 cents |
| 600.00 Hz | 701.95 cents |
| 640.72 Hz | 815.64 cents |
| 675.00 Hz | 905.86 cents |
| 720.81 Hz | 1019.55 cents |
| 759.38 Hz | 1109.77 cents |
| 405.45 Hz | 23.46 cents |
Okay, now we have a value for Pythagorean comma: it will always be approximately 23.46 cents, regardless of which base frequency we choose to work with. Cents are a more convenient way of expressing ratios of frequencies, but if we wanted to express Pythagorean comma as a ratio, it would be 531441:524288.
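Both forms of the comma can be derived directly: twelve 3:2 steps up, then seven octaves back down. The sketch below (variable names are mine) also shows why hertz is the wrong unit, using the 300 Hz and 500 Hz roots from earlier:

```python
import math
from fractions import Fraction

# Twelve perfect fifths up, then seven octaves back down.
comma = Fraction(3, 2) ** 12 / Fraction(2) ** 7
comma_cents = 1200 * math.log2(comma)

print(f"{comma.numerator}:{comma.denominator}")
print(round(comma_cents, 2))

for root_hz in (300, 500):
    gap_hz = float(root_hz * comma - root_hz)
    print(f"root {root_hz} Hz: {gap_hz:.2f} Hz wide, {comma_cents:.2f} cents wide")
```

The width in hertz changes with the root, while the width in cents stays put.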
Now, let's also try graphing all our cent amounts, including the close-to-the-root note. We end up with something that looks like this:
See those two dots near each other, all the way on the left? The difference between them is what we mean by Pythagorean comma.
As an interesting aside: the existence of the Pythagorean comma is also the reason for one of the odd vestigial features of musical notation: namely, the existence of enharmonically equivalent notes. When learning modern music notation, you quickly learn that there are some odd redundancies in staff notation, including the fact that some notes can be expressed more than one way: for example, A♭ and G♯ are two ways of writing the same note. That's true with modern tunings and notation, but one reason for having both notations is that they used to denote different notes in tunings like the Pythagorean tuning: if we use A♭ as our root note, then we'd write our close-to-the-root-but-not-quite note as G♯, and the two would differ from each other by the Pythagorean comma. This distinction is no longer made, but we're still stuck with the notation.
Back to the scale we're generating, where we have to decide what to do with the not-quite-the-root note: we could include it in the scale we're creating, and maybe even continue generating new frequencies from our 3:2 generator. This would build a scale that includes more and more notes, but in addition to requiring some physically unwieldy keyboards, our scales are already getting diminishing returns from including these new notes: our goal was to choose a set of notes that sounded good together. Already, we'd be confusing would-be composers by including two very-nearly-the-same notes when all the other notes are nicely distinct, and that new root isn't going to sound nearly as nice when played with some of the notes we generated earlier. Anyway, since our new note is almost just the original root note, we could just leave it off and stop the scale with only twelve notes. Sure, that means that one of our intervals is gonna be a teeny bit less consonant than the others, but most of them sound great, right?
As it turns out, that's exactly what Western music did for about two thousand years.

What Are The Problems with Pythagorean Tuning?

The tuning I've described here was originally described by Pythagoras (surprising, I know) in the 6th century BCE, and was historically quite popular in Western music up until about the 16th century. It has a nice basis in mathematics, and features the 3:2 ratio throughout, which means it has a lot of nice consonances between its various notes. However, the West has stopped using this tuning nearly as much as it once did. What are the problems with it? Why would we want something else?
Well, for one, there are a lot of ratios other than 3:2 that also sound nice: for example, the 5:4 ratio, being small and simple, is also a consonant interval. If we apply that ratio to our root note from before of 400 Hz, then we get \(400 \times \frac{5}{4} = 500\). This frequency isn't in the scale we created. The closest we have is 506.25 Hz, which corresponds to a ratio of 81:64: close, but not really as pleasant as a real 5:4 ratio. Take a listen: the following clip alternates back and forth between a perfect 5:4 interval and the less-pleasing 81:64 interval we get from Pythagorean tuning:
That meant that music composed with Pythagorean tuning in mind tended to avoid that interval—what we now call a major third—because it just didn't sound as nice as it does in other tuning systems.
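To put a number on "close, but not really as pleasant", we can measure both thirds in cents. This is a small sketch with my own helper name; the only inputs are the 5:4 and 81:64 ratios from the text:

```python
import math
from fractions import Fraction

def cents(ratio) -> float:
    """Size of an interval in cents, given its frequency ratio."""
    return 1200 * math.log2(ratio)

pure_third = Fraction(5, 4)             # 500 Hz over 400 Hz
pythagorean_third = Fraction(81, 64)    # 506.25 Hz over 400 Hz
gap = pythagorean_third / pure_third    # how far apart the two thirds are

print(round(cents(pure_third), 2))
print(round(cents(pythagorean_third), 2))
print(f"{gap.numerator}:{gap.denominator}", round(cents(gap), 2))
```

The Pythagorean third comes out a little over 21 cents sharp of the pure one, which is nearly as wide as the Pythagorean comma itself.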
Another big problem was the wolf, which is a wonderfully evocative name for a musical phenomenon. In Pythagorean tuning, once we've arranged every tone in ascending order of frequency, the fifth (that is, the tone that's related to the root tone by the 3:2 ratio) is seven steps up from the root, as you can see here:
If we move our focus to the next note up from each of those (the second and the eighth frequency in our list, respectively) we discover that they are also related to each other by the 3:2 ratio! If we look at every note and its partner seven steps above it, looping around when necessary, we find that each of those pairs is related to each other by a simple, pleasing ratio, either 3:2 or 4:3. …well, all of those pairs except one.
| \(\text{Tone}_i\) | \(\text{Tone}_{i+7}\) | Ratio |
|---|---|---|
| 400.00 Hz | 600.00 Hz | 1.500 |
| 427.15 Hz | 640.72 Hz | 1.500 |
| 450.00 Hz | 675.00 Hz | 1.500 |
| 480.54 Hz | 720.81 Hz | 1.500 |
| 506.25 Hz | 759.38 Hz | 1.500 |
| 540.61 Hz | 400.00 Hz | 1.352 |
| 569.53 Hz | 427.15 Hz | 1.333 |
| 600.00 Hz | 450.00 Hz | 1.333 |
| 640.72 Hz | 480.54 Hz | 1.333 |
| 675.00 Hz | 506.25 Hz | 1.333 |
| 720.81 Hz | 540.61 Hz | 1.333 |
| 759.38 Hz | 569.53 Hz | 1.333 |
There's an odd one out in this tuning, and it's because of the Pythagorean comma: a perfect fifth up from 540.61 Hz is 810.92 Hz, but that frequency corresponds to the not-quite-the-root note that we chose to omit from our scale. Instead, we kept the actual root, 400 Hz (or the corresponding note an octave higher, at 800 Hz) which, when taken with 540.61 Hz, produces an interval that doesn't really sound like a fifth, and instead produces a kind of dissonant "beating" when you listen to it.
This interval was said to "howl" when played, so it was avoided by composers, who referred to it as the wolf. It's also sometimes called a Procrustean fifth, because of the way it doesn't really fit. Similar not-quite-right intervals will arise in other alternate tuning systems, so the phrase wolf interval also refers to any of these kinds of intervals in other systems. When Pythagorean tuning was in widespread use, composers had to be aware of wolf intervals when composing, and would write compositions that didn't ever use that particular interval.
Here is a sound clip which first plays a 540.61 Hz tone paired with a perfect fifth (at 810.92 Hz), and then plays the same tone paired with 800 Hz, which is the interval referred to as "the wolf".
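The hunt for the wolf can be automated: pair every tone with the one seven steps up (wrapping around), take the larger frequency over the smaller, and flag any pair that isn't exactly 3:2 or 4:3. The sketch below uses exact fractions and names of my own invention:

```python
from fractions import Fraction

def pythagorean_scale(root=Fraction(400), count=12):
    """Twelve tones from stacked 3:2 steps, folded into one octave and sorted."""
    tones, freq = [], root
    for _ in range(count):
        tones.append(freq)
        freq *= Fraction(3, 2)
        if freq >= 2 * root:
            freq /= 2
    return sorted(tones)

PURE = {Fraction(3, 2), Fraction(4, 3)}

scale = pythagorean_scale()
rows = []
for i, tone in enumerate(scale):
    partner = scale[(i + 7) % len(scale)]      # seven steps up, looping around
    ratio = max(tone, partner) / min(tone, partner)
    rows.append((i, tone, partner, ratio))

wolves = [(i, ratio) for i, _, _, ratio in rows if ratio not in PURE]

for i, tone, partner, ratio in rows:
    flag = "" if ratio in PURE else "  <- wolf"
    print(f"{float(tone):7.2f} Hz  {float(partner):7.2f} Hz  {float(ratio):.3f}{flag}")
```

Working in fractions rather than rounded decimals means the pure pairs compare as exactly 3:2 or 4:3, so the one pair that doesn't is unambiguous.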
What else could we do if we wanted to build a musical system that didn't have problems like the wolf? In the sixteenth and seventeenth centuries, Western music in general moved to a different system called quarter-comma meantone, which actually had an even more acute wolf interval (in Pythagorean tuning, the wolf interval is about 23 cents flat, while in quarter-comma meantone, it's 35 cents sharp), but I'm going to skip over that one as well as a handful of other systems, and move straight to the system we picked up in the 19th century without looking back: equal temperament. This is also sometimes more specifically called 12-TET, for '12-Tone Equal Temperament', or 12-EDO, for '12 Equal Divisions of the Octave'.

Equal Temperament

Let's look again at the chart that shows the frequencies of a Pythagorean tuning, as well as how many cents each one is from the root:
| frequency | cents |
|---|---|
| 400.00 Hz | 0.00 cents |
| 427.15 Hz | 113.68 cents |
| 450.00 Hz | 203.91 cents |
| 480.54 Hz | 317.59 cents |
| 506.25 Hz | 407.82 cents |
| 540.61 Hz | 521.50 cents |
| 569.53 Hz | 611.73 cents |
| 600.00 Hz | 701.95 cents |
| 640.72 Hz | 815.64 cents |
| 675.00 Hz | 905.86 cents |
| 720.81 Hz | 1019.55 cents |
| 759.38 Hz | 1109.77 cents |
Notice how the stride between each pair of adjacent tones is roughly similar, but with some bit of wiggle room. Let's look at the sizes of the gaps between each adjacent pair of frequencies, calculated in cents:
| frequency | cents | change from previous note |
|---|---|---|
| 400.00 Hz | 0.00 cents | +90.22 cents |
| 427.15 Hz | 113.68 cents | +113.68 cents |
| 450.00 Hz | 203.91 cents | +90.22 cents |
| 480.54 Hz | 317.59 cents | +113.68 cents |
| 506.25 Hz | 407.82 cents | +90.22 cents |
| 540.61 Hz | 521.50 cents | +113.68 cents |
| 569.53 Hz | 611.73 cents | +90.22 cents |
| 600.00 Hz | 701.95 cents | +90.22 cents |
| 640.72 Hz | 815.64 cents | +113.68 cents |
| 675.00 Hz | 905.86 cents | +90.22 cents |
| 720.81 Hz | 1019.55 cents | +113.68 cents |
| 759.38 Hz | 1109.77 cents | +90.22 cents |
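The step sizes can be recomputed exactly, which makes it clear there are only two distinct gaps. In this sketch (names are mine), the last gap wraps around from the highest tone to the root an octave up:

```python
import math
from collections import Counter
from fractions import Fraction

def pythagorean_scale(root=Fraction(400), count=12):
    """Twelve tones from stacked 3:2 steps, folded into one octave and sorted."""
    tones, freq = [], root
    for _ in range(count):
        tones.append(freq)
        freq *= Fraction(3, 2)
        if freq >= 2 * root:
            freq /= 2
    return sorted(tones)

scale = pythagorean_scale()
# Pair each tone with the next one up; the final pair ends on the octave of the root.
uppers = scale[1:] + [2 * scale[0]]
step_sizes = Counter(upper / lower for lower, upper in zip(scale, uppers))

for ratio, count in sorted(step_sizes.items()):
    size_in_cents = 1200 * math.log2(ratio)
    print(f"{ratio.numerator}:{ratio.denominator}  {size_in_cents:.2f} cents  x{count}")
```

Seven small steps and five large ones add up to exactly one octave.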
We can see that every note is either 90 or 113 cents above the previous note. This hints at one approach we can take to avoid dissonant intervals like the wolf: given that they're all more or less similar already, we could just fudge all the intervals so they're all exactly equal. That means each interval might not sound quite perfect—no two of them will be related by a perfect simple ratio—but to most untrained ears, they'll be close enough. Now, instead of doing some elaborate set of frequency multiplications to choose our notes as in Pythagorean tuning, we just split the octave into twelve equal steps of 100 cents each, and convert those back to get our frequencies:
| frequency | cents | change from previous note |
|---|---|---|
| 400.00 Hz | 0.00 cents | +100 cents |
| 423.78 Hz | 100.00 cents | +100 cents |
| 448.98 Hz | 200.00 cents | +100 cents |
| 475.68 Hz | 300.00 cents | +100 cents |
| 503.96 Hz | 400.00 cents | +100 cents |
| 533.93 Hz | 500.00 cents | +100 cents |
| 565.68 Hz | 600.00 cents | +100 cents |
| 599.32 Hz | 700.00 cents | +100 cents |
| 634.96 Hz | 800.00 cents | +100 cents |
| 672.71 Hz | 900.00 cents | +100 cents |
| 712.71 Hz | 1000.00 cents | +100 cents |
| 755.09 Hz | 1100.00 cents | +100 cents |
It turns out that this is, broadly speaking, a pretty good approximation! We're within two cents of a perfect fifth, and most of the other ratios are at least workable. Because our "fifths" are ever-so-slightly narrower than a perfect 3:2 ratio, twelve of them add up to exactly seven octaves, so we can loop around and get back to exactly the root note, and consequently our wolf intervals disappear, as well.
Of course, we did this by fudging things, so none of these ratios are going to sound exactly as nice. We still don't approximate the 5:4 ratio very well—we're about fourteen cents too high—but again, an untrained ear probably won't really notice. And it turns out that we do approximate a perfect fifth to within two cents, which is a pretty good approximation!
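Here's a small sketch of equal temperament in code, assuming the same 400 Hz root as before (the names are mine). It builds the twelve frequencies from \(400 \times 2^{k/12}\) and measures how far the tempered fifth and major third land from the pure 3:2 and 5:4 ratios:

```python
import math

ROOT_HZ = 400.0  # same root as the rest of the post

def cents(ratio: float) -> float:
    """Size of an interval in cents, given its frequency ratio."""
    return 1200 * math.log2(ratio)

# Twelve equal steps of 100 cents each.
equal_tempered = [ROOT_HZ * 2 ** (step / 12) for step in range(12)]

fifth_error = 700 - cents(3 / 2)   # negative means the tempered fifth is flat
third_error = 400 - cents(5 / 4)   # positive means the tempered third is sharp
closes_exactly = (12 * 700) % 1200 == 0  # twelve tempered fifths = whole octaves

print([round(f, 2) for f in equal_tempered])
print(round(fifth_error, 2), round(third_error, 2), closes_exactly)
```

The fifth is off by a hair and the third by quite a bit more, but the circle of twelve fifths closes exactly, which is the whole point of the compromise.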
It's a pretty convenient system, and it's in pretty wide use today: at the very least, it's how a lot of people think about tuning. In practice, a number of instruments can't use equal temperament exactly because of physical limitations on the instrument itself, and singers tend to use something closer to the Pythagorean tuning (because it's easier to make your voice go up a perfect fifth than an ever-so-slightly-imperfect fifth) but we tend to treat equal temperament as the ideal tuning for most modern instruments.
That's the high-level overview of where twelve came from: it has its roots in Pythagorean tuning, where it made sense to only include twelve tones, and our current system of equal temperament came from a reasonable compromise that ironed out some of the rough edges in Pythagorean tuning. Of course, there are still more tunings and temperaments to discuss (including some very interesting systems that result in more than twelve tones!) but this post is already a bit long, so I'm going to split the rest of this material into a separate post. Next time: microtonal music!