I have long been fascinated by evolutionary biology and the idea that any mental faculty shared by all humans must have had an evolutionary purpose at some point in our ancestors’ development. Music is one such faculty, whose purpose is not immediately obvious. I have been reading as much as possible about the topic and mulling the matter over for the last few years, so I plan to outline some of my thoughts in a series of blogs. I hope that they will eventually also solidify into an e-book.
Many social and sexual advantages have been proposed over the years, but in this entry, I want to look at how the physics of sound transmission could have created a selection pressure for pre-human creatures to begin to sing in harmony between one and two million years ago.
Before we start…
Any ability ubiquitous throughout the human race and with brain regions devoted to its execution must have been selectively useful to man at some point in his history. If this were not the case, a creature possessing the ability would have no selection advantage over his fellows who did not. Consequently the ability and the physical equipment necessary to achieve it would not proliferate.
While there are numerous brain regions implicated in the processing of pitch, rhythm and harmony, some might argue that these are evolutionary by-products of other processes, particularly language, which have been more recently co-opted into processing musical stimuli. I am strongly against that viewpoint and will explain why in future blogs. You will have to forgive my not fully explaining my premises- when these blogs become an e-book my thought process will be laid out in order, but the blog is more of a sounding board for new ideas and consequently skips the reasoning behind some of my pre-formed conclusions. Many far better informed authors than myself have already provided strong evidence against the by-product view in print. I have been particularly influenced in my thinking by The Singing Neanderthals by Steven Mithen and The Prehistory of Music by Iain Morley.
A note’s pitch can be described in physical terms as its frequency. Sound travels through the air as longitudinal waves of pressure, and the frequency of these waves (the number of complete cycles- one peak and one trough- per second) is what we perceive as their pitch. A sound wave whose peaks and troughs of pressure are closer together will be perceived as higher pitched, and one whose peaks and troughs are further apart, lower. Many of the graphs in this post were created using the amazing free Android app version of Desmos.
A low frequency (pitch) sine wave
A high frequency (pitch) sine wave
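The relationship between frequency and perceived pitch is easy to verify numerically. Here is a minimal sketch in Python with NumPy- the 110 Hz and 880 Hz values are my own arbitrary examples, not anything special:

```python
import numpy as np

fs = 4000                    # samples per second
t = np.arange(fs) / fs       # one second of time points

low  = np.sin(2 * np.pi * 110 * t)   # low pitch: 110 Hz (an A2)
high = np.sin(2 * np.pi * 880 * t)   # high pitch: 880 Hz (an A5)

# The peak of the Fourier spectrum recovers each wave's frequency:
# with a one-second window, FFT bin k corresponds to k Hz.
f_low  = np.argmax(np.abs(np.fft.rfft(low)))
f_high = np.argmax(np.abs(np.fft.rfft(high)))
print(f_low, f_high)         # 110 880
```

The same trick (find the tallest spike in the spectrum) is how simple tuner apps estimate the pitch of a sung note.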
When a sound wave travels through air, it gradually becomes weaker. This is because the wave causes air molecules to oscillate as it passes, giving away its energy to them. A wave which has more energy to begin with (a greater amplitude, or height on the graph) will travel further before running out of steam.
Furthermore, an obstacle may prevent a sound wave from reaching a listener. For example, imagine that the wave were to hit a wall. Some of its energy would dissipate into the bricks and the rest would be reflected back in the direction from which it came. Low frequencies cope with this better than high ones: their longer wavelengths allow them to bend (diffract) around obstacles that would simply reflect or absorb a shorter wave, which is one reason they tend to travel further.
Low frequency sound waves with obstacle
The high frequency wave is more likely to hit the obstacle and be reflected or absorbed
In 3D space, of course, the higher frequency sound wave is not usually losing its energy to walls, but to the air itself. If it performs four oscillations (up and down movements) for every cm it travels forward, it sets the air vibrating far more rapidly, and so loses energy far faster, than a lower frequency wave which only performs one- in fact atmospheric absorption grows roughly with the square of the frequency. You can find a more in-depth explanation with snazzy moving diagrams here.
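This frequency-dependent fading can be captured in a toy calculation. Everything here is illustrative: the decay constant k is a made-up number chosen to make the effect visible, not a measured property of air (real absorption also depends on temperature and humidity):

```python
import numpy as np

def remaining_amplitude(freq_hz, distance_m, k=2e-9):
    """Fraction of amplitude left after travelling distance_m metres.

    Toy model: amplitude decays as exp(-alpha * d), with alpha
    proportional to frequency squared (classical absorption).
    k is an illustrative constant, not a measured value.
    """
    alpha = k * freq_hz ** 2
    return np.exp(-alpha * distance_m)

# A low hum keeps far more of its amplitude over 1 km than a high whistle:
print(remaining_amplitude(100, 1000))    # ~0.98
print(remaining_amplitude(2000, 1000))   # ~0.0003
```

The square law means that doubling a signal’s frequency quadruples how quickly the air drains its energy- exactly the pressure that favours low-pitched long-distance calls.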
From an evolutionary perspective, there are many social benefits to singing and learning to harmonise with others. I will be covering them in future posts. However the nature of sound transmission gives rise to a clear physical benefit, one whose nature is hinted at by birdsong.
Before we get to that, let’s meet our singers. Our proto-human ancestors of just under 2 million years ago (Homo ergaster) were living in rapidly expanding social groups due to a changing climate. The deforestation of their East African home meant that they were moving out of the trees, walking upright and living in larger groups (for protection from predators), and consequently undergoing huge increases in the size of the frontal and pre-frontal cortices, brain regions associated with impulse control, logic and other higher faculties required for social interaction. Having evolved bipedalism (upright walking), their new posture had also led to a lengthened vocal tract and lowered larynx. This combination of an improved communication tool and a larger social brain meant that they were becoming able to produce complex, pitched vocalisations and better interpret the meaning of others’ “songs”. I feel that it is reasonable to assume that they would have shared messages with neighbouring groups by use of group vocalisations, and for reasons we will see, pitched and harmonised vocalisations would be vastly more effective than non-coordinated ones.
Although for the purposes of this particular blog their motivations are not especially important, I imagine that the main purpose of group vocalisations in our hominid ancestors would have been territory defence. Many modern monkeys are forced to co-operate with mating rivals in order to defend their breeding grounds from other groups. If our ancestors had the same problem, then a good way for them to deter rival groups from entering their territory might have been to work together to “sing” loudly and signal their unfriendly intentions to any prospective raiders. Equally, if they sought to attract females to the area, “singing” loudly would be a good way to draw attention.
NB- sexual selection is one of the earliest arguments for the evolution of musicality, first posed by Charles Darwin, and will be returned to in later posts. I am ignoring it here because I am proposing a physical survival benefit for harmonisation which would have helped our ancestors long before they evolved the capacity to be emotionally affected by pitched sound- a capacity they would have needed before song could become sexually attractive.
So, from the perspective of a sound wave, there are two ways to travel further. One is to increase your amplitude (volume) and the other is to increase your wavelength- that is, to lower your frequency and therefore your pitch. Of course, the optimal wavelength varies with the environment- whether there are trees present, the temperature, the wind and so on. Again, I’ll be looking at this in future posts- for the purposes of this one I will be ignoring all environmental considerations.
Our singing creatures could give their signal a higher amplitude (volume) by simultaneously singing one pitch. As you can see in the diagram below, multiple waves at the same pitch will join together in the air (or whatever medium they are travelling through) to create a taller (louder) wave of the same frequency, which will consequently travel further. This can lead to issues of its own though, the main one being phase cancellation. This happens when one wave begins halfway through the cycle of the other, consequently cancelling it out as shown below. This could happen because the singers are not robots and do not create perfectly synchronised waves, or because some of the sound is reflected by obstacles and travels back into the path of the outgoing waves, cancelling them out.
The blue line is a sine wave (pure tone) and the green line is the same wave inverted. The purple line is what a listener would hear if the two were played together- aka complete silence!
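Both outcomes- the doubling when the waves are in phase and the silence when one is inverted- are easy to reproduce numerically, assuming idealised pure tones:

```python
import numpy as np

x = np.linspace(0, 4 * np.pi, 1000)

wave         = np.sin(x)           # one singer
in_phase     = np.sin(x)           # second singer, perfectly in phase
out_of_phase = np.sin(x + np.pi)   # second singer, half a cycle late (inverted)

loud   = wave + in_phase           # amplitudes add: a wave twice as tall
silent = wave + out_of_phase       # amplitudes cancel: near-total silence

print(np.max(np.abs(loud)))    # ~2.0
print(np.max(np.abs(silent)))  # ~0.0
```

Shifting a sine wave by half a cycle (pi radians) is exactly the “inversion” in the graph: every peak lines up with a trough, so the pressures sum to nothing.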
Two creatures singing the same pitch simultaneously might find their songs cancelling one another out in this way (or as is more likely with organic singers, drifting in and out of phase with one another leading to a signal whose volume would be perceived by a listener as constantly fluctuating). The way to eliminate this would be to sing in harmony. Consider the following diagrams:
Two singers singing a perfect fifth apart. The fundamental or lower frequency is blue and the fifth is green. The purple wave is what a listener would hear.
The same two singers again but this time the higher frequency wave is inverted.
…And again with the lower frequency inverted.
The really important feature of these three graphs is that regardless of which wave is inverted, the wave heard by the listener always has the same shape! To check this, compare the three purple lines: slid a little to the left or right (or, in the last case, flipped top-to-bottom- a change the ear cannot detect) they overlap perfectly. So by singing in harmony our singers have eliminated the effects of phase cancellation and created a consistent wave whose meaning is still discernible by the receiver.
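This phase-robustness of the fifth can be checked directly. Writing the 3:2 pair as the idealised sine waves sin(2x) and sin(3x), inverting the upper voice produces exactly the original combined wave shifted along the time axis:

```python
import numpy as np

x = np.linspace(0, 10, 2000)

both_normal   = np.sin(2 * x) + np.sin(3 * x)   # fundamental plus fifth
fifth_flipped = np.sin(2 * x) - np.sin(3 * x)   # fifth inverted (out of phase)

# Shifting the original combined wave by pi reproduces the flipped one,
# so a listener receives the same waveform either way:
shifted = np.sin(2 * (x + np.pi)) + np.sin(3 * (x + np.pi))
print(np.allclose(shifted, fifth_flipped))   # True
```

The shift works because pi is a whole number of half-cycles for both components: sin(2(x+pi)) returns to sin(2x) while sin(3(x+pi)) becomes -sin(3x).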
Here is the same principle demonstrated with two singers separated by a perfect fourth (frequency ratio 4:3):
Fundamental (lower pitch) in blue, perfect fourth in green, what listener hears in purple
Same again with the fourth (higher frequency, green) inverted
…And again with the fundamental (lower frequency) inverted
Of course a real creature cannot produce a perfect sine wave. Its signal is modulated by a range of factors. Firstly, the vocal cords act in a similar way to a stringed instrument, adding overtones to the signal produced. Furthermore, the position of the tongue, lips, jaw and so on affects the shape of the resonant cavity through which sound escapes the body, and consequently alters the balance of partials present in the output signal. For more information on this have a look at this great post by singing instructor Karyn O’Connor. These partials are smaller, higher frequency modulations riding on the fundamental frequency. They also convey information about the sender’s feelings and intentions to the receiver. For example, due to the altered shape of their mouth and throat, a smiling person makes a sound more like “Eeee”, whose sound wave would look like this:
This is the waveform produced by a human (me!) singing with a smile. It makes an “Eeeee” type sound. You can see the frequency is roughly constant- every oscillation (1 peak and 1 trough) is marked with a vertical red line. The smaller wobbles are the partials of the wave- the extra frequencies added by the physical shape of my sound producing apparatus, aka my mouth and throat.
Whereas an angry person with their face scrunched into a frown might make a sound more like this:
This one makes a sound something like “Oooo”. You can see that the fundamental frequency is the same as the other sound’s, but the partials are very different. It is this high frequency modulation which a listener would use to gauge my emotional state and intentions.
While both of the sound waves above have the same fundamental pitch, you can see that the higher frequency modulations of the waves are very different, and it is these higher frequency elements which convey their meanings. So as a general principle: while low frequencies travel better, it is the higher frequency data in a vocalisation that convey its social meaning.
Now consider what happens when we combine these natural sound waves together. For the purposes of this experiment I have created a graph of an “ee” vowel and an “oo” vowel.
The “ee” vowel’s formula is: sin(2x) + sin(4x)/12 + sin(6x)/32 + sin(14x)/32 + sin(16x)/8 + sin(18x)/5
This is one sine wave, sin(2x), being modulated by several smaller, higher frequency partials. Dividing a partial by a greater number diminishes its amplitude, so sin(4x)/12- a partial at double the frequency of the fundamental (an octave above)- has one twelfth of the fundamental’s amplitude. These smaller partials would be caused by the resonant properties of the mouth and throat.
The “oo” vowel’s formula is: 2sin(x) + (sin(2x))/5 + (sin(3x))/2
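Both vowel formulas can be written as small Python functions, which also makes it easy to build the harmonised signals in the diagrams that follow. Evaluating the upper voice at 1.5x is my own way of raising it by a 3:2 fifth (the original graphs were drawn in Desmos):

```python
import numpy as np

def ee(x):
    """'Ee' vowel: fundamental sin(2x) plus five quieter partials."""
    return (np.sin(2*x) + np.sin(4*x)/12 + np.sin(6*x)/32
            + np.sin(14*x)/32 + np.sin(16*x)/8 + np.sin(18*x)/5)

def oo(x):
    """'Oo' vowel: fundamental 2sin(x) plus two partials."""
    return 2*np.sin(x) + np.sin(2*x)/5 + np.sin(3*x)/2

x = np.linspace(0, 4 * np.pi, 4000)

# Two "ee" singers a perfect fifth (3:2) apart:
ee_fifths = ee(x) + ee(1.5 * x)

# One "ee" voice on its own repeats every pi, but the combined wave
# only repeats every 2*pi- its overall oscillation rate is lower:
print(np.allclose(ee(x), ee(x + np.pi)))                                 # True
print(np.allclose(ee_fifths, ee(x + 2*np.pi) + ee(1.5 * (x + 2*np.pi))))  # True
```

That periodicity check is worth noticing: the harmonised wave repeats less often than either voice alone, while the quiet high-frequency partials are still present inside it.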
Now look what happens when two friendly, smiling humans sing in fifths (3:2 ratio):
“Ee”s a perfect fifth (3:2) apart
And how about when two angry, frowning humans sing in fifths:
“Oo”s in fifths (3:2)
I’ll admit that this squiggly mess is a bit confusing… But it shows some interesting properties of harmony. Lower frequencies travel further, but high frequency, low amplitude modulations are the parts of a vocalisation from which we infer its meaning. If you look at the combined waves of both diagrams (purple) you will notice that the high frequency modulations from the higher pitched singer’s wave (green) are still present in the combined wave. However, if you look at the broader pattern of the purple wave, it performs fewer oscillations than the green one, meaning it should lose considerably less of its energy as it travels through the air to a listener. The harmonised wave should therefore travel considerably further than one produced by a pair of singers all singing at the same pitch.
Just to round things off, here is the same demonstration with two singers in major thirds (5:4 ratio).
“Ee”s in major thirds (5:4)
“Oo”s in major thirds (5:4)
So in conclusion, harmonising would provide a survival benefit for proto-humans wanting to transmit a message as far as possible. It creates a sound wave which has a high amplitude and a low overall frequency but retains the high pitched modulations which convey meaning to the listener. This combined wave consequently travels further, taking its message to more enemies or potential mates.
Before I sign off, here is another purely physics-based survival benefit to singing in harmony. The human mind is adept at working out the fundamental frequency of a wave from the harmonics present in it, and can use that ability to work out what pitches are present in a signal containing multiple pitches. I imagine that this ability would have evolved as a mechanism for identifying the sources of environmental sounds, e.g. separating the sound of wind whistling through trees from the sound of a predator’s approaching feet. Consequently we can assume that it was fairly well developed by the time early hominids began producing pitched vocalisations. This means that a listener hearing the signal of a harmonised group would be able to make a better estimate of the group’s size than a listener hearing a group who were all singing the same pitch. Naturally, if a group were singing to defend a territory, this clear indicator of their numbers would be very helpful to them.
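One standard way to model this “working out the fundamental” ability is autocorrelation: a harmonic signal is most similar to itself when shifted by exactly one period of its fundamental, even when the fundamental frequency itself is missing from the signal. This is only an illustrative sketch of that idea, not a claim about how the brain actually implements it:

```python
import numpy as np

fs = 8000                  # sample rate in Hz
t = np.arange(fs) / fs     # one second of signal

# Harmonics at 400, 600 and 800 Hz -- the 200 Hz fundamental is absent:
signal = sum(np.sin(2 * np.pi * f * t) for f in (400, 600, 800))

# Autocorrelation: how similar the signal is to itself at each lag.
# The strong peak falls at one fundamental period.
n = len(signal)
lags = np.arange(8, 60)    # skip tiny lags (frequencies above ~1000 Hz)
corr = np.array([np.dot(signal[:n - k], signal[k:]) / (n - k) for k in lags])

best_lag = lags[np.argmax(corr)]
print(fs / best_lag)       # 200.0 -- the "missing" fundamental
```

The listener’s auditory system effectively does something analogous for each voice in a chorus, which is why a stack of distinct harmonised pitches reveals more about group size than a unison does.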
There is much more to say on the subject of auditory physics and its connection to the origins of musicality. I am looking forward to exploring the topic much further in future blogs. I am certainly no physics expert and the ideas above are merely my opinions, not generally accepted fact, so please leave your comments and thoughts below. I would love to know what you think and to hear from people more knowledgeable than myself in order to better understand the topic! Please also note that my suggestion that Homo ergaster might have been the first hominids to harmonise is pure conjecture based on my limited understanding of the archaeological data.
In my next post I will be examining the physics behind triads and other types of chords and discussing whether there is an evolutionary reason for their popularity or a purely cultural one.
Please subscribe to the mailing list to be updated when future installments are available. If you have any thoughts on the topic which are more in-depth than the comment box will allow then please don’t hesitate to message me using the contact page. I rarely have anyone to discuss these things with and would love to hear from you!
1 Comment
Another point in favour of this argument is that short burst signals travel better among trees, as birdsong shows. Birdsong is adapted to its producers’ environments: generally you find short, sharp whistles among forest species, but more complicated “trills” among species living in open areas, where they can convey more information and are less likely to have their messages interfered with by trees. This fits with the idea that somewhere just after two million years ago- when the auditory cortex suddenly expands massively in the fossilised skulls from that period- people were moving into savanna areas where pitched vocalisation was suddenly more important for communication between social groups, and more practical due to the altered environmental constraints.
I think that this development came first, with humans harmonising for better signal transfer and only much later beginning to deliberately modulate their utterances into what became language.
In Catchpole and Slater’s “Bird Song: Biological Themes and Variations”, page 77 reads: “Atmospheric turbulence does not distort frequencies to any great extent. So, as with reverberation, the coding of information in patterns of frequency would be predicted as best for transmission [rather than by using rhythmic or volume modulation]. This does indeed appear to be the way in which most birds code their signals”… And the way in which proto-humans probably sang.