By Jack Sharkey
A large part of my job is figuring out ways to tell people that quality sound is important, but I also believe you can enjoy a cheap pair of earbuds as well as a $50,000 system. Heresy, I know, but it’s a matter of degrees. Heck, I grew up listening to really great music on the 4 x 10” in-dash speaker in my mother’s Country Squire station wagon, and I’m about as annoying a music/audio fan as you’ll meet – I’m just proletarian about it.
Researching this piece, I surfed the Googleyweb and read dozens of comments about how mp3s sound the same as FLAC files. Or how mp3s are worse than FLAC files. Or how if you like great sounding music you’re an out-of-touch snob who hears things that aren’t there. Or how if you really dig listening to your music on cheap earbuds you’re somehow not worthy of being called a music fan.
In today’s episode I’m not going to try to dispel myths, or try to convince you of something. What I am going to try to do is put some perspective on the arguments. In the interest of journalistic integrity, I have used actual comments from various web sources (mostly from YouTube videos about lossy compression) as section headers in this piece, more or less to prove that there is in fact a lot of arguing going on.
Caring About the Difference Is Always A Personal Thing
I’d like to start by saying the argument is becoming boring to the point of catalepsy. I know, more heresy from a guy who works for a high-end speaker manufacturer, but follow along with me: we’re arguing about what other people hear. How can we possibly know what other people hear?
Like life, music is a journey, not a destination. What we listen to, why we listen, and how we listen all evolve as our lives progress. With that in mind, the hypothesis statement of this piece is simple: When it comes to music, the what, why, or how doesn’t matter – all that matters is that music touches you. But we should all have access to the very best listening experience possible – if we want it.
My own personal timeline of music is long with wide stretches of no change interspersed with giant leaps in quality. I used to be somewhat of an early adopter of new technology, but as we’ve entered a period where all we seem to be doing is re-inventing the wheel, I’ve kind of become a late adopter, choosing to wait to see the validity of each new technology that gets thrown at me.
Truly, the biggest musical revelation I ever had was the first time I listened to an FM radio broadcast in 1974 using a pair of Koss headphones I borrowed from a friend. That was the moment the quest for sound began for me.
Watch this video that attempts to illustrate the difference between an mp3 file and a “lossless version” (of unknown format) of a snippet of the song The Bad Touch by The Bloodhound Gang.
A couple of things we should take notice of:
1. The sample contains no vocals
2. The instrumentation is canned drums with little or no brass (cymbals) and no dynamics
3. A synthesizer is the main sound used
4. A bass that may be synthesized is predominant
5. There is a guitar that may be synthesized but is otherwise very undynamic
6. There are little to no spatial effects such as reverb and echo
7. The song is really dynamically compressed, so there aren’t a lot of dynamic changes
But the biggest thing we should take away from this video is the narrator’s conclusion: Wow. That really is all you are missing when you listen to an mp3.
Right. In this song, in this example. I think it’s dangerous to conclude that every mp3 removes only that small amount of “data.” It’s also non-factual. Plus, removing sounds – even ones we think of as inconsequential – changes (distorts) the artist’s original intention.
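To put a rough number on just how much data gets thrown away, here’s a back-of-the-envelope calculation. This is only a sketch, assuming a CD-quality PCM source – bit rate alone doesn’t measure audible quality, since the encoder is smart about what it discards – but it shows the size of the gap:

```python
# Rough comparison of raw bit rates: CD-quality PCM vs. common mp3 rates.
# (Bit rate alone doesn't measure audible quality, but it shows how much
# data a lossy encoder has to discard.)
SAMPLE_RATE = 44_100   # Hz, CD standard
BIT_DEPTH = 16         # bits per sample
CHANNELS = 2           # stereo

pcm_kbps = SAMPLE_RATE * BIT_DEPTH * CHANNELS / 1000  # 1411.2 kbps

for mp3_kbps in (320, 128):
    print(f"{mp3_kbps} kbps mp3 keeps about {mp3_kbps / pcm_kbps:.0%} "
          f"of the CD bit rate")
```

In other words, even a 320kbps mp3 carries less than a quarter of the raw data of the CD it came from; whether you can hear the missing three quarters is exactly what everyone is arguing about.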
I also think it’s important to remember that YouTube squashes the heck out of the audio in every video (otherwise how would all of those kitty videos fit in your computer?). That means you aren’t experiencing the full effect of the lost data, because the delivery format (YouTube) adds its own layer of compression. The difference may look good on the graph, but your ears aren’t really getting to decipher it.
I Can’t Tell the Difference
Back in June, NPR released a cool little piece on its blog The Record about this exact subject. Six different songs from various musical genres were presented, each at three different compression levels. Take a couple of minutes and go take the test, then come back. (Click Here)
On first try, I got four out of six (I missed the Jay-Z and Katy Perry tracks because I wasn’t listening for the right things), and on each of the two I mistook the 320kbps file for the uncompressed version. On the second try I got three out of six, but on the third try I got all six. I know several really serious industry people who were not able to get all six for a few tries, but once they locked in on the differences, they were consistently correct.
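If you want to run this kind of comparison on your own files without fooling yourself, the key is blinding: randomize which version is which before you listen, and only reveal the answers after you’ve guessed. A minimal sketch in Python (the track names and encodings here are just placeholders, not the NPR test itself):

```python
import random

# Toy blind-test setup: shuffle the three encodings of each track so you
# don't know which version you're hearing until after you've guessed.
tracks = ["Speed of Sound", "Tom's Diner", "Dark Horse"]  # placeholders
encodings = ["128 kbps", "320 kbps", "uncompressed"]

answer_key = {}
for track in tracks:
    shuffled = random.sample(encodings, k=len(encodings))
    # Map anonymous labels A, B, C to the shuffled encodings;
    # keep this secret until after you've logged your guesses.
    answer_key[track] = dict(zip("ABC", shuffled))
    print(f"{track}: play versions A, B and C, then write down your guesses")

# Afterwards, compare your guesses against answer_key.
```

The point is simply that knowing in advance which file is which will bias what you hear; shuffling first keeps you honest.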
Here’s a quick list of what you might want to listen for when trying to hear the differences between the tracks:
Speed of Sound: Listen for the crash cymbal in the very beginning and in the middle. Also listen to the ride cymbal (which is kind of buried in the mix) for the attack and decay of the cymbal itself. Admittedly, the differences are subtle (particularly between 320kbps and lossless), but they are there.
There’s A World: This is a really tough one because the production is very mid-rangey, but if you listen to the glockenspiel and reeds at the end of the sample, in the uncompressed version you can actually hear a slight delay between the two instruments, plus there is a stronger sense of separation between them.
Tom’s Diner: Back in February we examined this song and the role it played in the inception of the mp3 (How A Great Song Helped Upend the Music Industry’s Apple Cart). Listen to the snaps of Vega’s tongue and the sibilance of her voice. Another giveaway is the reverb – in the uncompressed sample there is an actual feeling of three-dimensional space in the reverb, whereas in the 320kbps sample you can hear the reverb but it is two-dimensional. In the 128kbps sample the reverb is so two-dimensional as to sound almost like a slapback.
Mozart Piano Concerto #17: The piano sounds deeper – bigger, and the strings sit in their own space in the mix. Also listen for the decay of the higher notes on the piano. Of all the songs in this test (other than Tom’s Diner) the differences are most apparent in this one, which tells you it’s the natural instruments (including the human voice) – with their natural attack and decay – that are most affected by data compression.
Tom Ford: Listen to the Pac-Man noises, especially as compared to how they sit against the percussive ratchet sound. Also, Jay Z’s voice has more separation and lifts out away from the mix. The drums also move more air and sound bigger and more three-dimensional.
Dark Horse: Listen as exclusively as you can to the background vocals. You’ll hear more stereo separation (panning) in the uncompressed file, and you will hear more of the breathiness in the sighs of the singers in between the notes. Subtle yes, but once you hear them full-on, it makes quite a difference in the perception of the music.
I listened to all of these on a standard HP computer soundcard and a pair of KEF X300A speakers.
MP3 is the 240p of music
When it comes to lossy compression, there is a lot of talk about perceptual encoding, or perceptual compression, so let’s try to understand exactly what we’re not hearing, as opposed to what we were meant to hear.
Sometimes there is a misconception about what is lost when we data compress. I’ve had people try to convince me that layered sounds in an audio file are compressed out. For example, let’s say a portion of a song has multiple people clapping hands at the same time a couple of people are snapping their fingers. There is a school of thought that says when we data compress, the finger snaps are removed from under (sonically speaking) the hand claps. This is not what happens.

Regardless of format, when those hand claps and finger snaps are recorded and mixed they become one complex signal of hand claps and finger snaps. The compression algorithm doesn’t differentiate between them. What it does remove is the spaces in between the notes (in this case, the rhythmic clapping and snapping). This means what you are mostly losing when you data compress music is air, reflection, reverb, space and the perception of a soundstage. You basically lose the things that make a song jump out and seem alive.

The problem is there is another school of thought that asserts the spaces between the notes are unimportant wastes of precious storage capacity, and that is also completely false – the spaces give music its life.
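For the curious, here’s a crude Python sketch of the masking idea behind perceptual coding. This is emphatically not how mp3 works internally – real encoders run a psychoacoustic model over critical bands, not one global threshold – but it shows the principle: a component sitting far below a louder neighbor can be discarded with almost no measurable change to the waveform.

```python
import numpy as np

# Toy masking demo (NOT a real mp3 encoder): a loud 1 kHz tone plus a
# quiet 1.1 kHz tone 60 dB below it, sampled at CD rate for one second.
sr = 44_100
t = np.arange(sr) / sr
loud = np.sin(2 * np.pi * 1000 * t)
quiet = 0.001 * np.sin(2 * np.pi * 1100 * t)
signal = loud + quiet

spectrum = np.fft.rfft(signal)
magnitude = np.abs(spectrum)

# Crude "masking threshold": discard any frequency bin more than 40 dB
# below the loudest bin. Real codecs compute this per critical band.
threshold = magnitude.max() / 100
kept = np.where(magnitude >= threshold, spectrum, 0)

reconstructed = np.fft.irfft(kept, n=len(signal))
print("bins kept:", np.count_nonzero(kept), "of", len(spectrum))
print("peak difference from original:", np.max(np.abs(reconstructed - signal)))
```

Only the loud tone’s bin survives, and the peak error is roughly the amplitude of the discarded quiet tone – tiny on paper, which is exactly why the argument is about whether the accumulated losses of air and space are audible, not whether they exist.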
Based on the current trends in modern music, where a lot of bass and heavy drum sounds are played underneath a minimal chord structure, you’re not going to have a great deal of air or space between notes, so the loss is negligible. However, if you’re trying to listen to the Trumpet Voluntary in D, or maybe Two Princes by the Spin Doctors (seriously, how awesome is that snare drum?), then you’re going to miss some of the spatiality of the music. It is what it is, kids, so all of you people who insist there is no perceptual difference are, well, in a word, wrong.
The actual question is: Does it matter?
Yes. And no. Kind of maybe. As a shiftless and bored 14-year-old driving with my mom to the Foodtown, I heard Ramblin’ Man for the first time on the horrible in-dash speaker in her station wagon, and I was moved by what I heard. Period. The sonic quality of what I was listening to didn’t matter. Now when I listen to that song and that record, I am still moved by the music, in spite of the fact that the recording is basically horrible. But I do in fact get more enjoyment from the music – regardless of the source recording – on a better sound system punching out a hefty amount of SPL.
It’s all subjective. Different formats offer different experiences. With mp3 you can carry all the music your little heart desires on your phone, so you’re never more than a battery charge away from your personal sonic nirvana. That’s okay. If you’re inclined to listen to music as an experience in and of itself rather than as an adjunct to another activity (commuting on the B Train), then obviously an mp3 played on inexpensive equipment is going to leave the experience somewhat lacking.
I subscribe to the notion that I want to hear as much of the original musical performance as I possibly can. Music is not quantifiable, it is not objective. It is simply and purely emotion – yours and the artist’s. We owe it to ourselves to grab as much of that feeling and emotion as we can. I am a fan of performance and excellent audio equipment heightens my enjoyment, but someone who really only cares about the thump in their head while they walk down the street has an equally valid position. We can all be right in our choices, as long as we understand the differences.
The best thing about being alive right now is we’re all free to make the choice between a decent sound that fits a mobile listening lifestyle and an incredible sound that carries us away even further.
We are in the infancy of digital music, so we really shouldn’t be drawing any conclusions just yet about what some of us hear and what some of us don’t. My feeling is that years from now, people will look back at this era of musical technology as quaint and simple. As we move forward, we – musicians, professionals and fans – will come to understand that so much of music is not what you hear, but what you sense. That it is not so much what is played, but what is not played and what happens in between. Argument is good – it pushes technology forward, but argument based on supposition and not fact tends to stray from the point.
In the timeline of music reproduction, our current digital technology is at about the same development age as the very first 78 RPM records, or the first talking movie. Edison recorded and played back “Mary Had a Little Lamb” in 1877, and thirty years later the 78 RPM record was produced. We’re about thirty years out from the useful implementation of data compression for music. In spite of how scary smart everyone is today as compared to back then, thirty years of technological development is nothing. Believe it or not, those who insist we’ve reached the peak of audio playback quality need only look at the amazing advances made in video technology in just the past 15 years – there is still plenty of room for improvement.
Let music mean whatever it means to you, but I guarantee you that once you start to listen differently the world opens up a little bit.
The opinions in this article are the author's own and not necessarily those of KEF or its employees.