PCM (pulse-code modulation) is an old sampling format used to encode waveform signals in digital form.

PCM defines two parameters: time (frequency) and amplitude (voltage level). Time is encoded by the sampling frequency, the horizontal axis of a PCM waveform, in Hz (samples per second). Amplitude is encoded by bit depth (bits per sample, BPS), the vertical axis.

Two important concepts to understand are thus:

1. Frequency detail is defined by the number of samples available per signal cycle at a given sampling frequency. The higher the sampling frequency, the higher the detail, as there are more samples per cycle. As the signal frequency rises, there are fewer and fewer samples per cycle, until the waveform degenerates into on/off noise at very high frequencies. The detail limit is f/8, where f is the sampling frequency. As an example, a sampling frequency of 48000 Hz provides at least 8 samples/cycle only at signal frequencies up to 6000 Hz (48000/8=6000). 96000 Hz sampling defines signals below 12000 Hz well, and 192000 Hz sampling has a practical audio bandwidth of below 24000 Hz. Anything below 160000 Hz/24-bit cannot be called "hi-fi" as it does not define at least 20-20000 Hz with enough accuracy.

2. The biggest flaw of PCM is that detail vanishes as the signal level drops, because fewer bits are actually in use. 16-bit is (unfortunately) a very common BPS value, but it is 16-bit (65536 voltage divisions) only between roughly 0 and -6 dB. Anything below ~-6 dB is 15-bit (32768 levels), anything below ~-12 dB is 14-bit (16384 levels), etc. Things get very coarse very quickly at very quiet levels. Fortunately, there is 32-bit and 24-bit sample detail; unfortunately, modern DACs/ADCs aren't accurate enough to realistically deliver that sort of detail in the real world.
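The arithmetic behind both points can be sketched in a few lines of Python. This is only an illustration of the numbers quoted above: the function names are made up for this example, and the default of 8 samples per cycle is this article's rule of thumb rather than a figure from any standard.

```python
import math


def samples_per_cycle(sample_rate_hz, signal_hz):
    """Average number of samples that land on one cycle of a tone."""
    return sample_rate_hz / signal_hz


def detail_limit_hz(sample_rate_hz, min_samples_per_cycle=8):
    """Highest tone that still gets `min_samples_per_cycle` samples.

    The default of 8 is the article's f/8 rule of thumb (an assumption
    of this text, not a value taken from a standard).
    """
    return sample_rate_hz / min_samples_per_cycle


def steps_spanned(bit_depth, peak_dbfs):
    """Quantisation steps covered by a signal peaking at `peak_dbfs`.

    0 dBFS is full scale; each drop of about 6.02 dB halves the span.
    """
    return (2 ** bit_depth) * 10 ** (peak_dbfs / 20)


if __name__ == "__main__":
    for rate in (44100, 48000, 96000, 192000):
        print(rate, "Hz sampling:", detail_limit_hz(rate), "Hz limit,",
              round(samples_per_cycle(rate, 12000), 2),
              "samples/cycle at 12000 Hz")
    for level in (0.0, -6.02, -12.04, -48.16):
        steps = steps_spanned(16, level)
        print(level, "dBFS:", round(steps), "steps, about",
              round(math.log2(steps), 2), "bits in use")
```

Swapping 16 for 24 in the second loop shows how much more headroom the same quiet passage gets at a higher bit depth.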



Ed Meitner mentioned in a Positive Feedback interview (read it, it explains a lot) that in his opinion DSD is a superior choice to PCM because of more natural transients at zero-crossing. PCM zero-crossing has the interesting quality of having no detail at all (0 bits). As it happens though, humans listen to waveform vectors, not to waveforms that gradually lose detail as loudness wanes. Once you compare vinyl or tape and DSD to PCM on a modern hi-fi speaker system, this becomes obvious. PCM has a kind of woolly/blurry character to transients, which becomes less noticeable as sampling frequency and bit depth increase, but it is there nevertheless.

These are the basic physics needed to understand PCM sampling. Most digital formats (CD, DVD-Audio, Blu-ray, AAC, MP3, FLAC, ALAC...) are PCM-based. Audio for CDs is mastered with PCM-based tools and stored in PCM wave (.wav or AIFF) format. MP3 files are decoded to PCM for output to DACs in a computer soundcard or portable player, etc. Computer sounds are stored as some sort of PCM waveforms too, such as sounds in games, system sounds, MIDI wavetables, etc. The only digital formats that are not PCM-based are DSD and SACD (which is a form of DSD encoded on an optical disc).

Lossy And Lossless Formats

Lossy formats are a little abomination of the times when bandwidth was limited. The most popular, MP3, is short for MPEG (Moving Picture Experts Group) Audio Layer III, defined in MPEG-1 and extended in MPEG-2. Developed by the Fraunhofer IIS institute in Germany, MP3 employs a bunch of perceptual coding techniques to remove audio data deemed "superfluous" in a PCM file. The essential techniques are: removing harmonics which are considered "excessive" and leaving basic tones (tonality estimation); reducing dynamic range (ATH, the absolute threshold of hearing); removing masked harmonics (quieter harmonics occluded by louder harmonics in given ranges); and limiting bandwidth (lowpassing or bandpassing, with lower-bitrate MP3 files limited to 16 KHz, 18 KHz...). The result is an encoded waveform which can be about 1/10th the original PCM waveform size. This process is called "lossy compression": discarding data that the compression algorithm considers "non-essential" for playback. The result, of course, is a certain thinning of ambience and a dryish/harsher character of the waveform, sometimes with noticeable attack artifacts on chromatic percussion.
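To see where the "about 1/10th" figure comes from, here is a small sketch comparing the raw data rate of stereo CD-quality PCM with a few common MP3 bitrates. The function name is invented for this example, and the MP3 bitrates are simply typical values, not anything prescribed by the format.

```python
def pcm_bitrate_kbps(sample_rate_hz, bit_depth, channels=2):
    """Raw (uncompressed) PCM data rate in kilobits per second."""
    return sample_rate_hz * bit_depth * channels / 1000


# Stereo CD audio: 44100 Hz, 16-bit, 2 channels.
cd_kbps = pcm_bitrate_kbps(44100, 16)

if __name__ == "__main__":
    print("CD PCM:", cd_kbps, "kbps")
    for mp3_kbps in (128, 160, 320):
        ratio = mp3_kbps / cd_kbps
        print(mp3_kbps, "kbps MP3 is", round(ratio, 3), "of the PCM size")
```

A 128-kbps file works out to roughly 9% of the original, which matches the 1/10th estimate.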

The paradox is that sometimes MP3 compression produces more pleasant sound than original CDs. Why does this happen? Simple: CDs are low in resolution and have certain harsh parts which MP3 compression may cut. The real detailed time resolution of a CD is less than 44100/8=5512.5 Hz. Midrange, in other words. Remember that MP3 compression thins out high frequencies by limiting bandwidth through a lowpass (as low as 16 KHz in a 128-kbps file, or 18 KHz for 160 kbps). These frequencies are harsh on a CD: 44100/3=14700 Hz, and there are only two coordinates per cycle above that point. The MP3 harmonic-cutting algorithm also removes quite a lot of "hard to perceive" high frequencies. Which, again, are harsh on a CD.


44100 Hz/16-bit (CD audio equivalent), 12 KHz sine wave. It gets worse as frequency increases; anything beyond ~14700 Hz is technical noise.


96000 Hz/24-bit, 12 KHz sine wave. Notice the larger number of samples allows more freedom to describe treble.

This is very easy to verify by running a lowpass filter on a lossless CD copy (or a 44/16 FLAC download). Classical music works best as strings and winds have a lot of high harmonics. Use NaiveLPF with a VST host like REAPER to experiment with different cuts. You can also use parametric EQ in a player like Winamp or XMPlay to cut very high frequencies. You'll find that 44/16 (and CDs) start sounding more pleasant when the lowpass is around 11 KHz (in other words, 44100/4=11025 Hz). Synthesiser makers have known this for quite a long time, and included analogue lowpass filters to "fatten" PCM sampler synths' sound and allow cutting off the harsher treble harmonics, especially on instruments that don't need them like basses.
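For readers who would rather see the idea in code than in a VST host, here is a minimal sketch of a lowpass filter in plain Python. It is a first-order (RC-style) filter swapped in for simplicity; it is not the algorithm NaiveLPF or any player's EQ uses, its slope is far gentler than a real treble cut, and all names are invented for this example. It only measures how much of a test tone survives a cutoff of 11025 Hz.

```python
import math


def one_pole_lowpass(samples, sample_rate_hz, cutoff_hz):
    """First-order RC-style lowpass (gentle 6 dB/octave slope)."""
    dt = 1.0 / sample_rate_hz
    rc = 1.0 / (2 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)  # smoothing coefficient between 0 and 1
    out = []
    y = 0.0
    for x in samples:
        y += alpha * (x - y)  # move part of the way towards the input
        out.append(y)
    return out


def sine(freq_hz, sample_rate_hz, count):
    """`count` samples of a full-scale sine wave."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate_hz)
            for i in range(count)]


def measure_gain(freq_hz, cutoff_hz, sample_rate_hz=44100):
    """Peak level of a filtered test tone (input peak is 1.0)."""
    filtered = one_pole_lowpass(sine(freq_hz, sample_rate_hz, 8192),
                                sample_rate_hz, cutoff_hz)
    # Skip the first samples so the filter's start-up does not count.
    return max(abs(s) for s in filtered[1000:])


if __name__ == "__main__":
    for freq in (1000, 5512, 11025, 18000):
        print(freq, "Hz tone, remaining level:",
              round(measure_gain(freq, 11025), 3))
```

Midrange tones pass almost untouched while the top of the CD band is pulled down; a steeper filter in a real plugin does the same thing more decisively.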

There are certain lossy formats which can sound very nice (Opus), but the generally sensible attitude is to stay away from them unless absolutely needed (such as reducing a video file's size for playing it on a portable device, or reducing download time/size). Other lossy formats are AAC, AC3, Musepack, WMA (which also has a lossless subformat), Ogg Vorbis (referred to here simply as Ogg)... AAC and AC3 tend to sound tinny with a subtle harshness/dryness. Musepack can be more or less natural, and Opus is by far the least distorting lossy format. Ogg can be thinning and cold in sound; sometimes a properly encoded MP3 file can sound better than Ogg. WMA is a proprietary Microsoft format which isn't used much (it was designed to "embrace and extinguish" MP3 but that never worked). It's best to stay away from WMA because it's a Microsoft format, and also because there's no guarantee it'll work on anything (portable players, as an example, usually just ignore it). In general though, all those formats can sound better than lower-bitrate MP3 files, as they were all released after MP3 and have improved perceptual coding harmonic-shaving techniques. The downside is higher CPU/DSP usage, e.g. Ogg requires a faster CPU than MP3, which could be decoded on a faster 80486 CPU (Ogg will need a Pentium at the very least). Memory usage is also higher for newer lossy formats, which, combined with developers' apathy towards less popular formats, is why cheaper devices usually didn't support them.

A very bad example of an MP3 file (this is 128 kbps, but it sounds more like 64 kbps recoded to 128):



Ogg Q5 encode of the original piece:



Ogg supports 96000 Hz sampling rate by the way.

Original FLAC file:



There are several lossless formats as well, the most common being FLAC and DSD files. Lossless codecs quite simply copy the original digital audio file and compress it unaltered, without any dirty tricks like washing out harmonics, lowering dynamic range and filtering out high frequencies like lossy compressors do. DSD has been slowly gaining traction as a file format by itself, and many newer DACs and even portable players support it. DSD is not PCM though; it's rather a raw bitstream before quantising (thus it's missing an important distortion stage which is present in PCM). DSD means "Direct Stream Digital" and it is the exact same format developed by Sony that is used in SACDs (Super Audio CDs). FLAC means "Free Lossless Audio Codec". It is an open-source, free-to-use codec which, in spite of anti-lossless fanaticism, has been getting popular. Higher bandwidth, larger hard drives, DVD and Blu-ray data discs, as well as large memory cards allow forgetting lossy formats altogether and using FLAC and other similar lossless formats. FLAC is a PCM compressor; at the very basic stage it just doesn't waste bytes describing empty space like PCM wave files do. Other lossless compression formats are WavPack (having the distinct advantage of supporting 32-bit float compression, unlike FLAC, which can only store integers), ALAC (which looks like Apple's own "divide and conquer" version of FLAC), Monkey's Audio (boasting slightly better compression ratios than FLAC), and WMA Lossless, which by now nobody knows exists. All lossless formats except for DSD are PCM compressors, and most perform quite alike (don't expect Ogg- or MP3-like feats of 1:10 compression; 0.5 or 0.3 of the original file size is usually the best they can do, 0.3 being the ratio for a quiet track).

It is good practice to only play and encode to lossless formats, especially now that there are no serious limitations on storage and bandwidth. Lossless formats are also best for processing with any external effects such as EQ (MP3 and AAC files can get toxic with a digital EQ sometimes, and other effects like loudness/dynamic boosters can magnify lossy compression defects). Obviously, no loss of original data means more ambience and artistic meaning present, at least in higher-res formats like 96/24, 192/24 and DSD.

Psychoacoustics of Frequency Ranges

Everybody knows what frequency ranges there are, such as bass, midrange, treble. However most people outside the music/sound engineering business have a somewhat dim understanding of which exact frequency ranges those are.


Strictly speaking, bass is 120 Hz and down, and it's not directional, anything above that is low midrange, but ah well. 250-3000 Hz is usually defined as "critical midrange", the range which is clearest to human hearing. Which is why the "critical range" must be free of any defects in both the mix and the playback device.


What matters and what most people don't have an idea about is frequency range perception. Bass and midrange are all perceived as "solid" fundamentals and harmonics of an instrument.

Bass:


Midrange:



Treble (5000-10000 Hz):


High Frequencies (10000-20000 Hz):


Fullrange:


Treble is trickier. Treble defines space and presence. What makes a flute sound like a flute is its treble as well as its midrange (not much bass there). Metallic instruments' metallic tone presence is also defined in high midrange and treble.







A string section's presence is defined in treble and even higher frequencies above 10 KHz.


44 KHz/16-bit;


96 KHz/24-bit. Granted, those are two different records, but the presence improves, also sharpness of definition is better (both dynamics and treble are functions of dynamic/time resolution).


Treble and high frequencies above 10 KHz define ambience, presence, depth, and instrument separation. Very high frequencies are used to give room dimension cues. Treble also conveys emotion and sweetness of instruments.

Differences Between PCM Resolutions


Typical PCM format resolutions are: 44100 Hz/16-bit (CD Audio), 48000 Hz/16-bit (AC97 soundcard standard), 88200 Hz/24-bit (double CDA resolution sometimes used for classical music and live records), 96000 Hz/24-bit (becoming more or less common, this is actually supported by video DVDs as well as DVD-A discs, but seldom implemented in hardware players or even software), 176400 Hz/24-bit (another DVD-A and live/classical music record format, often downloadable as FLAC from online stores), 192000 Hz/24-bit (highest DVD-A resolution), 384000 Hz/24-bit and in theory even higher (fairly uncommon but offered by some online stores and supported by Blu-ray).

There are some quite obvious differences between resolutions which anti-lossless and anti-high-res fanatics will try to deny. Still, they're there. One possible cause for trying to deny them (other than sheer blind, or more precisely, deaf fanaticism) is having lo-fi gear or gear that pretends to have some fidelity but doesn't. Accurate treble/HF imaging is a hallmark of hi-fi gear, though nowadays even cheap Bluetooth speakers may have fairly decent treble/HF performance. Most of the physical differences are about treble and high-frequency accuracy/imaging. However, as it turns out, treble and high frequencies are more important than many people think.

The treble and HF range is essential for instrument separation and space definition. Lynn Olson described his first impression from hearing a CD in the 1980s: "The quiet passages were dead silent... actually, like a switch turned off... but any sensation of space, of stereophonic dimension, and of acoustic presence was totally absent. The start-stop reverberation sounded as flat as a paper Moon and just as fake." This is because CDs cannot define space properly, as they run out of samples/cycle at high frequencies. The practical detail limit is 44100/8=5512.5 Hz. Being generous might double this range to about 11025 Hz, but anything above that becomes harsh noise that is not very harmonic with the main harmonic (midrange) content.

Because music is harmonic, there are consequences beyond plain flatness. Expression and life are up there in the treble and HF range. Liveliness gets cut/lost when upper harmonics are deformed. Instrument sweetness gets damaged. String sections lose their magic. Cymbals lose their dimension and sparkle. Even bass drums get flattened and lose presence, and drums overall lose their "flavour", their 3D contour. Overall though, the life of music gets damaged, its meaning and expression. Some bits of meaning are simply not there on CDs, but they are preserved on vinyl and in higher-res PCM and DSD records (easy to prove by listening to a 1970s album in every format).

So in a nutshell, the higher the sampling rate, the finer the grain and the more harmonics are accurately represented. Well, not at the zero-crossing, but at louder levels.

What about bit depth? Bit depth is what defines amplitude detail. Part of the relative charm of, say, 96/24 is that 96 KHz makes it sound deeper and preserves more meaning ("sounds mystical", as a composer acquaintance said). But taking the "24" out of 96/24 and turning it into 96/16 rather quickly makes it sound cold and hollow, without the same expression. 24-bit allows a lot more voltage divisions than 16-bit, but it still has the same PCM limitation of losing resolution as volume goes down, albeit not as steeply as 16-bit.

Dynamic Accuracy

One of the odd obsessions of audio engineers over the years has been frequency response at the expense of dynamics. Accurate frequency playback is all fine and necessary, but what about speed? Poor-quality capacitors in the signal path, as an example, slow dynamics down. The waveform loses liveliness and music becomes more anemic and dull. Dynamic response might be more difficult to measure, but it sure can be heard when playback of a record is compared against a live musical instrument. High-frequency response also relies a lot on quick electronic dynamics. Drums and cymbals play much more lifelike on a fast system than on a slow one. Everything sounds more lifelike, lively. The problem though is that low-res PCM tends to make things slow and woolly. This has been somewhat worked around by modern equipment manufacturers by using oversampling, e.g. 4x oversampling means an original CD record is played at 176400 Hz rather than 44100 Hz. Depending on the resampling algorithm used, oversampling can sound anything from crass and tinny to sharper and more concise than the original CD source. It still won't magically add missing treble definition or missing low loudness definition.
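As a rough illustration of what oversampling does to the sample grid, here is a deliberately naive sketch that fills in extra points by straight-line interpolation. Linear interpolation is an assumption made for brevity here; real players typically use more elaborate digital filters, and the function name is invented for this example.

```python
def oversample_linear(samples, factor):
    """Insert `factor - 1` straight-line points between neighbouring samples."""
    if factor < 1:
        raise ValueError("factor must be at least 1")
    if len(samples) < 2:
        return list(samples)
    out = []
    for a, b in zip(samples, samples[1:]):
        for k in range(factor):
            # k = 0 reproduces the original sample; the rest are guesses.
            out.append(a + (b - a) * k / factor)
    out.append(samples[-1])
    return out


if __name__ == "__main__":
    original = [0.0, 1.0, 0.0]
    print(oversample_linear(original, 4))
    print("Output rate for a CD source:", 44100 * 4, "Hz")
```

Every new point is computed from the two original samples on either side of it, so the grid gets denser without any new information about the recording being added.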

In the '70s and '80s this wasn't quite as noticeable for a bunch of reasons... One, DACs were built with more precision, using R2R designs rather than the less accurate and cheap sigma-delta designs fashionable now. So a goodish Kenwood pocket CD player from the '90s, as an example, will still outperform a modern CD player, let alone an abomination like an MP3 player. There's more depth, dimension and meaning in one of those things. Two, in the 1970s and 1980s designers weren't too aware of the importance of components, and the components themselves weren't too good (even electrolytic capacitors have improved a lot, not to mention polypropylene, teflon and silver mica). Electrolytic capacitors in the '80s and stuff like carbon-composite resistors all sounded slow, hissy and ponderous. Three, transducer (headphone and speaker driver) technology has improved a lot. You could hardly get the same quality out of a studio speaker driver that sold for $200 in the '80s that you can get out of a mass-produced $15 driver nowadays. Speakers were slow and not too accurate way back in the '80s. Mass-made speaker and headphone driver quality nowadays is stunning compared to even the best designs from the 1980s. Four, faster opamps mean modern preamps and headphone amplifiers are quicker and more lifelike. However, slower opamps in the 1980s/1990s also meant digital audio wasn't shown to be as harsh as it is, as slower opamps attenuate treble and high frequencies, where most PCM defects are. Midrange detail is also "bigger" on a slower opamp, so opamp colouring compensated for PCM flaws.

So what does this all mean? In a nutshell, "warmth" and "warm playback" refer to dynamic accuracy as well as accurate frequency playback. Transistor amps also are inferior to valve amps here because the real reason valve amps sound warm isn't just even-harmonic distortion, but latency. Transistor amps are slower in treble and they also tend to distort more in treble than valve amps. Oh sure, the distortion can be more or less suppressed with feedback, but valves will still be quicker and thus more lifelike in treble/high frequencies than transistors. It's very easy to notice by playing electric guitar on a valve amp against a transistor amp: Strings have sparkle and dimension, whereas they get a kind of muddy dullness added on a transistor amp. Even if a valve amp adds its own noticeable colour, it's a less harsh colouring than on a transistor amp. And it can be further reduced with high-quality components.

The important conclusion though is this: Warm playback is defined not only by harmonic accuracy, but also dynamic accuracy. A "warm" device is one that is quick and lively, not just one that can accurately play noises at different frequencies more or less within the same straight line of little deviation. Transistor treble also tends to be slow and thus is perceived as "fake" unconsciously, perhaps one of the reasons why people become less sensitive to treble/high frequencies in records.

Zero-Crossing

First of all, go and listen to a DSD album with some really good hi-fi gear. Then listen to anything PCM on the same gear. Then notice the stilted, lifeless, dull transients of PCM. Playing DSD after PCM is like a fresh summer day in the country after a grey, concrete urban autumn full of slush. Why is this happening?

Transients.

We human beings really don't listen to abstract noises as such, we listen to vectors. And our hearing is very sensitive to how a waveform changes: not just its intensity, but which way it's headed from its very birth. Human hearing is extremely sensitive; it resolves down to the movement of an object the size of a hydrogen molecule. In the same way, it can resolve a waveform change the size of a hydrogen molecule. Music really is all about those subtle changes as well as power and dynamics and harmonics.

Odd Audio Buffs' Behaviour

One of the odd consequences of the unnaturalness of CD sound is an obsession with upgrades, tweaking, and so on, all to improve and get at least a slightly different sound out of mostly flat and fakesy CDs. The problem is that this tends to emphasise some frequency ranges over others, and over time devices start getting around CD/MP3 flaws by making them less obvious: darkening the frequency response, using slower opamps, even using aggressive comb and lowpass filters to remove the harshest frequencies (Apple's approach for its DAPs). On the one hand, this does sort of work, camouflaging the harshest bits out of perception; on the other, better formats cannot play well on a darkened pair of speakers, as an example, as the treble/HF range is mostly cut off, so presence, expression and depth are gone.

This situation is obviously a vicious advantage for manufacturers, as they can keep pushing different kinds of sound and digital processing as new fashions. The main problem here is a vicious "democratic" approach to judging format acceptance by the likes of Sony and Philips: "If regular listeners can't make out differences between high-res and low-res PCM, let's leave the formats expensive and hard to source for those willing to shell out kilobucks". Never mind the utterly idiotic situation when a cleaning lady at a cafe gets moved by 192/24 classical records. How they manage that sort of a prejudice is a mystery; perhaps they just want to kill music with a dead format like CD audio and derivatives. Needless to say, everyone ought to make every effort to spread high-res formats in spite of this vicious anti-musical attitude.

Myths And Misconceptions

1. "Higher-res PCM formats' very high frequency resolution matters a lot". E.g. 192 KHz PCM in theory defines slightly less than 96 KHz signal bandwidth. Quite the opposite is the case: the hearing of most people deteriorates with age. Older people cannot even hear much beyond 16 KHz. Now, there were experiments which have proven that humans are also affected by ultrasound (e.g. a forest noise's ultrasonic components affect brain state and give that soothing calm feeling), but in reality a higher-res digital format's frequency resolution matters much less than the finer sample grid. Worse, many amplifiers (and DACs?) are unsettled by very high-frequency harmonics. So it's only prudent to lowpass even 192/24 audio at a certain frequency, say, 30 KHz, to remove harsh harmonics on the analogue playback side. So to restate, it's the finer coordinate grid that matters (more freedom/definition for the waveform) and not high-frequency harmonics as such.

2. "Digital formats and signal transmission are flawless". This is obvious nonsense. Digital encoding and decoding, with a lot of processing (such as quantising) in between, are another kind of distortion that the originally analogue (in most cases) signal has to go through. A microphone's output is an analogue signal. Quantising it to a finite number of digital samples will inevitably make it coarser than the infinite-coordinate original current shape. Besides which, there's a bunch of ways in which even digital signals can be distorted, ironically because of an interruption in the analogue modulation of digital transmission.

3. "Newer is better". Often it isn't. As an example, LDHC, Bluetooth's lossless PCM transmission protocol, only allows 96/24 PCM transmission maximum. Yet Bluetooth headphones and speakers have been getting popular in spite of their technical inferiority. Besides which, one would rather have a nicer proven DAC (like AK4556) handling decoding rather than a colder/harsher design in a Bluetooth speaker.

4. "CD audio has a wider bandwidth than analogue tape". By now it should be obvious it doesn't. Said "22050 Hz" signal bandwidth is a scam, as should be obvious from what was written above. Sometimes fanaticism becomes ridiculous: a fellow in a tape deck group was once complaining about how his tape deck could not go "beyond 23 KHz", unlike a CD. Now, strictly speaking, type I cassette tape would really max out around 16-18 KHz definition, but those were real moving signal KHz, not CDs' fake 1 or 1.5 samples/cycle at the same frequencies. That's more of a "noise bandwidth" than a practical musical record bandwidth.

5. "Digital distortion doesn't exist". Oh yes it does, plenty of it. The big difference is that most analogue distortion is positive, it adds up to the original signal. Digital distortion is negative, it deforms the original signal by not describing it accurately. Original signal is audible under analogue distortion, but you don't know what the original signal was under digital distortion unless you compare it with the original master. Good luck if it's open reel or DSD or high-res PCM; otherwise treble and low-volume details are lost forever. The best way to compare different formats is to listen to real instruments, record them with goodish microphones and then listen to the records.

6. "The same impression is conveyed no matter the format". That really is the problem, it isn't. Lower-resolution formats kill the magic. Meaning and artistic expression get lost with details. There's something in the treble that vanishes in CDs and the like. MP3 files are just plain dead, lifeless, music-less most of the time. The problem is that in many cases it's the electronics that kill definition too; consumer audio devices are just too coarse all too often.

7. And this leads to myth 7. "You need hi-fi gear to even hear the differences in high-res formats". Quite the opposite is true. CDs and even MP3/AAC files sound better on hi-fi speakers/headphones. 96/24 PCM already plays in a livelier way on the cheapest headphones, noticeably better than CDs.

8. "But there is no noise as with vinyl!". Part of the problem with this attitude is that the goal of a record is to transmit as much of the original performance's magic, art as possible. So the real difference between formats is how much of the original performance is transmitted, rather than any noise. Which is what many people don't listen for, even though they ought to.
