Specifications help us to understand how an item works, be it a microphone, preamplifier, signal processor, speaker or even a length of cable. Specifications provide an insight into the quality of design and construction, and guide us as to the suitability of a product for a given purpose. Many specifications list terms that may be unfamiliar to some readers. To understand our specifications, we feel it is important to understand the meaning of the technical terms behind them.
Signal-To-Noise Ratio (SNR)
All electronic devices generate noise or pick it up, either from their own components or from surrounding components in the system. The PC is an incredibly harsh environment for quality audio processing, so the design of an audio card is critical to the quality achieved. SNR is the difference between the maximum signal and the noise floor measured at a device's outputs, and it is rated in decibels (dB). An SNR of 95dB therefore means that the maximum output is 95 decibels above the noise floor. A larger SNR means cleaner sound: quiet passages are less likely to be masked by hiss, so the music keeps more of its overall dynamics and is more pleasurable to listen to. SNR can be quoted as 100dB or -100dB. Both mean the same thing, because the sign only indicates whether the noise is expressed relative to the signal or the other way round. Decibels are logarithmic, so every 6dB increase in SNR halves the noise amplitude relative to the maximum signal, and each extra 1dB lowers it by roughly 11%. Even a few extra dB therefore represent a measurable improvement.
Look For: High Numbers (100dB is better than 90dB)
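For amplitudes such as RMS voltages, SNR in decibels is 20 x log10 of the ratio of the signal amplitude to the noise amplitude. The minimal Python sketch below assumes both levels are given as RMS amplitudes on the same scale. It shows why 6dB corresponds to halving the noise amplitude and what a 1dB step is worth. The function names are illustrative only.

```python
import math


def snr_db(signal_rms: float, noise_rms: float) -> float:
    """SNR in decibels from RMS amplitudes: 20 * log10(signal / noise)."""
    return 20 * math.log10(signal_rms / noise_rms)


def noise_rms_for_snr(signal_rms: float, snr: float) -> float:
    """Inverse of snr_db: the noise RMS that yields the given SNR."""
    return signal_rms / 10 ** (snr / 20)


if __name__ == "__main__":
    full_scale = 1.0
    for target in (90, 95, 100):
        floor = noise_rms_for_snr(full_scale, target)
        print(f"{target} dB SNR -> noise floor at {floor:.2e} of full scale")
    # Halving the noise amplitude gains about 6.02 dB
    print(f"{snr_db(1.0, 0.5):.2f} dB")  # 6.02 dB
    # 1 dB more SNR lowers the noise amplitude by only about 11%
    print(f"{1 - noise_rms_for_snr(1.0, 1.0):.3f}")
```

Going from 90dB to 100dB lowers the noise floor by a factor of about 3.16 (10^(10/20)), not by orders of magnitude.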
Total Harmonic Distortion (THD)
As with noise, a device's circuitry can create harmonics of the original signal and mix them with it. If a device generates or plays a signal at, say, 2kHz, harmonics of that signal appear at 4kHz, 6kHz, 8kHz, 10kHz and so on, at whole-number multiples of the original frequency. Harmonics are not reflections or echoes, but the analogy helps explain how they affect the signal. Just as an echo gets quieter with every repeat, the higher harmonics are typically smaller than the lower ones. It is called harmonic distortion because the harmonics distort the shape of the original waveform, which affects the quality of the sound heard.
The original signal is known as the "Fundamental". To obtain the THD figure, a pure sine wave is fed into the audio device. The amplitude (height) of the fundamental and of each harmonic is then measured at the output, and the combined (root-sum-square) amplitude of the harmonics is divided by the amplitude of the fundamental. THD is shown as a percentage and signifies the distortion as a percentage of the fundamental.
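The THD calculation described above can be sketched in Python. This is a minimal, assumption-laden version: the capture is assumed to hold a whole number of cycles of the test tone, so the fundamental and its harmonics land exactly on DFT bins and no windowing is needed. Real analyzers use windowed FFTs. The helper names are hypothetical.

```python
import cmath
import math


def bin_amplitude(samples, k):
    """Peak amplitude of DFT bin k; a sine of peak A landing exactly on bin k gives A."""
    n = len(samples)
    acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples))
    return 2 * abs(acc) / n


def thd_percent(samples, fundamental_bin, num_harmonics=5):
    """THD in percent: root-sum-square of harmonics 2..num_harmonics+1 over the fundamental."""
    nyquist_bin = len(samples) // 2
    fundamental = bin_amplitude(samples, fundamental_bin)
    harmonics = [
        bin_amplitude(samples, h * fundamental_bin)
        for h in range(2, num_harmonics + 2)
        if h * fundamental_bin < nyquist_bin  # ignore harmonics above Nyquist
    ]
    return 100 * math.sqrt(sum(a * a for a in harmonics)) / fundamental


if __name__ == "__main__":
    fs, n = 48000, 480  # 480 samples = exactly 20 cycles of a 2 kHz tone (bin 20)
    tone = [
        math.sin(2 * math.pi * 2000 * i / fs)
        + 0.01 * math.sin(2 * math.pi * 4000 * i / fs)   # 2nd harmonic at 1% level
        + 0.005 * math.sin(2 * math.pi * 6000 * i / fs)  # 3rd harmonic at 0.5% level
        for i in range(n)
    ]
    print(f"THD = {thd_percent(tone, fundamental_bin=20):.3f} %")  # THD = 1.118 %
```

The harmonics add in power rather than directly, which is why 1% plus 0.5% gives about 1.118% and not 1.5%.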
Total Harmonic Distortion + Noise (THD+N)
Standard practice dictates that when THD figures are specified, the noise generated by the device should also be factored in, since that noise also contributes to the signal's distortion. THD alone is not a "real-world" number, so combining THD with the noise generated gives a more realistic view of the quality of a component or device. THD+N is typically measured by removing the fundamental from the output and measuring everything that remains (harmonics plus noise). This combined figure is shown in percentage terms.
Look For: Low Numbers (0.05% is better than 0.1%)
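The THD+N idea, "remove the fundamental and measure what is left", can be sketched as follows. This is a hedged illustration, not a standard measurement procedure. Instead of an analog notch filter, it subtracts a least-squares fit of the fundamental. It assumes the capture holds a whole number of cycles. It expresses the residual relative to the total RMS level, although some analyzers reference the fundamental instead. The function name is hypothetical.

```python
import math
import random


def thd_plus_n_percent(samples, cycles):
    """THD+N in percent: RMS of everything except the fundamental, relative to total RMS.

    Assumes the capture holds exactly `cycles` periods of the test tone
    (0 < cycles < len(samples) / 2), so projecting onto a cosine and a sine at that
    frequency gives an exact least-squares fit of the fundamental.
    """
    n = len(samples)
    cos_ref = [math.cos(2 * math.pi * cycles * i / n) for i in range(n)]
    sin_ref = [math.sin(2 * math.pi * cycles * i / n) for i in range(n)]
    a = 2 / n * sum(x * c for x, c in zip(samples, cos_ref))
    b = 2 / n * sum(x * s for x, s in zip(samples, sin_ref))
    mean = sum(samples) / n  # any DC offset is removed along with the fundamental
    residual = [x - mean - a * c - b * s for x, c, s in zip(samples, cos_ref, sin_ref)]

    def rms(values):
        return math.sqrt(sum(v * v for v in values) / len(values))

    return 100 * rms(residual) / rms(samples)


if __name__ == "__main__":
    n = 480
    rng = random.Random(0)  # seeded so the "noise" is repeatable
    tone = [math.sin(2 * math.pi * 20 * i / n) + 0.01 * math.sin(2 * math.pi * 40 * i / n)
            for i in range(n)]
    noisy = [x + rng.gauss(0.0, 0.001) for x in tone]
    print(f"harmonic only:     {thd_plus_n_percent(tone, 20):.3f} %")
    print(f"harmonic + noise:  {thd_plus_n_percent(noisy, 20):.3f} %")
```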
Bit-Resolution & Dynamic Range
The number of bits used per sample (the bit resolution) indicates how finely a digitized sound sample can be measured. Think of it as using a ruler that only has centimeter marks and no millimeter marks. If a measurement fell between two of the centimeter marks, you would have to approximate it. Depending on what the measurement was for, this could be disastrous: building an airplane, for example, requires very fine precision in places. So the more marks on a ruler, the more accurately a measurement can be taken. The same principle applies to bit resolution. The more bits there are to measure the original analog source, the more accurate the digital representation.
The formula for bits vs. dynamic range is simple: each bit a signal contains increases the theoretical dynamic range of the system by about 6dB (more precisely 20 x log10(2), or about 6.02dB). An 8-bit system therefore has a theoretical dynamic range of 8 x 6, or 48dB, and a 16-bit signal has a theoretical dynamic range of 96dB. Dynamic range is the difference in dB between the maximum reproducible signal and the minimum reproducible signal in a system. Strictly speaking, dynamic range is not the same as the signal-to-noise ratio. SNR takes into account the actual limitations of the system (almost always present as noise) as well as the maximum signal amplitude the system is capable of producing. No digital system ever reproduces its theoretical dynamic range, because noise masks the lowest-level signals.
For this reason and also due to the different testing methodologies the Dynamic Range and SNR numbers are often very close to each other and manufacturers rarely quote both figures. However, really good designs are capable of producing high 80s and low 90s dB performance on a 16-bit implementation and >100dB for 24-bit implementations.
Look For: High Numbers (90dB is better than 80dB)
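The "about 6dB per bit" rule follows from the ratio between full scale and one quantization step, 2^bits. A short sketch, with a function name that is illustrative only:

```python
import math


def theoretical_dynamic_range_db(bits: int) -> float:
    """Full scale relative to one quantization step: 20*log10(2**bits), ~6.02 dB per bit."""
    return 20 * math.log10(2 ** bits)


if __name__ == "__main__":
    for bits in (8, 16, 24):
        print(f"{bits:2d}-bit: {theoretical_dynamic_range_db(bits):6.1f} dB")
```

This prints roughly 48.2dB, 96.3dB and 144.5dB. A commonly quoted refinement for a full-scale sine wave is 6.02 x bits + 1.76dB (about 98.1dB for 16 bits). Practical converters fall short of either figure because of analog noise.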
Sampling Rate
The sampling rate (or sampling frequency) is the number of times per second that samples are taken and converted into digital form. It is expressed in hertz (Hz), so a rate of 50Hz means 50 samples per second. Sampling rates of 96kHz (96 thousand samples per second) and even 192kHz are now in use. Think of it as flipping the pages of a notebook to make a stick-man, drawn on each page, appear to move: the moving image is made of many static images. In the same way, digital audio is made up of many "snapshots" taken from the original analog source. The more snapshots per second, the higher the frequencies that can be captured and the more faithfully the recording follows the original.
Look For: High Numbers (48kHz is better than 22kHz)
Frequency Response
The maximum frequency a digital system can record or reproduce is half of the sampling rate (the Nyquist frequency). A signal sampled at 25kHz can therefore contain frequencies of at most 12.5kHz. Sampling technology is imperfect, and the filters used to clean up signals before and after sampling affect the "top octave" (the highest frequencies).
The range from 10kHz to 20kHz spans only a single octave, and the information it contains is mainly the high harmonics and "musical noise" aspects of sound. Examples are the effect of rosin on a violin bow, the buzzing of lips on a brass instrument, and the general clarity of the recorded work (to which audiophiles assign words like "detail" and "air"). CD-Audio's sampling rate is 44.1kHz, so its maximum reproducible frequency is 22.05kHz.
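The half-the-sampling-rate limit can be illustrated with a small sketch that computes where a pure tone ends up after ideal sampling with no anti-aliasing filter. Tones above the Nyquist frequency "fold back" to a lower frequency, which is why filters are needed before sampling. The function names are hypothetical.

```python
def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency a given sampling rate can represent."""
    return sample_rate_hz / 2


def alias_frequency_hz(tone_hz: float, sample_rate_hz: float) -> float:
    """Frequency at which a pure tone appears after sampling (folded about Nyquist)."""
    f = tone_hz % sample_rate_hz
    return sample_rate_hz - f if f > sample_rate_hz / 2 else f


if __name__ == "__main__":
    print(nyquist_hz(44_100))                  # 22050.0
    print(alias_frequency_hz(10_000, 44_100))  # 10000: below Nyquist, unchanged
    print(alias_frequency_hz(25_000, 44_100))  # 19100: folds back into the audio band
```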
When CD-Audio was developed it was known that the human ear could not "hear" sound much above 20kHz, so it was believed that there would be no perceptible loss of quality. However, this belief has been challenged by a number of experiments. In these tests, listeners heard both CD-quality recordings and recordings with a much higher frequency response, and they preferred the latter. Although the listeners could not technically "hear" the higher frequency range, it appeared to have a positive effect on their listening experience.
Without a doubt, the quality of high frequency reproduction in any system (analog or digital) has a profound effect on the listener. A lower frequency limit is also typically listed. The lower this number the better, as it means the system can reproduce extremely low bass frequencies that give more impact to powerful sections of music.
Look For: The Widest Range, Lowest to Highest (15Hz - 40kHz is better than 20Hz - 20kHz)
Bit Rate
To cope with today's higher quality audio formats, components have to be able to operate at faster speeds and have more bandwidth. For example, a piece of audio recorded at 24-bit/96kHz occupies more space on a hard disk than a piece recorded at 16-bit/44.1kHz, because it takes more bits to represent each second of the higher-quality recording. If this recording is to be played back smoothly, without skips or jumps, the components have to be rated to support a higher bit rate. Imagine an hourglass in which each grain of sand is a bit, and your component is the thin channel the sand falls through. The wider the channel, the more grains can fall through, so more sand passes from one end to the other in a given time. In audio terms, the higher the bit rate a component can handle, the more detail it can carry and the more room there is for adding effects in real time. Bit rates are measured in bits per second or, for modern technology, in kilobits per second (thousands of bits per second) or megabits per second (millions of bits per second).
Look For: High Numbers (320Kbps is better than 128Kbps)
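For uncompressed (PCM) audio, the bit rate is simply samples per second x bits per sample x channels. The sketch below, using illustrative function names, compares the formats mentioned above and the storage each needs per minute.

```python
def pcm_bit_rate_bps(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Bit rate of uncompressed PCM: samples/s x bits/sample x channels."""
    return sample_rate_hz * bits_per_sample * channels


def megabytes_per_minute(bit_rate_bps: int) -> float:
    """Storage needed for one minute of audio at the given bit rate (1 MB = 10**6 bytes)."""
    return bit_rate_bps * 60 / 8 / 1e6


if __name__ == "__main__":
    formats = {
        "CD, 16-bit/44.1kHz stereo": (44_100, 16, 2),
        "24-bit/96kHz stereo": (96_000, 24, 2),
        "24-bit/96kHz, 6 channels": (96_000, 24, 6),
    }
    for name, params in formats.items():
        rate = pcm_bit_rate_bps(*params)
        print(f"{name}: {rate / 1e6:.3f} Mbps, {megabytes_per_minute(rate):.1f} MB/min")
```

CD audio works out to about 1,411kbps, roughly 11 times the 128kbps of a typical (compressed) MP3. The 6-channel 24-bit/96kHz case gives 13.824Mbps, consistent with the 13.8Mbps figure quoted later for DVD-Audio.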
Analog-To-Digital Converter (ADC)
An electronic device or chip used to convert an analog (waveform style) signal into a digital signal (made up of 1s and 0s). Analog signals take the form of a wave, such as a sine wave. If you take a pen and draw on a sheet of paper a continuous line that bends up and down like a wave or serpent, you will see a graphic depiction of an analog signal. Analog signals are continuous waveforms.
Digital signals, on the other hand, are made up of discrete ones and zeroes. The ones stand for an "on" state and the zeroes for an "off" state. They do not form continuous waves like analog signals, but they are much easier to manipulate and do not degrade with repeated manipulation. For instance, it is very simple to duplicate a digital signal, split it up, invert it and then send it to multiple locations. An analog signal is much more difficult to manipulate in a similar way, and during the process the waveform is altered and noise is added, resulting in an impure signal.
Analog signals are converted to digital signals for a variety of reasons: to improve processing power (the ability to manipulate the signals), to encode analog signals for later playback through a digital medium (such as a CD or DVD), and more. The conversion process entails breaking up the analog waveform into thousands of narrow, individual slices. Each slice records the height of the wave at that point. When the slices are lined up in the correct order, their heights closely track the waveform. Think of a pyramid in Egypt: from a distance the pyramid's edges look like solid, straight lines. However, when you get close to the pyramid you see that the edges are actually made up of hundreds or thousands of individual blocks. The lines up the pyramid are not exactly straight but are actually stepped.
The digital signal derived from an analog waveform operates in the same way. It is stepped like the pyramid but has so many steps that it seems to form a continuous line like the pyramid from a distance.
The purpose of the ADC is to take the waveform (the analog signal) and split it up into the thousands of tiny digital "steps" which simulate the wave. Once the signal is in the digital domain, it can be easily copied and manipulated with no degradation and with enhanced capabilities.
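The two jobs of an ADC can be modelled in a few lines: taking regular "slices" (sampling) and snapping each slice's height to the nearest available step (quantization). This is a minimal sketch assuming an ideal signal in the range -1.0 to 1.0 and a symmetric quantizer. The function name is hypothetical, and real converters add anti-aliasing filters, dither and much more.

```python
import math


def sample_and_quantize(signal, sample_rate_hz, duration_s, bits):
    """Minimal ADC model: sample signal(t) at regular intervals and round to integer codes.

    signal(t) is assumed to lie in -1.0..1.0 (full scale). Codes are signed integers
    in -(2**(bits-1) - 1) .. 2**(bits-1) - 1 (a symmetric mid-tread quantizer).
    """
    max_code = 2 ** (bits - 1) - 1
    n_samples = int(round(sample_rate_hz * duration_s))
    codes = []
    for i in range(n_samples):
        t = i / sample_rate_hz                  # sampling: take a "slice" every 1/fs seconds
        x = max(-1.0, min(1.0, signal(t)))      # clip anything beyond full scale
        codes.append(int(round(x * max_code)))  # quantization: snap to the nearest step
    return codes


if __name__ == "__main__":
    sine_1khz = lambda t: math.sin(2 * math.pi * 1000 * t)
    # 1 ms of a 1 kHz sine, sampled at 8 kHz with a coarse 3-bit quantizer
    print(sample_and_quantize(sine_1khz, 8000, 0.001, bits=3))  # [0, 2, 3, 2, 0, -2, -3, -2]
```

With only 3 bits, the sine is represented by a handful of coarse "pyramid steps". More bits and a higher sampling rate make the steps smaller and more numerous.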
Digital-To-Analog Converter (DAC)
Electronic devices that decode digital data (ones and zeroes) into an analog waveform (electrical signal) that can be amplified and played by loudspeakers.
When an analog signal is recorded onto a digital medium, it is split up into thousands of very thin slices. Each of these slices is given a height and an order, and then the information is digitally stored. When digital signals are played back, the thousands of slices are lined up in the proper order. The digital to analog converter forms a solid, flowing line from the tops of the slices to create a continuous, analog waveform. In the example of the pyramid, if you drew a line up the ends of the blocks, you would get a straight line. Digital encoding works the same way. It would save a triangle like a pyramid as a series of slices or blocks of varying height. When decoded, a line would be drawn from block to block recreating the pyramid shape.
One important factor is having enough individual slices to portray the analog waveform accurately. The digital medium must have a high resolution, just as with a video display: at low resolutions you can see the individual picture elements, but at high resolutions they merge to form a complete, high-quality image. The number of samples (individual slices) depends on the sampling rate, which is given in kilohertz, kHz (44.1kHz being the common sampling rate for audio CDs). The higher the sampling rate, the more individual slices of the signal are created per second. The DAC uses the individual pieces to recreate the original analog signal and thus allows digital music to be played over analog loudspeakers (all speakers and amplifiers operate on an analog level).
The other important factor is quantization, which determines how precisely the height of each sample is stored and then decoded. It is measured in bits (e.g. 8-bit, 16-bit or 24-bit), and the number of bits used is often referred to as the "word length". For each sample or slice of the analog signal, the height must be specified. Quantization works by dividing the maximum signal level into equal steps, which are used to measure the height of each sample, much like the graduation marks on a ruler. The greater the word length (the more bits used), the finer the graduations, since an n-bit word provides 2^n levels. Using a small number of bits is like measuring only in inches, while using a high number of bits is like measuring in millimeters. Using more, smaller units allows more precision and equates to a more accurate signal reproduction.
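The decoding side can be sketched the same way. The sketch turns each integer code back into a level using the step size set by the word length, then "draws a line from block to block" as in the pyramid example. This is a simplified illustration under stated assumptions: a symmetric code range and linear interpolation. Real DACs use a reconstruction (low-pass) filter rather than straight lines, and the function names are hypothetical.

```python
def decode_codes(codes, bits, full_scale_volts=1.0):
    """Turn signed integer codes back into output levels (the "height" of each slice)."""
    max_code = 2 ** (bits - 1) - 1
    step = full_scale_volts / max_code  # size of one quantization step
    return [c * step for c in codes]


def connect_the_dots(levels, points_per_sample):
    """Draw straight lines between successive sample heights (linear interpolation)."""
    if not levels:
        return []
    out = []
    for a, b in zip(levels, levels[1:]):
        for k in range(points_per_sample):
            out.append(a + (b - a) * k / points_per_sample)
    out.append(levels[-1])
    return out


if __name__ == "__main__":
    for bits in (8, 16, 24):
        print(f"{bits}-bit step: {decode_codes([1], bits)[0]:.3e} V for a 1 V full scale")
    levels = decode_codes([0, 2, 3, 2, 0, -2, -3, -2], bits=3)
    print(connect_the_dots(levels, points_per_sample=4))
```

The step size shrinks by a factor of 2 for every extra bit. That is the "finer graduations" the text describes.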
Most digital playback devices (CD, DVD, laserdisc, etc.) include a DAC. Separate DACs are also available that improve sound quality by using higher quality components than those typically found in digital playback devices. Any digital playback device with the appropriate digital output can be connected to an external DAC to improve its audio output quality.
The EAX API was devised to allow games developers to add complex reverb effects to their interactive, in-game 3D audio. The first soundcard to support EAX 1.0 was the original Sound Blaster Live! card, but other audio solutions supporting EAX 1.0 soon appeared. This was by design, as Creative Labs made the API specification public shortly after Sound Blaster Live! was released.
EAX 1.0 introduced the fundamental concept of environmental audio effect presets. Developers writing to Microsoft's DirectSound API used its property set mechanism to gain access to the real-time on-board processing capabilities of the Sound Blaster Live! via the EAX 1.0 API. They would first query the system to see if EAX was available on the audio device. If it was, they were then able to choose from, and switch between the various factory-preset environments, designed to simulate different acoustic spaces such as bathroom, hall and cave.
As 3D audio became more popular, it became clear that there were other aspects of the audio environment that were critical for enhanced game-play. Most importantly this included the need to simulate the effect of a sound being muffled by objects between the player and the source of the sound. With EAX 2.0, Creative set about addressing this need by coming up with the concept of a "listener" object and a number of "source" objects. Sources could be "occluded" or "obstructed" depending on their position relative to the listener and any objects in the game, such as walls, pillars etc., that lay between.
As with EAX 1.0, Creative made the decision to make the EAX 2.0 specification public. At the same time, Creative played a leading role within the Interactive Audio Special Interest Group (IASIG) to help formalize an advanced 3D audio standard, based upon EAX, which became the I3DL2 specification. This specification served a valuable purpose in preventing fragmentation of the delivery architecture for advanced 3D audio, but in practice EAX continued to be the API of choice for developers, and thus the "de facto" standard for PC games.
EAX 3.0 (EAX ADVANCED HD)
With the introduction of the Sound Blaster Audigy series of soundcards, Creative needed to create a way for game developers to utilize the additional power that the Audigy processor offered. With the concept of environmental audio now firmly established with games developers, Creative set about the task of creating the next generation, EAX ADVANCED HD.
The EAX 3.0 API was developed around a completely redesigned environmental reverb engine, more sophisticated than its predecessor thanks to the increased DSP power of the Audigy processor. One of the major criticisms of previous EAX versions was that there was no easy way to transition smoothly from one audio environment to the next. To solve this, EAX 3.0 gave developers access to every one of the reverb engine's parameters. Now, rather than simply switching from one environment to another as the player moved around the game world, developers could "morph" from one effect to another. EAX 3.0 also introduced control over "clustered reflections", which provided a very simple method of rendering highly realistic proximity effects, especially when combined with "environment panning". Environment panning allowed developers to place both the early reflections and the late reverb components of an environment anywhere in 3D space.
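Morphing becomes possible once every reverb parameter is exposed. The sketch below illustrates the idea by blending each parameter from one environment to the next as the player moves. It is hypothetical: the parameter names and values are invented for illustration and are NOT EAX property names, and the text does not say how EAX 3.0 actually interpolates. Linear blending, including of dB values, is simply the most basic choice.

```python
from dataclasses import dataclass, fields


@dataclass
class ReverbParams:
    # Hypothetical, simplified parameter set for illustration; these are NOT EAX property names.
    decay_time_s: float
    reverb_level_db: float
    reflections_level_db: float


def morph(a: ReverbParams, b: ReverbParams, t: float) -> ReverbParams:
    """Blend every parameter from environment a (t=0) to environment b (t=1)."""
    t = max(0.0, min(1.0, t))
    return ReverbParams(**{
        f.name: getattr(a, f.name) + (getattr(b, f.name) - getattr(a, f.name)) * t
        for f in fields(ReverbParams)
    })


if __name__ == "__main__":
    # Invented example values for a small room and a large hall
    room = ReverbParams(decay_time_s=0.8, reverb_level_db=-12.0, reflections_level_db=-6.0)
    hall = ReverbParams(decay_time_s=3.2, reverb_level_db=-4.0, reflections_level_db=-14.0)
    # As the player walks from the room into the hall, t goes from 0 to 1
    for step in range(5):
        print(morph(room, hall, step / 4))
```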
EAX 4.0 (EAX 4.0 ADVANCED HD)
While EAX 3.0 took advantage of the Audigy processor's increased power, there was still more processing available for rendering additional (secondary / tertiary) environments as well as other effects. The EAX 4.0 API provided developers with access to this additional processing power, and Audigy owners were able to download and install an updated driver that supported it. With EAX 4.0, developers could build incredibly sophisticated soundscapes using "Multi-Environment" effects. In addition to reverb effects, EAX 4.0 also provided access to special effects, including distortion, flanger and auto wah.
EAX ADVANCED HD Music Enhancement Tools
EAX ADVANCED HD technologies go beyond gaming to enhance the recording and playback of music and other audio. "Audio Cleanup" is an advanced processing feature that automatically identifies and removes annoying pops and clicks, as well as background hiss, from analog recordings such as cassette tapes and vinyl LPs. "Time Scaling" is another powerful EAX ADVANCED HD technology that lets you vary the length or tempo of recorded audio without changing its pitch. This is extremely useful for adjusting the playback speed of recorded speech, so that rapid dialog can be slowed down and better understood, or slow dialog reviewed more quickly. "Bass Boost" can be used to add depth to less than perfect recordings, or even to compensate for a connected speaker system with limited bass response. These are just a few of the music enhancement features that EAX ADVANCED HD delivers. For more information, please refer to the Reviewer's Guide or Whitepaper for the respective product.
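The text does not describe how Creative's Time Scaling works. The sketch below only shows the general principle behind changing tempo without changing pitch, using plain overlap-add (OLA). Short windowed frames are read from the input at one spacing and written to the output at another, so each frame keeps its original waveform (and therefore pitch) while the total duration changes. The function name and defaults are assumptions. Plain OLA produces audible phasing artifacts, so practical time-scalers align frames (for example WSOLA) or use a phase vocoder.

```python
import math


def time_stretch_ola(samples, speed, frame=1024, synthesis_hop=256):
    """Change duration without resampling: speed 2.0 roughly halves the length, 0.5 doubles it.

    Windowed frames are read every `analysis_hop` samples but written every
    `synthesis_hop` samples, so each frame keeps its original waveform (and pitch).
    """
    analysis_hop = max(1, int(round(synthesis_hop * speed)))
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame) for i in range(frame)]  # Hann
    n_frames = max(0, (len(samples) - frame) // analysis_hop + 1)
    if n_frames == 0:
        return list(samples)  # too short to process; return unchanged
    out_len = (n_frames - 1) * synthesis_hop + frame
    out = [0.0] * out_len
    weight = [0.0] * out_len
    for f in range(n_frames):
        src, dst = f * analysis_hop, f * synthesis_hop
        for i in range(frame):
            out[dst + i] += samples[src + i] * window[i]
            weight[dst + i] += window[i]
    # Divide by the summed window so overlapping frames do not change the level
    return [o / w if w > 1e-9 else 0.0 for o, w in zip(out, weight)]


if __name__ == "__main__":
    fs = 48000
    tone = [math.sin(2 * math.pi * 220 * i / fs) for i in range(fs)]  # 1 s test tone
    faster = time_stretch_ola(tone, speed=1.5)
    slower = time_stretch_ola(tone, speed=0.75)
    print(len(faster) / fs, len(slower) / fs)  # new durations in seconds
```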
A3D
A3D was a proprietary 3D audio API developed by Aureal before Microsoft's DirectX 5 SDK became available. Until DirectX 5 was released, there was no way for a DirectSound3D application to take advantage of dedicated audio hardware. Aureal solved the problem by creating its own alternative API and driver, but in so doing created a new problem: developers now had to choose whether to create 3D audio for just one soundcard. The problem was eventually resolved by Microsoft's DirectX 5 implementation of 3D hardware support, which allowed soundcard manufacturers to develop DS3D drivers for their products that developers could identify and take advantage of.
As for extending beyond basic 3D positional audio, Aureal tried to develop a real-time geometry based system, which they named "Wavetracing". This was an ambitious attempt to "model" the effect of sound paths in real time, but the few titles that used it were not able to demonstrate any effective improvement in the 3D audio experience, and gameplay suffered from the intensive CPU load that Wavetracing incurred.
OpenAL
The Open Audio Library (OpenAL) is a cross-platform, open standard 3D audio Application Programming Interface (API) that unlocks the full potential of Sound Blaster cards. In Windows Vista, it can be used to enable the hardware audio acceleration and real-time effects of the Sound Blaster Audigy and X-Fi ranges, with a complete fallback to host-based audio rendering.
DVD-Audio
Utilizing a technology called MLP (a lossless compression format from the Meridian Audio Group and MLP Ltd. of the UK, licensed by Dolby Laboratories), DVD-Audio discs can store up to six channels of 24-bit/96kHz audio, including a separate low-frequency bass channel. Compare that to two channels of 16-bit/44.1kHz sound from CDs. For real audiophiles, a DVD-Audio disc can also hold two channels of 24-bit/192kHz audio, for stereo recordings that really put CD-Audio to shame.
What does DVD-Audio sound like?
The 24-bit/192kHz or 96kHz linear PCM sound exhibits sharp transient response, with clear reproduction of high frequency instruments such as cymbals and rich, authentic timbre in the mid and low frequencies. Rapid high-frequency passages show excellent definition of individual notes, while stereo imaging presents a precise sound stage with great depth and solidity. Also, multi-channel capability means you can experience the acoustics of the best concert hall or put yourself in the middle of an orchestra. The natural richness of the sound makes the performance come alive, delivering a convincing surround effect that immerses you in audio.
The difference between DVD-Audio and CD-audio
Although the introduction of the Audio-CD brought wide dynamic range, it also cut out the frequencies higher than 20kHz. As discussed previously, there is strong evidence to show that these higher frequencies affect human perception of the overall quality of music.
DVD-Audio extends high-frequency playback to 96kHz (compared to 20kHz for the audio CD) and expands maximum dynamic range to 144dB (compared to the audio CD's 96dB). Maximum playback time is 400 minutes (at audio CD resolution), or more than 74 minutes using 6 channels at 24-bit/96kHz resolution.
The difference between DVD-Audio and DVD-Video
DVD-Video is a digital video format that delivers pictures in amazing detail, thanks to over 500 lines of horizontal resolution, more than twice the resolution of VHS tapes. In addition, with the proper optional components, DVD-Video can deliver CD-quality 2-channel stereo audio as well as Dolby Digital 5.1 surround sound (a compressed audio format, and therefore of lower quality than CD).
DVD-Audio is a new digital audio format, which will do for music what DVD-Video did for movies. It combines greater bit-depth, higher sampling frequencies, massive data storage capabilities, innovative multi-bit D/A converters, improved circuit design and improved electronic components to produce the greatest source of home audio that current technology has to offer.
DVD-Audio is different from the audio provided on DVD-Video discs. Most DVD-Video soundtracks use 16-bit to 18-bit samples at a 48kHz sampling rate. They are typically encoded in Dolby Digital format (Dolby Digital 5.1 and Surround EX), which uses a 12X lossy compression technique, or in DTS, which uses a 7.5X compression algorithm. This encoding allows multiple channels of audio in multiple languages to fit alongside a movie on a DVD disc. It is adequate for movie soundtracks (consisting primarily of dialogue, sound effects and background music), but inadequate for high quality audio reproduction. For this reason DVD-Audio incorporates a technique called Meridian Lossless Packing (MLP). MLP allows the audio to be compressed, but unlike lossy formats (where data removed during compression is lost for good), lossless compression removes only redundant information. The decoder can therefore reconstruct the original exactly, and the decompressed audio stream is bit-for-bit identical to the pre-compression version. For more information see sections 3.2.2 & 3.2.3 below.
So DVD-Audio, with its higher sampling frequency, bit depth and superior compression algorithms, takes digital audio to an even higher level. It provides ultra-high performance 2-channel stereo sound for accurate reproduction of precise audio details, contributing to that feeling of "being there" and produces surround sound like you have never heard before.
What kind of content is being released on DVD-Audio?
Virtually every type of music is becoming available on DVD-Audio. Jazz, classical and all other genres that can benefit from outstanding sound quality are being represented, showing off the high resolution and wide dynamic range of 2-channel stereo and the creative possibilities of multi-channel recording. Famous recordings are expected to be re-mixed from multi-track master tapes, bringing classic performances to life with remarkable ambient realism. Interactive features are also being incorporated to create a new style of captivating entertainment. Even very mainstream artists, such as The Corrs and Britney Spears, are now releasing on DVD-Audio in parallel with CD.
Undoubtedly the music industry sees the DVD-Audio format as an exciting opportunity for them to introduce a new format, which will be able to deliver a significantly higher quality than even the highest quality compressed audio format such as WMA or MP3. In addition DVD-Audio content incorporates encryption and copy protection techniques that will not allow copying or conversion to compressed audio formats.
DVD-Audio and copy protection
DVD-Audio discs are encrypted and guarded with digital copy protection. They employ a technique called CPPM, which is different from the CSS II used on DVD-Video discs. CSS II protection for DVD-Video has been circumvented by "DeCSS" software hacks that are easily downloadable from the web.
CPPM, which stands for Content Protection for Pre-recorded Media, has been developed by 4C (comprising IBM, Intel, MEI and Toshiba) and includes watermarking as part of the copy protection scheme. This system allows music tracks to be recognized and the copy protection systems to be triggered. When the content is supplied to an analog input on a recording device, the watermark remains intact in the analog domain but is not noticeable in listening tests. If this analog recording were then written to a digital format, the watermark would assert itself and make the digital copy unplayable.
MLP vs. Compressed Audio (AC3, MP3, WMA)
Unlike MP3, WMA, Dolby Digital (AC3) and DTS, which are all based on perceptual encoding techniques, i.e. lossy compression technology, MLP (Meridian Lossless Packing) compression is a lossless compression technique which is able to produce bit-for-bit accuracy of the original high quality music content.
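The practical meaning of "bit-for-bit accuracy" can be demonstrated with any lossless coder. In the sketch below, Python's general-purpose zlib stands in for the principle only. MLP's own audio-specific algorithm is not shown, and a general-purpose coder compresses audio far less effectively. The "lossy" function is a crude stand-in that simply discards low-order bits. It is NOT how perceptual coders such as MP3 or AC-3 work. The helper names are hypothetical.

```python
import math
import struct
import zlib


def to_pcm16(samples):
    """Pack floats in -1.0..1.0 as little-endian signed 16-bit PCM bytes."""
    ints = [int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


def crude_lossy(pcm, dropped_bits=8):
    """Crude lossy stand-in (NOT perceptual coding): zero the low bits of every sample."""
    ints = struct.unpack(f"<{len(pcm) // 2}h", pcm)
    mask = ~((1 << dropped_bits) - 1)
    return struct.pack(f"<{len(ints)}h", *(v & mask for v in ints))


if __name__ == "__main__":
    tone = [0.5 * math.sin(2 * math.pi * 440 * i / 48000) for i in range(48000)]
    original = to_pcm16(tone)

    packed = zlib.compress(original, 9)  # general-purpose lossless coder, not MLP
    restored = zlib.decompress(packed)
    print("lossless ratio:", round(len(original) / len(packed), 2))
    print("bit-for-bit identical:", restored == original)              # True
    print("lossy copy identical:", crude_lossy(original) == original)  # False
```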
The DVD Forum chose MLP in order to enable the playing time of DVD-Audio discs to be at least 74 minutes per layer at the highest quality. DVD-Audio offers a maximum bit rate of 9.6Mbps. This is higher than the 6.144Mbps possible with DVD-Video, but not high enough for 6 channels of 24-bit/96kHz audio, which would require a bit rate of 13.8Mbps. MLP reduces the bit rate to less than half of this, which increases the playing time from 65 minutes to at least 74 minutes and still allows room for extras such as still images, menus, text and video.
MLP also provides additional features, including the possibility to choose
the quantization in one-bit steps and allow longer playing times without noticeably
DVD-A vs. CD vs. AC-3 vs. MP3 vs. WMA, etc.
There are many digital audio formats, and each has its pros and cons but also a purpose. Some formats compete in the same space (e.g. MP3 vs. WMA or AC-3 vs. DTS), while others emphasize audio fidelity. The table below summarizes the key audio formats.
Super Audio CD (SACD)
SACD is a format developed by Philips and Sony as a competitive solution to DVD-Audio. SACD offers features similar to DVD-Audio, including high-quality, multi-channel playback and copy protection, but it also incorporates a hybrid disc format as standard.
Sony and Philips, being part of the DVD Forum, used the DVD format as a basis for SACD. The SACD specification is contained in the ISO "Scarlet Book" format and uses the same file system, sector size, error correction and modulation as DVD discs. However, on SACD one of the layers is used for CD backward compatibility. This means SACD discs can be played in an SACD player to listen to the high-quality tracks, or in a traditional CD player to listen to the CD-quality tracks.
SACD offers high-quality, multi-channel playback and incorporates copy-protection techniques. The key specifications are as follows:
· Up to 100kHz bandwidth (Frequency Response)
· 120 dB dynamic range
· Full quality for all channels
· Hybrid disc (CD and DVD)
· Watermarking and copy protection
· Text, graphics and video
The copy protection scheme uses a watermarking technique called Pit Signal Processing (PSP), which, it is claimed, cannot be copied by any known piracy process. A visible watermark is also incorporated to allow immediate visual identification of genuine discs.
Direct Stream Digital (DSD)
Whereas the DVD-Audio format uses MLP, Sony and Philips chose DSD as the encoding format for SACD because it avoids using PCM, which is deemed to be an unnecessary intermediate format. DSD is claimed to offer high quality audio with lossless compression and to be more future proof than PCM. DSD's specifications include 100kHz frequency response and 120dB dynamic range on all channels. Philips and Sony recommend that recording studios use DSD and convert to CD audio using a process called Super Bit Mapping Direct.
But independent studies have concluded that DSD (also called 1-bit sigma-delta) suffers from a number of problems that make it unsuitable for archiving and, possibly, distribution. These problems include non-linearity and high frequency noise. DSD is also not easy to edit without converting to PCM. One conclusion is that DSD makes digital to analog conversion easier and cheaper than past processes, but PCM provides a more reliable and accurate representation of the music.
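"1-bit sigma-delta" means each sample is a single bit (+1 or -1) at a very high rate. DSD64 runs at 2.8224MHz, i.e. 64 x 44.1kHz, and the density of +1s tracks the signal level. The first-order modulator below is a minimal illustrative sketch, not the higher-order modulators used in real DSD equipment. Averaging (low-pass filtering) the bit stream recovers the signal, which is why DSD makes digital-to-analog conversion comparatively simple. The function names are hypothetical.

```python
import math


def sigma_delta_1bit(samples):
    """First-order sigma-delta modulator: input in -1.0..1.0, output a stream of +1/-1.

    The integrator accumulates the error between the input and the previous output bit;
    the sign of the integrator becomes the next output bit.
    """
    integrator = 0.0
    previous = 0.0
    bits = []
    for x in samples:
        integrator += x - previous
        previous = 1.0 if integrator >= 0 else -1.0
        bits.append(previous)
    return bits


def moving_average(values, width):
    """Crude low-pass filter plus decimation: mean of each block of `width` values."""
    return [sum(values[i:i + width]) / width for i in range(0, len(values) - width + 1, width)]


if __name__ == "__main__":
    n = 6400  # one cycle of a slow sine, heavily oversampled
    signal = [0.5 * math.sin(2 * math.pi * i / n) for i in range(n)]
    stream = sigma_delta_1bit(signal)
    print(stream[:16])  # only +1.0 and -1.0 values
    print([round(v, 2) for v in moving_average(stream, 64)][:10])  # tracks the rising sine
```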
The hybrid CD/DVD disc format allows playback on both CD players and SACD players. This is possible by molding the CD-Audio pits on the outside of the otherwise blank DVD substrate (see Figure 1) and using a semi-reflective layer for the DVD metallization, thus allowing the CD-Audio layer to be read by a conventional CD player.
Backward compatibility of SACD on traditional CD players will mitigate consumer fears of format obsolescence. Unfortunately, SACD production is difficult and therefore more costly than normal CD production. Despite this, the DVD-Audio forum has also chosen to adopt a hybrid disc format in future DVD-Audio releases.
SACD vs. DVD-Audio
The unique characteristics of SACD's optical pickup require a special laser, which can only be provided by a custom drive. In comparison, DVD-Audio is built upon the standard DVD disc structure already used for DVD-Video discs. It is compatible with standard DVD-ROM drives, whose installed base in PCs has grown rapidly over the last few years. Applications for DVD-Audio will undoubtedly be developed on the PC platform to complement DVD-Video in offering a very high quality format for surround sound music.
Dolby Pro-Logic
Dolby Pro-Logic was Dolby's first universally adopted technology for implementing a surround sound effect. It cleverly used the traditional analog stereo soundtrack to carry left, right, center and surround channels using a matrix encoding process. When played through a Dolby Pro-Logic decoder, these channels could be heard; otherwise the content played back as normal analog stereo. The surround channel is mono, meaning that the two rear speakers output exactly the same audio. So although a surround effect is attained, it is not fully immersive and compelling.
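The matrix principle can be sketched with the basic passive matrix that Pro-Logic builds on. Center is added equally to both channels, and surround is added in opposite polarity. On decoding, the sum recovers the center and the difference recovers a single, mono surround feed. This is a simplified illustration: the 90-degree phase shifts, surround band-limiting and Pro-Logic's active steering logic (which reduces the crosstalk visible in the left-only example) are all omitted, and the function names are hypothetical.

```python
import math

K = 1 / math.sqrt(2)  # -3 dB mixing coefficient


def matrix_encode(left, right, center, surround):
    """Fold four channels into two (Lt, Rt) for one sample.

    Simplified: real Dolby Surround encoders also band-limit the surround channel
    and apply +/-90 degree phase shifts, which are omitted here.
    """
    lt = left + K * center - K * surround
    rt = right + K * center + K * surround
    return lt, rt


def passive_decode(lt, rt):
    """Passive decoder: sum gives center, difference gives the (mono) surround feed."""
    center = K * (lt + rt)
    surround = K * (rt - lt)  # both rear speakers receive this same signal
    return lt, rt, center, surround


if __name__ == "__main__":
    print(passive_decode(*matrix_encode(0.0, 0.0, 0.0, 1.0)))  # surround-only input
    print(passive_decode(*matrix_encode(1.0, 0.0, 0.0, 0.0)))  # left-only input leaks into C and S
```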
Dolby Pro-Logic II
In response to consumer demand for a more compelling surround experience, Dolby released Pro-Logic II. The key difference is that when playing back Dolby Pro-Logic encoded soundtracks, separate rear left and rear right channels are derived from the mono surround channel. For the first time, a 5-channel surround experience could be created from any stereo source. Neither Dolby Pro-Logic nor Pro-Logic II made use of a discrete LFE (subwoofer) channel, although a bass speaker could be attached using bass-management techniques built into the amplifier and speakers. Nor did they use a discrete center channel. Because Dolby Pro-Logic II, as a matrix technology, falls short of the full discrete Dolby Digital 5.1 experience, Creative has not implemented support for this iteration of Dolby technology in its range of Sound Blaster products. In addition, for PC use Dolby Pro-Logic II is redundant as a technology, having been superseded by technologies such as CMSS (Creative Multi-Speaker Surround). More information on CMSS is available in section 1.2.6.
Dolby Digital 5.1
With the advent of digital media such as DVD, digital cable and digital TV came the possibility of encoding 5.1 discrete channels of audio into a movie or music soundtrack. This format, called Dolby Digital 5.1, uses an encoding format called AC-3 (Audio Coding 3). The channels are laid out as shown in the picture on the right: five discrete full-range channels (front left, center, front right, surround left and surround right), while the ".1" provides a discrete LFE channel. This allows the audio to be cleaner while also being far more powerful, especially for movie effects that require heavy bass, such as explosions. The audio is compressed up to 12X to allow it to fit onto a DVD disc. It is widely accepted that for music playback Dolby Digital 5.1 is quite poor, due to its limited frequency response and dynamic range. However, it has proven itself as the dominant format for multi-channel movie audio.
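The "up to 12X" figure can be checked with simple arithmetic. This sketch assumes a 5.1 source at 48 kHz/16-bit (counting the LFE as a full channel in the uncompressed total) and two AC-3 bit rates commonly used on DVD; the helper names are illustrative.

```python
def pcm_bitrate(channels: int, sample_rate: int, bits: int) -> int:
    """Uncompressed PCM bit rate in bits per second."""
    return channels * sample_rate * bits


def compression_ratio(pcm_bps: int, coded_bps: int) -> float:
    """How many times smaller the coded stream is than the PCM stream."""
    return pcm_bps / coded_bps


# Assumed source: six channels at 48 kHz / 16-bit.
pcm = pcm_bitrate(6, 48_000, 16)   # 4,608,000 bit/s
for ac3_kbps in (384, 448):        # common AC-3 rates on DVD (assumed)
    print(ac3_kbps, round(compression_ratio(pcm, ac3_kbps * 1000), 1))
```

At 384 kbps this gives exactly 12X; at 448 kbps the ratio is closer to 10X.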
Dolby Digital Surround EX
The Dolby Digital 5.1 format is able to provide incredible audio movement in movies, but there is still one shortcoming: audio that moves around the back of the listener, or over the top from rear to front or vice versa, remains a little flat. For this reason, Dolby developed the Surround EX format. It uses the same encoding techniques as Dolby Digital 5.1, but an additional rear center channel is matrix-encoded onto the left and right surround channels. This means that a single rear center speaker (6.1 setup) or a pair of rear center speakers (7.1 setup) can be used, with the rear center channel matrix-decoded from the left and right surround channels.
Choosing a 6.1 or 7.1 setup depends on the size of the room. In a small to medium room, such as a home living room, a 7.1 setup would serve little purpose, as the rear surround effect would be muddled by two speakers outputting the same audio stream. A 7.1 setup is more useful in larger rooms such as theaters, where all listeners should get the best effect no matter where they are seated.
Just like all previous implementations of Dolby encoding, Dolby Digital Surround EX is fully backward and forward compatible. This means that you can play a Dolby Digital 5.1 track on a Dolby Digital Surround EX decoder, or play Dolby Digital Surround EX content on a 5.1 decoder; in the latter case the rear center channel is not reproduced separately, and its content is heard only through the left and right surround speakers.
DTS-ES (Extended Surround) is a multi-channel digital signal format developed to greatly improve the 360-degree surround impression and sense of space in movie audio by adding further surround information, while remaining highly backward compatible with conventional DTS Digital Surround audio. In addition to the 5.1 surround channels (FL, FR, FC, SL, SR and LFE), DTS-ES Extended Surround offers a "Rear" channel for surround playback, for a total of 6.1 channels. The format has been used professionally in movie theaters since 1999 and includes two signal formats with different surround signal recording methods, as described below:
DTS-ES Discrete 6.1
DTS-ES Discrete 6.1 is the newest recording format. With it, all 6.1 channels (including the Rear channel) are recorded independently using a digital discrete system. The main feature of this format is that because the SL, SR and Rear channels are fully independent, the sound can be designed with total freedom and it is possible to achieve a sense that the acoustic images are moving about freely among the background sounds surrounding the listener from 360 degrees. Though maximum performance is achieved when sound tracks recorded with this system are played using a DTS-ES decoder, when played with a conventional DTS decoder the Rear channel signals are automatically down-mixed to the SL and SR channels, so none of the signal components are lost.
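The automatic down-mix of the Rear channel can be sketched as adding a scaled copy of it to SL and SR. The fold-down gain a real DTS decoder uses is not stated here; this sketch assumes -3 dB (1/sqrt(2)), which keeps the Rear channel's total acoustic power the same when it is split across two speakers. The function name is hypothetical.

```python
import math

REAR_GAIN = 1 / math.sqrt(2)  # assumed -3 dB fold-down gain; real decoders may differ


def fold_rear(sl, sr, rear, gain=REAR_GAIN):
    """Fold a discrete Rear (surround back) channel into SL and SR, sample by sample."""
    sl_out = [a + gain * b for a, b in zip(sl, rear)]
    sr_out = [a + gain * b for a, b in zip(sr, rear)]
    return sl_out, sr_out


# A Rear-only signal ends up split equally between the two surround speakers.
print(fold_rear([0.0], [0.0], [1.0]))
```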
DTS-ES Matrix 6.1
With this format, the additional Rear (surround back, SB) channel signal is matrix-encoded into the SL and SR channels before recording. Upon playback it is decoded to the SL, SR and Rear channels. The performance of the encoder used at the time of recording can be fully matched using a high precision digital matrix decoder developed by DTS, achieving surround sound more faithful to the producer's sound design aims than conventional 5.1- or 6.1-channel systems. In addition, the bit stream format is 100% compatible with conventional DTS signals, so the effect of the Matrix 6.1 format can be achieved even with 5.1-channel signal sources. Of course, it is also possible to play DTS-ES Matrix 6.1 encoded sources with a DTS 5.1-channel decoder.
When DTS-ES Discrete 6.1 or Matrix 6.1 encoded sources are decoded with a
DTS-ES decoder, the format is automatically detected upon decoding and the
optimum playing mode is selected. However, some Matrix 6.1 sources may be
detected as having a 5.1-channel format, so the DTS-ES Matrix 6.1 mode must
be set manually to play these sources.
The DTS-ES decoder includes another function, the DTS Neo:6 surround mode for 6.1-channel playback of digital PCM and analog signal sources.
DTS Neo:6 Surround
This mode applies conventional 2-channel signals to the high precision digital matrix decoder used for DTS-ES Matrix 6.1 to achieve 6.1-channel surround playback. High precision input signal detection and matrix processing enable full band reproduction (frequency response of 20 Hz to 20 kHz or greater) for all 6.1 channels, and separation between the different channels is improved to the same level as that of a digital discrete system. DTS Neo:6 surround includes two modes for selecting the optimum decoding of the signal source.
DTS Neo:6 Cinema
This mode is optimum for playing movies. Decoding is performed with emphasis on separation performance to achieve the same atmosphere with 2-channel sources as with 6.1-channel sources. This mode is effective for playing sources recorded in conventional surround formats as well, because the in-phase component is assigned mainly to the center channel (C) and the reversed phase component to the surround (SL, SR and Rear channels).
DTS Neo:6 Music
This mode is suited mainly for playing music. The front channel (FL and FR) signals bypass the decoder and are played directly, so there is no loss of sound quality, while the surround signals output from the center (C) and surround (SL, SR and Rear) channels add a natural sense of expansion to the sound field.
CMSS was conceived as a way to allow users to take advantage of the new wave of multi-channel audio on the PC. For a long time before its inception, the likes of Dolby had been promoting their technologies, but these were firmly rooted in the consumer electronics space. For the PC user there was really nothing available to convert standard two-channel audio to multi-channel in a meaningful way, or to enhance the down-mix of multi-channel audio so that it sounded better over headphones or two-speaker systems. CMSS filled that need, and since its introduction in late 1998 it has developed from an interesting feature used by some advanced users into a must-have feature that many users leave permanently enabled, so that they can enjoy their whole music experience in the multi-channel domain. The following sections outline the release timeline for each version/facet of CMSS along with
The original CMSS (now known as CMSS1) shipped as part of the Sound Blaster Live! feature set. Its main features are:
CMSS1 Binaural Virtualization
In headphone playback, Binaural Virtualization of 2-channel or multi-channel sources creates the illusion that the recording is actually being heard over loudspeakers located at the appropriate positions in front of or around the listener in the horizontal plane. This restores the natural acoustical cues at the ears when listening to conventional stereo or multi-channel recordings.
· Reduced sense of "inside-the-head" localization and reduced listening fatigue.
· Discrimination of frontal vs. rear sounds in multi-channel content.
This illusion is achieved via digital models of "head-related transfer functions" (HRTF) measured on human subjects. Since its initial release it has undergone several generations of improvements. For instance, the increased processing capacity of the Audigy chip enabled the implementation of more accurate HRTF models, resulting in a more convincing illusion.
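The core idea of HRTF-based binaural rendering can be sketched as filtering a source with a left-ear and a right-ear impulse response (an HRIR, the time-domain form of an HRTF). This is a minimal sketch assuming hypothetical HRIRs that model only an interaural time and level difference; measured HRIRs, like those behind CMSS, also shape the spectrum. To place several virtual loudspeakers, each channel would be filtered with its own HRIR pair and the results summed per ear.

```python
def convolve(x, h):
    """Direct-form linear convolution of two sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def toy_hrir(delay_samples, gain, length=32):
    """Hypothetical HRIR: one delayed, attenuated impulse (time and level difference only)."""
    h = [0.0] * length
    h[delay_samples] = gain
    return h


def binauralize(mono, hrir_left, hrir_right):
    """Render a mono source for headphones by filtering it with a left/right HRIR pair."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Hypothetical source to the listener's left: the left ear hears it earlier and louder.
left_h = toy_hrir(delay_samples=0, gain=1.0)
right_h = toy_hrir(delay_samples=29, gain=0.5)  # about 0.6 ms at 48 kHz (assumed)
left_ear, right_ear = binauralize([1.0, 0.5], left_h, right_h)
```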
CMSS1 Virtual Surround
When multi-channel sources are played back over two loudspeakers, the Virtual Surround process creates the illusion that the surround channels are actually being played over virtual surround loudspeakers located at the sides and rear of the listener.
· Restores the sense of immersion.
· Restores the discrimination of side and rear sounds vs. frontal sounds.
CMSS1 Virtual Surround technology relies on digital HRTF models (like the
CMSS1 Binaural Virtualization) and uses an acoustic cross-talk cancellation
process in order to accurately control the sound that reaches each ear of
the listener. The illusion is only effective for a single listener located
at the "sweet spot". This technology has also evolved and improved
since its original introduction in Sound Blaster Live!.
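Acoustic cross-talk cancellation can be illustrated with a deliberately simplified model. This sketch assumes each speaker reaches the near ear with gain 1 and the far ear with gain `a`, with no delay or frequency dependence (real systems use HRTF-based filters); under that model the speaker feeds are obtained by inverting a 2x2 matrix. The function names are hypothetical.

```python
def crosstalk_cancel(b_left, b_right, a):
    """Speaker feeds so that each ear receives only its intended binaural signal.

    Toy model (assumption): each speaker reaches the far ear with gain `a`
    and the near ear with gain 1, with no delay or filtering.
    """
    det = 1.0 - a * a
    if det == 0:
        raise ValueError("a must not be +/-1 (the 2x2 system is not invertible)")
    s_left = (b_left - a * b_right) / det
    s_right = (b_right - a * b_left) / det
    return s_left, s_right


def ears(s_left, s_right, a):
    """Acoustic path from the two speakers to the two ears (same toy model)."""
    return s_left + a * s_right, a * s_left + s_right


# Deliver a signal to the left ear only.
feeds = crosstalk_cancel(1.0, 0.0, a=0.4)
print(ears(*feeds, a=0.4))
```

If the listener moves, the real value of `a` no longer matches the one used to compute the feeds and the cancellation degrades, which is one way to see why the illusion is tied to the sweet spot.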
CMSS1 Virtual 3D Spatialization
When playing games over headphones or two loudspeakers, Virtual 3D Spatialization provides the ability to place and move sounds in 3D space around the listener. CMSS1 Virtual 3D Spatialization technology is also implemented in 4-channel playback mode to provide improved sound localization on the sides.
The methods used to achieve the CMSS1 Virtual 3D Spatialization effect are similar to those used in CMSS1 Binaural Virtualization and Virtual Surround. In loudspeaker playback, the spatialization effect is robust for a single listener located at the "sweet spot".
The Virtual 3D Spatialization technology has also evolved with Creative's
successive sound card product generations.
With the launch of the Sound Blaster Audigy card (and chip) there was enough headroom to allow significant improvements in the existing CMSS1 features (as mentioned above). One additional algorithm was made available at this time:
The Multi-channel Upmix algorithm provides an immersive listening experience when playing two-channel sources over multi-channel surround sound systems, while preserving the integrity of the original stereo image. The front-rear balance can be adjusted via the Stereo Focus control.
· Adds a sense of immersion.
· More uniform coverage of the listening area.
The CMSS1 Multi-channel Upmix algorithm extracts ambience sounds from the two-channel recording and then applies additional digital processing before feeding them to the surround channels. This allows us to avoid spatial artifacts, such as spatial alterations of the original stereo image. This technology was improved in Audigy 2.
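The details of the CMSS1 upmix processing are not given here, so the following is a generic, hypothetical illustration of ambience extraction and not Creative's algorithm. A classic passive approach uses the L - R difference signal, which cancels anything panned dead center and so carries mostly ambience and wide-panned content; delaying it before the rear speakers is a commonly used way to keep the front image anchored. The 480-sample delay (10 ms at 48 kHz) and 0.5 gain are assumptions.

```python
def upmix_ambience(left, right, delay=480, gain=0.5):
    """Naive stereo-to-quad upmix (hypothetical, not the CMSS algorithm).

    Front speakers get the original stereo pair untouched. Both rear speakers
    get the scaled L - R difference signal, delayed by `delay` samples.
    """
    n = len(left)
    diff = [gain * (l - r) for l, r in zip(left, right)]
    # Shift the difference signal later in time, keeping the output length n.
    delayed = [0.0] * min(delay, n) + diff[: max(n - delay, 0)]
    return list(left), list(right), delayed, list(delayed)


# A left-only click reaches the rear speakers two samples later at half level.
_, _, rear_l, rear_r = upmix_ambience([1.0, 0.0, 0.0, 0.0], [0.0] * 4, delay=2)
print(rear_l)
```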
The Sound Blaster Audigy 2 launched with 6.1 audio. This called for a new algorithm to refine the sound of audio upmixed to 6.1 (especially 5.1 content) and also to enhance the headphone experience. In addition, because of the further enhancements made to the original CMSS1 algorithms, the name changed to CMSS 3D. The new algorithm was:
CMSS2 Virtual Acoustics
Virtual Acoustics simulates the reflections of sounds off the walls of a virtual listening room and can be used in any playback format.
· In headphone playback: more realistic listening experience, increased spaciousness
· In multi-channel loudspeaker playback: benefits similar to those of the Multi-channel upmix process (including adjustable Stereo Focus).
The headphone listening experience was further enhanced in 2003 for the Audigy
2 ZS (by modifying the virtual room parameters).
The launch of the Sound Blaster Audigy 2 ZS saw a further addition to the CMSS 3D offering:
In addition to CMSS1 and CMSS2, the Stereo Surround mode provides a third method for taking advantage of a multi-channel loudspeaker system when playing 2-channel sources. In this mode, no processing is applied to the incoming audio signal: it is simply routed simultaneously to all the available loudspeakers (with Stereo Focus control to adjust the front-rear balance). This allows users to simply turn their multi-channel speaker systems into a replicator for the standard stereo content.
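Stereo Surround routing can be sketched as sending the same stereo pair to the front and rear speakers, with a front/rear balance. How the Stereo Focus control maps to gains is not described here; this sketch assumes a hypothetical 0..1 `focus` value and a constant-power law, so apart from that level scaling the audio is passed through unprocessed.

```python
import math


def stereo_surround(left, right, focus=0.5):
    """Route a stereo pair to front and rear speakers with a front/rear balance.

    `focus` is a hypothetical 0..1 control (1.0 = front only, 0.0 = rear only).
    A constant-power law is assumed so overall loudness stays roughly constant.
    """
    if not 0.0 <= focus <= 1.0:
        raise ValueError("focus must be between 0 and 1")
    g_front = math.sqrt(focus)
    g_rear = math.sqrt(1.0 - focus)
    return {
        "front_left": [g_front * x for x in left],
        "front_right": [g_front * x for x in right],
        "rear_left": [g_rear * x for x in left],
        "rear_right": [g_rear * x for x in right],
    }


print(stereo_surround([1.0, -0.5], [0.5, 0.25], focus=0.75))
```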
Release Timeline vs. Product Releases
The releases of the CMSS technologies as compared to the product timelines
can be seen as follows:
CMSS/CMSS 3D Mode Selection
For Creative sound cards, the appropriate type of CMSS 3D enhancement is automatically selected according to the audio source format, the playback format and the CMSS 3D mode selected by the user. This is illustrated in the following table:
THX began life more than 15 years ago as a quality assurance and certification program for cinemas. It meant that for the first time customers could be assured of the same excellent audio experience when visiting different cinemas.
In 1985 Hollywood discovered that more people were watching movies on video than in cinemas. That was certainly the low point for movie theaters across the globe as the VCR became king, but an equilibrium has since formed between cinema and home movie watching. Since 1985 THX Ltd has, with acute foresight, been investigating ways to provide the same audio quality assurance in the home as it does for cinemas.
For home theater, THX has become a seal of approval that identifies home
theater components, such as sound processors and speaker systems, that comply
with the performance parameters of Lucasfilm THX and/or feature its proprietary
playback processes. These processes are applied to the signal after Dolby
Pro Logic or Dolby Digital decoding, and are intended to produce a listening
experience more like that of the dubbing theaters where movie soundtracks are
mixed.
Compressed Audio Technologies
Audio compression significantly reduces the size of a digital audio file compared with its original form. To demonstrate this concept, let us look at a DVD-Movie disc. A DVD-Movie soundtrack contains six mono audio channels (5.1), the equivalent of three stereo tracks. A stereo .WAV file recorded at 16-bit/48kHz consumes approximately 11.5MB of hard-disk space per minute. So if we wanted 90 minutes of three stereo tracks, we would require approximately 3.1GB of storage space. By comparison, the equivalent AC3 encoded material (at 448kbps) requires only approximately 302MB of space.
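The storage figures can be worked through directly. This sketch assumes raw 16-bit/48kHz PCM with no file header, decimal units (1 MB = 1,000,000 bytes) and an AC-3 rate of 448 kbps for the compressed case.

```python
def pcm_bytes(seconds, sample_rate=48_000, bits=16, channels=2):
    """Size of raw PCM audio in bytes (no file header)."""
    return seconds * sample_rate * (bits // 8) * channels


MINUTE = 60
per_minute = pcm_bytes(MINUTE)                # stereo, one minute
three_tracks = 3 * pcm_bytes(90 * MINUTE)     # three stereo tracks, 90 minutes
ac3 = 448_000 // 8 * 90 * MINUTE              # 448 kbps AC-3, 90 minutes (assumed rate)
print(per_minute, three_tracks, ac3)
```

This gives roughly 11.5MB per minute, 3.1GB for the uncompressed soundtrack and about 302MB for the AC-3 version.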
Traditional audio compression methods use so-called "Lossy" techniques. These techniques are very good at reducing the storage load, but poor at reproducing the original quality once decompressed. This is because some data is essentially "thrown away" when the file is compressed. During decompression, algorithms are applied to "reconstruct" the original data as closely as possible. Imagine a Word document: if the document were compressed using a lossy format, the decompression program would have to analyze the data and reassemble it as closely to the original as possible. Inevitably mistakes would occur, and words might be mixed up so that "Fast" might become "Fact" or "Last". In a worst-case scenario, just one wrong "bit" could corrupt the whole document.
For this reason, compression programs that handle actual data files only use
"Lossless" algorithms which, although they compress less, are able to recreate
the original data in its entirety. Audio, unlike data, can lose a tremendous
amount of "information" before it becomes unpleasant to listen to, and new
techniques such as "Perceptual Coding" allow data to be removed from the
original material in a very intelligent fashion that limits the degradation
of the final audio.
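The difference between lossless and lossy can be shown in a few lines. The lossless half uses Python's standard `zlib` module (DEFLATE), which always restores the input exactly. The lossy half is a crude, hypothetical stand-in that simply discards the low bits of each 16-bit sample; real perceptual coders decide what to discard far more intelligently, but the discarded detail is likewise unrecoverable.

```python
import zlib


def lossless_roundtrip(data: bytes) -> bytes:
    """Compress and decompress with zlib (DEFLATE), a lossless algorithm."""
    return zlib.decompress(zlib.compress(data))


def lossy_requantize(samples, drop_bits=8):
    """Crude illustrative 'lossy' step: zero the low bits of each integer sample.

    The discarded detail cannot be recovered on decoding.
    """
    mask = ~((1 << drop_bits) - 1)
    return [s & mask for s in samples]


text = b"Fast Last Fact " * 100
print(lossless_roundtrip(text) == text)
print(lossy_requantize([1000, -3, 12345]))
```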
In many ways human hearing is incredible; however, it does have its limits. For instance, dogs can hear a much higher range of frequencies than humans can, which is why dogs react to a dog whistle when we cannot even hear it! However, this is not the only area where humans have "hearing issues". Another example is to imagine two people speaking. Person "A" is shouting whereas person "B" is whispering. When person "A" is not talking, say in between breaths, you would be able to hear person "B". However, when both are talking at the same time you would not be able to hear person "B" at all. In this scenario perceptual coding would work out the times when both sources were active and simply remove the obscured source. The principle is "if you cannot hear it then you don't need it". By removing this data from the source, the eventual storage requirement is much smaller.
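The shouting/whispering example can be turned into a toy masking rule. This is a hypothetical simplification: real perceptual models compute frequency-dependent masking thresholds within critical bands, whereas this sketch only compares overall block levels against an assumed 20 dB margin.

```python
import math


def level_db(frame):
    """RMS level of a block of samples, in dB relative to full scale (1.0)."""
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")


def keep_quiet_source(loud_frame, quiet_frame, margin_db=20.0):
    """Toy masking rule: drop the quiet source while the loud one is more
    than `margin_db` louder (hypothetical threshold)."""
    return level_db(loud_frame) - level_db(quiet_frame) < margin_db


shout = [0.5, -0.5] * 10       # person "A", about -6 dBFS
whisper = [0.005, -0.005] * 10  # person "B", about -46 dBFS
print(keep_quiet_source(shout, whisper))       # both talking: whisper is dropped
print(keep_quiet_source([0.0] * 20, whisper))  # "A" between breaths: whisper kept
```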
Let's now go back to the dog whistle. By removing the frequencies above the human hearing range (around 20kHz - 22kHz is the upper limit), the compression algorithm again saves a significant amount of space. In addition, to preserve quality as much as possible, the frequencies at the edges of our hearing range (both low and high) are compressed more than those in the middle. Variable compression algorithms can also be used to compress less aggressively when there is more happening and more aggressively when there is less happening.
Using a combination of these techniques, along with others not discussed above, perceptual coding has allowed formats like WMA and MP3 to offer vastly improved listening experiences compared with their original versions, bringing them close to CD quality while reducing storage requirements significantly. However, it is generally accepted that even with these improvements, the playback quality of compressed audio technologies cannot compete with that of DVD-Audio, so they will be seen as portable formats rather than quality home music playback formats.
MIDI stands for Musical Instrument Digital Interface. The MIDI protocol was developed to allow digital instruments to interface with other MIDI devices such as sound modules. MIDI is so effective because it requires very little bandwidth to operate: the only data sent over the cable is "trigger" information. Devices in a MIDI chain are addressed by MIDI channel (1 to 16), so a keyboard may transmit on channel 1 and a sound module may be set to respond on channel 1. Using suitable software, a user would set up the keyboard to trigger a sound in the module. The sound itself never travels over the cable; when triggered, it is played through speakers connected to the module. The types of MIDI control messages sent between devices include device selection, master volume, channel volume, modulation, effect selection, effect level, etc.
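To make the low-bandwidth point concrete, this sketch builds two standard MIDI 1.0 channel messages (Note On, and a Control Change for controller 7, channel volume) and computes how long one takes to send over a MIDI cable. It assumes the MIDI 1.0 serial rate of 31,250 bits per second with 10 bits per byte (one start bit, eight data bits, one stop bit); the function names are illustrative.

```python
MIDI_BAUD = 31_250   # bits per second on a MIDI 1.0 cable
BITS_PER_BYTE = 10   # 1 start + 8 data + 1 stop bit


def _check(channel, data1, data2):
    if not (1 <= channel <= 16 and 0 <= data1 <= 127 and 0 <= data2 <= 127):
        raise ValueError("out-of-range MIDI value")


def note_on(channel: int, note: int, velocity: int) -> bytes:
    """3-byte Note On message; channel is given as 1-16."""
    _check(channel, note, velocity)
    return bytes([0x90 | (channel - 1), note, velocity])


def control_change(channel: int, controller: int, value: int) -> bytes:
    """3-byte Control Change message; channel is given as 1-16."""
    _check(channel, controller, value)
    return bytes([0xB0 | (channel - 1), controller, value])


def transmit_time_ms(message: bytes) -> float:
    """Time to send a message over the serial MIDI link, in milliseconds."""
    return len(message) * BITS_PER_BYTE / MIDI_BAUD * 1000


msg = note_on(1, 60, 100)          # middle C on channel 1
vol = control_change(1, 7, 100)    # controller 7 = channel volume
print(msg.hex(), vol.hex(), transmit_time_ms(msg))
```

A complete note trigger is only three bytes and takes just under a millisecond to send, which is why MIDI needs so little bandwidth.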
MIDI connectivity on soundcards is provided either by a dual-purpose joystick/MIDI
port, which is a 15-pin D-type connector, or by separate round MIDI In/Out DIN
connectors. Because MIDI messages can be timed precisely, MIDI has also been
widely used to trigger other devices. For example, light shows or even fireworks
displays at music concerts can be timed with precision. However, the key use of
MIDI remains music applications of every kind.
Audio Stream Input/Output (ASIO)
Over the last few years, PC performance, and hard disk storage capacity in particular, have improved significantly. This has meant that users are able to record audio into music creation applications (in many cases in conjunction with MIDI tracks). Problems occurred when simultaneous multi-track recording was required, or simultaneous playback and recording of different tracks. In these cases the soundcard drivers allowed only a single stereo pair to be recorded, while the operating system latency was too great to allow precise audio recording. Latency is essentially the time it takes for sound leaving, say, a person's mouth to travel through the mic to the soundcard, be digitized and be passed to the OS to be written to the hard disk. All this can take anywhere up to 400ms depending on the system. A latency of this magnitude would mean that the audio tracks were out of alignment.
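One component of this delay is buffering: a buffer of N samples at sample rate fs holds N/fs seconds of audio, and the audio cannot move on until the buffer is full. The sketch below computes that per-buffer figure; the buffer sizes used are illustrative assumptions, and a real round trip adds further input and output buffers plus converter delay.

```python
def buffer_latency_ms(buffer_samples: int, sample_rate: int) -> float:
    """Delay contributed by one audio buffer: samples divided by sample rate."""
    return 1000.0 * buffer_samples / sample_rate


for size in (96, 480, 19_200):  # illustrative buffer sizes at 48 kHz
    print(size, buffer_latency_ms(size, 48_000))
```

At 48 kHz a 96-sample buffer corresponds to 2 ms, 480 samples to 10 ms, and a 19,200-sample buffer to 400 ms, which is why small driver buffers matter for recording.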
ASIO (Audio Stream Input Output) is a driver system that interfaces with the recording device, "opens up" all of its I/O and bypasses the OS's normal audio handling processes, allowing extremely low latency audio recording to hard disk. Latency can be reduced to as low as 2ms, although a latency of around 10ms is generally sufficient and is not discernible to the human ear. ASIO was developed by Steinberg, a world-class supplier of music sequencing and recording applications, the most famous of which is their Cubase range. It is a cross-platform protocol and has been adopted by all the key recording hardware and software suppliers.
A further development to the ASIO spec is ASIO 2.0. This new version supports two new features:
1. Direct Monitoring.
2. ASIO Positioning Protocol (Sample Accurate Positioning)
To be ASIO 2.0 compliant, software or hardware must support at least one of the above new features.
When the Direct Monitoring option in the Audio System Setup dialog is activated, the monitored signal does not pass through the ASIO application, such as Cubase. Instead, monitoring is handled by the audio hardware itself: the ASIO driver for the hardware is instructed to send the audio from the monitored input directly back to a specified output. This allows virtually zero latency monitoring.
There are a number of options available in the Direct Monitoring options
window (as seen above). If "Tape Type" monitoring is selected, Direct
Monitoring will be activated for Record Enabled Tracks in Stop and Record
modes. If "Record Enable Type" monitoring is selected, Direct Monitoring
will be activated whenever a Track is Record Enabled in the Inspector.
ASIO Positioning Protocol
On many occasions users wish to pull audio digitally from external devices, perhaps even while simultaneously recording into an ASIO application like Cubase and playing back other tracks recorded in Cubase. At these times it is very important to have all these various sources synchronized. Typically, an external device's clock and an ASIO application's clock would need to be synchronized to allow accurate playback and positioning. The ASIO Positioning Protocol is a technology that ensures that audio in an ASIO application stays in sample-accurate sync with external devices.
This is generally known as "Word Clock Synchronization"; however, the ASIO 2.0 Positioning Protocol in fact specifies that synchronization occurs on two levels:
Level 1: Sample Rate (word clock sync)
If this type of sync isn't established, you may run into problems such as
clicks, pops or distortion.
Level 2: Sample Position (time code sync)
If the two devices do not agree on time positions, inaccuracies in positioning
of the material will occur.
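Level 2 sync means both devices must agree on which sample corresponds to a given time position. As an illustration, converting a timecode position to a sample position can be sketched as below. This assumes non-drop-frame timecode at an integer frame rate; drop-frame 29.97 fps timecode needs an extra correction that is not handled here, and the function name is hypothetical.

```python
def timecode_to_samples(hours, minutes, seconds, frames, fps=25, sample_rate=48_000):
    """Convert non-drop-frame timecode (hh:mm:ss:ff) to an absolute sample position.

    Integer arithmetic avoids floating-point rounding drift.
    """
    if not 0 <= frames < fps:
        raise ValueError("frame number out of range for this frame rate")
    total_frames = (hours * 3600 + minutes * 60 + seconds) * fps + frames
    return total_frames * sample_rate // fps


print(timecode_to_samples(0, 0, 1, 0))   # one second at 48 kHz
print(timecode_to_samples(0, 0, 0, 12))  # frame 12 at 25 fps
```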
USB (Universal Serial Bus) is an interface standard that allows the connection
of up to 127 devices to a PC. These devices can vary from printers to hard disks
to mice, and USB is designed to replace the traditional parallel and serial
ports. As its name implies it is a serial bus, which operates in a master/slave
mode, whereby the computer system contains a host controller and the devices
connect to it. Two slave devices (i.e. devices that do not contain a host
controller) therefore cannot communicate with each other directly. USB was
proposed by a consortium of companies (Compaq, Digital, IBM, Intel, Microsoft,
NEC and Northern Telecom) in 1995. USB is completely "plug and play", i.e. it
detects and configures all devices automatically, and it allows "hot swapping"
of devices. The recent release of USB 2.0 brings three different speed variants
as well as an "On-The-Go" variant, as follows:
USB 1.0 (Known as Low-Speed USB)
This is the original USB implementation and supports only 1.5Mbps transfer. This is sufficient for very low bandwidth applications, such as mice, keyboards, floppy disks and memory card readers, etc. Motherboards with Low-Speed USB implementations cannot be found today (they will have migrated to Full-Speed USB or Hi-Speed USB implementations), although you may find Low-Speed USB devices.
USB 1.1 (Known as Full-Speed USB)
USB 1.1 supports up to 12Mbps transfer as well as being fully backward compatible with USB 1.0. The additional bandwidth allows more varied hardware applications such as printers, scanners, hard disks and even audio products (although the available bandwidth limits the audio resolution to 24-bit/96kHz for stereo, or 24-bit/48kHz for up to 5.1 channels).
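The audio limits quoted for Full-Speed USB follow from bandwidth arithmetic. This sketch assumes the audio travels over a single full-speed isochronous endpoint, whose maximum packet is 1023 bytes per 1 ms frame, and that 24-bit samples are packed as 3 bytes with no padding; it ignores protocol overhead.

```python
FULL_SPEED_ISO_BYTES_PER_FRAME = 1023  # max isochronous packet on full-speed USB
FRAMES_PER_SECOND = 1000               # one USB frame per millisecond


def pcm_bytes_per_second(bits, sample_rate, channels):
    """PCM data rate, assuming samples are packed with no padding."""
    return bits // 8 * sample_rate * channels


def fits_full_speed(bits, sample_rate, channels):
    """True if the PCM stream fits within one full-speed isochronous endpoint."""
    limit = FULL_SPEED_ISO_BYTES_PER_FRAME * FRAMES_PER_SECOND
    return pcm_bytes_per_second(bits, sample_rate, channels) <= limit


for fmt in [(24, 96_000, 2), (24, 48_000, 6), (24, 96_000, 6), (24, 192_000, 2)]:
    print(fmt, fits_full_speed(*fmt))
```

Under these assumptions 24-bit/96kHz stereo and 24-bit/48kHz 5.1 fit, while 24-bit/96kHz 5.1 and 24-bit/192kHz stereo do not.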
USB 2.0 (Known as Hi-Speed USB)
USB 2.0 supports up to 480Mbps (40X that of Full-Speed USB) and is suitable for virtually all applications. For instance there is no limitation as to the quality/resolution of the audio that can be passed over USB 2.0. For example Advanced Resolution DVD-Audio can be fully supported. The only limitation that remains is the lack of bandwidth to fully support audio features such as EAX ADVANCED HD real-time environment effects in games. For this reason the PCI platform will still be preferable for advanced gamers and musicians alike.
USB O-T-G is a new supplement to the USB 2.0 specification that augments the capability of existing mobile devices and other USB devices by adding host functionality for connection directly to other USB devices. This allows the devices (previously only as peripherals) to enable point-to-point connections with other devices. For example, a USB portable music player can transfer songs to a PDA (or mobile handset) and vice versa.
The USB consortium has devised the logos shown directly below. Products that support any of the USB standards may carry these logos if the manufacturer wishes to do so (although it is by no means mandatory).
ASIC (pronounced A-SICK) stands for Application Specific Integrated Circuit. In some cases a piece of hardware must carry out a very specific task so efficiently and quickly that a generic re-programmable chip would be inadequate, and a custom-designed chip has to be used instead. An ASIC is faster because it does not carry the overhead of fetching and decoding stored instructions.
A Field Programmable Gate Array (FPGA) is a programmable logic device (PLD) containing a vast number of logic gates (built from transistors) that allow highly complex problems to be solved. FPGAs can be highly sophisticated, including both programmable logic blocks and programmable interconnects/switches between those blocks. FPGAs can be used in situations where a task needs to be carried out in dedicated hardware but does not warrant an ASIC solution.
The Telephone Answering Device (TAD) connector is an interface for internal connection to a standard voice modem.
The Inter-IC Sound (I2S) bus is a popular serial bus developed by Philips for
carrying digital audio between different integrated circuits.
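An I2S link carries two channels (left and right, selected by the word-select line) with one data slot per channel per sample, so the required bit clock follows directly from the sample rate and slot width. The slot widths below are illustrative assumptions.

```python
def i2s_bit_clock_hz(sample_rate: int, bits_per_slot: int, channels: int = 2) -> int:
    """Serial (bit) clock for an I2S link: one slot per channel per sample period."""
    return sample_rate * bits_per_slot * channels


print(i2s_bit_clock_hz(48_000, 32))  # 32-bit slots at 48 kHz
print(i2s_bit_clock_hz(44_100, 16))  # 16-bit slots at 44.1 kHz
```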
HRTF (Head Related Transfer Function) describes how the listener's head, outer
ears and torso filter a sound on its way to the eardrums, and HRTF processing uses
this to create a simulated surround effect. The shape of the human ear is
optimized to efficiently locate sound sources, and the ears are positioned to
best catch the sound waves reflected up off the chest and shoulders. Many animals
can physically move or rotate their ears, which allows them to locate sound
sources much better than we can; humans do something similar by turning their
head from side to side to locate the source of a sound. HRTF processing recreates
these cues through two speakers. However, it only works perfectly if the listener
is in the sweet spot of the speakers and does not move his head excessively.
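One of the cues an HRTF encodes is the interaural time difference (ITD): a sound from the side reaches the far ear slightly later. A classic approximation is Woodworth's spherical-head formula, ITD = (a/c)(theta + sin theta), sketched below. The head radius and speed of sound are typical assumed values, not measurements.

```python
import math

HEAD_RADIUS_M = 0.0875   # typical adult head radius (assumed)
SPEED_OF_SOUND = 343.0   # m/s in air at around 20 degrees Celsius


def itd_seconds(azimuth_rad: float) -> float:
    """Woodworth's spherical-head estimate of the interaural time difference.

    Intended for azimuths from 0 (straight ahead) to pi/2 (directly to one side).
    """
    return HEAD_RADIUS_M / SPEED_OF_SOUND * (azimuth_rad + math.sin(azimuth_rad))


print(itd_seconds(0.0))          # straight ahead: no time difference
print(itd_seconds(math.pi / 2))  # directly to the side: roughly 0.66 ms
```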
Apple and Texas Instruments jointly created a technology called FireWire, which was later ratified as the IEEE1394 standard. It is also known as i-Link on newer Sony consumer devices, SB1394 on the Audigy and Audigy 2 range of cards, and occasionally as the High Performance Serial Bus (HPSB). IEEE1394 can operate at a peak speed of 400Mb/s, although devices can operate in one of three speed modes: 100, 200 or 400Mb/s. IEEE1394 uses two different transfer methods depending on the device connected. Asynchronous transfer is the traditional computer memory-mapped, load and store interface: data requests are sent to a specific address and an acknowledgement is returned. Isochronous data channels provide guaranteed data transfer at a pre-determined rate. This is especially important for time-critical multimedia data, where just-in-time delivery eliminates the need for costly buffering. IEEE1394 can support up to 63 devices, which can be anything from external hard disks to DV camcorders or even PC-to-PC LAN connections.
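IEEE1394 schedules isochronous traffic in cycles that repeat every 125 microseconds (8,000 per second), so a guaranteed-rate audio stream can be expressed as a fixed payload per cycle. This sketch computes that average payload for a PCM stream; it assumes 3-byte packing for 24-bit samples and ignores packet headers and the padding used by real audio transport formats.

```python
CYCLE_RATE_HZ = 8000  # IEEE 1394 isochronous cycle every 125 microseconds


def iso_bytes_per_cycle(bits, sample_rate, channels):
    """Average audio payload each isochronous cycle must carry (headers ignored)."""
    return bits // 8 * sample_rate * channels / CYCLE_RATE_HZ


print(iso_bytes_per_cycle(24, 96_000, 2))  # 24-bit/96kHz stereo
print(iso_bytes_per_cycle(24, 48_000, 6))  # 24-bit/48kHz 5.1
```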