Primer on PC Audio
This page provides an introduction to various issues associated with PC-based audio technology.
Introduction to PC Based Audio
Today, almost every PC comes equipped with a sound card and associated speakers. The speakers and sound card, in conjunction with the computer operating system, facilitate the playing of audio sounds (music, speech, bird calls, or any other noises) so the sounds can be heard by the end user.
The sounds created by the speakers are actually waves of compressed air. The human ear can discern these waves, provided that their frequency lies within the audible spectrum. By their nature, these sound waves are continuous: at every instant the air pressure can take any value within a range, and different patterns of pressure over time are heard as different sounds.
Before a computer can deal with sounds, the sounds must be converted to a series of ones and zeroes. This process is called digitization. To understand how digitization works, let us imagine a microphone plugged into a computer sound card. The membrane on the mic moves back and forth according to the current air pressure. The microphone circuitry transforms this movement into a continuous, real-valued electric signal. The sound card in turn has a circuit called an analog-to-digital converter, or ADC. The ADC converts this electric signal to a sequence of ones and zeroes. This conversion is done by taking periodic measurements of the electrical signal and converting each measurement to an integer value. The frequency at which the measurements are taken is called the sample rate, and the size of the integer (in bits) is called the sample size. The result of each measurement is called a sample.
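The measuring process described above can be sketched in a few lines of Python. This is only an illustration, not how any particular sound card works: the function names, the 440 Hz test tone and the assumption that the input signal lies between -1.0 and 1.0 are all made up for the example.

```python
import math

def sample_signal(signal, duration_s, sample_rate_hz, sample_size_bits):
    """Take periodic measurements of `signal` and convert each to an integer.

    `signal` is a function of time (in seconds) returning a value in
    [-1.0, 1.0]; it stands in for the continuous voltage from the microphone.
    """
    max_level = 2 ** sample_size_bits - 1           # e.g. 255 for 8 bits
    n_samples = round(duration_s * sample_rate_hz)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz                      # time of the n-th measurement
        value = max(-1.0, min(1.0, signal(t)))      # clip to the assumed input range
        # Map [-1.0, 1.0] onto the integer scale 0..max_level.
        samples.append(round((value + 1.0) / 2.0 * max_level))
    return samples

def tone_440(t):
    """A 440 Hz sine wave, used here as a hypothetical input signal."""
    return math.sin(2 * math.pi * 440 * t)

# One millisecond of sound at 8.0 kHz with an 8-bit sample size.
samples = sample_signal(tone_440, duration_s=0.001, sample_rate_hz=8000,
                        sample_size_bits=8)
print(len(samples), samples)
```

Raising `sample_rate_hz` produces more samples for the same duration, and raising `sample_size_bits` widens the integer scale used for each one.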
The more samples per second, the more accurately the sounds can be reproduced when they are replayed. The sample rate is usually expressed in kilohertz (kHz), that is, in thousands of samples per second. A sample rate of 44.1 kHz is used for most CDs and 48 kHz for DVDs. Some proposed future audio devices can go as high as 96 kHz. A sample rate of 8.0 kHz yields less than one fifth of the samples of a CD-quality recording and is more suitable for dictation-type applications than for recording music. Of course, a higher sample rate means more data must be saved, and the file size and disk space requirements increase accordingly.
The sample size indicates the size of the scale used for each sample measurement and will usually be either 8 bits or 16 bits. A scale from 0 to 255 can be represented in 8 bits, and a scale from 0 to 65,535 in 16 bits. A 16-bit sample size therefore provides much finer granularity and can represent the actual pressure much more accurately than an 8-bit sample size. Again, CD-quality recordings are made with a sample size of 16 bits, and recordings such as dictation are made with a sample size of 8 bits. Some newer audio devices can work with a sample size of 24 bits.
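To make the effect of the sample size concrete, the following sketch counts the available levels for a given number of bits and measures how far a stored value can drift from the original. The helper names and the test value 0.3 are hypothetical, and the value is assumed to be already scaled to the range 0.0 to 1.0.

```python
def levels(sample_size_bits):
    """Number of distinct values an unsigned sample of this size can take."""
    return 2 ** sample_size_bits

def quantize(value, sample_size_bits):
    """Map a value in [0.0, 1.0] to the nearest integer step.

    Returns the step and the value that step represents when played back.
    """
    top = levels(sample_size_bits) - 1
    step = round(value * top)
    return step, step / top

for bits in (8, 16, 24):
    step, approx = quantize(0.3, bits)
    error = abs(approx - 0.3)
    print(f"{bits:2d} bits: {levels(bits):>10,} levels, step {step}, error {error:.2e}")
```

The rounding error shrinks as the sample size grows, which is the "granularity" referred to above.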
The last fundamental characteristic of digital sound is the number of channels. To reflect the spatial properties of sound, it must be measured in more than one place simultaneously. For example, two microphones are generally needed to get stereo sound. Each independent measurement point produces one channel. In the vast majority of cases, the number of channels is either one (mono sound) or two (stereo sound). Newer systems may use up to 6 channels (Dolby 5.1) or even more.
To summarize, each digital sound has three fundamental characteristics: the sample rate, the sample size and the number of channels.
Compression
Even in today's world of high-speed Internet connections and enormous hard drives, audio files are not small. Competing compression methods have been developed to address various requirements, compressed music and compressed speech being two obvious examples. Each compression method has its own advantages and disadvantages. The trade-off most often compared is the size of the resulting file weighed against the quality of the reproduced sound. Since the quality of the reproduced sound is very much a personal judgment, users should keep in mind that their own experience with the various compression methods is the most useful guide in deciding which method to employ.
Compression methods can broadly be categorized as either lossless or lossy. With lossless compression, the sounds are reproduced exactly as they were originally recorded. Because reproduction is exact, lossless compression methods cannot compress a file to the same degree that a lossy method can achieve. In a Windows environment, the majority of compression methods use lossy techniques.
Lossy compression techniques actually eliminate some portion of the digital data stream that represents the sounds. The amount and type of data removed is dependent upon the purpose for which the compression method was developed. For example, a particular compression method may have been developed to allow for the playing of popular music. Such a method would attempt to reproduce the music with a great deal of quality. It may try to only remove the portions of the sounds that are near or beyond the hearing thresholds of most individuals. Using this approach, only a small difference in quality can be detected by most people.
A different compression method may be designed to record spoken words. With this type of compression, a much larger quantity of the sound spectrum may be discarded, because it is irrelevant to the purpose of recording speech. In this scenario, a much greater level of compression can often be achieved.
In addition to the sample rate, the sample size and the number of channels, compressed sound has a characteristic called the bit rate. The bit rate tells how many bits per unit of time are needed to store the sound. It is usually measured in kbps (kilobits per second). For uncompressed sound, the bit rate is always the product of the sample rate, the sample size and the number of channels. For compressed sound it is lower than that product, and the difference can be viewed as the degree of compression.
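The arithmetic can be checked with a short Python sketch. The function names are made up for this example, and two conventions are assumed: one megabyte is taken as 1,048,576 bytes, and 128 kbps is taken as 128,000 bits per second.

```python
def uncompressed_bit_rate(sample_rate_hz, sample_size_bits, channels):
    """Bits per second for uncompressed (PCM) sound."""
    return sample_rate_hz * sample_size_bits * channels

def megabytes_per_hour(bit_rate_bps):
    """Storage needed for one hour, assuming 1 megabyte = 1,048,576 bytes."""
    return bit_rate_bps / 8 * 3600 / 2 ** 20

def compression_ratio(uncompressed_bps, compressed_bps):
    """How many times smaller the compressed stream is."""
    return uncompressed_bps / compressed_bps

cd = uncompressed_bit_rate(44_100, 16, 2)       # CD quality: 44.1 kHz, 16 bit, stereo
print(cd, "bits per second")
print(f"{megabytes_per_hour(cd):.1f} MB per hour")
print(f"{compression_ratio(cd, 128_000):.1f} times smaller at 128 kbps")
```

Under these assumptions, CD-quality PCM comes out at roughly 605 megabytes per hour, in line with the approximate figure in the comparison chart later on this page.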
Lossless Compression
A lossless codec compresses the sound data resulting in a smaller sound file that has exactly the same quality as the original file. Normally, a sound file compressed by a lossless codec has a larger size than that compressed by a lossy algorithm. Lossless codecs differ in their speed and degree of compression, and in the sound formats they support. You may use Total Recorder for lossless compression by encoding to the Windows Media Audio 9 Lossless format or to the Free Lossless Audio Codec (FLAC) format. Please note that it doesn't make any sense to apply lossless compression to a file which was previously encoded using a lossy codec. Such an action most likely will only result in a larger sound file, without any gain in sound quality.
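The defining property of lossless compression, an exact round trip, can be demonstrated with Python's standard `zlib` module. Note that `zlib` is a general-purpose compressor, not an audio codec such as FLAC or Windows Media Audio 9 Lossless; it is used here only because it ships with Python. The test tone is hypothetical data.

```python
import math
import struct
import zlib

# One second of a 440 Hz tone as 16-bit mono PCM at 8.0 kHz (made-up test data).
samples = [round(32767 * math.sin(2 * math.pi * 440 * n / 8000)) for n in range(8000)]
pcm = struct.pack(f"<{len(samples)}h", *samples)

compressed = zlib.compress(pcm, 9)      # 9 = strongest compression level
restored = zlib.decompress(compressed)

print(len(pcm), "bytes before,", len(compressed), "bytes after")
print("identical after round trip:", restored == pcm)
```

A real lossless audio codec is designed around the structure of sound and will usually do better on music than a general-purpose compressor, but the guarantee is the same: the decoded data matches the original bit for bit.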
Storing the Sound
Digital sound may be stored in many forms. Just to name a few:
- An audio file on a computer's hard-drive
- An audio CD
- A MiniDisc
- A digital audio tape (DAT)
In the case of PC based digital audio, saving to a computer's hard-drive is the most important format. The source for PC audio will almost always be an audio file, of which there are numerous types and formats. Many software programs have been developed to play these audio files. The remainder of this page provides users with a basic understanding of some of the audio file formats and some of the associated audio software.
Introduction to Audio Files
Audio files that are recognized and played on a PC come in many different formats. Different file formats are usually associated with different file extensions. For example, just as an MS Word file might be named filename.doc (with the .doc extension), one format of audio file is a wav file and might be named filename.wav. Other audio file formats include mp3 files (filename.mp3), Ogg Vorbis files (filename.ogg), Real Audio files (filename.ra) and many more. All of these formats are digital representations of the original sounds. At some point, the sound in these files has been converted from a continuous "analog" signal, that may have come from a microphone, record player or other similar device, to a digital format. This conversion is done by circuitry usually found on the sound card within a PC. This circuitry is called an analog to digital converter (ADC). Similar circuitry is also used to convert the sounds back, from digital to analog, and is called a digital to analog converter (DAC). The DAC circuit is used to convert a digital file so it may be played back through the speakers on a PC.
As previously discussed, most digital audio files will share several attributes that describe the properties of the recorded sound. These properties include Sample Rate, Sample Size, Number of Channels, Bit Rate and the compression technique used.
WAV Files
WAV files are the most common standard for recording audio in a Windows environment. However, with the demand for various forms of audio increasing exponentially, many competing formats are available. Commonly, WAV files may be identified by their .wav extension. Within the WAV format, quite a number of different compression methods may be used. A number of these compression methods are available as a standard part of the Windows operating system.
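As an illustration, Python's standard `wave` module can write and read uncompressed (PCM) WAV data, and its settings map directly onto the sample rate, sample size and number of channels discussed earlier. The sketch below writes to an in-memory buffer; the function names and the 440 Hz tone are assumptions made for the example.

```python
import io
import math
import struct
import wave

def write_tone(target, sample_rate_hz=44_100, channels=2, seconds=1):
    """Write a 440 Hz tone as 16-bit PCM to `target` (a path or file object)."""
    frames = b"".join(
        # One frame holds one 16-bit sample per channel.
        struct.pack("<h", round(32767 * math.sin(2 * math.pi * 440 * n / sample_rate_hz))) * channels
        for n in range(sample_rate_hz * seconds)
    )
    with wave.open(target, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)                  # sample size in bytes: 2 = 16 bits
        wav.setframerate(sample_rate_hz)
        wav.writeframes(frames)

def describe(source):
    """Read the basic properties back from a PCM WAV file or file object."""
    with wave.open(source, "rb") as wav:
        return {
            "sample_rate_hz": wav.getframerate(),
            "sample_size_bits": wav.getsampwidth() * 8,
            "channels": wav.getnchannels(),
            "seconds": wav.getnframes() / wav.getframerate(),
        }

buffer = io.BytesIO()
write_tone(buffer)
buffer.seek(0)
print(describe(buffer))
```

Passing a file name such as `"tone.wav"` (a hypothetical name) instead of the buffer would write a file that any Windows audio player should be able to open.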
Compression CODECs
In a Windows operating system environment, methods of audio compression can be implemented in special programs called CODECs. There are a number of CODECs that come as a standard part of the various Windows operating systems. CODEC compression programs commonly come in files with an .acm extension. For example, tssoft32.acm is the CODEC file for a compression algorithm called DSP Group TrueSpeech.
If a CODEC is installed properly, Windows will facilitate the use of the CODEC by any audio program running on the PC. The programs may use the CODEC to encode (record) or decode (play) audio files. The compression method that was used to compress an audio file can be identified with information stored within the audio file. Saving this information allows the correct CODEC to be selected later, when the file is decoded for playback.
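In a WAV file, this identifying information is a numeric format tag stored in the file's "fmt " chunk. The sketch below reads that tag from a small PCM file built in memory. The function name is hypothetical, and the three tag values listed are taken from the Windows multimedia headers as best known to the editor; treat the table as illustrative rather than complete.

```python
import io
import struct
import wave

# A few format tags (assumed from the Windows multimedia headers); 1 is plain PCM.
FORMAT_TAGS = {0x0001: "PCM", 0x0022: "DSP Group TrueSpeech", 0x0055: "MPEG Layer 3"}

def read_format_tag(stream):
    """Return the format tag from the 'fmt ' chunk of a RIFF/WAVE stream."""
    riff, _size, kind = struct.unpack("<4sI4s", stream.read(12))
    if riff != b"RIFF" or kind != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    while True:
        header = stream.read(8)
        if len(header) < 8:
            raise ValueError("no 'fmt ' chunk found")
        chunk_id, chunk_size = struct.unpack("<4sI", header)
        if chunk_id == b"fmt ":
            (tag,) = struct.unpack("<H", stream.read(2))
            return tag
        # Skip any other chunk; chunk data is padded to an even length.
        stream.seek(chunk_size + (chunk_size & 1), 1)

# Build a short silent PCM file in memory to have something to inspect.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)
    wav.setframerate(8000)
    wav.writeframes(b"\x00\x00" * 8000)
buffer.seek(0)

tag = read_format_tag(buffer)
print(hex(tag), FORMAT_TAGS.get(tag, "unknown format"))
```

A player performs essentially this lookup before asking Windows for a CODEC that can decode the file.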
A number of the CODECs that come as a standard part of the Windows operating system are included for compatibility reasons. Their inclusion allows for interoperability between the Windows environment and other specialized audio systems. Additionally, a number of more general purpose CODECs are included. A subjective overview of these general purpose CODECs is provided below. Users should remember that these explanations are provided on an "as is" basis. If there is any doubt, the user should review the available compression methods and make their own judgments about the one best suited for their application.
Comparison Chart of Audio Formats
Format | Attributes | Megabytes per hour (approximate) | Characteristics of source sound
--- | --- | --- | ---
DSP TrueSpeech | 8.0kHz, 1 bit, mono | 4 | Low-quality voice recording
Lernout & Hauspie | 8.0kHz, 16 bit, mono | 9 | High-quality voice recording
MP3* | 11.5kHz, 16kBit/s, mono | 4.5 | High-quality voice recording
MP3* | 22.05kHz, 56kBit/s, stereo | 25 | Low or middle quality music
MP3* | 44.1kHz, 128kBit/s, stereo | 56 | Near high-quality music
MP3* | 44.1kHz, 192kBit/s, stereo | 82 | High-quality music
PCM | 44.1kHz, 16 bit, stereo | 605 | High-quality (CD quality) recordings
PCM (High Quality) | 96kHz, 24 bit, stereo (available for Professional and Developer Edition users only) | 1978 | DVD Audio, Super Audio CD recordings
WMA voice | 20kBit/s, 22.05kHz, mono | 9 | High-quality voice recording
WMA lossless | VBR Quality 100, 44 kHz, 2 channel 16 bit | 350 | High-quality recordings
Flac lossless | 96kHz, 24 bit, stereo | 1300 | DVD Audio, Super Audio CD recordings
* For recording in MP3 format, Total Recorder can use the MP3 codecs installed on your system. Please note that different versions of Windows contain different MP3 codecs that support different MP3 formats. Total Recorder can also use other programs (such as DLLs) to create high-quality MP3 files. Please see our mp3 page for details.
Please also note that MP3 files can either have a standard RIFF-WAVE header (such files usually have a .wav extension) or not contain any special headers (these files usually have a .mp3 extension). Most MP3 files do not have a RIFF-WAVE header since MP3 format contains all data required for its decoding.
DSP Group TrueSpeech
The DSP Group TrueSpeech CODEC was written by DSP Group of Santa Clara, California. This compression method was written specifically to address the requirement of recording human speech. It removes a considerable portion of the potential sound spectrum. However, the removed data has little impact on a listener's ability to understand what was said. The algorithm "rounds off" many of the highs and lows associated with the tones found in the original spoken words. Listeners may no longer detect some of the emotions conveyed by these tones, but the actual words will remain quite clear and discernible.
This CODEC is an excellent choice for recording dictation. The CODEC supports a Sample Rate of 8.0 kHz with a Sample Size of 8 bits and mono recording. One hour of TrueSpeech recording can fit into about 4.5 MB of disk space. We suggest the use of TrueSpeech for dictation-type applications where the speaker's emotions are irrelevant.
Lernout & Hauspie
There are several Lernout & Hauspie CODECs included with Windows. All of these CODECs use a low sample rate and were written specifically to address the requirement of recording human speech.
The CODECs have a Sample Rate of 8.0 kHz with a Sample Size of 16 bits. All of the CODECs are mono. The larger Sample Size provides additional tone to these recordings, but also increases the file size. A file recorded with the Lernout & Hauspie SBC 16kbit/s CODEC will require less than 9 MB of disk space for a one-hour recording. We suggest the use of a Lernout & Hauspie CODEC for recording applications that require a level of quality beyond that provided by TrueSpeech. Recording telephone conversations would be a good example of a use for this CODEC.
PCM
PCM is a completely uncompressed sound format. Because it is uncompressed, there is no loss of quality due to the deletion of data. Total Recorder supports PCM files with sample rates from 8.0 kHz up to 48.0 kHz, sample sizes of 8 and 16 bits, and both mono and stereo. The Professional and Developer Editions of Total Recorder can record and play high-quality PCM files (up to 192 kHz, 24-bit and 32-bit float, mono and stereo) if these formats are supported by the sound card installed on the PC.
This format is best used when the size of the file is not an issue. This would be the case if the file were to be quickly moved to CD.
It is also recommended that you record and save files in PCM format if you eventually plan to process them digitally (e.g. mix them or apply noise suppression, equalization, etc.).
MP3 Files
Please follow this link to review our page on MP3 files.
WMA Files
Please follow this link to review our page on WMA files.
Ogg Vorbis Files
Please follow this link to review our page on Ogg Vorbis files.
FLAC Files
Please follow this link to review our page on FLAC files.
Audio CDs
The audio found on CDs played in home and car stereos is not in a file format that is recognized by Windows. Windows Explorer will see .cda files on this type of CD. These .cda files are not audio files and you cannot copy the audio portions of this type of CD with Windows Explorer. These types of CDs are sometimes called CDA CDs or CD Audio CDs or Red Book Audio CDs (the original specification was published in a red book).
Users wanting to create their own CDA CDs must have a CD burner to write the CDs, software to control the CD burner and a source for the audio they want to place onto the CD. Usually, the software to control the CD burner comes packaged with the burner. The audio source is usually a wav file found on the PC's hard drive.
When writing to a CD, the software program that controls the burner will ask the user if they want a music CD or a data CD. If the user chooses a music CD, then the software will convert the source file/files to CDA format and the resulting CD may be played in a home or car stereo. If the user specifies that they want a data CD, then the program will not convert the file/files, but will instead copy them as a wav file or mp3 file or whatever file type was specified to be copied. The data type of CD cannot generally be played on a standard home stereo system (one exception to this is if an mp3 file is copied to CD and played on a player that supports the mp3 format).
Users should note that High Criteria develops a number of software programs that are designed to record audio into wav or mp3 files. These files may then be moved onto a CDA CD using the software that came with the CD burner. High Criteria does not supply the software that actually writes the wav files onto a CD. If you have problems moving your audio files onto a CD, please consult the documentation associated with your CD burner and the software that came with the burner.
Transfer of Audio from LPs and Cassettes to CDs
The transfer of recordings from tapes, records or other media to CD is a two-step process. The first step is to get the recording into a PC file in a format the PC understands. The second step is to copy the PC file to a CD. Our software, either Total Recorder or Dictation Buddy, will address the first step. You will also need a cable with audio plugs at both ends. This cable will go from the headphone/earphone jack of your stereo or cassette player to the mic or line-in jack of your computer's sound card. You should be able to get such a cable for a few dollars from Radio Shack or a similar store. Make sure you get plugs that fit the jacks at both ends!
Once you have the cable, use Total Recorder or Dictation Buddy to record from the stereo/cassette player to the PC. If you have Total Recorder, go to "Recording Source and Parameters" and change the setting from "Software" to "Sound Board". You will then be able to record whatever is played into the mic or line-in jack from the output of the stereo/cassette player.
If you plan to transfer your recordings to audio CD, you need to record in PCM format with CD quality (PCM 44.1kHz,16 bit, stereo). Please note that many modern CD players can also reproduce data CDs with files recorded in MP3 format. For information on types of CDs supported by your CD player, you can refer to the documentation supplied with the equipment.
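Before burning, it can be worth confirming that a recording really has CD-quality parameters. The following sketch does this with Python's standard `wave` module, which reads uncompressed PCM WAV files. The function names are hypothetical, and the in-memory test files exist only to make the example self-contained.

```python
import io
import wave

def is_cd_quality(source):
    """True if a PCM WAV file (path or file object) is 44.1 kHz, 16 bit, stereo."""
    with wave.open(source, "rb") as wav:
        return (wav.getframerate() == 44_100
                and wav.getsampwidth() == 2      # 2 bytes = 16 bits
                and wav.getnchannels() == 2)

def make_wav(sample_rate_hz, channels, sample_size_bytes=2):
    """Build a tiny silent WAV in memory, purely for demonstration."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_size_bytes)
        wav.setframerate(sample_rate_hz)
        wav.writeframes(bytes(sample_size_bytes * channels * 100))
    buffer.seek(0)
    return buffer

print(is_cd_quality(make_wav(44_100, 2)))   # CD-quality parameters
print(is_cd_quality(make_wav(8_000, 1)))    # dictation-style parameters
```

In practice you would pass the path of your own recording instead of a generated buffer.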
If you don't already have a CD burner, then you must get one. There are all sorts of burners on the market from many manufacturers. It would be best to source this from your local computer supplier.
You will get software with the burner that will allow you to copy the .wav files over to the CD. The important issue in this step is to decide the CD's format. The burner software will ask if you want a "data" format CD or a "music" format CD. If you specify "data", then you will get a .wav file on the CD that can only be played by a PC. If you specify "music", then the .wav file will be converted to a format that can be played by a standard home/car CD player. This will be a CDA or CD Audio CD. See the section above on Audio CDs for more information about this type of CD.
Generally, when a "music" CDA CD is created, the software that comes with the burner will create a separate "track" for each .wav file that is copied onto the CD. So, if you want your home stereo to recognize the different songs on a CD as separate tracks, each of the songs must be separated into its own wav file before the files are copied to the CD.
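If a whole LP side or cassette was recorded as one long PCM WAV file, it can be cut into per-song files before burning. Below is a minimal sketch using Python's standard `wave` module; the function names are hypothetical, the split times must be supplied by you, and the tracks are kept in memory only so that the example is self-contained.

```python
import io
import wave

def split_wav(source, boundaries_s):
    """Split a PCM WAV at the given times (in seconds) into in-memory WAV tracks."""
    tracks = []
    with wave.open(source, "rb") as wav:
        params = wav.getparams()
        rate = wav.getframerate()
        # Frame positions where each track starts and ends.
        edges = [0] + [round(t * rate) for t in boundaries_s] + [wav.getnframes()]
        for start, end in zip(edges, edges[1:]):
            wav.setpos(start)
            frames = wav.readframes(end - start)
            out = io.BytesIO()
            with wave.open(out, "wb") as track:
                track.setparams(params)      # same rate, sample size and channels
                track.writeframes(frames)    # header length is corrected on close
            out.seek(0)
            tracks.append(out)
    return tracks

def silent_wav(seconds, sample_rate_hz=8000):
    """Build a silent 16-bit mono WAV in memory, purely for demonstration."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate_hz)
        wav.writeframes(bytes(2 * sample_rate_hz * seconds))
    buffer.seek(0)
    return buffer

tracks = split_wav(silent_wav(3), [1.0, 2.0])
print(len(tracks), "tracks")
```

To write real files, replace the `io.BytesIO()` target with a file name for each track, for example `"track01.wav"` (a hypothetical name).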
Note that almost all commercial recordings are protected by copyright. In the past, most jurisdictions have allowed a user to make a small number of copies of copyrighted material, provided that the copies are created solely for use by the original purchaser. If you have any questions about the legality of creating a copy, we suggest you contact an authority on the copyright regulations in your jurisdiction.