Music and Sound for Interactive Games

Enhancing the power of your software

John W. Ratcliff

John is a graphic artist, designer, and programmer whose credits include computer games such as 688 Attack Sub and SSN-21 Seawolf from Electronic Arts. You can contact him on CompuServe at 70253,3237 or on his BBS at 314-939-0200.


Music and sound effects are the most powerful tools available for you to emotionally impact users. Without them, users don't know how to "feel." Music and sound help users understand context in your software. The shower scene in Psycho would be meaningless without the accompanying music. How would you know when to be scared in a horror film? On the edge of your seat in a suspense movie? Near tears at the end of Old Yeller? What would make you leap out of your seat screaming as a dinosaur rips through a Land Rover or an alien monster comes tearing down a hallway without the accompanying sound and music?

While we know this intuitively, you can still test this for yourself in a literal fashion. Rent a videotape of Terminator II, Conan the Barbarian, Star Wars, Aliens, or Jurassic Park. Every time you hear your heart racing during the film, close your eyes and think very hard about what you are hearing. Listen to how the primary melody of the film's music score is interwoven and allowed to build and evolve during different portions of the film. During a really loud action sequence, turn the volume off. You will feel the tension of the situation vanish as if you had closed the spigot to a faucet of rushing water.

Nothing is done with sound and music in film that we would not want to emulate in software--with one exception. In computer games, we want the soundtrack to be both interactive and contextual to the environment. Adding interactive elements to the soundtrack magnifies its emotional content. A sound track has four major components: dialogue, sound effects, foley, and music. Let's take a brief look at each and examine how you should apply them to your game.

For years, PC developers have had to settle for audio devices that could do little more than beep, warble, and belch. The only emotional reaction we could elicit from the user was a deep desire to find the "turn music off" button. The first generation of sound cards wasn't much of an improvement. Although newer sound cards, such as the Adlib Personal Music System, did allow us to add important interactive audio cues to a game, they had limited emotional range. The fundamental weakness inherent in a cheesy FM-synthesis device allowed our orchestrations to carry about as much emotional content as a grade-schooler's Flutophone.

With the proliferation of CD-ROM, digital sound cards, and wave-table synthesis MIDI devices, the situation has improved dramatically. Now we can use sound and music in ways that contain more emotional content than a Steven Spielberg movie--which, compelling as it is, is a passive experience. We watch the dinosaur attack the Land Rover, but we have no control over the situation. In an interactive game, we are afforded the opportunity to try to get away from the dinosaur. As we attempt to escape the vicious beast, the music and sound effects communicate that emotional distress in direct correlation to our own actions. This results in a heightened sense of awareness that only an interactive environment can bring.

One of the best examples of interactive digital sound in a gaming environment is id Software's DOOM. How many of you have jumped back in your chair when you heard the eerie "growls" and "snorts" of a monster somewhere around a corner? Although you didn't see the monster, simply hearing it precipitated an emotional response so strong that when the beast lurches out and you cut it down in a hail of bullets, you feel a much greater sense of accomplishment. These kinds of subtle audio cues allow you to orchestrate the emotional response in the user. Done properly, this effect will bring the game player much deeper into the environment you are trying to create.

At this time I should sound a note of warning: While good use of sound and music can greatly enhance your software, it is easy to do it wrong. Sound and music that are of poor quality or that don't support the emotional direction of your product are a waste of time, money, and disk space. Bad or unprofessional production values, while they may not destroy a product, will leave the user with an overall poor impression, regardless of how well done the rest of the elements might be.

Here are some suggestions for making the sound and music in your game as effective as possible:

Effective use of sound and music in interactive games makes the difference between "experiencing" and merely "playing" them.

Types of Audio

The following are several ways to implement audio on the PC architecture:

Conclusion

Game developers are fortunate to be able to draw upon the cumulative experience of composers ranging from Mozart, John Williams, and Basil Poledouris, to the Beatles, Pink Floyd, and the Benedictine Monks of Santo Domingo De Silos. We can also leverage the expertise of third-party audio vendors who specialize in the mechanics of programming sound devices at the hardware level. Several systems exist that relieve you of this burden and allow you to focus on the sound and music you want to deliver.

Digital Sound Engineering for Game Development

Rob Wallace

Rob, who is executive producer of Wallace Music & Sound, can be contacted at WallMus@ix.netcom.

Back in 1990, I decided to expand my music services to include sound effects and voice tracks for game developers and publishers. I was experienced in creating analog foley sound, voice tracks, and sound effects for radio, TV, and film production, but I discovered that translating analog-audio engineering skills to the digital domain created some unexpected challenges. Here, I'll present techniques and recommend tools which should enable you to make your waveforms the best they can be.

To create and edit professional sounds for computer games, you'll first need the right equipment for waveform production. This equipment includes a sizable hard drive (750 Mbytes, minimum), off-line storage and shipment devices (QIC 80 drive, Syquest 270, or 2/4/8 gigabyte DAT drive), and a commercial-quality, 16-bit sound card (like the Turtle Beach Rio).

For high-resolution applications such as Redbook Audio, you'll need a commercial stereo compressor/limiter (I use a dbx 166) and a graphic equalizer with a minimum of 12 dB suppression/attenuation. For low-resolution sound effects and voice tracks, I suggest the Alesis 3630 stereo compressor/limiter. You'll also want an analog mixer, such as the Mackie 1202, along with amps, connectors, and speakers.

As for software, you'll need one or more sound-effects libraries. I use the Sound Ideas Libraries, Hollywood Edge Cartoon Trax Library, and my own collection of foley and sound effects acquired over the years. Lastly, you'll want a waveform creator and editor. After working with all of the Macintosh- and PC-based toolkits, I've settled on Sound Forge 3.0, from Sonic Foundry, which reads/writes standard audio file formats, converts one format to another, changes sampling rates and bit depths, and synthesizes MIDI files into .WAV files. It also lets you capture sounds through sound boards or samples from external synthesizers. The program includes all standard audio-control features, including chorus, compress, double, echo, filter, limit, and stretch.
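One of the editing chores mentioned above--changing bit depths--is simple enough to sketch in code. The following is an illustrative Python function (the name `to_8bit` and the pure-list representation are my own, not from any particular toolkit) that converts signed 16-bit samples to the unsigned 8-bit layout that low-resolution .WAV files typically use:

```python
def to_8bit(samples_16):
    """Convert signed 16-bit samples (-32768..32767) to unsigned
    8-bit samples (0..255): drop the low byte, then bias by 128 so
    silence (0) maps to the 8-bit midpoint (128)."""
    return [((s >> 8) + 128) & 0xFF for s in samples_16]

# Full-scale negative, silence, and full-scale positive:
print(to_8bit([-32768, 0, 32767]))  # → [0, 128, 255]
```

Truncating the low byte this way is the crudest possible conversion; commercial editors such as Sound Forge add dithering to mask the resulting quantization noise.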

The noise inherent in aliasing and downsampling waveforms demands effective equalization and signal-compression techniques in order to achieve acceptable results. Nyquist's Theorem states that the highest frequency you can faithfully record is half the sampling rate. At an 11.025-kHz sampling rate, the best high-end response you'll achieve is 5512.5 Hz. Since the dominant frequencies of the human voice fall between about 1000 and 3000 Hz, you'd think that making waveforms of the human voice would be easy. Not so, because you first have to filter out all frequencies above 5512.5 Hz if you're going to make 11.025-kHz waveforms. Also, by its nature, aliasing introduces undesirable spurious frequencies into the waveform. Aliasing is analogous to filming a whirling helicopter blade at 90 frames per second: the blade appears to move in reverse, or has a "chunky" look--quite different from viewing the blade live.
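The folding behavior described above can be computed directly. This small sketch (the function name is mine) shows where a tone above the Nyquist frequency lands after sampling:

```python
def aliased_frequency(f, fs):
    """Apparent frequency of a pure tone at f Hz after sampling at
    fs Hz: frequencies above fs/2 fold back ("alias") below it."""
    f = f % fs
    return f if f <= fs / 2 else fs - f

# A 7000-Hz tone sampled at 11025 Hz folds down to 4025 Hz:
print(aliased_frequency(7000, 11025))  # → 4025
# A 5000-Hz tone is below Nyquist (5512.5 Hz) and survives intact:
print(aliased_frequency(5000, 11025))  # → 5000
```

This is why the article insists on filtering out everything above 5512.5 Hz first: any energy left up there comes back down as an unrelated, audible tone.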

To prevent aliasing and produce the clearest, most-aesthetic low-resolution 8-bit digital waveform, you severely notch out the frequencies above the Nyquist-theorem number. The depth of the notching in dB that you apply depends on the complexity, timbre, and harmonics inherent in the original sound. This equalization must occur prior to digitizing the sound.

Once the equalization is applied, you digitize the sound at the bit depth and frequency rate needed. Listen to the playback carefully. If the results sound hollow or booming, you have applied too many dB of equalization suppression or notched too many frequencies above the Nyquist frequency. Here you begin to learn, in depth, the craft of creating usable waveforms. The guiding principle is: The lower the bit depth and sampling rate, the tougher it is to achieve an acceptable sound. The timbre, complexity, harmonics, original sample quality, and dynamic range of the sound you are digitizing will also influence the end result.
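The article's workflow applies the anti-alias filtering in the analog domain, before the ADC. The software equivalent--filter, then throw away samples--can be sketched as follows. This is a deliberately crude averaging filter for illustration only (the name `downsample` is mine; a real resampler uses a proper anti-alias filter):

```python
def downsample(samples, factor):
    """Naive decimation: average each run of `factor` samples (a
    crude low-pass filter) and keep one value per run. Averaging
    first suppresses the high frequencies that would otherwise
    alias when the rate drops."""
    out = []
    for i in range(0, len(samples) - factor + 1, factor):
        group = samples[i:i + factor]
        out.append(sum(group) / len(group))
    return out

# Halve the sample rate of a short ramp:
print(downsample([0, 2, 4, 6], 2))  # → [1.0, 5.0]
```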

Experimenting with different levels of suppression or attenuation of frequencies will give you a sense of the most usable sample and will help you to make educated judgments when making new samples. The challenge grows as the bit depth and sampling frequency plunge.

A sample waveform can be further manipulated with digital signal processing (DSP). You can apply DSP to the sound using external hardware (such as a Yamaha SPX 900) before the signal reaches the analog-to-digital converter (ADC) on your sound card, or you can apply DSP algorithmically to the digitized waveform after it is created. For game development, I recommend the algorithmic approach: unless you have a high-end professional-audio DSP unit (like the Lexicon), it is easy to introduce noise that becomes intolerable when aliased. This is particularly noticeable when creating pitch-change DSP. The only drawback of algorithmic DSP is that processing time can become lengthy when your samples contain massive amounts of data (800K or greater). With the Lexicon, the DSP is done in real time, so you get to hear and tweak the effect prior to digitizing.
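To give a flavor of the algorithmic approach, here is a minimal delay (echo) effect--one of the simplest DSP operations to apply directly to a digitized waveform. The function name and signature are illustrative, not from any shipping library:

```python
def add_echo(samples, delay, decay):
    """Mix a delayed, attenuated copy of the signal back into
    itself. `delay` is measured in samples, `decay` in 0..1.
    The output is extended by `delay` samples to hold the tail."""
    out = list(samples) + [0.0] * delay
    for i, s in enumerate(samples):
        out[i + delay] += s * decay
    return out

# A single impulse followed by its half-volume echo one sample later:
print(add_echo([1.0, 0.0], 1, 0.5))  # → [1.0, 0.5, 0.0]
```

Reverb, chorusing, and flanging are elaborations of this same delay-and-mix idea, with multiple taps and time-varying delays.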

Delay, reverb, chorusing, flanging, noise gating, distortion, pitch change, and amplitude modulation are common DSP effects that enhance sound, voice, and musical waveforms. In game applications, the waveform must be properly compressed. I have learned that digital samples in games need to be fat and almost always at maximum amplitude; this minimizes the inherent hiss of low-resolution applications. Because the sound-wave data is as loud as it can be, reducing the dynamic range of the effect makes all the possible qualities of the sound available to the user.

Compression is best applied before digitizing. For game applications, the Alesis 3630 produces killer waveforms. You simply set compression to just under the highest ratio it will compress, then ensure that the output volume peaks around 0 dB and that input on the sound card matches the 0 dB level of the compressor output. To do this, get a Shure tone generator (model A15TG), which will produce a constant tone so you can balance and create a gain structure.

Always test your compression by making a sample and looking at the waveform to see that it is fat and peaks, even flattens a bit, at the top. Then listen for digital distortion, which sounds like a crackling pop or a hard-edged, scratchy resonance. To cure this, lower the output volume of the compressor or increase the compression ratio (or both). It is still possible to get some dynamic range in low-resolution applications. A cricket-chirp loop doesn't need to be at maximum amplitude or even compressed very much because it is a subtle ambient effect--but voice tracks need to be fat and maximized.
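The "fat, maximum amplitude" goal has a simple digital counterpart: peak normalization, which scales a recording so its loudest sample just reaches full scale without clipping. A sketch (the function name and 16-bit full-scale default are my choices for illustration):

```python
def normalize(samples, peak=32767):
    """Scale floating-point samples so the loudest one hits 16-bit
    full scale, making the waveform 'fat' without clipping."""
    loudest = max(abs(s) for s in samples)
    if loudest == 0:
        return list(samples)  # pure silence: nothing to scale
    return [int(round(s * peak / loudest)) for s in samples]

# A half-volume recording scaled up to full scale:
print(normalize([0.5, -0.5, 0.0]))  # → [32767, -32767, 0]
```

Note that this only raises the peak; unlike the hardware compressor described above, it does not reduce dynamic range, so quiet passages stay quiet relative to the peak.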

It is now possible to alter pitch and compress or expand the same waveform in time. This means that you can make one actor sound like a different person by changing his delivery characteristics. By applying delay, flange, and chorusing, your own voice can be used to create sounds for horrific space beasts and demons of every description.
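The simplest form of pitch change is resampling, sketched below with linear interpolation (the function name is mine). Note the limitation this makes obvious: naive resampling ties pitch and duration together, which is why the independent time-compression and expansion the paragraph describes requires more sophisticated algorithms:

```python
def resample_pitch(samples, ratio):
    """Naive pitch shift by resampling with linear interpolation.
    ratio > 1 raises pitch (and shortens the sound); ratio < 1
    lowers pitch (and lengthens it)."""
    out = []
    pos = 0.0
    while pos < len(samples) - 1:
        i = int(pos)
        frac = pos - i
        # Interpolate between the two nearest original samples.
        out.append(samples[i] * (1 - frac) + samples[i + 1] * frac)
        pos += ratio
    return out

# Doubling the step rate raises pitch an octave and halves the length:
print(resample_pitch([0.0, 1.0, 2.0, 3.0], 2.0))  # → [0.0, 2.0]
```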

Nothing beats the actual experience of creating sound effects for a game and then hearing them while you run the application. This is the acid test for your sound design. You may have to go back and change or recreate sounds, but when it all comes together and works, the effect is dazzling.


Copyright © 1995, Dr. Dobb's Journal