Music and Sound for Interactive Games

Enhancing the power of your software

John W. Ratcliff

John is a graphic artist, designer, and programmer whose credits include computer games such as 688 Attack Sub and SSN-21 Seawolf from Electronic Arts. You can contact him on CompuServe at 70253,3237 or on his BBS at 314-939-0200.


Music and sound effects are the most powerful tools available for you to emotionally impact users. Without them, users don't know how to "feel." Music and sound help users understand context in your software. The shower scene in Psycho would be meaningless without the accompanying music. How would you know when to be scared in a horror film? On the edge of your seat in a suspense movie? Near tears at the end of Old Yeller? What would make you leap out of your seat screaming as a dinosaur rips through a Land Rover or an alien monster comes tearing down a hallway without the accompanying sound and music?

While we know this intuitively, you can still test this for yourself in a literal fashion. Rent a videotape of Terminator II, Conan the Barbarian, Star Wars, Aliens, or Jurassic Park. Every time you hear your heart racing during the film, close your eyes and think very hard about what you are hearing. Listen to how the primary melody of the film's music score is interwoven and allowed to build and evolve during different portions of the film. During a really loud action sequence, turn the volume off. You will feel the tension of the situation vanish as if you had closed the spigot to a faucet of rushing water.

Nothing is done with sound and music in film that we would not want to emulate in software--with one exception. In computer games, we want the soundtrack to be both interactive and contextual to the environment. Adding interactive elements to the soundtrack magnifies its emotional content. A sound track has four major components: dialogue, sound effects, foley, and music. Let's take a brief look at each and examine how you should apply them to your game.

For years, PC developers have had to settle for audio devices that could do little more than beep, warble, and belch. The only emotional reaction we could elicit from the user was a deep desire to find the "turn music off" button. The first generation of sound cards wasn't much of an improvement. Although newer sound cards, such as the Adlib Personal Music System, did allow us to add important interactive audio cues to a game, they had limited emotional range. The fundamental weakness inherent in a cheesy FM-synthesis device allowed our orchestrations to carry about as much emotional content as a grade-schooler's Flutophone.

With the proliferation of CD-ROM, digital sound cards, and wave-table synthesis MIDI devices, the situation has improved dramatically. Now we can use sound and music in ways that contain more emotional content than a Steven Spielberg movie--which, compelling as it is, is a passive experience. We watch the dinosaur attack the Land Rover, but we have no control over the situation. In an interactive game, we are afforded the opportunity to try to get away from the dinosaur. As we attempt to escape the vicious beast, the music and sound effects communicate that emotional distress in direct correlation to our own actions. This results in a heightened sense of awareness that only an interactive environment can bring.

One of the best examples of interactive digital sound in a gaming environment is id Software's DOOM. How many of you have jumped back in your chair when you heard the eerie "growls" and "snorts" of a monster somewhere around a corner? Although you didn't see the monster, simply hearing it precipitated an emotional response so strong that when the beast lurches out and you cut it down in a hail of bullets, you feel a much greater sense of accomplishment. These kinds of subtle audio cues allow you to orchestrate the emotional response in the user. Done properly, this effect will bring the game player much deeper into the environment you are trying to create.

At this time I should sound a note of warning: While good use of sound and music can greatly enhance your software, it is easy to do it wrong. Sound and music that are of poor quality or that don't support the emotional direction of your product are a waste of time, money, and disk space. Bad or unprofessional production values, while they may not destroy a product, will leave the user with an overall poor impression, regardless of how well done the rest of the elements might be.

Here are some suggestions for making the sound and music in your game as effective as possible:

Effective use of sound and music in interactive games makes the difference between "experiencing" and merely "playing" them.

Types of Audio

The following are several ways to implement audio on the PC architecture:

Conclusion

Game developers are fortunate to be able to draw upon the cumulative experience of composers ranging from Mozart, John Williams, and Basil Poledouris, to the Beatles, Pink Floyd, and the Benedictine Monks of Santo Domingo De Silos. We can also leverage the expertise of third-party audio vendors who specialize in the mechanics of programming sound devices at the hardware level. Several systems exist that relieve you of this burden and allow you to focus on the sound and music you want to deliver.

Digital Sound Engineering for Game Development

Rob Wallace

Rob, who is executive producer of Wallace Music & Sound, can be contacted at WallMus@ix.netcom.

Back in 1990, I decided to expand my music services to include sound effects and voice tracks for game developers and publishers. I was experienced in creating analog foley sound, voice tracks, and sound effects for radio, TV, and film production, but I discovered that translating analog-audio engineering skills to the digital domain created some unexpected challenges. Here, I'll present techniques and recommend tools which should enable you to make your waveforms the best they can be.

To create and edit professional sounds for computer games, you'll first need the right equipment for waveform production. This equipment includes a sizable hard drive (750 Mbytes, minimum), off-line storage and shipment devices (QIC 80 drive, Syquest 270, or 2/4/8 gigabyte DAT drive), and a commercial-quality, 16-bit sound card (like the Turtle Beach Rio).

For high-resolution applications such as Redbook Audio, you'll need a commercial stereo compressor/limiter (I use a dbx 166) and a graphic equalizer with a minimum of 12 dB suppression/attenuation. For low-resolution sound effects and voice tracks, I suggest the Alesis 3630 stereo compressor/limiter. You'll also want an analog mixer, such as the Mackie 1202, along with amps, connectors, and speakers.

As for software, you'll need one or more sound-effects libraries. I use the Sound Ideas Libraries, Hollywood Edge Cartoon Trax Library, and my own collection of foley and sound effects acquired over the years. Lastly, you'll want a waveform creator and editor. After working with all of the Macintosh- and PC-based toolkits, I've settled on Sound Forge 3.0, from Sonic Foundry, which reads/writes standard audio file formats, converts one format to another, changes sampling rates and bit depths, and synthesizes MIDI files into .WAV files. It also lets you capture sounds through sound boards or samples from external synthesizers. The program includes all standard audio-control features, including chorus, compress, double, echo, filter, limit, and stretch.
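One of the editing chores mentioned above--changing bit depths--is simple enough to sketch in code. The following is an illustrative Python function (the name `to_8bit` and the pure-list representation are my own, not from any particular toolkit) that converts signed 16-bit samples to the unsigned 8-bit layout that low-resolution .WAV files typically use:

```python
def to_8bit(samples_16):
    """Convert signed 16-bit samples (-32768..32767) to unsigned
    8-bit samples (0..255): drop the low byte, then bias by 128 so
    silence (0) maps to the 8-bit midpoint (128)."""
    return [((s >> 8) + 128) & 0xFF for s in samples_16]

# Full-scale negative, silence, and full-scale positive:
print(to_8bit([-32768, 0, 32767]))  # → [0, 128, 255]
```

Truncating the low byte this way is the crudest possible conversion; commercial editors such as Sound Forge add dithering to mask the resulting quantization noise.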

The noise inherent in aliasing and downsampling waveforms demands effective equalization and signal-compression techniques in order to achieve acceptable results. Nyquist's Theorem states that the highest frequency you can faithfully record is half the sampling rate. At an 11.025-kHz sampling rate, the best high-end response you'll achieve is 5512.5 Hz. Since the dominant frequencies of the human voice fall between about 1000 and 3000 Hz, you'd think that making waveforms of the human voice would be easy. Not so, because you first have to filter out all frequencies above 5512.5 Hz if you're going to make 11.025-kHz waveforms. Also, by its nature, aliasing introduces undesirable spurious frequencies into the waveform. Aliasing is analogous to filming a whirling helicopter blade at 90 frames per second: the blade appears to move in reverse, or has a "chunky" look--quite different from viewing the blade live.
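The folding behavior described above can be computed directly. This small sketch (the function name is mine) shows where a tone above the Nyquist frequency lands after sampling:

```python
def aliased_frequency(f, fs):
    """Apparent frequency of a pure tone at f Hz after sampling at
    fs Hz: frequencies above fs/2 fold back ("alias") below it."""
    f = f % fs
    return f if f <= fs / 2 else fs - f

# A 7000-Hz tone sampled at 11025 Hz folds down to 4025 Hz:
print(aliased_frequency(7000, 11025))  # → 4025
# A 5000-Hz tone is below Nyquist (5512.5 Hz) and survives intact:
print(aliased_frequency(5000, 11025))  # → 5000
```

This is why the article insists on filtering out everything above 5512.5 Hz first: any energy left up there comes back down as an unrelated, audible tone.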

To prevent aliasing and produce the clearest, most-aesthetic low-resolution 8-bit digital waveform, you severely notch out the frequencies above the Nyquist-theorem number. The depth of the notching in dB that you apply depends on the complexity, timbre, and harmonics inherent in the original sound. This equalization must occur prior to digitizing the sound.

Once the equalization is applied, you digitize the sound at the bit depth and frequency rate needed. Listen to the playback carefully. If the results sound hollow or booming, you have applied too many dB of equalization suppression or notched too many frequencies above the Nyquist frequency. Here you begin to learn, in depth, the craft of creating usable waveforms. The guiding principle is: The lower the bit depth and sampling rate, the tougher it is to achieve an acceptable sound. The timbre, complexity, harmonics, original sample quality, and dynamic range of the sound you are digitizing will also influence the end result.
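The article's workflow applies the anti-alias filtering in the analog domain, before the ADC. The software equivalent--filter, then throw away samples--can be sketched as follows. This is a deliberately crude averaging filter for illustration only (the name `downsample` is mine; a real resampler uses a proper anti-alias filter):

```python
def downsample(samples, factor):
    """Naive decimation: average each run of `factor` samples (a
    crude low-pass filter) and keep one value per run. Averaging
    first suppresses the high frequencies that would otherwise
    alias when the rate drops."""
    out = []
    for i in range(0, len(samples) - factor + 1, factor):
        group = samples[i:i + factor]
        out.append(sum(group) / len(group))
    return out

# Halve the sample rate of a short ramp:
print(downsample([0, 2, 4, 6], 2))  # → [1.0, 5.0]
```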

Experimenting with different levels of suppression or attenuation of frequencies will give you a sense of the most usable sample and will help you to make educated judgments when making new samples. The challenge grows as the bit depth and sampling frequency plunge.

A sample waveform can be further manipulated with digital signal processing (DSP). You can apply DSP to the sound using external hardware (such as a Yamaha SPX 900) before the signal reaches the analog-to-digital converter (ADC) on your sound card, or you can apply DSP algorithmically to the digitized waveform after it is created. For game development, I recommend the algorithmic approach: unless you have a high-end professional-audio DSP unit (like the Lexicon), it is easy to introduce noise that becomes intolerable when aliased. This is particularly noticeable when creating pitch-change DSP. The only drawback of algorithmic DSP is that processing time can become lengthy when your samples contain massive amounts of data (800K or greater). With the Lexicon, the DSP is done in real time, so you get to hear and tweak the effect prior to digitizing.
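To give a flavor of the algorithmic approach, here is a minimal delay (echo) effect--one of the simplest DSP operations to apply directly to a digitized waveform. The function name and signature are illustrative, not from any shipping library:

```python
def add_echo(samples, delay, decay):
    """Mix a delayed, attenuated copy of the signal back into
    itself. `delay` is measured in samples, `decay` in 0..1.
    The output is extended by `delay` samples to hold the tail."""
    out = list(samples) + [0.0] * delay
    for i, s in enumerate(samples):
        out[i + delay] += s * decay
    return out

# A single impulse followed by its half-volume echo one sample later:
print(add_echo([1.0, 0.0], 1, 0.5))  # → [1.0, 0.5, 0.0]
```

Reverb, chorusing, and flanging are elaborations of this same delay-and-mix idea, with multiple taps and time-varying delays.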

Delay, reverb, chorusing, flanging, noise gating, distortion, pitch change, and amplitude modulation are common DSP effects that enhance sound, voice, and musical waveforms. In game applications, the waveform must be properly compressed. I have learned that digital samples in games need to be fat and almost always at maximum amplitude; this minimizes the inherent hiss of low-resolution applications. Because the sound-wave data is as loud as it can be, reducing the dynamic range of the effect makes all the possible qualities of the sound available to the user.

Compression is best applied before digitizing. For game applications, the Alesis 3630 produces killer waveforms. You simply set compression to just under the highest ratio it will compress, then ensure that the output volume peaks around 0 dB and that input on the sound card matches the 0 dB level of the compressor output. To do this, get a Shure tone generator (model A15TG), which will produce a constant tone so you can balance and create a gain structure.

Always test your compression by making a sample and looking at the waveform to see that it is fat and peaks, even flattens a bit, at the top. Then listen for digital distortion, which sounds like a crackling pop or a hard-edged, scratchy resonance. To cure this, lower the output volume of the compressor or increase the compression ratio (or both). It is still possible to get some dynamic range in low-resolution applications. A cricket-chirp loop doesn't need to be at maximum amplitude or even compressed very much because it is a subtle ambient effect--but voice tracks need to be fat and maximized.
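The "fat, maximum amplitude" goal has a simple digital counterpart: peak normalization, which scales a recording so its loudest sample just reaches full scale without clipping. A sketch (the function name and 16-bit full-scale default are my choices for illustration):

```python
def normalize(samples, peak=32767):
    """Scale floating-point samples so the loudest one hits 16-bit
    full scale, making the waveform 'fat' without clipping."""
    loudest = max(abs(s) for s in samples)
    if loudest == 0:
        return list(samples)  # pure silence: nothing to scale
    return [int(round(s * peak / loudest)) for s in samples]

# A half-volume recording scaled up to full scale:
print(normalize([0.5, -0.5, 0.0]))  # → [32767, -32767, 0]
```

Note that this only raises the peak; unlike the hardware compressor described above, it does not reduce dynamic range, so quiet passages stay quiet relative to the peak.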

It is now possible to alter pitch and compress or expand the same waveform in time. This means that you can make one actor sound like a different person by changing his delivery characteristics. By applying delay, flange, and chorusing, your own voice can be used to create sounds for horrific space beasts and demons of every description.
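The simplest form of pitch change is resampling, sketched below with linear interpolation (the function name is mine). Note the limitation this makes obvious: naive resampling ties pitch and duration together, which is why the independent time-compression and expansion the paragraph describes requires more sophisticated algorithms:

```python
def resample_pitch(samples, ratio):
    """Naive pitch shift by resampling with linear interpolation.
    ratio > 1 raises pitch (and shortens the sound); ratio < 1
    lowers pitch (and lengthens it)."""
    out = []
    pos = 0.0
    while pos < len(samples) - 1:
        i = int(pos)
        frac = pos - i
        # Interpolate between the two nearest original samples.
        out.append(samples[i] * (1 - frac) + samples[i + 1] * frac)
        pos += ratio
    return out

# Doubling the step rate raises pitch an octave and halves the length:
print(resample_pitch([0.0, 1.0, 2.0, 3.0], 2.0))  # → [0.0, 2.0]
```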

Nothing beats the actual experience of creating sound effects for a game and then hearing them while you run the application. This is the acid test for your sound design. You may have to go back and change or recreate sounds, but when it all comes together and works, the effect is dazzling.


Copyright © 1995, Dr. Dobb's Journal