Audio Signal Processing And Coding
TED PAINTER, PhD, obtained his doctorate at ASU in 2000. He is a multimedia software architect in the Mobility and Wireless Group at Intel Corporation. His work focuses on architectural analysis, high-performance multimedia software design for mobile handsets, and definition of industry standards. He is editor of the Khronos OpenMAX DL specification. His research interests include psychoacoustics and speech and audio processing. He is co-recipient of the IEEE Donald Fink Prize Paper Award for his work on perceptual coding of digital audio.
VENKATRAMAN ATTI, PhD, obtained his doctorate at ASU in 2006. He currently works as a senior engineer at Acoustic Technologies, Inc. While at ASU, he contributed to speech and audio coding, and to the Java-DSP package. His work on integrating perceptual criteria into linear predictive coding was nominated for an award at IEEE ICASSP-2005. At Acoustic Technologies, his work focuses on research and development of acoustic echo cancellation and noise reduction algorithms.
The motivation for audio signal processing arose at the start of the 20th century with inventions like the telephone, phonograph, and radio that allowed for the transmission and storage of audio signals. Audio processing was necessary for early radio broadcasting, as there were many problems with studio-to-transmitter links.[1] The theory of signal processing and its application to audio was largely developed at Bell Labs in the mid-20th century. Claude Shannon and Harry Nyquist's early work on communication theory, sampling theory and pulse-code modulation (PCM) laid the foundations for the field. In 1957, Max Mathews became the first person to synthesize audio on a computer, giving birth to computer music.
Major developments in digital audio coding and audio data compression include differential pulse-code modulation (DPCM) by C. Chapin Cutler at Bell Labs in 1950,[2] linear predictive coding (LPC) by Fumitada Itakura (Nagoya University) and Shuzo Saito (Nippon Telegraph and Telephone) in 1966,[3] adaptive DPCM (ADPCM) by P. Cummiskey, Nikil S. Jayant and James L. Flanagan at Bell Labs in 1973,[4][5] discrete cosine transform (DCT) coding by Nasir Ahmed, T. Natarajan and K. R. Rao in 1974,[6] and modified discrete cosine transform (MDCT) coding by J. P. Princen, A. W. Johnson and A. B. Bradley at the University of Surrey in 1987.[7] LPC is the basis for perceptual coding and is widely used in speech coding,[8] while MDCT coding is widely used in modern audio coding formats such as MP3[9] and Advanced Audio Coding (AAC).[10]
An analog audio signal is a continuous signal represented by an electrical voltage or current that is analogous to the sound waves in the air. Analog signal processing then involves physically altering the continuous signal by changing the voltage or current or charge via electrical circuits.
Before the advent of widespread digital technology, analog processing was the only way to manipulate a signal. Since then, as computers and software have become more capable and affordable, digital signal processing has become the method of choice. In music applications, however, analog technology often remains desirable because it produces nonlinear responses that are difficult to replicate with digital filters.
A digital representation expresses the audio waveform as a sequence of symbols, usually binary numbers. This permits signal processing using digital circuits such as digital signal processors, microprocessors and general-purpose computers. Most modern audio systems use a digital approach as the techniques of digital signal processing are much more powerful and efficient than analog domain signal processing.[11]
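As a rough illustration of the digital representation described above, the following sketch samples a sine wave and maps each sample to a signed integer code, as in linear PCM. The function names, the 8 kHz sample rate, and the 8-bit word length are assumptions chosen for the example, not details from the text.

```python
import math

def sample_sine(freq_hz, sample_rate_hz, n_samples):
    """Sample a unit-amplitude sine wave at discrete time steps."""
    return [math.sin(2.0 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(n_samples)]

def quantize_pcm(samples, bits=16):
    """Map floats in [-1.0, 1.0] to signed integer PCM codes.

    Hypothetical helper: uses a symmetric range of +/-(2**(bits-1) - 1).
    """
    max_code = 2 ** (bits - 1) - 1
    codes = []
    for x in samples:
        x = max(-1.0, min(1.0, x))  # clip out-of-range input
        codes.append(int(round(x * max_code)))
    return codes

# A 1 kHz tone sampled at 8 kHz, stored as 8-bit codes.
tone = sample_sine(1000.0, 8000.0, 8)
codes = quantize_pcm(tone, bits=8)
print(codes)
```

Each integer in `codes` is one of the "symbols" in the sequence; a longer word length gives a finer amplitude grid at the cost of more bits per sample.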
Audio signal processing is used when broadcasting audio signals in order to enhance their fidelity or optimize for bandwidth or latency. In this domain, the most important audio processing takes place just before the transmitter. The audio processor here must prevent or minimize overmodulation, compensate for non-linear transmitters (a potential issue with medium wave and shortwave broadcasting), and adjust overall loudness to the desired level.
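The loudness adjustment and overmodulation protection mentioned above can be sketched in a very simplified form as a gain stage followed by a hard ceiling. This is only a minimal sketch: the function names, target RMS, and ceiling value are assumptions, and real broadcast processors typically use smoother gain control than the hard clipping shown here.

```python
import math

def rms(samples):
    """Root-mean-square level of a block of samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(x * x for x in samples) / len(samples))

def level_and_limit(samples, target_rms=0.2, ceiling=0.9):
    """Scale a block toward a target RMS level, then clamp the peaks.

    Hypothetical helper: the clamp stands in for overmodulation
    protection and is deliberately crude (hard clipping).
    """
    current = rms(samples)
    gain = target_rms / current if current > 0.0 else 1.0
    return [max(-ceiling, min(ceiling, x * gain)) for x in samples]

quiet = [0.01, -0.02, 0.015, -0.005, 0.3]
processed = level_and_limit(quiet)
print(processed)
```

No output sample can exceed the ceiling, whatever the input level, which is the property a transmitter-side processor needs to guarantee.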
Audio synthesis is the electronic generation of audio signals. A musical instrument that accomplishes this is called a synthesizer. Synthesizers can either imitate sounds or generate new ones. Audio synthesis is also used to generate human speech using speech synthesis.
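One simple way to generate an audio signal electronically is additive synthesis, which sums sine waves at multiples of a fundamental frequency. The sketch below is an illustration only; the function name, the harmonic gains, and the sample rate are assumptions, and practical synthesizers use many other techniques.

```python
import math

def additive_tone(freq_hz, sample_rate_hz, duration_s, harmonic_gains):
    """Sum sine harmonics of freq_hz and normalize the peak to 1.0.

    harmonic_gains[0] scales the fundamental, harmonic_gains[1] the
    second harmonic, and so on (hypothetical parameterization).
    """
    n_samples = int(round(duration_s * sample_rate_hz))
    out = []
    for n in range(n_samples):
        t = n / sample_rate_hz
        value = 0.0
        for k, gain in enumerate(harmonic_gains, start=1):
            value += gain * math.sin(2.0 * math.pi * k * freq_hz * t)
        out.append(value)
    peak = max((abs(v) for v in out), default=0.0)
    if peak > 0.0:
        out = [v / peak for v in out]
    return out

# 50 ms of a 220 Hz tone with three harmonics.
tone = additive_tone(220.0, 8000.0, 0.05, [1.0, 0.5, 0.25])
print(len(tone))
```

Changing the list of harmonic gains changes the timbre while the pitch stays at the fundamental.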
Current and future research activities of the Moriya Research Laboratory are introduced. To date, various compression coding technologies for speech and audio have been used for convenient and economical communication systems. However, compression makes the sound quality more band-limited and contaminated with unnatural distortion. We are seeking to construct more comfortable and more convenient communications systems by making full use of the broadband network environment. To achieve this goal, we are focusing on the development of lossless compression coding and exploring new concepts in quality through the use of newly developed devices and our deepening understanding of human perception.
In the evolution of communications systems, compression has been essential because it allows users to share the limited capacity of communications channels and storage spaces. Various types of compression coding for speech and audio have been developed, and these have found important applications in cellular phone systems, music delivery over networks, and portable players. However, most of the speech coding and audio coding standards in ISO/IEC MPEG (International Organization for Standardization and International Electrotechnical Commission Moving Picture Experts Group), such as MP3 (MPEG-1 Audio Layer 3) and AAC (Advanced Audio Coding), achieve a high compression ratio at the cost of minor waveform distortion and band limitation at the decoder.
Along with the evolution of the broadband network and digital audio equipment, information rates for delivery and storage have increased rapidly owing to the demand for high-quality audio signals (high sampling rates, high word resolution, and multichannel capability). In the broadband environment, we do not want to lose any quality as a result of data compression. However, as long as the original quality remains unchanged and the processing cost is low, compression will always be useful because the information rates might exceed the available transmission speed or storage capacity.
In this sense, our first endeavor for high-quality coding was the development of a lossless coding scheme that assures perfect reconstruction of the original waveform. This is essential for economically storing or transmitting high-quality signals without any degradation. For interoperability of various applications throughout the world and over time, international standardization is extremely useful. We have continually contributed to the establishment of a lossless coding standard in the MPEG community since 2002. The standard (MPEG-4 ALS) [1] was published in 2006 as part of ISO/IEC 14496-3. Even after the publication of this standard, we continued to make efforts for further improvement of the encoder and for commercialization.
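The idea of perfect reconstruction can be illustrated with a toy predictive scheme on integer samples: the encoder stores only the prediction error, and the decoder repeats the same prediction to recover every sample exactly. This sketch is not the MPEG-4 ALS algorithm; the fixed second-order predictor and the function names are assumptions made for illustration, and a real codec would also entropy-code the residual.

```python
def encode_residual(samples):
    """Replace each integer sample by its error against a fixed predictor.

    Assumed predictor: 2*x[n-1] - x[n-2], with missing history taken as 0.
    """
    residual = []
    for n, x in enumerate(samples):
        prev1 = samples[n - 1] if n >= 1 else 0
        prev2 = samples[n - 2] if n >= 2 else 0
        residual.append(x - (2 * prev1 - prev2))
    return residual

def decode_residual(residual):
    """Invert encode_residual by rebuilding the same predictions."""
    samples = []
    for n, e in enumerate(residual):
        prev1 = samples[n - 1] if n >= 1 else 0
        prev2 = samples[n - 2] if n >= 2 else 0
        samples.append(e + (2 * prev1 - prev2))
    return samples

pcm = [0, 100, 198, 292, 380, 460, 530, 588]
residual = encode_residual(pcm)
restored = decode_residual(residual)
print(residual)          # [0, 100, -2, -4, -6, -8, -10, -12]
print(restored == pcm)   # True
```

For a smooth waveform the residual values are much smaller than the samples themselves, so they can be stored in fewer bits, while integer arithmetic guarantees that the decoder output matches the input bit for bit.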
We will continue our efforts to further compress audio signals without losing quality. Compression technology is sometimes dependent on the analysis method or model estimation. Efficient model estimation is also useful for recognition and search tasks. Maintaining a high level of compression technology is also essential for other types of signal processing.
We need to extend our research field toward improving the quality and comfort of communications as shown in Fig. 1. At present, a single-point-source single-channel band-limited audio signal is used for most communications systems. We want to extend the way that this signal, or information, is used in two ways. One is for human interaction. To explore comfortable communication and the sensation of real presence in music, we need to understand the characteristics of human perception. The other is to significantly increase the number of signal channels (super multichannel capability). There is a huge amount of information hidden in the sound field of a room. We can make use of new devices and hardware tools to facilitate cost-effective multichannel communications systems.
The interface to the real environment can be enhanced by introducing a massive number of channels for sound-field control. For this purpose, we need economical high-speed hardware as well as processing and control software. It is impossible to increase the number of channels beyond a few hundred with conventional parallel cable distribution of signals from microphone and loudspeaker arrays. A very promising solution is to use the rapidly developing technologies for high-speed transmission and multiplexing through optical fiber and in small devices. One interesting example is an array of microphones multiplexed in an optical fiber [4]. If a super multichannel sound system can be achieved, it will find general use in various applications such as noise control and environmental sensors.
The research activities at the Moriya Research Laboratory include the development of lossless coding and future exploration of human interaction and super multichannel signal processing. All are aimed at the creation of high-quality comfortable communications systems that make use of the rich information available through broadband networks. Our work will be carried out under flexible collaborations with other NTT laboratories in the fields of innovative communication devices and human sciences. In addition, we will continue to promote standardization and alliances, which are important for these new technologies. We hope these technologies will also contribute to other research fields besides the acoustical signal processing field.
Audio signals are information-rich nonstationary signals that play an important role in our day-to-day communication, perception of the environment, and entertainment. Because of their nonstationary nature, time-only or frequency-only approaches are inadequate for analyzing these signals. A joint time-frequency (TF) approach is a better choice for processing them efficiently. In this digital era, compression, intelligent indexing for content-based retrieval, classification, and protection of digital audio content are a few of the areas that cover the majority of audio signal processing applications. In this paper, we present a comprehensive array of TF methodologies that successfully address applications in all of the above-mentioned areas. A TF-based audio coding scheme with a novel psychoacoustic model, music classification, audio classification of environmental sounds, audio fingerprinting, and audio watermarking will be presented to demonstrate the advantages of using time-frequency approaches in analyzing and extracting information from audio signals.
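A short-time Fourier transform (STFT) is one common joint time-frequency representation and shows why such an approach suits nonstationary signals: it reveals when each frequency is present. The sketch below is a generic illustration, not one of the specific TF methods of the paper; the frame length, hop size, Hann window, and test signal are all assumptions.

```python
import cmath
import math

def stft_magnitudes(signal, frame_len=64, hop=32):
    """Return one magnitude spectrum per Hann-windowed frame (naive DFT)."""
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        mags = []
        for k in range(frame_len // 2 + 1):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            mags.append(abs(acc))
        frames.append(mags)
    return frames

def peak_bin(mags):
    """Index of the strongest frequency bin in one frame."""
    return max(range(len(mags)), key=lambda k: mags[k])

# Nonstationary test signal: 500 Hz for 256 samples, then 2000 Hz.
sample_rate = 8000.0
first = [math.sin(2.0 * math.pi * 500.0 * n / sample_rate) for n in range(256)]
second = [math.sin(2.0 * math.pi * 2000.0 * n / sample_rate) for n in range(256)]
spec = stft_magnitudes(first + second)

bin_hz = sample_rate / 64  # frequency spacing between bins
print(peak_bin(spec[0]) * bin_hz, peak_bin(spec[-1]) * bin_hz)
```

A single Fourier transform of the whole signal would show both frequencies but not their order in time, whereas the per-frame peaks track the change from the first tone to the second.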