AVT WG                                                          R. Zopf
Internet Draft                                      Lucent Technologies
Document: draft-ietf-avt-rtp-cn-04.txt                     October 2001
Category: Standards Track


                     RTP Payload for Comfort Noise


Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026 [1].

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
        http://www.ietf.org/ietf/1id-abstracts.txt
   The list of Internet-Draft Shadow Directories can be accessed at
        http://www.ietf.org/shadow.html.

1. Abstract

   This document describes an RTP [2] payload format for transporting
   comfort noise (CN).  The CN payload type is primarily for use with
   audio codecs that do not support comfort noise as part of the codec
   itself such as ITU-T Recommendations G.711 [3], G.726 [4], G.727
   [5], G.728 [6], and G.722 [7].

Resolution of Open Issues

   This revision reverts the definition of the sampling rate back to
   that in revision -01 of this document.  By making the sampling rate
   equal to that of the audio codec, the previous revision essentially
   redefined the meaning of a static payload type to say that it could
   have a variable RTP clock rate.  The sampling rate is now defined to
   be 8000 Hz, and any other rate requires the use of a dynamic payload
   type.  See Section 5 for details.

   No other issues are known or expected.

2. Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC-2119 [8].

Zopf              Standards Track - April, 2002                      1

RTP Payload for Comfort Noise                              October 2001

3. Introduction

   This document describes an RTP payload format for transporting
   comfort noise.  The payload format is based on Appendix II of ITU-T
   Recommendation G.711 [9] which defines a comfort noise payload
   format (or bit-stream) for ITU-T G.711 use in packet-based
   multimedia communication systems.  The payload format is generic and
   may also be used with other audio codecs without built-in
   Discontinuous Transmission (DTX) capability such as ITU-T
   Recommendations G.726 [4], G.727 [5], G.728 [6], and G.722 [7].  The
   payload format provides a minimum interoperability specification for
   communication of comfort noise parameters.  The comfort noise
   analysis and synthesis as well as the Voice Activity Detection (VAD)
   and DTX algorithms are unspecified and left implementation-specific.
   However, an example solution for G.711 has been tested and is
   described in the Appendix [9].  It uses the VAD and DTX of G.729
   Annex B [10] and a comfort noise generation algorithm (CNG) which is
   provided in the Appendix for information.

   The comfort noise payload consists of a single octet description of
   the noise level and MAY contain spectral information in subsequent
   octets.  An earlier version of the CN payload format consisting only
   of the noise level byte was defined in draft revisions of RFC 1890.
   The extended payload format defined in this document should be
   backward compatible with implementations of the earlier version
   assuming that only the first byte is interpreted and any additional
   spectral information bytes are ignored.

4. CN Payload Definition

   The comfort noise payload consists of a description of the noise
   level and spectral information in the form of reflection
   coefficients.  The use of spectral information is optional and the
   all-pole model order is left unspecified.
   The encoder can determine the appropriate model order based on such
   considerations as quality, complexity, expected environmental noise,
   and signal bandwidth.  The model order is not explicitly transmitted
   since it can be derived from the length of the payload at the
   receiver.  For complexity or other reasons, the decoder may reduce
   the model order by setting higher order reflection coefficients to
   zero.

4.1 Noise Level

   The magnitude of the noise level is packed into the least
   significant bits of the noise-level byte with the most significant
   bit unused and always set to 0 as shown below in Figure 1.  The
   least significant bit of the noise level magnitude is packed into
   the least significant bit of the byte.

   The noise level is expressed in -dBov, with values from 0 to 127
   representing 0 to -127 dBov.  dBov is the level relative to the
   overload of the system.  (Note: Representation relative to the
   overload point of a system is particularly useful for digital
   implementations, since one does not need to know the relative
   calibration of the analog circuitry.)  For example, in the case of a
   u-law system, the reference would be a square wave with values
   +/- 8031, and this square wave represents 0 dBov.  This translates
   into 6.18 dBm0.

                        0 1 2 3 4 5 6 7
                       +-+-+-+-+-+-+-+-+
                       |0|   level     |
                       +-+-+-+-+-+-+-+-+

                  Figure 1: Noise Level Packing

4.2 Spectral Information

   The spectral information is transmitted using reflection
   coefficients [9].  Each reflection coefficient can have values
   between -1 and 1 and is quantized uniformly using 8 bits.  The
   quantized value is represented by the 8 bit index N, where
   N = 0...254, and index N=255 is reserved for future use.  Each index
   N is packed into a separate byte with the MSB first.
   The quantized value of each reflection coefficient, k_i, can be
   obtained from its corresponding index using:

      k_i(N_i) = 258*(N_i-127)   for N_i = 0...254; -1 < k_i < 1
                 -------------
                     32768

4.3 Payload Packing

   The first byte of the payload MUST contain the noise level as shown
   in Figure 1.  Quantized reflection coefficients are packed in
   subsequent bytes in ascending order as in Figure 2 where M is the
   model order.  The total length of the payload is M+1 bytes.  Note
   that a 0th order model (i.e. no spectral envelope information)
   reduces to transmitting only the energy level.

            Byte        1      2     3    ...   M+1
                     +-----+-----+-----+-----+-----+
                     |level|  N1 |  N2 | ... |  NM |
                     +-----+-----+-----+-----+-----+

                  Figure 2: CN Payload Packing Format

5. Usage of RTP

   The RTP header for the comfort noise packet SHOULD be constructed as
   if the comfort noise were an independent codec.  Thus, the RTP
   timestamp designates the beginning of the silence period.  A static
   payload type of 13 is assigned for a sampling rate of 8,000 Hz; if
   other sampling rates are needed, they MUST be defined through
   dynamic payload types.  The RTP packet SHOULD NOT have the marker
   bit set.

   Each RTP packet containing comfort noise MUST contain exactly one CN
   payload per channel.  This is required since the CN payload has a
   variable length.  If multiple audio channels are used, each channel
   MUST use the same spectral model order 'M'.

6. Guidelines for Use

   An audio codec with DTX capabilities generally includes VAD, DTX,
   and CNG algorithms.  The job of the VAD is to discriminate between
   active and inactive voice segments in the input signal.  During
   inactive voice segments, the role of the CNG is to sufficiently
   describe the ambient noise while minimising the transmission rate.
   A Silence Insertion Descriptor (SID) frame containing a description
   of the noise is packed into the CN payload and sent to the receiver.
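
   The packing of a noise description into the CN payload (Section 4)
   can be sketched as follows.  This is a minimal illustration, not
   part of the payload definition.  The function names are
   hypothetical, and the encoder-side quantizer (round-to-nearest
   inversion of the Section 4.2 formula, clamped to 0..254) is an
   assumption, since this document only specifies how an index maps to
   a coefficient.  Rejecting the reserved index 255 on receipt is
   likewise a choice made by this sketch.

```python
from typing import List, Sequence, Tuple


def quantize_reflection(k: float) -> int:
    """Map a reflection coefficient to an index N in 0..254.

    Assumption: round-to-nearest inversion of the Section 4.2 formula,
    clamped to the valid index range. The draft does not prescribe this.
    """
    n = round(k * 32768 / 258) + 127
    return max(0, min(254, n))


def dequantize_reflection(n: int) -> float:
    """Section 4.2: k_i = 258 * (N_i - 127) / 32768 for N_i = 0..254."""
    if not 0 <= n <= 254:
        # Index 255 is reserved; this sketch treats it as an error.
        raise ValueError(f"index {n} is outside 0..254")
    return 258 * (n - 127) / 32768


def pack_cn_payload(level: int, coeffs: Sequence[float] = ()) -> bytes:
    """Build a CN payload of M+1 bytes: noise level, then N1..NM.

    level is the magnitude in -dBov (0..127), so 40 means -40 dBov.
    """
    if not 0 <= level <= 127:
        raise ValueError("noise level magnitude must be in 0..127")
    return bytes([level] + [quantize_reflection(k) for k in coeffs])


def parse_cn_payload(payload: bytes) -> Tuple[int, List[float]]:
    """Return (noise level in dBov, reflection coefficients k_1..k_M).

    The model order M is implied by the payload length (M = len - 1).
    """
    if not payload:
        raise ValueError("CN payload needs at least the noise-level byte")
    level = payload[0] & 0x7F  # the most significant bit is unused
    coeffs = [dequantize_reflection(n) for n in payload[1:]]
    return -level, coeffs
```

   For instance, pack_cn_payload(40, [0.0, 0.5]) produces the three
   bytes 40, 127, 191, and parsing them back gives -40 dBov with
   coefficients 0.0 and 0.50390625.  A receiver that only understands
   the earlier one-byte format can read the first byte and ignore the
   rest.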
   The DTX algorithm determines when a SID frame is transmitted.  The
   SID frame is sent once at the beginning of a silence period, but the
   update rate is left implementation-specific.  For example, the SID
   frame may be sent periodically or only when there is a significant
   change in the background noise characteristics.  The CNG algorithm
   at the receiver uses the information in the SID to update its noise
   generation model and then produce an appropriate amount of comfort
   noise.

   The CN payload format provides a minimum interoperability
   specification for communication of comfort noise parameters.  The
   comfort noise analysis and synthesis as well as the VAD and DTX
   algorithms are unspecified and left implementation-specific.
   However, an example solution for G.711 has been tested and is
   described in Appendix II of ITU-T Recommendation G.711 [9].  It uses
   the VAD and DTX of G.729 Annex B [10] and a comfort noise generation
   algorithm (CNG), which is provided in the Appendix for information.
   Additional guidelines for use such as the factors affecting system
   performance in the design of the VAD/DTX/CNG algorithms are
   described in the Appendix.

7. MIME Media Type Registrations

   This section defines a new RTP payload name and associated MIME
   type, CN (audio/CN).

7.1 Registration of MIME media type audio/CN

   MIME media type name: audio

   MIME subtype name: CN

   Required parameters: None

   Optional parameters: rate

   Encoding considerations:
      This type is only defined for transfer via RTP [RFC XXXX,
      draft-ietf-avt-rtp-new].

   Security considerations: see Section 8 "Security Considerations".

   Interoperability considerations: none

   Published specification:
      This document and Appendix II of ITU-T Recommendation G.711

   Applications which use this media type:
      Audio and video streaming and conferencing tools.

   Additional information: none

   Person & email address to contact for further information:
      Robert Zopf
      zopf@lucent.com

   Intended usage: COMMON

   Author/Change controller:
      Author: Robert Zopf
      Change controller: IETF AVT Working Group

8. Security Considerations

   RTP packets using the payload format defined in this specification
   are subject to the security considerations discussed in the RTP
   specification [2].  This implies that confidentiality of the media
   streams is achieved by encryption.  Because the payload format is
   arranged end-to-end, encryption MAY be performed after encapsulation
   so there is no conflict between the two operations.

   As this format transports background noise, there are no significant
   security, confidentiality, or authentication concerns.

9. References

   1  Bradner, S., "The Internet Standards Process -- Revision 3",
      BCP 9, RFC 2026, October 1996.

   2  H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, "RTP: A
      Transport Protocol for Real-Time Applications", RFC 1889.

   3  ITU Recommendation G.711 (11/88) - Pulse code modulation (PCM) of
      voice frequencies.

   4  ITU Recommendation G.726 (12/90) - 40, 32, 24, 16 kbit/s Adaptive
      Differential Pulse Code Modulation (ADPCM).

   5  ITU Recommendation G.727 (12/90) - 5-, 4-, 3- and 2-bits sample
      embedded adaptive differential pulse code modulation (ADPCM).

   6  ITU Recommendation G.728 (09/92) - Coding of speech at 16 kbits/s
      using low-delay code excited linear prediction.

   7  ITU Recommendation G.722 (11/88) - 7 kHz audio-coding within
      64 kbit/s.

   8  Bradner, S., "Key words for use in RFCs to Indicate Requirement
      Levels", BCP 14, RFC 2119, March 1997.

   9  Appendix II to Recommendation G.711 (to be published) - A comfort
      noise payload definition for ITU-T G.711 use in packet-based
      multimedia communication systems.

   10 Annex B (08/97) to Recommendation G.729 - C source code and test
      vectors for implementation verification of the algorithm of the
      G.729 silence compression scheme.

10. Author's Address

   Robert Zopf
   Lucent Technologies
   INS Access VoIP Networks
   480 Red Hill Road
   Middletown, NJ 07748
   USA
   e-mail: zopf@lucent.com
   Tel: 1-732-615-4157
   Fax: 1-732-615-4526

Full Copyright Statement

   "Copyright (C) The Internet Society (date). All Rights Reserved.
   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph
   are included on all such copies and derivative works.  However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into