ITU-T G Series: Transmission systems and media, digital systems and networks


Coding of voice and audio signals
Rec. ITU-T G.711.0 | Rec. ITU-T G.711.1 | Rec. ITU-T G.718 | Rec. ITU-T G.719 | Rec. ITU-T G.720.1 | Rec. ITU-T G.722 | Rec. ITU-T G.722.1 | Rec. ITU-T G.722.2 | Rec. ITU-T G.723.1 | Rec. ITU-T G.726 | Rec. ITU-T G.727 | Rec. ITU-T G.728 | Rec. ITU-T G.729 | Rec. ITU-T G.729.1
Multimedia Quality of Service and performance – Generic and user-related aspects
Rec. ITU-T G.1050

Recommendation ITU-T G.711.0: Lossless compression of G.711 pulse code modulation

Rec. ITU-T G.711.0 describes a lossless compression scheme for ITU-T G.711 bitstreams, aimed mainly at transmission over IP (e.g., VoIP).
The coder operates on frame lengths of 40, 80, 160, 240 and 320 samples, has a maximum algorithmic delay equal to the frame length, and has a worst-case computational complexity of less than 1.7 weighted million operations per second (WMOPS) for encoder plus decoder.
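As a rough illustration of the frame lengths above, the following sketch converts each supported frame length into a duration and an uncompressed payload size. It assumes the standard G.711 sampling rate of 8 kHz and one byte per sample; the names are illustrative and are not taken from the reference code.

```python
# Assumption: G.711 input is sampled at 8 kHz, one 8-bit codeword per sample.
SAMPLE_RATE_HZ = 8000
FRAME_LENGTHS = (40, 80, 160, 240, 320)  # samples, as listed in the summary


def frame_duration_ms(samples: int) -> float:
    """Duration of one frame, which is also the maximum algorithmic delay."""
    return samples * 1000 / SAMPLE_RATE_HZ


def uncompressed_bytes(samples: int) -> int:
    """Size of the G.711 input frame before lossless compression."""
    return samples  # one byte per sample


for n in FRAME_LENGTHS:
    print(f"{n:3d} samples: {frame_duration_ms(n):4.1f} ms, "
          f"{uncompressed_bytes(n)} bytes before compression")
```

Under these assumptions the five frame lengths correspond to 5, 10, 20, 30 and 40 ms of audio.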
This Recommendation includes an electronic attachment containing the ANSI C code (a fixed-point arithmetic implementation of the specification), as well as a non-exhaustive set of test signals for use with it.



Recommendation ITU-T G.711.1: Wideband embedded extension for ITU-T G.711 pulse code modulation

Rec. ITU-T G.711.1 describes an embedded wideband speech and audio coding algorithm that extends Rec. ITU-T G.711 and operates at 64, 80 and 96 kbit/s.
The encoder input and decoder output are sampled at 16 kHz by default, but 8 kHz sampling is also supported. With 16 kHz sampling, the decoder output has a bandwidth of 50-7000 Hz at 80 and 96 kbit/s. With 8 kHz sampling, the output has a bandwidth of 50 up to 4000 Hz at 64 and 80 kbit/s (the bandwidth of the narrowband decoder output is determined by the built-in split-band filterbank, which has a cut-off frequency of 4000 Hz). At 64 kbit/s, ITU-T G.711.1 is compatible with Rec. ITU-T G.711. The coder operates on 5 ms frames, has a maximum algorithmic delay of 11.875 ms, and has a worst-case computational complexity of 8.70 WMOPS.
The encoder produces an embedded bitstream structured in three layers corresponding to the three available bit rates: 64, 80 and 96 kbit/s. The bitstream can be truncated at the decoder side or by any component of the communication system to adjust the bit rate to the desired value. However, the bitstream carries no information about which layers it contains, so an implementation requires out-of-band signalling to indicate which layers are available.
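To make the layered structure concrete, the sketch below derives the payload size of a 5 ms frame at each of the three bit rates and truncates a frame to a lower rate. It assumes the layers are simply concatenated with the 64 kbit/s core first, which is a simplification; the exact multiplexing is defined in the Recommendation, and the function names are hypothetical.

```python
FRAME_MS = 5
RATES_KBPS = (64, 80, 96)


def frame_bytes(rate_kbps: int) -> int:
    """Bytes per 5 ms frame (kbit/s multiplied by ms gives bits)."""
    return rate_kbps * FRAME_MS // 8


def truncate_frame(frame: bytes, target_kbps: int) -> bytes:
    """Drop trailing enhancement layers (assumed to follow the core).

    The frame itself does not say which layers it holds, so the caller
    has to learn target_kbps through out-of-band signalling.
    """
    if target_kbps not in RATES_KBPS:
        raise ValueError(f"unsupported rate: {target_kbps}")
    size = frame_bytes(target_kbps)
    if size > len(frame):
        raise ValueError("cannot truncate to a higher rate")
    return frame[:size]


full = bytes(frame_bytes(96))      # dummy 96 kbit/s frame
core = truncate_frame(full, 64)    # core layer only
print(len(full), len(core))        # 60 40
```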
The underlying algorithm has a three-layer coding structure: log-companded pulse code modulation (PCM) of the lower band including noise feedback, embedded PCM extension with adaptive bit allocation for enhancing the quality of the base layer in the lower band, and weighted vector quantization coding of the higher band based on the modified discrete cosine transform (MDCT).



Recommendation ITU-T G.718: Frame error robust narrow-band and wideband embedded variable bit-rate coding of speech and audio from 8-32 kbit/s

Rec. ITU-T G.718 describes a narrow-band (NB) and wideband (WB) embedded variable bit-rate coding algorithm for speech and audio operating in the range from 8 to 32 kbit/s which is designed to be robust to frame erasures.
This codec provides state-of-the-art NB speech quality over the lower bit rates and state-of-the-art WB speech quality over the complete range of bit rates. In addition, the ITU-T G.718 codec is designed to be highly robust to frame erasures, thereby enhancing the speech quality when used in IP transport applications on fixed, wireless and mobile networks. Despite its embedded nature, the codec also performs well with both NB and WB generic audio signals.
This codec has an embedded scalable structure, enabling maximum flexibility in the transport of voice packets through IP networks of today and in future media-aware networks. In addition, the embedded structure of ITU-T G.718 will easily allow the codec to be extended to provide a super-wideband and stereo capability through additional layers which are currently under development. The bitstream may be truncated at the decoder side or by any component of the communication system to instantaneously adjust the bit rate to the desired value without the need for out-of-band signalling. The encoder produces an embedded bitstream structured in five layers corresponding to the five available bit rates: 8, 12, 16, 24 and 32 kbit/s.
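A network element exploiting this property might pick the highest layer that fits the available bandwidth and cut each frame accordingly. The sketch below assumes the codec's 20 ms frames and a frame laid out with the lower layers first; `select_rate` and `truncate` are hypothetical names, not part of the reference software.

```python
LAYER_RATES_KBPS = (8, 12, 16, 24, 32)
FRAME_MS = 20


def select_rate(available_kbps: float) -> int:
    """Highest layer rate that does not exceed the available bandwidth."""
    fitting = [r for r in LAYER_RATES_KBPS if r <= available_kbps]
    if not fitting:
        raise ValueError("bandwidth is below the 8 kbit/s core layer")
    return max(fitting)


def truncate(frame: bytes, rate_kbps: int) -> bytes:
    """Keep only the bytes that belong to layers up to rate_kbps."""
    return frame[: rate_kbps * FRAME_MS // 8]


frame_32k = bytes(32 * FRAME_MS // 8)         # dummy 80-byte frame
rate = select_rate(20.5)                      # 16
print(rate, len(truncate(frame_32k, rate)))   # 16 40
```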
The ITU-T G.718 encoder can accept WB sampled signals at 16 kHz, or NB signals sampled at either 16 or 8 kHz. Similarly, the decoder output can be 16 kHz WB, in addition to 16 or 8 kHz NB. Input signals sampled at 16 kHz, but with bandwidth limited to NB, are detected by the encoder.
The output of the ITU-T G.718 codec has a bandwidth of 300-3400 Hz at 8 and 12 kbit/s (narrow-band) and of 50-7000 Hz from 8 to 32 kbit/s (wideband).
The high quality codec core represents a significant performance improvement, providing 8 kbit/s wideband clean speech quality equivalent to the ITU-T G.722.2 codec at 12.65 kbit/s whilst the 8 kbit/s narrow-band codec operating mode provides clean speech quality equivalent to the ITU-T G.729E codec at 11.8 kbit/s.
The codec operates on 20-ms frames and has a maximum algorithmic delay of 42.875 ms for wideband input and wideband output signals. The maximum algorithmic delay for narrow-band input and narrow-band output signals is 43.875 ms. The codec may also be employed in a low-delay mode when the encoder and decoder maximum bit rates are set to 12 kbit/s. In this case, the maximum algorithmic delay is reduced by 10 ms.
The codec also incorporates an alternate coding mode, with a minimum bit rate of 12.65 kbit/s, which is bitstream interoperable with Recommendation ITU-T G.722.2, 3GPP AMR-WB and 3GPP2 VMR-WB mobile WB speech coding standards. This option replaces layer 1 and layer 2, and the layers 3-5 are similar to the default option with the exception that in layer 3 fewer bits are used to compensate for the extra bits of the 12.65 kbit/s core. The decoder is further able to decode all other ITU-T G.722.2 operating modes. Furthermore, a new annex to this Recommendation is under development that will efficiently enable bit-stream interoperability with the 3GPP2 EVRC-WB codec. This Recommendation also includes discontinuous transmission mode (DTX) and comfort noise generation (CNG) algorithms that enable bandwidth savings during inactive periods. An integrated noise reduction algorithm can be used provided that the communication session is limited to 12 kbit/s.
The underlying algorithm is based on a two-stage coding structure: the lower two layers are based on code-excited linear prediction (CELP) coding of the band (50-6400 Hz), where the core layer takes advantage of signal classification to use optimized coding modes for each frame. The higher layers encode the weighted error signal from the lower layers using overlap-add modified discrete cosine transform (MDCT) coding. Several technologies are used to encode the MDCT coefficients to maximize performance for both speech and music.
Annex A defines an alternative implementation of the ITU-T G.718 algorithm using floating-point arithmetic, intended for DSP hardware optimized for floating-point operations. The accompanying floating-point ANSI C source code is fully interoperable with the fixed-point code.
Annex B describes an algorithm for extending the G.718 codec to superwideband speech and audio signals at total bit rates of 36, 40 and 48 kbit/s. An option to omit G.718 Layer 5 is included, lowering the bit rates to 28, 32 and 40 kbit/s respectively.
Annex C defines an alternative floating-point implementation of the superwideband monaural extension found in G.718 Annex B.



Recommendation ITU-T G.719: Low-complexity, full-band audio coding for high-quality, conversational applications

Rec. ITU-T G.719 provides a bit-exact, fixed-point specification of a fullband speech and audio coding algorithm operating from 32 kbit/s up to 128 kbit/s.
Although the primary use case for the codec is two-way interactive voice communication, there are several instances in which the storage of ITU-T G.719 compressed audio is necessary. Such use cases include:
  • Recording of a teleconferencing session, e.g. for education
  • Voice mail
  • Call waiting music playback
  • Recording of online "jam"-sessions
Rec. ITU-T G.719 does not specify a particular storage format for ITU-T G.719; it only uses the ITU-T G.192 bitstream for algorithmic simulation purposes.
Annex A specifies the use of the ISO base media file format as a container for the ITU-T G.719 bitstream. This addresses the aforementioned use cases and allows the ITU-T G.719 codec to benefit from a widely used file format.
Annex B provides an alternative floating-point implementation of the ITU-T G.719 coding algorithm, intended for DSP platforms optimized for floating-point arithmetic.



Recommendation ITU-T G.720.1: Generic sound activity detector

Rec. ITU-T G.720.1 describes an independent front-end processing module implementing a generic sound activity detector (GSAD) that can be applied ahead of signal processing applications such as speech or audio codecs. It operates on narrowband or wideband audio input using a 10 ms frame length (without lookahead). The primary function of the GSAD is to indicate whether the input frame is active, that is, to perform voice activity detection (VAD). For an active frame, it further indicates whether the input frame is speech or music (speech/music discrimination); for an inactive frame, it indicates whether the frame is a silence frame or an audible noise frame (silence detection). The GSAD can also be used with only its primary function of indicating input frame activity.
An external control signal tells the GSAD algorithm which of three operating points to use: bandwidth-saving, balanced or quality-preferred. For activity detection, these operating points provide a selectable balance between bandwidth saving and audio quality. This can be used in high-performance silence compression schemes that weigh the end user's subjective speech and audio quality needs against system and network traffic requirements.
The three different operating points also control the GSAD emphasis and balance between speech and music classification for the active frames, which can be utilized for fine-tuning of source-controlled audio compression systems.
The VAD module uses a dual-parameter classification scheme, in which one parameter is a differential zero-crossing rate measure and the other is a modified segmental SNR measure. An initial VAD decision is made with a pair of inequalities whose factors adapt to the long-term SNR of the input signal. A final VAD decision is obtained with an adaptive hangover scheme. The speech/music discrimination module calculates the variance of a spectral deviation measure and applies an adaptive threshold to make an initial decision between speech and music. Two spectral peakiness measures further modify that initial decision, and a one-frame hangover is used to obtain the final speech/music discrimination decision. The silence detection module uses an energy threshold to discriminate between a silence frame and an audible noise frame.
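The hangover step can be illustrated in isolation. The following sketch is a generic fixed-length hangover, not the adaptive scheme of the Recommendation: once a frame is classified as active, the next `hangover` frames are also kept active so that weak speech tails are not clipped. The plain zero-crossing helper (the Recommendation uses a differential measure) and all parameter values are illustrative assumptions.

```python
from typing import Iterable, List, Sequence


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def apply_hangover(initial: Iterable[bool], hangover: int = 2) -> List[bool]:
    """Extend each active run by `hangover` frames (fixed, not adaptive)."""
    final = []
    remaining = 0
    for active in initial:
        if active:
            remaining = hangover      # restart the hangover counter
            final.append(True)
        elif remaining > 0:
            remaining -= 1            # inactive frame kept active
            final.append(True)
        else:
            final.append(False)
    return final


print(apply_hangover([False, True, False, False, False, True, False]))
# [False, True, True, True, False, True, True]
print(zero_crossing_rate([1.0, -1.0, 1.0, -1.0]))  # 1.0
```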
This Recommendation provides a detailed description of the overall GSAD configuration, including the operating points; the VAD module; the speech/music discrimination module and the silence detection module. The Recommendation also contains an electronic attachment with the ANSI C source code which forms an integral part of this Recommendation, and a set of test vectors.



Recommendation ITU-T G.722: 7 kHz audio-coding within 64 kbit/s

Rec. ITU-T G.722 describes the characteristics of an audio (50 to 7000 Hz) coding system which may be used for a variety of higher quality speech applications. The coding system uses sub-band adaptive differential pulse code modulation (SB-ADPCM) within a bit rate of 64 kbit/s. The system is henceforth referred to as 64 kbit/s (7 kHz) audio coding. In the SB-ADPCM technique used, the frequency band is split into two sub-bands (higher and lower) and the signals in each sub-band are encoded using ADPCM. The system has three basic modes of operation corresponding to the bit rates used for 7 kHz audio coding: 64, 56 and 48 kbit/s. The latter two modes allow an auxiliary data channel of 8 and 16 kbit/s respectively to be provided within the 64 kbit/s by making use of bits from the lower sub-band.
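The relation between the three modes and the auxiliary data channel follows from the bit allocation. The sketch below assumes the usual G.722 allocation, in which each sub-band is sampled at 8 kHz, the higher sub-band always uses 2 bits per sample, and the lower sub-band uses 6, 5 or 4 bits depending on the mode; the names and table layout are illustrative.

```python
SUBBAND_RATE_HZ = 8000   # per sub-band, after the 16 kHz input is split
HIGHER_BAND_BITS = 2     # assumed fixed allocation
CHANNEL_KBPS = 64

# mode -> bits per lower sub-band sample (assumption stated above)
LOWER_BAND_BITS = {1: 6, 2: 5, 3: 4}


def audio_rate_kbps(mode: int) -> int:
    """Bit rate spent on audio in the given mode."""
    bits = LOWER_BAND_BITS[mode] + HIGHER_BAND_BITS
    return bits * SUBBAND_RATE_HZ // 1000


def aux_rate_kbps(mode: int) -> int:
    """Capacity freed for the auxiliary data channel."""
    return CHANNEL_KBPS - audio_rate_kbps(mode)


for mode in (1, 2, 3):
    print(mode, audio_rate_kbps(mode), aux_rate_kbps(mode))
# 1 64 0
# 2 56 8
# 3 48 16
```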
Rec. ITU-T G.722 Appendix II describes digital test sequences for the verification of the ITU-T G.722 64 kbit/s SB-ADPCM 7 kHz codec. This guide gives information concerning the digital test sequences which should be used to aid verification of implementation of the ADPCM codec part of the wideband coding algorithm.



Recommendation ITU-T G.722.1: Coding at 24 and 32 kbit/s for hands-free operation in systems with low frame loss

Rec. ITU-T G.722.1 describes a low complexity encoder and decoder that may be used for 7 kHz bandwidth audio signals working at 24 kbit/s or 32 kbit/s. Furthermore, this algorithm is recommended for use in hands-free applications such as conferencing where there is a low probability of frame loss. It may be used with speech or music inputs.
The digital input to the coder may be in a 14-, 15- or 16-bit 2's complement format, at a sampling rate of 16 kHz (handled in the same way as in Recommendation ITU-T G.722). The analogue and digital interface circuitry at the encoder input and decoder output should conform to the same specifications described in Recommendation ITU-T G.722. The algorithm is based on transform technology, using a Modulated Lapped Transform (MLT). It operates on 20 ms frames (320 samples) of audio. Because the transform window (basis function length) is 640 samples and a 50 percent (320 samples) overlap is used between frames, the effective look-ahead buffer size is 20 ms. Hence the total algorithmic delay of 40 ms is the sum of the frame size plus look-ahead. All other delays are due to computational and network transmission delays.
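The delay figures above can be checked with a few lines of arithmetic. This sketch only restates the numbers given in the paragraph (16 kHz sampling, 320-sample frames, 640-sample window) and adds the resulting number of bits per frame at each bit rate; the variable names are illustrative.

```python
SAMPLE_RATE_HZ = 16000
FRAME_SAMPLES = 320
WINDOW_SAMPLES = 640


def to_ms(samples: int) -> float:
    """Convert a sample count at 16 kHz to milliseconds."""
    return samples * 1000 / SAMPLE_RATE_HZ


frame_ms = to_ms(FRAME_SAMPLES)                         # 20.0
lookahead_ms = to_ms(WINDOW_SAMPLES - FRAME_SAMPLES)    # 20.0, the 50 percent overlap
algorithmic_delay_ms = frame_ms + lookahead_ms          # 40.0


def bits_per_frame(rate_kbps: int) -> int:
    """Bits available to code one 20 ms frame."""
    return int(rate_kbps * frame_ms)


print(algorithmic_delay_ms, bits_per_frame(24), bits_per_frame(32))
# 40.0 480 640
```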
Rec. ITU-T G.722.1 includes a software package which contains the encoder and decoder source code and a set of test vectors for developers. These vectors are a tool that can provide an indication of success in implementing this codec.



Recommendation ITU-T G.722.2: Wideband coding of speech at around 16 kbit/s using Adaptive Multi-Rate Wideband (AMR-WB)

Rec. ITU-T G.722.2 describes the high quality Adaptive Multi-Rate Wideband (AMR-WB) encoder and decoder that is primarily intended for 7 kHz bandwidth speech signals. AMR-WB operates at a multitude of bit rates ranging from 6.6 kbit/s to 23.85 kbit/s. The bit rate may be changed at any 20-ms frame boundary.
Annex C includes an integrated C source code software package which contains the implementation of the ITU-T G.722.2 encoder and decoder and its Annexes A and B and Appendix I.
A set of digital test vectors for developers is provided in Annex D. These test vectors are a verification tool that can provide an indication of success in implementing this codec. Digital test sequences are necessary to test for a bit-exact implementation of the adaptive, multi-rate wideband (AMR-WB) speech-transcoder; voice-activity detection; comfort noise generation; and source controlled rate operation.



Recommendation ITU-T G.723.1: Dual rate speech coder for multimedia communications transmitting at 5.3 and 6.3 kbit/s

Rec. ITU-T G.723.1 specifies a coded representation that can be used for compressing the speech or other audio signal component of multimedia services at a very low bit rate. In the design of this coder, the principal application considered was very low bit-rate visual telephony as part of the overall ITU-T H.324 family of Recommendations. This coder has two bit rates associated with it (5.3 and 6.3 kbit/s).



Recommendation ITU-T G.726: 40, 32, 24, 16 kbit/s Adaptive Differential Pulse Code Modulation (ADPCM)

Rec. ITU-T G.726 specifies the characteristics recommended for the conversion of a 64 kbit/s A-law or mu-law pulse code modulation (PCM) channel to and from a 40, 32, 24 or 16 kbit/s channel. The conversion is applied to the PCM bit stream using an ADPCM transcoding technique. The relationship between the voice frequency signals and the PCM encoding/decoding laws is fully specified in Recommendation ITU-T G.711.
The principal application of 24 and 16 kbit/s channels is for overload channels carrying voice in Digital Circuit Multiplication Equipment (DCME).
The principal application of 40 kbit/s channels is to carry data modem signals in DCME, especially for modems operating at greater than 4800 bit/s.
Appendix II describes the test sequences (vectors) for the ADPCM algorithms of Rec. ITU-T G.726 at the four fixed bit rates (16, 24, 32 and 40 kbit/s) for both A-law and mu-law.
NOTE: Rec. ITU-T G.726 is the consolidation of Rec. ITU-T G.721 (1988) and Rec. ITU-T G.723 (1988), which are now superseded as individual Recommendations.



Recommendation ITU-T G.727: 5-, 4-, 3- and 2-bit/sample embedded adaptive differential pulse code modulation (ADPCM)

Rec. ITU-T G.727 specifies embedded adaptive differential pulse code modulation (ADPCM) algorithms with 5, 4, 3 and 2 bits per sample (i.e., at rates of 40, 32, 24 and 16 kbit/s). It gives the characteristics recommended for the conversion of 64 kbit/s A-law or mu-law PCM channels to and from variable-rate embedded ADPCM channels.
The Recommendation defines the transcoding law when the source signal is a pulse-code modulated signal at a pulse rate of 64 kbit/s, developed from voice frequency analogue signals, as fully specified by Rec. ITU-T G.711.
Applications that can benefit from other embedded ADPCM algorithms include those in which:
  • the encoder is aware and the decoder is not aware of the way in which the ADPCM codeword bits have been altered,
  • both the encoder and decoder are aware of the ways the codewords are altered,
  • neither the encoder nor the decoder is aware of the ways in which the bits have been altered.
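Altering codeword bits in transit can be sketched as follows. This assumes each codeword is held as an integer whose least significant bits are the enhancement bits, so that a network node can discard them with a right shift. Nothing here reproduces the quantizer tables of the Recommendation, and the function names are hypothetical.

```python
def drop_bits(codeword: int, bits_in: int, bits_out: int) -> int:
    """Reduce a bits_in-wide embedded codeword to bits_out bits.

    Assumption: enhancement bits occupy the least significant positions.
    """
    if not 2 <= bits_out <= bits_in <= 5:
        raise ValueError("codewords are 2 to 5 bits wide")
    if not 0 <= codeword < (1 << bits_in):
        raise ValueError("codeword does not fit in bits_in bits")
    return codeword >> (bits_in - bits_out)


def rate_kbps(bits_per_sample: int, sample_rate_hz: int = 8000) -> int:
    """Channel rate for a given codeword width at 8 kHz sampling."""
    return bits_per_sample * sample_rate_hz // 1000


cw = 0b10110                       # 5-bit codeword (40 kbit/s)
print(bin(drop_bits(cw, 5, 3)))    # 0b101 (24 kbit/s)
print(rate_kbps(5), rate_kbps(3))  # 40 24
```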
Appendix I describes the test sequences (vectors) for the embedded ADPCM algorithms of Rec. ITU-T G.727.



Recommendation ITU-T G.728: Coding of speech at 16 kbit/s using low-delay code excited linear prediction (LD-CELP)

Rec. ITU-T G.728 contains the description of an algorithm for the coding of speech signals at 16 kbit/s using low-delay code-excited linear prediction (LD-CELP).
The LD-CELP algorithm consists of an encoder and a decoder. The essence of CELP techniques, which is an analysis-by-synthesis approach to codebook search, is retained in LD-CELP. LD-CELP, however, uses backward adaptation of predictors and gain to achieve an algorithmic delay of 0.625 ms. Only the index to the excitation codebook is transmitted. The predictor coefficients are updated through LPC analysis of previously quantized speech. The excitation gain is updated by using the gain information embedded in the previously quantized excitation. The block size for the excitation vector and gain adaptation is only five samples. A perceptual weighting filter is updated using LPC analysis of the unquantized speech.
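The 0.625 ms figure and the size of the transmitted index follow directly from the block size and bit rate. The sketch below assumes telephone-band sampling at 8 kHz, which the summary does not state explicitly; the variable names are illustrative.

```python
SAMPLE_RATE_HZ = 8000   # assumption: telephone-band sampling
VECTOR_SAMPLES = 5      # block size of the excitation vector
BIT_RATE_BPS = 16000

# Duration of one excitation vector, which sets the algorithmic delay.
vector_ms = VECTOR_SAMPLES * 1000 / SAMPLE_RATE_HZ

# Only the codebook index is sent, so this is the index width in bits.
bits_per_vector = BIT_RATE_BPS * VECTOR_SAMPLES // SAMPLE_RATE_HZ

# Number of excitation candidates addressable by that index.
codebook_entries = 2 ** bits_per_vector

print(vector_ms, bits_per_vector, codebook_entries)
# 0.625 10 1024
```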



Recommendation ITU-T G.729: Coding of speech at 8 kbit/s using conjugate-structure algebraic-code-excited linear prediction (CS-ACELP)

Rec. ITU-T G.729 contains the description of an algorithm for the coding of speech signals at 8 kbit/s using Conjugate-Structure Algebraic-Code-Excited Linear-Prediction (CS-ACELP). This coder is designed to operate with a digital signal obtained by first performing telephone bandwidth filtering (Rec. ITU-T G.712) of the analogue input signal, then sampling it at 8000 Hz, followed by conversion to 16-bit linear PCM for the input to the encoder. The output of the decoder should be converted back to an analogue signal by similar means. Other input/output characteristics, such as those specified by Rec. ITU-T G.711 for 64 kbit/s PCM data, should be converted to 16-bit linear PCM before encoding, or from 16-bit linear PCM to the appropriate format after decoding. The bitstream from the encoder to the decoder is defined within this Recommendation.
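As an example of the conversion step, a G.711 mu-law byte can be expanded to linear PCM before it is passed to the encoder. The sketch below uses the widely circulated segment-based expansion with a bias of 0x84. It is an illustration rather than the normative G.711 procedure, and A-law input would need a different routine.

```python
BIAS = 0x84  # offset added by the mu-law encoder before segment coding


def ulaw_to_linear(byte: int) -> int:
    """Expand one 8-bit mu-law codeword to a signed linear sample."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("mu-law codeword must fit in 8 bits")
    u = ~byte & 0xFF                 # codewords are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07       # segment number
    mantissa = u & 0x0F              # position inside the segment
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if sign else magnitude


def decode(payload: bytes) -> list:
    """Expand a whole mu-law payload to a list of linear samples."""
    return [ulaw_to_linear(b) for b in payload]


print(decode(bytes([0xFF, 0x80, 0x00])))  # [0, 32124, -32124]
```

All results fit in a signed 16-bit sample, which matches the encoder input format described above.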
Rec. ITU-T G.729 and its Annexes and Appendices offer different functionalities in terms of various bit rates and/or DTX operations using either fixed point or floating point arithmetic. The table below summarizes these functionalities.

 

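Functionality \ Annex | -  | A  | B  | C  | D  | E  | F  | G  | H  | I  | C+ |
Low complexity        |    | X  | X  |    |    |    |    |    |    |    |    |
Fixed-point           | X  | X  | X  |    | X  | X  | X  | X  | X  | X  |    |
Floating-point        |    |    |    | X  |    |    |    |    |    |    | X  |
8 kbit/s              | X  | X  | X  | X  | X  | X  | X  | X  | X  | X  | X  |
6.4 kbit/s            |    |    |    |    | X  |    | X  |    | X  | X  | X  |
11.8 kbit/s           |    |    |    |    |    | X  |    | X  | X  | X  | X  |
DTX                   |    |    | X  |    |    |    | X  | X  |    | X  | X  |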
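For programmatic use, the functionality table can be held as a mapping from each annex to the set of features it supports. The sketch below transcribes the table as given here; the dictionary name and feature labels are illustrative, and "-" stands for the main body of the Recommendation.

```python
# Transcribed from the table; keys are annex labels, values are feature sets.
ANNEX_FEATURES = {
    "-":  {"fixed-point", "8"},
    "A":  {"low-complexity", "fixed-point", "8"},
    "B":  {"low-complexity", "fixed-point", "8", "dtx"},
    "C":  {"floating-point", "8"},
    "D":  {"fixed-point", "8", "6.4"},
    "E":  {"fixed-point", "8", "11.8"},
    "F":  {"fixed-point", "8", "6.4", "dtx"},
    "G":  {"fixed-point", "8", "11.8", "dtx"},
    "H":  {"fixed-point", "8", "6.4", "11.8"},
    "I":  {"fixed-point", "8", "6.4", "11.8", "dtx"},
    "C+": {"floating-point", "8", "6.4", "11.8", "dtx"},
}


def annexes_with(*features: str) -> list:
    """Annexes that offer every requested feature, in table order."""
    wanted = set(features)
    return [annex for annex, have in ANNEX_FEATURES.items() if wanted <= have]


print(annexes_with("dtx", "11.8"))       # ['G', 'I', 'C+']
print(annexes_with("floating-point"))    # ['C', 'C+']
```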



Recommendation ITU-T G.729.1: ITU-T G.729 based embedded variable bit-rate coder: An 8-32 kbit/s, scalable wideband, coder-bitstream interoperable with ITU-T G.729 codecs

Rec. ITU-T G.729.1 describes an 8-32 kbit/s scalable wideband speech and audio coding algorithm interoperable with ITU-T G.729, ITU-T G.729A and ITU-T G.729B codecs. The output of the ITU-T G.729.1 coder has a bandwidth of 50-4000 Hz when operated at 8 and 12 kbit/s and 50-7000 Hz when operated from 14 to 32 kbit/s. At 8 kbit/s, ITU-T G.729.1 codecs are fully interoperable with codecs conforming to Recommendation ITU-T G.729, Recommendation ITU-T G.729 Annex A and Recommendation ITU-T G.729 Annex B. The coder operates on 20 ms frames and has an algorithmic delay of 48.9375 ms. By default, the encoder input and decoder output are sampled at 16 kHz.
The encoder produces an embedded bitstream structured in 12 layers corresponding to the 12 available bit rates from 8 to 32 kbit/s. The bitstream can be truncated at the decoder side or by any component of the communication system to adjust the bit rate "on the fly" to the desired value, with no need for out-of-band signalling.
The underlying algorithm is based on a three-stage coding structure: embedded Code-Excited Linear Predictive (CELP) coding of the lower band (50-4000 Hz), parametric coding of the higher band (4000-7000 Hz) by Time-Domain Bandwidth Extension (TD-BWE), and enhancement of the full band (50-7000 Hz) by a predictive transform coding technique referred to as Time-Domain Aliasing Cancellation (TDAC).
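The layer structure can be summarised in a small lookup. The sketch below assumes that the rates above 12 kbit/s advance in 2 kbit/s steps, which gives the 12 rates mentioned above, and maps each rate to its output bandwidth and frame size; the function names are hypothetical.

```python
# Assumption: 8 and 12 kbit/s, then 14 to 32 kbit/s in 2 kbit/s steps.
LAYER_RATES_KBPS = [8, 12] + list(range(14, 33, 2))
FRAME_MS = 20


def bandwidth_hz(rate_kbps: int) -> tuple:
    """Output bandwidth (low, high) in Hz for a given layer rate."""
    if rate_kbps not in LAYER_RATES_KBPS:
        raise ValueError(f"not a layer rate: {rate_kbps}")
    return (50, 4000) if rate_kbps <= 12 else (50, 7000)


def frame_bytes(rate_kbps: int) -> int:
    """Bytes per 20 ms frame when the bitstream is cut at this rate."""
    return rate_kbps * FRAME_MS // 8


print(len(LAYER_RATES_KBPS))                 # 12
print(bandwidth_hz(12), bandwidth_hz(14))    # (50, 4000) (50, 7000)
print(frame_bytes(8), frame_bytes(32))       # 20 80
```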



Recommendation ITU-T G.1050: Network model for evaluating multimedia transmission performance over Internet Protocol

Rec. ITU-T G.1050 describes an IP network model that can be used for evaluating the performance of IP streams. The focus is on packet delay, delay variation and loss. IP streams from any type of network device can be evaluated using this model.
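The three quantities the model focuses on can be computed from a simple packet trace. The sketch below is a generic illustration, not the G.1050 model itself: it takes a list of one-way delays with `None` marking a lost packet, and it uses the mean absolute difference between consecutive received packets as one possible definition of delay variation. The trace values are hypothetical.

```python
from statistics import mean
from typing import List, Optional


def summarize(delays_ms: List[Optional[float]]) -> dict:
    """Summarise a trace holding one delay per sent packet (None if lost)."""
    received = [d for d in delays_ms if d is not None]
    if not received:
        raise ValueError("no packets received")
    # Delay variation: mean absolute difference between consecutive
    # received packets (one of several possible definitions).
    diffs = [abs(b - a) for a, b in zip(received, received[1:])]
    return {
        "mean_delay_ms": mean(received),
        "delay_variation_ms": mean(diffs) if diffs else 0.0,
        "loss_ratio": 1 - len(received) / len(delays_ms),
    }


trace = [20.0, 22.0, None, 21.0, 25.0]   # hypothetical measurements
print(summarize(trace))
```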
