Summary

Recommendation ITU-T G.718 describes a narrow‑band (NB) and wideband (WB) embedded variable bit-rate coding algorithm for speech and audio. The algorithm operates from 8 to 32 kbit/s and is designed to be robust to frame erasures.

This codec provides state-of-the-art NB speech quality at the lower bit rates and state-of-the-art WB speech quality across the complete range of bit rates. In addition, the ITU-T G.718 codec is designed to be highly robust to frame erasures, which improves speech quality in IP transport applications on fixed, wireless and mobile networks. Despite its embedded nature, the codec also performs well with both NB and WB generic audio signals.

This codec has an embedded scalable structure, enabling maximum flexibility in the transport of voice packets through today's IP networks and future media-aware networks. In addition, the embedded structure of ITU-T G.718 allows the codec to be extended with super‑wideband and stereo capabilities through additional layers, which are currently under development. The bitstream may be truncated at the decoder side or by any component of the communication system to instantaneously adjust the bit rate to the desired value, without the need for out-of-band signalling. The encoder produces an embedded bitstream structured in five layers corresponding to the five available bit rates: 8, 12, 16, 24 and 32 kbit/s.
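Because every layer ends at a fixed cumulative bit rate and frames are 20 ms long, truncating the bitstream amounts to keeping a prefix of each frame. The sketch below illustrates this under stated assumptions: the layers are assumed to be packed contiguously, layer 1 first, in MSB-first bytes. The actual G.718 bitstream format is defined in the Recommendation and is not reproduced here, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FRAME_MS 20

/* Cumulative bit rate (kbit/s) at which each of the five layers ends. */
static const int layer_rate_kbps[5] = {8, 12, 16, 24, 32};

/* Hypothetical helper: bits per 20 ms frame up to and including layer 1..5.
   kbit/s multiplied by ms gives bits directly. */
static int bits_up_to_layer(int layer) {
    assert(layer >= 1 && layer <= 5);
    return layer_rate_kbps[layer - 1] * FRAME_MS;
}

/* Hypothetical helper: keep only the bits of layers 1..layer.
   Assumes bits are packed MSB-first with layer 1 at the start of the frame.
   Copies the kept bits into out and returns how many bits were kept. */
static int truncate_frame(const unsigned char *in, int in_bits, int layer,
                          unsigned char *out) {
    int keep = bits_up_to_layer(layer);
    assert(keep <= in_bits);            /* cannot add layers that were not sent */
    int full = keep / 8, rem = keep % 8;
    memcpy(out, in, (size_t)full);
    if (rem)                            /* partial last byte: mask unused bits */
        out[full] = in[full] & (unsigned char)(0xFF << (8 - rem));
    return keep;
}
```

With 20 ms frames the cumulative sizes are 160, 240, 320, 480 and 640 bits, so the individual layers carry 160, 80, 80, 160 and 160 bits. All of these are whole bytes, so the partial-byte branch is only a safeguard.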

The ITU-T G.718 encoder can accept WB signals sampled at 16 kHz, or NB signals sampled at either 16 or 8 kHz. Similarly, the decoder can output WB at 16 kHz, or NB at either 16 or 8 kHz. The encoder detects input signals that are sampled at 16 kHz but band-limited to NB.

The output of the ITU-T G.718 codec has an audio bandwidth of 300-3400 Hz (NB) at 8 and 12 kbit/s, and 50-7000 Hz (WB) from 8 to 32 kbit/s.

The high-quality codec core represents a significant performance improvement. At 8 kbit/s it provides wideband clean speech quality equivalent to the ITU-T G.722.2 codec at 12.65 kbit/s, whilst the 8 kbit/s narrow‑band operating mode provides clean speech quality equivalent to the ITU‑T G.729E codec at 11.8 kbit/s.

The codec operates on 20‑ms frames and has a maximum algorithmic delay of 42.875 ms for wideband input and output signals, and 43.875 ms for narrow‑band input and output signals. The codec can also be operated in a low-delay mode by setting the encoder and decoder maximum bit rates to 12 kbit/s. In this case, the maximum algorithmic delay is reduced by 10 ms.
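The frame length in samples and the resulting low-delay figures follow directly from the numbers above. The text states the 10 ms reduction without qualification, so the worked values below assume it applies equally to the WB and NB configurations.

```latex
\begin{aligned}
N_{16\,\mathrm{kHz}} &= 16\,000~\mathrm{Hz} \times 0.020~\mathrm{s} = 320 \text{ samples per frame},\\
N_{8\,\mathrm{kHz}}  &= 8\,000~\mathrm{Hz} \times 0.020~\mathrm{s} = 160 \text{ samples per frame},\\
D_{\mathrm{WB,\,low\text{-}delay}} &= 42.875~\mathrm{ms} - 10~\mathrm{ms} = 32.875~\mathrm{ms},\\
D_{\mathrm{NB,\,low\text{-}delay}} &= 43.875~\mathrm{ms} - 10~\mathrm{ms} = 33.875~\mathrm{ms}.
\end{aligned}
```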

The codec also incorporates an alternate coding mode, with a minimum bit rate of 12.65 kbit/s, which is bitstream interoperable with Recommendation ITU-T G.722.2 and with the 3GPP AMR-WB and 3GPP2 VMR-WB mobile WB speech coding standards. In this option, the 12.65 kbit/s core replaces layers 1 and 2. Layers 3-5 are similar to the default option, except that layer 3 uses fewer bits to compensate for the extra bits of the 12.65 kbit/s core. The decoder can also decode all other ITU‑T G.722.2 operating modes. Furthermore, a new annex to this Recommendation is under development that will efficiently enable bitstream interoperability with the 3GPP2 EVRC‑WB codec. This Recommendation also includes discontinuous transmission (DTX) and comfort noise generation (CNG) algorithms that enable bandwidth savings during inactive periods. An integrated noise reduction algorithm can be used provided that the communication session is limited to 12 kbit/s.
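The bit budget of layer 3 in the interoperable option can be worked out per 20 ms frame. This sketch assumes that layer 3 still ends at 16 kbit/s, as in the default option. The summary implies this but does not state it explicitly. Under that assumption, layer 3 carries 13 fewer bits than in the default mode.

```latex
\begin{aligned}
B_{\text{core}}        &= 12.65~\text{kbit/s} \times 20~\text{ms} = 253 \text{ bits},\\
B_{\text{L1+L2}}       &= 12~\text{kbit/s} \times 20~\text{ms} = 240 \text{ bits},\\
B_{\text{L3, default}} &= (16 - 12)~\text{kbit/s} \times 20~\text{ms} = 80 \text{ bits},\\
B_{\text{L3, interop}} &= (16 - 12.65)~\text{kbit/s} \times 20~\text{ms} = 67 \text{ bits}.
\end{aligned}
```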

The underlying algorithm is based on a two-stage coding structure. The lower two layers use code-excited linear prediction (CELP) coding of the 50-6400 Hz band, and the core layer uses signal classification to select an optimized coding mode for each frame. The higher layers encode the weighted error signal from the lower layers using overlap-add modified discrete cosine transform (MDCT) coding. Several techniques are used to encode the MDCT coefficients to maximize performance for both speech and music.

Corrigendum 1 (11/2008) corrects a number of minor problems that have been identified in the fixed-point ANSI C source code of the base text of this Recommendation.

Amendment 1 (03/2009) introduces some additional minor corrections to the fixed-point ANSI C source code and to the text of the Recommendation. It also adds a check of the layer 5 unused bit against its default value, together with a procedure for erasing layer 5 if the bit does not have the default value. Amendment 1 also introduces the new Annex A, which defines an alternative implementation of the ITU-T G.718 algorithm using floating-point arithmetic, intended for DSP hardware optimized for floating-point operations. The accompanying floating-point ANSI C source code is fully interoperable with the fixed-point code.
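The layer 5 check can be pictured as a decoder-side guard that falls back to 24 kbit/s when the unused bit is wrong. In this sketch, the position of the unused bit and its default value are hypothetical parameters. The Recommendation defines both, but this summary does not give them. "Erasing layer 5" is modelled here as decoding only the first 480 bits (layers 1-4) of the frame, with the same MSB-first packing assumption as for truncation.

```c
#include <assert.h>

#define BITS_LAYER4_END 480   /* 24 kbit/s x 20 ms */
#define BITS_LAYER5_END 640   /* 32 kbit/s x 20 ms */

/* Read bit `pos` from an MSB-first packed byte buffer (assumed packing). */
static int get_bit(const unsigned char *buf, int pos) {
    return (buf[pos / 8] >> (7 - pos % 8)) & 1;
}

/* Returns how many bits of the frame the decoder should use.
   unused_bit_pos and default_value are hypothetical: the real values are
   specified in the Recommendation, not in this summary. */
static int layer5_checked_bits(const unsigned char *frame, int nbits,
                               int unused_bit_pos, int default_value) {
    if (nbits < BITS_LAYER5_END)
        return nbits;                       /* layer 5 not received: nothing to check */
    assert(unused_bit_pos >= BITS_LAYER4_END && unused_bit_pos < BITS_LAYER5_END);
    if (get_bit(frame, unused_bit_pos) != default_value)
        return BITS_LAYER4_END;             /* erase layer 5, decode at 24 kbit/s */
    return nbits;
}
```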

While Corrigendum 2 (08/2009) includes further corrections to address minor problems found in both the fixed-point and floating-point implementations, its main benefit is the streamlining of the fixed‑point implementation. This streamlining reduces the complexity of the codec from 69 to 57 WMOPS, whilst remaining bit-exact with the original code on both steps of the characterization test. This 17% complexity reduction is significant and makes G.718 more attractive to implement.
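The quoted percentage follows from the two WMOPS figures:

```latex
\frac{69 - 57}{69} = \frac{12}{69} \approx 0.174 \approx 17\,\%
```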

This Recommendation contains an electronic attachment with the ANSI C source code, which is an integral part of this Recommendation.

This edition integrates all changes introduced by Corrigendum 1 (11/2008), Amendment 1 (03/2009) and Corrigendum 2 (08/2009), including the associated updated ANSI C source code.