Rec. ITU-T P.565 (01/2020) Framework for creation and performance testing of machine learning based models for the assessment of transmission network impact on speech quality for mobile packet-switched voice services
Summary
History
FOREWORD
Table of Contents
1 Scope
2 References
3 Definitions
4 Abbreviations and acronyms
5 Conventions
6 Applications for models developed based on the framework
7 High level overview of the framework
     7.1 Framework architecture
     7.2 Generic jitter files
     7.3 Reference speech file
8 Learning and validation database generator for EVS use case
     8.1 Simulate network block
     8.2 EVS coding and decoding blocks: EVS codec and codec parameters
     8.3 MOS grading block
     8.4 EVS process jitter file block: processing of the jitter file
          8.4.1 DTX cleaning
          8.4.2 Add-on codec information
     8.5 Learning and validation databases
9 Machine learning module for EVS use case
     9.1 ML algorithm
     9.2 ML features
          9.2.1 ML features creation
          9.2.2 ML features selection
10 Statistical evaluation module
11 Framework's inputs and outputs
12 Aspects related to the run-time of models developed based on the framework
     12.1 Operation mode
     12.2 Reference speech samples
     12.3 Pre-processing at run time
     12.4 The measurement procedure
13 Requirements for models developed based on the framework
     13.1 Mandatory conditions and procedures
     13.2 Minimum performance requirements
Annex A  AMR WB codec use case
     A.1 Differences caused by AMR WB use case
     A.2 Learning and validation database generator: simulator
     A.3 Machine learning
     A.4 Performance results
Annex B  OTT codec use case
     B.1 Differences caused by voice OTT use case
     B.2 Learning and validation database generator: simulator
     B.3 Machine learning module
     B.4 Performance results
Annex C  ML overfitting/underfitting test
Annex D  Check list of requirements for a model developed based on the framework
Annex E  Conditions and requirements of an additional independent validation of a model developed based on the framework
     E.1 Conditions and requirements of an independent validation
     E.2 Validation procedure
Annex F  Electronic attachments
Appendix I  Procedure for feature extraction based on machine learning
     I.1 Create statistical features
     I.2 Create jitter buffer-based features
     I.3 Codec based features (rate and channel aware)
     I.4 Create reference speech-based features
          I.4.1 Types of reference speech-based features
          I.4.2 Features' weighting function calculation
Appendix II  Example of number of scores versus models' performance
Appendix III  Descriptions of generic jitter files creation
     III.1 Learning and validation generic jitter files
          III.1.1 Live (drive test) data modulated with simulations
          III.1.2 Gilbert burst packet loss and burst jitter
          III.1.3 Gilbert severe burst jitter
          III.1.4 Random packet loss and random jitter
          III.1.5 Manually designed test cases
     III.2 Unknown validation live data sets description
Appendix IV  Justification of the minimum requirements based on performance results' analysis
     IV.1 Enhanced voice services use case
     IV.2 AMR WB use case
     IV.3 Voice OTT use case
Bibliography