Page 92 - ITU Journal, ICT Discoveries, Volume 3, No. 1, June 2020 Special issue: The future of video and immersive media
P. 92

ITU Journal: ICT Discoveries, Vol. 3(1), June 2020






          (UHD) video resolutions. The others, e.g. IVP and parts   Table 1 – Evaluation results for Pearson linear correlation.    High-

          of [10],[11] and [20] (HD), end up in between. Further   er values indicate higher correlation with associated MOS values.
          information on the sets is planned to be hosted at [42].
                                                                Set   PSNR  SSIM  MS-SSIM  VMAF  WPSNR  XPSNR
          The VMAF data for the aforementioned datasets were
          obtained with software version 1.3.15 (Sep. 2019) and   Yonsei  0.822  0.789  0.765   0.942  0.916   0.919
          VQA model version 0.6.1 (4K subvariant for UHD con-

                                                                LIVE  0.539  0.626  0.675   0.729  0.637   0.702

          tent). The VMAF software [5] also provided the PSNR

          and SSIM statistics. The MS-SSIM, WPSNR, and XPSNR    IVP   0.632  0.570  0.546   0.591  0.686   0.707
          values were calculated with proprietary C++ software,   ECVQ  0.733  0.879  0.853   0.830  0.848   0.840
          using only the luma component of the video sequences
          to derive the model output. Further metrics like those   EVVQ  0.727  0.881  0.874   0.937  0.880   0.898

          of [22],[26] as well as the handling of the chroma com-  SJTU  0.721  0.765  0.810   0.827  0.783   0.829
          ponents are planned to be added in a follow-up study   CfP   0.717  0.794  0.743   0.862  0.692   0.863
          and will also be provided at [42] once available.
                                                                [10]  0.722  0.826  0.799   0.855  0.759   0.817
          Tables 1 and 2 contain the dataset-wise (in rows) PLCC
          and SROCC results, respectively, for the comparisons of   mean  0.702  0.766  0.758   0.822  0.775   0.822
          the subjective MOS annotations and the results of each
          objective VQA measure (in columns) examined herein.
                                                               Table 2 – Evaluation results for Spearman rank order correlation.
          The closer the correlation value for a VQA metric is to   As in Table   1, smoothing is used only on the ECVQ and EVVQ sets.
          one, the better the metric succeeds in predicting aver-
          age (across frames and videos) quality as judged by a   Set   PSNR  SSIM  MS-SSIM  VMAF  WPSNR  XPSNR

          group of human viewers. For each dataset, the best re-
                                                                Yonsei  0.860  0.949  0.925   0.915  0.939   0.935

          sult is written in bold type. The LIVE and IVP sets also
          contain distortion types resembling error concealment   LIVE  0.523  0.694  0.732   0.752  0.605   0.675


          techniques applied, e. g., during IP packet loss. As the   IVP   0.647  0.635  0.574   0.580  0.690   0.709
          tested VQA methods were not explicitly developed for
          such scenarios, the correlation values here are some-  ECVQ  0.762  0.916  0.881   0.736  0.859   0.847
          what lower than those for the other sets.             EVVQ  0.764  0.908  0.911   0.874  0.905   0.926
          Overall, it can be observed that the original WPSNR of   SJTU  0.739  0.807  0.799   0.791  0.814   0.877
          [14],[17] reaches considerably higher correlation with
                                                                CfP   0.739  0.810  0.881   0.867  0.724   0.866
          the MOS data than the PSNR metric and that the XPSNR

          algorithm further increases this advantage. Moreover,   [10]  0.703  0.848  0.832   0.850  0.730   0.813
          the performance of the XPSNR is, for both PLCC as well
                                                                mean  0.717  0.821  0.817   0.796  0.783   0.831
          as SROCC, very similar to that of the best of the other
          evaluated VQA methods (VMAF for PLCC and SSIM for
          SROCC) on average, as tabulated in the “mean” rows, a   8.  DISCUSSION AND CONCLUSION
          result which indicates the clear merit of this approach.
                                                               This paper reviewed, and proposed extensions to, the
                                                               authors’ previous work on perceptually weighted peak
          7.2  Comparison of computational complexity
                                                               signal-to-noise (WPSNR) assessments of compressed
          To evaluate the computational complexity of XPSNR in   video material. More precisely, a low-complexity tem-

          relation to that of the other VQA methods tested, soft-  poral visual activity model (Sec.3),an alternative frame


          ware runtime analysis (using virtual in-RAM I/O) with   averaging method (Sec. 4), bit depth independent out-
          single-threaded execution was carried out. The results,   put value scaling (Sec. 5), and spatial 4× downsampling
          albeit not very accurate, indicate that the WPSNR and   (for very high resolutions) and in-place smoothing (for
          SSIM algorithms are about 2× as complex as the PSNR   very low resolutions) while calculating the block-wise
          design, XPSNR is about 3× as complex as PSNR, and the   visual sensitivity values   (Sec. 6) were incorporated
          MS-SSIM and VMAF methods are about 9× as complex.    into the WPSNR design. The resulting extended WPSNR

          Note that, due to independent calculability of the per-  algorithm,called XPSNR,was demonstrated to perform


          block    and       data, the WPSNR and XPSNR can     at least as well as the best competing state-of-the-art


          easily be optimized for fast, multi-threaded execution,   VQA solutions in predicting the subjective judgments
          possibly with “single instruction multiple data” (SIMD)   of a number of MOS annotated coded-video databases.
          operations on modern CPUs thanks to the inter-block   This was achieved with a notably lower computational
          independence during the spatio-temporal activity cal-  complexity than that required for obtaining, e. g., VMAF

          culation. However, unlike in VMAF, such optimizations   or MS-SSIM results, and without optimizing the XPSNR
          have not been implemented in XPSNR as of this writing.   parameters (      ,       ,  , and  ) for the particular sets



          70                                 © International Telecommunication Union, 2020
   87   88   89   90   91   92   93   94   95   96   97