Page 88 - ITU Journal, ICT Discoveries, Volume 3, No. 1, June 2020 Special issue: The future of video and immersive media
ment of the PSNR measure, termed WPSNR, which was examined further in JVET-K0206 [14] and finalized in JVET-M0091 [15]. More recently, model-free weighting was also studied [16]. The WPSNR output values were found to correlate with subjective mean opinion score (MOS) data at least as well as (MS-)SSIM; see [17],[18].

One particular advantage of the WPSNR is its backward compatibility with the conventional PSNR. Specifically, by defining the exponent $0 \le \beta \le 1$ [13] controlling the impact of the local visual activity measure on the block-wise distortion weighting parameter (see Sec. 2) as

\[ \beta = 0, \tag{1} \]

all weights reduce to 1 and, as a result, the WPSNR becomes equivalent to the PSNR [13],[17]. It is shown in [13],[19] that a block-wise perceptual weighting of the local distortion, i.e., the sum of squared errors $\mathrm{SSE}_B$, between the decoded and original picture block signal,

\[ \mathrm{SSE}'_B = w_B \cdot \mathrm{SSE}_B = w_B \cdot \sum_{[x,y] \in B} \left( s_c[x,y] - s[x,y] \right)^2, \tag{2} \]

can readily be utilized to govern the quantization step-size in an image or video encoder's bit allocation unit. In this way, an encoder can optimize its compression result for maximum performance (i.e., minimum mean weighted block SSE and, thus, maximum visual reconstruction quality) according to the WPSNR.

Although, as noted above, the WPSNR proved useful in the context of still-image coding and achieved subjective performance similar to, or even better than, MS-SSIM-based visually motivated bit allocation in video coding [19], its use as a general-purpose VQA metric for video material of varying resolution, bit depth, and dynamic range is limited. This is evident from the relatively low correlation between the WPSNR output values and the corresponding MOS data available, e.g., from the study published in [10],[11] or the results of JVET's 2017 Call for Proposals (CfP) on video compression technologies with capability beyond HEVC [20]. In fact, this correlation was found to be worse than that of (MS-)SSIM and VMAF, particularly for ultra-high-definition (UHD) and mixed 8-bit/10-bit video content with a resolution of more than, say, 2048×1280 luma samples.

1.2 Outline of this paper

Given the necessity for an improvement of the WPSNR metric as indicated in Sec. 1.1 above, this paper focuses on and proposes modifications to several details of the WPSNR algorithm. After summarizing the block-wise operation of the WPSNR in Section 2, the paper follows up with descriptions of low-complexity extensions for motion picture processing (Section 3), improved performance in case of varying video quality (Section 4) or input/output bit depth (Section 5), and the handling of videos with very high and low resolutions (Section 6). Section 7 then summarizes the results of experimental evaluation of the respectively extended WPSNR, called XPSNR in this paper, on various MOS-annotated video databases, and Section 8 concludes the paper. Note that parts of this paper were previously published in [18].

2. REVIEW OF BLOCK-BASED WPSNR

The WPSNR output for a codec $c$ and a video frame (or still image) stimulus $s$ is defined, similarly to PSNR, as

\[ \mathrm{WPSNR}_{c,s} = 10 \cdot \log_{10} \left( \frac{W \cdot H \cdot \left( 2^{BD} - 1 \right)^2}{\sum_{k \in s} \mathrm{SSE}'_k} \right), \tag{3} \]

where $W$ and $H$ are the luma-channel width and height, respectively, of $s$, $BD$ is the coding bit depth per sample,

\[ \mathrm{SSE}'_k = w_k \cdot \sum_{[x,y] \in B_k} \left( s_c[x,y] - s[x,y] \right)^2 \tag{4} \]

is the equivalent of (2) for block $B_k$ at index $k$, with $x$, $y$ as the horizontal and vertical sample coordinates, and

\[ w_k = \left( \frac{a_{pic}}{a_k} \right)^{\beta} \quad \text{with exponent } \beta = \tfrac{1}{2} \tag{5} \]

represents the visual sensitivity weight (a scale factor) associated with the $N \times N$ sized $B_k$ and calculated from the block's spatial activity measure $a_k$ and an average overall activity $a_{pic}$. Details can be found in [17]–[19].

\[ N = \mathrm{round} \left( 128 \cdot \sqrt{\frac{W \cdot H}{3840 \cdot 2160}} \right) \tag{6} \]

was chosen since, for the commonly used HD and UHD resolutions of 1920×1080 and 3840×2160 pixels, this choice conveniently aligns with the largest block size in modern video codecs. $a_{pic}$ is defined empirically such that, on average, $w_k \approx 1$ over a large set of test images and video frames with a specified resolution $W \cdot H$ and bit depth $BD$ [14]; see also Sec. 5. Hence, as indicated in Sec. 1.1, the WPSNR is a generalization of the PSNR by means of a block-wise weighting of the assessed SSE. For video signals, the frame-wise logarithmic WPSNR values are averaged arithmetically to get a single result:

\[ \mathrm{WPSNR}_c = \frac{1}{F} \cdot \sum_{i=1}^{F} \mathrm{WPSNR}_{c,s_i}, \tag{7} \]

with $F$ denoting the evaluated number of video frames.

3. EXTENSION FOR MOVING PICTURES

The spatially adaptive WPSNR method of [17],[19] and Sec. 2 can easily be extended to motion picture signals $s_i$, where $i$ is the frame index in the video sequence, by introducing a temporal adaptation into the calculation of the visual activity $a_k$. Given that in our prior studies,

\[ a_k = \max \left( a_{min}^2, \left( \frac{1}{4N^2} \sum_{[x,y] \in B_k} \left| h_{s_i}[x,y] \right| \right)^2 \right), \tag{8} \]
© International Telecommunication Union, 2020
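The block-wise WPSNR computation reviewed in Sec. 2 (Eqs. (3) to (7)) can be illustrated with a minimal Python sketch. It is not the reference implementation. It assumes the weight takes the form w_k = (a_pic / a_k)^β, that frames are given as plain 2-D lists of luma samples, and that the block activity a_k is supplied by a caller-provided function. The names `block_size`, `wpsnr_frame`, `wpsnr_sequence`, and the `activity` callback are hypothetical, and the activity measure of Eq. (8) is deliberately not implemented here.

```python
import math


def block_size(width, height):
    """Block size N as in Eq. (6): 128 scaled by the square root of the
    luma sample count relative to UHD (3840x2160)."""
    return round(128 * math.sqrt((width * height) / (3840 * 2160)))


def wpsnr_frame(ref, dec, bit_depth, a_pic, activity, beta=0.5):
    """Frame-level WPSNR following Eqs. (3)-(5).

    ref, dec : 2-D lists (rows of luma samples) of identical size H x W.
    activity : hypothetical callback (ref, x0, y0, n) -> a_k > 0 that
               returns the visual activity of the block at (x0, y0).
    a_pic    : average overall activity (assumed to be given).
    """
    height, width = len(ref), len(ref[0])
    # Guard for tiny test frames, where Eq. (6) would round down to 0.
    n = max(1, block_size(width, height))
    weighted_sse = 0.0
    for y0 in range(0, height, n):
        for x0 in range(0, width, n):
            # Plain SSE of the current block, clipped at the frame border.
            sse = 0
            for y in range(y0, min(y0 + n, height)):
                for x in range(x0, min(x0 + n, width)):
                    diff = dec[y][x] - ref[y][x]
                    sse += diff * diff
            # Visual sensitivity weight, assumed form of Eq. (5).
            w_k = (a_pic / activity(ref, x0, y0, n)) ** beta
            weighted_sse += w_k * sse  # Eq. (4)
    if weighted_sse == 0:
        return math.inf  # identical pictures
    peak = (1 << bit_depth) - 1
    return 10 * math.log10(width * height * peak * peak / weighted_sse)


def wpsnr_sequence(frame_values):
    """Arithmetic mean of the frame-wise WPSNR values, as in Eq. (7)."""
    return sum(frame_values) / len(frame_values)


if __name__ == "__main__":
    print(block_size(1920, 1080))  # 64
    print(block_size(3840, 2160))  # 128
```

Setting `beta=0` makes every weight equal to 1, so `wpsnr_frame` then returns the conventional PSNR, which mirrors the backward compatibility noted in Sec. 1.1.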