Page 127 - ITU Journal, ICT Discoveries, Volume 3, No. 1, June 2020 Special issue: The future of video and immersive media


sitivities and the sound pressure levels measured at the ears of the HATS during the calibration stage.

The problem was to find a vector $\vec{g}$ that takes a scalar objective function $f(\vec{g})$ to a minimum, subject to constraints:

$$\min_{\vec{g}} f(\vec{g}) \quad \text{such that} \quad \begin{cases} c(g_i) \le 0, & \forall i \\ lb \le g_i \le ub, & \forall i \end{cases} \tag{3}$$

where $f(\vec{g})$ is the SSE between responses and predictions of Equation (2), the upper bound is perfect loudness summation (ub = 10 dB), the lower bound is no summation at all (lb = 0 dB), and the non-linear constraint $c(g_i)$ is defined such that, in every i-th direction, the equivalent monaural SPLs computed by Equation (1) cannot exceed the maximum sound pressure level reproduced in the listening test:

$$c(g_i) = g_i \times \log_2\!\left(2^{L_{\text{left},i}/g_i} + 2^{L_{\text{right},i}/g_i}\right) - 75 \le 0, \quad \forall i. \tag{4}$$

Although the estimated gain for rear incidence hit the upper bound, solutions from all azimuths on the bottom plane, frontal incidence, azimuths 0° and 180° on the upper plane, and from the top loudspeaker (0°, 90°) converged to a global minimum of 0.69 dB. The gains were then slightly adjusted to make them symmetric with respect to the sagittal plane, and the largest gains in the set, corresponding to the (±90°, 0°) directions, were normalized to 1.5 dB. The resulting weights are listed in Table 5.

Table 5 – Directional weights estimated by solving a constrained minimization problem.

Azimuth θ (°)   Elevation φ (°)   Gain g (dB)
    −45             −30             0.00
      0             −30             0.00
    +45             −30             0.00
   −135               0             0.44
    −90               0             1.50
    −60               0             1.10
    −30               0             0.65
      0               0             0.00
    +30               0             0.65
    +60               0             1.10
    +90               0             1.50
   +135               0             0.44
   +180               0             0.00
   −135             +30             0.28
    −90             +30             0.93
    −45             +30             1.06
      0             +30             0.00
    +45             +30             1.06
    +90             +30             0.93
   +135             +30             0.28
   +180             +30             0.00
      0             +90             0.00

Note that all incidences with 0 dB weighting come from the bottom and median sagittal planes. Even though the estimated values seemed reasonable when compared to ITU-R BS.1770 directional weights, no insights could be gained from effects related to elevation and, consequently, no advancements were made on this front. Reproduction of the procedure in [6, 8], which consists of estimating gains per subject and taking the average, yielded an overall summation gain of g = 3.54 dB.

3.2 Regression problem

Treating gain estimation as a regression problem means training regression models to predict the sensitivity response variable. Predictors are localization cues chosen according to their correlation with subject responses. The regression model is then trained and cross-validated until it is considered adequate under some performance criteria.

Interaural Level Difference (ILD) and Interaural Time Difference (ITD) are major localization cues in spatial hearing. While the former is accounted for in Equation (1), the latter is considered when computing the Interaural Cross-Correlation Coefficient (IACC), a measure of similarity between ear signals given by the Interaural Cross-Correlation Function (IACF):

$$\mathrm{IACF}(\tau) = \frac{\int_{0\,\text{ms}}^{80\,\text{ms}} s_{\text{left}}(t)\, s_{\text{right}}(t+\tau)\, dt}{\sqrt{\left[\int_{0\,\text{ms}}^{80\,\text{ms}} s_{\text{left}}^2(t)\, dt\right]\left[\int_{0\,\text{ms}}^{80\,\text{ms}} s_{\text{right}}^2(t)\, dt\right]}}, \tag{5}$$

where $t$ is time, $\tau$ is the interaural delay and $s_{\text{left}}$ and $s_{\text{right}}$ are the signals from the left and right ears, respectively. The IACC is defined as the maximum absolute value of the IACF for delays within ±1 ms:

$$\mathrm{IACC} = \max_{\forall \tau \in [-1\,\text{ms},\, 1\,\text{ms}]} |\mathrm{IACF}(\tau)|. \tag{6}$$

The interaural delay $\tau$ at which IACF($\tau$) is maximum is an estimate of the ITD. The quantity 1 − IACC is associated with the magnitude of spatial impression of a sound [13].

Moreover, the effect of contralateral incidence observed in [8] was taken into account in a recent update of Glasberg and Moore’s loudness model, which incorporated binaural inhibition [14]. With $IF_{\text{ipsilateral}}$ denoting the inhibition factor by which the short-term loudness of the ipsilateral signal is reduced by the effect of the contralateral signal, the inhibition model can be written in the form:

$$IF_{\text{ipsilateral}} = \frac{2}{\left[1 + \left\{\operatorname{sech}\!\left(\frac{STL_{\text{contralateral}}}{STL_{\text{ipsilateral}}}\right)\right\}^{\gamma}\right]}, \tag{7}$$

where $STL_{\text{contralateral}}$ and $STL_{\text{ipsilateral}}$ are vectors of short-term loudness values for the contralateral and ipsilateral ears, and $\gamma = 1.598$. In the updated model, the short-term loudness values $STL_{\text{left}}$ and $STL_{\text{right}}$ are then divided by $IF_{\text{left}}$ and $IF_{\text{right}}$, respectively. The value of $\gamma$ was defined such that for diotic sounds the term in braces yields $[\operatorname{sech}(1)]^{1.598} = 0.5$, and a diotic sound is predicted 1.5 times louder than its monaural equivalent.
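
The arithmetic behind Equation (7) can be checked numerically. The following is a minimal sketch, not the implementation of the loudness model in [14]: it handles a single pair of scalar loudness values rather than vectors, the function names are hypothetical, and two assumptions are made that the text does not state. First, the overall loudness is taken as the sum of the two inhibited ear loudnesses. Second, for a silent ipsilateral ear the factor is set to its limiting value of 2.

```python
import math

GAMMA = 1.598  # exponent gamma from Equation (7)


def sech(x):
    """Hyperbolic secant, guarded against overflow of cosh for large |x|."""
    x = abs(x)
    return 0.0 if x > 700.0 else 1.0 / math.cosh(x)


def inhibition_factor(stl_ipsi, stl_contra, gamma=GAMMA):
    """Equation (7) for a single pair of short-term loudness values."""
    if stl_ipsi <= 0.0:
        # Assumption: use the limit of Equation (7) as the ratio grows without
        # bound (sech -> 0, factor -> 2); a silent ear contributes 0 anyway.
        return 2.0
    return 2.0 / (1.0 + sech(stl_contra / stl_ipsi) ** gamma)


def binaural_loudness(stl_left, stl_right):
    """Divide each ear's loudness by its factor, then sum (summing is assumed)."""
    left = stl_left / inhibition_factor(stl_left, stl_right)
    right = stl_right / inhibition_factor(stl_right, stl_left)
    return left + right


diotic = binaural_loudness(1.0, 1.0)    # same loudness at both ears
monaural = binaural_loudness(1.0, 0.0)  # right ear silent
print(sech(1.0) ** GAMMA, diotic / monaural)
```

Under these assumptions the first printed value is close to 0.5 and the second close to 1.5. In the diotic case each ear is divided by roughly 2/(1 + 0.5) = 4/3, so two equal ears sum to about 1.5 times one uninhibited ear.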
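
Equations (5) and (6) translate directly to sampled ear signals. The sketch below is illustrative only and is not the code used in the study: the function name `iacc`, its parameters, and the synthetic test signal are hypothetical. The integrals become sums over the first 80 ms, right-ear samples that fall outside that window are treated as zero (an assumption), and the ITD estimate is taken at the lag that maximizes |IACF|.

```python
import math
import random


def iacc(s_left, s_right, fs, window_s=0.080, max_delay_s=0.001):
    """Return (IACC, ITD estimate in seconds) following Equations (5) and (6).

    Integrals become sums over the first `window_s` seconds; samples of the
    right-ear signal that fall outside the window count as zero (assumption).
    """
    n = min(len(s_left), len(s_right), int(round(window_s * fs)))
    left, right = s_left[:n], s_right[:n]
    energy = math.sqrt(sum(x * x for x in left) * sum(x * x for x in right))
    if energy == 0.0:
        return 0.0, 0.0  # silent input: no correlation defined
    max_lag = int(round(max_delay_s * fs))
    best_val, best_lag = 0.0, 0
    for lag in range(-max_lag, max_lag + 1):
        # Discrete form of the numerator of Equation (5) at tau = lag / fs.
        lo, hi = max(0, -lag), min(n, n - lag)
        num = sum(left[t] * right[t + lag] for t in range(lo, hi))
        val = abs(num) / energy
        if val > best_val:
            best_val, best_lag = val, lag
    return best_val, best_lag / fs


# Synthetic check: the right-ear signal is the left-ear noise delayed by 4 samples.
rng = random.Random(0)
fs = 16000
delay = 4
noise = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
left = noise[delay:]
right = noise[:len(noise) - delay]
value, itd = iacc(left, right, fs)
print(value, itd)
```

With this sign convention, a positive ITD means the right-ear signal lags the left-ear signal. In the synthetic case the estimated delay equals 4 samples (0.25 ms at 16 kHz). The IACC falls slightly below 1 because the delayed copy is truncated by the 80 ms window.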
© International Telecommunication Union, 2020
