Page 127 - ITU Journal, ICT Discoveries, Volume 3, No. 1, June 2020 Special issue: The future of video and immersive media

sitivities and the sound pressure levels measured at the ears of the HATS during the calibration stage.

The problem was to find a vector $\vec{g}$ that takes a scalar objective function $f(\vec{g})$ to a minimum, subject to constraints:

\[
\min_{\vec{g}} f(\vec{g}) \quad \text{such that} \quad
\begin{cases}
c(g_i) \le 0, & \forall i \\
lb \le g_i \le ub, & \forall i
\end{cases}
\tag{3}
\]

where $f(\vec{g})$ is the SSE between responses and predictions of Equation (2), the upper bound is perfect loudness summation ($ub = 10$ dB), the lower bound is no summation at all ($lb = 0$ dB), and the non-linear constraint $c(g_i)$ is defined such that, in every $i$-th direction, the equivalent monaural SPLs computed by Equation (1) cannot exceed the maximum sound pressure level reproduced in the listening test:

\[
c(g_i) = g_i \times \log_2\!\left( 2^{L_{\mathrm{left},i}/g_i} + 2^{L_{\mathrm{right},i}/g_i} \right) - 75 \le 0, \quad \forall i.
\tag{4}
\]

Although the estimated gain for rear incidence hit the upper bound, solutions from all azimuths on the bottom plane, frontal incidence, azimuths 0° and 180° on the upper plane, and from the top loudspeaker (0°, 90°) converged to a global minimum of 0.69 dB. The gains were then adjusted slightly to make them symmetric with respect to the sagittal plane, and the largest gains in the set, corresponding to the (±90°, 0°) directions, were normalized to 1.5 dB. The resulting weights are listed in Table 5.

Table 5 – Directional weights estimated by solving a constrained minimization problem.

     Azimuth θ (°)   Elevation φ (°)   Gain g (dB)
         −45             −30              0.00
           0             −30              0.00
         +45             −30              0.00
        −135               0              0.44
         −90               0              1.50
         −60               0              1.10
         −30               0              0.65
           0               0              0.00
         +30               0              0.65
         +60               0              1.10
         +90               0              1.50
        +135               0              0.44
        +180               0              0.00
        −135             +30              0.28
         −90             +30              0.93
         −45             +30              1.06
           0             +30              0.00
         +45             +30              1.06
         +90             +30              0.93
        +135             +30              0.28
        +180             +30              0.00
           0             +90              0.00

Note that all incidences with 0 dB weighting come from the bottom and median sagittal planes. Even though the estimated values seemed reasonable when compared to ITU-R BS.1770 directional weights, no insights could be gained from effects related to elevation and, consequently, no advancements were made on this front. Reproduction of the procedure in [6, 8], which consists of estimating gains per subject and taking the average, yielded an overall summation gain of $g = 3.54$ dB.

3.2 Regression problem

Treating gain estimation as a regression problem means training regression models to predict the sensitivity response variable. Predictors are localization cues chosen according to their correlation with subject responses. A regression model is then trained and cross-validated until it is considered adequate under some performance criteria.

Interaural Level Difference (ILD) and Interaural Time Difference (ITD) are major localization cues in spatial hearing. While the former is accounted for in Equation (1), the latter is considered when computing the Interaural Cross-Correlation Coefficient (IACC), a measure of similarity between ear signals given by the Interaural Cross-Correlation Function (IACF):

\[
\mathrm{IACF}(\tau) = \frac{\int_{0\,\mathrm{ms}}^{80\,\mathrm{ms}} s_{\mathrm{left}}(t)\, s_{\mathrm{right}}(t+\tau)\, dt}{\sqrt{\left[\int_{0\,\mathrm{ms}}^{80\,\mathrm{ms}} s_{\mathrm{left}}^{2}(t)\, dt\right]\left[\int_{0\,\mathrm{ms}}^{80\,\mathrm{ms}} s_{\mathrm{right}}^{2}(t)\, dt\right]}},
\tag{5}
\]

where $t$ is time, $\tau$ is the interaural delay and $s_{\mathrm{left}}$ and $s_{\mathrm{right}}$ are the signals from the left and right ears, respectively. The IACC is defined as the maximum absolute value of the IACF for delays within ±1 ms:

\[
\mathrm{IACC} = \max_{\forall \tau \in [-1\,\mathrm{ms},\, 1\,\mathrm{ms}]} \left|\mathrm{IACF}(\tau)\right|.
\tag{6}
\]

The interaural delay $\tau$ at which IACF($\tau$) is maximum is an estimate of the ITD. The quantity 1 − IACC is associated with the magnitude of spatial impression of a sound [13].

Moreover, the effect of contralateral incidence observed in [8] was taken into account in a recent update of Glasberg and Moore’s loudness model, which incorporated binaural inhibition [14]. With $IF_{\mathrm{ipsilateral}}$ denoting the inhibition factor by which the short-term loudness of the ipsilateral signal is reduced by the effect of the contralateral signal, the inhibition model can be written in the form:

\[
IF_{\mathrm{ipsilateral}} = \frac{2}{1 + \left\{ \operatorname{sech}\!\left( \dfrac{STL_{\mathrm{contralateral}}}{STL_{\mathrm{ipsilateral}}} \right) \right\}^{\gamma}},
\tag{7}
\]

where $STL_{\mathrm{contralateral}}$ and $STL_{\mathrm{ipsilateral}}$ are vectors of short-term loudness values for the contralateral and ipsilateral ears, and $\gamma = 1.598$. In the updated model, the short-term loudness values $STL_{\mathrm{left}}$ and $STL_{\mathrm{right}}$ are then divided by $IF_{\mathrm{left}}$ and $IF_{\mathrm{right}}$, respectively. The value of $\gamma$ was defined such that for diotic sounds the term in braces yields $[\operatorname{sech}(1)]^{1.598} = 0.5$, and a diotic sound is predicted 1.5 times louder than its monaural equivalent.
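As a rough numerical illustration of Equations (5) to (7), the sketch below evaluates a discrete-time IACF on sampled ear signals, takes the IACC over delays within ±1 ms, and computes the inhibition factor for scalar short-term loudness values. It is not the implementation used in this study. The function names (iacf, iacc, inhibition_factor), the zero-padding at the window edges, the 16 kHz sampling rate and the synthetic noise signals are all assumptions made for the example.

```python
import math
import random


def iacf(left, right, lag):
    """Discrete form of Eq. (5) at an integer lag (in samples).

    `left` and `right` are equal-length lists holding the analysis window
    (80 ms in the text) and must not be all zeros.
    Assumption: samples of `right` outside the window count as zero.
    """
    n = len(left)
    num = sum(left[t] * right[t + lag] for t in range(n) if 0 <= t + lag < n)
    energy = sum(x * x for x in left) * sum(x * x for x in right)
    return num / math.sqrt(energy)


def iacc(left, right, fs, max_delay=1e-3):
    """Eq. (6): largest |IACF| for delays within +/- max_delay seconds.

    Returns (IACC, delay in seconds at which that maximum occurs).
    """
    max_lag = int(round(max_delay * fs))
    values = {lag: abs(iacf(left, right, lag))
              for lag in range(-max_lag, max_lag + 1)}
    best = max(values, key=values.get)
    return values[best], best / fs


def inhibition_factor(stl_ipsi, stl_contra, gamma=1.598):
    """Eq. (7) for scalar short-term loudness values (stl_ipsi must be > 0)."""
    sech = 1.0 / math.cosh(stl_contra / stl_ipsi)
    return 2.0 / (1.0 + sech ** gamma)


# Synthetic example (assumed data): 80 ms of noise at 16 kHz, with the
# right-ear signal equal to the left-ear signal delayed by 5 samples.
fs = 16000
rng = random.Random(0)
left = [rng.uniform(-1.0, 1.0) for _ in range(round(0.080 * fs))]
right = [0.0] * 5 + left[:-5]

value, delay = iacc(left, right, fs)
print(f"IACC = {value:.3f}, delay at maximum = {delay * 1e3:.4f} ms")

diotic = inhibition_factor(1.0, 1.0)    # equal loudness at both ears
monaural = inhibition_factor(1.0, 0.0)  # silent contralateral ear
print(f"IF diotic = {diotic:.4f}, IF monaural = {monaural:.4f}")
print(f"diotic / monaural loudness = {2.0 / diotic:.3f}")
```

For the diotic case the factor is close to 2/1.5 = 4/3, so each ear's loudness is scaled by about 3/4 and the two ears together sum to about 1.5 times the monaural value, in line with the text. With a silent contralateral ear, sech(0) = 1 and the factor is exactly 1, so monaural loudness is left unchanged.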
© International Telecommunication Union, 2020