2023
J-H. Hanschke; D. Arteaga; G. Cengarle; J. B. Lando; M. R. P. Thomas; A. J. Seefeldt Improved Panning on Non-Equidistant Loudspeakers with Direct Sound Level Compensation Proceedings Article In: Proc. Audio Eng. Soc. Convention, New York, USA, 2023.
M. R. P. Thomas; J-H. Hanschke Inverted Cardioid Topology for Multi-Radius Spherical Microphone Arrays Proceedings Article In: Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, New York, USA, 2023.
2022 |
K. Kjoerling; D. S. McGrath; H. Purnhagen; M. R. P. Thomas Methods and devices for coding soundfield representation signals Patent 2022. The present document describes a method (40) for encoding a soundfield representation (SR) input signal (101, 301) describing a soundfield at a reference position, wherein the SR input signal (101, 301) comprises a plurality of channels for a plurality of different directivity patterns of the soundfield at the reference position. The method (40) comprises extracting (401) one or more audio objects (103, 303) from the SR input signal (101, 301). Furthermore, the method (40) comprises determining (402) a residual signal (102, 302) based on the SR input signal (101, 301) and based on the one or more audio objects (103, 303). The method (40) also comprises performing joint coding of the one or more audio objects (103, 303) and/or the residual signal (102, 302). In addition, the method (40) comprises generating (403) a bitstream (701) based on data generated in the context of joint coding of the one or more audio objects (103, 303) and/or the residual signal (102, 302).
N. R. Tsingos; M. R. P. Thomas; C. Fersch Methods, apparatus and systems for encoding and decoding of directional sound sources Patent 2022. Some disclosed methods involve encoding or decoding directional audio data. Some encoding methods may involve receiving a mono audio signal corresponding to an audio object and a representation of a radiation pattern corresponding to the audio object. The radiation pattern may include sound levels corresponding to a plurality of sample times, a plurality of frequency bands and a plurality of directions. The methods may involve encoding the mono audio signal and encoding the source radiation pattern to determine radiation pattern metadata. Encoding the radiation pattern may involve determining a spherical harmonic transform of the representation of the radiation pattern and compressing the spherical harmonic transform to obtain encoded radiation pattern metadata.
2021 |
C. Q. Robinson; M. R. P. Thomas; M. J. Smithers Methods and devices for bass management Patent 17/286,313, 2021. Some disclosed methods involve multi-band bass management. Some such examples may involve applying multiple high-pass and low-pass filter frequencies for the purpose of bass input management. Some disclosed methods treat at least some low-frequency signals as audio objects that can be panned. Some disclosed methods involve panning low and high frequencies separately. Following high-pass rendering, a power audit may determine a low-frequency deficit factor that is to be reproduced by subwoofers or other low-frequency-capable loudspeakers.
N. Akbar; G. N. Dickins; M. R. P. Thomas; P. Samarasinghe; T. Abhayapala Reducing Modal Error Propagation through Correcting Mismatched Microphone Gains Using Rapid Proceedings Article In: Proc. International Conf. Acoustics, Speech, and Signal Process. (ICASSP), 2021.
2020 |
N. Akbar; G. N. Dickins; M. R. P. Thomas A Practical Approach for Microphone Array Calibration in Augmented and Virtual Reality Applications Proceedings Article In: Proc. International Conf. on 3D Immersion (IC3D), 2020.
M. R. P. Thomas; J-H. Hanschke Methods, apparatus and systems for audio sound field capture Patent US 10,721,559, 2020.
N. Akbar; G. Dickins; M. R. P. Thomas; P. Samarasinghe; T. Abhayapala A Novel Method for Obtaining Diffuse Field Measurements for Microphone Calibration Proceedings Article In: Proc. International Conf. Acoustics, Speech, and Signal Process. (ICASSP), Barcelona, Spain, 2020. We propose a straightforward and cost-effective method to perform diffuse soundfield measurements for calibrating the magnitude response of a microphone array. Typically, such calibration is performed in a diffuse soundfield created in reverberation chambers, an expensive and time-consuming process. A method is proposed for obtaining diffuse field measurements in untreated environments. First, a closed-form expression for the spatial correlation of a wideband signal in a diffuse field is derived. Next, we describe a practical procedure for obtaining the diffuse field response of a microphone array in the presence of a non-diffuse soundfield by the introduction of random perturbations in the microphone location. The experimental spatial correlation data are compared with the theoretical model, confirming that it is possible to obtain diffuse field measurements in untreated environments with relatively few loudspeakers. A 30 second test signal played from 4-8 loudspeakers is shown to be sufficient to obtain a diffuse field measurement using the proposed method. An Eigenmike® is then successfully calibrated at two different geographical locations.
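The paper's wideband closed-form expression is not reproduced in the abstract; as a rough single-frequency illustration, the textbook spatial coherence of an ideal diffuse field between two omnidirectional microphones is a sinc function of spacing and frequency. A minimal sketch (the paper's derivation may differ in detail):

```python
import numpy as np

def diffuse_coherence(d, f, c=343.0):
    """Spatial coherence of an ideal diffuse field between two
    omnidirectional microphones at spacing d (m) and frequency f (Hz):
    sinc(kd) = sin(kd) / (kd)."""
    k = 2.0 * np.pi * np.asarray(f) / c
    # np.sinc(x) computes sin(pi x)/(pi x), so divide the argument by pi.
    return np.sinc(k * d / np.pi)
```

Randomly perturbing the microphone positions, as the paper proposes, effectively averages over this curve and drives the measured correlation toward the diffuse-field prediction.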
2019 |
P T Bilinski; J Ahrens; M R P Thomas; I J Tashev; J C Platt; D E Johnston HRTF personalization based on anthropometric features Patent 9,900,722, 2019. The derivation of personalized HRTFs for a human subject based on the anthropometric feature parameters of the human subject involves obtaining multiple anthropometric feature parameters and multiple HRTFs of multiple training subjects. Subsequently, multiple anthropometric feature parameters of a human subject are acquired. A representation of the statistical relationship between the plurality of anthropometric feature parameters of the human subject and a subset of the multiple anthropometric feature parameters belonging to the plurality of training subjects is determined. The representation of the statistical relationship is then applied to the multiple HRTFs of the plurality of training subjects to obtain a set of personalized HRTFs for the human subject.
P T Bilinski; J Ahrens; M R P Thomas; I J Tashev; J C Platt; D E Johnston HRTF Personalization Based on Anthropometric Features Patent 10,313,818, 2019. The derivation of personalized HRTFs for a human subject based on the anthropometric feature parameters of the human subject involves obtaining multiple anthropometric feature parameters and multiple HRTFs of multiple training subjects. Subsequently, multiple anthropometric feature parameters of a human subject are acquired. A representation of the statistical relationship between the plurality of anthropometric feature parameters of the human subject and a subset of the multiple anthropometric feature parameters belonging to the plurality of training subjects is determined. The representation of the statistical relationship is then applied to the multiple HRTFs of the plurality of training subjects to obtain a set of personalized HRTFs for the human subject.
M. R. P. Thomas Practical Concentric Open Sphere Cardioid Microphone Array Design for Higher Order Sound Field Capture Proceedings Article In: Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 2019. The problem of higher order sound field capture with spherical microphone arrays is considered. While A-format cardioid designs are commonplace for first order capture, interest remains in the increased spatial resolution delivered by higher order arrays. Spherical arrays typically use omnidirectional microphones mounted on a rigid baffle, from which higher order spatial components are estimated by accounting for radial mode strength. This produces a design trade-off between small arrays, for spatial aliasing performance, and large arrays, for reduced amplification of instrument noise at low frequencies. A practical open sphere design is proposed that contains cardioid microphones mounted at multiple radii to fulfill both criteria. A design example with two spheres of 16-channel cardioids at 42 mm and 420 mm radius produces white noise gain above unity on third order components down to 200 Hz, a decade lower than a rigid 32-channel 42 mm sphere of omnidirectional microphones.
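The radial-mode-strength argument can be illustrated with the textbook open-sphere cardioid expression b_n(kr) = j_n(kr) - i j_n'(kr) (sign conventions and the paper's exact formulation may differ):

```python
import numpy as np
from scipy.special import spherical_jn

def cardioid_mode_strength(n, kr):
    """Radial mode strength of order n for outward-facing cardioid
    microphones on an open sphere: b_n(kr) = j_n(kr) - i j_n'(kr).
    Textbook expression; conventions vary across the literature."""
    return spherical_jn(n, kr) - 1j * spherical_jn(n, kr, derivative=True)

# At 200 Hz (k ~ 3.66 rad/m) the third-order mode strength of a 420 mm
# sphere is orders of magnitude above that of a 42 mm sphere, which is
# why the large outer radius rescues low-frequency white noise gain.
k = 2 * np.pi * 200.0 / 343.0
ratio = (abs(cardioid_mode_strength(3, k * 0.42))
         / abs(cardioid_mode_strength(3, k * 0.042)))
```

The small sphere, in turn, keeps spatial aliasing at bay at high frequencies, which is the two-radius trade-off the abstract describes.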
P T Bilinski; J Ahrens; M R P Thomas; I J Tashev; J C Platt; D E Johnston HRTF personalization based on anthropometric features Patent 10,284,992, 2019. The derivation of personalized HRTFs for a human subject based on the anthropometric feature parameters of the human subject involves obtaining multiple anthropometric feature parameters and multiple HRTFs of multiple training subjects. Subsequently, multiple anthropometric feature parameters of a human subject are acquired. A representation of the statistical relationship between the plurality of anthropometric feature parameters of the human subject and a subset of the multiple anthropometric feature parameters belonging to the plurality of training subjects is determined. The representation of the statistical relationship is then applied to the multiple HRTFs of the plurality of training subjects to obtain a set of personalized HRTFs for the human subject.
H. Gamper; D. Johnston; I. J. Tashev; A. Politis; M. R. P. Thomas Systems and Methods for Non-Parametric Processing of Head Geometry for HRTF Personalization Patent US 10,278,002, 2019. Systems and methods for HRTF personalization are provided. More specifically, the systems and methods provide HRTF personalization utilizing non-parametric processing of three-dimensional head scans. Accordingly, the systems and methods for HRTF personalization generate a personalized set of HRTFs for a user without having to extract specific geometric and/or anthropometric features from a three-dimensional head scan of a user and/or from the three-dimensional head scans of training subjects in a database.
2018 |
M. R. P. Thomas; C. Q. Robinson Amplitude Panning and the Interior Pan Proceedings Article In: Proc. Tonmeistertagung, Cologne, Germany, 2018. The perception of source location using multi-loudspeaker amplitude panning is considered. While there have been many studies on the localization of pairwise panned sound sources, relatively few studies investigate the multi-loudspeaker case. This paper evaluates panning scenarios in which a source is panned on the boundary or within the area bounded by discrete loudspeakers, referred to as boundary and interior pans respectively. Subjective testing of a variety of pan locations reveals the following: (1) as expected, pans to a single loudspeaker yield the lowest localization error, (2) pairwise pans tend to be consistently localized closer to the listener than single loudspeaker pans, (3) the largest errors occur when the virtual source is panned close to the listener, and (4) interior pans are accurately perceived and, surprisingly, in some cases more accurately than pairwise boundary pans.
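The pairwise pans discussed above are commonly realised with vector base amplitude panning (VBAP); a minimal 2-D sketch (not the paper's renderer) that computes energy-normalised gains for a loudspeaker pair:

```python
import numpy as np

def vbap2d(theta_s, theta1, theta2):
    """Energy-normalised 2-D pairwise panning gains (VBAP-style) for a
    virtual source at azimuth theta_s between loudspeakers at azimuths
    theta1 and theta2 (all angles in radians)."""
    L = np.array([[np.cos(theta1), np.cos(theta2)],
                  [np.sin(theta1), np.sin(theta2)]])
    p = np.array([np.cos(theta_s), np.sin(theta_s)])
    g = np.linalg.solve(L, p)        # unnormalised gains
    return g / np.linalg.norm(g)     # constant-power normalisation
```

A pan onto a loudspeaker position collapses to a single active channel, while a centre pan yields equal gains; interior pans require more than two active loudspeakers and are outside this sketch.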
S Kordon; J-M Batke; A Kreuger; M R P Thomas Patent 10,021,508, 2018.
2017 |
M R P Thomas; C Q Robinson Amplitude Panning and the Interior Pan Proceedings Article In: Proc. Audio Eng. Soc. Convention, New York, USA, 2017.
M R P Thomas Fast Computation of Cubature Formulae for the Sphere Proceedings Article In: Proc. Joint Workshop on Hands-Free Speech Communication and Microphone Arrays (HSCMA), San Francisco, CA, 2017. The near-uniform distribution of nodes on the surface of a sphere has found many uses in numerical integration, physics, chemistry, crystallography, and more recently in the capture, representation and reproduction of spatial audio. A popular solution posed by Fliege and Meyer treats nodes as charged particles, constrained to lie on the surface of a sphere, that undergo mutual repulsion. The potential energy contained within the system forms a cost function that is minimized by numerical optimization using simulated annealing followed by limited-memory BFGS (L-BFGS). In this work the cost function's numerical gradient is replaced with an analytical gradient and it is shown that this is sufficient for L-BFGS alone to achieve similar results. It is also proposed to limit iterations once the solution has converged beyond a threshold for use cases in mechanical designs with much higher error tolerances. A single-core C implementation on a modern machine shows up to 146 times faster convergence compared with the Fliege implementation. In the limited precision case, all node counts under 1600 are computed in under one minute, and those under 380 in under a second.
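The charged-particle idea with an analytic gradient can be sketched in a few lines (a toy reproduction, not the paper's C implementation: here the sphere constraint is softened into a quadratic penalty with an illustrative weight `mu`, and scipy's L-BFGS-B consumes the analytic gradient):

```python
import numpy as np
from scipy.optimize import minimize

def energy_grad(x_flat, n, mu=10.0):
    """Coulomb repulsion between all point pairs plus a quadratic
    penalty keeping each point near the unit sphere; returns the
    energy and its analytic gradient."""
    x = x_flat.reshape(n, 3)
    diff = x[:, None, :] - x[None, :, :]          # (n, n, 3) pair offsets
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, np.inf)                # ignore self-pairs
    energy = 0.5 * np.sum(1.0 / dist)
    grad = -np.sum(diff / dist[..., None] ** 3, axis=1)
    r2 = np.sum(x * x, axis=1)
    energy += mu * np.sum((r2 - 1.0) ** 2)
    grad += 4.0 * mu * (r2 - 1.0)[:, None] * x
    return energy, grad.ravel()

def repel_nodes(n, seed=0):
    """Near-uniform nodes on the sphere via L-BFGS with the analytic
    gradient above; the result is renormalised onto the sphere."""
    rng = np.random.default_rng(seed)
    x0 = rng.standard_normal((n, 3))
    x0 /= np.linalg.norm(x0, axis=1, keepdims=True)
    res = minimize(energy_grad, x0.ravel(), args=(n,), jac=True,
                   method='L-BFGS-B')
    x = res.x.reshape(n, 3)
    return x / np.linalg.norm(x, axis=1, keepdims=True)
```

For n = 4 the minimiser recovers a regular tetrahedron, the standard sanity check for this kind of node optimisation.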
H Gamper; M R P Thomas; I J Tashev; D E Johnston Systems and Methods for Audio Creation and Delivery Patent US 9,609,436, 2017. @patent{Gamper2017, |
2016 |
H Gamper; M R P Thomas; L Corbin; I J Tashev Synthesis of Device-Independent Noise Corpora for Realistic ASR Evaluation Proceedings Article In: Proc. Interspeech Conf., San Francisco, CA, 2016. In order to effectively evaluate the accuracy of automatic speech recognition (ASR) with a novel capture device, it is important to create a realistic test data corpus that is representative of real-world noise conditions. Typically, this involves either recording the output of a device under test (DUT) in a noisy environment, or synthesizing an environment over loudspeakers in a way that simulates realistic signal-to-noise ratios (SNRs), reverberation times, and spatial noise distributions. Here we propose a method that aims at combining the realism of in-situ recordings with the convenience and repeatability of synthetic corpora. A device-independent spatial recording containing noise and speech is combined with the measured directivity pattern of a DUT to generate a synthetic test corpus for evaluating the performance of an ASR system. This is achieved by a spherical harmonic decomposition of both the sound field and the DUT's directivity patterns. Experimental results suggest that the proposed method can be a viable alternative to costly and cumbersome device-dependent measurements. The proposed simulation method predicted the SNR of the DUT response to within about 3 dB and the word error rate (WER) to within about 20%, across a range of test SNRs, target source directions, and noise types.
A Politis; M R P Thomas; H Gamper; I J Tashev Application of 3D Spherical Transforms To Personalization Of Head-Related Transfer Functions Proceedings Article In: Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP), Shanghai, China, 2016. Head-related transfer functions (HRTFs) depend on the shape of the human head and ears, motivating HRTF personalization methods that detect and exploit morphological similarities between subjects in an HRTF database and a new user. Prior work determined similarity from sets of morphological parameters. Here we propose a non-parametric morphological similarity based on a harmonic expansion of head scans. Two 3D spherical transforms are explored for this task, and an appropriate shape similarity metric is defined. A case study focusing on personalization of interaural time differences (ITDs) is conducted by applying this similarity metric on a database of 3D head scans.
M R P Thomas; H Gamper; I J Tashev BFGUI: An Interactive Tool for the Synthesis and Analysis of Microphone Array Beamformers Proceedings Article In: Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP), Shanghai, China, 2016. Microphone arrays are beneficial for distant speech capture because the signals they capture can be exploited with beamforming to suppress noise and reverberation. The theory for the design and analysis of microphone arrays is well established; however, the performance of a microphone array beamformer is often subject to conflicting criteria that need to be assessed manually. This paper describes BFGUI, an interactive graphical tool for MATLAB, for simulating microphone arrays and synthesizing beamformers, whose parameters can be modified and performance metrics monitored in real-time. Primarily aimed at teaching and research, this tool provides the user with an intuitive insight into the effects of microphone types, number and geometry, and the influence of design constraints such as regularization and white noise gain on derived metrics. The resulting directivity pattern, directivity index and front-back ratio are examples of such metrics. The designs can then be exported in a variety of formats for processing of real-world data.
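BFGUI itself is a MATLAB tool; as a language-neutral illustration of one of the metrics it monitors, here is a sketch of white noise gain for a simple delay-and-sum beamformer (array geometry and frequency below are arbitrary choices):

```python
import numpy as np

def ds_weights_and_wng(positions, look_dir, f, c=343.0):
    """Delay-and-sum beamformer weights for a far-field look direction
    (unit vector) and the resulting white noise gain
    WNG = |w^H d|^2 / (w^H w), which equals M for delay-and-sum."""
    k = 2.0 * np.pi * f / c
    d = np.exp(1j * k * positions @ look_dir)   # far-field steering vector
    w = d / len(d)                              # delay-and-sum weights
    wng = np.abs(np.conj(w) @ d) ** 2 / np.real(np.conj(w) @ w)
    return w, wng
```

Superdirective designs trade this robustness metric for directivity, which is exactly the regularization tension the abstract mentions.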
2015 |
H Gamper; M R P Thomas; I J Tashev Anthropometric parameterisation of a spherical scatterer ITD model with arbitrary ear angles Proceedings Article In: Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY, 2015. Accurate modelling of the interaural time difference (ITD) is crucial for rendering localised sound. Parametric models allow personalising ITDs using anthropometrics. However, the mapping between anthropometric features and model parameters is not straightforward. Here, we propose deriving personalised ITD model parameters from a sphere fitted to a 3-D head scan. The proposed ITD personalisation is evaluated on an HRTF database containing 181 subjects, for a simple spherical ITD model as well as a frequency and elevation-dependent spherical scatterer model with arbitrary ear angles.
F Lim; M R P Thomas; P A Naylor; I J Tashev Acoustic Blur Kernel with Sliding Window for Blind Estimation of Reverberation Time Proceedings Article In: Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, New York, 2015. Reverberation time, or T60, is a key parameter used for characterizing acoustic spaces. Blind T60 estimation is useful for many applications including speech intelligibility estimation, acoustic scene analysis and dereverberation. In our previous work, a single-channel blind T60 estimator was proposed employing spectral analysis in the modulation frequency domain. It was shown that the estimation accuracy is crucially affected by the window lengths used for transformation to the modulation domain. In this work, we propose the use of a sliding window length that is dynamically updated depending on the length of the detected decay region. Experimental results demonstrated that in the presence of noise, estimation accuracy was improved over our previous work for T60 up to 700 ms. When compared against two alternative algorithms from the literature, the proposed approach demonstrated higher accuracy for T60 between 500 ms and 1 s. Finally, the proposed approach was shown to be more computationally efficient compared to two of the three alternative algorithms.
F Lim; M R P Thomas; I J Tashev Blur Kernel Estimation Approach to Blind Reverberation Time Estimation Proceedings Article In: Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP), Brisbane, Australia, 2015. Reverberation time is an important parameter for characterizing acoustic environments. It is useful in many applications including acoustic scene analysis, robust automatic speech recognition and dereverberation. Given knowledge of the acoustic impulse response, reverberation time can be measured using Schroeder's backward integration method. Since it is not always practical to obtain impulse responses, blind estimation algorithms are sometimes desirable. In this work, the reverberation problem is viewed as an image blurring problem. The blur kernel is estimated through spectral analysis in the modulation domain and the T60 is subsequently estimated from the blur kernel's parameters. It is shown through experimental results that the proposed approach is able to improve robustness to higher T60s especially with increasing levels of additive noise up to a signal-to-noise ratio (SNR) of 10 dB.
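The non-blind baseline mentioned here, Schroeder backward integration, is compact enough to sketch (the -5 to -25 dB fitting window is an illustrative choice; practical implementations also truncate the noise floor):

```python
import numpy as np

def t60_schroeder(h, fs, lo=-5.0, hi=-25.0):
    """T60 from a measured impulse response h via Schroeder backward
    integration: compute the energy decay curve, fit a line between
    lo and hi dB, and extrapolate the slope to -60 dB."""
    edc = np.cumsum(h[::-1] ** 2)[::-1]       # backward-integrated energy
    edc_db = 10.0 * np.log10(edc / edc[0])
    t = np.arange(len(h)) / fs
    m = (edc_db <= lo) & (edc_db >= hi)       # linear-fit region
    slope, _ = np.polyfit(t[m], edc_db[m], 1)  # dB per second
    return -60.0 / slope
```

The blind estimator in the paper avoids needing h at all; this sketch only shows the reference measurement it is compared against.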
M R P Thomas; H Gamper; I J Tashev Dereverberation Sweet Spot Dilation with Combined Channel Equalization and Beamforming Proceedings Article In: Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP), Brisbane, Australia, 2015. Beamformers and channel equalizers can be formulated as optimal multichannel filter-and-sum operations with different objective criteria. It has been shown in previous studies that the combination of both concepts under a common framework can yield results that combine both the spatial robustness of beamforming and the dereverberation performance of channel equalization. This paper introduces an additional method for leveraging both approaches that exploits channel estimates at a wanted spatial location and derives robustness from knowledge of the array geometry alone. Experiments with an objective assessment of speech quality as a function of source perturbation reveal that the proposed technique can be viewed as a sweet spot dilator when compared with the MINT channel equalizer.
H Gamper; M R P Thomas; I J Tashev Estimation of Multipath Propagation Delays and Interaural Time Differences from 3-D Head Scans Proceedings Article In: Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP), Brisbane, Australia, 2015. The estimation of acoustic propagation delays from a sound source to a listener's ear entrances is useful for understanding and visualising the wave propagation along the surface of the head, and necessary for individualised spatial sound rendering. The interaural time difference (ITD) is of particular research interest, as it constitutes one of the main localisation cues exploited by the human auditory system. Here, an approach is proposed that employs ray tracing on a 3-D head scan to estimate and visualise the propagation delays and ITDs from a sound source to a subject's ear entrances. Experimental results indicate that the proposed approach is computationally efficient, and performs equally well or better than optimally tuned parametric ITD models, with a mean absolute ITD estimation error of about 14μs.
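One of the parametric baselines such ray-traced ITDs are compared against is the classic spherical-head (Woodworth) model; a sketch with illustrative defaults for head radius and speed of sound (the paper's tuned models differ):

```python
import numpy as np

def woodworth_itd(azimuth, a=0.0875, c=343.0):
    """Classic spherical-head (Woodworth) ITD model for a far source:
    ITD = (a / c) * (theta + sin(theta)), with source azimuth in
    radians and head radius a in metres (illustrative defaults)."""
    theta = np.abs(azimuth)
    return np.sign(azimuth) * (a / c) * (theta + np.sin(theta))
```

For a lateral source (90 degrees) this predicts roughly 0.66 ms, in the right ballpark for adult heads; the appeal of the scan-based approach is removing the fixed-sphere assumption behind this formula.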
2014 |
M R P Thomas; F Lim; I J Tashev; P A Naylor Optimal Beamforming as a Time Domain Equalization Problem with Application to Room Acoustics Proceedings Article In: Proc. Intl. Workshop Acoust. Echo Noise Control (IWAENC), Juan Les Pins, France, 2014. Signals captured by microphone arrays provide spatial diversity that can be exploited by multichannel processing algorithms to suppress noise and reverberation. Beamforming is a class of approaches that treats the problem with respect to the spatial location of wanted and competing sources, leveraging properties of propagation of waves in free space. A related class of algorithms is channel equalization that exploits knowledge of the acoustic impulse response between a source and microphones with a view to near-perfect dereverberation. Beamforming has been shown to be a very powerful and practical tool in a number of domains, whereas channel equalizers are notoriously sensitive to noise and channel mismatch leading to limited practical applicability. This paper investigates some of the common properties of these algorithms and presents a solution incorporating approaches from both disciplines.
M R P Thomas; J Ahrens; I J Tashev A Method for Converting Between Cylindrical and Spherical Harmonic Representations of Sound Fields Proceedings Article In: Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP), Florence, Italy, 2014. Spherical and circular microphone arrays are useful for sampling sound fields that may be resynthesized with loudspeaker arrays. Spherical microphone arrays are desirable because of their ability to capture three-dimensional sound fields; however, it is often more practical to construct loudspeaker arrays in the form of a closed circle located in the horizontal plane. This leads to spatial undersampling, as circular sampling can only yield a perfect representation of a height-invariant sound field. This paper investigates the consequences of such spatial undersampling by converting between cylindrical and spherical harmonic decompositions of solutions to the wave equation. We show analytically and via numerical simulations that 1) the result of the spatial undersampling is a purely horizontally propagating sound field, and 2) the ratio of travelling and standing components in the undersampled sound field varies depending on the incidence colatitude. The conversion is also used in a beamforming scenario and shows that the beamformer response becomes increasingly omnidirectional as the source moves away from the horizontal plane.
P Bilinski; J Ahrens; M R P Thomas; I J Tashev; J C Platt HRTF Magnitude Synthesis via Sparse Representation of Anthropometric Features Proceedings Article In: Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP), Florence, Italy, 2014. We propose a method for the synthesis of the magnitudes of Head-related Transfer Functions (HRTFs) using a sparse representation of anthropometric features. Our approach treats the HRTF synthesis problem as finding a sparse representation of the subject's anthropometric features w.r.t. the anthropometric features in the training set. The fundamental assumption is that the magnitudes of a given HRTF set can be described by the same sparse combination as the anthropometric data. Thus, we learn a sparse vector that represents the subject's anthropometric features as a linear superposition of the anthropometric features of a small subset of subjects from the training data. Then, we apply the same sparse vector directly on the HRTF tensor data. For evaluation purposes we use a new dataset, containing both anthropometric features and HRTFs. We compare the proposed sparse representation based approach with ridge regression and with the data of a manikin (which was designed based on average anthropometric data), and we simulate the best and the worst possible classifiers to select one of the HRTFs from the dataset. For instrumental evaluation we use log-spectral distortion. Experiments show that our sparse representation outperforms all other evaluated techniques, and that the synthesized HRTFs are almost as good as the best possible HRTF classifier.
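The sparse-representation step can be sketched with a plain ISTA solver for the L1-regularised least-squares problem (the paper's exact solver and regularisation are not specified here; `lam` and the iteration count are illustrative):

```python
import numpy as np

def sparse_weights(A, y, lam=0.1, iters=500):
    """ISTA for min_b 0.5*||A b - y||^2 + lam*||b||_1: a sparse
    combination of training subjects' anthropometric feature columns
    A that reconstructs the new subject's feature vector y."""
    L = np.linalg.norm(A, 2) ** 2               # Lipschitz constant
    b = np.zeros(A.shape[1])
    for _ in range(iters):
        z = b - A.T @ (A @ b - y) / L           # gradient step
        b = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # shrinkage
    return b

def synthesize_hrtf(H, beta):
    """Apply the same sparse weights to the training HRTF magnitudes
    H (one row per training subject)."""
    return beta @ H
```

The key assumption from the abstract is visible here: the weights are learned only on anthropometrics and then reused unchanged on the HRTF data.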
2013 |
J Ahrens; M R P Thomas; I J Tashev Efficient Implementation of the Spectral Division Method for Arbitrary Virtual Sound Fields Proceedings Article In: Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, New York, 2013. The Spectral Division Method is an analytic approach for sound field synthesis that determines the loudspeaker driving function in the wavenumber domain. Compact expressions for the driving function in time-frequency domain or in time domain can only be determined for a low number of special cases. Generally, the involved spatial Fourier transforms have to be evaluated numerically. We present a detailed description of the computational procedure and minimize the number of required computations by exploiting the following two aspects: 1) The interval for the spatial sampling of the virtual sound field can be selected for each time-frequency bin, whereby low time-frequency bins can be sampled more coarsely, and 2) the driving function only needs to be evaluated at the locations of the loudspeakers of a given array. The inverse spatial Fourier transform is therefore not required to be evaluated at all initial spatial sampling points but only at those locations that coincide with loudspeakers.
F Lim; M R P Thomas; P A Naylor MINTFormer: A Spatially Aware Channel Equalizer Proceedings Article In: Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, New York, 2013. Reverberation is a process that distorts a wanted signal and impairs perceived speech quality. In the context of multichannel dereverberation, channel-based methods and beamforming are two common approaches. Channel-based methods such as the multiple input/output inverse theorem (MINT) can provide perfect dereverberation provided the exact acoustic impulse responses (AIRs) are known. However, they have been shown to be very sensitive to AIR estimation errors, for which several modifications have consequently been proposed. Conversely, beamformers are significantly more robust but provide comparatively modest dereverberation. While the two approaches are conventionally considered independent, both can be formulated as a filter-and-sum operation with differing filter design criteria. We propose a unified framework, termed MINTFormer, that exploits this similarity and introduces a mixing parameter to control the tradeoff between the potential performance of MINT and the robustness of beamforming. Empirical results show that the mixing parameter is a monotonic function of channel estimation error, whereby a MINT solution is preferred when channel estimation error is low.
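The MINT side of the framework can be sketched as a multichannel least-squares inversion with exactly known channels (an idealised, noise-free sketch; MINTFormer adds the beamforming trade-off on top of this):

```python
import numpy as np
from scipy.linalg import toeplitz

def mint_equalizer(h_list, Li):
    """Least-squares MINT: filters g_m of length Li such that
    sum_m conv(h_m, g_m) approximates a unit impulse, given the
    exact channel impulse responses h_m."""
    Lh = len(h_list[0])
    n = Lh + Li - 1
    # Stack the per-channel convolution (Sylvester) matrices side by side.
    H = np.hstack([toeplitz(np.r_[h, np.zeros(Li - 1)],
                            np.r_[h[0], np.zeros(Li - 1)])
                   for h in h_list])
    d = np.zeros(n)
    d[0] = 1.0                      # target: a unit impulse (no delay)
    g = np.linalg.lstsq(H, d, rcond=None)[0]
    return g.reshape(len(h_list), Li)
```

With two channels sharing no common zeros, the stacked system admits an exact inverse, which is the "perfect dereverberation" case the abstract refers to; any channel estimation error breaks this, motivating the mixing parameter.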
J Ahrens; M R P Thomas; I J Tashev Gentle Acoustic Crosstalk Cancelation Using the Spectral Division Method and Ambiophonics Proceedings Article In: Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, New York, 2013. We propose the concept of gentle acoustic crosstalk cancellation, which aims at reducing the crosstalk between a loudspeaker and the listener's contralateral ear instead of eliminating it completely as aggressive methods intend to do. The expected benefit is higher robustness and a tendency to collapse less unpleasantly. The proposed method employs a linear loudspeaker array and exhibits two stages: 1) Use the Spectral Division Method to illuminate the ipsilateral ear using constructive interference of the loudspeaker signals. This approach provides only little channel separation between the listener's ears at frequencies below approximately 2000 Hz. 2) There we additionally use destructive interference by Recursive Ambiophonics Crosstalk Elimination (RACE). RACE was chosen because of its tendency to collapse gently. In a sample scenario with realistic parameters, the proposed method achieves around 20 dB of channel separation between 700 Hz and 9000 Hz, which appears to be sufficient to achieve full perceived lateralization when only one ear is illuminated.
2012 |
D P Jarrett; E A P Habets; M R P Thomas; P A Naylor Rigid sphere room impulse response simulation: algorithm and applications Journal Article In: J. Acoust. Soc. Am., vol. 132, no. 3, pp. 1462–1472, 2012. Simulated room impulse responses have been proven to be both useful and indispensable for comprehensive testing of acoustic signal processing algorithms while controlling parameters such as the reverberation time, room dimensions, and source–array distance. In this work, a method is proposed for simulating the room impulse responses between a sound source and the microphones positioned on a spherical array. The method takes into account specular reflections of the source by employing the well-known image method, and scattering from the rigid sphere by employing spherical harmonic decomposition. Pseudocode for the proposed method is provided, taking into account various optimizations to reduce the computational complexity. The magnitude and phase errors that result from the finite order spherical harmonic decomposition are analyzed and general guidelines for the order selection are provided. Three examples are presented: an analysis of a diffuse reverberant sound field, a study of binaural cues in the presence of reverberation, and an illustration of the algorithm's use as a mouth simulator.
M R P Thomas; J Ahrens; I Tashev Optimal 3D Beamforming Using Measured Microphone Directivity Patterns Proceedings Article In: Proc. Intl. Workshop Acoust. Signal Enhancement, Aachen, Germany, 2012. @inproceedings{Thomas2012b, The design of time-invariant beamformers is often posed as an optimization problem using practical design constraints. In many scenarios it is sufficient to assume that the microphones have an omnidirectional directivity pattern, a flat frequency response in the range of interest, and a 2D environment in which wavefronts propagate as a function of azimuth angle only. In this paper we consider a generalized solution for those cases in which one or more of these assumptions do not hold, yielding a beamformer that is optimized on measured directivity patterns as a function of azimuth, elevation and frequency. A comparative study is made with the 4-element cardioid microphone array employed in Microsoft Kinect for Windows, whose beamformer weights are calculated with directivity patterns using (a) 2D cardioid models, (b) 3D cardioid models and (c) 3D measurements. Results on a recorded noisy speech corpus show similar PESQ and speech recognition accuracy comparing (a) and (b), but a 50% relative improvement in word error rate using measured directivity patterns. |
M R P Thomas; N D Gaubitch; E A P Habets; P A Naylor An Insight into Common Filtering in Noisy SIMO Blind System Identification Proceedings Article In: Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, Kyoto, Japan, 2012. @inproceedings{Thomas2012c, The effect of additive sensor noise on single-input-multiple-output (SIMO) blind system identification (BSI) algorithms based upon cross-relation (CR) error is investigated. Previous studies have shown that additive noise in the observed signal results in systems comprising the true estimated channels convolved with an erroneous `common filter', and additionally that identification and removal of this filter significantly improves estimation error. However, the source of the common filter remained an open question. This paper explains the common filter through a first-order perturbation analysis of the CR matrix, showing that it can be estimated from the perturbation and the eigenvectors of the noiseless CR matrix. The analysis given in this paper provides a new insight into the effect of noise on SIMO BSI algorithms and forms the first step towards an overall noise robust solution. |
J Ahrens; M R P Thomas; I J Tashev HRTF Modeling Using a Non-Regularized Least-Squares Fit of Spherical Harmonics Coefficients Proceedings Article In: Proc. Asia-Pacific Signal and Information Process. Assoc. Annu. Summit, Aachen, Germany, 2012. @inproceedings{Ahrens2012, Head-related transfer functions (HRTFs) represent the acoustic transfer function from a sound source at a given location to the ear drums of a human. They are typically measured from discrete source positions at a constant distance. Spherical harmonics decompositions have been shown to provide a flexible representation of HRTFs. Practical constraints often prevent the retrieval of measurement data from certain directions, a circumstance that complicates the decomposition of the measured data into spherical harmonics. A least-squares fit of coefficients is a potential approach to determining the coefficients of incomplete data. However, a straightforward non-regularized fit tends to give unrealistic estimates for the region where no measurement data is available. Recently, a regularized least-squares fit was proposed, which yields well-behaved results for the unknown region at the expense of reducing the accuracy of the data representation in the known region. In this paper, we propose using a lower-order non-regularized least-squares fit to achieve a well-behaved estimation of the unknown data. This data then allows for a high-order non-regularized least-squares fit over the entire sphere. We compare the properties of all three approaches applied to modeling the magnitudes of the HRTFs measured from a manikin. The proposed approach reduces the normalized mean-square error by approximately 7 dB in the known region and 11 dB in the unknown region compared to the regularized fit. |
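As an illustrative sketch of the non-regularized least-squares spherical-harmonics fit underlying the entry above (a generic toy example, not the authors' implementation; the random grid, the toy directivity, and SciPy's `sph_harm` argument convention are assumptions):

```python
import numpy as np
from scipy.special import sph_harm

def sh_basis(order, az, colat):
    # One column per spherical harmonic up to `order`.
    # SciPy convention: sph_harm(m, n, azimuth, colatitude).
    cols = [sph_harm(m, n, az, colat)
            for n in range(order + 1) for m in range(-n, n + 1)]
    return np.stack(cols, axis=-1)

# Synthetic "measurements" on an incomplete grid: no data near the lower
# pole, mimicking the missing directions discussed in the abstract.
rng = np.random.default_rng(0)
az = rng.uniform(0.0, 2.0 * np.pi, 400)
colat = rng.uniform(0.0, 0.75 * np.pi, 400)   # lower cap unmeasured
target = 1.0 + 0.3 * np.cos(colat)            # smooth toy "HRTF magnitude"

# Plain (non-regularized) least-squares fit of the SH coefficients.
Y = sh_basis(3, az, colat)
coeffs, *_ = np.linalg.lstsq(Y, target.astype(complex), rcond=None)

# Reconstruction error on the measured (known-region) directions.
recon = (Y @ coeffs).real
err = float(np.sqrt(np.mean((recon - target) ** 2)))
```

Because the toy directivity lies exactly in the span of the order-3 basis, the known-region residual is essentially zero; the paper's point is that behaviour in the *unmeasured* cap is where order selection and regularization matter.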
F Antonacci; J Filos; E A P Habets; A Sarti; P A Naylor; S Tubaro Inference of Room Geometry from Acoustic Impulse Responses Journal Article In: IEEE Trans. Audio, Speech, Lang. Process., 2012. @article{Antonacci2012, Acoustic scene reconstruction is a process that aims to infer characteristics of the environment from acoustic measurements. We investigate the problem of locating planar reflectors in rooms, such as walls and furniture, from signals obtained using distributed microphones. Specifically, localization of multiple two- dimensional (2-D) reflectors is achieved by estimation of the time of arrival (TOA) of reflected signals by analysis of acoustic impulse responses (AIRs). The estimated TOAs are converted into elliptical constraints about the location of the line reflector, which is then localized by combining multiple constraints. When multiple walls are present in the acoustic scene, an ambiguity problem arises, which we show can be addressed using the Hough transform. Additionally, the Hough transform significantly improves the robustness of the estimation for noisy measurements. The proposed approach is evaluated using simulated rooms under a variety of different controlled conditions where the floor and ceiling are perfectly absorbing. Results using AIRs measured in a real environment are also given. Additionally, results showing the robustness to additive noise in the TOA information are presented, with particular reference to the improvement achieved through the use of the Hough transform. |
T Drugman; M R P Thomas; J Gudnason; T Dutoit; P A Naylor Detection of Glottal Closing Instants from Voiced Speech: A Quantitative Review Journal Article In: IEEE Trans. Audio, Speech, Lang. Process., vol. 20, no. 3, pp. 994–1006, 2012. @article{Drugman2012, The pseudo-periodicity of voiced speech can be exploited in several speech processing applications. This requires however that the precise locations of the glottal closure instants (GCIs) are available. The focus of this paper is the evaluation of automatic methods for the detection of GCIs directly from the speech waveform. Five state-of-the-art GCI detection algorithms are compared using six different databases with contemporaneous electroglottographic recordings as ground truth, and containing many hours of speech by multiple speakers. The five techniques compared are the Hilbert Envelope-based detection (HE), the Zero Frequency Resonator-based method (ZFR), the Dynamic Programming Phase Slope Algorithm (DYPSA), the Speech Event Detection using the Residual Excitation And a Mean-based Signal (SEDREAMS) and the Yet Another GCI Algorithm (YAGA). The efficacy of these methods is first evaluated on clean speech, both in terms of reliability and accuracy. Their robustness to additive noise and to reverberation is also assessed. A further contribution of the paper is the evaluation of their performance on a concrete application of speech processing: the causal-anticausal decomposition of speech. It is shown that for clean speech, SEDREAMS and YAGA are the best performing techniques, both in terms of identification rate and accuracy. ZFR and SEDREAMS also show a superior robustness to additive noise and reverberation. |
M R P Thomas; J Ahrens; I Tashev Beamformer Design Using Measured Microphone Directivity Patterns: Robustness to Modelling Error Proceedings Article In: Proc. Asia-Pacific Signal and Information Process. Assoc. Annu. Summit, San Francisco, USA, 2012. @inproceedings{Thomas2012c, The design process for time-invariant acoustic beamformers often assumes that the microphones have an omni-directional directivity pattern, a flat frequency response in the range of interest, and a 2D environment in which wavefronts propagate as a function of azimuth angle only. In this paper we investigate those cases in which one or more of these assumptions do not hold, considering a Minimum Variance Distortionless Response (MVDR)-based solution that is optimized using measured directivity patterns as a function of azimuth, elevation and frequency. Robustness to modelling error is controlled by a regularization parameter that produces a suboptimal but more robust solution. A comparative study is made with the 4-element cardioid microphone array employed in Microsoft Kinect for Windows, whose beamformer weights are calculated with directivity patterns using (a) 2D cardioid models, (b) 3D cardioid models and (c) 3D measurements. Speech recognition and PESQ results are used as evaluation criteria with a noisy speech corpus, revealing empirically optimal regularization parameters for each case and up to a 70% relative improvement in word error rate comparing (a) and (c). |
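A minimal narrowband sketch of the regularized MVDR idea in the entry above, with regularization realized as diagonal loading (a standard stand-in for the paper's regularization parameter; the steering vector and covariance here are synthetic, not Kinect measurements):

```python
import numpy as np

def mvdr_weights(R, d, loading=1e-2):
    """MVDR weights w = R_l^{-1} d / (d^H R_l^{-1} d), with diagonal
    loading `loading` trading optimality for robustness to mismatch
    between modeled and measured directivity patterns."""
    n = R.shape[0]
    Rl = R + loading * (np.trace(R).real / n) * np.eye(n)
    Rinv_d = np.linalg.solve(Rl, d)
    return Rinv_d / (d.conj() @ Rinv_d)

# Toy narrowband setup: 4 sensors, a hypothetical steering vector d,
# and a Hermitian positive-definite noise covariance R.
rng = np.random.default_rng(1)
d = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, 4))
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
R = A @ A.conj().T + 4.0 * np.eye(4)

w = mvdr_weights(R, d)
response = w.conj() @ d   # distortionless constraint: equals 1 by design
```

Whatever the loading, the look-direction response stays exactly unity; the loading only reshapes the weights away from aggressive (error-sensitive) noise suppression.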
M R P Thomas; J Gudnason; P A Naylor Estimation of Glottal Closing and Opening Instants in Voiced Speech using the YAGA Algorithm Journal Article In: IEEE Trans. Audio, Speech, Lang. Process., vol. 20, no. 1, pp. 82–91, 2012. @article{Thomas2012a, Accurate estimation of glottal closing instants (GCIs) and opening instants (GOIs) is important for speech processing applications that benefit from glottal-synchronous processing, including pitch tracking, prosodic speech modification, speech dereverberation, synthesis and study of pathological voice. We propose the Yet Another GCI/GOI Algorithm (YAGA) to detect GCIs from speech signals by employing multiscale analysis, the group delay function, and N-best dynamic programming. A novel GOI detector based upon the consistency of the candidates' closed quotients relative to the estimated GCIs is also presented. Particular attention is paid to the precise definition of the glottal closed phase, which we define as the analysis interval that produces minimum deviation from an all-pole model of the speech signal with closed-phase linear prediction (LP). A reference algorithm analyzing both electroglottograph (EGG) and speech signals is described for evaluation of the proposed speech-based algorithm. In addition to the development of a GCI/GOI detector, an important outcome of this work is in demonstrating that GOIs derived from the EGG signal are not necessarily well-suited to closed-phase LP analysis. Evaluation of YAGA against the APLAWD and SAM databases shows that GCI identification rates of up to 99.3% can be achieved with an accuracy of 0.3 ms and GOI detection can be achieved equally reliably with an accuracy of 0.5 ms. |
2011 |
P Annibale; F Antonacci; P Bestagini; A Brutti; A Canclini; L Cristoforetti; E A P Habets; J Filos; W Kellermann; K Kowalczyk; A Lombard; E Mabande; D Markovic; P A Naylor; M Omologo; R Rabenstein; A Sarti; P Svaizer; M R P Thomas The SCENIC Project: Space-Time Audio Processing for Environment-Aware Acoustic Sensing and Rendering Proceedings Article In: Proc. Audio Eng. Soc. Convention, New York, 2011. @inproceedings{Sarti2011, SCENIC is an EC-funded project aimed at developing a harmonized corpus of methodologies for environment-aware acoustic sensing and rendering. The project focuses on space-time acoustic processing solutions that do not just accommodate the environment in the modeling process but that make the environment help towards achieving the goal at hand. The solutions developed within this project cover a wide range of applications, including acoustic self-calibration, aimed at estimating the parameters of the acoustic system, and environment inference, aimed at identifying and characterizing all the relevant acoustic reflectors in the environment. The information gathered through such steps is then used to boost the performance of wavefield rendering methods as well as source localization/characterization/extraction in reverberant environments. |
A Canclini; F Antonacci; M R P Thomas; J Filos; A Sarti; P A Naylor; S Tubaro Exact Localization of Acoustic Reflectors from Quadratic Constraints Proceedings Article In: Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, New York, 2011. @inproceedings{Canclini2011, In this paper we discuss a method for localizing acoustic reflectors in space based on acoustic measurements on source-to-microphone reflective paths. The method converts Time of Arrival (TOA) and Time Difference of Arrival (TDOA) into quadratic constraints on the line corresponding to the reflector. In order to be robust against measurement errors we derive an exact solution for the minimization of a cost function that combines an arbitrary number of quadratic constraints. Moreover we propose a new method for the analytic prediction of reflector localization accuracy. This method is sufficiently general to be applicable to a wide range of estimation problems. |
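The TOA-to-quadratic-constraint conversion in the entry above can be illustrated in 2-D (a toy image-source example of the tangency property, not the paper's closed-form estimator; the geometry is invented for illustration):

```python
import numpy as np

def reflective_path_length(src, mic, line_pt, line_dir):
    """Length of the shortest source -> line -> microphone path,
    obtained by mirroring the source across the line (image source)."""
    n = np.array([-line_dir[1], line_dir[0]], dtype=float)
    n /= np.linalg.norm(n)
    img = src - 2.0 * np.dot(src - line_pt, n) * n
    return np.linalg.norm(img - mic)

# A wall along y = 2 with source and microphone beneath it; c * TOA of
# the reflected path equals L below.
src, mic = np.array([0.0, 0.0]), np.array([1.0, 0.5])
L = reflective_path_length(src, mic, np.array([0.0, 2.0]), np.array([1.0, 0.0]))

# Every wall point p satisfies |p - src| + |p - mic| >= L, with equality
# only at the specular point: the wall is tangent to the ellipse with
# foci src and mic whose path-length sum is L.
xs = np.linspace(-10.0, 10.0, 2001)
total = np.hypot(xs - src[0], 2.0 - src[1]) + np.hypot(xs - mic[0], 2.0 - mic[1])
```

This is the sense in which each TOA measurement yields a quadratic (elliptical) constraint on the reflector line: combining several such ellipses, each from a different source/microphone pair, pins down the common tangent.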
M R P Thomas; N D Gaubitch; P A Naylor Application of Channel Shortening to Acoustic Channel Equalization in the Presence of Noise and Estimation Error Proceedings Article In: Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, New York, USA, 2011. @inproceedings{Thomas2011, The inverse-filtering of acoustic impulse responses (AIRs) can be achieved with existing methods provided a good estimate of the channel is available and the observed signals contain little or no noise. Such assumptions are not generally valid in practical scenarios, leading to much interest in the issue of robustness. In particular, channel shortening (CS) techniques have been shown to be more robust to channel estimation error than existing approaches. In this paper we investigate CS using the relaxed multichannel least-squares (RMCLS) algorithm in the presence of both channel error and additive noise. It is shown quantitatively that shortening the acoustic channel to a few ms duration is more robust than attempting to equalize the channel fully, giving better resultant sound quality for dereverberation. A key point of this paper is to provide an explanation for this added robustness in terms of the equalization filter gain. We provide simulation results and results for practical settings using speech recordings and room impulse response measurements from a real acoustic environment. |
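The relaxed weighting at the heart of the entry above can be sketched with a toy multichannel least-squares design (a simplified illustration of the RMCLS-style "don't care" window, not the paper's evaluation; channel lengths and the `shorten` helper are invented, and exact solvability holds here because the toy sizes satisfy a MINT-style condition):

```python
import numpy as np

def conv_matrix(h, g_len):
    """Toeplitz matrix C such that C @ g == np.convolve(h, g)."""
    C = np.zeros((len(h) + g_len - 1, g_len))
    for j in range(g_len):
        C[j:j + len(h), j] = h
    return C

def shorten(channels, g_len, window=8):
    """Relaxed least-squares shortening: taps 1..window-1 of the
    equalized response get zero weight ('don't care') instead of being
    forced to zero as in full equalization."""
    H = np.hstack([conv_matrix(h, g_len) for h in channels])
    n = H.shape[0]
    d = np.zeros(n); d[0] = 1.0
    w = np.ones(n); w[1:window] = 0.0   # the relaxed (shortening) window
    g, *_ = np.linalg.lstsq(w[:, None] * H, w * d, rcond=None)
    return np.split(g, len(channels))

# Two toy 32-tap channels; 40-tap equalizers per channel suffice here.
rng = np.random.default_rng(2)
h1, h2 = rng.standard_normal(32), rng.standard_normal(32)
g1, g2 = shorten([h1, h2], g_len=40)
eq = np.convolve(h1, g1) + np.convolve(h2, g2)   # shortened response
```

The equalized response has a unit direct tap and a suppressed tail, while the first few taps are left free; it is this freedom that the paper shows buys robustness (lower filter gain) compared with forcing the full response to a delta.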
D P Jarrett; E A P Habets; M R P Thomas; P A Naylor Simulating Room Impulse Responses for Spherical Microphone Arrays Proceedings Article In: Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2011. @inproceedings{Jarrett2011b, A method is proposed for simulating the sound pressure signals on a spherical microphone array in a reverberant enclosure. The method employs spherical harmonic decomposition and takes into account scattering from a solid sphere. An analysis shows that the error in the decomposition can be made arbitrarily small given a sufficient number of spherical harmonics. |
J Filos; A Canclini; M R P Thomas; F Antonacci; A Sarti; P A Naylor Robust Inference of Room Geometry from Acoustic Impulse Responses Proceedings Article In: Proc. European Signal Processing Conf. (EUSIPCO), Barcelona, Spain, 2011. @inproceedings{Filos2011, The problem of localizing reflective boundaries in an acoustic environment from acoustic measurements is considered. Specifically, localization of multiple two-dimensional (2-D) line reflectors is achieved by estimation of the time of arrival (TOA) of reflected signals by analysis of acoustic impulse responses (AIRs). The estimated TOAs are used in conjunction with the source and receiver locations to find the loci of solutions whose common tangents correspond to the location of a reflector. The solution to the common tangent estimation is a nonlinear and non-convex problem that can yield local sub-optimal solutions using existing approaches. We therefore propose an analytic method, based on a closed-form estimator, that is guaranteed to converge to the global minimum in an error-free scenario. We further improve the robustness of the approach when errors are introduced in the estimated TOAs by using the Hough transform to find the optimal solution. The proposed approach is evaluated through Monte Carlo runs, using simulated rooms, that demonstrate the feasibility of the proposed approach. |
J Gudnason; M R P Thomas; D P W Ellis; P A Naylor Data-Driven Voice Source Waveform Analysis and Synthesis Journal Article In: Speech Communication, vol. 54, no. 2, pp. 199–211, 2011. @article{Gudnason2011, A data-driven approach is introduced for studying, analyzing and processing the voice source signal. Existing approaches parameterize the voice source signal by using models that are motivated, for example, by a physical model or function-fitting. Such parameterization is often difficult to achieve and it produces a poor approximation to a large variety of real voice source waveforms of the human voice. This paper presents a novel data-driven approach to analyze different types of voice source waveforms using principal component analysis and Gaussian mixture modeling. This approach models certain voice source features that many other approaches fail to model. Prototype voice source waveforms are obtained from each mixture component and analyzed with respect to speaker, phone and pitch. An analysis/synthesis scheme was set up to demonstrate the effectiveness of the method. Compression of the proposed voice source by discarding 75% of the features yields a segmental signal-to-reconstruction error ratio of 13 dB and a Bark spectral distortion of 0.14. |
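The PCA-plus-Gaussian-mixture pipeline described in the entry above can be sketched with scikit-learn (an assumption of this sketch, as is the synthetic pulse data standing in for inverse-filtered voice-source frames):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

# Toy stand-ins for glottal-synchronous voice-source frames: three
# families of noisy pulse shapes (real frames would come from
# inverse-filtered speech, pitch-aligned at the GCIs).
rng = np.random.default_rng(3)
t = np.linspace(0.0, 1.0, 64)
frames = np.stack([np.sin(np.pi * t) ** (2 + int(rng.integers(0, 3)))
                   + 0.05 * rng.standard_normal(64) for _ in range(300)])

# Data-driven modeling: principal-component features, then a Gaussian
# mixture over the low-dimensional feature space.
pca = PCA(n_components=8).fit(frames)
feats = pca.transform(frames)
gmm = GaussianMixture(n_components=3, random_state=0).fit(feats)

# One prototype waveform per mixture component: the component mean
# mapped back to the waveform domain, as in the paper's prototype
# voice-source analysis.
prototypes = pca.inverse_transform(gmm.means_)
```

Discarding trailing principal components is also how the abstract's compression figure arises: reconstruction from a truncated feature vector trades dimensionality against segmental reconstruction error.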
D P Jarrett; E A P Habets; M R P Thomas; N D Gaubitch; P A Naylor Dereverberation Performance of Rigid and Open Spherical Microphone Arrays: Theory & Simulation Proceedings Article In: Proc. Joint Workshop on Hands-Free Speech Communication and Microphone Arrays (HSCMA), Edinburgh, UK, 2011. @inproceedings{Jarrett2011b, Linear microphone arrays have been extensively used for dereverberation. In this paper we look at the dereverberation performance of two types of spherical microphone array: the open array (microphones suspended in free space) and the rigid array (microphones mounted on a rigid baffle). Dereverberation is performed in the spherical harmonic domain using a technique similar to the commonly used delay-and-sum beamformer (DSB). We analyse the theoretical performance with respect to the direct-to-reverberant ratio (DRR), and we also present simulation results obtained using a simulation tool for spherical arrays. The performance of the spherical DSB is found to increase with the radius of the sphere, and to be 1-2 dB higher for the rigid array. These results serve as a baseline for evaluating the performance of future dereverberation algorithms for spherical arrays. |
2010 |
M R P Thomas; N D Gaubitch; E A P Habets; P A Naylor Supervised Identification and Removal of Common Filter Components in Adaptive Blind SIMO System Identification Proceedings Article In: Proc. Intl. Workshop Acoust. Echo Noise Control (IWAENC), Tel Aviv, Israel, 2010. @inproceedings{Thomas2010b, Adaptive blind system identification with LMS-type algorithms is prone to misconvergence in the presence of noise. In this paper we consider the hypothesis that such misconvergence is due to the introduction of a common filter to the estimated impulse responses. A technique is presented for identifying and removing the common filter using prior knowledge of the true channels. Experimental results with this approach show an improved rate of convergence and reduced system error. Furthermore, misconvergent behaviour is no longer observed, offering a plausible explanation as to the source of misconvergence in adaptive blind system identification. |
M R P Thomas Glottal-Synchronous Speech Processing PhD Thesis Imperial College London, 2010. @phdthesis{Thomas2010c, Glottal-synchronous speech processing is a field of speech science where the pseudoperiodicity of voiced speech is exploited. Traditionally, speech processing involves segmenting and processing short speech frames of predefined length; this may fail to exploit the inherent periodic structure of voiced speech which glottal-synchronous speech frames have the potential to harness. Glottal-synchronous frames are often derived from the glottal closure instants (GCIs) and glottal opening instants (GOIs). The SIGMA algorithm was developed for the detection of GCIs and GOIs from the Electroglottograph signal with a measured accuracy of up to 99.59%. For GCI and GOI detection from speech signals, the YAGA algorithm provides a measured accuracy of up to 99.84%. Multichannel speech-based approaches are shown to be more robust to reverberation than single-channel algorithms. The GCIs are applied to real-world applications including speech dereverberation, where SNR is improved by up to 5 dB, and to prosodic manipulation where the importance of voicing detection in glottal-synchronous algorithms is demonstrated by subjective testing. The GCIs are further exploited in a new area of data-driven speech modelling, providing new insights into speech production and a set of tools to aid deployment into real-world applications. The technique is shown to be applicable in areas of speech coding, identification and artificial bandwidth extension of telephone speech. |