ELO-SPHERES Project
Description
Environment and Listener Optimised Speech Processing for Hearing Enhancement in Real Situations (ELO-SPHERES) was a project funded by the EPSRC at University College London and Imperial College London between September 2019 and March 2023.
Principal Investigators were Mark Huckvale, Patrick Naylor and Stuart Rosen. Research Fellows included Tim Green and Gaston Hilkhuysen at UCL, and Alastair Moore, Sina Hafezi, Xue Wang, Rebecca Vos and Pierre Guiraud at Imperial College London.
Lay Summary
Although modern hearing aids offer the potential to exploit advanced signal processing techniques, the experience and capabilities of hearing-impaired listeners are still unsatisfactory in many everyday listening situations. In part this is because hearing aids reduce or remove subtle differences between the signals received at the two ears. The normally-hearing auditory system uses such differences to determine the location of sound sources in the environment, to separate wanted from unwanted sounds, and to allow attention to be focused on a particular talker in a noisy, multi-talker environment.
True binaural hearing aids - in which the sound processing that takes place at the left and right ears is coordinated rather than independent - are just becoming available. However, practitioners' knowledge of how best to match their potential to the requirements of impaired listeners and listening situations is still very limited. One significant problem is that typical existing listening tests do not reflect the complexity of real-world listening situations, in which there may be many sound sources, some of which move around, while listeners move their heads and also use visual information. A second important issue is that hearing-impaired listeners vary widely in their underlying spatial hearing abilities.
This project aims to understand better the problems of hearing-impaired listeners in noisy, multiple-talker conversations, particularly with regard to (i) their abilities to attend to and recognise speech coming from different directions while listening through binaural aids, and (ii) their use of audio-visual cues. We will develop new techniques for coordinated processing of the signals arriving at the two ears that will allow the locations and characteristics of different sound sources in complex environments to be identified, and will tailor the information presented to match the individual listener's pattern of hearing loss. We will build virtual reality simulations of complex listening environments and develop audio-visual tests to assess the abilities of listeners. We will investigate how the abilities of hearing-impaired listeners vary with their degree of impairment and the complexity of the environment.
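As a minimal illustration of the binaural principle described above - that differences between the two ear signals carry information about source direction - the Python sketch below estimates the interaural time difference (ITD) of a pair of ear signals by cross-correlation and maps it to an approximate azimuth using a simple spherical-head formula. This is only a toy example under stated assumptions (a rigid spherical head of radius 8.75 cm, free-field conditions, and the sign convention defined in the comments); it does not represent the project's actual localisation algorithms.

import numpy as np

def estimate_itd(left, right, fs):
    """Estimate the interaural time difference (s) as the lag that maximises
    the cross-correlation between the left- and right-ear signals."""
    corr = np.correlate(left, right, mode="full")
    lag = np.argmax(corr) - (len(right) - 1)
    return lag / fs

def itd_to_azimuth(itd, head_radius=0.0875, speed_of_sound=343.0):
    """Map an ITD to an azimuth with the Woodworth spherical-head
    approximation itd ~= (r / c) * (theta + sin(theta)), solved on a grid."""
    thetas = np.linspace(-np.pi / 2, np.pi / 2, 1801)
    itds = (head_radius / speed_of_sound) * (thetas + np.sin(thetas))
    return np.degrees(thetas[np.argmin(np.abs(itds - itd))])

if __name__ == "__main__":
    fs = 16000
    rng = np.random.default_rng(0)
    source = rng.standard_normal(4000)                  # 0.25 s of noise as a test source
    delay = 8                                           # interaural delay in samples
    left = np.concatenate([np.zeros(delay), source])    # left ear lags, so under this
    right = np.concatenate([source, np.zeros(delay)])   # convention the source is to the right
    itd = estimate_itd(left, right, fs)
    print(f"ITD = {itd * 1e6:.0f} microseconds")
    print(f"Estimated azimuth ~= {itd_to_azimuth(itd):.0f} degrees to the right")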
This research project is a timely and focussed addition to the knowledge and techniques needed to realise the potential of binaural hearing aids. Its outcomes will provide solutions to some key problems faced by hearing aid users in noisy, multiple-talker situations.
Publications
2020
- Moore AH, Vos RR, Naylor PA, Brookes M. (2020). Evaluation of the performance of a model-based adaptive beamformer. Speech in Noise Workshop.
2021
- D'Olne E, Moore A, Naylor P. (2021). Model-based Beamforming for Wearable Microphone Arrays. EUSIPCO-21 doi: 10.23919/eusipco54536.2021.9616252
- Hafezi S, Moore AH, Naylor PA. (2021). Narrowband multi-source direction-of-arrival estimation in the spherical harmonic domain. The Journal of the Acoustical Society of America, 149(4), pp. 2292. doi: 10.1121/10.0004214
- Hogg A, Neo V, Weiss S, Evers C, Naylor P. (2021). A Polynomial Eigenvalue Decomposition Music Approach for Broadband Sound Source Localization. WASPAA-21 doi: 10.1109/waspaa52581.2021.9632789
- Jones D, Sharma D, Kruchinin S, Naylor P. (2021). Spatial Coding for Microphone Arrays Using Ipnlms-Based RTF Estimation. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics doi: 10.1109/waspaa52581.2021.9632747
- Moore AH, Hafezi S, Vos R, Brookes M, Naylor PA, Huckvale M, ... Hilkhuysen G. (2021). A binaural MVDR beamformer for the 2021 Clarity Enhancement Challenge: ELO-SPHERES consortium system description. The Clarity Workshop on Machine Learning Challenges for Hearing Aids (Clarity-2021).
- Moore A, Vos R, Naylor P, Brookes M. (2021). Processing Pipelines for Efficient, Physically-Accurate Simulation of Microphone Array Signals in Dynamic Sound Scenes. ICASSP-21 doi: 10.1109/icassp39728.2021.9413354
- Neo V, Evers C, Naylor P. (2021). Enhancement of Noisy Reverberant Speech Using Polynomial Matrix Eigenvalue Decomposition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, doi: 10.1109/taslp.2021.3120630
- Neo V, Evers C, Naylor P. (2021). Polynomial Matrix Eigenvalue Decomposition-Based Source Separation Using Informed Spherical Microphone Arrays. WASPAA-21 doi: 10.1109/waspaa52581.2021.9632722
- Neo V, Evers C, Naylor P. (2021). Speech Dereverberation Performance of a Polynomial-EVD Subspace Approach. EUSIPCO-21 doi: 10.23919/eusipco47968.2020.9287869
- Neo V, Evers C, Naylor P. (2021). Polynomial Matrix Eigenvalue Decomposition of Spherical Harmonics for Speech Enhancement. ICASSP-21 doi: 10.1109/icassp39728.2021.9414011
- Xue W, Moore A, Brookes M, Naylor P. (2021). Speech Enhancement Based on Modulation-Domain Parametric Multichannel Kalman Filtering. IEEE/ACM Transactions on Audio, Speech, and Language Processing, doi: 10.1109/taslp.2020.3040850
2022
- Green T, Hilkhuysen G, Huckvale M, Rosen S, Brookes M, Moore A, ... Xue W. (2022). Speech recognition with a hearing-aid processing scheme combining beamforming with mask-informed speech enhancement. Trends in hearing, 26, pp. 23312165211068629. doi: 10.1177/23312165211068629
- Guiraud P, Moore A, Vos R, Naylor P, Brookes M. (2022). Machine Learning for Parameter Estimation in the MBSTOI Binaural Intelligibility Metric. IWAENC-22 doi: 10.1109/iwaenc53105.2022.9914725
- Hilkhuysen G, Green T, Rosen S, Huckvale M. (2022). Remote Audiological assessment: The Hearing-Aid Listening Test. IHCON 2022 International Hearing Aid Research Conference.
- Hilkhuysen G, Green T, Rosen S, Huckvale M. (2022). Evaluating the Intelligibility of Hearing-Aids in Realistic scenes: The Hearing-Aid Listening Test. IHCON 2022 International Hearing Aid Research Conference.
- Hilkhuysen G, Green T, Rosen S, Huckvale M. (2022). Spatial release of masking in daily life while using commercial hearing aids. 3rd Joint Conference on Binaural and Spatial Hearing.
- Huckvale MA, Hilkhuysen G. (2022). ELO-SPHERES intelligibility prediction model for the Clarity Prediction Challenge 2022. Interspeech 2022.
- Jones D, Sharma D, Kruchinin S, Naylor P. (2022). Microphone Array Coding Preserving Spatial Information for Cloud-based Multichannel Speech Recognition. EUSIPCO-22 doi: 10.23919/eusipco55093.2022.9909679
- Moore AH, Green T, Brookes M, Naylor PA. (2022). Measuring audio-visual speech intelligibility under dynamic listening conditions using virtual reality. AES 2022 International Audio for Virtual and Augmented Reality Conference (August 2022).
- Moore AH, Green T, Naylor P, Brookes M. (2022). SEAT: A new platform for audiovisual speech intelligibility tests in virtual reality. Hearing, Audio and Audiology Sciences Meeting, Southampton.
- Moore A, Hafezi S, Vos R, Naylor P, Brookes M. (2022). A Compact Noise Covariance Matrix Model for MVDR Beamforming. IEEE/ACM Transactions on Audio, Speech, and Language Processing, doi: 10.1109/taslp.2022.3180671
- Vos RR, Moore AH, Guiraud P, Naylor PA, Brookes M. (2022). Using the Compact Model of the Noise Covariance for Acoustic Scene Analysis. UK Hearing Audiology and Sciences Meeting.
2023
- Guiraud P, Moore A, Vos R, Naylor P, Brookes M. (2023). The MBSTOI Binaural Intelligibility Metric using a Close-Talking Microphone Reference. ICASSP 2023.
- Hilkhuysen G, Green T, Rosen S, Huckvale M. (2023). Evaluating the intelligibility of hearing-aids in realistic scenes: The Hearing-Aid Listening Test. Speech In Noise Workshop (SPIN2023).
- Huckvale MA, Hilkhuysen G, Green T. (2023). Video database for intelligibility testing in Virtual Reality. Speech In Noise Workshop (SPIN2023).
Tools and Datasets
HearVR - Virtual Reality Video database for audiovisual speech intelligibility assessment
British English matrix sentences were recorded in our anechoic chamber against a green screen, using a 360° camera and a high-quality condenser microphone. These sentences contain 5 slots with 10 word choices in each slot. Different sets of 200 sentences were recorded by 5 male and 5 female talkers sitting at a table. The individual talkers have been cropped from the videos, and the green screen and table replaced by a transparent background to allow compositing of speakers into new 360° scenes. The individual matrix sentences have been extracted as 11 s videos together with level-normalised monophonic audio. We have developed scripts that build multi-talker videos from these elements in a format that can be played on VR headsets such as the Oculus Quest 2. We have also developed a Unity application for the headsets which plays the 360° composite videos and creates spatialised sound sources for the talkers, together with background noise delivered from multiple virtual loudspeakers.
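To illustrate the matrix-sentence format (5 slots with 10 alternatives per slot, giving 10^5 possible sentences), the short Python sketch below generates a random test sentence and scores a simulated response by the number of correctly reported slots. The slot vocabularies in the sketch are placeholders, not the actual ELO-SPHERES word lists.

import random

# Placeholder slot vocabularies for illustration only - the real word lists
# belong to the HearVR recordings and are not reproduced here.
SLOTS = {
    "name":      ["Alan", "Barry", "Clare", "David", "Emma",
                  "Hannah", "Kathy", "Lucy", "Peter", "Rachel"],
    "verb":      ["bought", "gives", "got", "has", "kept",
                  "likes", "sees", "sold", "wants", "wins"],
    "number":    ["two", "three", "four", "five", "six",
                  "eight", "nine", "ten", "twelve", "sixty"],
    "adjective": ["big", "cheap", "dark", "green", "large",
                  "old", "pretty", "red", "small", "white"],
    "noun":      ["desks", "flowers", "houses", "mugs", "rings",
                  "shoes", "spoons", "tables", "toys", "windows"],
}

def random_sentence(rng=random):
    """Pick one word from each of the 5 slots (10 alternatives per slot)."""
    return [rng.choice(words) for words in SLOTS.values()]

def score_response(target, response):
    """Score a listener's response as the number of correctly reported slots."""
    return sum(t == r for t, r in zip(target, response))

if __name__ == "__main__":
    target = random_sentence()
    # Simulate a response with one slot reported incorrectly.
    response = list(target)
    response[2] = "nine" if target[2] != "nine" else "ten"
    print("Target:  ", " ".join(target))
    print("Response:", " ".join(response))
    print("Score:    %d / 5" % score_response(target, response))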
Materials used in "Measuring audio-visual speech intelligibility under dynamic listening conditions using virtual reality"
The materials in this record are the audio and video files, together with various configuration files, used by the "SEAT" software in the study described in Moore, Green, Brookes & Naylor (2022), "Measuring audio-visual speech intelligibility under dynamic listening conditions using virtual reality". They are shared in this form so that the experiment may be reproduced. For any other use, please contact the authors to obtain the original database(s) from which these materials are derived.
eBrIRD - ELOSPHERES binaural room impulse response database
The ELOSPHERES binaural room impulse response database (eBrIRD) is a resource for generating audio for binaural hearing-aid (HA) experiments. It allows testing the performance of new and existing audio processing algorithms under ideal as well as real-life auditory scenes in different environments: an anechoic chamber, a restaurant, a kitchen and a car cabin. The database consists of a collection of binaural room impulse responses (BRIRs) measured with six microphones. Two microphones represent the listener's eardrums. Four microphones are located at the front and back of two behind-the-ear hearing aids placed over the listener's pinnae. The database allows simulation of head movement in the transverse plane.
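A typical use of such a database is to convolve dry (anechoic) speech with a measured BRIR to simulate what the eardrum and hearing-aid microphones would have picked up in that room. The Python sketch below shows this under purely illustrative assumptions: the file names, the presence of all six channels in one file, and the channel ordering are not the actual eBrIRD conventions, which are described in the database documentation.

import numpy as np
import soundfile as sf
from scipy.signal import fftconvolve

# Illustrative file names - substitute the actual eBrIRD files you downloaded.
speech, fs = sf.read("dry_speech.wav")            # mono anechoic source signal
brir, fs_brir = sf.read("brir_restaurant.wav")    # assumed shape (num_samples, 6)

assert fs == fs_brir, "resample so the speech and BRIR sample rates match"

# Assumed channel order: left eardrum, right eardrum, then the front and rear
# microphones of the left and right behind-the-ear hearing aids.
mic_signals = np.stack(
    [fftconvolve(speech, brir[:, ch]) for ch in range(brir.shape[1])],
    axis=1,
)

# Write out the simulated eardrum pair, e.g. for a binaural listening test.
sf.write("binaural_out.wav", mic_signals[:, :2], fs)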