I am an incoming Senior Researcher at Microsoft Research (MSR) in Vancouver, Canada.
I recently completed my DPhil (i.e., Ph.D.) in the Department of Computer Science, University of Oxford, where I was a member of St Hugh's College and was co-advised by Prof. Andrew Markham and Prof. Niki Trigoni. I interned at Mitsubishi Electric Research Laboratories (MERL) from June 2023 to December 2023, and at the Microsoft Applied Sciences Group in Munich, Germany from May 2024 to August 2024. I received my B.Eng. from Wuhan University, China.
My research interests lie in multimodal embodied AI, audio-visual multimodal learning, 3D multimodal AR/VR, signal processing (Fourier/wavelet transform) inspired deep learning, and physics-informed deep learning.
Drop me an email (yuhang.he[at]cs.ox.ac.uk) if you would like to get in touch. I write blog posts as part of my research notes; you are welcome to buy me a cup of coffee if you find them helpful.
For the full publication list, please refer to Google Scholar or → Full list
SPEAR: Receiver-to-Receiver Acoustic Neural Warping Field
Yuhang He, Shitong Xu, Jia-Xing Zhong, Sangyun Shin, Niki Trigoni, Andrew Markham.
We propose a receiver-to-receiver neural warping field that predicts spatial acoustic effects at an arbitrary target position from a reference position. It requires neither the sound source position nor room acoustic properties.
Deep Neural Room Acoustics Primitive
Yuhang He, Anoop Cherian, Gordon Wichern, Andrew Markham.
The 41st International Conference on Machine Learning (ICML), 2024.
We introduce a novel framework to learn a continuous neural room acoustics field that implicitly encodes all essential sound propagation primitives for each enclosed 3D space.
SoundCount: Sound Counting from Raw Audio with Dyadic Decomposition Neural Network
Yuhang He, Zhuangzhuang Dai, Long Chen, Niki Trigoni, Andrew Markham.
The 38th Annual AAAI Conference on Artificial Intelligence (AAAI), 2024.
We introduce a learnable dyadic decomposition framework that learns more representative time-frequency representations from highly polyphonic sound waveforms with varying loudness, decomposing the waveform dyadically in a multi-stage hierarchical manner.
Sound3DVDet: 3D Sound Source Detection Using Multiview Microphone Array and RGB Images
Yuhang He, Sangyun Shin, Anoop Cherian, Niki Trigoni, Andrew Markham.
IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2024.
We introduce a novel task: 3D sound source localization and classification from multiview acoustic-camera recordings. The sound sources lie on an object's physical surface but are visually unobservable, reflecting application scenarios such as gas-leak detection.
Metric-Free Exploration for Topological Mapping by Task and Motion Imitation in Feature Space
Yuhang He, Irving Fang, Yiming Li, Rushi Bhavesh Shah, Chen Feng.
Robotics: Science and Systems (RSS), 2023.
We propose metric-free DeepExplorer to efficiently construct a topological map representing an environment. DeepExplorer exhibits strong sim2sim and sim2real generalization capability.
SoundSynp: Sound Source Detection from Raw Waveforms with Multi-Scale Synperiodic Filterbanks
Yuhang He, Andrew Markham.
International Conference on Artificial Intelligence and Statistics (AISTATS), 2023.
We propose a novel framework for constructing learnable sound signal processing filter banks that achieve multi-scale processing in both the time and frequency domains.
SoundDoA: Learn Sound Source Direction of Arrival and Semantics from Sound Raw Waveforms
Yuhang He, Andrew Markham.
Interspeech, 2022.
We propose a novel sound event direction-of-arrival (DoA) estimation framework with a novel filter bank that jointly learns sound event semantics and spatial-location-relevant representations.
SoundDet: Polyphonic Moving Sound Event Detection and Localization from Raw Waveform
Yuhang He, Niki Trigoni, Andrew Markham.
International Conference on Machine Learning (ICML), 2021.
We propose a novel framework for detecting polyphonic and moving sound events, together with object-based evaluation metrics that assess performance more objectively.
I am always happy to chat with people who are interested in my work. If you would like to talk, you can book a time slot through my office hours, which I keep up to date below.