Yuhang He (Henry)

Aurelius: Relation Aware Text-to-Audio Generation At Scale

Yuhang He, He Liang, Jain Yash, Andrew Markham, Vibhav Vineet.

International Conference on Learning Representations (ICLR), 2026.

We introduce the Aurelius framework, which comprises AudioEventSet (a 110-class audio event corpus spanning 7 major categories), AudioRelSet (a 100-class relation corpus spanning 6 major categories), and a novel <text, audio> pair creation strategy.

PDF

BibTex

Project Site

Code

RiTTA: Modeling Event Relations in Text-to-Audio Generation

Yuhang He, Jain Yash, Xubo Liu, Andrew Markham, Vibhav Vineet.

Conference on Empirical Methods in Natural Language Processing (EMNLP Main), 2025.

We address audio event relation modeling in the text-to-audio (TTA) generation task by proposing a new benchmark and a novel evaluation protocol.

PDF

BibTex

Project Site

Code

Deep Neural Room Acoustics Primitive

Yuhang He, Anoop Cherian, Gordon Wichern, Andrew Markham.

The 41st International Conference on Machine Learning (ICML), 2024.

We introduce a novel framework to learn a continuous neural room acoustics field that implicitly encodes all essential sound propagation primitives for each enclosed 3D space.

PDF

BibTex

Poster

SoundCount: Sound Counting from Raw Audio with Dyadic Decomposition Neural Network

Yuhang He, Zhuangzhuang Dai, Long Chen, Niki Trigoni, Andrew Markham.

The 38th Annual AAAI Conference on Artificial Intelligence (AAAI), 2024.

We introduce a learnable dyadic decomposition framework that learns a more representative time-frequency representation from highly polyphonic, loudness-varying sound waveforms. It dyadically decomposes the waveform in a multi-stage, hierarchical manner.

PDF

BibTex

Poster

Sound3DVDet: 3D Sound Source Detection Using Multiview Microphone Array and RGB Images

Yuhang He, Sangyun Shin, Anoop Cherian, Niki Trigoni, Andrew Markham.

IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2024.

We introduce a novel task: 3D sound source localization and classification from multiview acoustic-camera recordings. The sound source lies on an object's physical surface but is not visually observable, which reflects application cases such as gas leaks.

PDF

BibTex

Poster

Code

Metric-Free Exploration for Topological Mapping by Task and Motion Imitation in Feature Space

Yuhang He, Irving Fang, Yiming Li, Rushi Bhavesh Shah, Chen Feng.

Robotics: Science and Systems (RSS), 2023.

We propose DeepExplorer, a metric-free method that efficiently constructs a topological map to represent an environment. DeepExplorer exhibits strong sim2sim and sim2real generalization.

PDF

BibTex

Project

Code

SoundSynp: Sound Source Detection from Raw Waveforms with Multi-Scale Synperiodic Filterbanks

Yuhang He, Andrew Markham

International Conference on Artificial Intelligence and Statistics (AISTATS), 2023.

We propose a novel framework to construct learnable sound signal processing filter banks that achieve multi-scale processing in both the time and frequency domains.

PDF

BibTex

Code

SoundDoA: Learn Sound Source Direction of Arrival and Semantics from Sound Raw Waveforms

Yuhang He, Andrew Markham

Interspeech, 2022.

We propose a sound event direction-of-arrival (DoA) estimation framework with a novel filter bank that jointly learns representations relevant to sound event semantics and spatial location.

PDF

BibTex

SoundDet: Polyphonic Moving Sound Event Detection and Localization from Raw Waveform

Yuhang He, Niki Trigoni, Andrew Markham

International Conference on Machine Learning (ICML), 2021.

We propose a novel framework for detecting polyphonic and moving sound events. We also propose novel object-based evaluation metrics to evaluate performance more objectively.

PDF

BibTex

Real-Time Vehicle Detection from Short-range Aerial Image with Compressed MobileNet

Yuhang He, Ziyu Pan, Lingxi Li, Yunxiao Shan, Dongpu Cao and Long Chen

International Conference on Robotics and Automation (ICRA), 2019.

We propose a compressed MobileNet framework for real-time vehicle detection in short-range aerial images.

PDF

BibTex

Dress Fashionably: Learn Fashion Collocation with Deep Mixed-Category Metric Learning

Long Chen, Yuhang He

Thirty-Second AAAI Conference on Artificial Intelligence (AAAI), 2018.

We propose a deep mixed-category metric learning framework to equip machines with fashion collocation skills.

PDF

BibTex

Multi-task Relative Attributes Prediction by Incorporating Local Context and Global Style Information Features

Yuhang He, Long Chen, Jianda Chen

British Machine Vision Conference (BMVC), 2016.

We fuse features from both local context and global style to improve relative attribute prediction.

PDF

BibTex

Using Edit Distance and Junction Feature to Detect and Recognize Arrow Road Marking

Yuhang He, Shi Chen, Yifeng Pan, Kai Ni

International IEEE Conference on Intelligent Transportation Systems (ITSC), 2014.

We combine a classic line detector with edit distance metrics to detect various arrow road markings on roads for autonomous driving.

PDF

BibTex

Code