Jonathan LE ROUX
Distinguished Research Scientist
Speech & Audio Senior Team Leader
IEEE Fellow (Class of 2024)

Mitsubishi Electric Research Laboratories (MERL)
201 Broadway, 8th Floor, Cambridge, MA 02139, USA

Google Scholar
Github
Bluesky
Twitter
LinkedIn

Applying for internships

MERL has a great internship program, and we're always on the lookout for outstanding students to come spend a few months with us and work on cutting-edge research ideas.

Interns in the Speech and Audio team typically collaborate with us on all aspects of a research project: derivation of new models, implementation, experimental evaluation, and paper preparation. Our interns publish their work in major conferences in the field, such as ICASSP, Interspeech, WASPAA, and ASRU.

Here is a list of current internship openings in the Speech and Audio Group.

Important note:
We only consider applicants who are already enrolled in a Ph.D. program and have significant experience and publications in the areas of audio, speech, or language processing. We are unfortunately unable to reply to e-mails from other candidates.

News

Chiori Hori's paper "Robot Confirmation Generation and Action Planning Using Long-context Q-Former Integrated with Multimodal LLM" was selected as a Best Paper Award Candidate (Top 2% of submitted papers) at ASRU 2025.
I was elected as the next Vice Chair of the IEEE Signal Processing Society Audio and Acoustic Signal Processing Technical Committee (IEEE AASP TC) for the term starting on January 1st, 2026, serving alongside TC Chair Prof. Minje Kim.
SANE 2025 was successfully held on November 7, 2025 at Google New York.
Yoshiki Masuyama's paper "Physics-Informed Direction-Aware Neural Acoustic Fields" was selected as a Best Paper Award Candidate (Top 5% of submitted papers) at WASPAA 2025.
WASPAA 2025 was held October 12-15, 2025, in Tahoe City, CA. The move to a new venue seems to have been very well received, with great talks, posters, discussions, beer, and hikes. And a bear! More details in our Inside Signal Processing article.
Our team, led by our intern Chris Ick, won the Room Acoustics and Speaker Distance Estimation Challenge (GenDARA), organized as part of the IEEE ICASSP Satellite Workshop on Generative Data Augmentation for Real-World Signal Processing Applications (GenDA) at ICASSP 2025.
IguanaTex has reached 1k stars on GitHub!


Research Interests

At MERL, my work centers on audio signal processing, in particular speech enhancement and audio source separation, which nowadays rely heavily on deep learning. I also actively participate in other projects in the Speech and Audio team that cover all parts of the speech and audio processing pipeline, such as acoustic event and anomaly detection, diarization, spatial audio, and human-machine interaction.

Prior to joining MERL, my Ph.D. and post-doc work included F0 estimation for single and multiple speakers, including in noisy environments; spectrogram-consistency-aware signal processing; fast signal reconstruction from magnitude spectrograms; model-based source separation methods using harmonically constrained GMMs; and various extensions to non-negative matrix factorization (NMF).

Publications

Note: when using BibTeX to cite my papers, please make sure to write my last name as {Le Roux} in the author list, to prevent BibTeX (and later on Google...) from wrongly treating "Le" as a middle name. Thanks!
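As an illustration, an entry for the Signal Processing Letters paper with Emmanuel Vincent listed below could look like this. The citation key is an arbitrary, hypothetical choice, and only fields that appear on this page are filled in; the braces around "Le Roux" are the part that matters.

```bibtex
% Braces make "Le Roux" a single last-name token, so BibTeX abbreviates
% the name as "J. Le Roux" and sorts it under L rather than under "Roux".
@article{LeRoux2013consistent,
  author  = {Jonathan {Le Roux} and Emmanuel Vincent},
  title   = {Consistent {Wiener} Filtering for Audio Source Separation},
  journal = {IEEE Signal Processing Letters},
  volume  = {20},
  number  = {3},
  month   = mar,
  year    = {2013}
}
```

Without the braces, BibTeX parses "Jonathan Le Roux" as first name "Jonathan Le" and last name "Roux".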

Journal Papers

Kevin Wilkinghoff, Haici Yang, Janek Ebbers, François G. Germain, Gordon Wichern, Jonathan Le Roux, "Local Density-Based Anomaly Score Normalization for Domain Generalization," accepted for publication in IEEE Transactions on Audio, Speech, and Language Processing. [Github]
Chang-Bin Jeon, Gordon Wichern, François G. Germain, Jonathan Le Roux, "Embracing Cacophony: Explaining and Improving Random Mixing in Music Source Separation," accepted for publication in IEEE Open Journal of Signal Processing.
Yoshiki Masuyama, Gordon Wichern, François G. Germain, Christopher Ick, Jonathan Le Roux, "SuDaField: Subject- and Dataset-Aware Neural Field for HRTF Modeling," accepted for publication in IEEE Open Journal of Signal Processing. [Github]
Yoshiki Masuyama, Gordon Wichern, François G. Germain, Christopher Ick, Jonathan Le Roux, "RANF: Neural Field-Based HRTF Spatial Upsampling with Retrieval Augmentation and Parameter Efficient Fine-Tuning," accepted for publication in IEEE Open Journal of Signal Processing. [Github]
Junghyun Koo, Gordon Wichern, François G. Germain, Sameer Khurana, Jonathan Le Roux, "SMITIN: Self-Monitored Inference-Time INtervention for Generative Music Transformers," in IEEE Open Journal of Signal Processing, Vol. 6, pp. 266-275, Jan. 2025. [pdf] [Github]
Stefan Uhlich, Giorgio Fabbro, Masato Hirano, Shusuke Takahashi, Gordon Wichern, Jonathan Le Roux, Dipam Chakraborty, Sharada Mohanty, Kai Li, Yi Luo, Jianwei Yu, Rongzhi Gu, Roman Solovyev, Alexander Stempkovskiy, Tatiana Habruseva, Mikhail Sukhovei, Yuki Mitsufuji, "The Sound Demixing Challenge 2023 - Cinematic Demixing Track," in Transactions of the International Society for Music Information Retrieval, 2024. [pdf]
Christoph Boeddeker, Aswin Shanmugam Subramanian, Gordon Wichern, Reinhold Haeb-Umbach, Jonathan Le Roux, "TS-SEP: Joint Diarization and Separation Conditioned on Estimated Speaker Embeddings," in IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2024. [arXiv] [Github]
Darius Petermann, Gordon Wichern, Aswin Shanmugam Subramanian, Zhong-Qiu Wang, Jonathan Le Roux, "Tackling the Cocktail Fork Problem for Separation and Transcription of Real-World Soundtracks," in IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2023. [arXiv]
Zhong-Qiu Wang, Gordon Wichern, Shinji Watanabe, Jonathan Le Roux, "STFT-Domain Neural Speech Enhancement with Very Low Algorithmic Latency," in IEEE/ACM Transactions on Audio, Speech, and Language Processing, Nov. 2022. [arXiv]
Yosuke Higuchi, Niko Moritz, Jonathan Le Roux, Takaaki Hori, "Momentum Pseudo-Labeling: Semi-Supervised ASR with Continuously Improving Pseudo-Labels," in IEEE Journal of Selected Topics in Signal Processing, Vol. 16, No. 6, pp. 1424-1438, Sep. 2022. [pdf]
Zhong-Qiu Wang, Gordon Wichern, Jonathan Le Roux, "Convolutive Prediction for Monaural Speech Dereverberation and Noisy-Reverberant Speaker Separation," in IEEE/ACM Transactions on Audio, Speech, and Language Processing, Nov. 2021. [arXiv]
Zhong-Qiu Wang, Gordon Wichern, Jonathan Le Roux, "On the Compensation Between Magnitude and Phase in Speech Separation," in IEEE Signal Processing Letters. [arXiv]
Chiori Hori, Masato Tsuchiya, Siheng Chen, Anoop Cherian, Takaaki Hori, Bret Harsham, Tim K. Marks, Jonathan Le Roux, Anthony Vetro, "Scene-aware Interaction Technology Based on Multimodal Sensing Information," in Society of Automotive Engineers of Japan, May 2021. [pdf] (in Japanese)
Fatemeh Pishdadian, Gordon Wichern, Jonathan Le Roux, "Finding Strength in Weakness: Learning to Separate Sounds with Weak Supervision," in IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 28, pp. 2386–2399, Sep. 2020. [arXiv]
Ryo Aihara, Jonathan Le Roux, Gordon Wichern, "Deep Clustering-based Single Channel Speech Separation and Recent Advances," in Acoustical Science and Technology, Vol. 41, No. 2, pp. 465–471, Mar. 2020. [pdf]
Ryo Aihara, Jonathan Le Roux, Gordon Wichern, "Deep Clustering-based Single Channel Speech Separation and Recent Advances," in The Journal of the Acoustical Society of Japan, Feb. 2020 (in Japanese). [pdf]
Jonathan Le Roux, Gordon Wichern, Shinji Watanabe, Andy Sarroff, John R. Hershey, "Phasebook and Friends: Leveraging Discrete Representations for Source Separation," in IEEE Journal of Selected Topics in Signal Processing, Special Issue on Data Science: Machine Learning for Audio Signal Processing, May 2019. [.bib] [arXiv]
Tomoki Hayashi, Shinji Watanabe, Tomoki Toda, Takaaki Hori, Jonathan Le Roux, Kazuya Takeda, "Duration-Controlled LSTM for Polyphonic Sound Event Detection," in IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 25, No. 11, pp. 2059–2070, Nov. 2017. [.bib]
Takaaki Hori, Zhuo Chen, Hakan Erdogan, John R. Hershey, Jonathan Le Roux, Vikramjit Mitra, Shinji Watanabe, "Multi-Microphone Speech Recognition Integrating Beamforming, Robust Feature Extraction, and Advanced DNN/RNN Backend," in Computer Speech & Language, Vol. 46, pp. 401–418, Nov. 2017.
Yuuki Tachioka, Shinji Watanabe, Jonathan Le Roux, John R. Hershey, "Prior-based binary masking and discriminative methods for reverberant and noisy speech recognition using distant stereo microphones," Journal of Information Processing, Vol. 25, pp. 407–416, Jun. 2017.
Yuuki Tachioka, Shinji Watanabe, Jonathan Le Roux, John R. Hershey, "Evaluation of Noisy Speech Recognition and Sequence Discriminative Training for Low-rank Deep Neural Network Acoustic Models," Information Processing Society of Japan (IPSJ) Journal, Vol. 57, No. 3, pp. 1080–1088, Mar. 2016. (in Japanese) [.pdf] [.bib]
Timo Gerkmann, Martin Krawczyk-Becker and Jonathan Le Roux, "Phase Processing for Single-Channel Speech Enhancement: History and recent advances," IEEE Signal Processing Magazine, Vol. 32, No. 2, Mar. 2015. (Obligatory IEEE Copyright Notice) [.pdf] [.bib]
Jonathan Le Roux and Emmanuel Vincent, "Consistent Wiener Filtering for Audio Source Separation," IEEE Signal Processing Letters, Vol. 20, No. 3, Mar. 2013. (Obligatory IEEE Copyright Notice) [.pdf] [.bib] [code]
Jonathan Le Roux, Hirokazu Kameoka, Nobutaka Ono, Alain de Cheveigné and Shigeki Sagayama, "Computational Auditory Induction as a Missing-Data Model-Fitting Problem with Bregman Divergence," Speech Communication (Special issue on Perceptual and Statistical Audition), Vol. 53, Issue 5, pp. 658–676, May-June 2011. ScienceDirect (DOI Link). [.pdf] [.bib]
Jonathan Le Roux, Hirokazu Kameoka, Nobutaka Ono, Alain de Cheveigné and Shigeki Sagayama, "Single and Multiple F0 Contour Estimation through Parametric Spectrogram Modeling of Speech in Noisy Environments," IEEE Transactions on Audio, Speech and Language Processing, Vol. 15, No. 4, pp. 1135–1145, May 2007. (Obligatory IEEE Copyright Notice) [.pdf] [.bib]
Erik McDermott, Timothy J. Hazen, Jonathan Le Roux, Atsushi Nakamura and Shigeru Katagiri, "Discriminative Training for Large Vocabulary Speech Recognition using Minimum Classification Error," IEEE Transactions on Audio, Speech and Language Processing, Vol. 15, No. 1, pp. 203–223, Jan. 2007. (Obligatory IEEE Copyright Notice) [.pdf] [.bib]

Book chapters

John Hershey, Jonathan Le Roux, Shinji Watanabe, Scott Wisdom, Zhuo Chen, Yusuf Isik, "Novel deep architectures in speech processing," in New Era for Robust Speech Recognition: Exploiting Deep Learning, S. Watanabe, M. Delcroix, F. Metze, J. R. Hershey, Eds. Springer, 2017.
Hakan Erdogan, John Hershey, Shinji Watanabe, Jonathan Le Roux, "Deep recurrent networks for separation and recognition of single-channel speech in non-stationary background audio," in New Era for Robust Speech Recognition: Exploiting Deep Learning, S. Watanabe, M. Delcroix, F. Metze, J. R. Hershey, Eds. Springer, 2017.
John R. Hershey, Steven J. Rennie, and Jonathan Le Roux, "Factorial Models for Noise Robust Speech Recognition," in: Techniques for Noise Robustness in Automatic Speech Recognition, T. Virtanen, R. Singh, and B. Raj, Eds. Wiley. [.pdf] [.bib] [Amazon]
Nobutaka Ono, Kenichi Miyamoto, Hirokazu Kameoka, Jonathan Le Roux, Yuuki Uchiyama, Emiru Tsunoo, Takuya Nishimoto and Shigeki Sagayama, "Harmonic and Percussive Sound Separation and Its Application to MIR-Related Tasks," in Advances in Music Information Retrieval, ser. Studies in Computational Intelligence, Z. W. Ras and A. Wieczorkowska, Eds. Springer, 2010, vol. 274, pp. 213–236. [.bib]

International Conference Papers (Peer reviewed)

Yoshiki Masuyama, Kohei Saijo, Francesco Paissan, Jiangyu Han, Marc Delcroix, Ryo Aihara, François G Germain, Gordon Wichern, Jonathan Le Roux, "FlexIO: Flexible Single- and Multi-Channel Speech Separation and Enhancement," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2026.
Yoshiki Masuyama, François G Germain, Gordon Wichern, Chiori Hori, Jonathan Le Roux, "Velocity Potential Neural Field for Efficient Ambisonics Impulse Response Modeling," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2026.
Ryo Aihara, Yoshiki Masuyama, Francesco Paissan, François G. Germain, Gordon Wichern, Jonathan Le Roux, "SUNAC: Source-aware Unified Neural Audio Codec," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2026.
Ryo Aihara, Yoshiki Masuyama, François G. Germain, Gordon Wichern, Jonathan Le Roux, "Exploring Disentangled Neural Speech Codecs from Self-Supervised Representations," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW), May 2026.
Chiori Hori, Yoshiki Masuyama, Diego Romeres, Devesh K. Jha, Radu Corcodel, Siddarth Jain, Jonathan Le Roux, "Robot Confirmation Generation and Action Planning Using Long-context Q-Former Integrated with Multimodal LLM," in Proc. IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), Dec. 2025. Selected as Best Paper Award Candidate (Top 2% of submitted papers)
Kevin Wilkinghoff, Takuya Fujimura, Keisuke Imoto, Jonathan Le Roux, Zheng-Hua Tan, Tomoki Toda, "Handling Domain Shifts for Anomalous Sound Detection: A Review of DCASE-Related Work," in Proc. Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE), Oct. 2025. [pdf]
Yoshiki Masuyama, François G. Germain, Gordon Wichern, Christopher Ick, Jonathan Le Roux, "Physics-Informed Direction-Aware Neural Acoustic Fields," in Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), Oct. 2025. [pdf] Selected as Best Paper Award Candidate (Top 5% of submitted papers)
Francesco Paissan, Gordon Wichern, Yoshiki Masuyama, Ryo Aihara, François G. Germain, Kohei Saijo, Jonathan Le Roux, "FasTUSS: Faster Task-Aware Unified Source Separation," in Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), Oct. 2025. [pdf]
Christopher Ick, Gordon Wichern, Yoshiki Masuyama, François G. Germain, Jonathan Le Roux, "Direction-Aware Neural Acoustic Fields for Few-Shot Interpolation of Ambisonic Impulse Responses," in Proc. ISCA Interspeech, Aug. 2025. [pdf]
Sameer Khurana, Dominik Klement, Antoine Laurent, Dominik Bobos, Juraj Novosad, Peter Gazdik, Ellen Zhang, Zilli Huang, Amir Hussein, Ricard Marxer, Yoshiki Masuyama, Ryo Aihara, Chiori Hori, François G. Germain, Gordon Wichern, Jonathan Le Roux, "Factorized RVQ-GAN For Disentangled Speech Tokenization," in Proc. ISCA Interspeech, Aug. 2025. [pdf]
Haici Yang, Gordon Wichern, Ryo Aihara, Yoshiki Masuyama, Sameer Khurana, François G. Germain, Jonathan Le Roux, "Investigating Continuous Autoregressive Generative Speech Enhancement," in Proc. ISCA Interspeech, Aug. 2025. [pdf]
Amir Hussein, Sameer Khurana, Gordon Wichern, François G. Germain, Jonathan Le Roux, "HASRD: Hierarchical Acoustic and Semantic Representation Disentanglement," in Proc. ISCA Interspeech, Aug. 2025. [pdf]
Christopher Ick, Gordon Wichern, Yoshiki Masuyama, François G. Germain, Jonathan Le Roux, "Data Augmentation Using Neural Acoustic Fields With Retrieval-Augmented Pre-training," in Proc. IEEE ICASSP Satellite Workshop on Generative Data Augmentation for Real-World Signal Processing Applications (GenDA), Apr. 2025. [pdf] Winner of the Room Acoustics and Speaker Distance Estimation Challenge (GenDARA)
Kohei Saijo, Janek Ebbers, François G. Germain, Gordon Wichern, Jonathan Le Roux, "Task-Aware Unified Source Separation," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Apr. 2025. [pdf] [Github]
Kohei Saijo, Janek Ebbers, François G. Germain, Sameer Khurana, Gordon Wichern, Jonathan Le Roux, "Leveraging Audio-Only Data for Text-Queried Target Sound Extraction," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Apr. 2025. [pdf]
Yoshiki Masuyama, Gordon Wichern, François G. Germain, Christopher Ick, Jonathan Le Roux, "Retrieval-Augmented Neural Field for HRTF Upsampling and Personalization," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Apr. 2025. [pdf] [Github]
Kevin Wilkinghoff, Haici Yang, Janek Ebbers, François G. Germain, Gordon Wichern, Jonathan Le Roux, "Keeping the Balance: Anomaly Score Calculation for Domain Generalization," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Apr. 2025. [pdf] [Github]
Janek Ebbers, François G. Germain, Kevin Wilkinghoff, Gordon Wichern, Jonathan Le Roux, "No Class Left Behind: A Closer Look at Class Balancing for Audio Tagging," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Apr. 2025. [pdf]
Elio Gruttadauria, Mathieu Fontaine, Jonathan Le Roux, Slim Essid, "O-EENC-SD: Efficient Online End-to-End Neural Clustering for Speaker Diarization," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Apr. 2025. [pdf]
Chiori Hori, Motonari Kambara, Komei Sugiura, Kei Ota, Sameer Khurana, Siddarth Jain, Radu Corcodel, Devesh K. Jha, Diego Romeres, Jonathan Le Roux, "Interactive Robot Action Replanning using Multimodal LLM Trained from Human Demonstration Videos," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Apr. 2025. [pdf]
Christopher Ick, Gordon Wichern, Yoshiki Masuyama, François G. Germain, Jonathan Le Roux, "Spatially-Aware Losses for Enhanced Neural Acoustic Fields," in Proc. NeurIPS 2024 Audio Imagination Workshop, Dec. 2024. [pdf]
Kohei Saijo, Gordon Wichern, François G. Germain, Zexu Pan, Jonathan Le Roux, "TF-Locoformer: Transformer with Local Modeling by Convolution for Speech Separation and Enhancement," in Proc. International Workshop on Acoustic Signal Enhancement (IWAENC), Sep. 2024. [pdf] [Github] Best Student Paper Award Finalist
Jie Yin, Andrew Luo, Yilun Du, Anoop Cherian, Tim K. Marks, Jonathan Le Roux, Chuang Gan, "Disentangled Acoustic Fields For Multimodal Physical Scene Understanding," in Proc. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Sep. 2024. [pdf]
Kohei Saijo, Gordon Wichern, François G. Germain, Zexu Pan, Jonathan Le Roux, "Enhanced Reverberation as Supervision for Unsupervised Speech Separation," in Proc. ISCA Interspeech, Sep. 2024. [pdf] [Github]
Janek Ebbers, François G. Germain, Gordon Wichern, Jonathan Le Roux, "Sound Event Bounding Boxes," in Proc. ISCA Interspeech, Sep. 2024. [pdf] [Github]
Sameer Khurana, Chiori Hori, Antoine Laurent, Gordon Wichern, Jonathan Le Roux, "ZeroST: Zero-Shot Speech Translation," in Proc. ISCA Interspeech, Sep. 2024. [pdf]
Zexu Pan, Gordon Wichern, François G. Germain, Kohei Saijo, Jonathan Le Roux, "PARIS: Pseudo-AutoRegressIve Siamese Training for Online Speech Separation," in Proc. ISCA Interspeech, Sep. 2024. [pdf]
Louis Bahrman, Mathieu Fontaine, Jonathan Le Roux, Gaël Richard, "Speech Dereverberation Constrained on Room Impulse Response Characteristics," in Proc. ISCA Interspeech, Sep. 2024. [pdf]
Motonari Kambara, Chiori Hori, Komei Sugiura, Kei Ota, Devesh K. Jha, Sameer Khurana, Siddarth Jain, Radu Corcodel, Diego Romeres, Jonathan Le Roux, "Human Action Understanding-based Robot Planning using Multimodal LLM," in Proc. IEEE International Conference on Robotics and Automation (ICRA), Jun. 2024. [pdf]
Chang-Bin Jeon, Gordon Wichern, François G. Germain, Jonathan Le Roux, "Why does music source separation benefit from cacophony?," in Proc. IEEE ICASSP Satellite Workshop on Explainable Machine Learning for Speech and Audio (XAI-SA), IEEEXplore Track, Apr. 2024. [pdf]
Junghyun Koo, Gordon Wichern, François G. Germain, Sameer Khurana, Jonathan Le Roux, "Understanding and Controlling Generative Music Transformers by Probing Individual Attention Heads," in Proc. IEEE ICASSP Satellite Workshop on Explainable Machine Learning for Speech and Audio (XAI-SA), Workshop Track, Apr. 2024. [pdf]
Zexu Pan, Gordon Wichern, François G. Germain, Aswin Subramanian, Jonathan Le Roux, "Late Audio-Visual Fusion for In-The-Wild Speaker Diarization," in Hands-free Speech Communication and Microphone Arrays (HSCMA), Apr. 2024. [pdf]
Shih-Lun Wu, Xuankai Chang, Gordon Wichern, Jee-weon Jung, François G. Germain, Jonathan Le Roux, Shinji Watanabe, "Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Supervision, and LLM Mix-up Augmentation," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Apr. 2024. [pdf] Winner of the DCASE 2023 Task 6a Automatic Audio Captioning Challenge
Yoshiki Masuyama, Gordon Wichern, François G. Germain, Zexu Pan, Sameer Khurana, Chiori Hori, Jonathan Le Roux, "NIIRF: Neural IIR Filter Field for HRTF Upsampling and Personalization," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Apr. 2024. [pdf] [Github]
Zexu Pan, Gordon Wichern, François G. Germain, Sameer Khurana, Jonathan Le Roux, "NeuroHeed+: Improving Neuro-steered Speaker Extraction with Joint Auditory Attention Detection," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Apr. 2024. [pdf]
Dimitrios Bralios, Gordon Wichern, François G. Germain, Zexu Pan, Sameer Khurana, Chiori Hori, Jonathan Le Roux, "Generation or Replication: Auscultating Audio Latent Diffusion Models," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Apr. 2024. [pdf]
Chiori Hori, Pu Wang, Mahbub Rahman, Cristian Vaca-Rubio, Sameer Khurana, Anoop Cherian, Jonathan Le Roux, "Wi-Fi based Indoor Monitoring Enhanced by Multimodal Fusion," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Apr. 2024. [pdf]
Teysir Baoueb, Haocheng Liu, Mathieu Fontaine, Jonathan Le Roux, Gaël Richard, "SpecDiff-GAN: A Spectrally-Shaped Noise Diffusion GAN for Speech and Music Synthesis," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Apr. 2024. [pdf]
Haocheng Liu, Teysir Baoueb, Mathieu Fontaine, Jonathan Le Roux, Gaël Richard, "GLA-Grad: A Griffin-Lim Extended Waveform Generation Diffusion Model," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Apr. 2024. [pdf]
Zexu Pan, Gordon Wichern, Yoshiki Masuyama, François G. Germain, Sameer Khurana, Chiori Hori, Jonathan Le Roux, "Scenario-Aware Audio-Visual TF-GridNet for Target Speech Extraction," in Proc. IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Dec. 2023. [pdf] (Winner of the 2nd COG-MHEAR Audio-Visual Speech Enhancement (AVSE) Challenge)
Shih-Lun Wu, Xuankai Chang, Gordon Wichern, Jee-weon Jung, François G. Germain, Jonathan Le Roux, Shinji Watanabe, "On the Use of Pretrained Deep Audio Encoders for Automated Audio Captioning Tasks," in Proc. International Symposium on Future Active Safety Technology toward zero traffic accidents (FAST-zero), Nov. 2023. [pdf]
François G. Germain, Gordon Wichern, Jonathan Le Roux, "Hyperbolic Unsupervised Anomalous Sound Detection," in Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), Oct. 2023. [pdf]
Ricardo Falcon Perez, Gordon Wichern, François G. Germain, Jonathan Le Roux, "Location as Supervision for Weakly Supervised Multi-Channel Source Separation of Machine Sounds," in Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), Oct. 2023. [pdf]
Chiori Hori, Puyuan Peng, David Harwath, Xinyu Liu, Kei Ota, Siddarth Jain, Radu Corcodel, Devesh K. Jha, Diego Romeres, Jonathan Le Roux, "Style-transfer based Spoken Language understanding for Robot Action Sequence Acquisition from Videos," in Proc. ISCA Interspeech, Aug. 2023. [pdf]
Ke Chen, Gordon Wichern, François G. Germain, Jonathan Le Roux, "Pac-HuBERT: Self-Supervised Music Source Separation via Hidden-Unit BERT and Primitive Auditory Features," in Proc. IEEE ICASSP Satellite Workshop on Self-supervision in Audio, Speech and Beyond (SASB), Jun. 2023. [arXiv]
Darius Petermann, Gordon Wichern, Aswin Subramanian, Jonathan Le Roux, "Hyperbolic Audio Source Separation," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Jun. 2023. [arXiv] [Github] (Best Student Paper Award)
Rohith Aralikatti, Christoph Boeddeker, Gordon Wichern, Aswin Subramanian, Jonathan Le Roux, "Reverberation as Supervision for Speech Separation," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Jun. 2023. [arXiv]
Efthymios Tzinis, Gordon Wichern, Paris Smaragdis, Jonathan Le Roux, "Optimal Condition Training for Target Source Separation," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Jun. 2023. [arXiv]
Dimitrios Bralios, Efthymios Tzinis, Gordon Wichern, Paris Smaragdis, Jonathan Le Roux, "Latent Iterative Refinement for Modular Source Separation," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Jun. 2023. [arXiv]
Hao Yen, François G. Germain, Gordon Wichern, Jonathan Le Roux, "Cold Diffusion for Speech Enhancement," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Jun. 2023. [arXiv]
Satvik Venkatesh, Gordon Wichern, Aswin Subramanian, Jonathan Le Roux, "Improved Domain Generalization via Disentangled Multi-Task Learning in Unsupervised Anomalous Sound Detection," in Proc. DCASE Workshop, Nov. 2022. [pdf]
Efthymios Tzinis, Gordon Wichern, Aswin Subramanian, Paris Smaragdis, Jonathan Le Roux, "Heterogeneous Target Speech Separation," in Proc. ISCA Interspeech, Sep. 2022. [pdf]
Chiori Hori, Takaaki Hori, Jonathan Le Roux, "Low-Latency Streaming Scene-aware Interaction Using Audio-Visual Transformers," in Proc. ISCA Interspeech, Sep. 2022. [pdf]
Darius Petermann, Gordon Wichern, Zhong-Qiu Wang, Jonathan Le Roux, "The Cocktail Fork Problem: Three-Stem Audio Separation for Real-World Soundtracks," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2022. [arXiv] [Demo] [Github] [Divide and Remaster (DnR) dataset]
Olga Slizovskaia, Gordon Wichern, Zhong-Qiu Wang, Jonathan Le Roux, "Locate This, Not That: Class-Conditioned Sound Event DOA Estimation," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2022. [arXiv]
Niko Moritz, Takaaki Hori, Shinji Watanabe, Jonathan Le Roux, "Sequence Transduction with Graph-based Supervision," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2022. [arXiv]
Xuankai Chang, Niko Moritz, Takaaki Hori, Shinji Watanabe, Jonathan Le Roux, "Extended Graph Temporal Classification for Multi-Speaker End-to-End ASR," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2022. [arXiv]
Yosuke Higuchi, Niko Moritz, Jonathan Le Roux, Takaaki Hori, "Advancing Momentum Pseudo-Labeling with Conformer and Initialization Strategy," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2022. [arXiv]
Ankit Parag Shah, Shijie Geng, Peng Gao, Anoop Cherian, Takaaki Hori, Tim K. Marks, Jonathan Le Roux, Chiori Hori, "Audio-Visual Scene-Aware Dialog and Reasoning Using Audio-Visual Transformers with Joint Student-Teacher Learning," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2022. [pdf]
Chiori Hori, Ankit Parag Shah, Shijie Geng, Peng Gao, Anoop Cherian, Takaaki Hori, Jonathan Le Roux, Tim K. Marks, "Overview of Audio Visual Scene-Aware Dialog with Reasoning Track for Natural Language Generation in DSTC10," in Proc. 10th Dialog System Technology Challenge Workshop at AAAI (DSTC10), Feb. 2022. [pdf]
Ankit Parag Shah, Takaaki Hori, Jonathan Le Roux, Chiori Hori, "DSTC10-AVSD Submission System with Reasoning using Audio-Visual Transformers with Joint Student-Teacher Learning," in Proc. 10th Dialog System Technology Challenge Workshop at AAAI (DSTC10), Feb. 2022. [pdf]
Anoop Cherian, Chiori Hori, Tim K. Marks, Jonathan Le Roux, "(2.5+1)D Spatio-Temporal Scene Graphs for Video Question Answering," in Proc. AAAI Conference on Artificial Intelligence (AAAI), Feb. 2022. [pdf]
Zhong-Qiu Wang, Gordon Wichern, Jonathan Le Roux, "Convolutive Prediction for Reverberant Speech Separation," in Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), Oct. 2021. [arXiv]
Gordon Wichern, Ankush Chakrabarty, Zhong-Qiu Wang, Jonathan Le Roux, "Anomalous sound detection using attentive neural processes," in Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), Oct. 2021. [pdf]
Moitreya Chatterjee, Jonathan Le Roux, Narendra Ahuja, Anoop Cherian, "Visual Scene Graphs for Audio Source Separation," in Proc. IEEE International Conference on Computer Vision (ICCV), Oct. 2021. [arXiv]
Niko Moritz, Takaaki Hori, Jonathan Le Roux, "Dual Causal/Non-Causal Self-Attention for Streaming End-to-End Speech Recognition," in Proc. ISCA Interspeech, Sep. 2021. [arXiv]
Takaaki Hori, Niko Moritz, Chiori Hori, Jonathan Le Roux, "Advanced Long-context End-to-end Speech Recognition Using Context-expanded Transformers," in Proc. ISCA Interspeech, Sep. 2021. [arXiv]
Yosuke Higuchi, Niko Moritz, Jonathan Le Roux, Takaaki Hori, "Momentum Pseudo-Labeling for Semi-Supervised Speech Recognition," in Proc. ISCA Interspeech, Sep. 2021. [arXiv]
Chiori Hori, Takaaki Hori, Jonathan Le Roux, "Optimizing Latency for Online Video Captioning Using Audio-Visual Transformers," in Proc. ISCA Interspeech, Sep. 2021. [arXiv]
Yun-Ning Hung, Gordon Wichern, Jonathan Le Roux, "Transcription Is All You Need: Learning to Separate Musical Mixtures with Score as Supervision," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Jun. 2021. [arXiv]
Sameer Khurana, Niko Moritz, Takaaki Hori, Jonathan Le Roux, "Unsupervised Domain Adaptation for Speech Recognition via Uncertainty Driven Self-Training," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Jun. 2021. [arXiv]
Niko Moritz, Takaaki Hori, Jonathan Le Roux, "Capturing Multi-Resolution Context by Dilated Self-Attention," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Jun. 2021. [arXiv]
Niko Moritz, Takaaki Hori, Jonathan Le Roux, "Semi-Supervised Speech Recognition via Graph-based Temporal Classification," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Jun. 2021. [arXiv]
Shijie Geng, Peng Gao, Moitreya Chatterjee, Chiori Hori, Jonathan Le Roux, Yongfeng Zhang, Hongsheng Li, Anoop Cherian, "Dynamic Graph Representation Learning for Video Dialog via Multi-Modal Shuffled Transformers," in Proc. AAAI Conference on Artificial Intelligence (AAAI), Feb. 2021. [arXiv]
Niko Moritz, Gordon Wichern, Takaaki Hori, Jonathan Le Roux, "All-in-One Transformer: Unifying Speech Recognition, Audio Tagging, and Event Detection," in Proc. ISCA Interspeech, Oct. 2020. [.pdf] [.bib]
Takaaki Hori, Niko Moritz, Chiori Hori, Jonathan Le Roux, "Transformer-based Long-context End-to-end Speech Recognition," in Proc. ISCA Interspeech, Oct. 2020. [.pdf] [.bib]
Tejas Jayashankar, Jonathan Le Roux, Pierre Moulin, "Detecting Audio Attacks on ASR Systems with Dropout Uncertainty," in Proc. ISCA Interspeech, Oct. 2020. [arXiv] [.pdf] [.bib]
Ethan Manilow, Gordon Wichern, Jonathan Le Roux, "Hierarchical Musical Source Separation," in Proc. International Society for Music Information Retrieval (ISMIR) Conference, Oct. 2020. [Paper, Poster, Video] [.pdf] [.bib] (Best Poster Award, Best Video Award)
Prem Seetharaman, Gordon Wichern, Bryan Pardo, Jonathan Le Roux, "AutoClip: Adaptive Gradient Clipping for Source Separation Networks," in Proc. IEEE International Workshop on Machine Learning for Signal Processing (MLSP), Sep. 2020. [arXiv] [.pdf] [.bib]
Prem Seetharaman, Gordon Wichern, Jonathan Le Roux, Bryan Pardo, "Bootstrapping unsupervised deep music separation from primitive auditory grouping principles," in ICML Workshop on Self-Supervision in Audio and Speech (SAS), Jul. 2020. [arXiv] [.pdf] [.bib]
Fatemeh Pishdadian, Gordon Wichern, Jonathan Le Roux, "Learning to Separate Sounds From Weakly-Labeled Scenes," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2020. [.pdf] [.bib]
Matthew Maciejewski, Gordon Wichern, Emmett McQuinn, Jonathan Le Roux, "WHAMR!: Noisy and Reverberant Single-Channel Speech Separation," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2020. [arXiv] [.pdf] [.bib]
Xuankai Chang, Wangyou Zhang, Yanmin Qian, Jonathan Le Roux, Shinji Watanabe, "End-to-End Multi-speaker Speech Recognition with Transformer," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2020. [arXiv] [.pdf] [.bib]
Niko Moritz, Takaaki Hori, Jonathan Le Roux, "Streaming Automatic Speech Recognition with the Transformer Model," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2020. [arXiv] [.pdf] [.bib]
Leda Sarı, Niko Moritz, Takaaki Hori, Jonathan Le Roux, "Unsupervised Speaker Adaptation Using Attention-Based Speaker Memory for End-To-End ASR," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2020. [.pdf] [.bib]
Niko Moritz, Takaaki Hori, Jonathan Le Roux, "Streaming End-to-End Speech Recognition with Joint CTC-Attention Based Models," in Proc. IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Dec. 2019. [.pdf] [.bib]
Xuankai Chang, Wangyou Zhang, Yanmin Qian, Jonathan Le Roux, Shinji Watanabe, "MIMO-SPEECH: End-to-End Multi-Channel Multi-Speaker Speech Recognition," in Proc. IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Dec. 2019. [.pdf] [.bib] (Best Paper Award)
Ethan Manilow, Gordon Wichern, Prem Seetharaman, Jonathan Le Roux, "Cutting Music Source Separation Some Slakh: A Dataset to Study the Impact of Training Data Quality and Quantity," in Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), Oct. 2019. [.pdf] [.bib]
Ilya Kavalerov, Scott Wisdom, Hakan Erdogan, Brian Patton, Kevin Wilson, Jonathan Le Roux, John R. Hershey, "Universal Sound Separation," in Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), Oct. 2019. [.pdf] [.bib]
Gordon Wichern, Emmett McQuinn, Joe Antognini, Michael Flynn, Richard Zhu, Dwight Crow, Ethan Manilow, Jonathan Le Roux, "WHAM!: Extending Speech Separation to Noisy Environments," in Proc. ISCA Interspeech, Sep. 2019. [.pdf] [.bib]
Niko Moritz, Takaaki Hori, Jonathan Le Roux, "Unidirectional Neural Network Architectures for End-to-End Automatic Speech Recognition," in Proc. ISCA Interspeech, Sep. 2019. [.pdf] [.bib]
Hiroshi Seki, Takaaki Hori, Shinji Watanabe, Niko Moritz, Jonathan Le Roux, "Vectorized Beam Search for CTC-Attention-based Speech Recognition," in Proc. ISCA Interspeech, Sep. 2019. [.pdf] [.bib]
Hiroshi Seki, Takaaki Hori, Shinji Watanabe, Jonathan Le Roux, John R. Hershey, "End-to-End Multi-Lingual Multi-Speaker Speech Recognition," in Proc. ISCA Interspeech, Sep. 2019. [.pdf] [.bib]
Niko Moritz, Takaaki Hori, Jonathan Le Roux, "Triggered Attention for End-to-End Speech Recognition," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2019. [.pdf] [.bib]
Jonathan Le Roux, Gordon Wichern, Shinji Watanabe, Andy Sarroff, John R. Hershey, "The Phasebook: Building complex masks via discrete representations for source separation," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2019. [.pdf] [.bib]
Jonathan Le Roux, Scott Wisdom, Hakan Erdogan, John R. Hershey, "SDR — half-baked or well done?," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2019. [.pdf] [.bib] [arXiv] [Code to reproduce the figures]
Prem Seetharaman, Gordon Wichern, Jonathan Le Roux, Bryan Pardo, "Bootstrapping single-channel source separation via unsupervised spatial clustering on stereo mixtures," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2019. [.pdf] [.bib] [arXiv]
Prem Seetharaman, Gordon Wichern, Shrikant Venkataramani, Jonathan Le Roux, "Class-conditional embeddings for music source separation," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2019. [.pdf] [.bib] [arXiv]
Ryo Aihara, Toshiyuki Hanazawa, Yohei Okato, Gordon Wichern, Jonathan Le Roux, "Teacher-student deep clustering for low-delay single channel speech separation," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2019. [.pdf] [.bib]
Takaaki Hori, Ramon Astudillo, Tomoki Hayashi, Yu Zhang, Shinji Watanabe, Jonathan Le Roux, "Cycle-consistency training for end-to-end speech recognition," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2019. [.pdf] [.bib] [arXiv]
Gordon Wichern, Jonathan Le Roux, "Phase reconstruction with learned time-frequency representations for single-channel speech separation," in Proc. IEEE International Workshop on Acoustic Signal Enhancement (IWAENC), Sep. 2018. [.pdf] [.bib]
Zhong-Qiu Wang, Jonathan Le Roux, DeLiang Wang, John R. Hershey, "End-to-End Speech Separation with Unfolded Iterative Phase Reconstruction," in Proc. ISCA Interspeech, Sep. 2018. [.pdf] [.bib]
Hiroshi Seki, Shinji Watanabe, Takaaki Hori, Jonathan Le Roux, John R. Hershey, "A Purely End-to-end System for Multi-speaker Speech Recognition," in Proc. Annual Meeting of the Association for Computational Linguistics (ACL), Jul. 2018. [.pdf] [.bib]
Hiroshi Seki, Shinji Watanabe, Takaaki Hori, Jonathan Le Roux, John R. Hershey, "An End-to-End Language-Tracking Speech Recognizer for Mixed-Language Speech," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Apr. 2018. [.pdf] [.bib]
Shane Settle, Jonathan Le Roux, Takaaki Hori, Shinji Watanabe, John R. Hershey, "End-to-End Multi-Speaker Speech Recognition," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Apr. 2018. [.pdf] [.bib]
Zhong-Qiu Wang, Jonathan Le Roux, John R. Hershey, "Alternative Objective Functions for Deep Clustering," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Apr. 2018. [.pdf] [.bib]
Zhong-Qiu Wang, Jonathan Le Roux, John R. Hershey, "Multi-Channel Deep Clustering: Discriminative Spectral and Spatial Embeddings for Speaker-Independent Speech Separation," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Apr. 2018. [.pdf] [.bib] (Best Student Paper Award)
Paul Magron, Jonathan Le Roux, Tuomas Virtanen, "Consistent anisotropic Wiener filtering for audio source separation," in Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), Oct. 2017. [.pdf] [.bib]
Yuuki Tachioka, Tomohiro Narita, Iori Miura, Takanobu Uramoto, Natsuki Monta, Shingo Uenohara, Kenichi Furuya, Shinji Watanabe, Jonathan Le Roux, "Coupled initialization of multi-channel non-negative matrix factorization based on spatial and spectral information," in Proc. ISCA Interspeech, Aug. 2017. [.pdf] [.bib]
Yi Luo, Zhuo Chen, John R. Hershey, Jonathan Le Roux, Nima Mesgarani, "Deep Clustering and Conventional Networks for Music Separation: Stronger Together," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Mar. 2017. [.pdf] [.bib] [arXiv] [demo] (Ranked 1st out of 5 teams at the MIREX 2016 Singing Voice Separation Task)
Tomoki Hayashi, Shinji Watanabe, Tomoki Toda, Takaaki Hori, Jonathan Le Roux, Kazuya Takeda, "BLSTM-HMM Hybrid System Combined with Sound Activity Detection Network for Polyphonic Sound Event Detection," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Mar. 2017. [.pdf] [.bib]
Shinji Watanabe, Takaaki Hori, Jonathan Le Roux, John R. Hershey, "Student-Teacher Network Learning with Enhanced Features," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Mar. 2017. [.pdf] [.bib]
Takaaki Hori, Hai Wang, Chiori Hori, Shinji Watanabe, Bret A. Harsham, Jonathan Le Roux, John R. Hershey, Yusuke Koji, Yi Jing, Zhaocheng Zhu, Takeyuki Aikawa, "Dialog State Tracking with Attention-Based Sequence-to-Sequence Learning," in Proc. IEEE Workshop on Spoken Language Technology (SLT), Dec. 2016. [.pdf] [.bib] (Ranked 2nd out of 9 submissions at DSTC5 Dialog State Tracking Challenge)
Scott Wisdom, Thomas Powers, John R. Hershey, Jonathan Le Roux, Les Atlas, "Full-Capacity Unitary Recurrent Neural Networks," in Proc. Advances in Neural Information Processing Systems (NIPS), Dec. 2016. [.pdf] [.bib] [Spotlight video] [Code]
Hakan Erdogan, Tomoki Hayashi, John R. Hershey, Takaaki Hori, Chiori Hori, Wei-Ning Hsu, Suyoun Kim, Jonathan Le Roux, Zhong Meng, Shinji Watanabe, "Multi-Channel Speech Recognition: LSTMs All the Way Through," in Proc. CHiME Workshop, pp. 45–48, Sep. 2016. [.pdf] [.bib] (Ranked 3rd out of 15 submissions at the 4th CHiME Speech Separation and Recognition Challenge, 6-channel track)
Yusuf Isik, Jonathan Le Roux, Zhuo Chen, Shinji Watanabe, John R. Hershey, "Single-Channel Multi-Speaker Separation using Deep Clustering," in Proc. ISCA Interspeech (Interspeech 2016), Sep. 2016. [.pdf] [.bib] [Demos and code to generate the dataset]
Hakan Erdogan, John R. Hershey, Shinji Watanabe, Michael Mandel, Jonathan Le Roux, "Improved MVDR beamforming using single-channel mask prediction networks," in Proc. ISCA Interspeech (Interspeech 2016), Sep. 2016. [.pdf] [.bib]
Tomoki Hayashi, Shinji Watanabe, Tomoki Toda, Takaaki Hori, Jonathan Le Roux, Kazuya Takeda, "Bidirectional LSTM-HMM Hybrid System for Polyphonic Sound Event Detection," in Proc. DCASE2016 Workshop (DCASE2016), Sep. 2016. [.pdf] [.bib] (Ranked 3rd out of 9 submissions at the DCASE 2016 Challenge Task 2)
John R. Hershey, Zhuo Chen, Jonathan Le Roux, Shinji Watanabe, "Deep Clustering: Discriminative Embeddings for Segmentation and Separation," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2016), Mar. 2016. [.pdf] [.bib]
Scott Wisdom, John R. Hershey, Jonathan Le Roux, Shinji Watanabe, "Deep Unfolding for Multichannel Source Separation," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2016), Mar. 2016. [.pdf] [.bib]
Takaaki Hori, Zhuo Chen, Hakan Erdogan, John R. Hershey, Jonathan Le Roux, Vikramjit Mitra, Shinji Watanabe, "The MERL/SRI system for the 3rd CHiME challenge using beamforming, robust feature extraction, and advanced speech recognition," in Proc. IEEE Automatic Speech Recognition and Understanding Workshop (ASRU 2015), Dec. 2015. [.pdf] [.bib] (Ranked 2nd out of 25 submissions at the 3rd CHiME Speech Separation and Recognition Challenge)
Bret Harsham, Shinji Watanabe, Alan Esenther, John R. Hershey, Jonathan Le Roux, Yi Luan, Daniel Nikovski, Vamsi Potluru, "Driver prediction to improve interaction with in-vehicle HMI," in Proc. Workshop on Digital Signal Processing for In-Vehicle Systems (DSP 2015), Oct. 2015. [.pdf] [.bib]
Felix Weninger, Hakan Erdogan, Shinji Watanabe, Emmanuel Vincent, Jonathan Le Roux, John R. Hershey, Björn Schuller, "Speech Enhancement with LSTM Recurrent Neural Networks and its Application to Noise-Robust ASR," in Proc. International Conference on Latent Variable Analysis and Signal Separation (LVA/ICA 2015), Aug. 2015. [.pdf] [.bib]
Jonathan Le Roux, John R. Hershey, Felix Weninger, "Deep NMF for Speech Separation," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2015), Apr. 2015. [.pdf] [.bib]
Jonathan Le Roux, Emmanuel Vincent, John R. Hershey, Daniel P. W. Ellis, "MICbots: collecting large realistic datasets for speech and audio research using mobile robots," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2015), Apr. 2015. [.pdf] [.bib] [Slides] [Project page]
Hakan Erdogan, John R. Hershey, Shinji Watanabe, Jonathan Le Roux, "Phase-sensitive and recognition-boosted speech separation using deep recurrent neural networks," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2015), Apr. 2015. [.pdf] [.bib]
Felix Weninger, Jonathan Le Roux, John R. Hershey, Björn Schuller, "Discriminatively Trained Recurrent Neural Networks for Single-Channel Speech Separation," in Proc. IEEE GlobalSIP Symposium on Machine Learning Applications in Speech Processing (GlobalSIP MLASP 2014), Dec. 2014. [.pdf] [.bib]
Yuuki Tachioka, Shinji Watanabe, Jonathan Le Roux, John R. Hershey, "Sequence Discriminative Training for Low-Rank Deep Neural Networks," in Proc. IEEE GlobalSIP Symposium on Machine Learning Applications in Speech Processing (GlobalSIP MLASP 2014), Dec. 2014. [.pdf] [.bib]
Felix Weninger, Jonathan Le Roux, John R. Hershey, Shinji Watanabe, "Discriminative NMF and its application to single-channel source separation," in Proc. ISCA Interspeech, Sep. 2014. [.pdf] [.bib]
Yuuki Tachioka, Shinji Watanabe, Jonathan Le Roux, John R. Hershey, "Sequential Maximum Mutual Information Linear Discriminant Analysis for Speech Recognition," in Proc. ISCA Interspeech, Sep. 2014. [.pdf] [.bib]
Yuuki Tachioka, Tomohiro Narita, Shinji Watanabe, Jonathan Le Roux, "Ensemble integration of calibrated speaker localization and statistical speech detection in domestic environments," in Proc. IEEE Joint Workshop on Hands-free Speech Communication and Microphone Arrays (HSCMA 2014), May 2014. [.pdf] [.bib]
Felix Weninger, Shinji Watanabe, Jonathan Le Roux, John R. Hershey, Yuuki Tachioka, Jürgen Geiger, Björn Schuller, Gerhard Rigoll, "The MERL/MELCO/TUM system for the REVERB Challenge using Deep Recurrent Neural Network Feature Enhancement," in Proc. REVERB Workshop (REVERB 2014), May 2014. [.pdf] [.bib]
Shinji Watanabe, Jonathan Le Roux, "A Study on Black Box Optimization for Automatic Speech Recognition," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2014. [.pdf] [.bib]
Umut Şimşekli, Jonathan Le Roux, John R. Hershey, "Non-Negative Source-Filter Dynamical System for Speech Enhancement," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2014. [.pdf] [.bib] (Student travel grant awarded to Umut)
Vamsi K. Potluru, Jonathan Le Roux, Barak A. Pearlmutter, John R. Hershey, Matthew E. Brand, "Pairwise Coordinate Descent Algorithm for NMF," in Proc. NIPS Workshop on Greedy Algorithms, Frank-Wolfe and Friends - A modern perspective, Dec. 2013. [Website], [.pdf], [.bib]
Emmanuel Vincent, Jon Barker, Shinji Watanabe, Jonathan Le Roux, Francesco Nesta, Marco Matassoni, "The Second 'CHiME' Speech Separation and Recognition Challenge: An Overview of Challenge Systems and Outcomes," in Proc. IEEE Automatic Speech Recognition and Understanding Workshop (ASRU 2013), Dec. 2013. [.pdf], [.bib]
Yuuki Tachioka, Shinji Watanabe, Jonathan Le Roux, John R. Hershey, "A Generalized Discriminative Training Framework for System Combination," in Proc. IEEE Automatic Speech Recognition and Understanding Workshop (ASRU 2013), Dec. 2013. [.pdf], [.bib]
Jonathan Le Roux, Shinji Watanabe, John R. Hershey, "Ensemble Learning for Speech Enhancement," in Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA 2013), Oct. 2013. [.pdf], [.bib]
Umut Şimşekli, Jonathan Le Roux, John R. Hershey, "Hierarchical and Coupled Non-Negative Dynamical Systems with Application to Audio Modeling," in Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA 2013), Oct. 2013. [.pdf], [.bib] (Student travel grant awarded to Umut)
Koichiro Yoshino, Shinji Watanabe, Jonathan Le Roux, John R. Hershey, "Statistical Dialogue Management using Intention Dependency Graph," in Proc. International Joint Conference on Natural Language Processing (IJCNLP 2013), Oct. 2013. [.pdf], [.bib]
Yuuki Tachioka, Shinji Watanabe, Jonathan Le Roux, John R. Hershey, "Discriminative methods for noise robust speech recognition: A CHiME Challenge benchmark," in Proc. International Workshop on Machine Listening in Multisource Environments (CHiME 2013), Jun. 2013. [.pdf], [.bib]
Jonathan Le Roux, Petros T. Boufounos, Kang Kang, John R. Hershey, "Source Localization in Reverberant Environments Using Sparse Optimization," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2013), May 2013. [.pdf], [.bib]
Cédric Févotte, Jonathan Le Roux, John R. Hershey, "Non-Negative Dynamical System With Application to Speech and Audio," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2013), May 2013. [.pdf], [.bib]
Emmanuel Vincent, Jon Barker, Shinji Watanabe, Jonathan Le Roux, Francesco Nesta, Marco Matassoni, "The Second CHiME Speech Separation and Recognition Challenge: Datasets, Tasks and Baselines," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2013), May 2013. [.pdf], [.bib]
Vamsi K. Potluru, Sergey M. Plis, Jonathan Le Roux, Barak A. Pearlmutter, Vince D. Calhoun, Thomas P. Hayes, "Block Coordinate Descent for Sparse NMF," in Proc. International Conference on Learning Representations (ICLR 2013), May 2013. [arXiv], [.bib]
Creighton K. Heaukulani, Jonathan Le Roux and John R. Hershey, "Latent Dirichlet Reallocation for Term Swapping," in Proc. IEEE International Workshop on Statistical Machine Learning for Speech Processing (IWSML 2012), Mar. 2012. [.pdf], [.bib]
Jonathan Le Roux and John R. Hershey, "Indirect Model-Based Speech Enhancement," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2012), pp. 4045–4048, Mar. 2012. [.pdf], [.bib]
Masahiro Nakano, Jonathan Le Roux, Hirokazu Kameoka, Tomohiro Nakamura, Nobutaka Ono and Shigeki Sagayama, "Bayesian Nonparametric Spectrogram Modeling Based on Infinite Factorial Infinite Hidden Markov Model," in Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA 2011), Oct. 2011. [.pdf], [.bib]
Masahiro Nakano, Jonathan Le Roux, Hirokazu Kameoka, Nobutaka Ono and Shigeki Sagayama, "Infinite-State Spectrum Model for Music Signal Analysis," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2011), pp. 1972–1975, May 2011. [.pdf], [.bib]
Jonathan Le Roux, Emmanuel Vincent, Yuu Mizuno, Hirokazu Kameoka, Nobutaka Ono and Shigeki Sagayama, "Consistent Wiener Filtering: Generalized Time-Frequency Masking Respecting Spectrogram Consistency," in Proc. 9th International Conference on Latent Variable Analysis and Signal Separation (LVA/ICA 2010), Lecture Notes in Computer Science, Vol. 6365/2010, pp. 89–96, Springer, Sep. 2010. [.pdf], [.bib]
Hirokazu Kameoka, Takuya Yoshioka, Mariko Hamamura, Jonathan Le Roux and Kunio Kashino, "Statistical Model of Speech Signals Based on Composite Autoregressive System with Application to Blind Source Separation," in Proc. 9th International Conference on Latent Variable Analysis and Signal Separation (LVA/ICA 2010), pp. 245–253, Sep. 2010. [.pdf], [.bib]
Masahiro Nakano, Jonathan Le Roux, Hirokazu Kameoka, Yu Kitano, Nobutaka Ono and Shigeki Sagayama, "Nonnegative Matrix Factorization with Markov-chained Bases for Modeling Time-varying patterns in Music Spectrograms," in Proc. 9th International Conference on Latent Variable Analysis and Signal Separation (LVA/ICA 2010), pp. 149–156, Sep. 2010. [.pdf], [.bib]
Hirokazu Kameoka, Jonathan Le Roux and Yasunori Ohishi, "A Statistical Model of Speech F0 Contours," in Proc. ISCA Tutorial and Research Workshop on Statistical And Perceptual Audition (SAPA 2010), pp. 43–48, Sep. 2010. [.pdf], [.bib]
Jonathan Le Roux, Hirokazu Kameoka, Nobutaka Ono and Shigeki Sagayama, "Fast Signal Reconstruction from Magnitude STFT Spectrogram Based on Spectrogram Consistency," in Proc. 13th International Conference on Digital Audio Effects (DAFx-10), pp. 397–403, Sep. 2010. [.pdf], [.bib]
Masahiro Nakano, Hirokazu Kameoka, Jonathan Le Roux, Yu Kitano, Nobutaka Ono, Shigeki Sagayama, "Convergence-Guaranteed Multiplicative Algorithms for Non-Negative Matrix Factorization with Beta-Divergence," in Proc. IEEE International Workshop on Machine Learning for Signal Processing (MLSP 2010), pp. 283–288, Aug. 2010. [.pdf], [.bib]
Jonathan Le Roux, Alain de Cheveigné and Lucas C. Parra, "Adaptive Template Matching with Shift-Invariant Semi-NMF," in Advances in Neural Information Processing Systems 21 (Proc. NIPS 2008), D. Koller, Y. Bengio, D. Schuurmans, and L. Bottou, Eds. Cambridge, MA: The MIT Press, 2009. (Presented in Dec. 2008) [.pdf], [.bib] [source code]
Jonathan Le Roux, Nobutaka Ono and Shigeki Sagayama, "Explicit Consistency Constraints for STFT Spectrograms and Their Application to Phase Reconstruction," in Proc. Workshop on Statistical and Perceptual Audition (SAPA 2008), pp. 23–28, Sep. 2008. [.pdf], [.bib]
Jonathan Le Roux, Hirokazu Kameoka, Nobutaka Ono, Alain de Cheveigné and Shigeki Sagayama, "Computational Auditory Induction by Missing-Data Non-Negative Matrix Factorization," in Proc. Workshop on Statistical and Perceptual Audition (SAPA 2008), pp. 1–6, Sep. 2008. [.pdf], [.bib]
Nobutaka Ono, Ken-ichi Miyamoto, Jonathan Le Roux, Hirokazu Kameoka and Shigeki Sagayama, "Separation of a Monaural Audio Signal into Harmonic/Percussive Components by Complementary Diffusion on Spectrogram," in Proc. European Signal Processing Conference (EUSIPCO), Aug. 2008. [.pdf], [.bib]
Jonathan Le Roux, Hirokazu Kameoka, Nobutaka Ono, Shigeki Sagayama and Alain de Cheveigné, "Modulation Analysis of Speech Through Orthogonal FIR Filterbank Optimization," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2008), pp. 4189–4192, Apr. 2008. [.pdf], [.bib]
Jonathan Le Roux, Hirokazu Kameoka, Nobutaka Ono, Alain de Cheveigné and Shigeki Sagayama, "Single Channel Speech and Background Segregation through Harmonic-Temporal Clustering," in Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp. 279–282, Oct. 2007. [.pdf] [.bib]
Jonathan Le Roux, Hirokazu Kameoka, Nobutaka Ono, Alain de Cheveigné and Shigeki Sagayama, "Harmonic-Temporal Clustering of Speech for Single and Multiple F0 Contour Estimation in Noisy Environments," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2007), Vol. IV, pp. 1053–1056, Apr. 2007. [.pdf] [.bib]
(Part of the expenses for the presentation of this work at ICASSP 2007 was supported by a grant from The Telecommunications Advancement Foundation, TAF, Japan)
Alain de Cheveigné, Jonathan Le Roux and Jonathan Z. Simon, "MEG Signal Denoising based on Time-Shift PCA," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2007), Vol. I, pp. 317–320, Apr. 2007. [.pdf] [.bib]
Hirokazu Kameoka, Jonathan Le Roux, Nobutaka Ono and Shigeki Sagayama, "Speech Analyzer Using a Joint Estimation Model of Spectral Envelope and Fine Structure," in Proc. Interspeech International Conference on Spoken Language Processing (ICSLP 2006), pp. 2502–2505, Sep. 2006. [.pdf] [.bib]
Jonathan Le Roux and Erik McDermott, "Optimization Methods for Discriminative Training," in Proc. Eurospeech 2005, pp. 3341–3344, Sep. 2005. [.pdf] [.bib]

International Conferences (Abstract)

Kevin Wilkinghoff, Takuya Fujimura, Keisuke Imoto, Jonathan Le Roux, "Handling Domain Shifts for Anomalous Sound Detection: A Review of DCASE-Related Work," in Proc. Annual meeting of the German and Danish Acoustical Societies (DAS | DAGA), Mar. 2025.
Tomoki Hayashi, Shinji Watanabe, Tomoki Toda, Takaaki Hori, Jonathan Le Roux, Kazuya Takeda, "Convolutional bidirectional long short-term memory hidden Markov model hybrid system for polyphonic sound event detection," in Proc. 5th Joint meeting of the Acoustical Society of America and the Acoustical Society of Japan, Dec. 2016.
Jonathan Le Roux, Hirokazu Kameoka, Nobutaka Ono, Alain de Cheveigné and Shigeki Sagayama, "Music and Speech Signal Processing using Harmonic-Temporal Clustering," Invited talk at Acoustics'08 Paris (Acoustics'08), Jul. 2008. [.bib]
Jonathan Le Roux, Hirokazu Kameoka, Nobutaka Ono and Shigeki Sagayama, "Parametric Spectrogram Modeling of Single and Concurrent Speech with Spline Pitch Contour," in Proc. 4th Joint Meeting of the Acoustical Society of America and the Acoustical Society of Japan, Nov. 2006. [.bib]

Papers presented in national (Japanese) Meetings/Workshops

Ryo Aihara, Yoshiki Masuyama, François G. Germain, Gordon Wichern, Jonathan Le Roux, "Exploring Disentangled Neural Speech Codecs from Self-Supervised Representations," in Proc. of ASJ Spring Meeting, Mar. 2025.
Yoshiki Mitsui, Ryo Aihara, Takaaki Hori, Jonathan Le Roux, Shinya Taguchi, "Exploring Keyword Enrollment for Japanese End-to-End Automatic Speech Recognition using Contextual Biasing," in OTOGAKU Symposium, Jun. 2024.
Yosuke Higuchi, Niko Moritz, Jonathan Le Roux, Takaaki Hori, "Momentum Pseudo-Labeling for Semi-Supervised End-to-End Automatic Speech Recognition," in Proc. of ASJ Spring Meeting, Mar. 2022.
Yuki Tachioka, Shinji Watanabe, Jonathan Le Roux, John R. Hershey, "Sequential maximum mutual information discriminative training of linear discriminant analysis for speech recognition," in Proc. of ASJ Spring Meeting, 2-1-5, Mar. 2016. [.bib]
Yuki Tachioka, Shinji Watanabe, Jonathan Le Roux, John R. Hershey, "Ensemble integration of calibrated speaker localization and statistical speech detection in domestic environments: DIRHA corpus," in Proc. of ASJ Spring Meeting, 2-10-5, Mar. 2015. [.bib]
Yuki Tachioka, Shinji Watanabe, Jonathan Le Roux, John R. Hershey, "A generalized framework of discriminative training for system combination," in Proc. of ASJ Autumn Meeting, 1-8-2, Sep. 2014. [.bib]
Yuki Tachioka, Shinji Watanabe, Jonathan Le Roux, John R. Hershey, "MMI discriminative training of acoustic models for system combination," in Proc. of ASJ Spring Meeting, Mar. 2014. [.bib]
Yuki Tachioka, Shinji Watanabe, Jonathan Le Roux, John R. Hershey, "Effectiveness of discriminative approaches for speech recognition under noisy environments on the 2nd CHiME Challenge," in Proc. of ASJ Autumn Meeting, 1-8-1, Sep. 2013. [.bib] (Awaya Prize Young Researcher Award to Yuki for this paper)
Yuuki Tachioka, Shinji Watanabe, Jonathan Le Roux, John R. Hershey, "The effectiveness of discriminative methods to noisy environments on the second CHiME challenge," Technical Report of the Institute of Electronics, Information and Communication Engineers, SP2013-55, Jul. 2013.
Jonathan Le Roux, Cédric Févotte, John R. Hershey, "A new non-negative dynamical system for speech and audio modeling," in Proc. of ASJ Spring Meeting, 3-10-20, Mar. 2013. [.bib] (Awaya Prize Young Researcher Award)
Jonathan Le Roux and John R. Hershey, "Speech enhancement by indirect VTS," in Proc. of ASJ Spring Meeting, 1-Q-13, Mar. 2012. [.bib]
Masahiro Nakano, Jonathan Le Roux, Hirokazu Kameoka, Tomohiro Nakamura, Nobutaka Ono and Shigeki Sagayama, "Bayesian nonparametric spectrogram modeling for music signal analysis," in Proc. of IPSJ SIGMUS, 2011-MUS-89, Jul. 2011. (in Japanese) [.bib] (Best presentation award to Masahiro)
Jonathan Le Roux, "Phase-controlled sound transfer based on maximally-inconsistent spectrograms," in Proc. of ASJ Spring Meeting, 1-Q-51, Mar. 2011. [.pdf], [.bib], [Poster], [Demo page]
Jonathan Le Roux, Hirokazu Kameoka, Nobutaka Ono and Shigeki Sagayama, "Phase initialization schemes for faster spectrogram-consistency-based signal reconstruction," in Proc. of ASJ Autumn Meeting, 3-10-3, Sep. 2010. [.bib]
Hirokazu Kameoka, Jonathan Le Roux and Yasunori Ohishi, "Statistical modeling of the speech F0 pattern generation process," in Proc. of ASJ Autumn Meeting, 1-1-3, Sep. 2010. [.bib]
Mariko Hamamura, Hirokazu Kameoka, Takuya Yoshioka, Jonathan Le Roux and Kunio Kashino, "Blind source separation and dereverberation using speech power spectral density model derived from composite autoregressive system," in Proc. of ASJ Autumn Meeting, 2-P-22, Sep. 2010. (in Japanese) [.bib]
Masahiro Nakano, Jonathan Le Roux, Hirokazu Kameoka, Nobutaka Ono and Shigeki Sagayama, "Music Signal Analysis with Infinite-State Spectrum Model," in Proc. of IPSJ SIGMUS, 2010-MUS-86, Jul. 2010. (in Japanese) [.bib]
Jonathan Le Roux, Emmanuel Vincent, Yu Mizuno, Hirokazu Kameoka, Nobutaka Ono and Shigeki Sagayama, "Consistent Wiener filtering: designing generalized time-frequency masks respecting spectrogram consistency," in Proc. of ASJ Spring Meeting, 3-5-3, Mar. 2010. [.bib]
Jonathan Le Roux, Hirokazu Kameoka, Nobutaka Ono and Shigeki Sagayama, "Fast phase estimation algorithms based on spectrogram consistency," in Proc. of ASJ Spring Meeting, 3-5-2, Mar. 2010. [.bib]
Hirokazu Kameoka, Yasunori Ohishi, Daichi Mochihashi and Jonathan Le Roux, "Speech analysis with multi-kernel linear prediction," in Proc. of ASJ Spring Meeting, 2-Q-24, Mar. 2010. (in Japanese) [.bib]
Masahiro Nakano, Yu Kitano, Jonathan Le Roux, Hirokazu Kameoka, Nobutaka Ono and Shigeki Sagayama, "Polyphonic music signal analysis using non-negative matrix factorization with deformable bases," in Proc. of IPSJ SIGMUS, 2009-MUS-84-12, Feb. 2010. (in Japanese) [.bib]
Hirokazu Kameoka, Jonathan Le Roux, "Study on Multiplicative Update Formula for Nonnegative Matrix Factorization with Frobenius Norm Criterion," in Proc. of ASJ Autumn Meeting, Sep. 2009. (in Japanese) [.bib]
Jonathan Le Roux, Hirokazu Kameoka, Emmanuel Vincent, Nobutaka Ono, Kunio Kashino and Shigeki Sagayama, "Complex NMF under spectrogram consistency constraints," in Proc. of ASJ Autumn Meeting, Sep. 2009. [.bib]
Hirokazu Kameoka, Jonathan Le Roux, Yasunori Ohishi and Kunio Kashino, "Music Factorizer: A note-by-note editing interface for music waveforms," in Proc. of IPSJ SIGMUS Summer Workshop, 2009-MUS-81-9, Jul. 2009. (in Japanese) [.bib]
Jonathan Le Roux, Hirokazu Kameoka, Nobutaka Ono and Shigeki Sagayama, "Spectrogram consistency and its application to phase reconstruction," in Proc. of IPSJ SIGMUS Summer Workshop, 2009-MUS-81-8, Jul. 2009. [.bib] (IPSJ Yamashita Memorial Research Award)
Akinori Ito, Daichi Ando, Jonathan Le Roux, Tomoyasu Nakano and Kazuyoshi Yoshii, "Panel Discussion Featuring Newly Honored Doctors (III)," in Proc. of IPSJ SIGMUS Summer Workshop, 2009-MUS-81-7, Jul. 2009. (in Japanese) [.bib]
Yu Mizuno, Jonathan Le Roux, Nobutaka Ono and Shigeki Sagayama, "Real-Time Time-Scale/Pitch Modification of Music Signal by Stretching Power Spectrogram and Consistent Phase Reconstruction," in Proc. of ASJ Spring Meeting, 2-8-4, Mar. 2009. (in Japanese) [.bib]
Jonathan Le Roux, Hirokazu Kameoka, Nobutaka Ono, Shigeki Sagayama and Alain de Cheveigné, "Filterbank Optimization for Amplitude Modulation Analysis of Audio Signals," in Proc. of ASJ Autumn Meeting, 1-2-1, Sep. 2007. [.bib]
Jonathan Le Roux, Hirokazu Kameoka, Nobutaka Ono, Alain de Cheveigné and Shigeki Sagayama, "Monaural Speech Separation Through Harmonic-Temporal Clustering of the Power Spectrum," in Proc. of ASJ Autumn Meeting, 3-4-3, Sep. 2007. [.bib]
Ken-ichi Miyamoto, M. Tatezono, Jonathan Le Roux, Hirokazu Kameoka, Nobutaka Ono, Shigeki Sagayama, "Separation of Harmonic and Non-Harmonic Sounds Based on 2D-filtering of the Spectrogram," in Proc. of ASJ Autumn Meeting, 1-1-7, Sep. 2007. (in Japanese) [.bib]
Hirokazu Kameoka, Jonathan Le Roux, Nobutaka Ono and Shigeki Sagayama, "Harmonic-Temporal Structured Clustering: A New Approach to CASA," in Proc. of the Technical Committee of Psychological and Physiological Acoustics of the Acoustical Society of Japan, Vol. 36, No. 7, H-2006-103, pp. 575–580, Oct. 2006. (in Japanese) [.pdf] [.bib]
Jonathan Le Roux, Hirokazu Kameoka, Nobutaka Ono and Shigeki Sagayama, "Harmonic Temporal Clustering of Speech Spectrum," in Proc. ASJ Spring Meeting, 2-11-3, in CD-ROM, Mar. 2006. [.bib]

Tutorials, Technical Reports

Zhong-Qiu Wang, Gordon Wichern, Jonathan Le Roux, "Leveraging Low-Distortion Target Estimates for Improved Speech Enhancement," arXiv preprint, arXiv:2110.00570, Oct. 2021. [arXiv]
Peng Gao, Chiori Hori, Shijie Geng, Takaaki Hori, Jonathan Le Roux, "Multi-Pass Transformer for Machine Translation," arXiv preprint, arXiv:2009.11382, Sep. 2020. [arXiv]
Jonathan Le Roux, Emmanuel Vincent, Hakan Erdogan, "Learning-Based Approaches to Speech Enhancement and Separation," Tutorial given at Interspeech, Sep. 2016. [Slides] [.bib]
John R. Hershey, Zhuo Chen, Jonathan Le Roux, Shinji Watanabe, "Deep clustering: Discriminative embeddings for segmentation and separation," arXiv:1508.04306, Aug. 2015. [arXiv (latest version)] [.bib]
Jonathan Le Roux, Felix Weninger, John R. Hershey, "Sparse NMF – half-baked or well done?," Mitsubishi Electric Research Laboratories Technical Report, TR2015-023, Mar. 2015. [.pdf] [.bib] [source code]
John R. Hershey, Jonathan Le Roux, Felix Weninger, "Deep Unfolding: Model-Based Inspiration of Novel Deep Architectures," arXiv:1409.2574, Mitsubishi Electric Research Laboratories Technical Report, TR2014-117, Aug. 2014. [TR], [arXiv (latest version)] [.bib]
Jonathan Le Roux, Emmanuel Vincent, "A categorization of robust speech processing datasets," Mitsubishi Electric Research Laboratories Technical Report, TR2014-116, Aug. 2014. [.pdf] [.bib] [wiki page] (The table can be printed over four Letter-size pages by using the "Poster" print option in Acrobat and rescaling it to fit a 1 x 4 page layout.)
Jonathan Le Roux, Hirokazu Kameoka, Nobutaka Ono and Shigeki Sagayama, "On the Interpretation of I-Divergence-Based Distribution Fitting as a Maximum-Likelihood Estimation Problem," Technical Report of the University of Tokyo, METR 2008-11, Mar. 2008. [.pdf] [.bib]

Datasets, Software

SuDaField, source code for training and evaluating the Subject- and Dataset-Aware Neural Field for HRTF Modeling, presented in our IEEE OJ-SP 2025 paper.
TUSS, source code for training and evaluating our task-aware unified source separation model, presented in our ICASSP 2025 paper.
Anomaly Score Normalization for Domain Generalization, source code for training and evaluating our local density-based anomaly score normalization approach for domain generalization in unsupervised anomalous sound detection, presented in our ICASSP 2025 paper.
RANF-HRTF, source code for training and evaluating the retrieval-augmented neural field (RANF) for HRTF upsampling and personalization, presented in our ICASSP 2025 paper and our OJ-SP 2025 paper.
SMITIN, source code for training and evaluating our Self-Monitored Inference-Time INtervention approach for generative music Transformers, presented in our IEEE Open Journal of Signal Processing 2025 paper.
TF-Locoformer, source code for the Transformer-based model with LOcal-modeling by COnvolution for speech enhancement and audio source separation, presented in our Interspeech 2024 paper. Training and inference scripts are provided, as well as pretrained models for the WSJ0-2mix, Libri2mix, WHAMR!, and DNS-Interspeech2020 datasets.
SEBBs, Python implementation for the prediction of sound event bounding boxes (SEBBs) as proposed in our Interspeech 2024 paper: "Sound Event Bounding Boxes."
ERAS, source code for training and evaluating the Enhanced Reverberation as Supervision (ERAS) framework for fully unsupervised training of 2-source separation using stereo data, proposed in our Interspeech 2024 paper: "Enhanced Reverberation as Supervision for Unsupervised Speech Separation."
TS-SEP, source code for testing the network architectures proposed in our IEEE TASLP paper "TS-SEP: Joint Diarization and Separation Conditioned on Estimated Speaker Embeddings."
neural-IIR-field (NIIRF), source code for training and evaluating the neural infinite impulse response filter field (NIIRF) proposed in our ICASSP 2024 paper, "NIIRF: Neural IIR Filter Field for HRTF Upsampling and Personalization."
Hyper-Unmix, source code for training models and using the hyperbolic interface proposed in our ICASSP 2023 paper, "Hyperbolic Audio Source Separation." Please refer to: Darius Petermann, Gordon Wichern, Aswin Subramanian, Jonathan Le Roux, "Hyperbolic Audio Source Separation," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Jun. 2023. [arXiv]
Cocktail-Fork-Separation and Divide and Remaster (DnR), a baseline system to separate a monaural audio signal into speech, music, and sound effects/background stems, i.e., solving the cocktail fork problem, and a source separation dataset for training and testing such algorithms. Please refer to: Darius Petermann, Gordon Wichern, Zhong-Qiu Wang, Jonathan Le Roux, "The Cocktail Fork Problem: Three-Stem Audio Separation for Real-World Soundtracks," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2022. [arXiv] [Demo].
WHAM!48kHz, a high-fidelity version of the ambient background noise recordings originally used for the WSJ0 Hipster Ambient Mixtures (WHAM!) dataset. 78 hours of raw binaural recordings (48 kHz/24-bit), collected by our definitely not Careless Whisper.ai collaborators at various urban locations (restaurants, cafes, bars, parks) throughout the Bay Area. As a Last Christmas present in the WHAM! series, we have decided to Make It Big.
WHAMR!, a spatialized (stereo, reverberant) version of the WHAM! dataset, for noisy and reverberant single- and multi-channel speech separation. Please refer to: Matthew Maciejewski, Gordon Wichern, Emmett McQuinn, Jonathan Le Roux, "WHAMR!: Noisy and Reverberant Single-Channel Speech Separation," in Proc. IEEE ICASSP, May 2020.
Slakh: the Synthesized Lakh Dataset, a dataset for music source separation synthesized from MIDI data using professional-grade sample-based virtual instruments. Please refer to: Ethan Manilow, Gordon Wichern, Prem Seetharaman, Jonathan Le Roux, "Cutting Music Source Separation Some Slakh: A Dataset to Study the Impact of Training Data Quality and Quantity," in Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), Oct. 2019.
WHAM!: The WSJ0 Hipster Ambient Mixtures dataset, a dataset for noisy speech separation. Please refer to: Gordon Wichern, Emmett McQuinn, Joe Antognini, Michael Flynn, Richard Zhu, Dwight Crow, Ethan Manilow, Jonathan Le Roux, "WHAM!: Extending Speech Separation to Noisy Environments," in Proc. ISCA Interspeech, Sep. 2019.
Table listing the months in which major SP/ML conferences took place. (Python script to make your own)
Script to generate the spatialized version of the WSJ0-based multi-speaker speech separation dataset used in multi-channel Deep Clustering experiments. Please refer to: Zhong-Qiu Wang, Jonathan Le Roux, John R. Hershey, "Multi-Channel Deep Clustering: Discriminative Spectral and Spatial Embeddings for Speaker-Independent Speech Separation," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Apr. 2018.
Script to generate the WSJ0-based multi-speaker speech separation dataset used in Deep Clustering experiments. Please refer to: John R. Hershey, Zhuo Chen, Jonathan Le Roux, Shinji Watanabe, "Deep Clustering: Discriminative Embeddings for Segmentation and Separation," in Proc. International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Mar. 2016.
Sparse NMF with beta-divergence (Matlab; Apache 2.0 license). Please refer to: Jonathan Le Roux, Felix Weninger, John R. Hershey, "Sparse NMF – half-baked or well done?," Mitsubishi Electric Research Laboratories Technical Report, TR2015-023, Mar. 2015. [Python version by Maxime Woringer]
NDS: Non-negative dynamical system (Matlab; MERL research-only license). Please refer to: Cédric Févotte, Jonathan Le Roux, John R. Hershey, "Non-Negative Dynamical System With Application to Speech and Audio," in Proc. International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2013.
Consistent Wiener Filtering code (Matlab; Research-only license). Please refer to: Jonathan Le Roux and Emmanuel Vincent, "Consistent Wiener Filtering for Audio Source Separation," IEEE Signal Processing Letters, Vol.20, No. 3, Mar. 2013.
Francesco Nesta, Marco Matassoni, Jonathan Le Roux, Shinji Watanabe, Jon P. Barker and Emmanuel Vincent, "The 2nd CHiME Speech Separation and Recognition Challenge WSJ0 dataset," URL: http://spandh.dcs.shef.ac.uk/chime_challenge/chime2_task2.html, 2012.
LWS: Fast spectrogram phase recovery using Local Weighted Sums (C/C++, Matlab, Python; Apache 2.0 license). Please refer to: Jonathan Le Roux, Hirokazu Kameoka, Nobutaka Ono, Shigeki Sagayama, "Fast Signal Reconstruction from Magnitude STFT Spectrogram Based on Spectrogram Consistency," in Proc. International Conference on Digital Audio Effects (DAFx), pp. 397–403, Sep. 2010.
Shift-invariant Semi-NMF (Matlab). Please refer to: Jonathan Le Roux, Alain de Cheveigné and Lucas C. Parra, "Adaptive Template Matching with Shift-Invariant Semi-NMF," in Advances in Neural Information Processing Systems 21 (Proc. NIPS 2008), D. Koller, Y. Bengio, D. Schuurmans, and L. Bottou, Eds. Cambridge, MA: The MIT Press, 2009.
I put some general-purpose software on this page. The most useful piece of software listed there is probably IguanaTex, a free PowerPoint add-in for inserting LaTeX displays, whose development I have taken over from its original author. IguanaTex is a very good alternative to TexPoint, which unfortunately seems to have become abandonware.

Ph.D. Thesis

Jonathan Le Roux, "Exploiting Regularities in Natural Acoustical Scenes for Monaural Audio Signal Estimation, Decomposition, Restoration and Modification," Ph.D. dissertation, The University of Tokyo & Université Paris 6, Mar. 2009. [.pdf] [.bib]

Maths related

Eric A. Carlen, Maria C. Carvalho, Jonathan Le Roux, Michael Loss and Cédric Villani, "Entropy and Chaos in the Kac Model," in Kinetic and Related Models, Vol. 3, No. 1, pp. 85–122, Mar. 2010. [.pdf] [.bib]
Jonathan Le Roux, "On Kac's Model in Kinetic Theory," Master's Thesis, 2003. (in French) [.pdf] [.bib]
Jonathan Le Roux, "Introduction to Boltzmann Equation," Part of my Magistère thesis (graduation from the ENS Maths Department), 2003. (in French) [.pdf]
Jonathan Le Roux, "Spectral Properties of Elliptic Operators on Compact Manifolds," Master's Thesis, July 2001. (in French) [.pdf] [.bib]
Jonathan Le Roux, "The Berlekamp Switching Game," TIPE (2nd year personal research project, part of France's "Grandes Ecoles" competitive entrance examinations), June 1999. (in French) [.pdf]

Awards, Scholarships

Acoustical Society of Japan Technology Development Award for "Development of the speech separation service Waketekoo", Mar. 2026.
Best Paper Award Candidate (Top 2% of submitted papers) for "Robot Confirmation Generation and Action Planning Using Long-context Q-Former Integrated with Multimodal LLM", ASRU 2025, Dec. 2025.
Best Paper Award Candidate (Top 5% of submitted papers) for "Physics-Informed Direction-Aware Neural Acoustic Fields", WASPAA 2025, Oct. 2025.
Recognized as #17 for the default "Citation Only" metric, #6 for the "Credit" metric, and #8 for the "New Star" metric in the 2025 AI 2000 list of Most Influential Scholars in Speech Recognition, Aminer, Aug. 2025.
Winner of the Room Acoustics and Speaker Distance Estimation Challenge (GenDARA), Apr. 2025.
Best Student Paper Award Finalist for "TF-Locoformer: Transformer with Local Modeling by Convolution for Speech Separation and Enhancement", IWAENC 2024, Sep. 2024.
Best Team out of 7 at the 1st SONICOM Listener Acoustic Personalisation (LAP) Challenge Task 2, Aug. 2024.
Recognized as #19 for the default "Citation Only" metric, #10 for the "Credit" metric, and #9 for the "New Star" metric in the 2024 AI 2000 list of Most Influential Scholars in Speech Recognition, Aminer, Mar. 2024.
Elevated to IEEE Fellow, effective January 1st, 2024.
Best Team out of 8 at the 2nd COG-MHEAR Audio-Visual Speech Enhancement (AVSE) Challenge, Sep. 2023.
Best Student Paper Award for "Hyperbolic Audio Source Separation," ICASSP 2023, Jun. 2023.
Recognized as Outstanding Reviewer by the Organizing Committee of ICASSP 2023, Jun. 2023.
Best Team out of 11 at the DCASE 2023 Task 6a Automatic Audio Captioning Challenge, May 2023.
Recognized as #17 for the default "Citation Only" metric and #13 for the "Credit" metric in the 2023 AI 2000 list of Most Influential Scholars in Speech Recognition, Aminer, Feb. 2023.
5th Best Team out of 32 at the DCASE 2022 Task 2 Unsupervised Anomalous Sound Detection Challenge, Jul. 2022.
Best Poster Award, Best Video Award for "Hierarchical Musical Source Separation," ISMIR 2020, Oct. 2020.
Recognized as #26 in the list of 100 Most Influential Scholars in Speech Recognition for 2009-2019, Aminer, Feb. 2020.
Best Paper Award for "MIMO-SPEECH: End-to-End Multi-Channel Multi-Speaker Speech Recognition," ASRU 2019, Dec. 2019.
Recognized as #100 in the list of 100 Most Influential Scholars in Speech Recognition for 2007-2017, Aminer, Mar. 2019.
Best Student Paper Award for "Multi-Channel Deep Clustering: Discriminative Spectral and Spatial Embeddings for Speaker-Independent Speech Separation," ICASSP 2018, Apr. 2018.
2nd Best Performance out of 9 submissions at the DSTC5 Dialog State Tracking Challenge, Dec. 2016.
3rd Best Performance out of 15 submissions at the 4th CHiME Speech Separation and Recognition Challenge, 6-channel track, Dec. 2016.
Best Performance out of 5 teams at MIREX 2016 Singing Voice Separation Task, Aug. 2016.
2nd Best Performance out of 25 submissions at the 3rd CHiME Speech Separation and Recognition Challenge, Dec. 2015.
The Awaya Prize Young Researcher Award by the Acoustical Society of Japan (ASJ), September 2013.
Best Performance in Track 2 (medium vocabulary continuous ASR in noisy reverberant environment) of the 2nd CHiME challenge (results), June 2013.
The 2010 Yamashita Memorial Research Award by the Information Processing Society of Japan (IPSJ), March 2011.
The 2009 Dean's Award for Outstanding Student in the Graduate School of Information Science and Technology, The University of Tokyo, March 2009.
Travel grant to the IEEE ICASSP 2007 conference (Honolulu, HI, USA) from The Telecommunications Advancement Foundation (TAF), Japan, 2007.
Japanese Government Monbukagakusho Research Scholarship, April 2005 to March 2009.
Association of International Education, Japan (AIEJ) Scholarship, September 2003 to August 2004.
Ecole Normale Supérieure (ENS) four-year full fellowship, September 1999 to August 2004 (including a one-year leave in China)
National Rank 13 at the Competitive Entrance Examination for the École Normale Supérieure (ENS) of Paris (Maths – C/S), 1999
National Rank 1 at the Competitive Entrance Examination for the École Polytechnique (Maths/Physics – MP), 1999
"Mention Régionale" (National Rank 9-16), Concours Général de Mathématiques, 1997

Media coverage

TWIML AI Podcast interview on MERL's work towards solving the cocktail party problem.
Seamless Speech Recognition (2019): NHK, News (Japanese); NHK World, News (English), video report (starting at 4'38"); TV Asahi, ANN news (Japanese); Nippon TV, News24 (Japanese); Fuji TV, Prime News Alpha (Japanese); TV Tokyo, World Business Satellite (Japanese); TV Tokyo, Morning Satellite (Japanese); TBS, News, N Studio (Japanese); The Asahi Shimbun (Japanese); The Nikkei Shimbun (Japanese); Nikkei xTech (Japanese); Response (Japanese)
Single-channel multi-speaker separation using deep clustering (2017): NPR, All Tech Considered (English); TBS, N Studio (Japanese); TV Tokyo, World Business Satellite (Japanese); Fuji TV, News, "Minna no Mirai" (Japanese); IEEE Spectrum (English); The Nikkei (Japanese); Nikkei Technology Online (Japanese); Sankei Biz (Japanese); EE Times Japan (Japanese); ITpro (Japanese); Nikkan Sports (Japanese); Nikkan Kogyo Shimbun (Japanese); Dempa Shimbun (Japanese); Il Sole 24 Ore (Italian)
Noise-suppression technology for high-quality voice communication (2015): IEEE Spectrum (English); Asahi Shimbun (Japanese)

Academic activities

Organizer/Chair/Panelist

Vice Chair of the IEEE Signal Processing Society Audio and Acoustic Signal Processing Technical Committee (IEEE AASP TC) for the term starting on January 1st, 2026.
Co-Organizer of the SANE 2025 workshop, November 7, 2025, Google, New York, NY
General Chair (with Prof. Timo Gerkmann) of WASPAA 2025
Session Chair (x3) for ICASSP 2025
Co-Organizer of the SANE 2024 workshop, October 17, 2024, Cambridge, MA
Session Chair for ICASSP 2024
Co-Organizer of the SANE 2023 workshop, October 26, 2023, New York, NY
Co-Organizer of the Special Session on Multi-talker methods in Speech Processing at Interspeech 2023
Session Chair for ICASSP 2023
Town Hall Panelist for DCASE 2022
Co-Organizer of the SANE 2022 workshop, October 6, 2022, Cambridge, MA
Technical Program Co-Chair for WASPAA 2021
Session Chair for Interspeech 2021
Session Chair for ICASSP 2021
Session Chair for DCASE 2020
Session Chair for Interspeech 2020 ("Targeted Source Separation", "Speech Enhancement")
Session Chair for ICASSP 2020 (AUD-P8)
Session Chair for DCASE 2019
Co-Organizer of the SANE 2019 workshop, October 24, 2019, Columbia University, New York, NY
Session Chair for ICASSP 2019 (AASP-L4)
Co-Organizer of the SANE 2018 workshop, October 18, 2018, Google, Cambridge, MA
Session Chair for ICASSP 2018 (AASP-L2)
Organizer and Chair of the first-ever on-stage live demo session at ASRU 2017
Co-Organizer of the SANE 2017 workshop, October 19, 2017, Google, New York, NY
Session Chair for WASPAA 2017 (P1)
Session Chair for ICASSP 2017 (AASP-P1)
Co-Organizer of the SANE 2016 workshop, October 21, 2016, MIT, Cambridge, MA
Session Chair for ICASSP 2016 (AASP-P7)
Co-Organizer of the SANE 2015 workshop, October 22, 2015, Google, New York, NY
Co-Organizer and Co-Chair of the Special Session on "Robots for Audio, Audio for Robots" (SS-L3) at ICASSP 2015
Session Co-Chair for ICASSP 2015 (AASP-P10)
Co-Organizer of the SANE 2014 workshop, October 23, 2014, MIT, Cambridge, MA
Session Co-Chair for ICASSP 2014 (AASP-L2)
Manager of the sane-news mailing list on speech and audio research in Northeastern North America
Co-Organizer of the SANE 2013 workshop, October 24, 2013, Columbia University, New York, NY
Program Co-Chair for the 2nd International Workshop on Machine Listening in Multisource Environments (CHiME 2013), June 1, 2013, Vancouver, Canada
Organizing Committee Member for the 2nd "CHiME" Speech Separation and Recognition Challenge, 2012, supported by the IEEE Technical Committees AASP-TC, MMSP-TC and SL-TC
Co-Organizer of the SANE 2012 workshop, October 24, 2012, MERL, Cambridge, MA
Session Co-Chair for ICASSP 2012 (MLSP-P5)
Organizer of the Audio and Music Signal Processing Mini-Symposium, October 20, 2011, MERL, Cambridge, MA

Reviewer

Journals

IEEE Transactions on Audio, Speech, and Language Processing
IEEE Transactions on Signal Processing
IEEE Journal of Selected Topics in Signal Processing
IEEE Signal Processing Letters
Past: IEEE Geoscience and Remote Sensing Letters, Signal Processing (Elsevier), Speech Communication (Elsevier), Pattern Recognition (Elsevier), Computer Speech and Language (Elsevier), Journal of Computational and Applied Mathematics (Elsevier), EURASIP Journal on Audio, Speech, and Music Processing, IEICE Transactions on Information and Systems (both English and Japanese issues), IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, IPSJ Journal of Information Processing, Acta Acustica

Conferences

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)
IEEE International Workshop on Acoustic Signal Enhancement (IWAENC)
IEEE Workshop on Hands-free Speech Communication and Microphone Arrays (HSCMA)
ISCA Interspeech
Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE)
EURASIP European Signal Processing Conference (EUSIPCO)
Past: NIPS, NIPS End-to-end Learning for Speech and Audio Processing Workshop, REVERB Workshop, ACM International Conference on Multimodal Interaction (ICMI), ACM Multimedia Conference (ACM-MM), ACM SIGGRAPH, Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), IEEE Global Conference on Signal and Information Processing (GlobalSIP), IEEE International Symposium on Circuits and Systems (ISCAS), ISCA Workshop on Statistical And Perceptual Audition (SAPA), International Conference on Latent Variable Analysis and Signal Separation (LVA/ICA), International Conference on Digital Audio Effects (DAFx)

Member

Institute of Electrical and Electronics Engineers, IEEE: Fellow (Class of 2024)
IEEE Signal Processing Society Audio and Acoustic Signal Processing Technical Committee (IEEE AASP TC): Member (2014.1 - 2019.12), Vice Chair (2026.1 - 2026.12)

Brief Curriculum Vitae

2023.04-Present	Distinguished Research Scientist, Mitsubishi Electric Research Labs (MERL), Cambridge, MA, USA.
2020.04-Present	Speech & Audio Senior Team Leader, Mitsubishi Electric Research Labs (MERL), Cambridge, MA, USA.
2018.01-2020.03	Speech & Audio Team Leader, Mitsubishi Electric Research Labs (MERL), Cambridge, MA, USA.
2017.10-2023.03	Senior Principal Research Scientist, Mitsubishi Electric Research Labs (MERL), Cambridge, MA, USA.
2014.04-2017.09	Principal Research Scientist, Mitsubishi Electric Research Labs (MERL), Cambridge, MA, USA.
2011.04-2014.03	Research Scientist, Mitsubishi Electric Research Labs (MERL), Cambridge, MA, USA.
2009.04-2011.03	Research Associate at NTT CS Labs, Media Information Laboratory, Recognition Research Group, Atsugi, Japan.
2009.03	Ph.D. in Information Science and Technology obtained from the University of Tokyo.
	Thesis supervisor: Prof. Shigeki Sagayama.
2009.03	Ph.D. in Computer Science, Telecommunications and Electronics obtained from the Université Paris VI.
	Thesis supervisor: Dr. Alain de Cheveigné.
2006.08-2006.11	Internship at Microsoft Research Asia in Beijing, China. Supervisor: Dr. Frank Soong.
2005.04-2009.03	Japanese Government Monbukagakusho Research Student Scholarship.
2004.11-2005.03	Internship at NTT CS Laboratories in Kyoto. Supervisor: Dr. Erik McDermott.
2003.08-2004.08	Exchange student, the Graduate School of Mathematical Sciences, the University of Tokyo. Supervisor: Prof. Hiroshi Matano.
2003.07	M.Sc. (Magna Cum Laude) in Stochastic Processes, Université Paris VI. Supervisor: Prof. Cédric Villani.
2001.08-2002.08	Internship at the Sino-French joint Laboratory LIAMA, Beijing, China. Supervisor: Dr. Philippe de Reffye.
	Chinese Proficiency Test (HSK), level 8 out of 11 (Intermediate A)
2001.07	M.Sc. (Summa Cum Laude, Rank 1) in Partial Differential Equations, Université Paris XI. Supervisor: Prof. Patrick Gérard.
2000.06	B.Sc. (Summa Cum Laude) in Mathematics, Université Paris VII.
1999.09-2003.06	Student in the Department of Mathematics, École Normale Supérieure (ENS), Paris.
Summer 1999	Admission at the École Polytechnique (National Rank 1)
	Admission at the École Normale Supérieure (ENS) of Paris (National Rank 13)

1997.09-1999.06	Preparatory Classes (Mathématiques supérieures and Mathématiques spéciales)
	to the French "Grandes Ecoles" in Mathematics and Physics, Lycée Henri IV, Paris.
1997.06	High-School Graduation, Summa Cum Laude.

Personal homepages, misc.

Chokotto - ちょこっと, my not-really-current blog, updated once in a usually long while.
De l'origine du monde - 手酌, my previous blog, which was written in both Japanese and French.
My band The Empty Bed's Myspace page (some songs up; yes, I'm the one singing), and our Official Homepage.
My former homepage at the École Normale Supérieure, in French.
Sagayama/Ono Laboratory, Graduate School of Information Science and Technology, the University of Tokyo.
Some LaTeX tips I compiled while writing my Ph.D. thesis (Japanese version also available)
A very useful Manual of the BibTeX-Mode for GNU Emacs: it no longer appears to be online anywhere else, so I decided to host it here. (Note: I am not the author of this file.)