- Kohei Saijo, Gordon Wichern, François G. Germain, Zexu Pan, Jonathan Le Roux, "TF-Locoformer: Transformer with Local Modeling by Convolution for Speech Separation and Enhancement," in Proc. International Workshop on Acoustic Signal Enhancement (IWAENC), Sep. 2024. [pdf] [Github] (Best Student Paper Award Finalist)
- Jie Yin, Andrew Luo, Yilun Du, Anoop Cherian, Tim K. Marks, Jonathan Le Roux, Chuang Gan, "Disentangled Acoustic Fields For Multimodal Physical Scene Understanding," in Proc. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Sep. 2024. [pdf]
- Kohei Saijo, Gordon Wichern, François G. Germain, Zexu Pan, Jonathan Le Roux, "Enhanced Reverberation as Supervision for Unsupervised Speech Separation," in Proc. ISCA Interspeech, Sep. 2024. [pdf] [Github]
- Janek Ebbers, François G. Germain, Gordon Wichern, Jonathan Le Roux, "Sound Event Bounding Boxes," in Proc. ISCA Interspeech, Sep. 2024. [pdf] [Github]
- Sameer Khurana, Chiori Hori, Antoine Laurent, Gordon Wichern, Jonathan Le Roux, "ZeroST: Zero-Shot Speech Translation," in Proc. ISCA Interspeech, Sep. 2024. [pdf]
- Zexu Pan, Gordon Wichern, François G. Germain, Kohei Saijo, Jonathan Le Roux, "PARIS: Pseudo-AutoRegressIve Siamese Training for Online Speech Separation," in Proc. ISCA Interspeech, Sep. 2024. [pdf]
- Louis Bahrman, Mathieu Fontaine, Jonathan Le Roux, Gaël Richard, "Speech Dereverberation Constrained on Room Impulse Response Characteristics," in Proc. ISCA Interspeech, Sep. 2024. [pdf]
- Motonari Kambara, Chiori Hori, Komei Sugiura, Kei Ota, Devesh K. Jha, Sameer Khurana, Siddarth Jain, Radu Corcodel, Diego Romeres, Jonathan Le Roux, "Human Action Understanding-based Robot Planning using Multimodal LLM," in Proc. IEEE International Conference on Robotics and Automation (ICRA), Jun. 2024. [pdf]
- Chang-Bin Jeon, Gordon Wichern, François G. Germain, Jonathan Le Roux, "Why does music source separation benefit from cacophony?," in Proc. IEEE ICASSP Satellite Workshop on Explainable Machine Learning for Speech and Audio (XAI-SA), IEEEXplore Track, Apr. 2024. [pdf]
- Junghyun Koo, Gordon Wichern, François G. Germain, Sameer Khurana, Jonathan Le Roux, "Understanding and Controlling Generative Music Transformers by Probing Individual Attention Heads," in Proc. IEEE ICASSP Satellite Workshop on Explainable Machine Learning for Speech and Audio (XAI-SA), Workshop Track, Apr. 2024. [pdf]
- Zexu Pan, Gordon Wichern, François G. Germain, Aswin Subramanian, Jonathan Le Roux, "Late Audio-Visual Fusion for In-The-Wild Speaker Diarization," in Proc. Hands-free Speech Communication and Microphone Arrays (HSCMA), Apr. 2024. [pdf]
- Shih-Lun Wu, Xuankai Chang, Gordon Wichern, Jee-weon Jung, François G. Germain, Jonathan Le Roux, Shinji Watanabe, "Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Supervision, and LLM Mix-up Augmentation," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Apr. 2024. [pdf] (Winner of the DCASE 2023 Task 6a Automatic Audio Captioning Challenge)
- Yoshiki Masuyama, Gordon Wichern, François G. Germain, Zexu Pan, Sameer Khurana, Chiori Hori, Jonathan Le Roux, "NIIRF: Neural IIR Filter Field for HRTF Upsampling and Personalization," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Apr. 2024. [pdf] [Github]
- Zexu Pan, Gordon Wichern, François G. Germain, Sameer Khurana, Jonathan Le Roux, "NeuroHeed+: Improving Neuro-steered Speaker Extraction with Joint Auditory Attention Detection," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Apr. 2024. [pdf]
- Dimitrios Bralios, Gordon Wichern, François G. Germain, Zexu Pan, Sameer Khurana, Chiori Hori, Jonathan Le Roux, "Generation or Replication: Auscultating Audio Latent Diffusion Models," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Apr. 2024. [pdf]
- Chiori Hori, Pu Wang, Mahbub Rahman, Cristian Vaca-Rubio, Sameer Khurana, Anoop Cherian, Jonathan Le Roux, "Wi-Fi based Indoor Monitoring Enhanced by Multimodal Fusion," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Apr. 2024. [pdf]
- Teysir Baoueb, Haocheng Liu, Mathieu Fontaine, Jonathan Le Roux, Gaël Richard, "SpecDiff-GAN: A Spectrally-Shaped Noise Diffusion GAN for Speech and Music Synthesis," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Apr. 2024. [pdf]
- Haocheng Liu, Teysir Baoueb, Mathieu Fontaine, Jonathan Le Roux, Gaël Richard, "GLA-Grad: A Griffin-Lim Extended Waveform Generation Diffusion Model," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Apr. 2024. [pdf]
- Zexu Pan, Gordon Wichern, Yoshiki Masuyama, Francois Germain, Sameer Khurana, Chiori Hori, Jonathan Le Roux, "Scenario-Aware Audio-Visual TF-GridNet for Target Speech Extraction," in Proc. IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Dec. 2023. [pdf] (Winner of the 2nd COG-MHEAR Audio-Visual Speech Enhancement (AVSE) Challenge)
- Shih-Lun Wu, Xuankai Chang, Gordon Wichern, Jee-weon Jung, François G. Germain, Jonathan Le Roux, Shinji Watanabe, "On the Use of Pretrained Deep Audio Encoders for Automated Audio Captioning Tasks," in Proc. International Symposium on Future Active Safety Technology toward zero traffic accidents (FAST-zero), Nov. 2023. [pdf]
- Francois Germain, Gordon Wichern, Jonathan Le Roux, "Hyperbolic Unsupervised Anomalous Sound Detection," in Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), Oct. 2023. [pdf]
- Ricardo Falcon Perez, Gordon Wichern, Francois Germain, Jonathan Le Roux, "Location as Supervision for Weakly Supervised Multi-Channel Source Separation of Machine Sounds," in Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), Oct. 2023. [pdf]
- Chiori Hori, Puyuan Peng, David Harwath, Xinyu Liu, Kei Ota, Siddarth Jain, Radu Corcodel, Devesh K. Jha, Diego Romeres, Jonathan Le Roux, "Style-transfer based Spoken Language understanding for Robot Action Sequence Acquisition from Videos," in Proc. ISCA Interspeech, Aug. 2023. [pdf]
- Ke Chen, Gordon Wichern, Francois Germain, Jonathan Le Roux, "Pac-HuBERT: Self-Supervised Music Source Separation via Hidden-Unit BERT and Primitive Auditory Features," in Proc. IEEE ICASSP Satellite Workshop on Self-supervision in Audio, Speech and Beyond (SASB), Jun. 2023. [arXiv]
- Darius Petermann, Gordon Wichern, Aswin Subramanian, Jonathan Le Roux, "Hyperbolic Audio Source Separation," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Jun. 2023. [arXiv] [Github] (Best Student Paper Award)
- Rohith Aralikatti, Christoph Boeddeker, Gordon Wichern, Aswin Subramanian, Jonathan Le Roux, "Reverberation as Supervision for Speech Separation," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Jun. 2023. [arXiv]
- Efthymios Tzinis, Gordon Wichern, Paris Smaragdis, Jonathan Le Roux, "Optimal Condition Training for Target Source Separation," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Jun. 2023. [arXiv]
- Dimitrios Bralios, Efthymios Tzinis, Gordon Wichern, Paris Smaragdis, Jonathan Le Roux, "Latent Iterative Refinement for Modular Source Separation," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Jun. 2023. [arXiv]
- Hao Yen, François G. Germain, Gordon Wichern, Jonathan Le Roux, "Cold Diffusion for Speech Enhancement," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Jun. 2023. [arXiv]
- Satvik Venkatesh, Gordon Wichern, Aswin Subramanian, Jonathan Le Roux, "Improved Domain Generalization via Disentangled Multi-Task Learning in Unsupervised Anomalous Sound Detection," in Proc. DCASE Workshop, Nov. 2022. [pdf]
- Efthymios Tzinis, Gordon Wichern, Aswin Subramanian, Paris Smaragdis, Jonathan Le Roux, "Heterogeneous Target Speech Separation," in Proc. ISCA Interspeech, Sep. 2022. [pdf]
- Chiori Hori, Takaaki Hori, Jonathan Le Roux, "Low-Latency Streaming Scene-aware Interaction Using Audio-Visual Transformers," in Proc. ISCA Interspeech, Sep. 2022. [pdf]
- Darius Petermann, Gordon Wichern, Zhong-Qiu Wang, Jonathan Le Roux, "The Cocktail Fork Problem: Three-Stem Audio Separation for Real-World Soundtracks," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2022. [arXiv] [Demo] [Github] [Divide and Remaster (DnR) dataset]
- Olga Slizovskaia, Gordon Wichern, Zhong-Qiu Wang, Jonathan Le Roux, "Locate This, Not That: Class-Conditioned Sound Event DOA Estimation," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2022. [arXiv]
- Niko Moritz, Takaaki Hori, Shinji Watanabe, Jonathan Le Roux, "Sequence Transduction with Graph-based Supervision," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2022. [arXiv]
- Xuankai Chang, Niko Moritz, Takaaki Hori, Shinji Watanabe, Jonathan Le Roux, "Extended Graph Temporal Classification for Multi-Speaker End-to-End ASR," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2022. [arXiv]
- Yosuke Higuchi, Niko Moritz, Jonathan Le Roux, Takaaki Hori, "Advancing Momentum Pseudo-Labeling with Conformer and Initialization Strategy," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2022. [arXiv]
- Ankit Parag Shah, Shijie Geng, Peng Gao, Anoop Cherian, Takaaki Hori, Tim K. Marks, Jonathan Le Roux, Chiori Hori, "Audio-Visual Scene-Aware Dialog and Reasoning Using Audio-Visual Transformers with Joint Student-Teacher Learning," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2022. [pdf]
- Chiori Hori, Ankit Parag Shah, Shijie Geng, Peng Gao, Anoop Cherian, Takaaki Hori, Jonathan Le Roux, Tim K. Marks, "Overview of Audio Visual Scene-Aware Dialog with Reasoning Track for Natural Language Generation in DSTC10," in Proc. 10th Dialog System Technology Challenge Workshop at AAAI (DSTC10), Feb. 2022. [pdf]
- Ankit Parag Shah, Takaaki Hori, Jonathan Le Roux, Chiori Hori, "DSTC10-AVSD Submission System with Reasoning using Audio-Visual Transformers with Joint Student-Teacher Learning," in Proc. 10th Dialog System Technology Challenge Workshop at AAAI (DSTC10), Feb. 2022. [pdf]
- Anoop Cherian, Chiori Hori, Tim K. Marks, Jonathan Le Roux, "(2.5+1)D Spatio-Temporal Scene Graphs for Video Question Answering," in Proc. AAAI Conference on Artificial Intelligence (AAAI), Feb. 2022. [pdf]
- Zhong-Qiu Wang, Gordon Wichern, Jonathan Le Roux, "Convolutive Prediction for Reverberant Speech Separation," in Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), Oct. 2021. [arXiv]
- Gordon Wichern, Ankush Chakrabarty, Zhong-Qiu Wang, Jonathan Le Roux, "Anomalous sound detection using attentive neural processes," in Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), Oct. 2021. [pdf]
- Moitreya Chatterjee, Jonathan Le Roux, Narendra Ahuja, Anoop Cherian, "Visual Scene Graphs for Audio Source Separation," in Proc. IEEE International Conference on Computer Vision (ICCV), Oct. 2021. [arXiv]
- Niko Moritz, Takaaki Hori, Jonathan Le Roux, "Dual Causal/Non-Causal Self-Attention for Streaming End-to-End Speech Recognition," in Proc. ISCA Interspeech, Sep. 2021. [arXiv]
- Takaaki Hori, Niko Moritz, Chiori Hori, Jonathan Le Roux, "Advanced Long-context End-to-end Speech Recognition Using Context-expanded Transformers," in Proc. ISCA Interspeech, Sep. 2021. [arXiv]
- Yosuke Higuchi, Niko Moritz, Jonathan Le Roux, Takaaki Hori, "Momentum Pseudo-Labeling for Semi-Supervised Speech Recognition," in Proc. ISCA Interspeech, Sep. 2021. [arXiv]
- Chiori Hori, Takaaki Hori, Jonathan Le Roux, "Optimizing Latency for Online Video Captioning Using Audio-Visual Transformers," in Proc. ISCA Interspeech, Sep. 2021. [arXiv]
- Yun-Ning Hung, Gordon Wichern, Jonathan Le Roux, "Transcription Is All You Need: Learning to Separate Musical Mixtures with Score as Supervision," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Jun. 2021. [arXiv]
- Sameer Khurana, Niko Moritz, Takaaki Hori, Jonathan Le Roux, "Unsupervised Domain Adaptation for Speech Recognition via Uncertainty Driven Self-Training," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Jun. 2021. [arXiv]
- Niko Moritz, Takaaki Hori, Jonathan Le Roux, "Capturing Multi-Resolution Context by Dilated Self-Attention," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Jun. 2021. [arXiv]
- Niko Moritz, Takaaki Hori, Jonathan Le Roux, "Semi-Supervised Speech Recognition via Graph-based Temporal Classification," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Jun. 2021. [arXiv]
- Shijie Geng, Peng Gao, Moitreya Chatterjee, Chiori Hori, Jonathan Le Roux, Yongfeng Zhang, Hongsheng Li, Anoop Cherian, "Dynamic Graph Representation Learning for Video Dialog via Multi-Modal Shuffled Transformers," in Proc. AAAI Conference on Artificial Intelligence (AAAI), Feb. 2021. [arXiv]
- Niko Moritz, Gordon Wichern, Takaaki Hori, Jonathan Le Roux, "All-in-One Transformer: Unifying Speech Recognition, Audio Tagging, and Event Detection," in Proc. ISCA Interspeech, Oct. 2020. [.pdf] [.bib]
- Takaaki Hori, Niko Moritz, Chiori Hori, Jonathan Le Roux, "Transformer-based Long-context End-to-end Speech Recognition," in Proc. ISCA Interspeech, Oct. 2020. [.pdf] [.bib]
- Tejas Jayashankar, Jonathan Le Roux, Pierre Moulin, "Detecting Audio Attacks on ASR Systems with Dropout Uncertainty," in Proc. ISCA Interspeech, Oct. 2020. [arXiv] [.pdf] [.bib]
- Ethan Manilow, Gordon Wichern, Jonathan Le Roux, "Hierarchical Musical Source Separation," in Proc. International Society for Music Information Retrieval (ISMIR) Conference, Oct. 2020. [Paper, Poster, Video] [.pdf] [.bib] (Best Poster Award, Best Video Award)
- Prem Seetharaman, Gordon Wichern, Bryan Pardo, Jonathan Le Roux, "AutoClip: Adaptive Gradient Clipping for Source Separation Networks," in Proc. IEEE International Workshop on Machine Learning for Signal Processing (MLSP), Sep. 2020. [arXiv] [.pdf] [.bib]
- Prem Seetharaman, Gordon Wichern, Jonathan Le Roux, Bryan Pardo, "Bootstrapping unsupervised deep music separation from primitive auditory grouping principles," in ICML Workshop on Self-Supervision in Audio and Speech (SAS), Jul. 2020. [arXiv] [.pdf] [.bib]
- Fatemeh Pishdadian, Gordon Wichern, Jonathan Le Roux, "Learning to Separate Sounds From Weakly-Labeled Scenes," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2020. [.pdf] [.bib]
- Matthew Maciejewski, Gordon Wichern, Emmett McQuinn, Jonathan Le Roux, "WHAMR!: Noisy and Reverberant Single-Channel Speech Separation," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2020. [arXiv] [.pdf] [.bib]
- Xuankai Chang, Wangyou Zhang, Yanmin Qian, Jonathan Le Roux, Shinji Watanabe, "End-to-End Multi-speaker Speech Recognition with Transformer," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2020. [arXiv] [.pdf] [.bib]
- Niko Moritz, Takaaki Hori, Jonathan Le Roux, "Streaming Automatic Speech Recognition with the Transformer Model," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2020. [arXiv] [.pdf] [.bib]
- Leda Sarı, Niko Moritz, Takaaki Hori, Jonathan Le Roux, "Unsupervised Speaker Adaptation Using Attention-Based Speaker Memory for End-To-End ASR," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2020. [.pdf] [.bib]
- Niko Moritz, Takaaki Hori, Jonathan Le Roux, "Streaming End-to-End Speech Recognition with Joint CTC-Attention Based Models," in Proc. IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Dec. 2019. [.pdf] [.bib]
- Xuankai Chang, Wangyou Zhang, Yanmin Qian, Jonathan Le Roux, Shinji Watanabe, "MIMO-SPEECH: End-to-End Multi-Channel Multi-Speaker Speech Recognition," in Proc. IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Dec. 2019. [.pdf] [.bib] (Best Paper Award)
- Ethan Manilow, Gordon Wichern, Prem Seetharaman, Jonathan Le Roux, "Cutting Music Source Separation Some Slakh: A Dataset to Study the Impact of Training Data Quality and Quantity," in Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), Oct. 2019. [.pdf] [.bib]
- Ilya Kavalerov, Scott Wisdom, Hakan Erdogan, Brian Patton, Kevin Wilson, Jonathan Le Roux, John R. Hershey, "Universal Sound Separation," in Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), Oct. 2019. [.pdf] [.bib]
- Gordon Wichern, Emmett McQuinn, Joe Antognini, Michael Flynn, Richard Zhu, Dwight Crow, Ethan Manilow, Jonathan Le Roux, "WHAM!: Extending Speech Separation to Noisy Environments," in Proc. ISCA Interspeech, Sep. 2019. [.pdf] [.bib]
- Niko Moritz, Takaaki Hori, Jonathan Le Roux, "Unidirectional Neural Network Architectures for End-to-End Automatic Speech Recognition," in Proc. ISCA Interspeech, Sep. 2019. [.pdf] [.bib]
- Hiroshi Seki, Takaaki Hori, Shinji Watanabe, Niko Moritz, Jonathan Le Roux, "Vectorized Beam Search for CTC-Attention-based Speech Recognition," in Proc. ISCA Interspeech, Sep. 2019. [.pdf] [.bib]
- Hiroshi Seki, Takaaki Hori, Shinji Watanabe, Jonathan Le Roux, John R. Hershey, "End-to-End Multi-Lingual Multi-Speaker Speech Recognition," in Proc. ISCA Interspeech, Sep. 2019. [.pdf] [.bib]
- Niko Moritz, Takaaki Hori, Jonathan Le Roux, "Triggered Attention for End-to-End Speech Recognition," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2019. [.pdf] [.bib]
- Jonathan Le Roux, Gordon Wichern, Shinji Watanabe, Andy Sarroff, John R. Hershey, "The Phasebook: Building complex masks via discrete representations for source separation," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2019. [.pdf] [.bib]
- Jonathan Le Roux, Scott Wisdom, Hakan Erdogan, John R. Hershey, "SDR — half-baked or well done?," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2019. [.pdf] [.bib] [arXiv] [Code to reproduce the figures]
- Prem Seetharaman, Gordon Wichern, Jonathan Le Roux, Bryan Pardo, "Bootstrapping single-channel source separation via unsupervised spatial clustering on stereo mixtures," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2019. [.pdf] [.bib] [arXiv]
- Prem Seetharaman, Gordon Wichern, Shrikant Venkataramani, Jonathan Le Roux, "Class-conditional embeddings for music source separation," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2019. [.pdf] [.bib] [arXiv]
- Ryo Aihara, Toshiyuki Hanazawa, Yohei Okato, Gordon Wichern, Jonathan Le Roux, "Teacher-student deep clustering for low-delay single channel speech separation," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2019. [.pdf] [.bib]
- Takaaki Hori, Ramon Astudillo, Tomoki Hayashi, Yu Zhang, Shinji Watanabe, Jonathan Le Roux, "Cycle-consistency training for end-to-end speech recognition," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2019. [.pdf] [.bib] [arXiv]
- Gordon Wichern, Jonathan Le Roux, "Phase reconstruction with learned time-frequency representations for single-channel speech separation," in Proc. IEEE International Workshop on Acoustic Signal Enhancement (IWAENC), Sep. 2018. [.pdf] [.bib]
- Zhong-Qiu Wang, Jonathan Le Roux, DeLiang Wang, John R. Hershey, "End-to-End Speech Separation with Unfolded Iterative Phase Reconstruction," in Proc. ISCA Interspeech, Sep. 2018. [.pdf] [.bib]
- Hiroshi Seki, Shinji Watanabe, Takaaki Hori, Jonathan Le Roux, John R. Hershey, "A Purely End-to-end System for Multi-speaker Speech Recognition," in Proc. Annual Meeting of the Association for Computational Linguistics (ACL), Jul. 2018. [.pdf] [.bib]
- Hiroshi Seki, Shinji Watanabe, Takaaki Hori, Jonathan Le Roux, John R. Hershey, "An End-to-End Language-Tracking Speech Recognizer for Mixed-Language Speech," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Apr. 2018. [.pdf] [.bib]
- Shane Settle, Jonathan Le Roux, Takaaki Hori, Shinji Watanabe, John R. Hershey, "End-to-End Multi-Speaker Speech Recognition," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Apr. 2018. [.pdf] [.bib]
- Zhong-Qiu Wang, Jonathan Le Roux, John R. Hershey, "Alternative Objective Functions for Deep Clustering," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Apr. 2018. [.pdf] [.bib]
- Zhong-Qiu Wang, Jonathan Le Roux, John R. Hershey, "Multi-Channel Deep Clustering: Discriminative Spectral and Spatial Embeddings for Speaker-Independent Speech Separation," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Apr. 2018. [.pdf] [.bib] (Best Student Paper Award)
- Paul Magron, Jonathan Le Roux, Tuomas Virtanen, "Consistent anisotropic Wiener filtering for audio source separation," in Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), Oct. 2017. [.pdf] [.bib]
- Yuuki Tachioka, Tomohiro Narita, Iori Miura, Takanobu Uramoto, Natsuki Monta, Shingo Uenohara, Kenichi Furuya, Shinji Watanabe, Jonathan Le Roux, "Coupled initialization of multi-channel non-negative matrix factorization based on spatial and spectral information," in Proc. ISCA Interspeech, Aug. 2017. [.pdf] [.bib]
- Yi Luo, Zhuo Chen, John R. Hershey, Jonathan Le Roux, Nima Mesgarani, "Deep Clustering and Conventional Networks for Music Separation: Stronger Together," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Mar. 2017. [.pdf] [.bib] [arXiv] [demo] (Ranked 1st out of 5 teams at the MIREX 2016 Singing Voice Separation Task)
- Tomoki Hayashi, Shinji Watanabe, Tomoki Toda, Takaaki Hori, Jonathan Le Roux, Kazuya Takeda, "BLSTM-HMM Hybrid System Combined with Sound Activity Detection Network for Polyphonic Sound Event Detection," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Mar. 2017. [.pdf] [.bib]
- Shinji Watanabe, Takaaki Hori, Jonathan Le Roux, John R. Hershey, "Student-Teacher Network Learning with Enhanced Features," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Mar. 2017. [.pdf] [.bib]
- Takaaki Hori, Hai Wang, Chiori Hori, Shinji Watanabe, Bret A. Harsham, Jonathan Le Roux, John R. Hershey, Yusuke Koji, Yi Jing, Zhaocheng Zhu, Takeyuki Aikawa, "Dialog State Tracking with Attention-Based Sequence-to-Sequence Learning," in Proc. IEEE Workshop on Spoken Language Technology (SLT), Dec. 2016. [.pdf] [.bib] (Ranked 2nd out of 9 submissions at DSTC5 Dialog State Tracking Challenge)
- Scott Wisdom, Thomas Powers, John R. Hershey, Jonathan Le Roux, Les Atlas, "Full-Capacity Unitary Recurrent Neural Networks," in Proc. Advances in Neural Information Processing Systems (NIPS), Dec. 2016. [.pdf] [.bib] [Spotlight video] [Code]
- Hakan Erdogan, Tomoki Hayashi, John R. Hershey, Takaaki Hori, Chiori Hori, Wei-Ning Hsu, Suyoun Kim, Jonathan Le Roux, Zhong Meng, Shinji Watanabe, "Multi-Channel Speech Recognition: LSTMs All the Way Through," in Proc. CHiME Workshop, pp. 45--48, Sep. 2016. [.pdf] [.bib] (Ranked 3rd out of 15 submissions at the 4th CHiME Speech Separation and Recognition Challenge, 6-channel track)
- Yusuf Isik, Jonathan Le Roux, Zhuo Chen, Shinji Watanabe, John R. Hershey, "Single-Channel Multi-Speaker Separation using Deep Clustering," in Proc. ISCA Interspeech (Interspeech 2016), Sep. 2016. [.pdf] [.bib] [Demos and code to generate the dataset]
- Hakan Erdogan, John R. Hershey, Shinji Watanabe, Michael Mandel, Jonathan Le Roux, "Improved MVDR beamforming using single-channel mask prediction networks," in Proc. ISCA Interspeech (Interspeech 2016), Sep. 2016. [.pdf] [.bib]
- Tomoki Hayashi, Shinji Watanabe, Tomoki Toda, Takaaki Hori, Jonathan Le Roux, Kazuya Takeda, "Bidirectional LSTM-HMM Hybrid System for Polyphonic Sound Event Detection," in Proc. DCASE2016 Workshop (DCASE2016), Sep. 2016. [.pdf] [.bib] (Ranked 3rd out of 9 submissions at the DCASE 2016 Challenge Task 2)
- John R. Hershey, Zhuo Chen, Jonathan Le Roux, Shinji Watanabe, "Deep Clustering: Discriminative Embeddings for Segmentation and Separation," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2016), Mar. 2016. [.pdf] [.bib]
- Scott Wisdom, John R. Hershey, Jonathan Le Roux, Shinji Watanabe, "Deep Unfolding for Multichannel Source Separation," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2016), Mar. 2016. [.pdf] [.bib]
- Takaaki Hori, Zhuo Chen, Hakan Erdogan, John R. Hershey, Jonathan Le Roux, Vikramjit Mitra, Shinji Watanabe, "The MERL/SRI system for the 3rd CHiME challenge using beamforming, robust feature extraction, and advanced speech recognition," in Proc. IEEE Automatic Speech Recognition and Understanding Workshop (ASRU 2015), Dec. 2015. [.pdf] [.bib] (Ranked 2nd out of 25 submissions at the 3rd CHiME Speech Separation and Recognition Challenge)
- Bret Harsham, Shinji Watanabe, Alan Esenther, John R. Hershey, Jonathan Le Roux, Yi Luan, Daniel Nikovski, Vamsi Potluru, "Driver prediction to improve interaction with in-vehicle HMI," in Proc. Workshop on Digital Signal Processing for In-Vehicle Systems (DSP 2015), Oct. 2015. [.pdf] [.bib]
- Felix Weninger, Hakan Erdogan, Shinji Watanabe, Emmanuel Vincent, Jonathan Le Roux, John R. Hershey, Björn Schuller, "Speech Enhancement with LSTM Recurrent Neural Networks and its Application to Noise-Robust ASR," in Proc. International Conference on Latent Variable Analysis and Signal Separation (LVA/ICA 2015), Aug. 2015. [.pdf] [.bib]
- Jonathan Le Roux, John R. Hershey, Felix Weninger, "Deep NMF for Speech Separation," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2015), Apr. 2015. [.pdf] [.bib]
- Jonathan Le Roux, Emmanuel Vincent, John R. Hershey, Daniel P. W. Ellis, "MICbots: collecting large realistic datasets for speech and audio research using mobile robots," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2015), Apr. 2015. [.pdf] [.bib] [Slides] [Project page]
- Hakan Erdogan, John R. Hershey, Shinji Watanabe, Jonathan Le Roux, "Phase-sensitive and recognition-boosted speech separation using deep recurrent neural networks," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2015), Apr. 2015. [.pdf] [.bib]
- Felix Weninger, Jonathan Le Roux, John R. Hershey, Björn Schuller, "Discriminatively Trained Recurrent Neural Networks for Single-Channel Speech Separation," in Proc. IEEE GlobalSIP Symposium on Machine Learning Applications in Speech Processing (GlobalSIP MLASP 2014), Dec. 2014. [.pdf] [.bib]
- Yuuki Tachioka, Shinji Watanabe, Jonathan Le Roux, John R. Hershey, "Sequence Discriminative Training for Low-Rank Deep Neural Networks," in Proc. IEEE GlobalSIP Symposium on Machine Learning Applications in Speech Processing (GlobalSIP MLASP 2014), Dec. 2014. [.pdf] [.bib]
- Felix Weninger, Jonathan Le Roux, John R. Hershey, Shinji Watanabe, "Discriminative NMF and its application to single-channel source separation," in Proc. ISCA Interspeech, Sep. 2014. [.pdf] [.bib]
- Yuuki Tachioka, Shinji Watanabe, Jonathan Le Roux, John R. Hershey, "Sequential Maximum Mutual Information Linear Discriminant Analysis for Speech Recognition," in Proc. ISCA Interspeech, Sep. 2014. [.pdf] [.bib]
- Yuuki Tachioka, Tomohiro Narita, Shinji Watanabe, Jonathan Le Roux, "Ensemble integration of calibrated speaker localization and statistical speech detection in domestic environments," in Proc. IEEE Joint Workshop on Hands-free Speech Communication and Microphone Arrays (HSCMA 2014), May 2014. [.pdf] [.bib]
- Felix Weninger, Shinji Watanabe, Jonathan Le Roux, John R. Hershey, Yuuki Tachioka, Jürgen Geiger, Björn Schuller, Gerhard Rigoll, "The MERL/MELCO/TUM system for the REVERB Challenge using Deep Recurrent Neural Network Feature Enhancement," in Proc. REVERB workshop (REVERB 2014), May 2014. [.pdf] [.bib]
- Shinji Watanabe, Jonathan Le Roux, "A Study on Black Box Optimization for Automatic Speech Recognition," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2014. [.pdf] [.bib]
- Umut Şimşekli, Jonathan Le Roux, John R. Hershey, "Non-Negative Source-Filter Dynamical System for Speech Enhancement," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2014. [.pdf] [.bib] (Student travel grant awarded to Umut)
- Vamsi K. Potluru, Jonathan Le Roux, Barak A. Pearlmutter, John R. Hershey, Matthew E. Brand, "Pairwise Coordinate Descent Algorithm for NMF," in Proc. NIPS Workshop on Greedy Algorithms, Frank-Wolfe and Friends - A modern perspective, Dec. 2013. [Website], [.pdf], [.bib]
- Emmanuel Vincent, Jon Barker, Shinji Watanabe, Jonathan Le Roux, Francesco Nesta, Marco Matassoni, "The Second 'CHiME' Speech Separation and Recognition Challenge: An Overview of Challenge Systems and Outcomes," in Proc. IEEE Automatic Speech Recognition and Understanding Workshop (ASRU 2013), Dec. 2013. [.pdf], [.bib]
- Yuuki Tachioka, Shinji Watanabe, Jonathan Le Roux, John R. Hershey, "A Generalized Discriminative Training Framework for System Combination," in Proc. IEEE Automatic Speech Recognition and Understanding Workshop (ASRU 2013), Dec. 2013. [.pdf], [.bib]
- Jonathan Le Roux, Shinji Watanabe, John R. Hershey, "Ensemble Learning for Speech Enhancement," in Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA 2013), Oct. 2013. [.pdf], [.bib]
- Umut Şimşekli, Jonathan Le Roux, John R. Hershey, "Hierarchical and Coupled Non-Negative Dynamical Systems with Application to Audio Modeling," in Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA 2013), Oct. 2013. [.pdf], [.bib] (Student travel grant awarded to Umut)
- Koichiro Yoshino, Shinji Watanabe, Jonathan Le Roux, John R. Hershey, "Statistical Dialogue Management using Intention Dependency Graph," in Proc. International Joint Conference on Natural Language Processing (IJCNLP 2013), Oct. 2013. [.pdf], [.bib]
- Yuuki Tachioka, Shinji Watanabe, Jonathan Le Roux, John R. Hershey, "Discriminative methods for noise robust speech recognition: A CHiME Challenge benchmark," in Proc. International Workshop on Machine Listening in Multisource Environments (CHiME 2013), Jun. 2013. [.pdf], [.bib]
- Jonathan Le Roux, Petros T. Boufounos, Kang Kang, John R. Hershey, "Source Localization in Reverberant Environments Using Sparse Optimization," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2013), May 2013. [.pdf], [.bib]
- Cédric Févotte, Jonathan Le Roux, John R. Hershey, "Non-Negative Dynamical System With Application to Speech and Audio," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2013), May 2013. [.pdf], [.bib]
- Emmanuel Vincent, Jon Barker, Shinji Watanabe, Jonathan Le Roux, Francesco Nesta, Marco Matassoni, "The Second CHiME Speech Separation and Recognition Challenge: Datasets, Tasks and Baselines," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2013), May 2013. [.pdf], [.bib]
- Vamsi K. Potluru, Sergey M. Plis, Jonathan Le Roux, Barak A. Pearlmutter, Vince D. Calhoun, Thomas P. Hayes, "Block Coordinate Descent for Sparse NMF," in Proc. International Conference on Learning Representations (ICLR 2013), May 2013. [arXiv], [.bib]
- Creighton K. Heaukulani, Jonathan Le Roux and John R. Hershey, "Latent Dirichlet Reallocation for Term Swapping," in Proc. IEEE International Workshop on Statistical Machine Learning for Speech Processing (IWSML 2012), Mar. 2012. [.pdf], [.bib]
- Jonathan Le Roux and John R. Hershey, "Indirect Model-Based Speech Enhancement," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2012), pp. 4045–4048, Mar. 2012. [.pdf], [.bib]
- Masahiro Nakano, Jonathan Le Roux, Hirokazu Kameoka, Tomohiro Nakamura, Nobutaka Ono and Shigeki Sagayama, "Bayesian Nonparametric Spectrogram Modeling Based on Infinite Factorial Infinite Hidden Markov Model," in Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA 2011), Oct. 2011. [.pdf], [.bib]
- Masahiro Nakano, Jonathan Le Roux, Hirokazu Kameoka, Nobutaka Ono and Shigeki Sagayama, "Infinite-State Spectrum Model for Music Signal Analysis," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2011), pp. 1972–1975, May 2011. [.pdf], [.bib]
- Jonathan Le Roux, Emmanuel Vincent, Yuu Mizuno, Hirokazu Kameoka, Nobutaka Ono and Shigeki Sagayama, "Consistent Wiener Filtering: Generalized Time-Frequency Masking Respecting Spectrogram Consistency," in Proc. 9th International Conference on Latent Variable Analysis and Signal Separation (LVA/ICA 2010), Lecture Notes in Computer Science, Vol. 6365/2010, pp. 89–96, Springer, Sep. 2010. [.pdf], [.bib]
- Hirokazu Kameoka, Takuya Yoshioka, Mariko Hamamura, Jonathan Le Roux and Kunio Kashino, "Statistical Model of Speech Signals Based on Composite Autoregressive System with Application to Blind Source Separation," in Proc. 9th International Conference on Latent Variable Analysis and Signal Separation (LVA/ICA 2010), pp. 245–253, Sep. 2010. [.pdf], [.bib]
- Masahiro Nakano, Jonathan Le Roux, Hirokazu Kameoka, Yu Kitano, Nobutaka Ono and Shigeki Sagayama, "Nonnegative Matrix Factorization with Markov-chained Bases for Modeling Time-varying patterns in Music Spectrograms," in Proc. 9th International Conference on Latent Variable Analysis and Signal Separation (LVA/ICA 2010), pp. 149–156, Sep. 2010. [.pdf], [.bib]
- Hirokazu Kameoka, Jonathan Le Roux and Yasunori Ohishi, "A Statistical Model of Speech F0 Contours," in Proc. ISCA Tutorial and Research Workshop on Statistical And Perceptual Audition (SAPA 2010), pp. 43–48, Sep. 2010. [.pdf], [.bib]
- Jonathan Le Roux, Hirokazu Kameoka, Nobutaka Ono and Shigeki Sagayama, "Fast Signal Reconstruction from Magnitude STFT Spectrogram Based on Spectrogram Consistency," in Proc. 13th International Conference on Digital Audio Effects (DAFx-10), pp. 397–403, Sep. 2010. [.pdf], [.bib]
- Masahiro Nakano, Hirokazu Kameoka, Jonathan Le Roux, Yu Kitano, Nobutaka Ono, Shigeki Sagayama, "Convergence-Guaranteed Multiplicative Algorithms for Non-Negative Matrix Factorization with Beta-Divergence," in Proc. IEEE International Workshop on Machine Learning for Signal Processing (MLSP 2010), pp. 283–288, Aug. 2010. [.pdf], [.bib]
- Jonathan Le Roux, Alain de Cheveigné and Lucas C. Parra, "Adaptive Template Matching with Shift-Invariant Semi-NMF," in Advances in Neural Information Processing Systems 21 (Proc. NIPS 2008), D. Koller, Y. Bengio, D. Schuurmans, and L. Bottou, Eds. Cambridge, MA: The MIT Press, 2009. (Presented in Dec. 2008) [.pdf], [.bib] [source code]
- Jonathan Le Roux, Nobutaka Ono and Shigeki Sagayama, "Explicit Consistency Constraints for STFT Spectrograms and Their Application to Phase Reconstruction," in Proc. Workshop on Statistical and Perceptual Audition (SAPA 2008), pp. 23–28, Sep. 2008. [.pdf], [.bib]
- Jonathan Le Roux, Hirokazu Kameoka, Nobutaka Ono, Alain de Cheveigné and Shigeki Sagayama, "Computational Auditory Induction by Missing-Data Non-Negative Matrix Factorization," in Proc. Workshop on Statistical and Perceptual Audition (SAPA 2008), pp. 1–6, Sep. 2008. [.pdf], [.bib]
- Nobutaka Ono, Ken-ichi Miyamoto, Jonathan Le Roux, Hirokazu Kameoka and Shigeki Sagayama, "Separation of a Monaural Audio Signal into Harmonic/Percussive Components by Complementary Diffusion on Spectrogram," in Proc. European Signal Processing Conference (EUSIPCO), Aug. 2008. [.pdf], [.bib]
- Jonathan Le Roux, Hirokazu Kameoka, Nobutaka Ono, Shigeki Sagayama and Alain de Cheveigné, "Modulation Analysis of Speech Through Orthogonal FIR Filterbank Optimization," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2008), pp. 4189–4192, Apr. 2008. [.pdf], [.bib]
- Jonathan Le Roux, Hirokazu Kameoka, Nobutaka Ono, Alain de Cheveigné and Shigeki Sagayama, "Single Channel Speech and Background Segregation through Harmonic-Temporal Clustering," in Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp. 279–282, Oct. 2007. [.pdf] [.bib]
- Jonathan Le Roux, Hirokazu Kameoka, Nobutaka Ono, Alain de Cheveigné and Shigeki Sagayama, "Harmonic-Temporal Clustering of Speech for Single and Multiple F0 Contour Estimation in Noisy Environments," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2007), Vol. IV, pp. 1053–1056, Apr. 2007. [.pdf] [.bib]
(Part of the expenses for the presentation of this work at ICASSP 2007 was supported by a grant from The Telecommunications Advancement Foundation, TAF, Japan)
- Alain de Cheveigné, Jonathan Le Roux and Jonathan Z. Simon, "MEG Signal Denoising based on Time-Shift PCA," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2007), Vol. I, pp. 317–320, Apr. 2007. [.pdf] [.bib]
- Hirokazu Kameoka, Jonathan Le Roux, Nobutaka Ono and Shigeki Sagayama, "Speech Analyzer Using a Joint Estimation Model of Spectral Envelope and Fine Structure," in Proc. Interspeech International Conference on Spoken Language Processing (ICSLP2006), pp. 2502–2505, Sep. 2006. [.pdf] [.bib]
- Jonathan Le Roux and Erik McDermott, "Optimization Methods for Discriminative Training," in Proc. Eurospeech 2005, pp. 3341–3344, Sep. 2005. [.pdf] [.bib]