MERL's Speech and Audio team had 5 papers presented in Kos, Greece at Interspeech 2024.
MERL's team, led by Yoshiki Masuyama, ranked 1st out of 7 teams in Task 2 of the 1st SONICOM Listener Acoustic Personalisation (LAP) Challenge, which focused on "Spatial upsampling for obtaining a high-spatial-resolution HRTF from a very low number of directions".
MERL's Speech and Audio team had 10 papers presented in Seoul, Korea during the ICASSP 2024 week (7 in the main conference, 2 in the XAI-SA workshop, and 1 at HSCMA 2024), on audio captioning, speech and music separation, diarization, audio generative models, spatial audio reproduction, and multimodal indoor monitoring.
SANE 2023 was held on October 26 at NYU in Brooklyn, co-organized with Juan Bello and John Hershey.
The Special Session on Multi-talker methods in Speech Processing, co-organized by Peter Bell, Michael Akeroyd, Jon Barker, Marc Delcroix, Liang Lu, myself, Jinyu Li, Cassia Valentini, and DeLiang Wang, was accepted at Interspeech 2023, with 16 papers.
Darius Petermann's paper "Hyperbolic Audio Source Separation" won a Best Student Paper Award at ICASSP 2023. The code for the algorithm and the demo interface is on GitHub: Hyper-Unmix.
SANE was back in 2022 after a 3-year hiatus. Videos of the invited talks are available on the website.
I had the pleasure of talking with Sam Charrington for the TWIML AI podcast about MERL's work towards solving the cocktail party problem. Check out the interview (available on YouTube and major podcast providers) at this link.
We released Divide and Remaster (DnR), a source separation dataset for training and testing algorithms that separate a monaural audio signal into speech, music, and sound effects/background stems, i.e., solving the cocktail fork problem. More details and a video are available on the cocktail fork website.
As a Last Christmas present in the WHAM! series, we have decided to Make It Big and release WHAM!48kHz, a high-fidelity version of the ambient background noise recordings originally used for the WSJ0 Hipster Ambient Mixtures (WHAM!) dataset. It contains 78 hours of raw binaural recordings (48 kHz/24-bit), collected by our definitely not Careless Whisper.ai collaborators at various urban locations (restaurants, cafes, bars, parks) throughout the Bay Area.
Our paper "Hierarchical Musical Source Separation" won the Best Poster Award and the Best Video Award at the 2020 International Society for Music Information Retrieval Conference (ISMIR 2020), held October 11-14 in virtual Montreal, Canada. Check out the paper page on the ISMIR website.
SANE 2017 was held on October 19, 2017, at Google, NY, with a new record audience of 180 people. Videos and slides of the SANE 2017 talks (as well as those from previous years) are now available. There is even a convenient Monday morning binge-watching YouTube playlist.
SANE 2016 was held on October 21, 2016, at MIT. Slides of all talks are now available from the SANE website.
Emmanuel Vincent (Inria, France), Hakan Erdogan (Sabanci University, MSR), and I presented a tutorial on "Learning-based Approaches to Speech Enhancement And Separation" at Interspeech 2016 in San Francisco, CA. The room was full, with over 100 attendees. The slides are available here.
SANE 2015 was held on October 22, 2015, at Google, NYC, with a record audience of 130 people. Thanks to Hank Liao's hard work, the videos and slides of the SANE 2015 talks are now available. Here is a convenient Saturday night binge-watching YouTube playlist.
The slides for my ICASSP 2015 talk on MICbots are now available for download. I put together a project page that explains the concept and the actual construction of our MICbots. It features a YouTube video showing them moving to the tune of dark suits and greasy wash water. From there, you can also download PyRobot 2, a Python wrapper for the iRobot Create 2 Open Interface; it is derived from Damon Kohler's PyRobot, which handled the first version of the interface.
Thanks to the many participants and the speakers of SANE 2014 for a very exciting day. The slides for all talks as well as some photos are now available through the sane-news group (slides for previous SANE workshops are also available). More info on SANE 2014.
Emmanuel Vincent (INRIA) and I gathered a comprehensive list of datasets for robust speech processing research, with detailed attributes and links to software baselines and evaluation results. It is available as a Technical Report as well as a wiki page under the ROSP wiki.
The code for our IEEE Signal Processing Letters article "Consistent Wiener Filtering for Audio Source Separation" is now available (research-only license).
I added a page with non-research software where I share pieces of software that I modified to suit my needs and that others may find useful. In particular, I have taken over development of IguanaTex, a free add-in to include LaTeX displays in PowerPoint. It is a good alternative to TexPoint.
We created a Google group, "Speech and Audio in the Northeast", to gather researchers and students in speech and audio from the northeast of the American continent. Anyone is welcome to join.