ICASSP 2015 in Brisbane


I’m flying tomorrow from Tokyo to Brisbane to attend the ICASSP 2015 conference. Who would have guessed I’d be back to Brisbane and its conference center 7 years after Interspeech 2008… if I’d had to choose a conference location to be repeated, I’d probably have gone with Honolulu, but anyway.

I’ll be chairing a special session Wednesday morning on “Audio for Robots – Robots for Audio”, which I am co-organizing with Emmanuel Vincent (INRIA) and Walter Kellermann (Friedrich-Alexander-Universität Erlangen-Nürnberg). I will also present the following two papers:

  • “MICbots: collecting large realistic datasets for speech and audio research using mobile robots,” with Emmanuel Vincent, John R. Hershey, and Daniel P. W. Ellis. [.pdf] [.bib]
    Abstract: Speech and audio signal processing research is a tale of data collection efforts and evaluation campaigns. Large benchmark datasets for automatic speech recognition (ASR) have been instrumental in the advancement of speech recognition technologies. However, when it comes to robust ASR, source separation, and localization, especially using microphone arrays, the perfect dataset is out of reach, and many different data collection efforts have each made different compromises between the conflicting factors in terms of realism, ground truth, and costs. Our goal here is to escape some of the most difficult trade-offs by proposing MICbots, a low-cost method of collecting large amounts of realistic data where annotations and ground truth are readily available. Our key idea is to use freely moving robots equipped with microphones and loudspeakers, playing recorded utterances from existing (already annotated) speech datasets. We give an overview of previous data collection efforts and the trade-offs they make, and describe the benefits of using our robot-based approach. We finally explain the use of this method to collect room impulse response measurements.
  • “Deep NMF for Speech Separation,” with John R. Hershey and Felix Weninger. [.pdf] [.bib]
    Abstract: Non-negative matrix factorization (NMF) has been widely used for challenging single-channel audio source separation tasks. However, inference in NMF-based models relies on iterative inference methods, typically formulated as multiplicative updates.  We propose “deep NMF”, a novel non-negative deep network architecture which results from unfolding the NMF iterations and untying its parameters. This  architecture can be discriminatively trained for optimal separation performance. To optimize its non-negative parameters, we show how a new form of back-propagation, based on multiplicative updates, can be used to preserve non-negativity, without the need for constrained optimization. We show on a challenging speech separation task that deep NMF improves in terms of accuracy upon NMF and is competitive with conventional sigmoid deep neural networks, while requiring a tenth of the number of parameters.

If you are attending the conference, don’t hesitate to come by and ask the hard questions…

(The photo above is myself happily posing with Dot, Hot, and Lot, our first three MICbots.)

Tidying up my (data) mess

I’ve recently bought a NAS (network-attached storage), with the main goal of making it as easy as possible to actually enjoy our collection of photos and home videos instead of only amassing them, never to be watched again.

Of course, a big part of this is to clean up the data first, once and for good, and that is, no wonder, a major endeavor (which is the very reason why I kept kicking the can down the road all these years!). With three children under 3, you may wonder how come I have spare time to take care of this. The answer is simple: I don’t. I have been working on this for several weeks late at night, which is probably one of the main reasons why it took my tired brain so long to figure many important things out.

It’s still a work in progress, but I thought I’d share a few tricks that I’ve used and that required a lot of searching and fiddling around before I could get them right.

Things would be much easier if I wasn’t borderline (?) OCD and I didn’t spend hours fixing time issues (wrong time zone, cameras out of sync) in my photos and home videos. Even worse, my photo library (26000 pictures and counting) was managed in iPhoto ’11 (9.2.3, running on Snow Leopard… I know, time to update), so I had to find a way to export everything without losing the time adjustments I had made in there as well as other metadata (Faces info for example, although it’s not clear whether other software can handle those).

I have been using a combination of several tools:

  • iPhoto: over the years, I fixed a lot of timing issues through the “Adjust Time” option of iPhoto, but somehow never clicked the “Modify the originals” checkbox. I thus had to select my entire library, add 1 second (without modifying the originals, because it takes forever and is useless!), then remove 1 second, this time checking the Modify the originals box. I actually did it on smaller batches, because it does not take much for iPhoto to crash…
  • phoshare: this is an open source software written in Python that exports images from iPhoto/Aperture while preserving metadata. Here is a great tutorial (and here’s another) on how to use Phoshare. I modified the code so that the exported images could be renamed according to both the date and the time they were taken (the original code only handles the date). Of course, I later realized that I could have spared myself the trouble by simply exporting the pictures with their original names, and doing the automatic renaming using either exiftool or Adobe Bridge as explained below. Anyway. If you are still interested in using the modified version, download my version of imageutils.py and put it in Contents/Resources/lib/python2.7/tilutil/ in place of the old one (you need to right-click the application and choose “Show Package Contents” to access those files). Then you can use {hh}{MM}{ss} in the file name template for hours, minutes and seconds.
  • exiftool: this is a nifty piece of software that can do all sorts of crazy things on/using the EXIF data of a photo. I used it to set the modification date/time of all my photos to the date/time they were taken. This is done in one line:
    exiftool "-DateTimeOriginal>FileModifyDate" folder_name/
    and it even goes down recursively in subfolders, hurray!
  • GNU touch: for videos, as well as pictures for which the EXIF data was too messy, I resorted to using the command line (I’m on a Mac, but it works the same on Cygwin on Windows) and the good old GNU touch. This actually took me quite a while to understand, because resources online were pointing at obsolete syntax (“use -B n to go back n seconds!”… nope). It looks like touch’s syntax was completely changed in the last few years, and for the better. The -d switch is pretty magical, and lets you set the modification time of a file to a specific date, where the format of the date is pretty much anything that makes sense! In combination with -r file1, you can grab the modification time of file1 and apply it to another file after adding or subtracting an arbitrary amount of time, e.g.:
    touch -r file1.avi -d "-1hour-2minutes3seconds" file2.avi
    will set file2.avi’s modification time to 1 hour 1 minute and 57 seconds prior to that of file1.avi. You can of course use it as touch -r file1.avi -d "6hours" file1.avi for example to account for a +6 hour time zone difference, which is what you need when taking pictures in France with a camera set to Eastern Time. Surprisingly, it is very hard to find documentation of the time-shifting aspect of -d, so I hope this helps someone out.
  • Adobe Bridge: I used it to automatically rename files according to their modification date, e.g. DSC00100.jpg renamed to 20130901_175959_DSC00100.jpg. I could have done this with exiftool, but I happened to have Bridge, and it makes dealing with file batches very easy.
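
For anyone who would rather script the last two steps, here is a minimal sketch of how the time shifting and the date-based renaming could be done from the command line. It is not what I actually ran: the function names shift_mtime and prefix_with_mtime are made up for illustration, and the sketch assumes GNU coreutils (touch, date, mv), which on a Mac are usually installed under names like gtouch and gdate.

```shell
# Sketch only: assumes GNU touch and GNU date are available as `touch`/`date`.

# shift_mtime OFFSET FILE...
# Shift each file's modification time by a relative OFFSET such as
# "6 hours" or "-1 hour", using the file itself as the reference.
shift_mtime() {
    local offset="$1"
    shift
    local f
    for f in "$@"; do
        touch -r "$f" -d "$offset" -- "$f"
    done
}

# prefix_with_mtime FILE...
# Rename e.g. DSC00100.jpg to 20130901_175959_DSC00100.jpg, where the prefix
# is the file's modification time expressed in the current time zone.
prefix_with_mtime() {
    local f dir base stamp
    for f in "$@"; do
        dir=$(dirname -- "$f")
        base=$(basename -- "$f")
        stamp=$(date -r "$f" +%Y%m%d_%H%M%S)
        # -n refuses to overwrite an existing file with the same name.
        mv -n -- "$f" "$dir/${stamp}_${base}"
    done
}
```

A typical use would be shift_mtime "6 hours" *.jpg followed by prefix_with_mtime *.jpg. Renaming does not touch the modification time, so the two steps can be run in that order without interfering with each other.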

Now that I’ve prepared all this data to put on the NAS, there are still a couple minor things that need to be dealt with, namely:

  • backing up the whole thing. Multiple times.
  • making it easy to watch on the TV.

For backup, I use two 2TB WD Passport drives: I keep one in a remote location, and have the NAS back up on the other one every couple days. My plan is to swap the drives between home and the remote location every couple weeks. I also started using Google Drive, and subscribed to a 1TB plan ($10/month). I uploaded all home videos and photos, making sure to use the Google Drive app, and NOT drag-and-drop folders into Chrome: I tried that first, and it reset all the modification dates to the day of the upload, so that I ended up with dozens of home videos from 5 years ago that were showing up as if they’d been taken last week. Google Drive has the benefit of being both a backup, and an easy way to share media with the family. Plus, their auto-awesome feature is really cool. Going full-on Google, I also uploaded my music collection to Google Music: it’s free up to 50,000 songs. Yes, fifty bloody K. That’s a pretty ridiculously large music library, and more than enough for me.
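
Since a copy that silently resets modification dates is exactly the kind of thing I want to catch early, here is a rough sketch of a check that could be run after copying a folder to a locally mounted drive. The function name check_mtimes is hypothetical, and the sketch assumes GNU stat (the -c %Y format prints the modification time in seconds) and file names without newlines.

```shell
# Sketch only: assumes GNU find and GNU stat.

# check_mtimes SRC_DIR DST_DIR
# List every file under SRC_DIR whose copy under DST_DIR is missing or has a
# different modification time. Returns non-zero if anything is listed.
check_mtimes() {
    local src="${1%/}"
    local dst="${2%/}"
    local report
    report=$(find "$src" -type f | while IFS= read -r path; do
        # Path of the file relative to SRC_DIR.
        rel=${path#"$src"/}
        if [ ! -e "$dst/$rel" ]; then
            echo "missing: $rel"
        elif [ "$(stat -c %Y "$path")" != "$(stat -c %Y "$dst/$rel")" ]; then
            echo "mtime differs: $rel"
        fi
    done)
    if [ -n "$report" ]; then
        printf '%s\n' "$report"
        return 1
    fi
}
```

For example, check_mtimes ~/Videos /Volumes/Backup/Videos (both paths are placeholders) prints nothing when every file made it across with its date intact.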

As for watching, I’ve been toying with OpenELEC, a small Linux distribution that runs the Kodi media center, on a Raspberry Pi. I use a Flirc IR receiver together with a Harmony programmable remote to control it as just another appliance. The result is impressive, kind of like browsing Netflix but with your own media library. It’s amazing what you can do with a $35 tiny computer (plus $9 for the case…). But I’ll soon get back to that.