Signals Processed: 2016

Sunday, October 23, 2016

A (very) short trip to Korea

Panorama view from my room at the Nest Hotel in Incheon. The Incheon Airport is visible on the right.

Earlier this month, I was in Incheon, South Korea, to present a talk at the Symposium for "Statistical physics, machine learning, and its application to speech and pattern recognition", organized by Prof. Kang-Hun Ahn as part of the Korea Institute for Advanced Study (KIAS). I was specifically invited to give a talk there (along with Jörg Lücke and Steven van de Par). One does not refuse such an invitation, esp. as a post-doc trying to make an academic career happen.

As a conference, the event was very good, in both of scientific content and forging connections that hopefully can continue in the future. Many interesting discussions happened outside the sessions, too.

The location of the Symposium banquet. It was excellent. Don't ask me for the name though; but I can give the coordinates: 37°25'53.1"N 126°25'27.0"E

The trip was bizarre for me for one reason, though. It's the first time I traveled that far just to present at a conference. Unfortunately, this symposium was scheduled not long after I had returned from Italy (where I was at MLSP 2016, coupled with a 1 week vacation), and the week before classes start here at the University of Oldenburg; so I had no time to do any sightseeing in Korea. I literally arrived the day before the first day of the symposium, and left the morning after the last day. Total time in Korea: about 66 hours. Total time flying there and back: 30 hours. We (myself, Jörg, and Steven) never went further than about 5 km from the Airport.

I certainly hope to go to Korea again, but then stay a little longer! There is so much to see, and I have friends in Japan I'd like to visit, too. (I've been to Jeju before, so I know Korea can be very beautiful. Next time I'd like to bring the wife and kids along!)

Wednesday, September 14, 2016

MLSP2016 paper: Speaker Tracking for Hearing Aids

MLSP 2016 poster, the print version
can be found here.

Yesterday, I presented my poster at the 2016 IEEE International Workshop on Machine Learning for Signal processing. I think it was received pretty well, there were several people that talked to me, and we had very good discussions. The biggest problem (and typical for all poster sessions) was that there were other good posters being presented, and I couldn't really spend time talking to the other authors at that session. However, over the next few days I'll have a chance to chat with them, so it's all good.

The beach of the conference venue,
with view towards Salerno.

My own paper I'm presenting is entitled "Speaker Tracking for Hearing Aids", and it basically a method to link speech utterances spoken at different times by the same speaker, a classic problem also found in speaker diarization (but I don't need to do segmentation). My method however is optimized for low computational complexity (for hearing aids), yet reaches comparable performance to typical far more complex methods. You can find the abstract, paper, and poster (seen in the pic) on my homepage.

Overall, I like these small, highly focused conferences - and being in a nice sunny environment is not to be sneezed at either.

Wednesday, August 31, 2016

A library of Gammatone Filterbanks in Python

I have already programmed gammatone filterbanks several times in MATLAB, but in for some recent work I needed a specific one (the One-Zero Gammatone filterbank) [Katsiamis, 2006] in Python.

Gammatone impulse responses, in time domain

In addition, I wanted my "old" FIR GTFB (from my Ph.D. Thesis) to be in there as well as the GTFB of a former colleague here in Oldenburg [Chen, 2015]. So I packed them all up in a library - I also wanted to learn how to make a PyPI compatible Python library.

This is a work in progress. The gammatone filterbanks mentioned are there, but momentarily I don't have time to polish the code; furthermore, I'd like to eventually add the Slaney GTFB and the Hohmann GTFB - not to mention the structure needs more cleanup. Consider this v0.01, for educational purposes. You can find it on github.

Thursday, August 11, 2016

Audio Resampling in Python

(Note: this post was written as a Jupyter Notebook which can be found with the Python code at https://nbviewer.jupyter.org/gist/jthiem/0bb8b2296c8dbd0b83bfab8270a8eb24.)

UPDATE: You can resample with nothing but scipy and one extra function now. ~~See this new post.~~
I now have written a package that you can install using pip.

One of the most basic tasks in audio processing anyone would need to do is resampling audio files; seems like the data you want to process is never sampled in the rate you want. 44.1k? 16k? 8k? Those are the common ones; there are some really odd ones out there.

Resampling is actually quite hard to get right. You need to properly choose your antialias filters, and write a interpolation/decimation procedure that won't introduce too much noise. There have been books written on this topic. For the most part, there is no single univeral way to to this right, since context matters.

For a more detailed discussion on sample rate conversion see the aptly named "Digital Audio Resampling Homepage" [https://ccrma.stanford.edu/~jos/resample/]. Then look at [http://src.infinitewave.ca/] for a super informative comparison of a ton of resampling implementations. No seriously, go there. (It inspired me to compare the methods below using a sine sweep.)

MATLAB has the built-in resample function. Everyone uses it. It works, but also sucks in many aspects - a much better function is in the DSP toolbox, but as near as I can tell, noone uses it (even when copious other parts of the DSP toolbox are being used!)

But I'm not talking about MATLAB, I'm talking about Python here, and it doesn't have a default resampling method - except if you're doing any kind of numerical computing, you'll be using numpy and probably scipy --- and the latter has a built-in resample function.

To get ahead of myself a bit, scipy.signal.resample sucks for audio resampling. That becomes apparent quite quickly - it works in frequency domain, by basically truncation or zero-padding the signal in the frequency domain. This is quite ugly in time domain (especially since it assumes the signal to be circular).

There are two alternatives I want to point at that are "turn-key", that is, you just have to specify your resampling ratio, and the library does the rest. I know of a few others, but at the time of writing this they didn't seem as easy to use (eg. just doing the interpolation/decimation, but not calculating a filter.) If there are other notable libraries I should know of, please mention it in the comments of the blog version of this notebook [https://signalsprocessed.blogspot.de].

The first alternative is scikit.resample. Using it has two problems: it relies on an external (C-language) library called "Secret Rabbit Code" (SRC as in "sample rate conversion", geddit?) aka libsamplerate [http://www.mega-nerd.com/SRC/]. Not a problem in itself, but you need to install it and this step is not automated in pip and friends (as far as I know at time of writing). (Problem for me was that I needed to run it on a platform where I couldn't do this easily.) Problem 2: scikit.resample 0.3.3 is Python 2 only, and the Python 3 version is unofficial (0.4.0-dev, I used [https://github.com/gregorias/samplerate]).

The other one is resampy, which can be found in PyPI or [https://github.com/bmcfee/resampy]. It is easy to install and works with Python 3 (librosa now uses it as preferred resampler over libsamplerate)

TL;DR: if you can install external libs, use scikit.resample (0.4.0-dev). resampy is faster, but not as good, but entirely reasonable. (If MATLAB resample is good enough for you, resampy will serve you just fine and is easier to install)

Now for a comparison with pretty pictures.

Downsampling a sweep

A common task is to convert between the CD sampling rate of 44.1 kHz, and the multiples of 8kHz that originated from the telekon industry. In particular, 48kHz. (Supposedly the rate of 44.1k was chosen in part because is was difficult to convert to 48 kHz; but it is also connected to NTSC timing). So let's have a sweep sampled at 48 kHz. (See the Jupyter Notebook for the Python code)

Sweep 100-23900 Hz, sampled at 48 kHz.

Now convert it to 44.1 kHz using the different libraries, and plot the spectrograms.

Same sweep downsampled to 44.1 kHz using scikit.resample.

scikit.samplerate (or libsamplerate AKA "Secret Rabbit Code") does it well. Almost no aliasing, yet close to Nyquist, and noise levels in line with the input. Note that once the sweep is above Nyquist, the output is basically nil.

The same sweep but now downsampled using scipy.signal.resample.

And this is scipy.signal.resample. Throughout the entire signal there is a high-frequency tone, and there is considerable energy even if the sweep is above Nyquist. This is a result of the frequency-domain processing operating on the entire signal at one. This is NOT suited to audio signals.

The same sweep downsampled using resampy.

Finally, resampy. There are some funny nonlinear effects, but at very low level - for audio applications generally nothing to worry about. Even the aliasing is below -60 dB. So, while not as good as scikit.resample is is useable and certainly better than scipy.signal.resample!

Upsampling an impulse

A way to examine the antialias filter is to upsample an impulse. Here, note a frustrating aspect of all these differing libraries: the interface is completely different beween all of them; and to boot, resampy and scikit.resample give different length outputs for the same upsampling ratio. This means you can't just switch between them using a from foo import resample statement, you'll need a wrapper function if you want to switch freely amongst them (see librosa as an example).

>>> us1 = sk_samplerate.resample(impulse, P/Q, 'sinc_best')
>>> us2 = scipy_signal.resample(impulse, int(len(impulse)*P/Q))
>>> us3 = resampy.resample(impulse, Q, P)
>>> print(us1.shape, us2.shape, us3.shape)
(1087,) (1088,) (1088,)

Frequency response of an impulse upsampled from 44.1 kHz to 48 kHz

'scikit.resample' certainly uses the best antialias filter, with a steep slope, allowing very little energy over Nyquist. 'resampy' is a more gentle filter (lower order, perhaps?) but decent. 'scipy.signal.resample'... well, anyways.

What do the impulse responses look like in time domain?

Time-domain impulse upsampled from 44.1 kHz to 48 kHz

The same section of the impulse, but with samples converted into dB (10*log10 magnitude squared)

Unsurprisingly, 'resampy' is the most compact, which explains the gentler cut-off. 'scikit.resample' is wider, but acceptable. 'scipy.signal.resample' sticks out like a sore thumb.

Speed comparison

>>> %timeit sk_samplerate.resample(sig, Q/P, 'sinc_best')
10 loops, best of 3: 98.1 ms per loop
>>> %timeit scipy_signal.resample(np.float64(sig), int(len(sig)*Q/P))
100 loops, best of 3: 4.52 ms per loop
>>> %timeit resampy.resample(np.float64(sig), P, Q)
10 loops, best of 3: 40.7 ms per loop

And here is a notable merit of 'scipy.signal.resample'. It is faster by an order of magnitude compared to the other methods. I suspect that if you make sure your signals are of length 2^N

2^{N}

, you'll get even faster results, since it'll switch to a FFT instead of a DFT. The other two are probably losing some speed in the passing of data from Python to C - but fundamentally, frequency domain techniques do tend to be fast. So keep that in mind you you need speed.

Conclusion

I'll just repeat the above TL;DR: use scikit.resample (0.4.0-dev) if you can install libsamplerate in a sane place. Else, use resampy, which is faster, but not as good --- but entirely reasonable. Use 'scipy.signal.resample' only if you really need the speed, and be aware of its shortcomings (note the circular assumption!).

Monday, June 6, 2016

FFT-based Overlap-Add FIR filtering in Python

Here is a small Python function I've written (github), that might be useful if you're doing signal processing in Python. The function implements the classic FFT-based Overlap-Add filtering, potentially saving a heck of a lot of processing time (assuming your filter is of sufficiently high order: usually about 128 taps).

For a thorough explanation of the algorithm, see the Wikipedia article.

I wrote this code as part of a lager ongoing project (gammatone filtering) that I will release eventually. What I still have to add to this code is the ability to set and save state information (it's simply the part of the response cut off at the end of the function) and the ability to filter complex signals with complex filters (replacing rfft with fft). Changes made in the latest version.

Wednesday, May 18, 2016

A simple scikit-learn classifier based on Gaussian Mixture Models (GMM)

When I started switching to Python for my work on CASA, it wasn't entirely clear to me how to use the sklearn GMM (sklearn.mixture.GMM) for classification. Turned out a bit easier than expected (yay for scikit-learn!), but for others, here is my implementation of a class that behaves like the other classifiers (eg. sklean.svm.SVC). All you need to decide is how many Gaussians you want to model your data with, and off you go.

Link to Github repo. A Jupyter notebook shows a sample use.

Why isn't something like this in sklearn yet? Well, turns out someone did propose it already (no surprise) in a much more general way: see this discussion on GitHub. (I myself was pointed there when I asked about my own code) My bit of code is far more primitive, but I hope easier to understand.

Tuesday, April 12, 2016

DAGA2016 article: Probabilistic 2D localization of sound sources using a multichannel bilateral hearing aid

Just put my DAGA 2016 article online. Link to paper.

Abstract: In the context of localization for Computational Auditory Scene Analysis (CASA), probabilistic localisation is a technique where a probability that a sound source is present is computed for each possible direction. This approach has been shown to work well with binaural signals provided the location of the sources to be localized is in front of the user and approximately on the same plane as the ears. Modern hearing aids use multiple microphones to perform array processing, and in a bilateral configuration, the extra microphones can be used by localization algorithms to not only estimate the horizontal direction (azimuth), but vertical direction (elevation) as well, thereby also resolving the front-back confusion. In this work, we present three different approaches to use Gaussian Mixture Model classifiers to localize sounds relative to a multi- microphone bilateral hearing aid. One approach is to divide a unit sphere into a nonuniform grid and assign a class to each grid point; the other two approaches estimate elevation and azimuth separately, using either a vertical-polar coordinate system or an ear- polar coordinate system. The benefits and drawbacks in terms of performance, computational complexity and memory requirements are discussed for each of these approaches.

Monday, February 8, 2016

GMM based localizer on custom ASIC model

The model interface hardware with the FPGA in-circuit emulator.


Lukas Gerlach (L) and Christopher Seifert (R) demoing their ASIC model setup, running realtime on a FPGA.

It's always nice to see one's own research code running on real actual hardware with live data rather than just having a simulation in MATLAB. My colleagues over at the Institut für Mikroelektronische Systeme (IMS) of the Leibnitz Universität in Hannover presented a demo of their hardware at the Hearing4All winter plenary held last week in Soltau. The code running on the hardware visible is a GMM based localizer originally written by Tobias May, but since heavily modified by myself. The next step is that we'll write up exactly what we did to make this all work and how well it does - so look out for an article on this in the near future! It's one of the advantages of being at an intengrated cluster; at Hearing4All, pretty much everything related to hearing loss is being investigated: from basic ear physiology, to audiology, models, algorithms, clinical procedures, implants, and new ground-breaking hardware.

Tuesday, February 2, 2016

The Selective Binaural Beamformer: It's out!

After six or so months of going through the peer review gauntlet, our paper on the Selective Binaural Beamformer (or simply SBB) is finally published. Thanks to all my coauthors (Menno, Daniel, Simon, and Steven) as well as the reviewers (especially reviewer #2, who gave very tough but important feedback) I think this became a very nice paper. Please go ahead and read it at http://www.asp.eurasipjournals.com/content/2016/1/12 (EURASIP Journal on Advances in Signal Processing, full title "Speech enhancement for multimicrophone binaural hearing aids aiming to preserve the spatial auditory scene"): it's open access, one can read it either at the above address or download a PDF (see the right sidebar on the linked page). Being open access, it's free and CC-A 4.0 licensed.

The basic idea behind the algorithm is this: Normally, if using a beamforming algorithm on a binaural hearing aid, the entire auditory image will collapse to the position of the beam direction, that is ALL sound will appear (to the hearing aid user) to originate from the same location. Various methods have been proposed to fix this - Simon Doclo in particular has done a lot of work on this topic (which is why it was so helpful to have him as coauthor). My approach to this problem was to take the signal in the STFT domain (ie, the signal is divided into discrete short time frames and narrow frequency bins) and in each "bin" (time-frequency unit) make a decision if the target signal is dominant, or if the background noise is dominant. In the first case, I use the beamformer output: the signal is enhanced, collapsed, but that's OK - it _should_ be coming from the target direction anyways. In the second case, I simply use the signal as it comes from the two microphones closest to the ear canal, without processing - hence there is (almost) no difference from the "real" signal reaching the ears. So, all the benefit of the beamformer without the nasty collapse of the auditory field!

...Well, mostly. The tricky part is to make a good speech/noise decision (or actually a "target signal"/background noise decision). But there's a fancy SNR estimator in there, from Adam Kuklasinski (see ref. 19 - I met him in Lisbon where he presented it at EUSIPCO), and that works pretty well.

So if this is the kind of thing that seems interesting to you, read the paper - and I will post some of the sample files (that were used during subjective testing) soonish on my personal homepage.

ASCIIMath creating images