
Wednesday, May 18, 2016

A simple scikit-learn classifier based on Gaussian Mixture Models (GMM)

When I started switching to Python for my work on CASA, it wasn't entirely clear to me how to use the sklearn GMM (sklearn.mixture.GMM) for classification.  It turned out to be a bit easier than expected (yay for scikit-learn!), but for others, here is my implementation of a class that behaves like the other classifiers (e.g. sklearn.svm.SVC).  All you need to decide is how many Gaussians you want to model your data with, and off you go.

Link to GitHub repo. A Jupyter notebook shows a sample use.
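For the curious, here is a minimal sketch of the idea (not the code in the repo - the class and parameter names are illustrative, and it uses the newer sklearn.mixture.GaussianMixture rather than the old GMM class): fit one GMM per class, then classify by picking the class whose mixture gives the highest log-likelihood.

import numpy as np
from sklearn.mixture import GaussianMixture

class GMMClassifier:
    """Sketch: one Gaussian mixture per class, maximum-likelihood prediction."""

    def __init__(self, n_components=1):
        self.n_components = n_components

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        # Fit a separate GMM to the samples of each class
        self.models_ = {c: GaussianMixture(self.n_components).fit(X[y == c])
                        for c in self.classes_}
        return self

    def predict(self, X):
        # Log-likelihood of every sample under every class-conditional model
        scores = np.column_stack([self.models_[c].score_samples(X)
                                  for c in self.classes_])
        return self.classes_[np.argmax(scores, axis=1)]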

Why isn't something like this in sklearn yet?  Well, it turns out someone already proposed it (no surprise) in a much more general way: see this discussion on GitHub.  (I was pointed there myself when I asked about my own code.)  My bit of code is far more primitive, but I hope it's easier to understand.

Tuesday, April 12, 2016

DAGA2016 article: Probabilistic 2D localization of sound sources using a multichannel bilateral hearing aid

Just put my DAGA 2016 article online. Link to paper.

Abstract: In the context of localization for Computational Auditory Scene Analysis (CASA), probabilistic localization is a technique where a probability that a sound source is present is computed for each possible direction. This approach has been shown to work well with binaural signals, provided the sources to be localized are in front of the user and approximately on the same plane as the ears. Modern hearing aids use multiple microphones to perform array processing, and in a bilateral configuration, the extra microphones can be used by localization algorithms to estimate not only the horizontal direction (azimuth) but also the vertical direction (elevation), thereby also resolving the front-back confusion. In this work, we present three different approaches that use Gaussian Mixture Model classifiers to localize sounds relative to a multi-microphone bilateral hearing aid. One approach is to divide a unit sphere into a nonuniform grid and assign a class to each grid point; the other two approaches estimate elevation and azimuth separately, using either a vertical-polar coordinate system or an ear-polar coordinate system. The benefits and drawbacks in terms of performance, computational complexity, and memory requirements are discussed for each of these approaches.

Monday, February 8, 2016

GMM based localizer on custom ASIC model

The model interface hardware with the FPGA in-circuit emulator.
Lukas Gerlach (L) and Christopher Seifert (R) demoing their ASIC model setup, running in real time on an FPGA.

It's always nice to see one's own research code running on real hardware with live data rather than just in a MATLAB simulation.  My colleagues over at the Institut für Mikroelektronische Systeme (IMS) of the Leibniz Universität in Hannover presented a demo of their hardware at the Hearing4All winter plenary held last week in Soltau.  The code running on the hardware shown is a GMM-based localizer originally written by Tobias May, but since heavily modified by myself.  The next step is to write up exactly what we did to make this all work and how well it does - so look out for an article on this in the near future! It's one of the advantages of being at an integrated cluster: at Hearing4All, pretty much everything related to hearing loss is being investigated, from basic ear physiology to audiology, models, algorithms, clinical procedures, implants, and new ground-breaking hardware.


Tuesday, February 2, 2016

The Selective Binaural Beamformer: It's out!


After six or so months of going through the peer review gauntlet, our paper on the Selective Binaural Beamformer (or simply SBB) is finally published.  Thanks to all my coauthors (Menno, Daniel, Simon, and Steven) as well as the reviewers (especially reviewer #2, who gave very tough but important feedback), I think this became a very nice paper.  Please go ahead and read it at http://www.asp.eurasipjournals.com/content/2016/1/12 (EURASIP Journal on Advances in Signal Processing, full title "Speech enhancement for multimicrophone binaural hearing aids aiming to preserve the spatial auditory scene"): it's open access, so you can read it at the above address or download a PDF (see the right sidebar on the linked page).  Being open access, it's free and CC BY 4.0 licensed.

The basic idea behind the algorithm is this: normally, if you use a beamforming algorithm on a binaural hearing aid, the entire auditory image collapses to the position of the beam direction, that is, ALL sound will appear (to the hearing aid user) to originate from the same location.  Various methods have been proposed to fix this - Simon Doclo in particular has done a lot of work on this topic (which is why it was so helpful to have him as coauthor).  My approach to this problem was to take the signal in the STFT domain (i.e., the signal is divided into discrete short time frames and narrow frequency bins) and, in each "bin" (time-frequency unit), decide whether the target signal or the background noise is dominant.  In the first case, I use the beamformer output: the signal is enhanced and collapsed, but that's OK - it _should_ be coming from the target direction anyway.  In the second case, I simply use the signal as it comes from the two microphones closest to the ear canal, without processing - hence there is (almost) no difference from the "real" signal reaching the ears.  So, all the benefit of the beamformer without the nasty collapse of the auditory field!
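To make the selection step concrete, here is a rough sketch (only an illustration, not the published implementation: the array names and the simple SNR threshold are assumptions - in the paper the decision comes from a dedicated SNR estimator, see below):

import numpy as np

def selective_binaural_beamformer(stft_left, stft_right, stft_beamformer,
                                  snr_db, threshold_db=0.0):
    # All inputs are STFT-domain arrays of shape (frames, bins).
    # Per-bin decision: the target is taken to be dominant where the
    # estimated SNR exceeds the threshold.
    target_dominant = snr_db > threshold_db
    # Target-dominant bins take the beamformer output (enhanced, collapsed);
    # noise-dominant bins pass the ear microphone signals through untouched.
    out_left = np.where(target_dominant, stft_beamformer, stft_left)
    out_right = np.where(target_dominant, stft_beamformer, stft_right)
    return out_left, out_right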

...Well, mostly.  The tricky part is to make a good speech/noise decision (or actually a "target signal"/background noise decision).  But there's a fancy SNR estimator in there, from Adam Kuklasinski (see ref. 19 - I met him in Lisbon where he presented it at EUSIPCO), and that works pretty well.

So if this is the kind of thing that seems interesting to you, read the paper - and I will post some of the sample files (that were used during subjective testing) soonish on my personal homepage.

Thursday, July 2, 2015

A bit of Python path hackery

Like many people, I have a directory in $HOME for useful Python code I write. Similar to MATLAB's "Documents/MATLAB" directory, I want those files to be easily available when I'm hacking on new stuff. The typical way of handling this is to do:

import sys
sys.path.append('/path/to/my/dir')
       
 
but that is a bit ugly, and has the problem that if I'm working in an IPython Notebook (as I often do), this append gets re-evaluated every time I re-execute the cell in which I do all my imports (and since I'm editing the notebook and adding stuff, I do that a lot). Besides, I now have sys in my namespace.

I could just dump it into ".local/lib/python3.4/site-packages" (too hidden, and outside the tree that is synced between my machines), or link "Documents/Python" to that path, but then I don't want pip installing stuff there, and there might also be times when I don't want "Documents/Python" to be in the path.

So here's my solution.  I place a file in ".local/lib/python3.4/site-packages" and ".local/lib/python2.7/site-packages" with a name like "LocalPath.py", and this file contains

       
import sys

myPythonDir = '/home/jthiem/Documents/Python'
if myPythonDir not in sys.path:
    sys.path.append(myPythonDir)
    
 
I can now do
       
import LocalPath
 
and have instant no-fuss access to my homebrew packages.  Now isn't that nice.

EUSIPCO 2015

It's becoming a habit: I'm going to EUSIPCO again.  They seem to have a preference for the sunnier parts of Europe (fine by me!), so this time I'm off to Nice, France, where I will be presenting some of my current work on CASA for hearing devices.  While what I'll be presenting is just a small part of the whole, it's an important initial result: evaluating the features I'm extracting from the hearing aid microphones to use for localization of sources.  I'll be presenting my paper "Features for Speaker Localization in Multichannel Bilateral Hearing Aids" as part of the "Acoustic scene analysis using microphone array" special session, on Wednesday at 14:30.  (A bit of a pity - I would have liked to attend the "Audio and speech source separation and enhancement" session as well.)

I'll be going straight to Nice after visiting the in-laws in Canada, so it'll be quite the trip.  Hope to see you there!

Friday, November 28, 2014

The new and improved method of real-time MIDI control of MATLAB (or Python, or ... whatever!)

In an older post on my blog, I described a method to get the state of a MIDI controller into MATLAB (though readable by pretty much anything that can read files).  That method is quite clunky, so when I wanted to do something similar again, I re-thought the problem and came up with a better method based on memory-mapped files.  You can find the code on GitHub.

Basically, I create a file of fixed length (260 bytes, in /tmp), mmap() it, and update the contents based on the MIDI stream I'm receiving.  The first 128 bytes are for keys, where each byte is the last seen velocity (0 means "off").  The following 128 bytes are for the CC messages.  Bytes 257 and 258 store the pitch bend, a 14-bit value with the MSB in byte 257.  Byte 259 is the last received program change value, and byte 260 is the channel aftertouch.
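To illustrate the writer side, a rough Python 3 sketch might look like the following (this is not the code in the repo: the function names and 0-based offsets are mine, and parsing the incoming MIDI stream is left out):

import mmap

PATH = '/tmp/midibroadcast'

def create_broadcast_file(path=PATH, size=260):
    # Zero-initialize the fixed-length state file
    with open(path, 'wb') as f:
        f.write(bytes(size))

def open_broadcast(path=PATH):
    # Writable mapping of the whole file; mmap keeps its own file descriptor
    with open(path, 'r+b') as f:
        return mmap.mmap(f.fileno(), 0)

def note(mm, key, velocity):
    mm[key] = velocity                 # bytes 0..127: last seen velocity, 0 = off

def control_change(mm, cc, value):
    mm[128 + cc] = value               # bytes 128..255: CC values

def pitch_bend(mm, value14):
    mm[256] = (value14 >> 7) & 0x7F    # MSB of the 14-bit value
    mm[257] = value14 & 0x7F           # LSB

def program_change(mm, program):
    mm[258] = program                  # last received program change

def aftertouch(mm, pressure):
    mm[259] = pressure                 # channel aftertouch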

MATLAB Access

Accessing the values from MATLAB does not require any special functions; it can simply be done using

mm = memmapfile('/tmp/midibroadcast');

then the data can be obtained in real-time from mm.Data.  As described above, mm.Data(1:128) are the last seen key down velocities (0 meaning key is not pressed) and mm.Data(129:256) are controllers 0..127.  The pitch bend value can be obtained as mm.Data(257)*256+mm.Data(258) (note this may be buggy; my cheap controller (KeyRig 25) does not let me set and hold precise pitch bends, so I can't test it properly). mm.Data(259) is the program change and mm.Data(260) shows the current aftertouch amount.

Python Access

From Python the values can be accessed just as easily:

import os
import mmap
mfd = os.open('/tmp/midibroadcast', os.O_RDONLY)
mfile = mmap.mmap(mfd, 0, prot=mmap.PROT_READ)

Remembering that Python counts from 0, the controllers can be read in real time as mfile[128] to mfile[255]: just subtract 1 from the descriptions above.
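For example, continuing with the mfile mapping opened above (a hedged illustration - the variable names are mine, and in Python 3 indexing an mmap returns ints):

velocities = list(mfile[0:128])    # last seen note-on velocities for keys 0..127
mod_wheel  = mfile[128 + 1]        # CC 1 (mod wheel)
aftertouch = mfile[259]            # channel aftertouch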

Any language that can do memory mapping should be able to do the same, and it should even be possible to read the current state just by re-reading the /tmp/midibroadcast file.

Enjoy!