
Tuesday, April 24, 2018

The disappearance and reappearance of the DEMAND corpus

The DEMAND ("Diverse Environments Multichannel Acoustic Noise Database") Corpus is one of my most successful creations - at the time I'm writing this post, Google Scholar claims there are 43 citations of the main article describing it.  (And I blogged about it here, here, here and here)  So it was a bit of a nasty surprise when Professor Chan (of Queens University in Kingston, Ontario; a good friend of my Ph.D. supervisor) dropped me a note telling me the database had disappeared off the internet.

To my dismay, I did not have a copy of the data on my own hard drives either, except for the files sampled at 16 kHz. Luckily, after some frantic emailing, I was told by my former colleagues Remy and Nancy that INRIA still has backups of the original files, and that Emmanuel has a backup of the website along with the HTML and descriptive PDF. Note to self: KEEP WELL-ORGANIZED BACKUPS!

There was still the problem of finding a new home to host the data, and Emmanuel suggested Zenodo, a platform funded by CERN and the EU for open-access data: DEMAND fits the bill pretty well. As a bonus, the dataset now has a "proper" DOI.

I hope the new home of DEMAND is more permanent than the old one; it certainly looks good. I'll be putting my HRTF database (blog post, brief preliminary conference paper) on there too once the main journal article describing it is vetted - actually, the data is already there, it just needs to be released.

Enjoy the data! (And let's hope the new location will pop up at the top of Google when using the search terms "DEMAND noise", as the old one did!)
 

Wednesday, January 3, 2018

A trip to the library...

When I started thinking about resampling and the filters used in resampling, I leaned heavily on P. P. Vaidyanathan's book "Multirate Systems And Filter Banks" - the "brown book" that every signal processing practitioner should have on his or her shelf (or on a shelf of a nearby colleague)!

Like many articles discussing this topic (e.g. the documentation of this MATLAB function), the key reference is Kaiser, James F., "Nonrecursive Digital Filter Design Using the I0-Sinh Window Function," Proceedings of the 1974 IEEE International Symposium on Circuits and Systems, April 1974, pp. 20-23. Being curious about how the equations given in Vaidyanathan were derived, I wanted to see the original paper - but unfortunately, the earliest ISCAS proceedings available in IEEE Xplore are from 1988, 14 years after the one I'm after.


Luckily, I happen to be on Christmas vacation at my in-laws', who live in Kitchener, right next door to Waterloo - home to one of the best engineering universities in the world. Unsurprisingly, the university has a well-stocked library, which includes the above proceedings booklet, and, braving the cold, I now have a copy for my own archives.

I could only copy it by taking snaps with my cellphone - the copiers need a special card, and the scanners need a UWaterloo login - but for reading the content, today's smartphones are quite sufficient. It's only for personal use, and I do hope that the IEEE will eventually get around to scanning in the older papers!

Papers from that era have a certain charm. Typewritten equations. FORTRAN code. I believe the code below is FORTRAN 66 (due to the IF statement; I don't think FORTRAN IV had that, and FORTRAN 77 obviously didn't exist yet). Without looking it up, do you know how that IF statement works? (Explanation at the end.)

As for what I learned from the paper (on a first quick read): those formulas relating N and beta to the stopband attenuation and transition band width were fitted to empirical data. I expected as much - but I still wanted to confirm it for myself. (If on a closer read I turn out to be wrong, I'll correct this post.)
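(For reference, those empirical fits are what scipy implements in scipy.signal.kaiserord. A minimal sketch of how one would use it - the attenuation and transition-width numbers here are just an example, not anything from the paper:)

from scipy.signal import kaiserord, firwin

# 60 dB stopband attenuation, transition band of 5% of the Nyquist frequency
numtaps, beta = kaiserord(ripple=60, width=0.05)
# lowpass with cutoff at half of Nyquist, using the fitted Kaiser parameters
taps = firwin(numtaps, 0.5, window=('kaiser', beta))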

So there's one item off my list of things to do in 2018: visit the UWaterloo campus and its library. Good fun.

-------

The FORTRAN IF statement: that is known as the "arithmetic IF" or "three-way IF" statement. The expression after the IF is evaluated; if it is negative, execution jumps to the first label ("2", exiting the subroutine); if the result is 0, we go to the second label ("1"); and if it is positive, to the third (also "1"). Here, each term of the power series expansion is calculated, and as soon as the new term is less than 2x10^-9 of the approximation, the code terminates. And IIRC, FORTRAN is call-by-reference.
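(For the curious, here is roughly what that routine computes, transcribed to Python - a sketch based on my reading of it, assuming it is the usual power-series approximation of the modified Bessel function I0 that the Kaiser window needs:)

def bessel_i0(x):
    # I0(x) = sum over k of ((x/2)^k / k!)^2, built up term by term
    result, term, k = 1.0, 1.0, 0
    while term >= 2e-9 * result:      # the paper's stopping rule
        k += 1
        term *= (x / (2*k))**2        # next term from the previous one
        result += term
    return result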

Wednesday, May 3, 2017

Resampling in Python: Electric Boogaloo

In a previous post, I looked at some sample rate conversion methods for Python, at least for audio data. I did some more digging into it, and thanks to a note from Prof. Christian Muenker of the Munich University of Applied Sciences, I was made aware of scipy.signal.resample_poly (new in SciPy 0.18.0). This led me down a bit of a rabbit hole, and I ended up with a Jupyter Notebook which I'm not going to copy-paste here, since there is quite a bit of code and some LaTeX in there. Here is the link to it instead.

For the impatient, here are the interesting bits:

import numpy as np
from math import gcd
from scipy.signal import firwin

def resample_poly_filter(up, down, beta=5.0, L=16001):

    # *** this block STOLEN FROM scipy.signal.resample_poly ***
    # Determine our up and down factors
    # Use a rational approximation to save computation time on really long
    # signals
    g_ = gcd(up, down)
    up //= g_
    down //= g_
    max_rate = max(up, down)

    # Kaiser main-lobe half-width factor, relative to a rectangular window
    sfact = np.sqrt(1 + (beta/np.pi)**2)

    # generate first filter attempt: with 6 dB attenuation at f_c
    filt = firwin(L, 1/max_rate, window=('kaiser', beta))

    N_FFT = 2**19
    NBINS = N_FFT//2 + 1
    paddedfilt = np.zeros(N_FFT)
    paddedfilt[:L] = filt
    ffilt = np.fft.rfft(paddedfilt)

    # now find the minimum between f_c and f_c + sqrt(1+(beta/pi)^2)/L,
    # i.e. the first null of the frequency response
    bot = int(np.floor(NBINS/max_rate))
    top = int(np.ceil(NBINS*(1/max_rate + 2*sfact/L)))
    firstnull = (np.argmin(np.abs(ffilt[bot:top])) + bot)/NBINS

    # generate the proper shifted filter, placing the first null on f_c
    filt2 = firwin(L, -firstnull + 2/max_rate, window=('kaiser', beta))

    return filt2

# P, Q (the resampling ratio) and the test signal `sig` are defined
# earlier in the notebook, as is the `scipy_signal` import alias
plt.figure(figsize=(15,3))
wfilt = resample_poly_filter(P, Q, L=2**16+1)
plt.specgram(scipy_signal.resample_poly(sig, P, Q, window=wfilt)*30, scale='dB', Fs=P, NFFT=256)
plt.colorbar()
plt.axis((0,2,0,Q/2))

Recycling my test sweep from the previous post, I get:
Sweep resampled using my own filter
But really, please read the Notebook.  Comments are welcome!

Monday, March 20, 2017

Publication update

This blog has basically been inactive since last October, since being a PostDoc means there are a whole bunch of other things keeping me busy. And of course, it was winter - which means, statistically, there is always someone in the family who is sick (kids bring home every germ that is going around...).

View of Kiel. Source: Johannes Barre 2006, on Wikipedia
But it is spring now! And I have just returned from DAGA 2017 in Kiel (where we found a very nice Thai restaurant), so it's time to update some of my work!

First off, my colleagues in Hannover published "Customized high performance low power processor for binaural speaker localization" at ICECS 2016 in Monte Carlo, Monaco (paper on IEEE Xplore). There was the winter plenary of Hearing4all. At DAGA 2017, I presented "Pitch features for low-complexity online speaker tracking", and Sarina (a Ph.D. student I'm co-supervising) presented "A distance measure to combine monaural and binaural auditory cues for sound source segregation"; both can be found on my homepage. In the pipeline is "Real-time Implementation of a GMM-based Binaural Localization Algorithm on a VLIW-SIMD Processor" by Christopher, which has been accepted and will be presented at ICME 2017 in Hong Kong in July, and I submitted a paper ("Segregation and Linking of Speech Glimpses in Multisource Scenarios on a Hearing Aid") to EUSIPCO 2017; that one is still in review.

I was also teaching a class this past semester ("5.04.4223 Introduction into Music Information Retrieval") which, because it's a brand-new class, took a crazy amount of work to prepare - but I think the students really enjoyed it, and I saw some very good code being written for the final project.

Now back to real work (writing more papers, that is)!  (Well, there's one or two topics I'll put on the blog in the next little while, too.  Later.)

Sunday, October 23, 2016

A (very) short trip to Korea

Panorama view from my room at the Nest Hotel in Incheon.  The Incheon Airport is visible on the right.
Earlier this month, I was in Incheon, South Korea, to present a talk at the symposium on "Statistical physics, machine learning, and its application to speech and pattern recognition", organized by Prof. Kang-Hun Ahn as part of the Korea Institute for Advanced Study (KIAS). I was specifically invited to give a talk there (along with Jörg Lücke and Steven van de Par). One does not refuse such an invitation, especially as a post-doc trying to make an academic career happen.

As a conference, the event was very good, both in scientific content and in forging connections that will hopefully continue in the future. Many interesting discussions happened outside the sessions, too.
The location of the Symposium banquet. It was excellent. Don't ask me for the name - but I can give the coordinates: 37°25'53.1"N 126°25'27.0"E

The trip was bizarre for me for one reason, though: it's the first time I traveled that far just to present at a conference. Unfortunately, this symposium was scheduled not long after I had returned from Italy (where I was at MLSP 2016, coupled with a one-week vacation), and the week before classes start here at the University of Oldenburg, so I had no time to do any sightseeing in Korea. I literally arrived the day before the first day of the symposium and left the morning after the last day. Total time in Korea: about 66 hours. Total time flying there and back: 30 hours. We (myself, Jörg, and Steven) never went further than about 5 km from the airport.

I certainly hope to go to Korea again, but then stay a little longer! There is so much to see, and I have friends in Japan I'd like to visit, too.  (I've been to Jeju before, so I know Korea can be very beautiful. Next time I'd like to bring the wife and kids along!)

Wednesday, September 14, 2016

MLSP2016 paper: Speaker Tracking for Hearing Aids

MLSP 2016 poster; the print version can be found here.
Yesterday, I presented my poster at the 2016 IEEE International Workshop on Machine Learning for Signal Processing. I think it was received pretty well; several people talked to me, and we had very good discussions. The biggest problem (typical of all poster sessions) was that there were other good posters being presented at the same time, and I couldn't really spend time talking to the other authors at that session. However, over the next few days I'll have a chance to chat with them, so it's all good.
The beach of the conference venue, with view towards Salerno.
The paper I'm presenting is entitled "Speaker Tracking for Hearing Aids"; it is basically a method to link speech utterances spoken at different times by the same speaker - a classic problem also found in speaker diarization (though I don't need to do segmentation). My method, however, is optimized for low computational complexity (for hearing aids), yet reaches performance comparable to typical, far more complex methods. You can find the abstract, paper, and poster (seen in the pic) on my homepage.
Overall, I like these small, highly focused conferences - and being in a nice sunny environment is not to be sneezed at either.

Wednesday, August 31, 2016

A library of Gammatone Filterbanks in Python

I have already programmed gammatone filterbanks several times in MATLAB, but for some recent work I needed a specific one - the One-Zero Gammatone filterbank [Katsiamis, 2006] - in Python.
Gammatone impulse responses, in time domain

In addition, I wanted my "old" FIR GTFB (from my Ph.D. thesis) to be in there, as well as the GTFB of a former colleague here in Oldenburg [Chen, 2015]. So I packed them all up in a library - I also wanted to learn how to make a PyPI-compatible Python library.
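(For context, here is the basic time-domain impulse response that all of these filterbanks approximate in one form or another - a minimal sketch of the standard 4th-order gammatone with the Glasberg & Moore ERB scale, not code taken from the library itself:)

import numpy as np

def gammatone_ir(fc, fs, n=4, duration=0.05):
    # g(t) = t^(n-1) * exp(-2 pi b ERB(fc) t) * cos(2 pi fc t)
    t = np.arange(int(duration * fs)) / fs
    erb = 24.7 * (4.37 * fc / 1000 + 1)    # Glasberg & Moore ERB in Hz
    env = t**(n - 1) * np.exp(-2 * np.pi * 1.019 * erb * t)
    g = env * np.cos(2 * np.pi * fc * t)
    return g / np.max(np.abs(g))           # normalize peak to 1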

This is a work in progress. The gammatone filterbanks mentioned are there, but at the moment I don't have time to polish the code; furthermore, I'd like to eventually add the Slaney GTFB and the Hohmann GTFB - not to mention that the structure needs more cleanup. Consider this v0.01, for educational purposes. You can find it on GitHub.

Monday, June 6, 2016

FFT-based Overlap-Add FIR filtering in Python

Here is a small Python function I've written (github) that might be useful if you're doing signal processing in Python. The function implements classic FFT-based overlap-add FIR filtering, potentially saving a heck of a lot of processing time (assuming your filter is of sufficiently high order - the break-even point is usually around 128 taps).

For a thorough explanation of the algorithm, see the Wikipedia article.
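(In case you don't want to click through, the core of the algorithm fits in a few lines - this is a minimal sketch of FFT-based overlap-add, not the exact code from the repo:)

import numpy as np

def ola_filter(b, x, nfft=1024):
    # FIR-filter signal x with taps b via FFT-based overlap-add;
    # assumes nfft >= len(b)
    L = len(b)
    hop = nfft - L + 1                  # new input samples per block
    B = np.fft.rfft(b, nfft)            # filter spectrum, computed once
    y = np.zeros(len(x) + L - 1)
    for start in range(0, len(x), hop):
        block = x[start:start + hop]
        # circular convolution of length nfft == linear convolution here,
        # since len(block) + L - 1 <= nfft
        yblock = np.fft.irfft(np.fft.rfft(block, nfft) * B, nfft)
        y[start:start + len(block) + L - 1] += yblock[:len(block) + L - 1]
    return y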

I wrote this code as part of a larger ongoing project (gammatone filtering) that I will release eventually. What I still had to add to this code was the ability to set and save state information (it's simply the part of the response cut off at the end of the function) and the ability to filter complex signals with complex filters (replacing rfft with fft). Those changes have been made in the latest version.

Wednesday, May 18, 2016

A simple scikit-learn classifier based on Gaussian Mixture Models (GMM)

When I started switching to Python for my work on CASA, it wasn't entirely clear to me how to use the sklearn GMM (sklearn.mixture.GMM) for classification. It turned out to be easier than expected (yay for scikit-learn!), but for others, here is my implementation of a class that behaves like the other classifiers (e.g. sklearn.svm.SVC). All you need to decide is how many Gaussians you want to model your data with, and off you go.
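(To give the flavour of it - the real thing is in the repo linked below - here is a minimal sketch of the idea: fit one GMM per class, then classify by picking the class whose model assigns the highest likelihood. I've written it against GaussianMixture, the newer name for the GMM class in scikit-learn:)

import numpy as np
from sklearn.mixture import GaussianMixture

class GMMClassifier:
    # one GMM per class; predict the class with the highest log-likelihood
    def __init__(self, n_components=4):
        self.n_components = n_components

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.models_ = [GaussianMixture(self.n_components).fit(X[y == c])
                        for c in self.classes_]
        return self

    def predict(self, X):
        loglik = np.array([m.score_samples(X) for m in self.models_])
        return self.classes_[np.argmax(loglik, axis=0)]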

Link to Github repo. A Jupyter notebook shows a sample use.

Why isn't something like this in sklearn yet? Well, it turns out someone did propose it already (no surprise), in a much more general way: see this discussion on GitHub. (I myself was pointed there when I asked about my own code.) My bit of code is far more primitive, but I hope easier to understand.

Tuesday, April 12, 2016

DAGA2016 article: Probabilistic 2D localization of sound sources using a multichannel bilateral hearing aid

Just put my DAGA 2016 article online. Link to paper.

Abstract: In the context of localization for Computational Auditory Scene Analysis (CASA), probabilistic localization is a technique where a probability that a sound source is present is computed for each possible direction. This approach has been shown to work well with binaural signals provided the location of the sources to be localized is in front of the user and approximately on the same plane as the ears. Modern hearing aids use multiple microphones to perform array processing, and in a bilateral configuration, the extra microphones can be used by localization algorithms to not only estimate the horizontal direction (azimuth), but vertical direction (elevation) as well, thereby also resolving the front-back confusion. In this work, we present three different approaches to use Gaussian Mixture Model classifiers to localize sounds relative to a multi-microphone bilateral hearing aid. One approach is to divide a unit sphere into a nonuniform grid and assign a class to each grid point; the other two approaches estimate elevation and azimuth separately, using either a vertical-polar coordinate system or an ear-polar coordinate system. The benefits and drawbacks in terms of performance, computational complexity and memory requirements are discussed for each of these approaches.

Monday, February 8, 2016

GMM based localizer on custom ASIC model

The model interface hardware with the FPGA in-circuit emulator.
Lukas Gerlach (L) and Christopher Seifert (R) demoing their ASIC model setup, running realtime on a FPGA.

It's always nice to see one's own research code running on real, actual hardware with live data, rather than just as a simulation in MATLAB. My colleagues at the Institut für Mikroelektronische Systeme (IMS) of the Leibniz Universität in Hannover presented a demo of their hardware at the Hearing4All winter plenary held last week in Soltau. The code running on the hardware is a GMM-based localizer originally written by Tobias May, but since heavily modified by myself. The next step is to write up exactly what we did to make this all work and how well it does - so look out for an article on this in the near future! It's one of the advantages of being in an integrated cluster: at Hearing4All, pretty much everything related to hearing loss is being investigated, from basic ear physiology to audiology, models, algorithms, clinical procedures, implants, and new ground-breaking hardware.


Tuesday, February 2, 2016

The Selective Binaural Beamformer: It's out!


After six or so months of going through the peer review gauntlet, our paper on the Selective Binaural Beamformer (or simply SBB) is finally published. Thanks to all my coauthors (Menno, Daniel, Simon, and Steven) as well as the reviewers (especially reviewer #2, who gave very tough but important feedback), I think this became a very nice paper. Please go ahead and read it at http://www.asp.eurasipjournals.com/content/2016/1/12 (EURASIP Journal on Advances in Signal Processing, full title "Speech enhancement for multimicrophone binaural hearing aids aiming to preserve the spatial auditory scene"): it's open access, so one can read it at the above address or download a PDF (see the right sidebar on the linked page). Being open access, it's free and CC BY 4.0 licensed.

The basic idea behind the algorithm is this: normally, if a beamforming algorithm is used on a binaural hearing aid, the entire auditory image collapses to the beam direction - that is, ALL sound will appear (to the hearing aid user) to originate from the same location. Various methods have been proposed to fix this - Simon Doclo in particular has done a lot of work on this topic (which is why it was so helpful to have him as coauthor). My approach to this problem is to take the signal in the STFT domain (i.e., the signal is divided into discrete short time frames and narrow frequency bins) and in each "bin" (time-frequency unit) decide whether the target signal or the background noise is dominant. In the first case, I use the beamformer output: the signal is enhanced and collapsed, but that's OK - it _should_ be coming from the target direction anyways. In the second case, I simply use the signal as it comes from the two microphones closest to the ear canals, without processing - hence there is (almost) no difference from the "real" signal reaching the ears. So, all the benefit of the beamformer without the nasty collapse of the auditory field!
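(In toy form, the selection step is just a binary mask over the STFT bins. This sketch assumes you already have the beamformer output, the reference-microphone signals, and a per-bin SNR estimate; the names and the simple threshold are mine for illustration, not the paper's exact decision rule:)

import numpy as np

def sbb_select(Y_bf, Y_ref, snr_db, threshold_db=0.0):
    # per time-frequency bin: beamformer output where the target dominates,
    # unprocessed ear-microphone signal everywhere else
    mask = snr_db > threshold_db
    return np.where(mask, Y_bf, Y_ref)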

...Well, mostly. The tricky part is making a good speech/noise decision (or rather, a "target signal"/"background noise" decision). But there's a fancy SNR estimator in there, from Adam Kuklasinski (see ref. 19 - I met him in Lisbon when he presented it at EUSIPCO), and it works pretty well.

So if this is the kind of thing that seems interesting to you, read the paper - and I will post some of the sample files (that were used during subjective testing) soonish on my personal homepage.

Thursday, July 2, 2015

EUSIPCO 2015

It's becoming a habit: I'm going to EUSIPCO again. They seem to have a preference for the sunnier parts of Europe (fine by me!), so this time I'm off to Nice, France, where I will be presenting some of my current work on CASA for hearing devices. While what I'll be presenting is just a small part of the whole, it's an important initial result: evaluating the features I'm extracting from the hearing aid microphones to use for localization of sources. I'll be presenting my paper "Features for Speaker Localization in Multichannel Bilateral Hearing Aids" as part of the "Acoustic scene analysis using microphone array" special session, on Wednesday at 14:30. (A bit of a pity - I would have liked to attend the "Audio and speech source separation and enhancement" session as well.)

I'll be going straight to Nice after visiting the in-laws in Canada, so it'll be quite the trip.  Hope to see you there!

Sunday, October 5, 2014

Using an ARM Chromebook for Scientific (and Academic) Computing

Samsung ARM Chromebook
A couple of months ago, I decided to get myself a Chromebook. The Samsung ARM Chromebook is cheap (to the point of being almost disposable), and with its ARM CPU, it's hard to beat for performance at the least possible amount of juice. I really like the fact that this thing emits no noise that I can detect, even in a very quiet room.

But how useful is it, for someone in a standard engineering/academic setting? The answer is that it works well, for me at least - with some special considerations. Especially for the last few weeks, it has been my primary laptop, having been dragged to research cluster meetings and one conference. I will explain the details of a few typical things I do, such as (LaTeX) document editing, intensive numerical computation, etc. Read below the break for details.  

Wednesday, September 24, 2014

Transplant: yet another bridge between MATLAB and Python, but a good one!

This post is basically just an advertisement for a project by a Master's student I'm co-supervising at the moment. While there are numerous methods already out there to link MATLAB to Python (and rumour has it that the next(?) MATLAB release will make it easier to call Python from MATLAB), I think Basti's "transplant" (github link) strikes a good balance between simplicity and capability. Bastian's code is elegant and reliable. My own contribution has just been a small bug fix, the ability to transfer logical (boolean) matrices, and an attempt to add the ability to capture MATLAB's stdout (Bastian came up with a much better solution).

For me, the resulting killer feature is that I can write IPython notebooks that call MATLAB code in a sane way. Complex code that would take too much effort to convert to Python can be called, and the results can then be plotted in the notebook - which is great when working remotely (a longer post on my workflow is in the works...). Results become more accessible, with a lot of the complexity hidden away in .m files.
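(If I remember the README correctly, basic usage looks roughly like this - check the GitHub page for the authoritative examples:)

import transplant

matlab = transplant.Matlab()    # start and connect to a MATLAB session
m = matlab.magic(4)             # call MATLAB functions directly...
e = matlab.eig(m)               # ...with matrices converted automatically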

So, check it out and spread the word!

Friday, July 4, 2014

My workspace for the next month or two

The two arc source positioning system in the anechoic chamber
of the Carl-von-Ossietzky University of Oldenburg
Yeah, I'll be doing HRTF measurements on dummy heads with multichannel hearing aids. Should be fun.

Thursday, May 29, 2014

Hello Lisbon! EUSIPCO 2014, here I come!

Downtown Lisbon, picture by
Keiran Thomas via Wikimedia Commons
As hinted in a previous post, Menno Müller and I submitted a paper to EUSIPCO 2014, and yesterday we finally got the review results. It was accepted by all reviewers, and several of those six (SIX!?!!) really liked the paper, so naturally I'm pleased as punch. And of course I'm already looking into how to get to Lisbon and what else to see while I'm there. While I can't publish the paper on my own homepage until after the conference is over (and besides, the reviewers did ask for some minor corrections that I still have to put in), I can now refer to: J. Thiemann, M. Müller, and S. van de Par, "A Binaural Hearing Aid Speech Enhancement Method Maintaining Spatial Awareness for the User", to be presented at EUSIPCO 2014.

Monday, May 5, 2014

Spatial properties of DEMAND

One of the nice plots from the presentation,
which didn't make it into the paper for space
reasons. The plots show the fit of the measured
coherence to the theoretical prediction.
Here is my primary DAGA 2014 paper, in which I examine the intermicrophone coherence of the DEMAND recordings. I also experiment a little with calibration, using multidimensional scaling. Not much one can squeeze into two pages.
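(The "theoretical prediction" mentioned in the caption is, I believe, the ideal diffuse-field coherence - for two omnidirectional microphones a distance d apart it is the classic sinc curve. A minimal sketch:)

import numpy as np

def diffuse_field_coherence(f, d, c=343.0):
    # ideal diffuse-field coherence: sin(2 pi f d / c) / (2 pi f d / c)
    x = 2 * np.pi * f * d / c
    return np.sinc(x / np.pi)    # np.sinc(t) = sin(pi t)/(pi t)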

The presentation was a bit of a bust - they put me in a session more about policy and noise pollution ("Psychoakustik - Lärmschutzpolitik", i.e. psychoacoustics and noise-control policy), so there was not much useful interaction. Oh well, it happens.

The paper is here, and the presentation slides here. These might be quite useful if you're using DEMAND. The question of whether anyone is actually using DEMAND (other than me) is still open - I would love to hear from anyone who is.

A bit more interesting is a paper written by an M.Sc. student in our lab, on the test results of a hearing aid algorithm we've been working on. The paper is "Erhaltung der räumlichen Wahrnehmung bei Störgeräuschreduktion in Hörgeräten" ("Preserving spatial perception under noise reduction in hearing aids"), by Menno Müller, Joachim Thiemann, Daniel Marquardt, Simon Doclo and Steven van de Par; it outlines the method we use to do binaural noise reduction while preserving spatial awareness. A more detailed paper has been submitted to EUSIPCO 2014, and in about two or three weeks I should find out whether it has been accepted.


Tuesday, March 11, 2014

DAGA 2014

Attending DAGA 2014 doesn't require much travelling, since it's right here in Oldenburg. I have just one small presentation (some more DEMAND stuff). Pretty much all the senior profs of the Hearing4All cluster are somehow involved in the organisation of this event, so for all of us peons it is pretty much de rigueur to attend. It's a pretty fun conference (so far) - and tonight the big social event is The Barber of Seville at the Oldenburg Staatstheater.

Tuesday, August 13, 2013

EUSIPCO2013 and MLSP2013 papers

The Museum of Marrakech, a CC-licensed picture by
Donar Reiskoffer, from Wikimedia Commons here.
September will be a busy time for me, since I will be going to two conferences - to one of which my family will be accompanying me, including the in-laws! The first one (with entourage) will be in Marrakech, Morocco: EUSIPCO 2013. This will be my first time visiting Africa, and I'm really looking forward to it. (After that, I'll need to find an excuse to visit Australia and Antarctica!)

The other conference I'm going to is MLSP 2013, in Southampton, UK, and it should be ... nice.  For some reason, Madeline declined to accompany me on that trip.

The papers I will be presenting contain some results from my time at IRISA as part of METISS - now called PANAMA. At EUSIPCO, I will present "A fast EM algorithm for Gaussian model-based source separation", on the BAEM algorithm, while at MLSP I'll present "An experimental comparison of source separation and beamforming techniques for microphone array signal enhancement", with some observations on the generalized EM algorithm implemented in FASST when applied to multichannel signals.

Hope to see you there!