
Tuesday, April 24, 2018

The disappearance and reappearance of the DEMAND corpus

The DEMAND ("Diverse Environments Multichannel Acoustic Noise Database") Corpus is one of my most successful creations - at the time I'm writing this post, Google Scholar claims there are 43 citations of the main article describing it.  (And I blogged about it here, here, here and here)  So it was a bit of a nasty surprise when Professor Chan (of Queens University in Kingston, Ontario; a good friend of my Ph.D. supervisor) dropped me a note telling me the database had disappeared off the internet.

To my dismay, I did not have a copy of the data on my own hard drives either, except for the files sampled at 16 kHz; luckily, after some frantic emailing, I was told by my former colleagues Remy and Nancy that INRIA still has backups of the original files, and that Emmanuel has a backup of the website along with the HTML and descriptive PDF.  Note to self: KEEP WELL-ORGANIZED BACKUPS!

There was still the problem of finding a new home to host the data, and Emmanuel suggested Zenodo, a platform funded by CERN and the EU for open-access data: DEMAND fits the bill pretty well. As a bonus, the dataset now has a "proper" DOI.

I hope the new home of DEMAND is more permanent than the old one; it certainly looks good.  I'll be putting my HRTF database (blog post, brief preliminary conference paper) on there too once the main journal article describing it is vetted - actually, the data is already there, it just needs to be released.

Enjoy the data! (And let's hope the new location will pop up at the top of Google when using the search terms "DEMAND noise", as the old one did!)
 

Wednesday, January 3, 2018

A trip to the library...

When I started thinking about resampling and the filters used in resampling, I leaned heavily on the book by P. P. Vaidyanathan, "Multirate Systems And Filter Banks" - the "brown book" that every signal processing practitioner should have on his or her shelf (or on a shelf of a nearby colleague)!

Like many articles discussing this topic (e.g. the documentation of this MATLAB function), the key paper always referenced is Kaiser, James F. “Nonrecursive Digital Filter Design Using the I0-Sinh Window Function.” Proceedings of the 1974 IEEE International Symposium on Circuits and Systems, April 1974, pp. 20–23. Being curious about how the equations given in Vaidyanathan were derived, I wanted to see the original paper - but unfortunately, the earliest ISCAS proceedings that can be found in IEEE Xplore are from 1988 - 14 years after the one I'm after.


Luckily, I happen to be on Christmas vacation at my in-laws', who live in Kitchener, which is right next door to Waterloo - home to one of the best engineering universities in the world. Unsurprisingly, they have a well-stocked library, which includes the above conference proceedings booklet; so, braving the cold, I went over, and now have a copy for my own archives.

I could only copy it by taking snaps with my cellphone - the copiers need a special card to use, and the scanners need a UWaterloo login - but with today's smartphones that is sufficient for reading the content.  It's only for personal use, and I do hope that the IEEE will eventually get around to scanning in the older papers!

Papers from that era have a certain charm.  Typewritten equations.  FORTRAN code.  I believe the code below should be FORTRAN 66 (due to the IF statement; I don't think FORTRAN IV had that, and FORTRAN 77 obviously didn't exist yet).  Without looking it up, do you know how that IF statement works? (Explanation at the end.)

As for what I learned from the paper (on a first quick read): those formulas relating N and beta to the stopband attenuation and transition bandwidth were fitted to empirical data.  I expected as much - but still, I wanted to confirm that for myself. (If on a closer read I turn out to be wrong, I'll correct this post.)
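
Incidentally, scipy exposes exactly these empirical fits; a minimal sketch (the attenuation and transition-width values here are arbitrary, just for illustration):

from scipy.signal import kaiserord, firwin

# Kaiser's empirical formulas: given a stopband attenuation (in dB) and a
# transition width (as a fraction of the Nyquist frequency), they yield the
# required filter length N and the Kaiser window shape parameter beta.
atten_dB = 60.0    # example stopband attenuation
width = 0.05       # example transition width relative to Nyquist
numtaps, beta = kaiserord(atten_dB, width)

# beta then parameterizes the window used to design the actual lowpass
taps = firwin(numtaps, 0.25, window=('kaiser', beta))
print(numtaps, beta)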

So there's one item off my list of things to do in 2018: visit the UWaterloo campus and its library.  Good fun.

-------

The FORTRAN IF statement: That is known as the "arithmetic IF" or "three-way IF" statement. The expression after the IF is evaluated, and if it is negative, the execution jumps to the first label ("2", exiting the subroutine). If the result is 0, we go to the second label ("1"), and if it is positive, to the third (also "1").  Here, each term of the power series expansion is calculated, and as soon as the new term is less than 2x10^-9 of the approximation, the code terminates. And IIRC, FORTRAN is call-by-reference.
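
To make that termination criterion concrete, here is a rough Python transcription (my own reconstruction, not the paper's code) of what such a loop does: it sums the power series for the modified Bessel function I0 - the "I0" in the paper's title - and stops once the newest term drops below 2e-9 of the running sum, which is the job of the arithmetic IF's negative branch.

def i0_series(x, rel_tol=2e-9):
    # Power-series approximation of the modified Bessel function I0(x).
    # Keep adding terms until the newest one is negligible relative to the
    # running sum - the role of the arithmetic IF in the FORTRAN listing.
    total = 1.0   # k = 0 term
    term = 1.0
    k = 0
    while True:
        k += 1
        term *= (x / (2.0 * k)) ** 2   # term_k = term_(k-1) * (x / 2k)^2
        total += term
        if term < rel_tol * total:
            return total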

Sunday, May 7, 2017

NNResample package on PyPI

In the previous post, I described my filter generation method for scipy.signal.resample_poly, where the filter is designed such that the first null of the filter falls on Nyquist.  For easier inclusion into my own projects, I made it into a self-standing function and packaged it, and then decided to make that my first submission to PyPI.  The package depends only on numpy and scipy>=0.18.0.

So, if you want to use my resampler, you can now simply do `pip install nnresample`. To report bugs, the link to the repository is https://github.com/jthiem/nnresample.

I've included one "special feature": since it takes a little time to generate the filter (searching for the null and regenerating the filter afterwards), the generated filter is cached in a global for re-use.
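
A minimal usage sketch (I'm assuming here that the call signature mirrors scipy.signal.resample_poly's (signal, up, down) arguments; see the README for the definitive API):

import numpy as np
import nnresample

# hypothetical test signal: one second of noise at 48 kHz
sig = np.random.randn(48000)

# resample 48 kHz -> 44.1 kHz; the first call designs (and caches) the filter,
# later calls with the same ratio reuse the cached filter
out = nnresample.resample(sig, 44100, 48000)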

As for the name, since I'm calling it "Null-on-Nyquist Resampler", the obvious name should be non-resample, but I think that could be confusing...

Wednesday, May 3, 2017

Resampling in Python: Electric Bugaloo

In a previous post, I looked at some sample rate conversion methods for Python, at least for audio data.  I did some more digging into it, and thanks to a note by Prof. Christian Muenker from the Munich University of Applied Sciences, I was made aware of scipy.signal.resample_poly (new in scipy since 0.18.0).  This led me down a bit of a rabbit-hole, and I ended up with a Jupyter Notebook which I'm not going to copy-paste here, since there is quite a bit of code and some LaTeX in there.  Here is the link to it instead.

For the impatient, here are the interesting bits:

from math import gcd

import numpy as np
from scipy.signal import firwin


def resample_poly_filter(up, down, beta=5.0, L=16001):

    # *** this block STOLEN FROM scipy.signal.resample_poly ***
    # Determine our up and down factors
    # Use a rational approximation to save computation time on really long
    # signals
    g_ = gcd(up, down)
    up //= g_
    down //= g_
    max_rate = max(up, down)

    # factor by which the Kaiser window widens the main lobe (relative to 1/L)
    sfact = np.sqrt(1+(beta/np.pi)**2)

    # generate first filter attempt: with 6dB attenuation at f_c.
    filt = firwin(L, 1/max_rate, window=('kaiser', beta))

    # long zero-padded FFT so the first null can be located with fine resolution
    N_FFT = 2**19
    NBINS = N_FFT//2 + 1
    paddedfilt = np.zeros(N_FFT)
    paddedfilt[:L] = filt
    ffilt = np.fft.rfft(paddedfilt)

    # now find the minimum between f_c and f_c+sqrt(1+(beta/pi)^2)/L
    bot = int(np.floor(NBINS/max_rate))
    top = int(np.ceil(NBINS*(1/max_rate + 2*sfact/L)))
    firstnull = (np.argmin(np.abs(ffilt[bot:top])) + bot)/NBINS

    # generate the proper shifted filter: move the cutoff so that the first
    # null lands on f_c (i.e. on the Nyquist frequency of the lower rate)
    filt2 = firwin(L, -firstnull+2/max_rate, window=('kaiser', beta))

    return filt2

# P, Q (the resampling ratio) and sig (the test sweep) are defined in the notebook
plt.figure(figsize=(15,3))
wfilt = resample_poly_filter(P, Q, L=2**16+1)
plt.specgram(scipy_signal.resample_poly(sig, P, Q, window=wfilt)*30, scale='dB', Fs=P, NFFT=256)
plt.colorbar()
plt.axis((0,2,0,Q/2))

Recycling my test sweep from the previous post, I get:
Sweep resampled using my own filter
But really, please read the Notebook.  Comments are welcome!

Monday, March 20, 2017

Publication update

This blog has basically been inactive since last October, since being a PostDoc means that there are a whole bunch of other things that I am busy with.  And of course, it was winter - that means, statistically, there is always someone in the family who is sick (kids bring home every germ that is going around...).

View of Kiel. Source: Johannes Barre 2006, on Wikipedia
But it is spring now!  And I have just returned from DAGA 2017 in Kiel (where we found a very nice Thai restaurant), so it's time for an update on some of my work!

First off, my colleagues in Hannover published "Customized high performance low power processor for binaural speaker localization" at ICECS 2016 in Monte Carlo, Monaco (paper on IEEE Xplore), and there was the winter plenary of Hearing4all.  At DAGA 2017, I presented "Pitch features for low-complexity online speaker tracking", and Sarina (a Ph.D. student I'm co-supervising) presented "A distance measure to combine monaural and binaural auditory cues for sound source segregation"; both can be found on my homepage.  In the pipeline is "Real-time Implementation of a GMM-based Binaural Localization Algorithm on a VLIW-SIMD Processor" by Christopher, which has been accepted and will be presented at ICME 2017 in Hong Kong in July, and I submitted a paper ("Segregation and Linking of Speech Glimpses in Multisource Scenarios on a Hearing Aid") to EUSIPCO 2017; that one is still in review.

I was also teaching a class in the past semester ("5.04.4223 Introduction into Music Information Retrieval") which, because it's a brand-new class, took a crazy amount of work to prepare for - but I think the students really enjoyed it, and I saw some very good code being written for the final project.

Now back to real work (writing more papers, that is)!  (Well, there's one or two topics I'll put on the blog in the next little while, too.  Later.)

Wednesday, September 14, 2016

MLSP2016 paper: Speaker Tracking for Hearing Aids

MLSP 2016 poster; the print version can be found here.
Yesterday, I presented my poster at the 2016 IEEE International Workshop on Machine Learning for Signal Processing. I think it was received pretty well; several people came to talk to me, and we had very good discussions. The biggest problem (typical for all poster sessions) was that there were other good posters being presented at the same time, so I couldn't really spend time talking to the other authors in that session. However, over the next few days I'll have a chance to chat with them, so it's all good.
The beach of the conference venue, with a view towards Salerno.
The paper I presented is entitled "Speaker Tracking for Hearing Aids", and it is basically a method to link speech utterances spoken at different times by the same speaker - a classic problem also found in speaker diarization (though I don't need to do segmentation).  My method, however, is optimized for low computational complexity (for hearing aids), yet reaches performance comparable to typical, far more complex methods.  You can find the abstract, paper, and poster (seen in the pic) on my homepage.
Overall, I like these small, highly focused conferences - and being in a nice sunny environment is not to be sneezed at either.

Wednesday, August 31, 2016

A library of Gammatone Filterbanks in Python

I have already programmed gammatone filterbanks several times in MATLAB, but for some recent work I needed a specific one, the One-Zero Gammatone filterbank [Katsiamis, 2006], in Python.
Gammatone impulse responses, in time domain

In addition, I wanted my "old" FIR GTFB (from my Ph.D. Thesis) to be in there as well as the GTFB of a former colleague here in Oldenburg [Chen, 2015].  So I packed them all up in a library - I also wanted to learn how to make a PyPI compatible Python library.

This is a work in progress. The gammatone filterbanks mentioned are there, but at the moment I don't have time to polish the code; furthermore, I'd like to eventually add the Slaney GTFB and the Hohmann GTFB - not to mention that the structure needs more cleanup.  Consider this v0.01, for educational purposes. You can find it on github.
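
For context, the gammatone impulse response itself is easy to write down; here is a minimal NumPy sketch of the textbook formula (this is not the API of the library above, which may differ):

import numpy as np

def gammatone_ir(fc, fs, n=4, duration=0.05):
    # Sampled gammatone impulse response: t^(n-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t)
    # fc: centre frequency in Hz, fs: sample rate in Hz, n: filter order
    t = np.arange(int(duration * fs)) / fs
    erb = 24.7 + 0.108 * fc      # equivalent rectangular bandwidth (Glasberg & Moore)
    b = 1.019 * erb              # usual bandwidth scaling for a 4th-order gammatone
    g = t**(n - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return g / np.max(np.abs(g)) # normalized, e.g. for plotting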

Thursday, August 11, 2016

Audio Resampling in Python

 (Note: this post was written as a Jupyter Notebook which can be found with the Python code at https://nbviewer.jupyter.org/gist/jthiem/0bb8b2296c8dbd0b83bfab8270a8eb24.)

UPDATE: You can resample with nothing but scipy and one extra function now. See this new post.
I have now written a package that you can install using pip.
One of the most basic tasks in audio processing anyone would need to do is resampling audio files; it seems like the data you want to process is never sampled at the rate you want. 44.1k? 16k? 8k? Those are the common ones; there are some really odd ones out there.
Resampling is actually quite hard to get right. You need to properly choose your antialias filters, and write an interpolation/decimation procedure that won't introduce too much noise. There have been books written on this topic. For the most part, there is no single universal way to do this right, since context matters.
For a more detailed discussion on sample rate conversion see the aptly named "Digital Audio Resampling Homepage" [https://ccrma.stanford.edu/~jos/resample/]. Then look at [http://src.infinitewave.ca/] for a super informative comparison of a ton of resampling implementations. No seriously, go there. (It inspired me to compare the methods below using a sine sweep.)
MATLAB has the built-in resample function. Everyone uses it. It works, but also sucks in many aspects - a much better function is in the DSP toolbox, but as near as I can tell, no one uses it (even when copious other parts of the DSP toolbox are being used!).
But I'm not talking about MATLAB, I'm talking about Python here, and it doesn't have a default resampling method - except if you're doing any kind of numerical computing, you'll be using numpy and probably scipy --- and the latter has a built-in resample function.
To get ahead of myself a bit, scipy.signal.resample sucks for audio resampling. That becomes apparent quite quickly - it works in the frequency domain, by basically truncating or zero-padding the signal in the frequency domain. This is quite ugly in the time domain (especially since it assumes the signal to be circular).
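To illustrate the principle (this is a rough sketch of the idea, not scipy's actual implementation): downsampling amounts to keeping only the lowest DFT bins and transforming back at the new length, which implicitly treats the signal as one period of a periodic signal.

import numpy as np

def naive_fourier_resample(x, num):
    # keep the lowest bins of the spectrum and invert at the new length;
    # irfft zero-pads the spectrum if num asks for more bins (upsampling)
    X = np.fft.rfft(x)
    Y = X[:num // 2 + 1]
    return np.fft.irfft(Y, num) * (num / len(x))   # rescale for the length change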
There are two alternatives I want to point at that are "turn-key", that is, you just have to specify your resampling ratio, and the library does the rest. I know of a few others, but at the time of writing this they didn't seem as easy to use (e.g. just doing the interpolation/decimation, but not calculating a filter). If there are other notable libraries I should know of, please mention them in the comments of the blog version of this notebook [https://signalsprocessed.blogspot.de].
The first alternative is scikit.resample. Using it has two problems: it relies on an external (C-language) library called "Secret Rabbit Code" (SRC as in "sample rate conversion", geddit?) aka libsamplerate [http://www.mega-nerd.com/SRC/]. Not a problem in itself, but you need to install it and this step is not automated in pip and friends (as far as I know at time of writing). (Problem for me was that I needed to run it on a platform where I couldn't do this easily.) Problem 2: scikit.resample 0.3.3 is Python 2 only, and the Python 3 version is unofficial (0.4.0-dev, I used [https://github.com/gregorias/samplerate]).
The other one is resampy, which can be found on PyPI or at [https://github.com/bmcfee/resampy]. It is easy to install and works with Python 3 (librosa now uses it as its preferred resampler over libsamplerate).
TL;DR: if you can install external libs, use scikit.resample (0.4.0-dev). resampy is faster, but not as good, but entirely reasonable. (If MATLAB resample is good enough for you, resampy will serve you just fine and is easier to install)
Now for a comparison with pretty pictures.

Downsampling a sweep

A common task is to convert between the CD sampling rate of 44.1 kHz and the multiples of 8 kHz that originated from the telecom industry - in particular, 48 kHz. (Supposedly the rate of 44.1k was chosen in part because it was difficult to convert to 48 kHz; but it is also connected to NTSC timing.) So let's have a sweep sampled at 48 kHz. (See the Jupyter Notebook for the Python code.)
Sweep 100-23900 Hz, sampled at 48 kHz.
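The notebook has the actual code; a minimal sketch of how such a test sweep can be generated with scipy.signal.chirp (the duration and frequency limits here are just chosen to match the caption above):

import numpy as np
from scipy.signal import chirp

fs = 48000                          # source sample rate
t = np.arange(0, 2.0, 1/fs)         # two seconds of samples
sig = chirp(t, f0=100, t1=2.0, f1=23900, method='logarithmic')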
Now convert it to 44.1 kHz using the different libraries, and plot the spectrograms.
Same sweep downsampled to 44.1 kHz using scikit.resample.
scikit.samplerate (or libsamplerate AKA "Secret Rabbit Code") does it well. Almost no aliasing, yet the response stays close to Nyquist, and noise levels are in line with the input. Note that once the sweep is above Nyquist, the output is basically nil.
The same sweep but now downsampled using scipy.signal.resample.
And this is scipy.signal.resample. Throughout the entire signal there is a high-frequency tone, and there is considerable energy even when the sweep is above Nyquist. This is a result of the frequency-domain processing operating on the entire signal at once. This is NOT suited to audio signals.
The same sweep downsampled using resampy.
Finally, resampy. There are some funny nonlinear effects, but at a very low level - for audio applications generally nothing to worry about. Even the aliasing is below -60 dB. So, while not as good as scikit.resample, it is usable and certainly better than scipy.signal.resample!

Upsampling an impulse

A way to examine the antialias filter is to upsample an impulse. Here, note a frustrating aspect of all these differing libraries: the interface is completely different between all of them; and to boot, resampy and scikit.resample give different length outputs for the same upsampling ratio. This means you can't just switch between them using a from foo import resample statement; you'll need a wrapper function if you want to switch freely amongst them (see librosa as an example, or the sketch after the snippet below).
>>> us1 = sk_samplerate.resample(impulse, P/Q, 'sinc_best')
>>> us2 = scipy_signal.resample(impulse, int(len(impulse)*P/Q))
>>> us3 = resampy.resample(impulse, Q, P)
>>> print(us1.shape, us2.shape, us3.shape)
(1087,) (1088,) (1088,)
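Such a wrapper could look something like the following sketch (the method names and the scikits.samplerate import are assumptions on my part - adjust to whichever fork you actually installed):

import resampy
import scikits.samplerate as sk_samplerate   # import name may differ per fork

def resample_any(x, fs_in, fs_out, method='resampy'):
    # thin wrapper so the calling code does not care which backend is used
    # (note: output lengths may still differ by a sample between backends)
    if method == 'resampy':
        return resampy.resample(x, fs_in, fs_out)
    elif method == 'samplerate':
        return sk_samplerate.resample(x, fs_out / fs_in, 'sinc_best')
    raise ValueError('unknown resampling method: %s' % method)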
Frequency response of an impulse upsampled from 44.1 kHz to 48 kHz
 
'scikit.resample' certainly uses the best antialias filter, with a steep slope, allowing very little energy over Nyquist. 'resampy' uses a gentler filter (lower order, perhaps?) but a decent one. 'scipy.signal.resample'... well, anyways.
What do the impulse responses look like in time domain?
Time-domain impulse upsampled from 44.1 kHz to 48 kHz
The same section of the impulse, but with samples converted into dB (10*log10 magnitude squared)
Unsurprisingly, 'resampy' is the most compact, which explains the gentler cut-off. 'scikit.resample' is wider, but acceptable. 'scipy.signal.resample' sticks out like a sore thumb.

Speed comparison


>>> %timeit sk_samplerate.resample(sig, Q/P, 'sinc_best')
10 loops, best of 3: 98.1 ms per loop
>>> %timeit scipy_signal.resample(np.float64(sig), int(len(sig)*Q/P))
100 loops, best of 3: 4.52 ms per loop
>>> %timeit resampy.resample(np.float64(sig), P, Q) 
10 loops, best of 3: 40.7 ms per loop

And here is a notable merit of 'scipy.signal.resample': it is faster by an order of magnitude compared to the other methods. I suspect that if you make sure your signals are of length 2^N, you'll get even faster results, since it'll switch to an FFT instead of a DFT. The other two are probably losing some speed in the passing of data from Python to C - but fundamentally, frequency domain techniques do tend to be fast. So keep that in mind if you need speed.

Conclusion

I'll just repeat the above TL;DR: use scikit.resample (0.4.0-dev) if you can install libsamplerate in a sane place. Else, use resampy, which is faster, but not as good --- but entirely reasonable. Use 'scipy.signal.resample' only if you really need the speed, and be aware of its shortcomings (note the circular assumption!).