Signals Processed: Multilevel Non-negative Matrix Factorisation à la FASST

The main graphical description of the FASST NMF procedure, from Alexei's paper [1]

A particularly useful technique I got acquainted with at my old postdoc position is Non-negative Matrix Factorisation (NMF), in particular the way it is used in the FASST toolbox. While FASST is a great, comprehensive framework for source separation, its inner workings are a bit opaque. At my new group here in Oldenburg, my colleagues are interested in NMF, but rather than ripping it out of FASST, I decided to simply reimplement the key algorithm it uses (it's not very hard). The final MATLAB code can be found here, with a quick test script here (oh, and you'll need this function for testing as well). Read on for the full explanation...

The code simply implements equation (30) from the paper by Alexei [1], though with a small simplification (left as an exercise to the reader). Given five (or six) matrices V, W, U, G, H (and, optionally, E), we attempt to refine the approximation

$V \approx (WUGH) \odot E$,

by updating one of W, U, G, or H. If E is given, it is kept fixed; if it is not given, it is simply assumed to be all ones (in FASST, E can also be factorised into four matrices). Note that $\odot$ is the Hadamard product (element-wise multiplication of two matrices of the same size). The update comes down to

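BCDdinv = 1 ./ (B*C*D);                              % element-wise inverse of the product B*C*D
num = B.' * ( V .* (1./E) .* BCDdinv.^2 ) * D.';     % numerator of the multiplicative update
den = B.' * BCDdinv * D.';                           % denominator of the multiplicative update
C = C .* num ./ den;                                 % element-wise update; C stays non-negative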

and it's just a matter of generating B, C, and D from the inputs; C is the one matrix to be updated. The derivation of the update equation can be found in Nancy's paper [2], with $\beta = 0$, since our cost function is the Itakura-Saito divergence.
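How B, C, and D are generated is not spelled out above. One way it could be done is sketched below, assuming that B is the product of the factors to the left of the one being updated and D the product of those to its right, with an identity matrix standing in when there are none. The matrix sizes, the random test data, and the names F, isdiv, cost0, and cost are made up for illustration and are not taken from the actual code.

```matlab
% Sketch only: sizes, random data, and the names F, isdiv, cost0, cost
% are illustrative assumptions, not part of the original implementation.
V = rand(6, 8) + 0.1;                                   % strictly positive data
F = {rand(6, 5), rand(5, 4), rand(4, 3), rand(3, 8)};   % {W, U, G, H}
E = ones(size(V));                                      % E fixed; all ones when not given

% Itakura-Saito divergence between V and an approximation Vhat
isdiv = @(V, Vhat) sum(sum(V ./ Vhat - log(V ./ Vhat) - 1));
cost0 = isdiv(V, (F{1}*F{2}*F{3}*F{4}) .* E);           % cost before any update

for iter = 1:50
    for k = 1:4                                         % update W, U, G, H in turn
        % B: product of the factors left of the k-th (identity if none)
        B = eye(size(V, 1));
        for j = 1:k-1, B = B * F{j}; end
        % D: product of the factors right of the k-th (identity if none)
        D = eye(size(V, 2));
        for j = 4:-1:k+1, D = F{j} * D; end
        C = F{k};                                       % the factor being updated

        % The update from the post, applied to this choice of B, C, D
        BCDdinv = 1 ./ (B*C*D);
        num = B.' * ( V .* (1./E) .* BCDdinv.^2 ) * D.';
        den = B.' * BCDdinv * D.';
        F{k} = C .* num ./ den;
    end
end

cost = isdiv(V, (F{1}*F{2}*F{3}*F{4}) .* E);            % cost after the updates
```

Comparing cost0 and cost after the loop should, in practice, show the divergence going down, though this sketch makes no claim of a monotonicity guarantee. Because every step is a product or quotient of non-negative quantities, the factors in F stay non-negative throughout.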

[1] "A General Flexible Framework for the Handling of Prior Information in Audio Source Separation," Alexei Ozerov, Emmanuel Vincent, and Frédéric Bimbot, IEEE Transactions on Audio, Speech, and Language Processing, Volume 20, Issue 4, 2012. Also on IEEE Xplore.
[2] "A Tempering Approach for Itakura-Saito Non-Negative Matrix Factorization. With Application to Music Transcription," Nancy Bertin, Cédric Févotte, and Roland Badeau, ICASSP 2009. Also on IEEE Xplore.

Friday, July 5, 2013