Class MFCCStuff
Namespace: AudioAnalysisTools.DSP
Assembly: AudioAnalysisTools.dll
Syntax
public class MFCCStuff
Methods
AcousticVectors(Double[,], Double[], Boolean, Boolean)
This method assumes that the supplied mfcc matrix DOES NOT contain frame dB (log energy) values in column zero. These are added in from the supplied array of frame log-energies.
Declaration
public static double[, ] AcousticVectors(double[, ] mfcc, double[] frameDbNormed, bool includeDelta, bool includeDoubleDelta)
Parameters
Type | Name | Description |
---|---|---|
Double[,] | mfcc | A matrix of mfcc coefficients. Column zero is empty. |
Double[] | frameDbNormed | log-energy values for the frames. |
Boolean | includeDelta | Whether or not to add delta features. |
Boolean | includeDoubleDelta | Whether or not to add double delta features. |
Returns
Type | Description |
---|---|
Double[,] | A matrix of complete mfcc values with additional deltas, frame energies etc. |
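As an illustration of what "delta features" means here, the following is a minimal sketch that appends central-difference deltas to a feature matrix (rows are frames, columns are coefficients). The helper name `AppendDeltas`, the central-difference formula and the edge handling are assumptions made for this example; the source does not state how AcousticVectors computes its deltas.

```csharp
using System;

// Hypothetical helper, not part of MFCCStuff.
// Returns the input features followed by one delta per coefficient.
static double[,] AppendDeltas(double[,] features)
{
    int frames = features.GetLength(0);
    int coeffs = features.GetLength(1);
    var result = new double[frames, coeffs * 2];
    for (int t = 0; t < frames; t++)
    {
        // Clamp neighbours at the edges (an assumption of this sketch).
        int prev = Math.Max(t - 1, 0);
        int next = Math.Min(t + 1, frames - 1);
        for (int c = 0; c < coeffs; c++)
        {
            result[t, c] = features[t, c];
            result[t, coeffs + c] = features[next, c] - features[prev, c];
        }
    }
    return result;
}

var demo = new double[,] { { 1.0, 2.0 }, { 3.0, 5.0 }, { 6.0, 9.0 } };
var withDeltas = AppendDeltas(demo);
Console.WriteLine(withDeltas[1, 2]); // delta of coefficient 0 at frame 1: 6 - 1, prints 5
```

Double deltas could be obtained the same way by differencing the delta columns.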
Cepstra(Double[,], Int32)
Declaration
public static double[, ] Cepstra(double[, ] spectra, int coeffCount)
Parameters
Type | Name | Description |
---|---|---|
Double[,] | spectra | |
Int32 | coeffCount |
Returns
Type | Description |
---|---|
Double[,] |
Cepstra(Double[,], Int32, Double[,])
Use this version when you want to compute the matrix of cosines only once.
Declaration
public static double[, ] Cepstra(double[, ] spectra, int coeffCount, double[, ] cosines)
Parameters
Type | Name | Description |
---|---|---|
Double[,] | spectra | |
Int32 | coeffCount | |
Double[,] | cosines |
Returns
Type | Description |
---|---|
Double[,] |
Cosines(Int32, Int32)
Returns a matrix of cosine basis functions, prepared before performing a DCT (Discrete Cosine Transform). The rows, k = 0 to coeffCount, are the basis functions. The columns are m = 0 to M, where M = signalLength, i.e. the length of the required DCT. The value of m/M ranges from 0 to 1.0, so Pi\*m/M ranges from 0 to Pi radians and k\*Pi\*m/M ranges from 0 to k\*Pi radians. When k = 2, the range is 2\*Pi radians, which corresponds to one rotation.
Declaration
public static double[, ] Cosines(int signalLength, int coeffCount)
Parameters
Type | Name | Description |
---|---|---|
Int32 | signalLength | The length of the signal to be processed. e.g. the frequency bin count or filter bank count or ... |
Int32 | coeffCount | The number of basis functions = the required number of DCT coefficients. |
Returns
Type | Description |
---|---|
Double[,] |
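The description above can be sketched as follows: each row k of the basis holds cos(k\*Pi\*m/M), and a signal is projected onto each row to obtain one coefficient. The names `CosineBasis` and `Project` are hypothetical, and the matrix dimensions and the lack of any scaling factor are assumptions of this sketch, not a statement of what Cosines or DCT actually do.

```csharp
using System;

// Hypothetical helper: row k holds cos(k * Pi * m / M) for m = 0 .. M-1.
static double[,] CosineBasis(int signalLength, int coeffCount)
{
    var basis = new double[coeffCount, signalLength];
    for (int k = 0; k < coeffCount; k++)
    {
        for (int m = 0; m < signalLength; m++)
        {
            basis[k, m] = Math.Cos(Math.PI * k * m / signalLength);
        }
    }
    return basis;
}

// Hypothetical helper: unnormalised projection of a signal onto each basis row.
static double[] Project(double[] signal, double[,] basis)
{
    int coeffCount = basis.GetLength(0);
    int length = basis.GetLength(1);
    var coeffs = new double[coeffCount];
    for (int k = 0; k < coeffCount; k++)
    {
        double sum = 0.0;
        for (int m = 0; m < length; m++)
        {
            sum += signal[m] * basis[k, m];
        }
        coeffs[k] = sum;
    }
    return coeffs;
}

var basis4 = CosineBasis(4, 3);
var flat = new double[] { 1.0, 1.0, 1.0, 1.0 };
Console.WriteLine(Project(flat, basis4)[0]); // row 0 is all ones, so this prints 4
```

Building the basis once and reusing it for every frame is what the three-argument Cepstra overload allows.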
DCT(Double[], Double[,])
Declaration
public static double[] DCT(double[] spectrum, double[, ] cosines)
Parameters
Type | Name | Description |
---|---|---|
Double[] | spectrum | |
Double[,] | cosines |
Returns
Type | Description |
---|---|
Double[] |
DecibelSpectra(Double[,], Double, Int32, Double)
Converts amplitude spectra (in a spectrogram) to dB spectra, normalising for window power and sample rate. NOTE 1: This calculation is done in three separate steps in order to avoid duplicating the tricky calculations in the method GetLogEnergySpectrogram(). NOTE 2: A decibel value is a ratio; here the ratio is implied. dB = 10 \* log(amplitude^2), but this method adjusts the power to account for the power of the Hamming window and the sample rate (SR).
Declaration
public static double[, ] DecibelSpectra(double[, ] amplitudeM, double windowPower, int sampleRate, double epsilon)
Parameters
Type | Name | Description |
---|---|---|
Double[,] | amplitudeM | the amplitude spectra. |
Double | windowPower | value for window power normalisation. |
Int32 | sampleRate | to normalise for the sampling rate. |
Double | epsilon | small value to avoid log of zero. |
Returns
Type | Description |
---|---|
Double[,] | a spectrogram of decibel values. |
GetLogEnergySpectrogram(Double[,], Double, Int32, Double)
Converts the passed matrix of spectrogram energy values (i.e. squared amplitude values) to log-energy values. This method is used when calculating standard, mel-frequency and MFCC spectrograms. In the mel-scale case, the passed energy spectrogram is the output of the mel-frequency filter bank, and the energy values are converted directly to log-energy, normalising for window power and sample rate. Note that the output is log-energy, not decibels: decibels = 10 \* log-energy.
NOTE 1: This method assumes that the last frequency bin (i.e. the last matrix column) is the Nyquist frequency bin.
NOTE 2: This method assumes that the first frequency bin (i.e. the first matrix column) is the mean or DC frequency bin.
NOTE 3: The window contributes power to the signal, which must subsequently be removed from the spectral power.
NOTE 4: Spectral power must be normalised for sample rate, effectively calculating frequency power per sample.
NOTE 5: The power in all frequency bins except f=0 must be doubled because the power spectrum is an even function about f=0. This is because the spectrum actually consists of 512 + 1 values, the centre value being for f=0.
Declaration
public static double[, ] GetLogEnergySpectrogram(double[, ] energyM, double windowPower, int sampleRate, double epsilon)
Parameters
Type | Name | Description |
---|---|---|
Double[,] | energyM | the energy spectra (squared amplitude values). |
Double | windowPower | value for window power normalisation. |
Int32 | sampleRate | to normalise for the sampling rate. |
Double | epsilon | small value to avoid log of zero. |
Returns
Type | Description |
---|---|
Double[,] | a spectrogram of log-energy values (multiply by 10 to obtain decibels). |
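The normalisation steps listed in the notes for GetLogEnergySpectrogram can be sketched for a single frame as follows. The helper name `LogEnergyFrame` is hypothetical, and two details are assumptions of this sketch rather than facts from the source: the Nyquist bin is left undoubled like the DC bin, and epsilon is applied as a floor on the normalised power.

```csharp
using System;

// Hypothetical helper: log-energy for one frame of energy (squared amplitude) values.
// Assumes energy[0] is the DC bin and the last element is the Nyquist bin.
static double[] LogEnergyFrame(double[] energy, double windowPower, int sampleRate, double epsilon)
{
    int last = energy.Length - 1;
    var logEnergy = new double[energy.Length];
    for (int j = 0; j < energy.Length; j++)
    {
        // Double every bin except DC and (by assumption) Nyquist.
        double factor = (j == 0 || j == last) ? 1.0 : 2.0;
        // Remove the window's power and normalise for the sample rate.
        double power = energy[j] * factor / windowPower / sampleRate;
        // Floor at epsilon to avoid the log of zero (assumed handling).
        logEnergy[j] = Math.Log10(Math.Max(power, epsilon));
    }
    return logEnergy;
}

var frame = new double[] { 100.0, 50.0, 100.0 };
var logE = LogEnergyFrame(frame, 1.0, 1, 1e-10);
Console.WriteLine(10.0 * logE[1]); // decibels = 10 * log-energy, approximately 20
```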
GetMelBinBounds(Int32, Int32)
Returns an [N, 2] matrix with the bin ID in column 1 and the lower bound of the bin in Hertz in column 2, with the bins spaced on the Mel scale.
Declaration
public static int[, ] GetMelBinBounds(int nyquist, int melBinCount)
Parameters
Type | Name | Description |
---|---|---|
Int32 | nyquist | |
Int32 | melBinCount |
Returns
Type | Description |
---|---|
Int32[,] |
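One way to picture such a table is to divide the Mel range from 0 to the Nyquist frequency into equal steps and convert each lower edge back to Hertz. The sketch below does this with the common 2595 \* log10(1 + f/700) conversion. The helper names are hypothetical, and the equal-step spacing, the conversion formula and the rounding are assumptions; the source does not describe how GetMelBinBounds computes its bounds.

```csharp
using System;

// Hypothetical conversions using the common Mel formula (an assumption of this sketch).
static double ToMel(double hz) => 2595.0 * Math.Log10(1.0 + hz / 700.0);
static double ToHertz(double mel) => 700.0 * (Math.Pow(10.0, mel / 2595.0) - 1.0);

// Hypothetical helper: column 0 holds the bin ID, column 1 the lower bound in Hertz.
static int[,] MelBinLowerBounds(int nyquist, int melBinCount)
{
    double melStep = ToMel(nyquist) / melBinCount;
    var bounds = new int[melBinCount, 2];
    for (int i = 0; i < melBinCount; i++)
    {
        bounds[i, 0] = i;
        bounds[i, 1] = (int)Math.Round(ToHertz(i * melStep));
    }
    return bounds;
}

var table = MelBinLowerBounds(11025, 64);
for (int i = 0; i < 4; i++)
{
    Console.WriteLine($"bin {table[i, 0]} starts at {table[i, 1]} Hz");
}
```

The lower bounds grow slowly at low frequencies and quickly near the Nyquist frequency, which is the intended effect of Mel spacing.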
GetMfccFeatureVector(Double[], Double[,], Int32, Boolean, Boolean)
Constructs a feature vector of MFCCs, including deltas and double deltas as requested by the user. The dB array is assumed to have been normalised to the range 0-1.
Declaration
public static double[] GetMfccFeatureVector(double[] dB, double[, ] matrix, int timeId, bool includeDelta, bool includeDoubleDelta)
Parameters
Type | Name | Description |
---|---|---|
Double[] | dB | log-energy values for the frames. |
Double[,] | matrix | A matrix of mfcc coefficients. Column zero is empty. |
Int32 | timeId | index for the required timeframe. |
Boolean | includeDelta | Whether or not to add delta features. |
Boolean | includeDoubleDelta | Whether or not to add double-delta features. |
Returns
Type | Description |
---|---|
Double[] | an MFCC feature vector for a single time frame. |
HerzTranform(Double, Double, Double)
Calculates a user-customised version of the fixed mel-frequency conversion in the method Mel(double f).
Declaration
public static double HerzTranform(double f, double c, double div)
Parameters
Type | Name | Description |
---|---|---|
Double | f | the linear frequency in Hertz. |
Double | c | this value = 2595.0 in the standard Mel transform. |
Double | div | this value = 700 in the standard Mel transform. |
Returns
Type | Description |
---|---|
Double | Mel frequency. |
InverseHerzTranform(Double, Double, Double)
Declaration
public static double InverseHerzTranform(double m, double c, double div)
Parameters
Type | Name | Description |
---|---|---|
Double | m | |
Double | c | |
Double | div |
Returns
Type | Description |
---|---|
Double |
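The parameter descriptions for HerzTranform (c = 2595.0 and div = 700 in the standard Mel transform) suggest the familiar form mel = c \* log10(1 + f/div), with the inverse obtained by solving for f. The sketch below uses that form under hypothetical names; it is an inference from the parameter descriptions, not a copy of the library code.

```csharp
using System;

// Hypothetical names; the formulas are inferred from the parameter descriptions.
static double HertzToMel(double f, double c, double div) => c * Math.Log10(1.0 + f / div);
static double MelToHertz(double m, double c, double div) => div * (Math.Pow(10.0, m / c) - 1.0);

double mel = HertzToMel(1000.0, 2595.0, 700.0);
Console.WriteLine(mel);                            // close to 1000
Console.WriteLine(MelToHertz(mel, 2595.0, 700.0)); // close to 1000 again
```

With the standard constants, 1000 Hz maps to approximately 1000 Mel, which matches the reference point mentioned for Mel(Double).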
InverseMel(Double)
Converts a Mel value to Hertz. NOTE: By default this Mel scale is linear up to 1000 Hz.
Declaration
public static double InverseMel(double mel)
Parameters
Type | Name | Description |
---|---|---|
Double | mel |
Returns
Type | Description |
---|---|
Double | the Hertz value. |
LinearFilterBank(Double[,], Int32, Double, Int32, Int32)
Performs linear filter bank conversion of a sonogram for any frequency band given by minFreq and maxFreq. Uses a linear integral as opposed to a Mel integral. The first step is to calculate the number of filters for the required frequency sub-band.
Declaration
public static double[, ] LinearFilterBank(double[, ] matrix, int filterBankCount, double nyquist, int minFreq, int maxFreq)
Parameters
Type | Name | Description |
---|---|---|
Double[,] | matrix | the sonogram. |
Int32 | filterBankCount | number of filters over full freq range 0 Hz - Nyquist. |
Double | nyquist | max frequency in original spectra. |
Int32 | minFreq | min freq in passed sonogram matrix. |
Int32 | maxFreq | max freq in passed sonogram matrix. |
Returns
Type | Description |
---|---|
Double[,] |
LinearIntegral(Double, Double, Double, Double)
Declaration
public static double LinearIntegral(double x0, double x1, double y0, double y1)
Parameters
Type | Name | Description |
---|---|---|
Double | x0 | |
Double | x1 | |
Double | y0 | |
Double | y1 |
Returns
Type | Description |
---|---|
Double |
LinearIntegral(Int32, Int32, Double, Double)
Declaration
public static double LinearIntegral(int x0, int x1, double y0, double y1)
Parameters
Type | Name | Description |
---|---|---|
Int32 | x0 | |
Int32 | x1 | |
Double | y0 | |
Double | y1 |
Returns
Type | Description |
---|---|
Double |
LinearInterpolate(Double, Double, Double, Double, Double)
Declaration
public static double LinearInterpolate(double x0, double x1, double y0, double y1, double x2)
Parameters
Type | Name | Description |
---|---|---|
Double | x0 | |
Double | x1 | |
Double | y0 | |
Double | y1 | |
Double | x2 |
Returns
Type | Description |
---|---|
Double |
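The LinearIntegral and LinearInterpolate entries carry no descriptions, but their parameter names (two points (x0, y0) and (x1, y1), plus a query position x2) suggest the usual straight-line operations. The sketch below shows those textbook operations under hypothetical names; whether the library methods behave exactly this way is an assumption.

```csharp
using System;

// Hypothetical helper: value at x2 on the straight line through (x0, y0) and (x1, y1).
static double Interpolate(double x0, double x1, double y0, double y1, double x2)
{
    if (x1 == x0)
    {
        return y0; // degenerate segment; a choice made for this sketch
    }
    return y0 + (y1 - y0) * (x2 - x0) / (x1 - x0);
}

// Hypothetical helper: area under the straight line from x0 to x1 (trapezoid rule).
static double TrapezoidArea(double x0, double x1, double y0, double y1)
    => (x1 - x0) * (y0 + y1) / 2.0;

Console.WriteLine(Interpolate(0.0, 10.0, 0.0, 5.0, 4.0)); // prints 2
Console.WriteLine(TrapezoidArea(0.0, 10.0, 0.0, 5.0));    // prints 25
```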
Mel(Double)
Returns a Mel value for the passed Hertz value. NOTE: According to Wikipedia, there is no single objective mel(ody) scale conversion. The Mel scale is based on the just-noticeable difference in pitch perceived by the ear as pitch ascends, i.e. it is a psycho-acoustic phenomenon. 1000 Hz is used as the common reference point, i.e. 1000 Hz = 1000 Mel. Speech processing typically uses a linear conversion below 1000 Hz.
Declaration
public static double Mel(double f)
Parameters
Type | Name | Description |
---|---|---|
Double | f |
Returns
Type | Description |
---|---|
Double |
MelFilterBank(Double[,], Int32, Double, Int32, Int32)
Converts from a linear frequency scale to the mel scale for any frequency band given by minFreq and maxFreq. Uses Greg's MelIntegral. The first step is to calculate the number of filters for the required frequency sub-band.
Declaration
public static double[, ] MelFilterBank(double[, ] matrix, int filterBankCount, double nyquist, int minFreq, int maxFreq)
Parameters
Type | Name | Description |
---|---|---|
Double[,] | matrix | the spectrogram. |
Int32 | filterBankCount | number of filters over full freq range 0 Hz - Nyquist. |
Double | nyquist | max frequency in original spectra. |
Int32 | minFreq | min freq in the passed sonogram matrix. |
Int32 | maxFreq | max freq in the passed sonogram matrix. |
Returns
Type | Description |
---|---|
Double[,] |
MelIntegral(Double, Double, Double, Double)
Declaration
public static double MelIntegral(double f0, double f1, double y0, double y1)
Parameters
Type | Name | Description |
---|---|---|
Double | f0 | |
Double | f1 | |
Double | y0 | |
Double | y1 |
Returns
Type | Description |
---|---|
Double |
VocalizationDetection(Double[], Double, Double, Int32, Int32, Int32, Int32[])
Declaration
public static int[] VocalizationDetection(double[] decibels, double lowerDbThreshold, double upperDbThreshold, int k1k2delay, int syllableGap, int minPulse, int[] zeroCrossings)
Parameters
Type | Name | Description |
---|---|---|
Double[] | decibels | |
Double | lowerDbThreshold | |
Double | upperDbThreshold | |
Int32 | k1k2delay | |
Int32 | syllableGap | |
Int32 | minPulse | |
Int32[] | zeroCrossings |
Returns
Type | Description |
---|---|
Int32[] |