Class HarmonicParameters
Parameters needed from a config file to detect the stacked harmonic components of a soundscape. This can also be used for recognizing the harmonics of non-biological sounds, such as those from turbines, motorbikes, compressors and high-revving motors.
Harmonic Event detection
The algorithm to find harmonic events uses a discrete cosine transform (DCT) to find a stack of harmonics or formants. Setting the correct DCT for the target syllable requires additional parameters. Note that for our purposes here, the terms harmonic and formant are taken as equivalent.
The algorithm to find harmonic events can be visualized as similar to the [oscillations algorithm](xref:AnalysisPrograms.Recognizers.Base.OscillationParameters), but rotated by 90 degrees. It uses a DCT oriented in a vertical direction and requires similar additional parameters.
Profiles:
    Speech: !HarmonicParameters
        FrameSize: 512
        FrameStep: 512
        SmoothingWindow: 3
        # The search band
        MinHertz: 500
        MaxHertz: 5000
        # Min & max duration for a set of harmonics.
        MinDuration: 0.2
        MaxDuration: 1.0
        DecibelThreshold: 2.0
        # Min & max Hertz interval between harmonics.
        MinFormantGap: 400
        MaxFormantGap: 1200
        DctThreshold: 0.15
        # Event threshold - use this to determine FP/FN trade-off.
        EventThreshold: 0.5
Note
Some of these parameters are common to all events: those that determine the search band, the allowable event duration and the decibel threshold (see CommonParameters).
The remaining parameters are unique to the harmonic algorithm and determine the search for harmonics.
There are four parameters specific to Harmonics: SmoothingWindow, MinFormantGap, MaxFormantGap and DctThreshold. SmoothingWindow sets the window size of a moving average filter that smooths the frequency bin values in the spectrogram prior to running the DCT. This can be useful when the formants of interest are broken by noise or interrupted. MinFormantGap and MaxFormantGap specify the minimum and maximum allowed interval (measured in Hertz) between adjacent formants/harmonics. By default, the DCT is calculated over all frequency bins in the search band.
DctThreshold is a value between 0.0 and 1.0 which sets the sensitivity of the search. A lower value of DctThreshold will detect more harmonic events.
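As a rough illustration of what SmoothingWindow controls, the sketch below applies a centred moving average to an array of frequency bin values. The names `SmoothingSketch` and `MovingAverage` are hypothetical, and the edge handling (truncating the window at the ends of the array) is an assumption of this sketch; the library's own filter may behave differently.

```csharp
using System;

public static class SmoothingSketch
{
    // Hypothetical helper: centred moving average over an odd-sized window.
    // Assumption: near the array ends the window shrinks to the values available.
    public static double[] MovingAverage(double[] values, int window)
    {
        if (window < 1 || window % 2 == 0)
        {
            throw new ArgumentException("window must be a positive odd number", nameof(window));
        }

        int half = window / 2;
        var smoothed = new double[values.Length];
        for (int i = 0; i < values.Length; i++)
        {
            int start = Math.Max(0, i - half);
            int end = Math.Min(values.Length - 1, i + half);
            double sum = 0.0;
            for (int j = start; j <= end; j++)
            {
                sum += values[j];
            }

            // Divide by the number of values actually summed.
            smoothed[i] = sum / (end - start + 1);
        }

        return smoothed;
    }
}
```

A window of 1 leaves the values unchanged; larger windows such as 3, 5 or 7 bridge short gaps at the cost of blurring.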
The output from a DCT operation is an array of coefficients (taking values in [0, 1]). The index into the array indicates a particular harmonic interval and the array value at that index indicates the magnitude of that interval. The index with the largest amplitude indicates the likely interval between each of the formants. However, if the maximum coefficient is less than the DctThreshold, a stack of formants is considered not to be present. Lowering the DctThreshold increases the likelihood that random noise will be accepted as a true set of formants; increasing the DctThreshold increases the likelihood that a target set of formants is rejected.
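The thresholding step described above can be sketched as follows. This is a minimal, self-contained illustration and not the library's implementation: the names `DctSketch`, `NormalisedDct` and `FindHarmonicIndex` are hypothetical, and both the mean subtraction and the scaling of the absolute coefficients to unit length (which keeps each value in [0, 1]) are assumptions of this sketch.

```csharp
using System;

public static class DctSketch
{
    // Computes a plain DCT-II of one spectral column and returns the absolute
    // coefficients scaled to unit length, so that each value lies in [0, 1].
    // Assumptions: the mean is subtracted first and unit-length scaling is used.
    public static double[] NormalisedDct(double[] column)
    {
        int n = column.Length;
        var coeffs = new double[n];
        if (n == 0)
        {
            return coeffs;
        }

        double mean = 0.0;
        foreach (double v in column)
        {
            mean += v;
        }

        mean /= n;

        double sumSquares = 0.0;
        for (int k = 0; k < n; k++)
        {
            double sum = 0.0;
            for (int i = 0; i < n; i++)
            {
                sum += (column[i] - mean) * Math.Cos(Math.PI * (i + 0.5) * k / n);
            }

            coeffs[k] = Math.Abs(sum);
            sumSquares += coeffs[k] * coeffs[k];
        }

        double norm = Math.Sqrt(sumSquares);
        if (norm > 0.0)
        {
            for (int k = 0; k < n; k++)
            {
                coeffs[k] /= norm;
            }
        }

        return coeffs;
    }

    // Returns the index of the largest coefficient, or -1 when that coefficient
    // is below dctThreshold (no stack of formants is considered present).
    public static int FindHarmonicIndex(double[] coeffs, double dctThreshold)
    {
        if (coeffs.Length == 0)
        {
            return -1;
        }

        int maxIndex = 0;
        for (int k = 1; k < coeffs.Length; k++)
        {
            if (coeffs[k] > coeffs[maxIndex])
            {
                maxIndex = k;
            }
        }

        return coeffs[maxIndex] >= dctThreshold ? maxIndex : -1;
    }
}
```

In this sketch a strongly periodic column concentrates its energy in one coefficient, which then comfortably exceeds a threshold such as 0.15, whereas a flat column yields no index at all.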
Note that to reduce the chances of the DCT algorithm producing an erroneous result, a minimum of three harmonics/formants is required, that is, the fundamental and two higher harmonics. Another way to think of this is that at least two harmonic intervals are required to constitute a stack of harmonics. Despite this precaution, the DCT algorithm is sensitive to noise and you may need to experiment to get the optimum parameter values.
Namespace: AnalysisPrograms.Recognizers.Base
Assembly: AudioAnalysisTools.dll
Syntax
[YamlTypeTag(typeof(HarmonicParameters), null)]
public class HarmonicParameters : CommonParameters, IValidatableObject
Properties
DctThreshold
Gets or sets the DCT threshold, a value between 0.0 and 1.0 that sets the sensitivity of the search.
Declaration
public double? DctThreshold { get; set; }
Property Value
Type | Description |
---|---|
Nullable<Double> | |
MaxFormantGap
Gets or sets the top bound of gap between formants. Units are Hertz.
Declaration
public int? MaxFormantGap { get; set; }
Property Value
Type | Description |
---|---|
Nullable<Int32> | |
MinFormantGap
Gets or sets the bottom bound of the gap between formants. Units are Hertz.
Declaration
public int? MinFormantGap { get; set; }
Property Value
Type | Description |
---|---|
Nullable<Int32> | |
SmoothingWindow
Gets or sets a smoothing window. This is used to run a moving average filter along each of the frequency bins. It can help to smooth over discontinuous formants. If applied, sensible values are 3, 5, or 7.
Declaration
public int SmoothingWindow { get; set; }
Property Value
Type | Description |
---|---|
Int32 | |
Methods
ConvertScoreArray2HarmonicEvents(SpectrogramStandard, Double[], Double[], UnitConverters, Int32[], Double, Double, Int32, Int32, Int32, Double, TimeSpan)
Finds harmonic events in an array of harmonic scores. NOTE: The score array is assumed to be temporal, i.e. each element of the array is derived from a time frame. The method uses the passed scoreThreshold to calculate a normalised score: maxPossibleScore = scoreThreshold * 5 and normalisedScore = score / maxPossibleScore.
Declaration
public static List<EventCommon> ConvertScoreArray2HarmonicEvents(SpectrogramStandard spectrogram, double[] scores, double[] dBArray, UnitConverters converter, int[] maxIndexArray, double minDuration, double maxDuration, int minHz, int maxHz, int bandBinCount, double scoreThreshold, TimeSpan segmentStartOffset)
Parameters
Type | Name | Description |
---|---|---|
SpectrogramStandard | spectrogram | |
Double[] | scores | the array of harmonic scores. |
Double[] | dBArray | |
UnitConverters | converter | |
Int32[] | maxIndexArray | the array of max index values derived from the DCT. Used to calculate the harmonic interval. |
Double | minDuration | duration of event must exceed this to be a valid event. |
Double | maxDuration | duration of event must be less than this to be a valid event. |
Int32 | minHz | lower freq bound of the event. |
Int32 | maxHz | upper freq bound of the event. |
Int32 | bandBinCount | |
Double | scoreThreshold | threshold. |
TimeSpan | segmentStartOffset | the time offset from segment start to the recording start. |
Returns
Type | Description |
---|---|
List<EventCommon> | a list of acoustic events. |
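The score normalisation described for this method can be worked through with a small sketch. The names `ScoreSketch` and `NormaliseScore` are hypothetical, and clamping the result to [0, 1] is an assumption of this sketch rather than documented behaviour.

```csharp
using System;

public static class ScoreSketch
{
    // maxPossibleScore = scoreThreshold * 5; normalised = score / maxPossibleScore.
    // Assumption: the result is clamped to [0, 1].
    public static double NormaliseScore(double score, double scoreThreshold)
    {
        if (scoreThreshold <= 0.0)
        {
            throw new ArgumentOutOfRangeException(nameof(scoreThreshold), "threshold must be positive");
        }

        double maxPossibleScore = scoreThreshold * 5;
        double normalised = score / maxPossibleScore;
        return Math.Min(1.0, Math.Max(0.0, normalised));
    }
}
```

For example, with a threshold of 0.2 the maximum possible score is 1.0, so a raw score of 0.5 normalises to 0.5.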
DetectHarmonicsInSpectrogramData(Double[,], Double, Int32)
Detects a set of stacked harmonics/formants in the sub-band of a spectrogram. Developed for the GenericRecognizer of harmonics.
NOTE 1: This method assumes the matrix is derived from a spectrogram rotated so that the matrix rows are spectral columns of the spectrogram.
NOTE 2: As of March 2020, this method averages the values in five adjacent frames to reduce noise. This requires that the frequency of any potential formants is not changing rapidly. A side effect is that the edges of harmonic events become blurred, which may not be suitable for detecting human speech. However, you can reduce the frame step to compensate.
NOTE 3: This method assumes that the minimum number of formants in a stack is 3. This means that the first 4 values in the returned array of DCT coefficients are set to 0 (see below).
Declaration
public static Tuple<double[], double[], int[]> DetectHarmonicsInSpectrogramData(double[, ] m, double xThreshold, int smoothingWindow)
Parameters
Type | Name | Description |
---|---|---|
Double[,] | m | data matrix derived from the subband of a spectrogram. |
Double | xThreshold | Minimum acceptable value to be considered part of a harmonic. |
Int32 | smoothingWindow | |
Returns
Type | Description |
---|---|
Tuple<Double[], Double[], Int32[]> | three arrays: dBArray, intensity, maxIndexArray. |
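Two of the steps noted for this method, averaging five adjacent frames and zeroing the first four DCT coefficients, can be sketched as below. The names `FrameAveragingSketch`, `AverageAdjacentFrames` and `ZeroLowCoefficients` are hypothetical. Centring the five-frame span on the current frame and truncating it at the matrix edges are assumptions of this sketch.

```csharp
using System;

public static class FrameAveragingSketch
{
    // m[t, f]: row t is one time frame (a spectral column of the spectrogram),
    // column f is one frequency bin. Returns the bin-wise mean of the frames
    // around 'frame'. Assumptions: the span is centred and truncated at the edges.
    public static double[] AverageAdjacentFrames(double[,] m, int frame, int span = 5)
    {
        int frameCount = m.GetLength(0);
        int binCount = m.GetLength(1);
        int half = span / 2;
        int start = Math.Max(0, frame - half);
        int end = Math.Min(frameCount - 1, frame + half);

        var averaged = new double[binCount];
        for (int f = 0; f < binCount; f++)
        {
            double sum = 0.0;
            for (int t = start; t <= end; t++)
            {
                sum += m[t, f];
            }

            averaged[f] = sum / (end - start + 1);
        }

        return averaged;
    }

    // Sets the first 'count' DCT coefficients to zero in place, so that intervals
    // implying fewer than three formants cannot win the search for the maximum.
    public static void ZeroLowCoefficients(double[] dctCoefficients, int count = 4)
    {
        int limit = Math.Min(count, dctCoefficients.Length);
        for (int k = 0; k < limit; k++)
        {
            dctCoefficients[k] = 0.0;
        }
    }
}
```

Averaging over neighbouring frames trades time resolution for noise reduction, which is why the blurred event edges mentioned in NOTE 2 appear.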
GetComponentsWithHarmonics(SpectrogramStandard, HarmonicParameters, Nullable<Double>, TimeSpan, String)
Declaration
public static (List<EventCommon> SpectralEvents, List<Plot> DecibelPlots) GetComponentsWithHarmonics(SpectrogramStandard spectrogram, HarmonicParameters hp, double? decibelThreshold, TimeSpan segmentStartOffset, string profileName)
Parameters
Type | Name | Description |
---|---|---|
SpectrogramStandard | spectrogram | |
HarmonicParameters | hp | |
Nullable<Double> | decibelThreshold | |
TimeSpan | segmentStartOffset | |
String | profileName | |
Returns
Type | Description |
---|---|
(List<EventCommon> SpectralEvents, List<Plot> DecibelPlots) | |
GetHarmonicEvents(SpectrogramStandard, Int32, Int32, Int32, Double, Double, Double, Double, Int32, Int32, TimeSpan)
Declaration
public static (List<EventCommon> SpectralEvents, double[] AmplitudeArray, double[] HarmonicIntensityScores) GetHarmonicEvents(SpectrogramStandard spectrogram, int minHz, int maxHz, int smoothingWindow, double decibelThreshold, double dctThreshold, double minDuration, double maxDuration, int minFormantGap, int maxFormantGap, TimeSpan segmentStartOffset)
Parameters
Type | Name | Description |
---|---|---|
SpectrogramStandard | spectrogram | |
Int32 | minHz | |
Int32 | maxHz | |
Int32 | smoothingWindow | |
Double | decibelThreshold | |
Double | dctThreshold | |
Double | minDuration | |
Double | maxDuration | |
Int32 | minFormantGap | |
Int32 | maxFormantGap | |
TimeSpan | segmentStartOffset | |
Returns
Type | Description |
---|---|
(List<EventCommon> SpectralEvents, Double[] AmplitudeArray, Double[] HarmonicIntensityScores) | |