Acoustic signal classification

Project overview

I developed an ML classifier in MATLAB that reconstructs the sequence of notes played in a melody. I built three candidate solutions: an SVM model, a Bayes classifier, and my own approach based on matrix manipulation in the frequency domain with harmonic extraction. The best solution achieved 98.2% note-classification accuracy.

Business Value

When a music band is working on a new composition, they often experiment by playing on intuition. To remember a composition they stumbled upon, they record the session. So once they have captured a melody they created and enjoyed, they are left with an audio track that they must listen to again and again to memorise, analyse and reconstruct.

With a model like mine they could pass the audio file to it and get every note that was played. Visual analysis of the notes is much faster than listening through the track.

Technical details

To produce the sequence of notes played in a melody, I applied a windowed method: the whole melody is divided into small pieces, and the model classifies which note is played in each small range of time. There are 86 notes that can appear in the melody, corresponding to 12 notes per octave across the instrument's range. So I am solving an 86-class classification problem for each time window of the music, which yields a sequence of classifications.
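As an illustration, here is a minimal sketch of that windowing step; the input file name, frame length and hop size are my own assumptions, not values from the project.

```matlab
% Minimal sketch of the windowing step (assumed parameters, not the
% original project code).
[x, Fs] = audioread('melody.wav');   % hypothetical input recording
x = mean(x, 2);                      % mix down to mono if stereo

frameLen = round(0.10 * Fs);         % assumed 100 ms analysis window
hop      = frameLen;                 % non-overlapping windows for simplicity

numFrames = floor((length(x) - frameLen) / hop) + 1;
frames = zeros(frameLen, numFrames);
for k = 1:numFrames
    idx = (k - 1) * hop + (1:frameLen);
    frames(:, k) = x(idx);           % one column per time window
end
% Each column of `frames` is later classified as one of the 86 notes.
```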

Then I went deeper into the physics of notes. Any note is a combination of oscillations at frequencies corresponding to its harmonics, and the timbre of a musical instrument is defined by its harmonic-amplitude distribution.

Therefore, to identify which note is playing I have to determine the frequencies of its harmonics, and it makes sense to work in the frequency domain rather than the time domain. So I applied the Fourier transform to obtain the frequency content of the signal and then applied my techniques to it.
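A minimal sketch of moving one window into the frequency domain (it continues from the variables in the windowing sketch above):

```matlab
% Magnitude spectrum of a single analysis window (illustrative only).
frame = frames(:, 1);                 % take the first time window
N     = length(frame);
X     = fft(frame);                   % discrete Fourier transform
mag   = abs(X(1:floor(N/2) + 1));     % keep non-negative frequencies
freqs = (0:floor(N/2))' * Fs / N;     % frequency of each bin, Hz

plot(freqs, mag);
xlabel('Frequency, Hz'); ylabel('Amplitude');
```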

The first technique is based on my prior knowledge; I constructed my own method to classify a note. First, I found all local maxima of the spectrum. Then I discarded maxima that did not lie near actual note frequencies. This gives an array of harmonic frequencies and amplitudes. Below you can see the extracted harmonics with their frequencies and amplitudes.
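The sketch below shows one plausible way to implement this harmonic extraction with findpeaks from the Signal Processing Toolbox; the prominence threshold, reference pitch (A4 = 440 Hz) and tolerance are my assumptions, not the values used in the project.

```matlab
% Sketch: keep spectral peaks only if they lie near an equal-tempered
% note frequency (assumed thresholds; `mag`/`freqs` from the sketch above).
[pks, locs] = findpeaks(mag, 'MinPeakProminence', 0.05 * max(mag));
peakFreqs = freqs(locs);

% 86 equal-tempered note frequencies around A4 = 440 Hz (assumed range).
noteFreqs = 440 * 2.^((-45:40) / 12);

tolCents = 50;                          % assumed half-semitone tolerance
keep = false(size(peakFreqs));
for i = 1:numel(peakFreqs)
    cents   = 1200 * abs(log2(peakFreqs(i) ./ noteFreqs));
    keep(i) = any(cents < tolCents);
end

harmFreqs = peakFreqs(keep);            % extracted harmonic frequencies
harmAmps  = pks(keep);                  % and their amplitudes
```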

Then I map this vector onto a 2D matrix, where one axis corresponds to octaves and the other to notes within an octave. This makes it easy to visualise which notes have the highest amplitudes. I then apply a manually built sequence of matrix multiplications and manipulations, which results in a 2D probability matrix where each entry represents the probability that the corresponding note is playing in the melody. The figure below shows these note probabilities, and a sketch of the mapping step follows it.

Note probabilities matrix
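The exact matrix manipulation that turns these amplitudes into probabilities is specific to the project, so the sketch below only shows the folding of the 86-element note vector into a 12 x octaves matrix; the index convention and accumulation rule are my assumptions, and the variables continue from the sketches above.

```matlab
% Sketch: accumulate harmonic amplitudes onto their nearest note index and
% fold the result into a 12 x numOctaves matrix (rows = notes within an
% octave, columns = octaves; convention assumed).
numNotes = numel(noteFreqs);                      % 86 note classes
noteAmps = zeros(numNotes, 1);
for i = 1:numel(harmFreqs)
    [~, n] = min(abs(1200 * log2(harmFreqs(i) ./ noteFreqs)));
    noteAmps(n) = noteAmps(n) + harmAmps(i);
end

numOctaves = ceil(numNotes / 12);
padded     = [noteAmps; zeros(numOctaves * 12 - numNotes, 1)];
noteMatrix = reshape(padded, 12, numOctaves);     % 12 notes x octaves

imagesc(noteMatrix);
xlabel('Octave'); ylabel('Note within octave');
```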

Note classification is always a highly probabilistic task. Even when a single note is playing, it can be difficult to distinguish which octave is the original one and which is merely excited by resonance. One of the subtleties is that a frequency can be energised from either direction, by both higher and lower frequencies. I may discuss the details in a separate blog post.

Matrix of statistical parameters for Bayes Classifier

Then I tried a Bayes classifier. It learns statistical parameters, the mean and variance of each frequency bin of the spectrum, for each of the notes. This lets it give a probabilistic output for every note that could be playing in the time window. In the figure you can see the variance of each frequency sample for each of the 86 notes; this 2D matrix visualises the parameters the Bayes classifier learnt.
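For reference, MATLAB's fitcnb (Statistics and Machine Learning Toolbox) fits a Gaussian naive Bayes model that learns exactly this kind of per-feature mean and variance; the data layout and synthetic stand-in data below are assumptions, not the project's dataset.

```matlab
% Sketch: Gaussian naive Bayes over window spectra. Assumed layout: each
% row of X is one window's magnitude spectrum, y is its note label (1..86).
numClasses = 86; perClass = 20; numBins = 256;    % synthetic stand-in sizes
y = repelem((1:numClasses)', perClass);           % balanced note labels
X = abs(randn(numel(y), numBins));                % stand-in spectra

bayesMdl = fitcnb(X, y);                          % learns per-class mean/variance
[noteHat, post] = predict(bayesMdl, X(1:10, :));  % labels and class posteriors
```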

Finally, I tried an SVM classifier. It treats each time window's spectrum as a point in N-dimensional space and learns how to separate points belonging to different notes. This model showed the best accuracy and was tested on a real melody from Mario.
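A multiclass SVM over the same assumed layout can be trained with fitcecoc, which combines binary SVM learners in an error-correcting output code scheme; the kernel choice and other settings are my assumptions, and X and y continue from the Bayes sketch above.

```matlab
% Sketch: multiclass SVM on the same assumed data (rows = window spectra,
% y = note labels). fitcecoc trains one-vs-one binary SVMs by default.
svmTemplate = templateSVM('KernelFunction', 'linear', 'Standardize', true);
svmMdl  = fitcecoc(X, y, 'Learners', svmTemplate);
noteHat = predict(svmMdl, X(1:10, :));            % predicted note per window
```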

Challenges

Non-separable harmonics. When I cut the long signal into short time-windowed segments, I reduce the number of samples per segment. That causes big issues in the frequency domain, because the spectral resolution becomes much coarser: the bin spacing Fs/N grows as N shrinks. So if two harmonics are separated by a small margin, at coarser resolution they can merge into one. For example, the neighbouring low notes A1 (55 Hz) and A#1 (about 58.3 Hz) are only around 3.3 Hz apart, which a short window cannot resolve. As a result, the spectrum is heavily distorted and the analysis can degrade significantly.

Solution. I tried different ways of improving the spectral resolution. While interpolating the time-domain signal did not succeed, appending zeros after the time-domain signal (zero-padding) actually fixed the problem. I realised that a truncated signal followed by zeros is exactly what you get after applying a rectangular (Dirichlet) window. That is how I found that, instead of a rectangle, I could use other window functions to reduce spectral leakage even further, and in the end I chose the Hamming window.
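A minimal sketch of that fix: apply a Hamming window to the short segment and zero-pad the FFT to get a denser frequency grid. The zero-padding factor is assumed; hamming comes from the Signal Processing Toolbox, and frame and Fs continue from the earlier sketches.

```matlab
% Sketch: windowed, zero-padded spectrum of one short segment.
w      = hamming(length(frame));             % Hamming window
Nfft   = 4 * 2^nextpow2(length(frame));      % assumed zero-padding factor
Xw     = fft(frame .* w, Nfft);              % fft() pads with zeros to Nfft
magW   = abs(Xw(1:Nfft/2 + 1));
freqsW = (0:Nfft/2)' * Fs / Nfft;            % finer bin spacing than Fs/N

plot(freqsW, magW);
xlabel('Frequency, Hz'); ylabel('Amplitude');
```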

Results

I created a set of models that generate the sequence of notes played in a melody. My models divide the signal into small time-windowed segments and perform 86-class classification to determine which note is playing in each time frame. In my experiments the best solution was the SVM model; it classified the notes played in the Mario solo with 97% accuracy. In the figure below you can see the SVM-predicted sequence of notes in the Mario solo (left) compared to the original sequence (right). The numbers on the left correspond to the note number counting from the contra octave; the horizontal axes represent time.

Future Work

The project has many directions for further development. The most promising are migrating the project from MATLAB to Python and replacing the best SVM model with neural networks, which have outperformed SVMs on many such tasks over the past decade. There are many possible neural architectures for producing a sequence from a sequence: seq2seq models, recurrent neural networks, or transformer-based networks.
