Acoustic signal classification
Project overview
I developed a machine-learning classifier in MATLAB that reconstructs the sequence of notes played in a melody. I built three candidate solutions: an SVM model, a Bayes classifier, and my own approach based on harmonic extraction and matrix multiplication in the frequency domain. The best solution classified notes with 98.2% accuracy.
Business Value
When a band is working on a new composition, they often experiment and play by intuition. To keep anything they stumble upon, they record the session. Once they have captured a melody they like, they are left with an audio track that they must listen to again and again to memorise, analyse and reconstruct it.
With a model like mine, they could pass the audio file to it and get back all the notes that were played. Reading the notes is much faster than analysing the track by ear.
Technical details
To build the sequence of notes played in a melody I used a windowed method: the melody is divided into short pieces, and for each piece I classify which note is played in that small time range. There are only 86 notes that could be played in the melody, with 12 notes in each octave. I am therefore solving an 86-class classification problem for each time window of the music, and the result is a sequence of classifications.
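The windowed method can be sketched in a few lines of Python. This is an illustration, not the original MATLAB code: the names `split_into_windows` and `transcribe`, the window and hop sizes, and the use of `max` as a stand-in classifier are all assumptions made for the example.

```python
def split_into_windows(signal, window_size, hop_size):
    """Cut a list of samples into fixed-length windows, hop_size samples apart."""
    windows = []
    for start in range(0, len(signal) - window_size + 1, hop_size):
        windows.append(signal[start:start + window_size])
    return windows


def transcribe(signal, window_size, hop_size, classify):
    """Apply a per-window classifier and return one label per window."""
    return [classify(w) for w in split_into_windows(signal, window_size, hop_size)]


# Toy data: 10 samples, windows of 4 samples, hop of 2 samples.
signal = list(range(10))
windows = split_into_windows(signal, window_size=4, hop_size=2)

# `max` is only a placeholder for a real 86-class note classifier.
labels = transcribe(signal, window_size=4, hop_size=2, classify=max)
```

In a real pipeline `classify` would be one of the note classifiers described in this section, and the list of labels is the sequence of notes.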
Then I went deep into the physics of notes. Any note is a combination of oscillations at frequencies corresponding to its harmonics, and the timbre of a musical instrument is defined by the distribution of amplitudes across those harmonics.
Therefore, to work out which note is playing I have to find the frequencies of the harmonics, so it makes sense to work in the frequency domain rather than the time domain. I applied the Fourier transform to get the distribution of frequencies in the signal and then applied my techniques to that spectrum.
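To make the idea concrete, here is a small Python sketch that synthesises one note from three harmonics and recovers them from the magnitude spectrum. The sample rate, window length and harmonic amplitudes are assumptions chosen so that every harmonic falls exactly on a frequency bin, and the naive DFT stands in for a fast transform such as MATLAB's `fft`.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Naive DFT magnitudes for bins 0..N/2 (O(N^2), fine for short windows)."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(samples))
        spectrum.append(abs(acc))
    return spectrum


SAMPLE_RATE = 8000                      # Hz (assumed)
N = 400                                 # 50 ms window, so bins are 20 Hz apart
FUNDAMENTAL = 440.0                     # A4
HARMONIC_AMPLITUDES = [1.0, 0.5, 0.25]  # hypothetical timbre: 440, 880, 1320 Hz

# One window of a note built as a sum of its harmonics.
window = [
    sum(a * math.sin(2 * math.pi * FUNDAMENTAL * (h + 1) * t / SAMPLE_RATE)
        for h, a in enumerate(HARMONIC_AMPLITUDES))
    for t in range(N)
]

mags = magnitude_spectrum(window)
bin_hz = SAMPLE_RATE / N
peak_bin = max(range(len(mags)), key=mags.__getitem__)
peak_hz = peak_bin * bin_hz  # frequency of the strongest component
```

The relative heights of the three peaks in `mags` reproduce the amplitude ratios of the harmonics, which is the timbre information mentioned above.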
The first technique is based on my prior knowledge: I constructed my own method to classify a note. First, I found all local maxima in the spectrum. Then I filtered out the maxima that were not close to actual note frequencies. That leaves an array of frequencies and amplitudes. Below you can see the extracted harmonics, their frequencies and amplitudes.
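A rough Python sketch of this peak-picking step is shown here. It assumes equal temperament with A4 = 440 Hz and a tolerance of 30 cents around each note; neither value comes from the original project, and the function names are hypothetical.

```python
import math

A4_HZ = 440.0  # assumed tuning reference


def nearest_note_frequency(freq_hz):
    """Frequency of the equal-tempered note closest to freq_hz."""
    semitones_from_a4 = round(12 * math.log2(freq_hz / A4_HZ))
    return A4_HZ * 2 ** (semitones_from_a4 / 12)


def local_maxima(mags):
    """Indices of bins that are higher than their left and right neighbours."""
    return [k for k in range(1, len(mags) - 1)
            if mags[k] > mags[k - 1] and mags[k] >= mags[k + 1]]


def extract_harmonics(mags, bin_hz, tolerance_cents=30.0):
    """Keep only the spectral peaks that lie close to a real note frequency."""
    harmonics = []
    for k in local_maxima(mags):
        freq = k * bin_hz
        deviation = abs(1200 * math.log2(freq / nearest_note_frequency(freq)))
        if deviation <= tolerance_cents:
            harmonics.append((freq, mags[k]))
    return harmonics


# Toy spectrum with 20 Hz bins: peaks at 440 Hz (A4), 600 Hz and 880 Hz (A5).
# 600 Hz sits roughly 37 cents above D5, so it is filtered out.
bin_hz = 20.0
mags = [0.0] * 60
mags[22], mags[30], mags[44] = 1.0, 0.7, 0.5

harmonics = extract_harmonics(mags, bin_hz)
```

The result is a list of `(frequency, amplitude)` pairs, which plays the role of the array of frequencies and amplitudes described above.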
Then I map this vector to a 2D matrix, where one axis corresponds to octaves and the other to notes. This lets me visualise which notes have the highest amplitudes. Next I apply hand-built matrix multiplications and manipulations, which result in a 2D probability matrix where each value represents the probability that this note is playing in the melody. Below you can see a figure representing the note probabilities.
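The mapping onto an octave-by-note grid can be sketched in Python as follows. The octave numbering follows scientific pitch notation (A4 = 440 Hz sits in octave 4), which is an assumption and may differ from the numbering used in the project. The hand-built matrix operations are not described in enough detail to reproduce, so a plain normalisation stands in for them here.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def to_octave_note_matrix(harmonics, n_octaves=9):
    """Accumulate harmonic amplitudes into an (octave x note) grid."""
    matrix = [[0.0] * 12 for _ in range(n_octaves)]
    for freq, amplitude in harmonics:
        # MIDI-style note number: 69 is A4 (440 Hz), 12 numbers per octave.
        number = 69 + round(12 * math.log2(freq / 440.0))
        octave, note = number // 12 - 1, number % 12
        if 0 <= octave < n_octaves:
            matrix[octave][note] += amplitude
    return matrix


def normalise(matrix):
    """Scale the grid so all entries sum to 1 (placeholder for the real scoring)."""
    total = sum(sum(row) for row in matrix)
    if total == 0:
        return matrix
    return [[value / total for value in row] for row in matrix]


# Hypothetical harmonics of A4: the third harmonic lands on E6.
harmonics = [(440.0, 1.0), (880.0, 0.5), (1320.0, 0.25)]
amplitudes = to_octave_note_matrix(harmonics)
probabilities = normalise(amplitudes)
```

Row 4, column 9 of `amplitudes` holds A4, row 5, column 9 holds A5, and row 6, column 4 holds E6, so harmonics of one note line up in a recognisable pattern across the grid.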
Finally, I tried an SVM classifier. It treats the spectrum of each time window as a point in N-dimensional space and learns how to separate the points that belong to one note from those that belong to another. This model showed the best accuracy and was tested on a real melody from Mario.
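The idea of separating points in feature space can be illustrated with a tiny two-class linear SVM in Python, trained by subgradient descent on the hinge loss. This is a sketch only: the project's actual SVM configuration (kernel, multi-class scheme, features) is not given, and the two-dimensional "spectra", learning rate and regularisation below are made up for the example.

```python
def train_linear_svm(points, labels, epochs=200, lr=0.1, reg=0.01):
    """Fit w, b so that sign(w.x + b) matches labels in {-1, +1}."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # L2 regularisation shrinks the weights on every step.
            w = [wi * (1 - lr * reg) for wi in w]
            if margin < 1:
                # Point is inside the margin: push the boundary away from it.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on which side of the boundary x falls."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Toy features: energy in two frequency bands for two different notes.
points = [[1.0, 0.1], [0.9, 0.2], [0.8, 0.0],   # note A: energy in band 0
          [0.1, 1.0], [0.2, 0.9], [0.0, 0.8]]   # note B: energy in band 1
labels = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(points, labels)
predictions = [predict(w, b, x) for x in points]
```

An 86-class problem would combine many such binary separators, for example one per note against the rest, and would use the full spectrum of each window as the feature vector.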
Challenges
Non-separable harmonics. When I cut the full signal into a short time window, I reduce the number of signal samples. That causes big problems in the frequency domain, because the spectrum is now sampled more coarsely. If two harmonics were only a small distance apart, at this coarser frequency sampling they can merge into one. The spectrum is then heavily distorted and the analysis can degrade significantly.
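The size of the effect is easy to estimate, since the spacing between DFT bins is the sample rate divided by the number of samples. The figures in this Python sketch (44.1 kHz audio, a 10-second recording, a 2048-sample window) are assumptions for illustration, not values from the project.

```python
def bin_spacing_hz(sample_rate, n_samples):
    """Distance in Hz between neighbouring DFT bins."""
    return sample_rate / n_samples


def semitone_gap_hz(freq_hz):
    """Distance in Hz from a note to the next semitone above it."""
    return freq_hz * (2 ** (1 / 12) - 1)


SAMPLE_RATE = 44100  # Hz (assumed)

full_spacing = bin_spacing_hz(SAMPLE_RATE, 10 * SAMPLE_RATE)  # whole 10 s recording
window_spacing = bin_spacing_hz(SAMPLE_RATE, 2048)            # one short window
gap_at_a2 = semitone_gap_hz(110.0)                            # A2 to A#2
```

With these numbers the full recording has bins far narrower than the gap between A2 and A#2, while a single 2048-sample window has bins several times wider than that gap, so neighbouring low notes fall into the same bin.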
Solution. I tried different methods of increasing how finely the spectrum is sampled. While interpolating the time-domain signal did not help, adding zeros after the time-domain signal (zero-padding) actually fixed the problem. I realised that a cut signal followed by zeros is exactly what you get after applying a rectangular (Dirichlet) window. That is how I found that, instead of a rectangle, I could use other window functions to reduce the spreading of the spectrum even more, and in the end I chose the Hamming window.
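Both ideas, the Hamming window and zero-padding, fit into a short Python sketch. The sample rate, window length, padding factor and test tone are assumptions chosen so that the tone falls between two bins of the unpadded spectrum; the naive DFT again stands in for a fast transform.

```python
import cmath
import math


def hamming(n):
    """Symmetric Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def peak_frequency(samples, sample_rate, fft_size):
    """Zero-pad samples to fft_size and return the frequency of the largest bin."""
    padded = list(samples) + [0.0] * (fft_size - len(samples))
    best_bin, best_mag = 0, -1.0
    for k in range(fft_size // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / fft_size)
                  for t, x in enumerate(padded))
        if abs(acc) > best_mag:
            best_bin, best_mag = k, abs(acc)
    return best_bin * sample_rate / fft_size


SAMPLE_RATE = 3200  # Hz (assumed)
N = 64              # short window: bins are 50 Hz apart without padding
TONE_HZ = 440.0     # lies between the 400 Hz and 450 Hz bins

# Taper the window edges with a Hamming window before transforming.
frame = [w * math.sin(2 * math.pi * TONE_HZ * t / SAMPLE_RATE)
         for t, w in enumerate(hamming(N))]

plain_peak = peak_frequency(frame, SAMPLE_RATE, N)       # no padding
padded_peak = peak_frequency(frame, SAMPLE_RATE, 8 * N)  # padded to 512 samples
```

Zero-padding evaluates the same windowed spectrum on a finer frequency grid, so the padded peak lands closer to the true tone. The window length and shape still determine how close two harmonics can be before their peaks merge.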
Results
I created a set of models that generate the sequence of notes played in a melody. The models divide the signal into short time windows and perform 86-class classification to determine which note is playing in each time frame. In my experiments the best solution was the SVM model; it classified the notes played in Mario Solo with 97% accuracy. In the figure below you can see a comparison of the SVM-predicted sequence of notes in Mario Solo (left) with the original sequence (right). The numbers on the left are note numbers, counted from the contra octave. The horizontal axes represent time.
Future Work
The project has many directions for further development. The most prominent are migrating it from MATLAB to Python and replacing the best SVM model with neural networks, which have been widely reported to outperform SVMs on many tasks since around 2011. There are many possible neural-network architectures for producing one sequence from another, such as seq2seq models, recurrent neural networks, or transformer-based networks.