This is a bit of a learning project about how to determine the pitch of sound from a given sound file. The primary goal of this project is to detect the frequencies being played in a WAV file and map each one to its musical note.
This is a rough overview of some code I wrote a few years ago using Haskell in this repository. In this project there are examples of:
MIDI manipulation,
FFT using FFTW,
A custom heatmap chart for Charts, used to view the spectrogram.
The code is rough and isn’t particularly performant, but it works.
Finding the note
The most basic sound possible is a pure sine wave. Musically, 440 Hz (A4, the A above middle C) is a natural starting point, as it is the note most used for tuning instruments. The first test is:
Given a sine wave at 440 Hz, the output should be the note A4
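As a minimal sketch of the mapping that this test exercises (not the repository’s code), a frequency can be rounded to the nearest equal-tempered note. The function names here are hypothetical, and the sketch assumes A4 = 440 Hz with MIDI numbering (A4 = 69) and strictly positive input frequencies.

```haskell
-- Map a frequency in Hz to the nearest equal-tempered note name.
-- Assumptions: A4 = 440 Hz, MIDI numbering (A4 = 69, C4 = 60),
-- and a strictly positive input frequency.

noteNames :: [String]
noteNames = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

-- Nearest MIDI note number: 12 semitones per doubling of frequency.
freqToMidi :: Double -> Int
freqToMidi f = round (69 + 12 * logBase 2 (f / 440))

-- Note name with octave number, e.g. 69 becomes "A4".
midiToName :: Int -> String
midiToName n = noteNames !! (n `mod` 12) ++ show (n `div` 12 - 1)

freqToNote :: Double -> String
freqToNote = midiToName . freqToMidi
```

With these definitions, `freqToNote 440` evaluates to `"A4"`, which is the expected output of the first test.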
The first attempt is to perform a Fourier transform of the whole signal and find the frequency peaks. The number of samples in one period of a 440 Hz wave is the sample rate divided by 440 Hz.
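As a worked instance of the samples-per-period calculation, assume a sample rate of \(f_s = 44100\) Hz. This is an assumption, since the sample rate of the file is not stated here. With \(f = 440\) Hz:

\[N = \frac{f_s}{f} = \frac{44100 \text{ Hz}}{440 \text{ Hz}} \approx 100.2 \text{ samples}\]

Under that assumption, about 200 samples cover roughly two periods of the wave.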
So plotting the first ~200 samples gives us an idea of what the signal looks like:
To make things easier to interpret we will convert the x-axis to seconds:
\[t = \frac{x}{440 \text{ Hz} \cdot N}\]
where \(N\) is the number of samples in one period of the 440 Hz wave (so \(440 \text{ Hz} \cdot N\) is the sample rate), \(x\) is the frame number, and \(t\) is the time in seconds.
Next we perform a DFT across the whole signal to find its dominant frequencies, which shows the expected peak at 440 Hz:
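The idea of finding the dominant frequency can be sketched with a naive DFT using only `base`. This is an illustration, not the project’s code (which uses FFTW); the function names are hypothetical and the O(n²) transform is only practical for short signals.

```haskell
import Data.Complex (Complex, magnitude, mkPolar)
import Data.List (maximumBy)
import Data.Ord (comparing)

-- Naive O(n^2) DFT magnitudes for bins 0 .. n/2 (illustration only).
dftMagnitudes :: [Double] -> [Double]
dftMagnitudes xs = [ magnitude (bin k) | k <- [0 .. n `div` 2] ]
  where
    n = length xs
    bin :: Int -> Complex Double
    bin k = sum [ mkPolar x (-2 * pi * fromIntegral ((k * j) `mod` n) / fromIntegral n)
                | (j, x) <- zip [0 ..] xs ]

-- Frequency in Hz of the strongest bin, given the sample rate in Hz.
-- Bin k corresponds to k * sampleRate / n.
dominantFrequency :: Double -> [Double] -> Double
dominantFrequency sampleRate xs =
  fromIntegral peak * sampleRate / fromIntegral (length xs)
  where
    (peak, _) = maximumBy (comparing snd) (zip [0 :: Int ..] (dftMagnitudes xs))

-- A pure sine wave with the given sample rate, frequency and sample count.
sineWave :: Double -> Double -> Int -> [Double]
sineWave sampleRate freq count =
  [ sin (2 * pi * freq * fromIntegral i / sampleRate) | i <- [0 .. count - 1] ]
```

For example, `dominantFrequency 8000 (sineWave 8000 440 800)` picks the bin at 440 Hz. The frequency resolution is the sample rate divided by the number of samples, here 10 Hz per bin, so a tone that falls between two bins would be reported at the nearest bin.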
The sample we are starting with is just a single sine wave, but the following code also works if we have multiple notes. Pretending that the signal changes over time, we will perform an \(\text{STFT}\) on the signal, which will produce a list of tuples \((\text{start time}, \text{note})\). The code in the source lives in the function sftfSection, which I will omit here since it’s a bit of a mess, but the inputs are:
A window function to apply to the signal (Hann Window in this case).
To filter out noise in low-quality signals, the frequencies in each STFT window are filtered down to that window’s maxima.
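The windowing and per-window filtering steps can be sketched as follows. This is not the sftfSection function from the repository: all names are hypothetical, the symmetric form of the Hann window is an assumption, and the peak filter is a simple thresholded local-maximum search over a magnitude spectrum.

```haskell
-- Symmetric Hann window of length n (assumed variant).
hann :: Int -> [Double]
hann n
  | n <= 1    = replicate n 1
  | otherwise = [ 0.5 * (1 - cos (2 * pi * fromIntegral i / fromIntegral (n - 1)))
                | i <- [0 .. n - 1] ]

-- Split a signal into full frames of `size` samples, advancing by `hop`.
-- A trailing partial frame is dropped.
frames :: Int -> Int -> [a] -> [[a]]
frames size hop xs
  | size <= 0 || hop <= 0 = []
  | length chunk < size   = []
  | otherwise             = chunk : frames size hop (drop hop xs)
  where
    chunk = take size xs

-- Windowed frames paired with their start time in seconds.
windowedFrames :: Double -> Int -> Int -> [Double] -> [(Double, [Double])]
windowedFrames sampleRate size hop xs =
  [ (fromIntegral (i * hop) / sampleRate, zipWith (*) w f)
  | (i, f) <- zip [0 :: Int ..] (frames size hop xs) ]
  where
    w = hann size

-- Indices of local maxima in a magnitude spectrum that exceed `threshold`.
localMaxima :: Double -> [Double] -> [Int]
localMaxima threshold mags =
  [ i
  | (i, (a, b, c)) <- zip [1 ..] (zip3 mags (drop 1 mags) (drop 2 mags))
  , b > a, b >= c, b > threshold ]
```

Each windowed frame would then be passed through the Fourier transform, and `localMaxima` applied to the resulting magnitudes. Raising the threshold discards more low-level noise at the cost of missing quiet notes.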
Most of the code to perform the STFT looks like this:
Here are the resulting spectrograms (using custom code for Charts):
As we can see, the pure sine wave produces a really clear signal at 440 Hz, but the piano sample has other signals at higher frequencies, particularly overtones, which occur at integer multiples of 440 Hz.
Midi library example
Now that we have the frequencies of the signal, we can generate a MIDI file with the midi package.
Before using the midi package, we need to learn how to use it. Here’s a quick example that plays two notes within 100 frames of each other:
Converting notes to MIDI
To produce a MIDI file, we transform the list of \((\text{start time}, \text{note})\) tuples into a list of MIDIEvents at time t, then concatenate that list to produce a single MIDIEvent:
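The shape of this transformation can be sketched with stand-in types. The `Message` and `Timed` types below are hypothetical and are not the midi package’s API. The sketch also assumes that each detected note sounds until the next one starts, and that start times are already expressed in ticks.

```haskell
import Data.List (sortOn)

-- Hypothetical stand-ins for a MIDI library's event types.
data Message = NoteOn Int | NoteOff Int   -- payload: MIDI key number
  deriving (Eq, Show)

-- An event paired with a time in ticks.
type Timed = (Int, Message)

-- Turn (start tick, MIDI key) pairs into note-on/note-off events at
-- absolute times. Assumption: each note lasts until the next one starts,
-- and the final note is held for `lastLen` ticks.
toEvents :: Int -> [(Int, Int)] -> [Timed]
toEvents lastLen notes = concat (zipWith3 one starts keys ends)
  where
    sorted = sortOn fst notes
    starts = map fst sorted
    keys   = map snd sorted
    ends   = drop 1 starts ++ [ s + lastLen | s <- take 1 (reverse starts) ]
    one s k e = [(s, NoteOn k), (e, NoteOff k)]

-- Convert absolute times into the delta times used in MIDI tracks.
toDeltas :: [Timed] -> [Timed]
toDeltas evs = zipWith (\prev (t, m) -> (t - prev, m)) (0 : map fst evs) evs
```

For instance, `toDeltas (toEvents 50 [(0, 69), (100, 72)])` yields a note-on for key 69 at delta 0, its note-off 100 ticks later, a note-on for key 72 at delta 0, and its note-off 50 ticks after that.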
Learnings
This general flow works easily enough for very simple audio, but once polyphony or even regular instruments are introduced, the results become very messy due to the wide range of signals produced. Adjusting the window length has noticeable effects on the spectrograms, and highlights that there isn’t a universal window length that works for most cases.
I don’t feel like Haskell provided much value in writing this. The ability to create types denoting note values was quite useful, but the transformations between the many different datatypes to go from the FFT double to a MIDI double were more trouble than they were worth and produced fairly noisy code. Using something out of the box would be easier, but not as much of a learning experience.