Chord Intelligence mkIV: We Need More Input!

• Chris Liscio

Just as Johnny 5 demonstrated in the 80s, you need as much data as possible to effectively train a deep neural network. Compared to previous iterations, we more than doubled the number of songs in our training set.

There are a number of data sets available for chord detection research, but they are distributed only as a collection of labels—a list of chords with timestamps to indicate when in the recording the chord is active:

0.000000    1.753764    N
1.753764    15.662880   F
15.662880   17.773673   Eb
17.773673   18.450249   F
18.450249   20.526054   Eb
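
Each line holds three whitespace-separated fields: the time the chord becomes active, the time it ends, and the chord’s name (“N” marks a no-chord region, such as the silence at the head of the recording). Reading these files into something usable takes only a few lines of Python. Here’s a minimal sketch, with a hypothetical file name:

    # Parse a chord label file. Each line contains a start time, an end
    # time, and a chord name, separated by whitespace.
    def parse_labels(path):
        labels = []
        with open(path) as f:
            for line in f:
                fields = line.split()
                if len(fields) != 3:
                    continue  # skip blank or malformed lines
                start, end, chord = fields
                labels.append((float(start), float(end), chord))
        return labels

    # parse_labels("song.lab") ->
    #   [(0.0, 1.753764, 'N'), (1.753764, 15.66288, 'F'), ...]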

Unfortunately, you can’t just go out and grab a package with the source audio that goes with the labels. Even if such a download were made available, I wouldn’t touch it with a ten-foot pole: the recordings are all protected by copyright.

Long before I started work on this deep learning project, I kicked off a months-long side project of acquiring a butt-load of new (to us) music to match the collection of labels that we had access to.

Armed with the lists of label files (and the company Visa), Shelley went off and purchased a few hundred songs, most digitally and some on physical CDs. As they came in, I ripped, collected, and paired them up with their labels.

Unfortunately, the challenge doesn’t end there. If you compare a digital download of a song to a CD-ripped version of the same recording, the two audio files can end up with different amounts of silence at the start. Comparing the songs in my data set to their label files, the differences ranged anywhere from a few milliseconds (insignificant) to a second or so (yikes!).
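
To give a sense of how such an offset can be found, here’s a rough sketch that cross-correlates the openings of two decodings of the same song. (This is illustrative, not the exact tooling I built; it assumes both signals were already decoded to mono numpy arrays at the same sample rate.)

    import numpy as np
    from scipy.signal import correlate

    # Estimate the time offset (in seconds) between two decodings of the
    # same song by cross-correlating their first few seconds. A positive
    # result means that b starts later than a. A robust tool would also
    # normalize the signals and sanity-check the correlation peak.
    def estimate_offset(a, b, sample_rate, window_seconds=10.0):
        n = int(window_seconds * sample_rate)
        a, b = a[:n], b[:n]
        xcorr = correlate(b, a, mode="full")
        lag = int(np.argmax(xcorr)) - (len(a) - 1)
        return lag / sample_rate

In practice, the correlation peak isn’t always trustworthy, which is part of why careful listening was still required.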

If this were left uncorrected, my chord detector might learn that a G in the recording should sometimes get marked as a C. That could hinder the training process, and it would also affect my ability to measure the chord detector’s accuracy. For example, a correctly-identified G chord could get penalized because the misaligned “ground truth” label says it’s actually a C.
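
To make the scoring problem concrete: chord detection accuracy is typically measured as the fraction of the song’s duration during which the predicted chord matches the ground-truth label. Here’s a simplified sketch of that idea, building on the parse_labels() output from earlier. (Real evaluation code, such as the mir_eval library, also normalizes chord spellings before comparing.)

    # Look up which chord is active at time t in a list of
    # (start, end, chord) tuples.
    def chord_at(labels, t):
        for start, end, chord in labels:
            if start <= t < end:
                return chord
        return "N"

    # Fraction of sampled instants where the prediction agrees with the
    # ground truth, sampled every `hop` seconds across the song.
    def accuracy(truth, predicted, duration, hop=0.1):
        times = [i * hop for i in range(int(duration / hop))]
        matches = sum(chord_at(truth, t) == chord_at(predicted, t) for t in times)
        return matches / len(times)

Shift the ground-truth labels a second late, and every chord change in the song contributes up to a second of spurious mismatch, even when the detector was right all along.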

To fix this problem, I had to manually re-align the labels to the audio and verify that the corrected labels lined up with the recording. In some rare cases, the labels didn’t line up at all (a radio version of a hard-to-find live recording, perhaps), and those songs needed to be purged from the training data.
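
The mechanical part of the fix is trivial once the offset is known: shift every timestamp by a constant. (Another sketch, again working with the parse_labels() output; the careful listening and verification described below was the real work.)

    # Apply a constant offset (in seconds) to every label, clamping so
    # that no timestamp goes negative.
    def shift_labels(labels, offset):
        return [(max(0.0, start + offset), max(0.0, end + offset), chord)
                for start, end, chord in labels]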

Naturally, I wrote a few software tools to help make the process go more quickly. Still, it required that I listen carefully to portions of each problematic song at least twice: once to identify the need for re-alignment, and one or more times to verify the adjustment. I did most of this (tedious!) correction work in batches, but it still took a few days’ worth of my time to get it all done.

After performing the alignment on a subset of the songs, I was pleased to see a measurable improvement in the chord detector’s accuracy. That kept me motivated to keep pushing through the rest of the data set.

In the next post, I’ll share some details about the training environment that I developed for the new chord detector, and the challenge of feeding all this newly-acquired data to the GPU quickly. Stay tuned!