Authors: Tung Nguyen
Advisor: Edward M. Reingold

Details

The Fast Fourier Transform (FFT) has long been a backbone of signal processing. Under Professor Edward M. Reingold, I conducted an independent research study on the FFT and its application to audio processing.

The goal of the project is to create a music genre-conversion model. Given an audio file, the model would algorithmically convert the song to a different genre. A concrete problem statement would be:

Create a Folk version of Nirvana's "Smells Like Teen Spirit" using a deep neural network.

Approach

  • Created an adversarial network consisting of one conversion model and one adversarial music genre classifier.
  • Vectorized the audio files in the GTZAN dataset using the Short-Time Fourier Transform (STFT).
  • Created a music genre classification model that takes the vectorized audio as input.
  • Trained a music generation model for each genre, using GTZAN as the training data. These models are called genre blueprints.
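
The vectorization step can be illustrated with a minimal sketch. This is not the project's code: the function names `fft` and `stft_magnitudes`, the Hann window, the frame size of 64 samples, and the hop of 32 samples are all assumptions chosen for illustration. The sketch splits a signal into overlapping frames, applies a radix-2 FFT to each windowed frame, and keeps the magnitude of the non-negative frequency bins, giving one feature vector per frame.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def stft_magnitudes(samples, frame_size=64, hop=32):
    """Return one magnitude spectrum (frame_size // 2 + 1 bins) per frame.

    frame_size and hop are illustrative defaults, not the project's settings.
    """
    # Periodic Hann window to reduce spectral leakage at frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_size)
              for i in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + i] * window[i] for i in range(frame_size)]
        spectrum = fft(frame)
        # Real input gives a symmetric spectrum, so keep bins 0..frame_size/2.
        frames.append([abs(c) for c in spectrum[: frame_size // 2 + 1]])
    return frames


# Synthetic test signal: a 1000 Hz tone sampled at 8000 Hz (assumed values).
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(256)]
spec = stft_magnitudes(tone)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(len(spec), len(spec[0]), peak_bin)  # 7 frames, 33 bins each, peak at bin 8
```

Bin 8 corresponds to 8 × 8000 / 64 = 1000 Hz, the frequency of the test tone. In practice a library routine such as an optimized FFT would replace the pure-Python version, and the frame and hop sizes would be tuned to the audio sample rate.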

Results

The Folk -> Rock conversion on the Rock blueprints achieved 60% accuracy on the genre classifier.

Original Folk song
An example of Folk -> Rock