GitHub Repository: farzanaanjum/Music-Genre-Classification-with-Python
Path: blob/master/Music_genre_classification.ipynb
Kernel: Python 3


Music genre classification notebook

Importing Libraries

# Feature extraction and preprocessing
import librosa
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import os
from PIL import Image
import pathlib
import csv

# Preprocessing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Keras
import keras

import warnings
warnings.filterwarnings('ignore')

Extracting music and features

Dataset

We use the GTZAN genre collection dataset for classification.

The dataset consists of 10 genres:

  • Blues

  • Classical

  • Country

  • Disco

  • Hiphop

  • Jazz

  • Metal

  • Pop

  • Reggae

  • Rock

Each genre contains 100 songs, giving 1000 songs in total. A quick check of the directory layout is sketched below.
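Before running the extraction, it helps to confirm the archive is laid out as expected. A minimal sketch, assuming the GTZAN archive has been extracted to ./MIR/genres/ (the path used throughout this notebook):

import os

genres = 'blues classical country disco hiphop jazz metal pop reggae rock'.split()
for g in genres:
    # count the audio clips available for each genre; each folder should hold 100 files
    clips = os.listdir(f'./MIR/genres/{g}')
    print(g, len(clips))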

Extracting the Spectrogram for every Audio

cmap = plt.get_cmap('inferno')
plt.figure(figsize=(10, 10))
genres = 'blues classical country disco hiphop jazz metal pop reggae rock'.split()
for g in genres:
    pathlib.Path(f'img_data/{g}').mkdir(parents=True, exist_ok=True)
    for filename in os.listdir(f'./MIR/genres/{g}'):
        songname = f'./MIR/genres/{g}/{filename}'
        y, sr = librosa.load(songname, mono=True, duration=5)
        plt.specgram(y, NFFT=2048, Fs=2, Fc=0, noverlap=128, cmap=cmap,
                     sides='default', mode='default', scale='dB')
        plt.axis('off')
        plt.savefig(f'img_data/{g}/{filename[:-3].replace(".", "")}.png')
        plt.clf()

All the audio files are converted into their respective spectrograms, so we can now easily extract features from them.
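To inspect what one of these spectrograms looks like inline, a single clip can be rendered with librosa's mel spectrogram utilities (a different view than plt.specgram above). This is only an illustrative sketch; the file path follows the usual GTZAN naming convention and is an assumption.

import librosa.display

# hypothetical example clip; any file under ./MIR/genres/ works
y, sr = librosa.load('./MIR/genres/blues/blues.00000.au', mono=True, duration=5)
S = librosa.feature.melspectrogram(y=y, sr=sr)
S_db = librosa.power_to_db(S, ref=np.max)  # convert power to decibels for display

plt.figure(figsize=(10, 4))
librosa.display.specshow(S_db, sr=sr, x_axis='time', y_axis='mel')
plt.colorbar(format='%+2.0f dB')
plt.title('Mel spectrogram of one clip')
plt.show()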

Extracting features from Spectrogram

We will extract the following features (a single-clip sketch of these librosa calls follows the list):

  • Mel-frequency cepstral coefficients (MFCC) (20 in number)

  • Spectral Centroid

  • Zero Crossing Rate

  • Chroma Frequencies

  • Spectral Roll-off
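Each of these librosa calls returns a frame-wise array, which we will later summarise by its mean. A minimal single-clip sketch (the clip path is illustrative, not part of the pipeline):

# illustrative clip path; any GTZAN file will do
y, sr = librosa.load('./MIR/genres/blues/blues.00000.au', mono=True, duration=30)

chroma = librosa.feature.chroma_stft(y=y, sr=sr)        # shape (12, n_frames)
cent = librosa.feature.spectral_centroid(y=y, sr=sr)    # shape (1, n_frames)
zcr = librosa.feature.zero_crossing_rate(y)             # shape (1, n_frames)
roll = librosa.feature.spectral_rolloff(y=y, sr=sr)     # shape (1, n_frames)
mfcc = librosa.feature.mfcc(y=y, sr=sr)                 # shape (20, n_frames) by default

print(chroma.shape, cent.shape, zcr.shape, roll.shape, mfcc.shape)
print('mean spectral centroid:', np.mean(cent))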

header = 'filename chroma_stft rmse spectral_centroid spectral_bandwidth rolloff zero_crossing_rate'
for i in range(1, 21):
    header += f' mfcc{i}'
header += ' label'
header = header.split()

Writing data to a CSV file

We write the extracted features for every clip to a CSV file.

file = open('data.csv', 'w', newline='')
with file:
    writer = csv.writer(file)
    writer.writerow(header)
genres = 'blues classical country disco hiphop jazz metal pop reggae rock'.split()
for g in genres:
    for filename in os.listdir(f'./MIR/genres/{g}'):
        songname = f'./MIR/genres/{g}/{filename}'
        y, sr = librosa.load(songname, mono=True, duration=30)
        chroma_stft = librosa.feature.chroma_stft(y=y, sr=sr)
        rmse = librosa.feature.rms(y=y)  # root-mean-square energy (librosa.feature.rmse in older librosa versions)
        spec_cent = librosa.feature.spectral_centroid(y=y, sr=sr)
        spec_bw = librosa.feature.spectral_bandwidth(y=y, sr=sr)
        rolloff = librosa.feature.spectral_rolloff(y=y, sr=sr)
        zcr = librosa.feature.zero_crossing_rate(y)
        mfcc = librosa.feature.mfcc(y=y, sr=sr)
        # summarise every frame-wise feature by its mean
        to_append = f'{filename} {np.mean(chroma_stft)} {np.mean(rmse)} {np.mean(spec_cent)} {np.mean(spec_bw)} {np.mean(rolloff)} {np.mean(zcr)}'
        for e in mfcc:
            to_append += f' {np.mean(e)}'
        to_append += f' {g}'
        # append this clip's row to the CSV
        file = open('data.csv', 'a', newline='')
        with file:
            writer = csv.writer(file)
            writer.writerow(to_append.split())

The data has been extracted into a data.csv file.

Analysing the Data in Pandas

data = pd.read_csv('data.csv')
data.head()
data.shape
(1000, 28)
# Dropping unnecessary columns
data = data.drop(['filename'], axis=1)

Encoding the Labels

genre_list = data.iloc[:, -1]
encoder = LabelEncoder()
y = encoder.fit_transform(genre_list)
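LabelEncoder assigns integer codes in sorted (alphabetical) order of the genre names, so the mapping can be inspected directly after fitting. A quick sketch:

# encoder.classes_[i] is the genre that integer label i stands for
print(dict(zip(range(len(encoder.classes_)), encoder.classes_)))
# e.g. 0 -> 'blues', 1 -> 'classical', ..., 9 -> 'rock'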

Scaling the Feature columns

scaler = StandardScaler()
X = scaler.fit_transform(np.array(data.iloc[:, :-1], dtype=float))

Dividing data into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
len(y_train)
800
len(y_test)
200
X_train[10]
array([-0.9149113 , 0.18294103, -1.10587131, -1.3875197 , -1.14640873, -0.97232926, -0.29174214, 1.20078936, -0.68458101, -0.55849017, -1.27056582, -0.88176926, -0.74844069, -0.40970382, 0.49685952, -1.12666045, 0.59501437, -0.39783853, 0.29327275, -0.72916871, 0.63015786, -0.91149976, 0.7743942 , -0.64790051, 0.42229852, -1.01449461])
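Note that train_test_split shuffles the data randomly, so the exact 800/200 split (and the numbers that follow) will differ between runs. For a reproducible, class-balanced split one could pass random_state and stratify, as in this optional variant (not what was run above):

# optional variant: fixed seed and per-genre stratification
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)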

Classification with Keras

Building our Network

from keras import models
from keras import layers

model = models.Sequential()
model.add(layers.Dense(256, activation='relu', input_shape=(X_train.shape[1],)))
model.add(layers.Dense(128, activation='relu'))
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
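We use sparse_categorical_crossentropy (rather than categorical_crossentropy) because the labels are the integer codes 0–9 produced by the LabelEncoder, not one-hot vectors; the final 10-unit softmax layer outputs one probability per genre.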
history = model.fit(X_train, y_train, epochs=20, batch_size=128)
Epoch 1/20
800/800 [==============================] - 1s 811us/step - loss: 2.1289 - acc: 0.2400
Epoch 2/20
800/800 [==============================] - 0s 39us/step - loss: 1.7940 - acc: 0.4088
Epoch 3/20
800/800 [==============================] - 0s 37us/step - loss: 1.5437 - acc: 0.4450
Epoch 4/20
800/800 [==============================] - 0s 38us/step - loss: 1.3584 - acc: 0.5413
Epoch 5/20
800/800 [==============================] - 0s 38us/step - loss: 1.2220 - acc: 0.5750
Epoch 6/20
800/800 [==============================] - 0s 41us/step - loss: 1.1187 - acc: 0.6288
Epoch 7/20
800/800 [==============================] - 0s 37us/step - loss: 1.0326 - acc: 0.6550
Epoch 8/20
800/800 [==============================] - 0s 44us/step - loss: 0.9631 - acc: 0.6713
Epoch 9/20
800/800 [==============================] - 0s 47us/step - loss: 0.9143 - acc: 0.6913
Epoch 10/20
800/800 [==============================] - 0s 37us/step - loss: 0.8630 - acc: 0.7125
Epoch 11/20
800/800 [==============================] - 0s 36us/step - loss: 0.8095 - acc: 0.7263
Epoch 12/20
800/800 [==============================] - 0s 37us/step - loss: 0.7728 - acc: 0.7700
Epoch 13/20
800/800 [==============================] - 0s 36us/step - loss: 0.7433 - acc: 0.7563
Epoch 14/20
800/800 [==============================] - 0s 45us/step - loss: 0.7066 - acc: 0.7825
Epoch 15/20
800/800 [==============================] - 0s 43us/step - loss: 0.6718 - acc: 0.7787
Epoch 16/20
800/800 [==============================] - 0s 36us/step - loss: 0.6601 - acc: 0.7913
Epoch 17/20
800/800 [==============================] - 0s 36us/step - loss: 0.6242 - acc: 0.7963
Epoch 18/20
800/800 [==============================] - 0s 44us/step - loss: 0.5994 - acc: 0.8038
Epoch 19/20
800/800 [==============================] - 0s 42us/step - loss: 0.5715 - acc: 0.8125
Epoch 20/20
800/800 [==============================] - 0s 39us/step - loss: 0.5437 - acc: 0.8250
test_loss, test_acc = model.evaluate(X_test,y_test)
200/200 [==============================] - 0s 244us/step
print('test_acc: ',test_acc)
test_acc: 0.68

Test accuracy is less than training accuracy. This hints at overfitting.

Validating our approach

Let's set apart 200 samples in our training data to use as a validation set:

x_val = X_train[:200]
partial_x_train = X_train[200:]

y_val = y_train[:200]
partial_y_train = y_train[200:]

Now let's train our network for 30 epochs:

model = models.Sequential()
model.add(layers.Dense(512, activation='relu', input_shape=(X_train.shape[1],)))
model.add(layers.Dense(256, activation='relu'))
model.add(layers.Dense(128, activation='relu'))
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(partial_x_train,
          partial_y_train,
          epochs=30,
          batch_size=512,
          validation_data=(x_val, y_val))

results = model.evaluate(X_test, y_test)
Train on 600 samples, validate on 200 samples
Epoch 1/30
600/600 [==============================] - 1s 1ms/step - loss: 2.3074 - acc: 0.0950 - val_loss: 2.1857 - val_acc: 0.2850
Epoch 2/30
600/600 [==============================] - 0s 65us/step - loss: 2.1126 - acc: 0.3783 - val_loss: 2.0936 - val_acc: 0.2400
Epoch 3/30
600/600 [==============================] - 0s 59us/step - loss: 1.9535 - acc: 0.3633 - val_loss: 1.9966 - val_acc: 0.2600
Epoch 4/30
600/600 [==============================] - 0s 58us/step - loss: 1.8082 - acc: 0.3833 - val_loss: 1.8713 - val_acc: 0.3250
Epoch 5/30
600/600 [==============================] - 0s 59us/step - loss: 1.6663 - acc: 0.4083 - val_loss: 1.7302 - val_acc: 0.3450
Epoch 6/30
600/600 [==============================] - 0s 52us/step - loss: 1.5329 - acc: 0.4550 - val_loss: 1.6233 - val_acc: 0.3700
Epoch 7/30
600/600 [==============================] - 0s 62us/step - loss: 1.4236 - acc: 0.4850 - val_loss: 1.5402 - val_acc: 0.3950
Epoch 8/30
600/600 [==============================] - 0s 57us/step - loss: 1.3250 - acc: 0.5117 - val_loss: 1.4655 - val_acc: 0.3800
Epoch 9/30
600/600 [==============================] - 0s 52us/step - loss: 1.2338 - acc: 0.5633 - val_loss: 1.3927 - val_acc: 0.4650
Epoch 10/30
600/600 [==============================] - 0s 61us/step - loss: 1.1577 - acc: 0.5983 - val_loss: 1.3338 - val_acc: 0.5500
Epoch 11/30
600/600 [==============================] - 0s 64us/step - loss: 1.0981 - acc: 0.6317 - val_loss: 1.3111 - val_acc: 0.5550
Epoch 12/30
600/600 [==============================] - 0s 52us/step - loss: 1.0529 - acc: 0.6517 - val_loss: 1.2696 - val_acc: 0.5400
Epoch 13/30
600/600 [==============================] - 0s 52us/step - loss: 0.9994 - acc: 0.6567 - val_loss: 1.2480 - val_acc: 0.5400
Epoch 14/30
600/600 [==============================] - 0s 65us/step - loss: 0.9673 - acc: 0.6633 - val_loss: 1.2384 - val_acc: 0.5700
Epoch 15/30
600/600 [==============================] - 0s 58us/step - loss: 0.9286 - acc: 0.6633 - val_loss: 1.1953 - val_acc: 0.5800
Epoch 16/30
600/600 [==============================] - 0s 59us/step - loss: 0.8849 - acc: 0.6783 - val_loss: 1.2000 - val_acc: 0.5550
Epoch 17/30
600/600 [==============================] - 0s 61us/step - loss: 0.8621 - acc: 0.6850 - val_loss: 1.1743 - val_acc: 0.5850
Epoch 18/30
600/600 [==============================] - 0s 61us/step - loss: 0.8195 - acc: 0.7150 - val_loss: 1.1609 - val_acc: 0.5750
Epoch 19/30
600/600 [==============================] - 0s 62us/step - loss: 0.7976 - acc: 0.7283 - val_loss: 1.1238 - val_acc: 0.6150
Epoch 20/30
600/600 [==============================] - 0s 63us/step - loss: 0.7660 - acc: 0.7650 - val_loss: 1.1604 - val_acc: 0.5850
Epoch 21/30
600/600 [==============================] - 0s 65us/step - loss: 0.7465 - acc: 0.7650 - val_loss: 1.1888 - val_acc: 0.5700
Epoch 22/30
600/600 [==============================] - 0s 65us/step - loss: 0.7099 - acc: 0.7517 - val_loss: 1.1563 - val_acc: 0.6050
Epoch 23/30
600/600 [==============================] - 0s 68us/step - loss: 0.6857 - acc: 0.7683 - val_loss: 1.0900 - val_acc: 0.6200
Epoch 24/30
600/600 [==============================] - 0s 67us/step - loss: 0.6597 - acc: 0.7850 - val_loss: 1.0872 - val_acc: 0.6300
Epoch 25/30
600/600 [==============================] - 0s 67us/step - loss: 0.6377 - acc: 0.7967 - val_loss: 1.1148 - val_acc: 0.6200
Epoch 26/30
600/600 [==============================] - 0s 64us/step - loss: 0.6070 - acc: 0.8200 - val_loss: 1.1397 - val_acc: 0.6150
Epoch 27/30
600/600 [==============================] - 0s 66us/step - loss: 0.5991 - acc: 0.8167 - val_loss: 1.1255 - val_acc: 0.6300
Epoch 28/30
600/600 [==============================] - 0s 62us/step - loss: 0.5656 - acc: 0.8333 - val_loss: 1.0955 - val_acc: 0.6350
Epoch 29/30
600/600 [==============================] - 0s 66us/step - loss: 0.5513 - acc: 0.8300 - val_loss: 1.1030 - val_acc: 0.6050
Epoch 30/30
600/600 [==============================] - 0s 56us/step - loss: 0.5498 - acc: 0.8233 - val_loss: 1.0869 - val_acc: 0.6250
200/200 [==============================] - 0s 65us/step

results
[1.2261371064186095, 0.65]
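model.evaluate returns the loss followed by the metrics passed to compile, so the test loss here is about 1.23 and the test accuracy about 65%.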

Predictions on Test Data

predictions = model.predict(X_test)
predictions[0].shape
(10,)
np.sum(predictions[0])
1.0
np.argmax(predictions[0])
8
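Each prediction is a softmax distribution over the 10 genres, so np.argmax picks the most likely integer label. The LabelEncoder fitted earlier can map that integer back to a genre name; under the alphabetical encoding, label 8 corresponds to 'reggae'. A small sketch:

# map the predicted integer label back to its genre string
predicted_label = np.argmax(predictions[0])
print(encoder.inverse_transform([predicted_label])[0])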