Before you start¶
- Go to "File" --> "Save a copy in Drive"
- Open that copy (might open automatically)
- Then continue below
Machine Listening Seminar 3: Sound event classification¶
What we are going to do:
- Implement a full processing pipeline for sound event classification
- Use a traditional classification technique (and compare with deep-learning based methods later).
1. Import libraries¶
- We need a number of libraries. Import them once to use throughout the document.
In [ ]:
import librosa.display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import IPython
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from tqdm import tqdm
2. Fetch the Dataset¶
- ESC-50: a dataset for Environmental Sound Classification
- 50 classes, 40 files per class, 5 s clips
- Download and unzip the dataset by running the cell below. This will take a minute. The new files then appear in the file browser on the left (folder icon).
In [ ]:
3. Metadata and analysis I¶
- Use pandas to read the csv file in ESC-50-master/meta/
- Print the first elements of the csv. Pandas has a standard function for this.
- Print the list of unique class labels in the dataset, and check whether there really are 50 of them
In [ ]:
fn_csv = 'ESC-50-master/meta/esc50.csv'
df = pd. ... # pd = pandas module, df = the resulting pandas DataFrame
unique_classes = df... # you may use unique()
Expected output:
filename fold target category esc10 src_file take
0 1-100032-A-0.wav 1 0 dog True 100032 A
1 1-100038-A-14.wav 1 14 chirping_birds False 100038 A
2 1-100210-A-36.wav 1 36 vacuum_cleaner False 100210 A
3 1-100210-B-36.wav 1 36 vacuum_cleaner False 100210 B
4 1-101296-A-19.wav 1 19 thunderstorm False 101296 A
Unique classes: ['dog' 'chirping_birds' ... ]
Count: 50
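The same inspection steps can be tried on a tiny in-memory table first. This is a minimal sketch, assuming a hand-made DataFrame `toy_df` (a hypothetical stand-in, not the real ESC-50 metadata) that only shares the `filename` and `category` column names with the csv:

```python
import pandas as pd

# Hypothetical stand-in for the ESC-50 metadata (NOT the real csv)
toy_df = pd.DataFrame({
    'filename': ['a.wav', 'b.wav', 'c.wav', 'd.wav'],
    'category': ['dog', 'rain', 'dog', 'rooster'],
})

print(toy_df.head(2))                       # first rows of the table
toy_classes = toy_df['category'].unique()   # unique labels, in order of first appearance
print(f'Unique classes: {toy_classes}')
print(f'Count: {len(toy_classes)}')
```

`unique()` returns a NumPy array, so `len(...)` gives the number of distinct labels directly.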
4. Metadata and analysis II¶
- View and listen to some examples in the dataset to get a "feeling" for the sound classes.
In [ ]:
# Setup some filepaths
path = 'ESC-50-master/audio/'
file0 = path + df['filename'][2] # We use indices [2-4] here, feel free to choose other files
file1 = path + df['filename'][3]
file2 = path + df['filename'][4]
# Show audio player for each file
# Plot log-magnitude (dB) spectrograms on a linear frequency axis
files = [file0, file1, file2]
fig, ax = plt.subplots(nrows=1, ncols=3, figsize=(18, 6))
for i in range(3):
    y, sr = librosa.load(files[i])  # librosa resamples to 22050 Hz by default
    D = librosa.amplitude_to_db(np.abs(librosa.stft(y)), ref=np.max)  # STFT magnitude in dB relative to the peak
    img = librosa.display.specshow(D, y_axis='linear', x_axis='time', sr=sr, ax=ax[i])
    ax[i].set(title='Log-magnitude spectrogram')
fig.colorbar(img, ax=ax, format="%+2.f dB")
5. ESC-5: Curation¶
Let's select 5 classes (our_classes) from ESC-50 to make things a bit faster.
- Collect all files that belong to our_classes.
- Put the files and their respective classes in separate lists. Make sure their indices are equal (meaning: the value at index 3 of list A is related to the value at index 3 of list B).
- Idea 1: Use df.values to iterate over the rows of the csv
- Idea 2: Use df.query('category in @our_classes')
- Print the first 5 elements of each list as (file, class)-tuples. Also, print the overall lengths of the lists.
In [ ]:
our_classes = ['crying_baby', 'dog', 'rain', 'rooster', 'sneezing'] # Note: This is also a class map for later.
esc5_X = [] # File list
esc5_y = [] # Class list
fn_csv = 'ESC-50-master/meta/esc50.csv'
df = ...
for ... in df.values: # This is one way, not an ideal way: This loop aims to find files for each class in our_classes.
    if ... :
        esc5_X.append( ... ) # filename column of df
        ... # class column of df
print( ...(esc5_X[:5], esc5_y[:5])... )
print(f'Lengths: ...')
### END CODING ###
Expected output:
[('1-100032-A-0.wav', 'dog'), ('1-110389-A-0.wav', 'dog'), ('1-17367-A-10.wav', 'rain'), ('1-187207-A-20.wav', 'crying_baby'), ('1-211527-A-20.wav', 'crying_baby')]
Lengths: esc5_X: 200, esc5_y: 200
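Besides the two ideas listed above, a boolean mask is another common way to select rows. The following is a minimal sketch on toy data, assuming a hypothetical DataFrame `toy_df` and a hypothetical class subset `wanted`; it is not the seminar's reference solution:

```python
import pandas as pd

# Hypothetical toy metadata and class subset (NOT the real ESC-50 data)
toy_df = pd.DataFrame({
    'filename': ['a.wav', 'b.wav', 'c.wav', 'd.wav'],
    'category': ['dog', 'cat', 'rain', 'dog'],
})
wanted = ['dog', 'rain']

# Keep only the rows whose category is in the wanted list
subset = toy_df[toy_df['category'].isin(wanted)]

# Two parallel lists: index i of one belongs to index i of the other
files = subset['filename'].tolist()
labels = subset['category'].tolist()

print(list(zip(files, labels))[:5])
print(f'Lengths: files: {len(files)}, labels: {len(labels)}')
```

Because both lists are taken from the same filtered table, their indices stay aligned without any extra bookkeeping.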
6. ESC-5: Dataset splitting¶
- Split the dataset into train and test subsets with an 80%/20% split ratio and random state 1337.
- Use a suitable, straightforward function from sklearn.
- Print the first 3 elements of the resulting X_train.
- Print the overall lengths of the resulting lists. Are they aligned with the ratio?
Result: ESC-5 is ready. We have a train and test set consisting of file lists and their respective classes.
In [ ]:
X_train, X_test, ... = ...(..., random_state=1337)
print(f'X: ...; y: ...')
### END CODING ###
Expected output:
['5-203128-A-0.wav', '4-181286-A-10.wav', '3-157615-A-10.wav']
X: 160, 40; y: 160, 40
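To see how the split ratio translates into list lengths, the split can be tried on a small toy list. This is a sketch under the assumption that `train_test_split` from the imports is the intended function; `toy_X` and `toy_y` are hypothetical stand-ins for the ESC-5 lists:

```python
from sklearn.model_selection import train_test_split

# Hypothetical toy lists (10 files, 2 classes)
toy_X = [f'file_{i}.wav' for i in range(10)]
toy_y = ['dog'] * 5 + ['rain'] * 5

# test_size=0.2 keeps 20% of the items for testing
X_tr, X_te, y_tr, y_te = train_test_split(
    toy_X, toy_y, test_size=0.2, random_state=1337
)

print(f'X: {len(X_tr)}, {len(X_te)}; y: {len(y_tr)}, {len(y_te)}')
```

With 10 items this yields 8 training and 2 test items, and files and labels are shuffled together so each file keeps its own label.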
7. ESC-5: Create mel spectrograms¶
We need to compute features and corresponding labels for each file in our ESC-5.
- Define a function that does the following (in this order!):
  - takes input parameters: X_train (list of filenames), y_train (list of classes)
  - loops over X_train (hint: enumerate it), and loads each file (.wav) using librosa
  - creates the mel spectrogram from the loaded .wav samples
  - normalizes the mel spec by dividing it by the number of mel bands (mel_bands)
  - transposes the mel spec
  - appends the features (mel spec) to a feature tensor
  - creates a target vector consisting of as many values as there are frames
    - hint: use .shape to see which value you need
    - each value inside the vector must correspond to the index of the class in our_classes
    - hint: remember numpy.ones(...) ?
    - hint: use .index(...) here. Not ideal, but works here.
  - appends the targets to a target tensor
  - stacks the large feature and target lists appropriately
  - returns the tensors
- Finally, print the shapes of all 4 arrays.
In [ ]:
def extract_mel_spec(...):
    X = [] # feature tensor
    y = [] # target tensor
    mel_bands = 128
    for ..., ... in tqdm(enumerate(...)): # tqdm simply displays a progress bar.
        wav_data, sr = librosa... # Load the file. Note: the expected shapes below match librosa's default sample rate (22050 Hz).
        # Features (2D)
        mel_spec = librosa... # Create mel spectrogram. Output shape: (128, 216) (n_mels, frames)
        mel_spec = mel_spec/... # Normalization
        mel_spec = mel_spec... # Transposition. Output shape: (216, 128)
        X.append( ... ) # Append to feature tensor
        # Targets == class_name
        targets = np.ones( mel_spec... ) # Create a PLACEHOLDER target vector. Output shape: (216,) (Note: silent frames are not going to be labeled as "silent")
        targets = targets * our_classes... # Convert values to actual class-index from 'our_classes'
        ... # Append to target tensor
    # Stack tensors
    X = np.vstack(X)
    y = np.hstack(y)
    # Return the tensors
    return ...
# Call the function on our previously generated lists
X_train_ready, y_train_ready = extract_mel_spec(...)
X_test_ready, y_test_ready = ...
print(f'\nShapes: X_train_ready: {X_train_ready.shape}, y_train_ready: {y_train_ready.shape}')
print(f'Shapes: X_test_ready: {X_test_ready.shape}, y_test_ready: {y_test_ready.shape}')
Expected output:
Shapes: X_train_ready: (34560, 128), y_train_ready: (34560,)
Shapes: X_test_ready: (8640, 128), y_test_ready: (8640,)
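The expected shapes can be checked with a little arithmetic. The sketch below assumes librosa's defaults (sample rate 22050 Hz, hop length 512, centered frames) and uses zero arrays as placeholders for real mel spectrograms; `toy_classes` and the label list are hypothetical:

```python
import numpy as np

sr = 22050         # librosa.load's default sample rate
duration_s = 5     # ESC-50 clip length in seconds
hop_length = 512   # librosa's default hop length
n_mels = 128

# With centered frames, the frame count is 1 + n_samples // hop_length
n_frames = 1 + (sr * duration_s) // hop_length

toy_classes = ['dog', 'rain']   # hypothetical class map
feats, targs = [], []
for label in ['rain', 'dog', 'rain']:
    mel_T = np.zeros((n_frames, n_mels))   # placeholder for a transposed mel spec
    # One target per frame, set to the index of the class in the class map
    targets = np.ones(mel_T.shape[0]) * toy_classes.index(label)
    feats.append(mel_T)
    targs.append(targets)

X = np.vstack(feats)   # stack along the frame axis: (files * frames, n_mels)
y = np.hstack(targs)   # one long target vector: (files * frames,)
print(n_frames, X.shape, y.shape)
```

Each clip contributes 216 frames under these assumptions, so 160 training files give 160 * 216 = 34560 rows and 40 test files give 8640 rows, matching the expected output.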
8. ESC-5: Train a nearest neighbor classifier¶
- Standardize the features from step 7 using sklearn.
- Use the features and targets to train (fit) a kNN-classifier from sklearn, with 5 neighbors and uniform weighting.
- Print the scores on the train set and test set, rounded to 4 decimals. (This will take some time!)
In [ ]:
# Feature scaling / Data standardization / Normalization
scaler = StandardScaler()
X_train_ready = scaler.fit_transform(...) # Fit the scaler on the train features and standardize them
X_test_ready = ... # Standardize the test features with the scaler fitted on the train set
model = ...(n_neighbors=..., weights=...) # Call the kNN-classifier. Look at your imports again for a hint.
model... # Fit/Train the classifier using our generated tensors.
print(f'Train score: {np.round(model.score(...), decimals=...)}')
print(f'Test score: ...')
Expected output (might differ slightly):
Train score: 0.7624
Test score: 0.5812
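The order of fitting and transforming matters: the scaler learns mean and variance from the training features only and then reuses them for the test features. Here is a minimal sketch on synthetic data, assuming two hypothetical, well-separated classes drawn from normal distributions; the printed scores are not comparable to the ESC-5 results:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Hypothetical synthetic features: class 0 around 0, class 1 around 3
X_tr = np.vstack([rng.normal(0, 1, (50, 4)), rng.normal(3, 1, (50, 4))])
y_tr = np.hstack([np.zeros(50), np.ones(50)])
X_te = np.vstack([rng.normal(0, 1, (10, 4)), rng.normal(3, 1, (10, 4))])
y_te = np.hstack([np.zeros(10), np.ones(10)])

scaler = StandardScaler()
X_tr_s = scaler.fit_transform(X_tr)   # learn mean/std on train, then scale train
X_te_s = scaler.transform(X_te)       # reuse the train statistics for test

knn = KNeighborsClassifier(n_neighbors=5, weights='uniform')
knn.fit(X_tr_s, y_tr)

print(f'Train score: {np.round(knn.score(X_tr_s, y_tr), decimals=4)}')
print(f'Test score: {np.round(knn.score(X_te_s, y_te), decimals=4)}')
```

Calling `transform` on a scaler that has never been fitted raises an error, which is why the training features go through `fit_transform`.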
9. ESC-5: Plot the confusion matrix¶
We want to learn more about our classifier. How well does it perform per class?
- Using scikit-learn, create a confusion matrix of our classifier over the test set
- Normalize the rows, use 'our_classes' as axes tick values
- Display the plot
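Row normalization can be checked numerically before plotting anything. The sketch below uses `confusion_matrix` from the imports with `normalize='true'` on hand-made toy labels (hypothetical, three classes) rather than the trained model:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical toy labels for three classes (0, 1, 2)
y_true = np.array([0, 0, 0, 0, 1, 1, 2, 2, 2, 2])
y_pred = np.array([0, 0, 1, 0, 1, 1, 2, 0, 2, 2])

# normalize='true' divides each row by the number of true samples of that class
cm = confusion_matrix(y_true, y_pred, normalize='true')

print(cm)
print(cm.sum(axis=1))   # every row sums to 1
```

Each row then reads as "of all true samples of this class, which fraction was predicted as each class", so the diagonal holds the per-class recall.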
In [ ]:
...(model, ..., ..., normalize=...) # Call the confusion matrix plot function. Look at your imports again for a hint.
plt.xticks(ticks=np.arange(...), labels=...) # hint: np.arange(5) = (0, 1, 2, 3, 4)
plt.yticks(ticks=np.arange(...), labels=...) # same ticks and labels for the y-axis
plt.show()
Expected output:
- a coloured confusion matrix
- each row should add up to 1
- labels from our_classes on x-axis and y-axis