import glob
import os
import librosa
import tensorflow
from sklearn.metrics import accuracy_score, confusion_matrix
We'll use the CSV file https://github.com/karolpiczak/ESC-50/blob/master/meta/esc50.csv from the ESC-50 dataset as an example to show how to import it with the pandas library.
import pandas as pd

fn_csv = 'https://raw.githubusercontent.com/karolpiczak/ESC-50/master/meta/esc50.csv'
# by default, pandas reads the first line as column "headers", which makes sense here.
data_frame = pd.read_csv(fn_csv)
# let's look at the first 5 rows of the data frame
print(data_frame.head())
# you can access data from a data frame as:
n_files = data_frame.shape[0]
print("===")
print("We have {} files".format(n_files))
# Let's look at the first 5 files
for n in range(5):
    print("---")
    print("Filename: ", data_frame["filename"][n])
    print("Class ID: ", data_frame["target"][n])
    print("Class label: ", data_frame["category"][n])
# You can iterate over all files easily, extract features, etc. (see the sketch after the output below)
            filename  fold  target        category  esc10  src_file take
0   1-100032-A-0.wav     1       0             dog   True    100032    A
1  1-100038-A-14.wav     1      14  chirping_birds  False    100038    A
2  1-100210-A-36.wav     1      36  vacuum_cleaner  False    100210    A
3  1-100210-B-36.wav     1      36  vacuum_cleaner  False    100210    B
4  1-101296-A-19.wav     1      19    thunderstorm  False    101296    A
===
We have 2000 files
---
Filename:  1-100032-A-0.wav
Class ID:  0
Class label:  dog
---
Filename:  1-100038-A-14.wav
Class ID:  14
Class label:  chirping_birds
---
Filename:  1-100210-A-36.wav
Class ID:  36
Class label:  vacuum_cleaner
---
Filename:  1-100210-B-36.wav
Class ID:  36
Class label:  vacuum_cleaner
---
Filename:  1-101296-A-19.wav
Class ID:  19
Class label:  thunderstorm
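As a sketch of that iteration step, here is one way to loop over all files and extract a simple feature vector per file. This assumes you have downloaded the ESC-50 audio files locally; the audio_dir path, the feat_mat_esc50 name, and the choice of 13 mean MFCCs are illustrative assumptions, not part of the dataset or the course material.

import numpy as np
audio_dir = 'ESC-50/audio'  # assumption: local path to the downloaded ESC-50 audio folder
feature_list = []
for n in range(n_files):
    fn_wav = os.path.join(audio_dir, data_frame["filename"][n])
    x, fs = librosa.load(fn_wav)
    # 13 MFCCs averaged over time -> one 13-dimensional feature vector per file
    mfcc = librosa.feature.mfcc(y=x, sr=fs, n_mfcc=13)
    feature_list.append(np.mean(mfcc, axis=1))
feat_mat_esc50 = np.array(feature_list)
print("Feature matrix shape:", feat_mat_esc50.shape)  # (2000, 13)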
This example shows how to take an arbitrary audio file and cut it into segments of a fixed duration (e.g. 1 s). This way, you can take a collection of audio files with different durations and create a dataset for a machine learning model:
Audio File 1 (34 s) -> 34 audio segments (1 s each)
Audio File 2 (12 s) -> 12 audio segments (1 s each)
...
In total: 46 audio segments.
Note: Here, we have no overlap between the segments. Using overlapping segments is also possible and can potentially give you more data (see the sketch after this example).
import numpy as np
# let's take an example audio file
fn_wav = 'piano.wav'
x, fs = librosa.load(fn_wav)
len_s = len(x) / fs
print('Duration of the audio file in seconds: {}'.format(len_s))
segment_len_s = 1
n_seg = int(np.floor(len_s / segment_len_s))
print('We get {} segments of {} s duration'.format(n_seg, segment_len_s))
segment_len_samples = int(segment_len_s * fs)
print("The segments are {} samples long".format(segment_len_samples))
# now let's collect the audio samples for each segment
segment_x = []
for s in range(n_seg):
    start_sample_index = segment_len_samples * s
    end_sample_index = start_sample_index + segment_len_samples
    segment_x.append(x[start_sample_index : end_sample_index])
segment_x = np.array(segment_x)
print("Now we have a matrix of shape {} which contains the audio samples for each segment in different rows.".format(segment_x.shape))
Duration of the audio file in seconds: 3.8458049886621315
We get 3 segments of 1 s duration
The segments are 22050 samples long
Now we have a matrix of shape (3, 22050) which contains the audio samples for each segment in different rows.
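As mentioned in the note above, overlapping segments can increase the amount of data. A minimal sketch, assuming a hop size of half a segment (the 50% overlap is an arbitrary choice for illustration):

hop_len_samples = segment_len_samples // 2  # assumption: 50% overlap between consecutive segments
n_seg_overlap = 1 + (len(x) - segment_len_samples) // hop_len_samples
segment_x_overlap = []
for s in range(n_seg_overlap):
    start = hop_len_samples * s
    segment_x_overlap.append(x[start : start + segment_len_samples])
segment_x_overlap = np.array(segment_x_overlap)
print("With 50% overlap we get {} segments instead of {}".format(n_seg_overlap, n_seg))

For this piano example, the hop of half a segment yields 6 segments instead of 3, since a new segment starts every 0.5 s rather than every 1 s.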
Easy example: Suppose we have 100 songs. We first shuffle them into a random order and then take the first 80% as training data and the remaining 20% as test data.
n_files = 100
song_id = np.arange(n_files)
print("All songs have a unique ID (number):", song_id)
# random shuffle
np.random.shuffle(song_id)
print("Now the IDs are randomized:", song_id)
percentage_training_set = 0.8 # we use 80% as training data
n_files_train = int(percentage_training_set*n_files)
print("We take {} files as training data".format(n_files_train))
song_id_train = song_id[:n_files_train]
# and the remaining ones as test:
song_id_test = song_id[n_files_train:]
print("Song IDs for the training set:", song_id_train)
print("Song IDs for the test set:", song_id_test)
# Now we can use the IDs to split our feature matrix (assuming we have 23 features for each of the 100 files)
# and class_id vectors into training and test set
feat_mat = np.random.randn(100, 23)
feat_mat_train = feat_mat[song_id_train, :]
feat_mat_test = feat_mat[song_id_test, :]
# dummy example: random class IDs between 0 and 5 (i.e. 6 classes)
class_id = np.round(5*np.random.rand(100)).astype(int)
print("Class ID", class_id)
class_id_train = class_id[song_id_train]
class_id_test = class_id[song_id_test]
print("Let's check the final shapes")
print(feat_mat_train.shape)
print(feat_mat_test.shape)
print(class_id_train.shape)
print(class_id_test.shape)
All songs have a unique ID (number): [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99]
Now the IDs are randomized: [53 35 78 81 36 7 76 91 3 60 86 8 93 56 32 26 61 82 0 11 37 21 31 67 24 19 42 89 98 79 4 1 49 94 85 28 27 75 74 41 44 71 14 96 20 99 6 83 68 43 65 47 45 63 90 64 92 62 46 16 73 66 12 38 29 40 88 54 69 18 34 48 22 70 15 55 77 57 95 84 5 87 80 10 33 97 13 52 58 30 39 51 2 23 9 50 25 59 17 72]
We take 80 files as training data
Song IDs for the training set: [53 35 78 81 36 7 76 91 3 60 86 8 93 56 32 26 61 82 0 11 37 21 31 67 24 19 42 89 98 79 4 1 49 94 85 28 27 75 74 41 44 71 14 96 20 99 6 83 68 43 65 47 45 63 90 64 92 62 46 16 73 66 12 38 29 40 88 54 69 18 34 48 22 70 15 55 77 57 95 84]
Song IDs for the test set: [ 5 87 80 10 33 97 13 52 58 30 39 51 2 23 9 50 25 59 17 72]
Class ID [0 2 2 1 3 4 1 5 2 5 1 3 1 3 4 5 4 1 1 2 5 4 2 4 3 1 1 1 4 1 4 4 1 4 0 2 0 2 4 2 1 4 0 3 4 0 1 4 3 4 2 4 3 0 2 2 0 4 4 1 4 0 5 2 4 2 1 3 4 5 1 3 4 2 3 1 5 2 2 3 2 0 3 1 4 4 1 2 1 3 4 5 1 3 1 3 0 2 0 1]
Let's check the final shapes
(80, 23)
(20, 23)
(80,)
(20,)
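The same kind of split can also be produced with scikit-learn, which pairs naturally with the metrics imported at the top of this notebook. A minimal sketch using train_test_split; the test_size, random_state, and the k-NN classifier are arbitrary choices for illustration, not a prescribed method:

from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# split features and class IDs in one call (80% train / 20% test, fixed seed for reproducibility)
feat_mat_train, feat_mat_test, class_id_train, class_id_test = train_test_split(
    feat_mat, class_id, test_size=0.2, random_state=42)

# illustrative classifier to show how the imported metrics are used
clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(feat_mat_train, class_id_train)
class_id_pred = clf.predict(feat_mat_test)
print("Accuracy:", accuracy_score(class_id_test, class_id_pred))
print(confusion_matrix(class_id_test, class_id_pred))

Note that the features here are random dummy values, so the accuracy will be at chance level; the point is only the shape of the pipeline.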
- Tempogram representation (tempo vs. time)
- Onset detection
- Beat and tempo
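The three topics listed above map directly onto librosa utilities. A minimal sketch, reusing the x and fs loaded from the piano example earlier (all parameters are librosa defaults; this is an illustration, not the course's prescribed settings):

# onset strength envelope: the basis for onset detection, beat tracking, and the tempogram
onset_env = librosa.onset.onset_strength(y=x, sr=fs)

# onset detection: frame indices of detected note onsets
onset_frames = librosa.onset.onset_detect(onset_envelope=onset_env, sr=fs)
print("Detected {} onsets".format(len(onset_frames)))

# beat and tempo: global tempo estimate (in BPM) and beat positions
tempo, beat_frames = librosa.beat.beat_track(onset_envelope=onset_env, sr=fs)
print("Estimated global tempo (BPM):", tempo)

# tempogram: local autocorrelation of the onset envelope (tempo vs. time)
tempogram = librosa.feature.tempogram(onset_envelope=onset_env, sr=fs)
print("Tempogram shape (tempo bins x frames):", tempogram.shape)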
| Section | Purpose / Content |
|---|---|
| Abstract | Very compact summary of your research report: What is the topic / research field? Give a brief motivation. Which methods were applied / compared? What are the main results and conclusions? |
| Introduction | - Classify the research topic within a superordinate field of research.<br>- Introduce / motivate the problem, mention possible application scenarios.<br>- What makes the problem challenging / hard / interesting to look into?<br>- Briefly list / summarize the main contributions of the paper. |
| Related Work | - Summarize and cluster (multiple) related publications (journal articles, conference papers, books) by outlining the main underlying research approaches to the existing problem (don't just go through paper 1, paper 2, etc.).<br>- Compare and contrast: explain how other approaches differ from your approach and which other approaches your work builds upon. |
| Proposed Method | - Explain your proposed method in detail; consider presenting a flow chart that summarizes the overall workflow.<br>- (Individual steps of your flow chart can guide the choice of subsections.) |
| Evaluation | - Possible first subsection: Dataset / Annotation (explain the source, content, and type of annotations).<br>- Explain the evaluation procedure (dataset split, evaluation metrics).<br>- (If you perform multiple experiments, this can guide the choice of subsections.) |
| Results | Summarize the main results (tables, figures). |
| Conclusions | - Summarize the overall result of your paper (list again the contributions and main findings from the experiments).<br>- Optional: provide an outlook on future work. |