Skip to content

Latest commit

 

History

History
59 lines (53 loc) · 2.28 KB

readme.md

File metadata and controls

59 lines (53 loc) · 2.28 KB

Artist20 Singer Prediction

Description

This repository includes the project for the first homework of the course "Deep Learning for Music Analysis and Generation" lectured by Prof. Yang at the National Taiwan University. The main goals of this work is to train a singer classification model on the Artist20 dataset. Given an audio segment, the model should predict the top 3 highest similarity artist within the 20 different singers in the dataset.

Create Environment

conda create env -f environment.yml
conda activate sing_id

Preprocess Data with Demucs

Please follow the instructions in the 'Calling From Another Python Program' section to process all the audios. https://github.com/facebookresearch/demucs#calling-from-another-python-program

The file structure should be something similar like this:

./dataset
    |- test/
        |- 0001/
            |- vocals.mp3
        |- 0002/
            |- vocals.mp3
        |- 0003/
            |- vocals.mp3
        ...
    | - train/
        |- aerosmith/
            |- Aerosmith/
                |- 01-Make_it/
                    |- vocals.mp3
        |- $singer_n/
            |- $album_n/
                |- $song_n/
                    |- $audios_n
        ...
    | - valid/
        |- $singer_n/
            |- $album_n/
                |- $song_n/
                    |- $audios_n
        ...
        

Inference the Audios

  • Please download the model weights from Google Drive: Link
  • Please download the singer anchors from Google Drive: Link
  • Inference the model with the following command:
python -m inference \
   --anchor_path="./singer_samples.pickle" \ # the path to singer anchors
   --weight_path="./model_weight.pt" \ # the path to the weights
   --out_path="./output.csv" \ # the path to dump the inference results
   --root_dir="./dataset/test/" \ # the root folder path
   --glob_exp="*/vocals.mp3" \ # the glob expression to search the root folder
   --duration=20 \ # the duration of audio segment to inference 
   --batch_size=32 # the maximum batch size