DiffTransfer

Timbre Transfer Using Image-to-Image Denoising Diffusion Implicit Models

Accompanying code for the paper "Timbre transfer using image-to-image denoising diffusion implicit models" [1].

For any questions, please write to luca.comanducci@polimi.it.

Dependencies

TensorFlow (>2.11), librosa, pretty_midi, numpy, essentia, frechet_audio_distance
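
These can typically be installed from PyPI; a hedged sketch (the package names below are the usual PyPI identifiers, and essentia in particular may need a platform-specific install):

```
pip install "tensorflow>2.11" librosa pretty_midi numpy essentia frechet_audio_distance
```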

Data generation

The model is trained using the StarNet dataset, freely available on Zenodo.
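
Since the model works image-to-image, each track must first be turned into a spectrogram "image". Below is a minimal illustrative sketch assuming a normalized log-mel representation computed with librosa; the repository's actual representation and parameter values are defined in audio_utils.py and params.py:

```python
import librosa
import numpy as np

def audio_to_log_mel(path, sr=22050, n_fft=1024, hop_length=256, n_mels=128):
    """Load a track and return a normalized log-mel 'image' (illustrative parameters)."""
    y, _ = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=n_fft,
                                         hop_length=hop_length, n_mels=n_mels)
    log_mel = librosa.power_to_db(mel, ref=np.max)
    # Scale to [0, 1] so the diffusion model sees image-like inputs.
    return (log_mel - log_mel.min()) / (log_mel.max() - log_mel.min() + 1e-8)
```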

Network training

  • audio_utils.py --> Contains shared audio utilities and functions
  • params.py --> Contains parameters shared across the scripts
  • network_lib_attention.py --> Contains the Denoising Diffusion Implicit Model implementation (a sketch of the core DDIM update step follows this list)
  • DiffTransfer.py --> Runs the training (see the example invocation after this list); takes the following arguments:
    • dataset_train_path: String, path to the training data
    • desired_instrument: String, name of the desired output instrument
    • conditioning_instrument: String, name of the input instrument
    • GPU: index of the GPU to use, in case you have multiple ones
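
An example invocation follows; the flag names mirror the argument list above, while the paths and instrument names are placeholders, so check the argument parsing in DiffTransfer.py for the exact accepted values:

```
python DiffTransfer.py \
    --dataset_train_path /path/to/starnet \
    --desired_instrument strings \
    --conditioning_instrument clarinet \
    --GPU 0
```

For reference, the core of a DDIM sampler is the deterministic update below. This is the textbook DDIM step (with eta = 0), shown as a minimal TensorFlow sketch; it is not a copy of the code in network_lib_attention.py:

```python
import tensorflow as tf

def ddim_step(x_t, eps_pred, alpha_bar_t, alpha_bar_prev):
    """One deterministic DDIM update (eta = 0): x_t -> x_{t-1}.

    x_t: current noisy spectrogram batch
    eps_pred: noise predicted by the network at timestep t
    alpha_bar_t / alpha_bar_prev: cumulative noise-schedule products
    """
    # Estimate the clean image implied by the predicted noise.
    x0_pred = (x_t - tf.sqrt(1.0 - alpha_bar_t) * eps_pred) / tf.sqrt(alpha_bar_t)
    # Move the estimate back to the previous, less noisy timestep.
    return tf.sqrt(alpha_bar_prev) * x0_pred + tf.sqrt(1.0 - alpha_bar_prev) * eps_pred
```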

Results computation

The following scripts compute the objective metrics and listening-test results reported in the paper:

  • compute_eval_tracks_mixture.py
  • compute_eval_tracks_separate.py
  • compute_frechet.py (an illustrative FAD snippet follows this list)
  • compute_jaccard.py
  • compute_listening_test_results.py
  • preprocess_tracks_listening_test.py
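
As an example, compute_frechet.py evaluates audio quality via the Fréchet Audio Distance; a minimal usage sketch of the frechet_audio_distance package is shown below (the directory names are placeholders, and constructor arguments may differ between package versions):

```python
from frechet_audio_distance import FrechetAudioDistance

# VGGish-embedding FAD; lower scores mean the generated audio is
# statistically closer to the reference set.
fad = FrechetAudioDistance(model_name="vggish", use_pca=False, use_activation=False)
score = fad.score("reference_audio/", "generated_audio/")
print(f"FAD: {score:.4f}")
```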

References

[1] Comanducci, Luca, Fabio Antonacci, and Augusto Sarti. "Timbre transfer using image-to-image denoising diffusion implicit models." Proceedings of the 24th International Society for Music Information Retrieval Conference (ISMIR), 2023.
