Paper: Transferable Models for Bioacoustics with Human Language Supervision (arXiv)
Use the model on Hugging Face: davidrrobinson/BioLingual (https://huggingface.co/davidrrobinson/BioLingual)
BioLingual is a language-audio model for bioacoustics, useful for zero-shot audio classification and sound detection, text-to-audio search, or for fine-tuning on new bioacoustic tasks.
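As a rough illustration of the zero-shot use case, the sketch below scores one clip against free-text labels. It assumes the checkpoint loads through the `zero-shot-audio-classification` pipeline in Hugging Face `transformers`, which this README does not confirm. The helper names `classify_zero_shot` and `top_label`, the file name, and the candidate labels are illustrative only.

```python
def classify_zero_shot(audio, candidate_labels, model_id="davidrrobinson/BioLingual"):
    """Score one audio clip against free-text labels.

    `audio` may be a local file path, a URL, or a 1-D NumPy waveform,
    as accepted by the transformers pipeline.
    """
    # Imported lazily so the helper below stays usable without transformers installed.
    from transformers import pipeline

    # Assumption: the checkpoint is compatible with this pipeline task.
    classifier = pipeline(task="zero-shot-audio-classification", model=model_id)
    # Returns a list of {"score": float, "label": str} dicts, one per candidate label.
    return classifier(audio, candidate_labels=candidate_labels)


def top_label(results):
    """Return the label with the highest score from the pipeline output."""
    return max(results, key=lambda r: r["score"])["label"]


# Example call (downloads the model on first use; file name and labels are hypothetical):
# results = classify_zero_shot(
#     "recording.wav",
#     ["Sound of a sperm whale", "Sound of a sea lion"],
# )
# print(top_label(results))
```

Phrasing labels as short captions ("Sound of a ...") rather than bare class names is a common choice for language-audio models, since the text side was trained on captions. Whether it helps for a given task is worth checking empirically.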
AnimalSpeak is a large-scale language-audio dataset used to train BioLingual, created by captioning bioacoustic archives including Xeno-canto and iNaturalist.
To recreate the BEANS benchmarking results from the paper:
pip install -r requirements.txt
cd beans
Follow the instructions in beans/README.MD (README.MD in the current directory after the cd above) to download the datasets
python run_benchmark.py
The AnimalSpeak dataset is released at https://huggingface.co/datasets/davidrrobinson/AnimalSpeak
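One possible way to inspect the dataset without downloading it in full is to stream it with the Hugging Face `datasets` library. This is a minimal sketch that assumes the repository can be read by `load_dataset` with default settings. The split and column names are not stated here, so the code takes whichever split comes first and prints rows as-is. The helper names `preview` and `stream_animalspeak` are illustrative.

```python
from itertools import islice


def preview(rows, n=3):
    """Return the first `n` items of any iterable of rows as a list."""
    return list(islice(rows, n))


def stream_animalspeak(repo_id="davidrrobinson/AnimalSpeak"):
    """Open the dataset in streaming mode and return its first split."""
    from datasets import load_dataset  # lazy import; needs `pip install datasets`

    # Without a `split` argument this returns a dict-like object keyed by split name.
    splits = load_dataset(repo_id, streaming=True)
    # Split names are an unknown here, so take whichever split comes first.
    return next(iter(splits.values()))


# Example call (needs network access):
# for row in preview(stream_animalspeak()):
#     print(row)
```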
We thank the authors of CLAP and BEANS, on whose open-source code much of this repository is based.