Calculates and visualizes the temporal domain and frequency domain mean squared error of ffmpeg audio filters.
- Install ffmpeg
- Install required Python packages (`pip3 install -r requirements.txt`)
- Create `original_audio` directory
- Put audio in `original_audio` directory
- Run `convert.sh` if audio files are not in .wav format
- Run `apply_filters.py` to create filtered audio files
- Run `mse.py` to calculate the mean square error for each filter
- Run `visualize.py` to visualize the results as bar graphs

Converts any audio files in `original_audio` to .wav files using the default ffmpeg conversion and deletes the originals.

`./convert.sh`

JSON formatted list of ffmpeg audio filters

Configuration file for `apply_filters.py` and `mse.py`

key | default | description |
---|---|---|
filters_filename | filters.json | filename of JSON formatted filters list in |
segment_len | 262144 | number of audio samples in each analyzed segment |
sample_skips | 262144 | number of samples skipped between beginnings of analyzed segments |
bit_depth | 16 | bit depth of analyzed audio |
original_audio_dir | original_audio | relative path to search for original audio |
filtered_audio_dir | filtered_audio | relative path of filtered audio |
output_filename | output.json | filename of JSON formatted mean square error output |

Defines `CONFIG_FILENAME`, the `Config` class, and the associated JSON loader function (`load_config`).
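
As an illustration of how such a module could look, here is a minimal sketch. The field names and defaults are taken from the configuration table; the value of `CONFIG_FILENAME`, the use of a dataclass, and the rejection of unknown keys are assumptions, not confirmed details of the actual `config.py`.

```python
import json
from dataclasses import dataclass, fields

# Assumed value: the README does not state the actual filename.
CONFIG_FILENAME = "config.json"


@dataclass
class Config:
    # Defaults copied from the configuration table.
    filters_filename: str = "filters.json"
    segment_len: int = 262144
    sample_skips: int = 262144
    bit_depth: int = 16
    original_audio_dir: str = "original_audio"
    filtered_audio_dir: str = "filtered_audio"
    output_filename: str = "output.json"


def load_config(path: str = CONFIG_FILENAME) -> Config:
    """Read a JSON object and overlay it on the defaults (assumed behavior)."""
    with open(path, encoding="utf-8") as handle:
        raw = json.load(handle)
    known = {field.name for field in fields(Config)}
    unknown = set(raw) - known
    if unknown:
        # Failing loudly on typos is a design choice of this sketch.
        raise KeyError(f"unknown config keys: {sorted(unknown)}")
    return Config(**raw)
```

With this shape, a configuration file only needs to list the keys that differ from the defaults, for example `{"segment_len": 65536}`.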

Loads configuration from `CONFIG_FILENAME`, applies the list of ffmpeg audio filters from `filters_filename` to the .wav files in `original_audio_dir`, and writes the resulting audio files to `filtered_audio_dir`.

`python3 apply_filters.py`
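
The filtering step described above can be sketched roughly as follows. This assumes each filter is a string that ffmpeg accepts after its `-af` option; the helper names and the output naming scheme (`<stem>_<filter name>.wav`) are hypothetical and may differ from what `apply_filters.py` actually does.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(src: Path, dst: Path, audio_filter: str) -> list[str]:
    """Return an ffmpeg invocation that applies one audio filter via -af."""
    # -y overwrites an existing output file without prompting.
    return ["ffmpeg", "-y", "-i", str(src), "-af", audio_filter, str(dst)]


def output_path(src: Path, filtered_dir: Path, audio_filter: str) -> Path:
    """Hypothetical naming scheme: <stem>_<filter name>.wav."""
    filter_name = audio_filter.split("=", 1)[0]
    return filtered_dir / f"{src.stem}_{filter_name}.wav"


def apply_filters(original_dir: Path, filtered_dir: Path, filters: list[str]) -> None:
    """Run every filter over every .wav file found in original_dir."""
    filtered_dir.mkdir(parents=True, exist_ok=True)
    for src in sorted(original_dir.glob("*.wav")):
        for audio_filter in filters:
            dst = output_path(src, filtered_dir, audio_filter)
            # check=True raises CalledProcessError if ffmpeg exits non-zero.
            subprocess.run(build_ffmpeg_command(src, dst, audio_filter), check=True)
```

Keeping command construction separate from execution makes it possible to inspect or log the exact ffmpeg command without running it.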

Loads configuration from `CONFIG_FILENAME` and calculates the average MSE over segments of length `segment_len` in the temporal domain and frequency domain (DCT-II) between original audio segments and their filtered counterparts. The resulting MSEs are written to `output_filename` in JSON format.

`python3 mse.py`
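
To make the calculation concrete, here is a small standard-library sketch of the idea. It assumes that `sample_skips` is the stride between segment starts, that only full-length segments are analyzed, and that the DCT-II is unnormalized; none of these details are confirmed by the README. The direct O(N²) transform is only practical for short segments, and a real script would more likely use a fast implementation such as `scipy.fft.dct`.

```python
import math
from typing import Sequence


def dct2(x: Sequence[float]) -> list[float]:
    """Unnormalized DCT-II: X[k] = sum_n x[n] * cos(pi / N * (n + 0.5) * k)."""
    size = len(x)
    return [
        sum(v * math.cos(math.pi / size * (n + 0.5) * k) for n, v in enumerate(x))
        for k in range(size)
    ]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def average_mse(original, filtered, segment_len: int, sample_skips: int):
    """Return (temporal, frequency) MSE averaged over all analyzed segments."""
    total = min(len(original), len(filtered))
    temporal, frequency = [], []
    # Segment starts are sample_skips apart (assumed meaning of the option).
    for start in range(0, total - segment_len + 1, sample_skips):
        o = original[start:start + segment_len]
        f = filtered[start:start + segment_len]
        temporal.append(mse(o, f))
        frequency.append(mse(dct2(o), dct2(f)))
    if not temporal:
        raise ValueError("audio is shorter than one segment")
    return sum(temporal) / len(temporal), sum(frequency) / len(frequency)
```

Because the transform here is unnormalized, the frequency-domain MSE is not on the same scale as the temporal one; an orthonormal DCT would make the two equal by Parseval's theorem.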

Loads configuration from `CONFIG_FILENAME`, reads the MSE outputs from `output_filename`, and plots the results as bar graphs.

The following results were calculated from 3 hours of audio extracted from a Twitch VOD.
filter | MSE (temporal domain) | MSE (frequency domain) |
---|---|---|
acompressor | 419.5447047722049 | 769325.3616135248 |
acrusher | 128.31195087665463 | 788.4744883700115 |
aecho | 1973.808181613829 | 11476890.952585308 |
aphaser | 2140.157159476164 | 7514830.79328153 |
alimiter | 1589.4807644937096 | 33103402.4035865 |
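
As a rough illustration of the bar-graph step, the values in the table above can be plotted with matplotlib as sketched below. The `RESULTS` dictionary layout, the helper names, and the logarithmic y-axis are choices made for this example and are not taken from `visualize.py`.

```python
# Values copied from the results table: (temporal MSE, frequency MSE).
RESULTS = {
    "acompressor": (419.5447047722049, 769325.3616135248),
    "acrusher": (128.31195087665463, 788.4744883700115),
    "aecho": (1973.808181613829, 11476890.952585308),
    "aphaser": (2140.157159476164, 7514830.79328153),
    "alimiter": (1589.4807644937096, 33103402.4035865),
}


def ranked(results: dict, column: int) -> list:
    """Sort filter names by one MSE column (0 temporal, 1 frequency), ascending."""
    return sorted(results, key=lambda name: results[name][column])


def plot(results: dict) -> None:
    """Draw one bar graph per domain, side by side."""
    # Imported lazily so the data helpers work without matplotlib installed.
    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(1, 2, figsize=(10, 4))
    titles = ("MSE (temporal domain)", "MSE (frequency domain)")
    for column, (ax, title) in enumerate(zip(axes, titles)):
        names = ranked(results, column)
        ax.bar(names, [results[name][column] for name in names])
        # Log scale because the values span several orders of magnitude.
        ax.set_yscale("log")
        ax.set_title(title)
        ax.tick_params(axis="x", rotation=45)
    fig.tight_layout()
    plt.show()
```

Calling `plot(RESULTS)` opens a window with the two charts; `ranked(RESULTS, 0)` alone gives the filters ordered from least to most temporal-domain error.
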
- Add more audio filters
- Add better documentation for example results