Issue

As of v0.4.0, exporting seems to take about:
- 10 seconds for a very simple test file with about 1,000 notes (test.py, included in the repository);
- 467 seconds (almost 8 minutes!) for the Note Block Megacollab file with 250k+ notes.
See screenshots below for a snakeviz profiling graph for these two operations (the .prof files out of cProfile are also attached here: nbswave_profile.zip):
Test file (1k notes):
Megacollab (250k notes):
This can be made a heck of a lot better.
From the screenshots above, you can see that, when there aren't many notes to place, most of the time is spent loading the sound files. And, when the bulk of the operation becomes placing notes, a lot of time is spent in the audio manipulation operations, particularly panning and volume (which, as we'll see, are simply array multiplications). This indicates there are potential optimizations to make both in loading sounds and in the mixing steps themselves.
Reason
Looking at jiaaro/pydub#725, many operations in pydub are implemented using the now deprecated, to-be-removed audioop module. Although it requires no external dependencies, it's extremely inefficient -- and, no wonder, takes up most of the export time.
nbswave already bypasses pydub in the mixing implementation -- we implement our own here using numpy operations, since it's a lot more efficient than the alternative implemented by pydub (see my 2021 issue about this: jiaaro/pydub#550).
The audio engine implementation done for the future Python NBS rewrite has also shown that many operations nbswave relies on are really slow in pydub. As such, pydub was entirely replaced in the audio module with other tools. In the next section, we'll briefly discuss those implementations and how they could be brought here to make export performance much better. Most of them leverage numpy, which is already a dependency of this package. If we can rely on numpy enough to bypass pydub's operations, it may even be possible to remove pydub from nbswave's dependencies completely.
Optimizations to make
Loading sounds
Current solution: pydub.AudioSegment.from_file
Proposed solution: the soundfile package
Reason: The former launches an ffmpeg subprocess and takes seconds, while the latter calls libsndfile via CFFI, which is capable of loading all sounds in a fraction of a second. Implemented here.
Volume
Current solution: pydub.AudioSegment.apply_gain -> audioop.mul
Proposed solution: numpy
Reason: One array multiplication with numpy does the trick. Implemented here.
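A minimal sketch of that multiplication (the function name is my own; it assumes float samples and a gain given in dB, as pydub's apply_gain takes):

```python
import numpy as np

def apply_gain(samples: np.ndarray, gain_db: float) -> np.ndarray:
    """Scale every sample by a dB gain -- one vectorized multiply."""
    factor = 10.0 ** (gain_db / 20.0)  # dB -> linear amplitude factor
    return samples * factor

# -20 dB corresponds to 1/10 of the amplitude
quieter = apply_gain(np.array([0.5, -0.5]), -20.0)
```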
Panning
Current solution: pydub.AudioSegment.pan -> audioop.tostereo and audioop.mul
Proposed solution: numpy
Reason: Requires two array slice multiplications, one for each channel. It's really easy to calculate the gain boost and cut for each channel from the panning value; we've implemented it here.
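A sketch of the two slice multiplications, assuming interleaved stereo stored as an (n, 2) float array; the simple linear panning law below is only illustrative and not necessarily the exact curve pydub or nbswave uses:

```python
import numpy as np

def pan_stereo(samples: np.ndarray, pan: float) -> np.ndarray:
    """Pan an (n, 2) stereo buffer; pan in [-1, 1], negative = left."""
    out = samples.astype(np.float64, copy=True)
    out[:, 0] *= 1.0 - max(pan, 0.0)  # cut the left channel when panning right
    out[:, 1] *= 1.0 + min(pan, 0.0)  # cut the right channel when panning left
    return out

stereo = np.ones((4, 2))
hard_right = pan_stereo(stereo, 1.0)  # left channel fully silenced
```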
Pitch
Current solution: pydub.AudioSegment._spawn -> audioop.ratecv
Proposed solution: libsamplerate
Reason: There are entire libraries dedicated to resampling audio while retaining quality, some aimed at real-time processing (e.g. OpenAL) and others not (e.g. librosa). audioop, however, is miserable at this.
This article presents a comparison between a few of them. In my own research, I've concluded that resampy and samplerate excel at this. resampy uses scipy and numba to accelerate processing, while samplerate wraps the widely known "Secret Rabbit Code" (libsamplerate, written in C), using pybind11 to interface with it directly (meaning: it is FAST). There's also librosa with its resample function, though its overhead is much larger, and scipy.signal.resample, but I'd rather not include the entirety of scipy to use one function out of it :D
Here is an implementation using libsamplerate, which should be ported here. The implementation prior to this commit used the real-time API to process slices of each playing sound on-demand, but our implementation here doesn't need this -- it's literally one function call, no callbacks or any of that monstrosity.
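To illustrate the shape of that one function call, here's a naive linear-interpolation resampler in plain numpy -- a quality-wise poor stand-in for libsamplerate's sinc filters, purely for illustration (names and the ratio convention are my own):

```python
import numpy as np

def change_speed(samples: np.ndarray, pitch_ratio: float) -> np.ndarray:
    """Naive resampler: raising pitch by `pitch_ratio` shortens the sound.

    Real code should call libsamplerate (e.g. via the `samplerate`
    package) for proper anti-aliasing; np.interp is only a sketch.
    """
    n_out = int(round(len(samples) / pitch_ratio))
    new_positions = np.linspace(0, len(samples) - 1, n_out)
    return np.interp(new_positions, np.arange(len(samples)), samples)

sound = np.sin(np.linspace(0, 20, 1000))
octave_up = change_speed(sound, 2.0)  # one octave up -> half the samples
```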
Order of operations
When this package was made, it was assumed that resampling (necessary to apply pitch) would be the most computationally expensive operation, since it requires running costly signal interpolation filters.
That would most likely be true if the other operations (panning and volume) were optimized as much as they could be, since they consist entirely of basic array multiplications -- but in their current state, they aren't. To take advantage of this (non-)fact, the implementation applies pitch (resampling) first, then caches the result to reuse it when applying panning and velocity. Since those are simple multiplication operations, they aren't expected to take long; alas, here we are.
Here's the bit of code that does this (nbswave/nbswave/main.py, lines 155 to 209 in 8b6f4a1):

```python
                f"The sound file for instrument {ins_name} was not found: {ins_file}"
            )
        else:
            continue

    if sound1 is None:  # Sound file not assigned
        continue

    sound1 = audio.sync(sound1)

    if key != last_key:
        last_vol = None
        last_pan = None
        pitch = audio.key_to_pitch(key)
        sound2 = audio.change_speed(sound1, pitch)

    if vol != last_vol:
        last_pan = None
        gain = audio.vol_to_gain(vol)
        sound3 = sound2.apply_gain(gain)

    if pan != last_pan:
        sound4 = sound3.pan(pan)

    sound = sound4

    last_ins = ins
    last_key = key
    last_vol = vol
    last_pan = pan
```
So the slowness of the panning and gain functions is amplified by this design decision. After implementing the other optimizations, it's wise to check whether these avoidances are working as intended and actually reducing export time (as opposed to simply applying all operations to all notes). That said, I believe this caching's potential will really shine once resampling becomes the most costly operation, as originally expected.
Summary
All of the operations to be replaced were already implemented in a past version of the NewNBS audio engine, before OpenAL was used. Their respective source code was presented here in each section, so it's only a matter of bringing the implementations here.
Finally, here's the entire history of the audio.py module -- it's so precious to see how many iterations we've gone through to just land on OpenAL at the end!! The good thing is, we can use everything we learned there to make audio processing more efficient here, so it's a win-win :)
With these implementations, I estimate nbswave can export up to 60–80% faster than it can now. :)