Function `par_interpolate` has surprising behavior for small domain sizes. In particular, it is faster when (some of) its subroutines run sequentially.

There is a lot of potential for optimization here. In general it is fine to rely on dispatcher methods that choose the asymptotically or concretely superior algorithm depending on some threshold, but on parallel hardware we ideally want hardcoded thresholds to be independent of the number of cores/threads. Calling `available_parallelism` and basing the decision on its result is allowed. This task involves finding the optimal cascade of specialized functions and the optimal dispatch criteria.
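A minimal sketch of the dispatch pattern in question, under stated assumptions: it uses rayon for data parallelism, plain `f64` instead of the library's field elements, naive Lagrange interpolation as a stand-in for the real subroutines, and a made-up cutoff constant. None of this is the library's actual code; the point is only that the threshold is expressed per core, so the hardcoded constant stays independent of the core count.

```rust
use std::thread::available_parallelism;

use rayon::prelude::*;

/// Naive O(n^2) Lagrange interpolation evaluated at x = 0, sequential version.
fn lagrange_at_zero_sequential(domain: &[f64], values: &[f64]) -> f64 {
    (0..domain.len())
        .map(|i| {
            let mut term = values[i];
            for j in 0..domain.len() {
                if i != j {
                    term *= (0.0 - domain[j]) / (domain[i] - domain[j]);
                }
            }
            term
        })
        .sum()
}

/// The same computation with the outer loop parallelized via rayon.
fn lagrange_at_zero_parallel(domain: &[f64], values: &[f64]) -> f64 {
    (0..domain.len())
        .into_par_iter()
        .map(|i| {
            let mut term = values[i];
            for j in 0..domain.len() {
                if i != j {
                    term *= (0.0 - domain[j]) / (domain[i] - domain[j]);
                }
            }
            term
        })
        .sum()
}

/// Dispatch on work per core rather than on a fixed domain size, so the
/// hardcoded constant does not bake in an assumption about the core count.
fn lagrange_at_zero(domain: &[f64], values: &[f64]) -> f64 {
    // Assumed cutoff; the right value would come from benchmarking.
    const POINTS_PER_CORE_CUTOFF: usize = 1 << 10;
    let cores = available_parallelism().map(|n| n.get()).unwrap_or(1);
    if domain.len() < cores * POINTS_PER_CORE_CUTOFF {
        lagrange_at_zero_sequential(domain, values)
    } else {
        lagrange_at_zero_parallel(domain, values)
    }
}
```

The design choice illustrated here is that the crossover point scales with `available_parallelism`, so the same constant remains reasonable on a 4-core laptop and a 64-core server; the actual cutoffs and the cascade of specialized routines still have to be found by benchmarking.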
Adjust benchmark size to reveal the asymptotic benefit of using
`par_batch_evaluate` over naive parallelization over the domain.

See #227.

Co-authored-by: Alan Szepieniec <alan@neptune.cash>
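For context, a sketch of what "naive parallelization over the domain" means here, assuming rayon and plain `f64` coefficients rather than the library's field elements: each domain point is evaluated independently by Horner's rule, for O(n·d) total work over n points and degree d. A dedicated batch-evaluation routine such as `par_batch_evaluate` can beat this asymptotically, but that advantage only shows up once the benchmark sizes are large enough.

```rust
use rayon::prelude::*;

/// Evaluate the polynomial at every domain point, points split across cores.
/// Total work is O(n * d), independent of how many cores share it.
fn naive_par_evaluate(coefficients: &[f64], domain: &[f64]) -> Vec<f64> {
    domain
        .par_iter()
        .map(|&x| {
            // Horner's rule: fold from the highest-degree coefficient down.
            coefficients.iter().rev().fold(0.0, |acc, &c| acc * x + c)
        })
        .collect()
}
```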