Function `par_interpolate` has surprising behavior for small domain sizes. In particular, it is faster when (some of) its subroutines run sequentially.

There is a lot of potential for optimization here. In general it is fine to rely on dispatcher methods that choose the asymptotically or concretely superior algorithm depending on some threshold, but on parallel hardware we ideally want hardcoded thresholds to be independent of the number of cores/threads. Calling `available_parallelism` and basing the decision on its result is allowed. This task involves finding the optimal cascade of specialized functions and the optimal dispatch criteria.
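A minimal sketch of the dispatch pattern in question, under stated assumptions: it uses rayon for data parallelism, plain `f64` instead of the library's field elements, naive Lagrange interpolation as a stand-in for the real subroutines, and a made-up cutoff constant. None of this is the library's actual code; the point is only that the threshold is expressed per core, so the hardcoded constant stays independent of the core count.

```rust
use std::thread::available_parallelism;

use rayon::prelude::*;

/// Naive O(n^2) Lagrange interpolation evaluated at x = 0, sequential version.
fn lagrange_at_zero_sequential(domain: &[f64], values: &[f64]) -> f64 {
    (0..domain.len())
        .map(|i| {
            let mut term = values[i];
            for j in 0..domain.len() {
                if i != j {
                    term *= (0.0 - domain[j]) / (domain[i] - domain[j]);
                }
            }
            term
        })
        .sum()
}

/// The same computation with the outer loop parallelized via rayon.
fn lagrange_at_zero_parallel(domain: &[f64], values: &[f64]) -> f64 {
    (0..domain.len())
        .into_par_iter()
        .map(|i| {
            let mut term = values[i];
            for j in 0..domain.len() {
                if i != j {
                    term *= (0.0 - domain[j]) / (domain[i] - domain[j]);
                }
            }
            term
        })
        .sum()
}

/// Dispatch on work per core rather than on a fixed domain size, so the
/// hardcoded constant does not bake in an assumption about the core count.
fn lagrange_at_zero(domain: &[f64], values: &[f64]) -> f64 {
    // Assumed cutoff; the right value would come from benchmarking.
    const POINTS_PER_CORE_CUTOFF: usize = 1 << 10;
    let cores = available_parallelism().map(|n| n.get()).unwrap_or(1);
    if domain.len() < cores * POINTS_PER_CORE_CUTOFF {
        lagrange_at_zero_sequential(domain, values)
    } else {
        lagrange_at_zero_parallel(domain, values)
    }
}
```

The design choice illustrated here is that the crossover point scales with `available_parallelism`, so the same constant remains reasonable on a 4-core laptop and a 64-core server; the actual cutoffs and the cascade of specialized routines still have to be found by benchmarking.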
Adjust benchmark size to reveal the asymptotic benefit of using
`par_batch_evaluate` over naive parallelization over the domain.

See #227.

Co-authored-by: Alan Szepieniec <alan@neptune.cash>
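For context, a sketch of what "naive parallelization over the domain" means here, assuming rayon and plain `f64` coefficients rather than the library's field elements: each domain point is evaluated independently by Horner's rule, for O(n·d) total work over n points and degree d. A dedicated batch-evaluation routine such as `par_batch_evaluate` can beat this asymptotically, but that advantage only shows up once the benchmark sizes are large enough.

```rust
use rayon::prelude::*;

/// Evaluate the polynomial at every domain point, points split across cores.
/// Total work is O(n * d), independent of how many cores share it.
fn naive_par_evaluate(coefficients: &[f64], domain: &[f64]) -> Vec<f64> {
    domain
        .par_iter()
        .map(|&x| {
            // Horner's rule: fold from the highest-degree coefficient down.
            coefficients.iter().rev().fold(0.0, |acc, &c| acc * x + c)
        })
        .collect()
}
```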