You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
fiat-crypto has neat formally verified/auto-generated field arithmetic that can be used to replace the fiddly bits of the internal/field package. It would be nice to be able to use it, since it would let me feel less scared of exposing the package, and it's a nice checkbox to have.
Actually doing the switch requires the fiat-crypto performance to not be horrifically bad, since the current code works (and has been reviewed externally for correctness). Benchmarks from Novi's curve25519-dalek fork (that has been upstreamed) documentation indicate that a 8-15% performance degradation is to be expected, but the actual observed results are significantly worse.
Write an initial naive experimental branch to get ballpark benchmarks.
Be really really really sad about performance.
Try to figure out why performance is, significantly worse, when the rust code is just "slightly worse".
CarryMul is ridiculously slow because of addcarryxU64
CarrySquare is ridiculously slow because of addcarryxU64
Go's shit inliner strikes again. The rust code #[inline]s fiat_25519_carry_mul among other things. Carry is over the inliner budget.
Try to reduce the performance penalty further
Switch CarrySquare over to CarryPow2k
Add Add/Sub/Opp + Carry
Add Add/Sub + CarryMul/CarrySquare
Fix X25519 performance
Benchmark on 32-bit ARM or something.
Current status (2021/08/09)
The upstream fiat-crypto developers were kind enough to fold in some performance related changes (CarryMul/CarrySquare performance fixed, Add/Sub/Opp + Carry added).
My branch still adds CarryPow2k and Add/Sub + CarryMul/CarrySquare, which I suppose I can ask about.
Since the branch is close enough to what I envision a switch would look like, these are the current rough benchmarks:
Note: The massive regression for X25519 ScalarMult is due to the removal of assembly. Sufficiently recent x/crypto/curve25519 does away with it as well, and clocks in at ~107us/op (~161us/op purego).
This is getting to "an acceptable slowdown" if people think that using the fiat code is better over the code that came from dalek, but the issue of having to ship modified routines from the 64/curve25519 implementation remains.
The text was updated successfully, but these errors were encountered:
fiat-crypto has neat formally verified/auto-generated field arithmetic that can be used to replace the fiddly bits of the internal/field package. It would be nice to be able to use it, since it would let me feel less scared of exposing the package, and it's a nice checkbox to have.
Actually doing the switch requires the fiat-crypto performance to not be horrifically bad, since the current code works (and has been reviewed externally for correctness). Benchmarks from Novi's curve25519-dalek fork (that has been upstreamed) documentation indicate that a 8-15% performance degradation is to be expected, but the actual observed results are significantly worse.
CarryMul
is ridiculously slow because ofaddcarryxU64
CarrySquare
is ridiculously slow because ofaddcarryxU64
#[inline]
sfiat_25519_carry_mul
among other things.Carry
is over the inliner budget.CarrySquare
over toCarryPow2k
Add
/Sub
/Opp
+Carry
Add
/Sub
+CarryMul
/CarrySquare
Current status (2021/08/09)
The upstream fiat-crypto developers were kind enough to fold in some performance related changes (
CarryMul
/CarrySquare
performance fixed,Add
/Sub
/Opp
+Carry
added).My branch still adds
CarryPow2k
andAdd
/Sub
+CarryMul
/CarrySquare
, which I suppose I can ask about.Since the branch is close enough to what I envision a switch would look like, these are the current rough benchmarks:
With AVX2 used where it exists:
Note: The massive regression for X25519 ScalarMult is due to the removal of assembly. Sufficiently recent
x/crypto/curve25519
does away with it as well, and clocks in at ~107us/op (~161us/op purego).purego:
This is getting to "an acceptable slowdown" if people think that using the fiat code is better over the code that came from dalek, but the issue of having to ship modified routines from the
64/curve25519
implementation remains.The text was updated successfully, but these errors were encountered: