Automatic differentiation (AD) not supported #17
Currently, implementations don't support automatic differentiation with packages such as ForwardDiff.jl. Probably best to provide ChainRules, such as in https://github.com/JuliaMath/SpecialFunctions.jl/blob/master/src/chainrules.jl, or to take those implementations directly as this package matures.
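As a sketch of what such a forward rule could look like (hypothetical code: besselk here stands in for whatever this package eventually exports, and only the argument derivative is shown, using the standard identity dK_v(x)/dx = -(K_{v-1}(x) + K_{v+1}(x))/2; an order derivative would need something extra, as discussed in the comments below):

using ChainRulesCore

# Hypothetical sketch, not the package's actual rule. Assumes a besselk(v, x)
# that returns K_v(x). Only the argument tangent is propagated here.
function ChainRulesCore.frule((_, _, Δx), ::typeof(besselk), v, x)
    Ω = besselk(v, x)
    # standard recurrence identity for the argument derivative
    ∂x = -(besselk(v - 1, x) + besselk(v + 1, x)) / 2
    return Ω, ∂x * Δx
end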
Also, you should look at https://github.com/cgeoga/BesselK.jl. It allows for autodiff of besselk with respect to both order and argument.
Just making a note of this here. There was some interest from Slack about getting derivatives to work with respect to both argument and order here. SpecialFunctions.jl only provides rules for argument derivatives. BesselK.jl shows that this is possible for orders with some care. This should be possible once we get the arbitrary orders done, and since this seems like functionality people are interested in, I'll work on it after we finish the arbitrary orders. Though, historically …
Commenting here since the advantage of this approach is that you can also do analytic derivatives with respect to both order and argument. The integral representation K_v(x) = ∫₀^∞ exp(-x cosh t) cosh(v t) dt can be differentiated under the integral sign, and the trapezoidal rule converges rapidly for it [1]. Here are three functions that calculate K_v(x) and its derivatives with respect to argument and order:

function besselk(v, x::T) where T
MaxIter = 5000
tol = eps(T)
h = T(0.1)
t = zero(T)
out = 0.5*exp(-x)  # half-weight t = 0 endpoint of the trapezoid: integrand there is exp(-x)
for _ in 1:MaxIter
t += h
tmp = cosh(v*t)*exp(-x*cosh(t))
out += tmp
tmp / out < tol && break
end
return out * h
end
function dxbesselk(v, x::T) where T
MaxIter = 5000
tol = eps(T)
h = T(0.1)
t = zero(T)
out = -0.5*exp(-x)  # half-weight t = 0 endpoint: the d/dx integrand at t = 0 is -exp(-x)
for _ in 1:MaxIter
t += h
tmp = -cosh(t)*cosh(v*t)*exp(-cosh(t)*x)
out += tmp
tmp / out < tol && break
end
return out * h
end
function dvbesselk(v, x::T) where T
MaxIter = 5000
tol = eps(T)
h = T(0.1)
t = zero(T)
out = zero(T)  # t = 0 endpoint vanishes: the d/dv integrand t*sinh(v*t)*exp(-x*cosh(t)) is zero there
for _ in 1:MaxIter
t += h
tmp = t*exp(-cosh(t)*x)*sinh(t*v)
out += tmp
tmp / out < tol && break
end
return out * h
end

[1] Schwartz, Charles. "Numerical calculation of Bessel functions." International Journal of Modern Physics C 23.12 (2012): 1250084.
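A quick sanity check of the three routines above against SpecialFunctions.jl (a hypothetical session, just for illustration; the central finite difference is only there to cross-check the order derivative):

import SpecialFunctions  # reference implementation for checking

v, x = 1.5, 3.0
trap = besselk(v, x)
ref  = SpecialFunctions.besselk(v, x)
abs(1 - trap/ref)   # relative error of the trapezoidal value

# cross-check the order derivative against a central finite difference
fd = (besselk(v + 1e-6, x) - besselk(v - 1e-6, x)) / 2e-6
abs(1 - dvbesselk(v, x)/fd)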
Just was passing through and was curious how stuff here was going, and this is pretty neat! From a little tinkering it seems like the integral converges more slowly when the argument x is small, though.

Maybe something that just has one branch based on the size of x would do it. Cool stuff.
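A minimal sketch of that kind of branch (hypothetical: the cutoffs and step sizes here are illustrative placeholders, not tuned values):

# pick the trapezoid step from the argument size; smaller x converges more
# slowly, so it presumably wants a finer step and/or more iterations
trapezoid_step(x::T) where T = x < 5 ? T(0.02) : (x < 20 ? T(0.05) : T(0.1))

besselk above would then use h = trapezoid_step(x) in place of the fixed h = T(0.1).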
Note that in floating point, all multiplies retain full accuracy (unless they underflow or overflow). Also, exp and cosh are pretty fast to evaluate these days.
Ah, for some reason I thought that would be an issue. Also interesting about exp and cosh being fast. Out of curiosity, what is the explanation for my version that computes the …
They're fast, but not free.
Fair enough, I guess that's the power of micro-optimization at this level. I wouldn't have guessed they were in the cost range of 2 divisions. That's pretty neat. Anyways, will definitely be keeping an eye on this! I would still think that for small enough v and big enough x the integration could be really competitive.
Many thanks @cgeoga for looking at and optimizing that code block! I found it about 2.5-3x faster than my original implementation. The additions in the optimized version do increase the relative error slightly compared to the all-multiply version, which, as Oscar mentioned, retains really good accuracy (though slower). On some quick tests it looked like the ULP increased from ~5 to ~15, which is still pretty good considering it's much faster.

But yes! I always want numerical integration to work for this, as it is so much easier and more flexible, but the convergence is variable. As you mentioned, it converges more slowly for small x, but also for increasing v. The small x is not as concerning because you use the power series there, and the large v is also not concerning because you use the uniform asymptotic expansion there. One other concern is how to choose the step size h. So these may actually have a really good application window. I've been spending my current time on the troublesome region where x and v are comparable.

On the other hand, I'm currently exploring just using a Chebyshev basis for this region and writing a really custom evaluation scheme to generate this..... the numerical integral for …

Edit: And yes, this is definitely going to be slow, as I feel like the other asymptotic expansions can just be optimized so much more (https://github.com/heltonmc/Bessels.jl/pull/23). One thing to consider is a different contour to integrate along (maybe some type of Hankel contour).... but even if we improved convergence, operating in the complex domain might take away a lot of that gain.
That makes a lot of sense. That region of x in [7, 20] or so was by far the most painful part for me too.

Apologies if this is a bit too off-topic for this issue, and if so I'm happy to open a new issue to discuss it, but I played around a little with using a continued fraction representation to get the ratio K_{v+1}(x)/K_v(x). Again apologies if a big code dump here is not good issue etiquette, but here is my basic code (the besk_cf_ratio_opt function quoted in full in a later comment below):
I don't really know about continued fractions and have never taken a numerical analysis course, so some of the parameter choices here are probably not good. But even compared to my uniform asymptotic function, this did a little better than what I currently use in that region, though it was a little bit slower. What are your thoughts about that strategy?
Yes, fully 2-D. It looks like you can get 16 digits of accuracy with about 20 Chebyshev polynomials, which can be evaluated quite fast.

This is a fine place to discuss! One issue to be careful with is that backward recurrence with besselk is unstable. We do currently employ a continued fraction part, which I will say is by far the slowest code in our entire code base right now. We use it, though, to go from K_v to K_{v+1} …

Edit: And honestly the Debye expansion is so fast I can probably optimize it some more, so we can produce both K_v and K_{v+1} faster than computing the continued fraction to get the ratio, and in constant time.

Edit2: It looks like we would need about a (18, 30) Chebyshev grid, which isn't as efficient as I thought...
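For reference, the standard fast way to evaluate such a Chebyshev series is Clenshaw's recurrence; this is the generic textbook algorithm, not the custom evaluation scheme being explored here:

# Clenshaw evaluation of sum(c[k+1] * T_k(t) for k in 0:length(c)-1), t in [-1, 1]
function clenshaw(c::AbstractVector{T}, t::T) where T
    b1, b2 = zero(T), zero(T)
    for k in length(c):-1:2
        b1, b2 = muladd(2t, b1, c[k] - b2), b1
    end
    return muladd(t, b1, c[1] - b2)
end

A 2-D version over a (v, x) grid would run this along one index first and then once more across those results.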
Ah, right, I forgot about the backwards recurrence being bad. That would explain why this didn't work for me for larger offsets, just as you predict. Oof. You know, it's funny, I really only ever need it for small orders anyway.

I hear you about the Debye expansion thing. This is really pretty exciting. Do you have a direct series implementation that will work for integer orders?
I don't have a power series yet for integer orders.
Ok - after diving into this problem a little bit I think I have a working solution. I'm not sure why I never came back to this, but so far the best method is to use the hypergeometric series here. @oscardssmith mentioned this in https://github.com/heltonmc/Bessels.jl/issues/16 and it was very easy to set up, but see the comments there about convergence; honestly I couldn't guarantee accuracy for a variety of inputs (essentially you can't really guarantee accuracy for those functions over a large input range). At the time I was looking for an algorithm for when x ~ nu and it converged slowly..... Flash forward to this discussion and it seems like a very good fit if we consider the right input ranges.

@cgeoga, although the code you shared probably won't work with the Debye expansions using backward recurrence, it could work if we could generate K_v and K_{v+1} at a small order and recurse upward. Ok, so now the trick is we need something to give us very accurate results for small orders that converges reasonably fast... well, that is the hypergeometric series. So piecing together the code from @cgeoga's code block above (https://github.com/heltonmc/Bessels.jl/issues/17#issuecomment-1195726642), our recurrence discussion in #25, and some code from HypergeometricFunctions.jl, we can get a somewhat cumbersome source code for this:

function besselk_mid_arguments(v, x)
v_low = v - floor(v)                     # fractional part of the order
k0 = besselk_hypergeometric(v_low, x)    # K_{v_low}(x) from the 2F0 series
k1 = besk_cf_ratio_opt(v_low, x)*k0      # K_{v_low+1}(x) via the CF ratio
k2 = k1
x2 = 2 / x
arr = range(start=v_low+1, stop=v, step=1)
for n in arr
    a = x2 * n
    k2 = muladd(a, k1, k0)               # forward recurrence: K_{n+1} = (2n/x)*K_n + K_{n-1}
    k0 = k1
    k1 = k2
end
return k0
end
function besselk_hypergeometric(v, x)
# K_v(x) = sqrt(pi/(2x)) * exp(-x) * 2F0(v + 1/2, 1/2 - v; -1/(2x))
a = v+1/2; b = 2v+1; c = 2x
return drummond2F0(a, a-b+1, -1/c)*c^(-a)*sqrt(pi)/exp(x)*(2x)^v
end
# taken from https://github.com/JuliaMath/HypergeometricFunctions.jl/blob/master/src/drummond.jl
# ₂F₀(α,β;z)
using LinearAlgebra
@inline errcheck(x, y, tol) = isfinite(x) && isfinite(y) && (norm2(x-y) > max(norm2(x), norm2(y))*tol)
@inline norm2(x) = norm(x)
function drummond2F0(α::T1, β::T2, z::T3; kmax::Int = 10_000) where {T1, T2, T3}
T = promote_type(T1, T2, T3)
absα = abs(T(α))
absβ = abs(T(β))
if norm(z) < eps(real(T)) || norm(α*β) < eps(absα*absβ)
return one(T)
end
ζ = inv(z)
Nlo = ζ/(α*β)
Dlo = ζ/(α*β)
Tlo = Nlo/Dlo
Nmid = (2ζ-(α+1)*(β+1))*Nlo + 2ζ
Dmid = (2ζ-(α+1)*(β+1))*Dlo
Tmid = Nmid/Dmid
if norm((α+1)*(β+1)) < eps((absα+1)*(absβ+1))
return Tmid
end
Nmid /= (α+1)*(β+1)
Dmid /= (α+1)*(β+1)
Nhi = (3ζ-(α+2)*(β+2)-(α+β+3))*Nmid - (α+β+3-ζ)*Nlo + ζ
Dhi = (3ζ-(α+2)*(β+2)-(α+β+3))*Dmid - (α+β+3-ζ)*Dlo
Thi = Nhi/Dhi
if norm((α+2)*(β+2)) < eps((absα+2)*(absβ+2))
return Thi
end
Nhi /= (α+2)*(β+2)
Dhi /= (α+2)*(β+2)
k = 2
while k < kmax && errcheck(Tmid, Thi, 10eps(real(T)))
Nhi, Nmid, Nlo = ((k+2)*ζ-(α+k+1)*(β+k+1)-k*(α+β+2k+1))*Nhi - k*(α+β+3k-ζ)*Nmid - k*(k-1)*Nlo, Nhi, Nmid
Dhi, Dmid, Dlo = ((k+2)*ζ-(α+k+1)*(β+k+1)-k*(α+β+2k+1))*Dhi - k*(α+β+3k-ζ)*Dmid - k*(k-1)*Dlo, Dhi, Dmid
Thi, Tmid, Tlo = Nhi/Dhi, Thi, Tmid
if norm((α+k+1)*(β+k+1)) < eps((absα+k+1)*(absβ+k+1))
return Thi
end
Nhi /= (α+k+1)*(β+k+1)
Dhi /= (α+k+1)*(β+k+1)
k += 1
end
return isfinite(Thi) ? Thi : isfinite(Tmid) ? Tmid : Tlo
end
# taken from Chris Geoga https://github.com/cgeoga
# https://github.com/heltonmc/Bessels.jl/issues/17#issuecomment-1195726642
function besk_cf_ratio_opt(v, z, nmax=100, tol=1e-16) # this takes about 60ns on my machine
# do the modified Lentz method:
(hn, Dn, Cn) = (1e-50, zero(v), 1e-50)
onev = one(v)
twov = onev+onev
jf = onev
qi = inv(twov+twov)
vv = v*v
for j in 1:nmax
an = (vv - ((twov*jf-onev)^2)*qi)
bn = 2*(z+jf)
Cn = bn+an/Cn # skipping zero check...
Dn = inv(bn+an*Dn) # skipping zero check...
del = Dn*Cn
hn *= del
(abs(del-1)<tol) && break
jf += onev
end
(twov*v+onev+twov*z)/(twov*z) + hn/z
end

Ok so now let's look at the accuracy in some problematic regions: x in [7, 20] and nu around 15.

julia> 1 - besselk_mid_arguments(15.0, 10.0)/SpecialFunctions.besselk(15.0, 10.0)
-1.1102230246251565e-15
julia> 1 - besselk_mid_arguments(16.2, 13.4)/SpecialFunctions.besselk(16.2, 13.4)
1.1102230246251565e-16

So you can see this approach is very accurate, providing at least 15 digits of accuracy; I'm not sure the SpecialFunctions.jl implementation is accurate enough to distinguish past that. And now the performance checks. I will say I have not spent any time optimizing this, but I would imagine we can do better.

julia> @benchmark besselk_mid_arguments(15.2, x) setup=(x=15.0+rand())
BenchmarkTools.Trial: 10000 samples with 446 evaluations.
Range (min … max): 231.128 ns … 511.025 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 238.509 ns ┊ GC (median): 0.00%
Time (mean ± σ): 243.026 ns ± 10.436 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
██▁ ▆▄▆▄ ▆▄ ▂▃▁▄▁ ▃▁ ▂▄▃ ▂▁ ▂
█▆▁▁▁▁▁▄▄▃███▇▇▆▇██████▇██▇█████▇████▇███▇▆▅██▅▆▆██▅▆▆▅▆▅▅▆▆█ █
231 ns Histogram: log(frequency) by time 267 ns <
Memory estimate: 0 bytes, allocs estimate: 0.

Now, I wouldn't say this is blazing fast by any means, but I think it is very good as our general fallback for problematic regions! It works for integer and decimal nu and gives very high accuracy results (it also works for complex arguments). I'm sure we can spend some time optimizing this a bit, and I would love to have a version of this in here in the future (@cgeoga, if you're willing to contribute your code block here).
This looks like it could be made a lot faster by factoring out small bits. It looks like there is a lot of duplicated work.
@oscardssmith absolutely. I think we should edit the source code from HypergeometricFunctions.jl and have a version of it here, or submit a PR over there. I really haven't had time to look into it too carefully to see where we can make up some performance.
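For instance, in the main loop of drummond2F0 above, the product (α+k+1)*(β+k+1) is computed three times per iteration (once in the convergence check and once in each division). A first small step in that direction, sketched here, is to hoist it; this is the same logic as the loop body above with the repeated product computed once:

# inside the while loop of drummond2F0, replacing the final check/divisions
p = (α + k + 1) * (β + k + 1)
if norm(p) < eps((absα + k + 1) * (absβ + k + 1))
    return Thi
end
Nhi /= p
Dhi /= p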
Wow. That is some seriously impressive composition. I also played a lot with …

Not to derail, but one more idle thought: I tinkered a bit this morning with trying to use the forward recursion, but starting from big negative values of v. Honestly my understanding of numerical stability isn't very good, so I don't have a good intuition about the potential problems here. Is there any way that a trick like this might work, or is doing the forward recursion but starting with giant numbers that get smaller and then get bigger again the same problem as backwards recursion?
Yes. I found the same thing: as a general tool it is just way too slow, but I think if you can limit the order to be small and use recurrence, this can be a pretty good tool, considering that finding another algorithm in this range is difficult (though the numerical integration in this range is powerful too....), so maybe we lean on that? I don't know; that will take some comparisons.... though I like the simplicity of the numerical integration a lot. But I like the idea of really limiting the order between 0 and 1 to increase convergence and then using the forward recurrence....

Ya, that would be kind of tricky, but I don't think it will be stable. In general, the values grow as the order increases:

julia> besselk(1.0, 10.0)
1.864877345382559e-5
julia> besselk(10.0, 10.0)
0.0016142553003906702
julia> besselk(100.0, 10.0)
4.5966740842695555e85

You are going to have stable recurrence without loss of precision.... but what about the other way?

julia> besselk(-100, 10.0)
4.5966740842695555e85
julia> besselk(-10, 10.0)
0.0016142553003906702
julia> besselk(-1, 10.0)
1.864877345382559e-5

You can see that the value is decreasing (obviously K_{-v}(x) = K_v(x) for besselk), so this recurrence will be unstable.
Nice, that's a very helpful gestalt, thanks! It makes good sense intuitively too. Guess I can't pull a fast one on floats like that. Oh well. I agree with you about that argument range being so rough, though. AMOS and the Temme paper both talk about the Miller recursion algorithm, but I tinkered with that very early in our project and kind of dismissed it for some reason. Maybe that was incorrect, but I see your point about how there aren't a ton of other options for this part of the domain.
Ya, I think it comes down to whether we use 1. the hypergeometric series or 2. numerical integration in this region. Both are pretty good here. I believe AMOS and the Temme paper use the Miller algorithm to solve the hypergeometric series? I remember briefly looking at that paper and found the explanation.... confusing? So I would imagine these approaches are very, very similar in that they are using the hypergeometric form and then using forward recurrence. I think it's a really powerful approach, but we might be able to optimize the numerical integral step size better to get better performance with way less code.... I really liked the version you shared originally that is more optimized but loses a digit or two. I'm wondering if we can just focus on increasing the accuracy of that version..... On a side note...
I actually got that continued fraction form from this book. Chapter 17 pertains to Bessel functions but appears pretty light on information for besselk specifically.

With regard to squeaking out just a few extra digits using a simpler method, there is one other option in that vein, I think. The exponential-improvement corrections to the asymptotic expansion (NIST 10.40(iv)) really do a lot, and for … I have that implementation here, although I apologize in advance for how unreadable it is. If that is potentially appealing, I'd be happy to try and put together a readable version that uses some of the performance tools I've learned from our discussions here. Unlike the gamma function, the upper incomplete gamma function is easy to write exactly and very quickly in pure Julia, so it's pretty zippy even though the formulae in NIST 10.40(iv) would make you think it really drags.

EDIT: An example of the function call is BesselK._besselk_as(v, x, order, true, false), where that last false says not to compute …
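As an aside on that last point, here is a minimal sketch of why the upper incomplete gamma function is cheap in this setting: for a positive integer first argument it reduces to a finite sum. This is the standard identity, not the code from BesselK.jl:

# Γ(a, x) = (a-1)! * exp(-x) * sum(x^k / k! for k in 0:a-1), for integer a >= 1
# (factorial(a - 1) overflows Int64 for a > 21, so this sketch assumes small a)
function upper_incomplete_gamma(a::Int, x::Float64)
    term = 1.0            # x^k / k! at k = 0
    s = term
    for k in 1:a-1
        term *= x / k     # update x^k / k! incrementally
        s += term
    end
    return factorial(a - 1) * exp(-x) * s
end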
Many thanks for including this! I'm starting to look at the modified versions again, so I'm looking into this now. I'd ideally like to reuse as much code as possible, so if you would like to submit a PR please do. One thing to keep in mind, though: for the modified versions we are really only interested in the relative tolerances for these functions, because they do not cross the axis, so we should be able to provide relative tolerances for all values. I was really hoping to provide ~14.5-15 digits of accuracy for these functions, which I think is possible.

julia> (v, x) = 0.3, 10.1
julia> 1 - SpecialFunctions.besselk(v, x) / BesselK._besselk_as(v, x, 10, true, false)
4.6033177270032866e-12
julia> 1 - SpecialFunctions.besselk(v, x) / BesselK._besselk_as(v, x, 20, true, false)
7.205347429817266e-14

So even with an order of 20 we are almost getting 14 digits, where SpecialFunctions.jl gives a relative tolerance of … Though, it seems like for this regime the expansions just converge too slowly....

julia> (v, x) = 0.3, 80.1
julia> 1 - SpecialFunctions.besselk(v, x) / BesselK._besselk_as(v, x, 6, true, false)
4.440892098500626e-16

So we will just have to pick where we want to actually employ this, somewhere it converges relatively quickly!
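To make that concrete, the eventual routing might be a simple cutoff on the argument. A sketch under made-up assumptions (the threshold and the besselk_large_argument name are illustrative placeholders, not the package's actual API or a tuned value):

# hypothetical dispatch: use the exponentially-improved asymptotic expansion
# only where it converges quickly, and fall back to the hypergeometric route
function besselk_branch(v, x)
    return x >= 35 ? besselk_large_argument(v, x) : besselk_mid_arguments(v, x)
end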