The "core" of x/optym is the convention def thing(fg: callable), where fg returns (cost, grad) based on the parameter vector x.
This is somewhat restrictive: gradient-free optimizers will just do `f, _ = fg(x)`, and the computation of `g` will have been wasted. There are also circumstances where a line searcher or similar may want only the gradient; in those cases the computation of `f` will have been wasted. Of course, when using backprop, `f` is free along the way to computing `g`, but sometimes the gradient is known or computable without `f` (for example, the Rosenbrock function).
It is a greater burden on the user, but it may be superior to change `fg` to something like an `optimizeable` object, along the lines of the sketch below.
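A minimal sketch of what such an `optimizeable` might look like, assuming a small class that carries `f` and only exposes `g` when a gradient is actually available; the class name and attributes here are illustrative, not an existing optym API:

```python
import numpy as np


class Optimizeable:
    """Bundle a cost function f(x) -> float with an optional gradient g(x) -> ndarray."""

    def __init__(self, f, g=None):
        self.f = f
        if g is not None:
            # only attach g when it exists, so optimizers can duck-type on hasattr(o, 'g')
            self.g = g


def rosenbrock(x):
    """2D Rosenbrock function."""
    return 100 * (x[1] - x[0] ** 2) ** 2 + (1 - x[0]) ** 2


def rosenbrock_grad(x):
    """Analytic gradient of the 2D Rosenbrock function; no evaluation of f needed."""
    return np.array([
        -400 * x[0] * (x[1] - x[0] ** 2) - 2 * (1 - x[0]),
        200 * (x[1] - x[0] ** 2),
    ])


o = Optimizeable(rosenbrock, rosenbrock_grad)  # gradient known analytically
o_gradfree = Optimizeable(rosenbrock)          # e.g. for Nelder-Mead
```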
Then each optimizer can just check `if not hasattr(o, 'g'): raise ValueError('<myoptimizer> requires the gradient')`. In principle we could fall back to finite differences, but I think that just leads to unhappy or misunderstanding users who use finite differences for problems with ~a dozen dimensions, then view it as impossible for something like a million dimensions when it would have been perfectly doable with backprop. Forcing the user to opt in with a `forward_differences(f, x0, eps=1e-9)` and `central_differences(f, x0, eps=1e-9)` set of functions could help mitigate this; i.e., one might do something like the sketch below.
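A sketch of what those opt-in helpers could look like, following the `forward_differences(f, x0, eps=1e-9)` / `central_differences(f, x0, eps=1e-9)` signatures above; these are not existing optym functions, and the last line reuses the hypothetical `Optimizeable` from the earlier sketch:

```python
import numpy as np


def forward_differences(f, x0, eps=1e-9):
    """Approximate the gradient of f at x0 with forward differences (N + 1 evaluations of f)."""
    x0 = np.asarray(x0, dtype=float)
    f0 = f(x0)
    grad = np.empty_like(x0)
    for i in range(x0.size):
        xp = x0.copy()
        xp[i] += eps
        grad[i] = (f(xp) - f0) / eps
    return grad


def central_differences(f, x0, eps=1e-9):
    """Approximate the gradient of f at x0 with central differences (2N evaluations of f)."""
    x0 = np.asarray(x0, dtype=float)
    grad = np.empty_like(x0)
    for i in range(x0.size):
        xp, xm = x0.copy(), x0.copy()
        xp[i] += eps
        xm[i] -= eps
        grad[i] = (f(xp) - f(xm)) / (2 * eps)
    return grad


# explicit opt-in: the user wires the finite-difference gradient in themselves,
# so the O(N) extra evaluations per gradient are a conscious choice, not a silent fallback
o_fd = Optimizeable(rosenbrock, g=lambda x: central_differences(rosenbrock, x))
```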
I think this would be preferable, to enable something like Nelder-Mead for functions that, for example, do not strictly have a gradient. In principle we could also look for `h_j_prod(vector) -> vector`, but I sincerely hope I never implement optimizers that want the Hessian-Jacobian product.

Thoughts @Jashcraf ?
Something I don't really understand - why would you want the gradient for an optimizer that doesn't require one (e.g. Nelder-Mead)?
The intent is actually to modify the interface so that the gradient is optional in the most general sense, but a gradient-based optimizer would error if it's not available.
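For concreteness, a gradient-based optimizer under this scheme might do nothing more than the `hasattr` check from above; `GradientDescent` here is purely illustrative and continues the earlier sketches:

```python
import numpy as np


class GradientDescent:
    """Toy fixed-step gradient descent; errors unless o exposes a gradient."""

    def __init__(self, o, x0, alpha=1e-3):
        if not hasattr(o, 'g'):
            raise ValueError('GradientDescent requires the gradient')
        self.o = o
        self.x = np.asarray(x0, dtype=float)
        self.alpha = alpha

    def step(self):
        # one descent step; f is evaluated only for reporting the new cost
        self.x = self.x - self.alpha * self.o.g(self.x)
        return self.o.f(self.x), self.x


opt = GradientDescent(o, x0=np.array([0.0, 0.0]))
cost, x = opt.step()

# a gradient-free optimizer (e.g. Nelder-Mead) would simply never look for o.g
```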
The "core" of
x/optym
is the conventiondef thing(fg: callable)
, wherefg
returns(cost, grad)
based on the parameter vectorx
.This is in a way restrictive, since gradient-less optimizers will just do
f, _ = fg(x)
, and the computation of g will have been wasteful. There are also some circumstances where a linesearcher or similar may want only the gradient; in these scenarios the computation off
will have been wasteful. Of course, when using backprop,f
is free along the way to computingg
, but sometimes the gradient is known or compute-able withoutf
(for example the rosenbrock function).It is a greater burden on the user, but it may be superior to change
fg
to something likeoptimizeable
, which is of the senseThen each optimizer can just check
if not hasattr(o, 'g'): raise ValueError('<myoptimizer> requires the gradient')
. In principle we could fall back to finite differences, but I think that just leads to unhappy or misunderstanding users who do finite differences for problems with ~a dozen dimensions, then view it as impossible for something like a million dimensions when it would have been perfectly doable with backprop. Forcing the user to opt in with aforward_differences(f, x0, eps=1e-9)
andcentral_differences(f, x0, eps=1e-9)
set of functions could help abate thisi.e., one might do
I think this would be preferable to enable something like Nelder-Meade for functions that for example do not strictly have a gradient. In principle we could also look for
h_j_prod(vector) vector
but I sincerely hope I never implement optimizers that want the hessian jacobian productThoughts @Jashcraf ?